[PING][PATCH][2/2] Early LTO debug -- main part

2016-11-11 Thread Richard Biener
On Fri, 21 Oct 2016, Richard Biener wrote:

> 
> This is the main part of the early LTO debug support.  The main parts
> of the changes are to dwarf2out.c where most of the changes are related
> to the fact that we eventually have to output debug info twice, once
> for the early LTO part and once for the fat part of the object file.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu with ASAN and TSAN
> extra FAILs (see PR78063, a libbacktrace missing feature or libsanitizer
> being too pessimistic).  There's an extra
> 
> XPASS: gcc.dg/guality/inline-params.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> 
> the previously reported extra VLA guality FAILs are gone.
> 
> I've compared testresults with -flto -g added for all languages and
> only see expected differences (libstdc++ pretty printers now work,
> most scan-assembler-times debug testcases fail because we have everything
> twice now).
> 
> See https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01842.html for
> the last posting of this patch which has a high-level overview of
> Early LTO debug.  You may want to refer to the slides I presented
> at the GNU Cauldron as well.

I have refreshed the patch after the DWARF5 changes, re-LTO-bootstrapped
and retested (also comparing -g -flto with/without the patch) with no
changes in results.  Patch [1/2] still applies without changes.

Thanks,
Richard.

2016-10-21  Richard Biener  

* debug.h (struct gcc_debug_hooks): Add die_ref_for_decl and
register_external_die hooks.
(debug_false_tree_charstarstar_uhwistar): Declare.
(debug_nothing_tree_charstar_uhwi): Likewise.
* debug.c (do_nothing_debug_hooks): Adjust.
(debug_false_tree_charstarstar_uhwistar): New do nothing.
(debug_nothing_tree_charstar_uhwi): Likewise.
* dbxout.c (dbx_debug_hooks): Adjust.
(xcoff_debug_hooks): Likewise.
* sdbout.c (sdb_debug_hooks): Likewise.
* vmsdbgout.c (vmsdbg_debug_hooks): Likewise.

* dwarf2out.c (macinfo_label_base): New global.
(dwarf2out_register_external_die): New function for the
register_external_die hook.
(dwarf2out_die_ref_for_decl): Likewise for die_ref_for_decl.
(dwarf2_debug_hooks): Use them.
(dwarf2_lineno_debug_hooks): Adjust.
(struct die_struct): Add with_offset flag.
(DEBUG_LTO_DWO_INFO_SECTION, DEBUG_LTO_INFO_SECTION,
DEBUG_LTO_DWO_ABBREV_SECTION, DEBUG_LTO_ABBREV_SECTION,
DEBUG_LTO_DWO_MACINFO_SECTION, DEBUG_LTO_MACINFO_SECTION,
DEBUG_LTO_DWO_MACRO_SECTION, DEBUG_LTO_MACRO_SECTION,
DEBUG_LTO_LINE_SECTION, DEBUG_LTO_DWO_STR_OFFSETS_SECTION,
DEBUG_LTO_STR_DWO_SECTION, DEBUG_STR_LTO_SECTION): New macros
defining section names for the early LTO debug variants.
(reset_indirect_string): New helper.
(add_AT_external_die_ref): Helper for dwarf2out_register_external_die.
(print_dw_val): Add support for offsetted symbol references.
(compute_section_prefix_1): Split out worker to distinguish
the comdat from the LTO case.
(compute_section_prefix): Wrap old comdat case here.
(output_die): Skip DIE symbol output for the LTO added one.
Handle DIE symbol references with offset.
(output_comp_unit): Guard section name mangling properly.
For LTO debug sections emit a symbol at the section beginning
which we use to refer to its DIEs.
(add_abstract_origin_attribute): For DIEs registered via
dwarf2out_register_external_die directly refer to the early
DIE rather than indirectly through the shadow one we created.
(gen_array_type_die): When generating early LTO debug do
not emit DW_AT_string_length.
(gen_formal_parameter_die): Do not re-create DIEs for PARM_DECLs
late when in LTO.
(gen_subprogram_die): Adjust the check for whether we face
a concrete instance DIE for an inline we can reuse for the
late LTO case.  Likewise avoid another specification DIE
for early built declarations/definitions for the late LTO case.
(gen_variable_die): Add type references for late duplicated VLA dies
when in late LTO.
(gen_inlined_subroutine_die): Do not call dwarf2out_abstract_function,
we have the abstract instance already.
(process_scope_var): Adjust decl DIE contexts in LTO which
first puts them in limbo.
(gen_decl_die): Do not generate type DIEs late apart from
types for VLAs or for decls we do not yet have a DIE.
(dwarf2out_early_global_decl): Make sure to create DIEs
for abstract instances of a decl first.
(dwarf2out_late_global_decl): Adjust comment.
(output_macinfo_op): With multiple macro sections use
macinfo_label_base to distinguish labels.
(output_macinfo): Likewise.  Update macinfo_label_base.
Pass in the line info label.

Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 08:12:27PM +0300, Alexander Monakov wrote:
> gcc/
>   * internal-fn.c (expand_GOMP_SIMT_LANE): New.
>   (expand_GOMP_SIMT_VF): New.
>   (expand_GOMP_SIMT_LAST_LANE): New.
>   (expand_GOMP_SIMT_ORDERED_PRED): New.
>   (expand_GOMP_SIMT_VOTE_ANY): New.
>   (expand_GOMP_SIMT_XCHG_BFLY): New.
>   (expand_GOMP_SIMT_XCHG_IDX): New.
>   * internal-fn.def (GOMP_SIMT_LANE): New.
>   (GOMP_SIMT_VF): New.
>   (GOMP_SIMT_LAST_LANE): New.
>   (GOMP_SIMT_ORDERED_PRED): New.
>   (GOMP_SIMT_VOTE_ANY): New.
>   (GOMP_SIMT_XCHG_BFLY): New.
>   (GOMP_SIMT_XCHG_IDX): New.
>   * omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
>   (create_omp_child_function): ...here.  Set "omp target entrypoint"
>   or "omp declare target" attribute based on is_gimple_omp_offloaded.
>   (omp_max_simt_vf): New.  Use it...
>   (omp_max_vf): ...here.
>   (lower_rec_input_clauses): Add reduction lowering for SIMT execution.
>   (lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
>   (lower_omp_ordered): Likewise, for "ordered" lowering.
>   (expand_omp_simd): Add SIMT transforms.
>   (pass_data_lower_omp): Add PROP_gimple_lomp_dev.
>   (execute_omp_device_lower): New.
>   (pass_data_omp_device_lower): New.
>   (pass_omp_device_lower): New pass.
>   (make_pass_omp_device_lower): New.
>   * passes.def (pass_omp_device_lower): Position new pass.
>   * tree-pass.h (PROP_gimple_lomp_dev): Define.
>   (make_pass_omp_device_lower): Declare.

Ok for trunk, once the needed corresponding config/nvptx bits are committed,
with one nit below that needs immediate action; the rest can be resolved
incrementally.  I'd like to check in the attached patch afterwards, at least
for now, so that non-offloaded SIMD code is less affected.  Once you have
the intended outlining of SIMT regions for PTX offloading done (IMHO the
best place to do that is in omp expansion, not gimplification), you can
either base it on that, or revert it and do it earlier.

> +
> +/* Return maximum SIMT width if offloading may target SIMT hardware.  */
> +
> +static int
> +omp_max_simt_vf (void)
> +{
> +  if (!optimize)
> +    return 0;
> +  if (ENABLE_OFFLOADING)
> +    for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c; )
> +      {
> +        if (!strncmp (c, "nvptx", strlen ("nvptx")))
> +          return 32;
> +        else if ((c = strchr (c, ',')))
> +          c++;
> +      }
> +  return 0;
> +}

As discussed privately, this means one has to set OFFLOAD_TARGET_NAMES
manually in the environment when invoking ./cc1 or ./cc1plus in order to
match the behavior of ./gcc -B ./ etc.  I think it would be better to change
the driver so that it sets OFFLOAD_TARGET_NAMES= in the environment when
ENABLE_OFFLOADING but the -foffload option is used to disable all
offloading, and then have this function use the configured-in offloading
targets if ENABLE_OFFLOADING and OFFLOAD_TARGET_NAMES is not in the
environment.  Can be done incrementally.

> +
>  /* Return maximum possible vectorization factor for the target.  */
>  
>  static int
> @@ -4277,16 +4306,18 @@ omp_max_vf (void)
>            || global_options_set.x_flag_tree_vectorize)))
>      return 1;
>  
> +  int vf = 1;
>    int vs = targetm.vectorize.autovectorize_vector_sizes ();
>    if (vs)
> +    vf = 1 << floor_log2 (vs);
> +  else
>      {
> -      vs = 1 << floor_log2 (vs);
> -      return vs;
> +      machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> +      if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> +        vf = GET_MODE_NUNITS (vqimode);
>      }
> -  machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> -  if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> -    return GET_MODE_NUNITS (vqimode);
> -  return 1;
> +  int svf = omp_max_simt_vf ();
> +  return MAX (vf, svf);

Increasing the vf even for the host in non-offloaded regions is undesirable.
This can be partly solved by the attached patch, which I'm planning to apply
incrementally; the other part is the simd modifier of the schedule clause,
where I think what we want is a conditional expression (GOMP_USE_SIMT ()
? omp_max_simt_vf () : omp_max_vf ()).  I'll try to handle the schedule
clause later.

> +class pass_omp_device_lower : public gimple_opt_pass
> +{
> +public:
> +  pass_omp_device_lower (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_omp_device_lower, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *fun)
> +    {
> +      /* FIXME: inlining does not propagate the lomp_dev property.  */
> +      return 1 || !(fun->curr_properties & PROP_gimple_lomp_dev);

Please change this into
  (ENABLE_OFFLOADING && (flag_openmp || in_lto))
for now, so that we don't waste compile time when it clearly
isn't needed, and incrementally change the inliner to propagate
the property.

Jakub
2016-11-11  Jakub Jelinek  

* internal-fn.c (expand_GOMP_USE

Re: gomp-nvptx branch - libgomp changes

2016-11-11 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 08:11:45PM +0300, Alexander Monakov wrote:
> libgomp/
> 
>   * Makefile.am (libgomp_la_SOURCES): Add atomic.c, icv.c, icv-device.c.
>   * Makefile.in: Regenerate.
>   * configure.ac [nvptx*-*-*] (libgomp_use_pthreads): Set and use it...
>   (LIBGOMP_USE_PTHREADS): ...here; new define.
>   * configure: Regenerate.
>   * config.h.in: Likewise.
>   * config/posix/affinity.c: Move to...
>   * affinity.c: ...here (new file).  Guard use of PThreads-specific

Never seen pthreads capitalized this way, use pthreads or Pthreads
or POSIX Threads.

>   interface by LIBGOMP_USE_PTHREADS. 
>   * critical.c: Split out GOMP_atomic_{start,end} into...
>   * atomic.c: ...here (new file).
>   * env.c: Split out ICV definitions into...
>   * icv.c: ...here (new file) and...
>   * icv-device.c: ...here. New file.
>   * config/linux/lock.c (gomp_init_lock_30): Move to generic lock.c.
>   (gomp_destroy_lock_30): Ditto.
>   (gomp_set_lock_30): Ditto.
>   (gomp_unset_lock_30): Ditto.
>   (gomp_test_lock_30): Ditto.
>   (gomp_init_nest_lock_30): Ditto.
>   (gomp_destroy_nest_lock_30): Ditto.
>   (gomp_set_nest_lock_30): Ditto.
>   (gomp_unset_nest_lock_30): Ditto.
>   (gomp_test_nest_lock_30): Ditto.
>   * lock.c: New.
>   * config/nvptx/lock.c: New.
>   * config/nvptx/bar.c: New.
>   * config/nvptx/bar.h: New.
>   * config/nvptx/doacross.h: New.
>   * config/nvptx/error.c: New.
>   * config/nvptx/icv-device.c: New.
>   * config/nvptx/mutex.h: New.
>   * config/nvptx/pool.h: New.
>   * config/nvptx/proc.c: New.
>   * config/nvptx/ptrlock.h: New.
>   * config/nvptx/sem.h: New.
>   * config/nvptx/simple-bar.h: New.
>   * config/nvptx/target.c: New.
>   * config/nvptx/task.c: New.
>   * config/nvptx/team.c: New.
>   * config/nvptx/time.c: New.
>   * config/posix/simple-bar.h: New.
>   * libgomp.h: Guard pthread.h inclusion.  Include simple-bar.h.
>   (gomp_num_teams_var): Declare.
>   (struct gomp_thread_pool): Change threads_dock member to
>   gomp_simple_barrier_t.
>   [__nvptx__] (gomp_thread): New implementation.
>   (gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS.
>   (gomp_thread_destructor): Ditto.
>   (gomp_init_thread_affinity): Ditto.
>   * team.c: Guard uses of PThreads-specific interfaces by

Ditto.

>   LIBGOMP_USE_PTHREADS.  Adjust all uses of threads_dock.
>   (gomp_free_thread) [__nvptx__]: Do not call 'free'.
> 
>   * config/nvptx/alloc.c: Delete.
>   * config/nvptx/barrier.c: Ditto.
>   * config/nvptx/fortran.c: Ditto.
>   * config/nvptx/iter.c: Ditto.
>   * config/nvptx/iter_ull.c: Ditto.
>   * config/nvptx/loop.c: Ditto.
>   * config/nvptx/loop_ull.c: Ditto.
>   * config/nvptx/ordered.c: Ditto.
>   * config/nvptx/parallel.c: Ditto.
>   * config/nvptx/section.c: Ditto.
>   * config/nvptx/single.c: Ditto.
>   * config/nvptx/splay-tree.c: Ditto.
>   * config/nvptx/work.c: Ditto.
> 
>   * testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass
>   -foffload=-lgfortran in addition to -lgfortran.
>   * testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto.
> 
>   * plugin/plugin-nvptx.c: Include .
>   (struct targ_fn_descriptor): Add new fields.
>   (struct ptx_device): Ditto.  Set them...
>   (nvptx_open_device): ...here.
>   (nvptx_adjust_launch_bounds): New.
>   (nvptx_host2dev): Allow NULL 'nvthd'.
>   (nvptx_dev2host): Ditto.
>   (GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400.
>   (link_ptx): Adjust log sizes.
>   (nvptx_host2dev): Allow NULL 'nvthd'.
>   (nvptx_dev2host): Ditto.
>   (nvptx_set_clocktick): New.  Use it...
>   (GOMP_OFFLOAD_load_image): ...here.  Set new targ_fn_descriptor
>   fields.
>   (GOMP_OFFLOAD_dev2dev): New.
>   (nvptx_adjust_launch_bounds): New.
>   (nvptx_stacks_size): New.
>   (nvptx_stacks_alloc): New.
>   (nvptx_stacks_free): New.
>   (GOMP_OFFLOAD_run): New.
>   (GOMP_OFFLOAD_async_run): New (stub).

Ok for trunk, assuming the config/nvptx bits it relies on are checked
in first.  Two nits inline: the first one can be handled incrementally;
for the latter, probably just remove the #if 0 stuff and, if needed,
replace it with something different incrementally.

> +void
> +gomp_barrier_wait_last (gomp_barrier_t *bar)
> +{
> +#if 0
> +  gomp_barrier_state_t state = gomp_barrier_wait_start (bar);
> +  if (state & BAR_WAS_LAST)
> +    gomp_barrier_wait_end (bar, state);
> +#else
> +  gomp_barrier_wait (bar);
> +#endif
> +}

~~~
Any plans to change that later, or shall the #if 0 stuff be just removed?

> +/* NVPTX is an accelerator-only target, so this should never be called.  */
> +
> +bool
> +gomp_target_task_fn (void *data)
> +{
> +  __builtin_unreachable ();
> +}

~~~
Not sure if we don't want to gomp_fatal instead or something similarly
loud.

Re: gomp-nvptx branch status

2016-11-11 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 08:09:51PM +0300, Alexander Monakov wrote:
> I'd like to provide an overview of the gomp-nvptx branch status.  In
> response to this message I'll send two more emails, with libgomp and
> middle-end changes on the branch.  Some of the changes to libgomp, such as
> build machinery adaptations, have already received substantial comments in
> 2015, but the middle-end stuff is mostly unreviewed, I believe.
> 
> Middle-end changes mostly amount to adding SIMD-to-SIMT transforms in
> omp-low.c, as shown at the Cauldron.  SIMT outlining via gimplifier abuse
> is not there, and neither is cloning of SIMD/SIMT loops.  Outlining is
> required for correctness, and cloning is useful as it allows us to avoid
> intermixing SIMD+SIMT and thus be sure that SIMT lowering does not 'dirty'
> SIMD loops and regress host/MIC vectorization.  I could argue that it's
> possible to improve my SIMT lowering to avoid some dirtying (like moving
> loop-invariant calls to GOMP_SIMT_VF()), but the need for outlining makes
> that moot anyway, I think.

Approved with small nits, only very few requiring immediate action, the rest
can be handled incrementally once the changes are in.
Please work with Bernd on the config/nvptx bits.

> To get great performance this will need further changes everywhere, including
> in target-independent code, due to accidents like this bug (which I'd like to
> ping given the topic): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68706 

Do you or anyone else have suggestions on how to find the threshold between
when it is worth using just a global lock vs. many separate atomics?  In any
case, we'd need to analyze all the operations for whether we can use atomics
for them; if we need to use the lock for any of them, then using it for all
of them is probably better than many atomics plus one GOMP_atomic_* pair.
Then there is the case of user-defined reductions, where we should try
harder to use atomics.

> With OpenMP/PTX offloading there are 5 additional failures in 
> check-target-libgomp:
> 
> Two due to tests using 'usleep' in a target region:
> FAIL: libgomp.c/target-32.c (test for excess errors)
> FAIL: libgomp.c/thread-limit-2.c (test for excess errors)

Could these be "solved" say by something like:

--- libgomp/testsuite/libgomp.c/target-32.c.jj	2015-11-14 19:38:31.0 +0100
+++ libgomp/testsuite/libgomp.c/target-32.c	2016-11-11 09:29:50.411072865 +0100
@@ -1,7 +1,20 @@
 #include 
 #include 
+#include 
 
-int main ()
+static inline void
+do_sleep (int cnt)
+{
+  int i;
+  if (omp_is_initial_device ())
+    usleep (cnt);
+  else
+    for (i = 0; i < 10 * cnt; i++)
+      asm volatile ("" : : : "memory");
+}
+
+int
+main ()
 {
   int a = 0, b = 0, c = 0, d[7];

plus folding omp_is_initial_device as a builtin in the offloading compiler
(which we want to do anyway, and a similar builtin is folded for OpenACC
already)?

> 
> Two with 'target nowait' (not implemented)
> FAIL: libgomp.c/target-33.c execution test
> FAIL: libgomp.c/target-34.c execution test
> 
> One with 'target link' (not implemented)
> FAIL: libgomp.c/target-link-1.c (test for excess errors)

Can you work on implementing these during stage3?

Jakub


Re: gomp-nvptx branch - libgomp changes

2016-11-11 Thread Alexander Monakov
> > * affinity.c: ...here (new file).  Guard use of PThreads-specific
> 
> Never seen pthreads capitalized this way, use pthreads or Pthreads
> or POSIX Threads.

Right, sorry, not sure where I got that.  Will use 'Pthreads'.

> > +void
> > +gomp_barrier_wait_last (gomp_barrier_t *bar)
> > +{
> > +#if 0
> > +  gomp_barrier_state_t state = gomp_barrier_wait_start (bar);
> > +  if (state & BAR_WAS_LAST)
> > +    gomp_barrier_wait_end (bar, state);
> > +#else
> > +  gomp_barrier_wait (bar);
> > +#endif
> > +}
> 
> ~~~
> Any plans to change that later, or shall the #if 0 stuff be just removed?

Yes, I want to develop a better understanding of the bar.h interface contracts
and see if it's possible to come up with something better suited to PTX.  So I
left the #if 0 to serve as a reminder of what the original code is doing.

> > +/* NVPTX is an accelerator-only target, so this should never be called.  */
> > +
> > +bool
> > +gomp_target_task_fn (void *data)
> > +{
> > +  __builtin_unreachable ();
> > +}
> 
> ~~~
> Not sure if we don't want to gomp_fatal instead or something similarly
> loud.

Originally that was intentional, to allow GCC to DCE paths leading to
gomp_target_task_fn in PTX libgomp. But then I realized that that idea
doesn't work fully because such paths may contain calls to other functions,
and since those functions might not return, the whole path cannot be eliminated.
Only the part between the preceding non-pure/const call and gomp_target_task_fn
can be DCE'd.

> On a related topic, it might be useful to #ifdef out parts of task.c
> - gomp_target_task_completion, GOMP_PLUGIN_target_task_completion,
> gomp_create_target_task for nvptx libgomp.a - the first one should be
> stubbed, the rest left out.  And perhaps at least for now simplify the
> task priority stuff, as OMP_MAX_TASK_PRIORITY var will not be present
> on the offloading side anyway.  Can be done incrementally.

I'm assuming this should use a new macro LIBGOMP_ACCEL_ONLY (name ok?) to cut
off extraneous code paths?

Thanks.
Alexander


Re: gomp-nvptx branch - libgomp changes

2016-11-11 Thread Jakub Jelinek
On Fri, Nov 11, 2016 at 11:43:06AM +0300, Alexander Monakov wrote:
> > > +void
> > > +gomp_barrier_wait_last (gomp_barrier_t *bar)
> > > +{
> > > +#if 0
> > > +  gomp_barrier_state_t state = gomp_barrier_wait_start (bar);
> > > +  if (state & BAR_WAS_LAST)
> > > +    gomp_barrier_wait_end (bar, state);
> > > +#else
> > > +  gomp_barrier_wait (bar);
> > > +#endif
> > > +}
> > 
> > ~~~
> > Any plans to change that later, or shall the #if 0 stuff be just removed?
> 
> Yes, I want to develop a better understanding of the bar.h interface
> contracts and see if it's possible to come up with something better suited
> to PTX.  So I left the #if 0 to serve as a reminder of what the original
> code is doing.

Can you then turn it into a comment instead?  There doesn't need to be code;
it can just say that config/linux/bar.c (gomp_barrier_wait_last) has a
separate implementation for the last thread and that we should consider
reimplementing it similarly.
> 
> > > +/* NVPTX is an accelerator-only target, so this should never be called.  */
> > > +
> > > +bool
> > > +gomp_target_task_fn (void *data)
> > > +{
> > > +  __builtin_unreachable ();
> > > +}
> > 
> > ~~~
> > Not sure if we don't want to gomp_fatal instead or something similarly
> > loud.
> 
> Originally that was intentional, to allow GCC to DCE paths leading to
> gomp_target_task_fn in PTX libgomp.  But then I realized that that idea
> doesn't work fully because such paths may contain calls to other functions,
> and since those functions might not return, the whole path cannot be
> eliminated.  Only the part between the preceding non-pure/const call and
> gomp_target_task_fn can be DCE'd.
> 
> > On a related topic, it might be useful to #ifdef out parts of task.c
> > - gomp_target_task_completion, GOMP_PLUGIN_target_task_completion,
> > gomp_create_target_task for nvptx libgomp.a - the first one should be
> > stubbed, the rest left out.  And perhaps at least for now simplify the
> > task priority stuff, as OMP_MAX_TASK_PRIORITY var will not be present
> > on the offloading side anyway.  Can be done incrementally.
> 
> I'm assuming this should use a new macro LIBGOMP_ACCEL_ONLY (name ok?) to cut
> off extraneous code paths?

I'd go for LIBGOMP_OFFLOADED_ONLY, we don't really use accel/ACCEL in libgomp.

Jakub


Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Alexander Monakov
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
[...]
> the intended outlining of SIMT regions for PTX offloading done (IMHO the
> best place to do that is in omp expansion, not gimplification)

Sorry, I couldn't find a good way to implement that during omp expansion.  The
reason I went for gimplification is automatic discovery of sharing clauses;
I'm assuming that in expansion it's very hard to fill omp_data_[sio] without
the gimplifier's help.  Does this sound sensible?

Thanks.
Alexander


Re: [PATCH] Fix PR78189

2016-11-11 Thread Richard Biener
On Thu, 10 Nov 2016, Christophe Lyon wrote:

> On 10 November 2016 at 09:34, Richard Biener  wrote:
> > On Wed, 9 Nov 2016, Christophe Lyon wrote:
> >
> >> On 9 November 2016 at 09:36, Bin.Cheng  wrote:
> >> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener  wrote:
> >> >> On Mon, 7 Nov 2016, Christophe Lyon wrote:
> >> >>
> >> >>> Hi Richard,
> >> >>>
> >> >>>
> >> >>> On 7 November 2016 at 09:01, Richard Biener  wrote:
> >> >>> >
> >> >>> > The following fixes an oversight when computing alignment in the
> >> >>> > vectorizer.
> >> >>> >
> >> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >> >>> >
> >> >>> > Richard.
> >> >>> >
> >> >>> > 2016-11-07  Richard Biener  
> >> >>> >
> >> >>> > PR tree-optimization/78189
> >> >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
> >> >>> > alignment computation.
> >> >>> >
> >> >>> > * g++.dg/torture/pr78189.C: New testcase.
> >> >>> >
> >> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
> >> >>> > ===
> >> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C  (revision 0)
> >> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C  (working copy)
> >> >>> > @@ -0,0 +1,41 @@
> >> >>> > +/* { dg-do run } */
> >> >>> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */
> >> >>> > +
> >> >>> > +#include 
> >> >>> > +
> >> >>> > +struct A
> >> >>> > +{
> >> >>> > +  void * a;
> >> >>> > +  void * b;
> >> >>> > +};
> >> >>> > +
> >> >>> > +struct alignas(16) B
> >> >>> > +{
> >> >>> > +  void * pad;
> >> >>> > +  void * misaligned;
> >> >>> > +  void * pad2;
> >> >>> > +
> >> >>> > +  A a;
> >> >>> > +
> >> >>> > +  void Null();
> >> >>> > +};
> >> >>> > +
> >> >>> > +void B::Null()
> >> >>> > +{
> >> >>> > +  a.a = nullptr;
> >> >>> > +  a.b = nullptr;
> >> >>> > +}
> >> >>> > +
> >> >>> > +void __attribute__((noinline,noclone))
> >> >>> > +NullB(void * misalignedPtr)
> >> >>> > +{
> >> >>> > +  B* b = reinterpret_cast<B*>(reinterpret_cast<char *>(misalignedPtr) - offsetof(B, misaligned));
> >> >>> > +  b->Null();
> >> >>> > +}
> >> >>> > +
> >> >>> > +int main()
> >> >>> > +{
> >> >>> > +  B b;
> >> >>> > +  NullB(&b.misaligned);
> >> >>> > +  return 0;
> >> >>> > +}
> >> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> >> >>> > index 9346cfe..b03cb1e 100644
> >> >>> > --- gcc/tree-vect-data-refs.c
> >> >>> > +++ gcc/tree-vect-data-refs.c
> >> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
> >> >>> >base = ref;
> >> >>> >while (handled_component_p (base))
> >> >>> >  base = TREE_OPERAND (base, 0);
> >> >>> > +  unsigned int base_alignment;
> >> >>> > +  unsigned HOST_WIDE_INT base_bitpos;
> >> >>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
> >> >>> > +  /* As data-ref analysis strips the MEM_REF down to its base operand
> >> >>> > +     to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
> >> >>> > +     adjust things to make base_alignment valid as the alignment of
> >> >>> > +     DR_BASE_ADDRESS.  */
> >> >>> >if (TREE_CODE (base) == MEM_REF)
> >> >>> > -    base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
> >> >>> > -                   build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0));
> >> >>> > -  unsigned int base_alignment = get_object_alignment (base);
> >> >>> > +    {
> >> >>> > +      base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT;
> >> >>> > +      base_bitpos &= (base_alignment - 1);
> >> >>> > +    }
> >> >>> > +  if (base_bitpos != 0)
> >> >>> > +    base_alignment = base_bitpos & -base_bitpos;
> >> >>> > +  /* Also look at the alignment of the base address DR analysis
> >> >>> > +     computed.  */
> >> >>> > +  unsigned int base_addr_alignment = get_pointer_alignment (base_addr);
> >> >>> > +  if (base_addr_alignment > base_alignment)
> >> >>> > +    base_alignment = base_addr_alignment;
> >> >>> >
> >> >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
> >> >>> >  DR_VECT_AUX (dr)->base_element_aligned = true;
> >> >>>
> >> >>> Since you committed this patch (r241892), I'm seeing execution failures:
> >> >>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
> >> >>>   gcc.dg/vect/pr40074.c execution test
> >> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
> >> >>> --with-fpu=neon-fp16
> >> >>> (using qemu as simulator)
> >> >>
> >> >> The difference is that we now vectorize the testcase with versioning
> >> >> for alignment (but it should never execute the vectorized variant).
> >> >> I need arm peoples help to understand what is wrong.
> >> > Hi All,
> >> > I will look at it.
> >> >
> >>
> >> Hi,
> >>
> >> This is causing new regressions on armeb:
> >>   gcc.d

Re: [PATCH] PR77359: Properly align local variables in functions calling alloca.

2016-11-11 Thread Dominik Vogt
On Thu, Nov 10, 2016 at 06:17:57PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 11, 2016 at 12:47:02AM +0100, Dominik Vogt wrote:
> > On Thu, Nov 03, 2016 at 11:40:44AM +0100, Dominik Vogt wrote:
> > > The attached patch fixes the stack layout problems on AIX and
> > > Power as described here:
> > > 
> > >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359
> > > 
> > > The patch has been bootstrapped on AIX (32 Bit) and bootstrapped
> > > and regression tested on Power (biarch).  It needs more testing
> > > that I cannot do with the hardware available to me.
> > > 
> > > If the patch is good, this one can be re-applied:
> > > 
> > >   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01730.html
> > >   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01616.html
> > 
> > So, is this patch in order to be committed?  (Assuming that a
> > followup patch will clean up the rs6000.h+aix.h quirks.)
> 
> You say it needs more testing -- what testing?

Regression testing on AIX (David has done this in reply to the
original message), possibly also on 32-Bit Power, if that is a
reasonable target.

> (And it needs to be posted to gcc-patches@ of course).

This discussion was originally on gcc-patches, but somehow it got
removed from the recipient list.

> > > +   : (cfun->calls_alloca                                              \
> > > +      ? RS6000_ALIGN (crtl->outgoing_args_size + RS6000_SAVE_AREA, 16) \
> > > +      : (RS6000_ALIGN (crtl->outgoing_args_size, 16) + RS6000_SAVE_AREA)))
> 
> Maybe you can make the comment explain these last two lines as well...  It
> seems to me you want to align STARTING_FRAME_OFFSET if calls_alloca?

Done.

> Also add a comment for the one in rs6000.h?

Done.

> > > +   RS6000_ALIGN (crtl->outgoing_args_size + (STACK_POINTER_OFFSET), 16)
> 
> You don't need parens around STACK_POINTER_OFFSET.

Done.

New patch attached (added more comments and removed parentheses;
not re-tested).  (The first attachment is a diff between the two
patches.)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
--- old/0001-PR77359-Properly-align-local-variables-in-functions-.patch	2016-11-11 09:52:16.535439142 +0100
+++ new/0001-PR77359-Properly-align-local-variables-in-functions-.patch	2016-11-11 09:50:03.715439142 +0100
@@ -1,4 +1,4 @@
-From bd36042fd82e29204d2f10c180b9e7c27281eef2 Mon Sep 17 00:00:00 2001
+From faae30210f584bba92ab96aac479ae8f253e59b7 Mon Sep 17 00:00:00 2001
 From: Dominik Vogt 
 Date: Fri, 28 Oct 2016 12:59:55 +0100
 Subject: [PATCH 1/2] PR77359: Properly align local variables in functions
@@ -7,16 +7,16 @@
 See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359 for a discussion of the
 problem and the fix.
 ---
- gcc/config/rs6000/aix.h| 27 +++
+ gcc/config/rs6000/aix.h| 35 +++
  gcc/config/rs6000/rs6000.c |  9 +++--
- gcc/config/rs6000/rs6000.h | 14 --
- 3 files changed, 42 insertions(+), 8 deletions(-)
+ gcc/config/rs6000/rs6000.h | 26 ++
+ 3 files changed, 60 insertions(+), 10 deletions(-)
 
 diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
-index b254236..7773517 100644
+index b254236..f6eb122 100644
 --- a/gcc/config/rs6000/aix.h
 +++ b/gcc/config/rs6000/aix.h
-@@ -40,6 +40,33 @@
+@@ -40,6 +40,41 @@
  #undef  STACK_BOUNDARY
  #define STACK_BOUNDARY 128
  
@@ -27,7 +27,12 @@
 +
 +   On the RS/6000, the frame pointer is the same as the stack pointer,
 +   except for dynamic allocations.  So we start after the fixed area and
-+   outgoing parameter area.  */
++   outgoing parameter area.
++
++   If the function uses dynamic stack space (CALLS_ALLOCA is set), that
++   space needs to be aligned to STACK_BOUNDARY, i.e. the sum of the
++   sizes of the fixed area and the parameter area must be a multiple of
++   STACK_BOUNDARY.  */
 +
 +#undef STARTING_FRAME_OFFSET
 +#define STARTING_FRAME_OFFSET \
@@ -42,10 +47,13 @@
 +
 +   The default value for this macro is `STACK_POINTER_OFFSET' plus the
 +   length of the outgoing arguments.  The default is correct for most
-+   machines.  See `function.c' for details.  */
++   machines.  See `function.c' for details.
++
++   This value must be a multiple of STACK_BOUNDARY (hard coded in
++   `emit-rtl.c').  */
 +#undef STACK_DYNAMIC_OFFSET
 +#define STACK_DYNAMIC_OFFSET(FUNDECL) \
-+   RS6000_ALIGN (crtl->outgoing_args_size + (STACK_POINTER_OFFSET), 16)
++   RS6000_ALIGN (crtl->outgoing_args_size + STACK_POINTER_OFFSET, 16)
 +
  #undef  TARGET_IEEEQUAD
  #define TARGET_IEEEQUAD 0
@@ -71,10 +79,21 @@
  info->vars_size
+= RS6000_ALIGN (info->fixed_size + info->vars_size + info->parm_size,
 diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
-index 4b83abd..c11dc1b 100644
+index 4b83abd..afda416 100644
 --- a/gcc/config/rs6000/rs6000.h
 +++ b/gcc/config/rs6000/rs6000.h
-@@ -1728,9 +1728,12 @@ extern enum reg_class 

Re: [PATCH] S390: Fix PR/77822.

2016-11-11 Thread Andreas Krebbel
On 11/08/2016 03:38 PM, Dominik Vogt wrote:
> The attached patch fixes PR/77822 on s390/s390x for gcc-6 *only*.
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822
> 
> Bootstrapped and regression tested on s390 and s390x biarch on a
> zEC12.
> 
> For gcc-7, there will be a different patch.

Applied to GCC 6 branch.  Thanks!
Please remember to add the PR number to the changelog entries to trigger
Bugzilla adding a comment to the PR.

As discussed offlist, the range check for the position operand could be moved
to a predicate.  This will be part of the GCC head patch.

I've just noticed that I already had such checks for the insv patterns and
later added one to the expander as well.  So for zero_extract as a target
operand this appears to have been a problem even before GCC 6.

Bye,

-Andreas-



Re: [PING 5, PATCH] Remove xfail from thread_local-order2.C.

2016-11-11 Thread Dominik Vogt
On Mon, Jun 20, 2016 at 02:41:21PM +0100, Dominik Vogt wrote:
> Patch:
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01587.html
> 
> On Wed, Jan 27, 2016 at 10:39:44AM +0100, Dominik Vogt wrote:
> > g++.dg/tls/thread_local-order2.C no longer fails with Glibc-2.18 or
> > newer since this commit:
> > 
> >   2014-08-01  Zifei Tong  
> > 
> > * libsupc++/atexit_thread.cc (HAVE___CXA_THREAD_ATEXIT_IMPL): Add
> > _GLIBCXX_ prefix to macro.
> > 
> >   git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@213504 
> > 138bc75d-0d04-0410-96
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2014-07/msg02091.html
> > 
> > So, is it time to remove the xfail from the test case?

> > gcc/testsuite/ChangeLog
> > 
> > * g++.dg/tls/thread_local-order2.C: Remove xfail.
> 
> > >From 0b0abbd2e6d9d8b6857622065bdcbdde31b5ddb0 Mon Sep 17 00:00:00 2001
> > From: Dominik Vogt 
> > Date: Wed, 27 Jan 2016 09:54:07 +0100
> > Subject: [PATCH] Remove xfail from thread_local-order2.C.
> > 
> > This should work with Glibc-2.18 or newer.
> > ---
> >  gcc/testsuite/g++.dg/tls/thread_local-order2.C | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/gcc/testsuite/g++.dg/tls/thread_local-order2.C 
> > b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
> > index f8df917..d3351e6 100644
> > --- a/gcc/testsuite/g++.dg/tls/thread_local-order2.C
> > +++ b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
> > @@ -2,7 +2,6 @@
> >  // that isn't reverse order of construction.  We need to move
> >  // __cxa_thread_atexit into glibc to get this right.
> >  
> > -// { dg-do run { xfail *-*-* } }
> >  // { dg-require-effective-target c++11 }
> >  // { dg-add-options tls }
> >  // { dg-require-effective-target tls_runtime }
> > -- 
> > 2.3.0

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Jakub Jelinek
On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> [...]
> > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > best place to do that is in omp expansion, not gimplification)
> 
> Sorry, I couldn't find a good way to implement that during omp expansion.  The
> reason I went for gimplification is automatic discovery of sharing clauses -
> I'm assuming in expansion it's very hard to try and fill omp_data_[sio] 
> without
> gimplifier's help.  Does this sound sensible?

Sure, for discovery of needed sharing clauses the gimplifier has the right
infrastructure.  But that doesn't mean you can't add those clauses at
gimplification time and do the outlining at omp expansion time.
That is what is done for omp parallel, task etc. as well.  If the standard
OpenMP clauses can't serve that purpose, there is always the possibility of
adding further internal clauses, that would e.g. be only considered for the
SIMT stuff.  For the outlining, our current infrastructure really wants to
have CFG etc., something you don't have at gimplification time.

Jakub


Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Alexander Monakov
On Fri, 11 Nov 2016, Jakub Jelinek wrote:

> On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> > On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > [...]
> > > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > > best place to do that is in omp expansion, not gimplification)
> > 
> > Sorry, I couldn't find a good way to implement that during omp expansion.  
> > The
> > reason I went for gimplification is automatic discovery of sharing clauses -
> > I'm assuming in expansion it's very hard to try and fill omp_data_[sio] 
> > without
> > gimplifier's help.  Does this sound sensible?
> 
> Sure, for discovery of needed sharing clauses the gimplifier has the right
> infrastructure.  But that doesn't mean you can't add those clauses at
> gimplification time and do the outlining at omp expansion time.
> That is what is done for omp parallel, task etc. as well.  If the standard
> OpenMP clauses can't serve that purpose, there is always the possibility of
> adding further internal clauses, that would e.g. be only considered for the
> SIMT stuff.  For the outlining, our current infrastructure really wants to
> have CFG etc., something you don't have at gimplification time.

Yes, that is exactly what I'm doing. I'm first tweaking the gimplifier to inject
a parallel region with an artificial _simtreg_ clause, transforming

  #pragma omp simd
  for (...)

into

  #pragma omp parallel _simtreg_
#pragma omp simd
for (...)

and then expansion of 'omp parallel' can check presence of _simtreg_ clause and
emit a direct call rather than an invocation of GOMP_parallel.

(a few days ago I sent you privately a patch implementing the above)

Thanks.
Alexander


Re: [PATCH v3] PR77359: Properly align local variables in functions calling alloca.

2016-11-11 Thread Dominik Vogt
On Fri, Nov 11, 2016 at 12:11:49AM -0500, David Edelsohn wrote:
> On Thu, Nov 10, 2016 at 6:47 PM, Dominik Vogt  wrote:
> > On Thu, Nov 03, 2016 at 11:40:44AM +0100, Dominik Vogt wrote:
> >> The attached patch fixes the stack layout problems on AIX and
> >> Power as described here:
> >>
> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359
> >>
> >> The patch has been bootstrapped on AIX (32 Bit) and bootstrapped
> >> and regression tested on Power (biarch).  It needs more testing
> >> that I cannot do with the hardware available to me.
> >>
> >> If the patch is good, this one can be re-applied:
> >>
> >>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01730.html
> >>   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01616.html
> >
> > So, is this patch in order to be committed?  (Assuming that a
> > followup patch will clean up the rs6000.h+aix.h quirks.)
> 
> Please also update the ASCII pictures above the rs6000_stack_info()
> function in rs6000.c to show / describe the new padding for alignment.

Like in the new patch?  (Please double check that I got this right
in all three cases).

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/rs6000/rs6000.c (rs6000_stack_info): PR/77359: Properly align
local variables in functions calling alloca.  Also update the ASCII
drawings.
* config/rs6000/rs6000.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
PR/77359: Likewise.
* config/rs6000/aix.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
PR/77359: Copy AIX specific versions of the rs6000.h macros to aix.h.
>From a80234c45173bebbfa07e810ff534c60591a8b76 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 28 Oct 2016 12:59:55 +0100
Subject: [PATCH] PR/77359: Properly align local variables in functions
 calling alloca.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359 for a discussion of the
problem and the fix.
---
 gcc/config/rs6000/aix.h| 35 +++
 gcc/config/rs6000/rs6000.c | 24 +++-
 gcc/config/rs6000/rs6000.h | 26 ++
 3 files changed, 72 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
index b254236..f6eb122 100644
--- a/gcc/config/rs6000/aix.h
+++ b/gcc/config/rs6000/aix.h
@@ -40,6 +40,41 @@
 #undef  STACK_BOUNDARY
 #define STACK_BOUNDARY 128
 
+/* Offset within stack frame to start allocating local variables at.
+   If FRAME_GROWS_DOWNWARD, this is the offset to the END of the
+   first local allocated.  Otherwise, it is the offset to the BEGINNING
+   of the first local allocated.
+
+   On the RS/6000, the frame pointer is the same as the stack pointer,
+   except for dynamic allocations.  So we start after the fixed area and
+   outgoing parameter area.
+
+   If the function uses dynamic stack space (CALLS_ALLOCA is set), that
+   space needs to be aligned to STACK_BOUNDARY, i.e. the sum of the
+   sizes of the fixed area and the parameter area must be a multiple of
+   STACK_BOUNDARY.  */
+
+#undef STARTING_FRAME_OFFSET
+#define STARTING_FRAME_OFFSET  \
+  (FRAME_GROWS_DOWNWARD						\
+   ? 0 \
+   : (cfun->calls_alloca   \
+  ? RS6000_ALIGN (crtl->outgoing_args_size + RS6000_SAVE_AREA, 16) \
+  : (RS6000_ALIGN (crtl->outgoing_args_size, 16) + RS6000_SAVE_AREA)))
+
+/* Offset from the stack pointer register to an item dynamically
+   allocated on the stack, e.g., by `alloca'.
+
+   The default value for this macro is `STACK_POINTER_OFFSET' plus the
+   length of the outgoing arguments.  The default is correct for most
+   machines.  See `function.c' for details.
+
+   This value must be a multiple of STACK_BOUNDARY (hard coded in
+   `emit-rtl.c').  */
+#undef STACK_DYNAMIC_OFFSET
+#define STACK_DYNAMIC_OFFSET(FUNDECL)  \
+   RS6000_ALIGN (crtl->outgoing_args_size + STACK_POINTER_OFFSET, 16)
+
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f9e4739..4f3b886 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25781,7 +25781,7 @@ rs6000_savres_strategy (rs6000_stack_t *info,
+---+
| saved TOC pointer | 20  40
+---+
-   | Parameter save area (P)   | 24  48
+   | Parameter save area (+padding*) (P)   | 24  48
+---+
| Alloca space (A)  | 24+Petc.
+---+
@@ -25802,6 +25802,9 @@ rs6000_savres_strategy (rs6000_stack_t *info,

[PATCH TEST] Only drop xfail for gcc.dg/vect/vect-cond-2.c on targets supporting vect_max_reduc

2016-11-11 Thread Bin Cheng
Hi,
Test gcc.dg/vect/vect-cond-2.c still requires vect_max_reduc in order to be
vectorized; this patch adds the requirement.

Thanks,
bin

gcc/testsuite/ChangeLog
2016-11-09  Bin Cheng  

* gcc.dg/vect/vect-cond-2.c: Only drop xfail for targets supporting
vect_max_reduc.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-2.c
index d7da803..646eac1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-2.c
@@ -39,6 +39,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_max_reduc } } } } */
 
 


Re: [PATCH] S390: Fix PR/77822.

2016-11-11 Thread Matthias Klose
On 11.11.2016 09:58, Andreas Krebbel wrote:
> On 11/08/2016 03:38 PM, Dominik Vogt wrote:
>> The attached patch fixes PR/77822 on s390/s390x for gcc-6 *only*.
>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822
>>
>> Bootstrapped and regression tested on s390 and s390x biarch on a
>> zEC12.
>>
>> For gcc-7, there will be a different patch.
> 
> Applied to GCC 6 branch.  Thanks!
> Please remember adding the PR number to the changelog entries to trigger 
> bugzilla adding a comment
> to the PR.
> 
> As discussed offlist the range check for the position operand could be moved 
> to a predicate.  This
> will be part of the GCC head patch.
> 
> I've just noticed that I had such checks already for the insv patterns and 
> have added one to the
> expander as well later. So for zero_extract as target operand this appeared 
> to be a problem even
> before GCC 6.

the gcc-6-branch now has the ChangeLog entry for gcc.target/s390/pr77822.c but
not the test case.

Matthias



Re: [Patch 1/5] OpenACC tile clause support, OMP_CLAUSE_TILE adjustments

2016-11-11 Thread Jakub Jelinek
Hi!

On Thu, Nov 10, 2016 at 06:44:52PM +0800, Chung-Lin Tang wrote:

Above this it is fine.

> @@ -9388,10 +9373,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>(OMP_FOR_INIT (for_stmt))
>  * 2);
>  }
> -  int collapse = 1;
> -  c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_COLLAPSE);
> -  if (c)
> -collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
> +  int collapse = 0;
> +  /* Find the first of COLLAPSE or TILE.  */
> +  for (c = OMP_FOR_CLAUSES (for_stmt); c; c = TREE_CHAIN (c))
> +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_COLLAPSE)
> +  {
> + collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
> + if (collapse == 1)
> +   /* Not really collapsing.  */
> +   collapse = 0;
> + break;
> +  }
> +else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_TILE)
> +  {
> + collapse = list_length (OMP_CLAUSE_TILE_LIST (c));
> + break;
> +  }

I don't really like this, especially pretending collapse(1) or lack
of collapse clause e.g. on OpenMP construct is collapse(0).
I'd keep what it does, i.e. 
  int collapse = 1;
  c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_COLLAPSE);
  if (c)
collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
and in the first switch in gimplify_omp_for you can:
case OACC_LOOP:
  ort = ORT_ACC;
  c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_TILE);
  if (c)
tile = list_length (OMP_CLAUSE_TILE_LIST (c));
  break;
and then just use tile != 0 or whatever || with collapse > 1 where needed.

> +
>for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
>  {
>t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);
> @@ -9807,7 +9805,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
> OMP_CLAUSE_LINEAR_STEP (c2) = OMP_CLAUSE_LINEAR_STEP (c);
>   }
>  
> -  if ((var != decl || collapse > 1) && orig_for_stmt == for_stmt)
> +  if ((var != decl || collapse) && orig_for_stmt == for_stmt)
>   {
> for (c = OMP_FOR_CLAUSES (for_stmt); c ; c = OMP_CLAUSE_CHAIN (c))
>   if (((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE

Like here.

> @@ -9817,7 +9815,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>&& OMP_CLAUSE_LINEAR_GIMPLE_SEQ (c) == NULL))
>   && OMP_CLAUSE_DECL (c) == decl)
> {
> - if (is_doacross && (collapse == 1 || i >= collapse))
> + if (is_doacross && (!collapse || i >= collapse))
> t = var;
>   else
> {

And not here.  You don't really have doacross loops in OpenACC, do you?

Jakub


Re: [Patch 2/5] OpenACC tile clause support, omp-low parts

2016-11-11 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 06:45:10PM +0800, Chung-Lin Tang wrote:
> 2016-XX-XX  Nathan Sidwell  
> 
> * internal-fn.def (GOACC_DIM_POS): Comment may be overly conservative.
> (GOACC_TILE): New.
> * internal-fn.c (expand_GOACC_TILE): New.
> 
> * omp-low.c (struct omp_for_data): Add tiling field.
> (struct oacc_loop): Change 'ifns' to vector of call stmts,
> add e_mask field.

Please avoid using 8 spaces instead of a tab in ChangeLog.

>   dump partitioning.
>   (oacc_loop_auto_partitions): Add outer_assign parm. Assign all but
> vector partitioning to outer loops.  Assign 2 partitions to loops
> when available. Add TILE handling.
> (oacc_loop_partition): Adjust oacc_loop_auto_partitions call.
> (execite_oacc_device_lower): Process GOACC_TILE fns, ignore unknown 
> specs.

Here too.  execute instead of execite?  And the last line is too long.

> @@ -626,7 +638,8 @@ extract_omp_for_data (gomp_for *for_stmt, struct o
>int cnt = fd->ordered ? fd->ordered : fd->collapse;
>for (i = 0; i < cnt; i++)
>  {
> -  if (i == 0 && fd->collapse == 1 && (fd->ordered == 0 || loops == NULL))
> +  if (i == 0 && fd->collapse == 1 && !fd->tiling
> +   && (fd->ordered == 0 || loops == NULL))
>   loop = &fd->loop;
>else if (loops != NULL)
>   loop = loops + i;

If the condition fits on one line, it can stay as is, if it can't, then
you should use a
if (i == 0
&& fd->collapse == 1
&& !fd->tiling
&& (fd->ordered == 0 || loops == NULL))
IMHO.

> +   tree tile = TREE_VALUE (tiling);
> +   gcall *call = gimple_build_call_internal
> + (IFN_GOACC_TILE, 5, num, loop_no, tile,
> +  /* gwv-outer=*/integer_zero_node,
> +  /* gwv-inner=*/integer_zero_node);

I don't really like the ( on separate line unless absolutely necessary.
So better:

  gcall *call
= gimple_build_call_internal (IFN_GOACC_TILE, 5, num, loop_no,
  tile, integer_zero_node,
  integer_zero_node);

> +   call = gimple_build_call_internal
> + (IFN_GOACC_LOOP, 7,
> +  build_int_cst (integer_type_node, IFN_GOACC_LOOP_OFFSET),
> +  dir, e_range, element_s, chunk, e_gwv, chunk);

Similarly.  For the build_int_cst argument just add a temporary with
a short name (e.g. t) and initialize it to build_int_cst before the
gimple_build_call_internal.

> +   gimple_call_set_lhs (call, e_offset);
> +   gimple_set_location (call, loc);
> +   gsi_insert_before (&gsi, call, GSI_SAME_STMT);
> +
> +   call = gimple_build_call_internal
> + (IFN_GOACC_LOOP, 7,
> +  build_int_cst (integer_type_node, IFN_GOACC_LOOP_BOUND),
> +  dir, e_range, element_s, chunk, e_gwv, e_offset);
> +   gimple_call_set_lhs (call, e_bound);
> +   gimple_set_location (call, loc);
> +   gsi_insert_before (&gsi, call, GSI_SAME_STMT);
> +
> +   call = gimple_build_call_internal
> + (IFN_GOACC_LOOP, 6,
> +  build_int_cst (integer_type_node, IFN_GOACC_LOOP_STEP),
> +  dir, e_range, element_s, chunk, e_gwv);

And again 2x.

>if (cont_bb)
>  {
> -  /* We now have one or two nested loops.  Update the loop
> +  /* We now have one,  two or three nested loops.  Update the loop

Only one space after , - we use 2 spaces only after full stop.

> @@ -11537,6 +11712,15 @@ expand_oacc_for (struct omp_region *region, struct
> body_loop->header = body_bb;
> body_loop->latch = cont_bb;
> add_loop (body_loop, parent);
> +
> +   if (fd->tiling)
> + {
> +   // Insert tiling's element loop

Please use /* */ style comment instead, plus full stop:
  /* Insert tiling's element loop.  */
> +   struct loop *inner_loop = alloc_loop ();
> +   inner_loop->header = elem_body_bb;
> +   inner_loop->latch = elem_cont_bb;
> +   add_loop (inner_loop, body_loop);

> +static void
> +oacc_xform_tile (gcall *call)
> +{
> +  gimple_stmt_iterator gsi = gsi_for_stmt (call);
> +  unsigned collapse = (unsigned) TREE_INT_CST_LOW (gimple_call_arg (call, 
> 0));
> +  /* Inner loops have higher loop_nos.  */
> +  unsigned loop_no = (unsigned) TREE_INT_CST_LOW (gimple_call_arg (call, 1));
> +  tree tile_size = gimple_call_arg (call, 2);
> +  unsigned e_mask = (unsigned) TREE_INT_CST_LOW (gimple_call_arg (call, 4));

Please use
  unsigned collapse = tree_to_uhwi (gimple_call_arg (call, 0));
etc. instead.

> +  tree lhs = gimple_call_lhs (call);
> +  tree type = TREE_TYPE (lhs);
> +  gimple_seq seq = NULL;
> +  tree span = build_int_cst (type, 1);
> +
> +  gcc_assert (!(e_mask
> + & ~(GOMP_DIM_MASK (GOMP_DIM_VECTOR)
> + | GOMP_DIM_MASK (GOMP_DIM_WORKER))));
> +  push_gimplify_context (!seen_error ());
> +  if (
> +#ifndef ACCEL_COMPILER
> +

Re: [fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+

2016-11-11 Thread Rainer Orth
Hi Bruce,

> On 11/03/16 07:11, Rainer Orth wrote:
>>
>> Ok for mainline now, and for backports to the gcc-6 and gcc-5 branches
>> after some soak time?
>
> Yes, please.  Thanks.

unfortunately, I didn't look closely enough when checking for failures.
There is one which I thought was preexisting:

math.h /vol/gcc/src/hg/trunk/local/fixincludes/tests/base/math.h differ: char 1135, line 57
*** math.h  Thu Nov 10 22:32:14 2016
--- /vol/gcc/src/hg/trunk/local/fixincludes/tests/base/math.h   Thu Nov 10 18:49:36 2016
***
*** 54,61 
--- 54,63 
  
  #if defined( HPUX11_FABSF_CHECK )
  #ifdef _PA_RISC
+ #ifndef __cplusplus
  #  define fabsf(x) ((float)fabs((double)(float)(x)))
  #endif
+ #endif
  #endif  /* HPUX11_FABSF_CHECK */
  
Closer inspection shows that this is due to hpux11_fabsf using

bypass= "__cplusplus";

While this is fine for the fix itself, it's quite fragile since (as this
patch shows) it breaks as soon as any math.h test_text happens to
include __cplusplus, which is not that unlikely ;-)

I guess the solution is to use a more specific bypass pattern to make
the fix more robust.
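For illustration, a narrower bypass along these lines would only skip the fix when the fabsf define is already guarded, instead of firing on any occurrence of the bare __cplusplus token anywhere in a math.h test_text (a hypothetical sketch, not the committed change; fields other than bypass are elided):

```
/* Hypothetical sketch -- only the bypass field is the point here.  */
fix = {
    hackname  = hpux11_fabsf;
    files = math.h;
    /* Match the C++ guard around the define, not the token anywhere
       in the header.  */
    bypass= "#[ \t]*ifndef[ \t]+__cplusplus";
    ...
};
```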

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Jakub Jelinek
On Fri, Nov 11, 2016 at 12:28:16PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> 
> > On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> > > On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > > [...]
> > > > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > > > best place to do that is in omp expansion, not gimplification)
> > > 
> > > Sorry, I couldn't find a good way to implement that during omp expansion. 
> > >  The
> > > reason I went for gimplification is automatic discovery of sharing 
> > > clauses -
> > > I'm assuming in expansion it's very hard to try and fill omp_data_[sio] 
> > > without
> > > gimplifier's help.  Does this sound sensible?
> > 
> > Sure, for discovery of needed sharing clauses the gimplifier has the right
> > infrastructure.  But that doesn't mean you can't add those clauses at
> > gimplification time and do the outlining at omp expansion time.
> > That is what is done for omp parallel, task etc. as well.  If the standard
> > OpenMP clauses can't serve that purpose, there is always the possibility of
> > adding further internal clauses, that would e.g. be only considered for the
> > SIMT stuff.  For the outlining, our current infrastructure really wants to
> > have CFG etc., something you don't have at gimplification time.
> 
> Yes, that is exactly what I'm doing. I'm first tweaking the gimplifier to 
> inject
> a parallel region with an artificial _simtreg_ clause, transforming
> 
>   #pragma omp simd
>   for (...)
> 
> into
> 
>   #pragma omp parallel _simtreg_
> #pragma omp simd
> for (...)
> 
> and then expansion of 'omp parallel' can check presence of _simtreg_ clause 
> and
> emit a direct call rather than an invocation of GOMP_parallel.

Well, I meant keep #pragma omp simd as is, just add some data-sharing-like
clauses _simt_shared_(x) or whatever you need, then the omplower versioning
patch I've posted could e.g. drop those _simt_shared_ or whatever else you
need clauses for the omp simd without _simt_ clause, omp lowering then would
do whatever is needed for those _simt_shared_ clauses and finally omp
expansion would outline it.  Adding omp parallel around the omp simd is just
weird, it has nothing to do with omp parallel.

Jakub


Re: [PATCH/AARCH64] Improved -mcpu/mtune/march=native handling

2016-11-11 Thread Richard Earnshaw
On 11/11/16 02:56, Andrew Pinski wrote:
> As I mentioned in my other emails, parsing /proc/cpuinfo has one issue
> is that the current parsing assumes many different things about the
> format.  So the best way to do this is to parse
> /sys/devices/system/cpu/cpuN/regs/identification/midr_el1 files
> instead.  To get which cpu are present (though not necessarily online)
> we parse "/sys/devices/system/cpu/present" file.  We fall back to
> parsing /proc/cpu if any parsing fails of these files including not
> finding out which cpu we are on.  The main reason why we fall back is
> because only newer kernels support exporting this file.  To get the
> features I just look at the hwcap that the kernel passes to userspace
> so I needed to add an extra argument to AARCH64_OPT_EXTENSION.  I also
> had to define some HWCAP_* macros in driver-aarch64.c since older
> kernels headers don't have these values defined.
> 
> It should also be possible to parse
> /sys/devices/system/cpu/cpu%d/cache%d directory to get cache
> information too but that is left for another patch and another time.
> 
> Since I don't have access to a big.LITTLE system, someone should test
> there with a new enough kernel; I was using stock 4.9.0-rc3.
> 
> OK?  Bootstrapped and tested on ThunderX on aarch64-linux-gnu with no
> regressions and making sure /proc/cpuinfo is not read (by using
> strace).
> 
> Thanks,
> Andrew Pinski
> 
> ChangeLog:
> * config/aarch64/aarch64-option-extensions.def: Document extra
> argument to AARCH64_OPT_EXTENSION.  Update for the extra argument for
> all of the option extensions.
> * config/aarch64/driver-aarch64.c: Include sys/auxv.h and asm/hwcap.h.

GCC supports native builds on freebsd as well as linux.  Isn't this
going to break that?

R.

> (HWCAP_CRC32): Define if needed.
> (HWCAP_ATOMICS): Likewise.
> (HWCAP_FPHP): Likewise.
> (HWCAP_ASIMDHP): Likewise.
> (aarch64_arch_extension): New field hwcap_mask.
> (AARCH64_OPT_EXTENSION): Handle extra argument.
> (AARCH64_BIG_LITTLE): Always put the larger core number first.
> (valid_bL_core_p): Don't check AARCH64_BIG_LITTLE for the opposite
> order as it already handles the order.
> (implementor_from_midr): New function.
> (part_no_from_midr): New function.
> (sysfsformat): New define.
> (host_detect_local_cpu_sys): New function.
> (host_detect_local_cpu): Call host_detect_local_cpu_sys if opening
> "/sys/devices/system/cpu/present" file worked.
> * common/config/aarch64/aarch64-common.c (AARCH64_OPT_EXTENSION):
> Handle extra argument.
> 



Re: [PATCH, gcc, wwwdocs] Document upcoming Qualcomm Falkor processor support

2016-11-11 Thread Richard Earnshaw
On 10/11/16 23:28, Gerald Pfeifer wrote:
> On Fri, 11 Nov 2016, Siddhesh Poyarekar wrote:
>> This patch documents the newly added flag in gcc 7 for the upcoming
>> Qualcomm Falkor processor core.
> 
> Looks good to me.  Probably a good idea for one of the ARM maintainers
> to sign off, too.
> 
> Gerald
> 

This is fine.  The list of supported CPU names is getting a bit
repetitive, but that can be cleaned up nearer the release.

R.


[fixincludes] Fix macOS 10.12 <os/trace.h> and <AvailabilityInternal.h> (PR sanitizer/78267)

2016-11-11 Thread Rainer Orth
Since the recent libsanitizer import, macOS 10.12 bootstrap is broken:

* <os/trace.h> unconditionally uses the Blocks extension only supported by
  Clang, without the customary #if __BLOCKS__ guard:

In file included from /vol/gcc/src/hg/trunk/local/libsanitizer/sanitizer_common/sanitizer_mac.cc:39:0:
/usr/include/os/trace.h:204:15: error: expected unqualified-id before '^' token
 typedef void (^os_trace_payload_t)(xpc_object_t xdict);
   ^
/usr/include/os/trace.h:204:15: error: expected ')' before '^' token
In file included from /usr/include/Availability.h:184:0,
 from /usr/include/stdio.h:65,
 from /vol/gcc/src/hg/trunk/local/libsanitizer/sanitizer_common/sanitizer_mac.cc:21:

  To fix this, I wrap both the typedef and its single user.

* <AvailabilityInternal.h> uses the __attribute__((availability))
  extension unconditionally instead of wrapping it as is done in many
  other places:

In file included from /usr/include/Availability.h:184:0,
 from /usr/include/stdio.h:65,
 from /vol/gcc/src/hg/trunk/local/libsanitizer/sanitizer_common/sanitizer_mac.cc:21:
/var/gcc/regression/trunk/10.12-gcc/build/./gcc/include-fixed/os/trace.h:304:1: error: 'introduced' was not declared in this scope
 __API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0))
 ^

  I'm wrapping the internal __API_[ADU] macros as is done elsewhere.

* I came across a dangling _EOFix_.

The patch passes fixincludes make check (this time for real ;-) and
restores macOS 10.12 bootstrap.

Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2016-11-10  Rainer Orth  

PR sanitizer/78267
* inclhack.def (darwin_availabilityinternal, darwin_os_trace_1)
(darwin_os_trace_2): New fixes.
(hpux_stdint_least_fast): Remove spurious _EOFix_.
* fixincl.x: Regenerate.
* tests/bases/AvailabilityInternal.h: New file.
* tests/bases/os/trace.h: New file.

# HG changeset patch
# Parent  38420045cf6aa616f517ca5fcfd15f2b55e68cf0
Fix macOS 10.12 <os/trace.h> and <AvailabilityInternal.h> (PR sanitizer/78267)

diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -1338,6 +1338,32 @@ fix = {
 };
 
 /*
+ *  macOS 10.12 <AvailabilityInternal.h> uses __attribute__((availability))
+ *  unconditionally.
+ */
+fix = {
+hackname  = darwin_availabilityinternal;
+mach  = "*-*-darwin*";
+files = AvailabilityInternal.h;
+select= "#define[ \t]+(__API_[ADU]\\([^)]*\\)).*";
+c_fix = format;
+c_fix_arg = <<- _EOFix_
+	#if defined(__has_attribute)
+	  #if __has_attribute(availability)
+	%0
+	  #else
+	#define %1
+	  #endif
+	#else
+	#define %1
+	#endif
+	_EOFix_;
+
+test_text = "#define __API_A(x) __attribute__((availability(__API_AVAILABLE_PLATFORM_##x)))\n"
+		"#define __API_D(msg,x) __attribute__((availability(__API_DEPRECATED_PLATFORM_##x,message=msg)))";
+};
+
+/*
  *  For the AAB_darwin7_9_long_double_funcs fix to be useful,
  *  you have to not use "" includes.
  */
@@ -1410,6 +1436,44 @@ fix = {
 };
 
 /*
+ *  macOS 10.12 <os/trace.h> os_trace_payload_t typedef uses Blocks
+ *  extension without guard.
+ */
+fix = {
+  hackname  = darwin_os_trace_1;
+  mach  = "*-*-darwin*";
+  files = os/trace.h;
+  select= "typedef.*\\^os_trace_payload_t.*";
+  c_fix = format;
+  c_fix_arg = "#if __BLOCKS__\n%0\n#endif";
+  test_text = "typedef void (^os_trace_payload_t)(xpc_object_t xdict);";
+};
+
+/*
+ *  In macOS 10.12 <os/trace.h>, need to guard users of os_trace_payload_t
+ *  typedef, too.
+ */
+fix = {
+  hackname  = darwin_os_trace_2;
+  mach  = "*-*-darwin*";
+  files = os/trace.h;
+  select= <<- _EOSelect_
+	__API_.*
+	OS_EXPORT.*
+	.*
+	_os_trace.*os_trace_payload_t payload);
+	_EOSelect_;
+  c_fix = format;
+  c_fix_arg = "#if __BLOCKS__\n%0\n#endif";
+  test_text = <<- _EOText_
+	__API_AVAILABLE(macosx(10.10), ios(8.0), watchos(2.0), tvos(8.0))
+	OS_EXPORT OS_NOTHROW OS_NOT_TAIL_CALLED
+	void
+	_os_trace_with_buffer(void *dso, const char *message, uint8_t type, const void *buffer, size_t buffer_size, os_trace_payload_t payload);
+	_EOText_;
+};
+
+/*
  *  __private_extern__ doesn't exist in FSF GCC.  Even if it did,
  *  why would you ever put it in a system header file?
  */
@@ -2638,7 +2702,6 @@ fix = {
 c-fix-arg = "#  define	UINT_%164_MAX	__UINT64_MAX__";
 test-text = "#  define   UINT_FAST64_MAXULLONG_MAX\n"
 		"#  define   UINT_LEAST64_MAXULLONG_MAX\n";
-	_EOFix_;
 };
 
 /*
diff --git a/fixincludes/tests/base/AvailabilityInternal.h b/fixincludes/tests/base/AvailabilityInternal.h
new file mode 100644
--- /dev/null
+++ b/fixincludes/tests/base/AvailabilityInternal.h
@@ -0,0 +1,31 @@
+/*  DO NOT EDIT THIS FILE.
+
+It has been auto-edited by fixincludes from:
+
+	"fixinc/tests/inc/AvailabilityInternal.h"
+
+This had to be done to c

Re: [PATCH][AArch64] Tweak Cortex-A57 vector cost

2016-11-11 Thread Richard Earnshaw
On 10/11/16 17:10, Wilco Dijkstra wrote:
> The existing vector costs stop some beneficial vectorization.  This is mostly 
> due
> to vector statement cost being set to 3 as well as vector loads having a 
> higher
> cost than scalar loads.  This means that even when we vectorize 4x, it is 
> possible
> that the cost of a vectorized loop is similar to the scalar version, and we 
> fail
> to vectorize.  For example for a particular loop the costs for -mcpu=generic 
> are:
> 
> note: Cost model analysis: 
>   Vector inside of loop cost: 146
>   Vector prologue cost: 5
>   Vector epilogue cost: 0
>   Scalar iteration cost: 50
>   Scalar outside cost: 0
>   Vector outside cost: 5
>   prologue iterations: 0
>   epilogue iterations: 0
>   Calculated minimum iters for profitability: 1
> note:   Runtime profitability threshold = 3
> note:   Static estimate profitability threshold = 3
> note: loop vectorized
> 
> 
> While -mcpu=cortex-a57 reports:
> 
> note: Cost model analysis: 
>   Vector inside of loop cost: 294
>   Vector prologue cost: 15
>   Vector epilogue cost: 0
>   Scalar iteration cost: 74
>   Scalar outside cost: 0
>   Vector outside cost: 15
>   prologue iterations: 0
>   epilogue iterations: 0
>   Calculated minimum iters for profitability: 31
> note:   Runtime profitability threshold = 30
> note:   Static estimate profitability threshold = 30
> note: not vectorized: vectorization not profitable.
> note: not vectorized: iteration count smaller than user specified loop bound 
> parameter or minimum profitable iterations (whichever is more conservative).
> 
> 
> Using a cost of 3 for a vector operation suggests they are 3 times as
> expensive as scalar operations.  Since most vector operations have a
> throughput similar to scalar operations, this is not correct.
> 
> Using slightly lower values for these heuristics now allows this loop
> and many others to be vectorized.  On a proprietary benchmark the gain
> from vectorizing this loop is around 15-30%, which shows vectorizing it
> is indeed beneficial.
> 
> ChangeLog:
> 2016-11-10  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (cortexa57_vector_cost):
>   Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.
> 

OK.

R.

> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 279a6dfaa4a9c306bc7a8dba9f4f53704f61fefe..cff2e8fc6e9309e6aa4f68a5aba3bfac3b737283
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -382,12 +382,12 @@ static const struct cpu_vector_cost 
> cortexa57_vector_cost =
>1, /* scalar_stmt_cost  */
>4, /* scalar_load_cost  */
>1, /* scalar_store_cost  */
> -  3, /* vec_stmt_cost  */
> +  2, /* vec_stmt_cost  */
>3, /* vec_permute_cost  */
>8, /* vec_to_scalar_cost  */
>8, /* scalar_to_vec_cost  */
> -  5, /* vec_align_load_cost  */
> -  5, /* vec_unalign_load_cost  */
> +  4, /* vec_align_load_cost  */
> +  4, /* vec_unalign_load_cost  */
>1, /* vec_unalign_store_cost  */
>1, /* vec_store_cost  */
>1, /* cond_taken_branch_cost  */
> 



Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-11 Thread Kyrill Tkachov


On 10/11/16 23:39, Segher Boessenkool wrote:

On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote:

On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
some interesting swings.
458.sjeng +1.45%
471.omnetpp   +2.19%
445.gobmk -2.01%

On SPECFP:
453.povray+7.00%


Wow, this looks really good.  Thank you for implementing this.  If I
get some time I am going to try it out on processors other than the A72,
but I doubt I will have time any time soon.

I'd love to hear what causes the slowdown for gobmk as well, btw.


I haven't yet gotten a direct answer for that (through performance analysis
tools), but I have noticed that load/store pairs are not generated as
aggressively as I had hoped.  They are being merged by the sched fusion pass
and the peepholes (which run after this), but it still misses cases.  I've
hacked the SWS hooks to generate pairs explicitly, and that increases the
number of pairs and helps code size to boot.  It complicates the logic of
the hooks a bit, but not too much.

I'll make those changes and re-benchmark, hopefully that
will help performance.

Thanks,
Kyrill



Segher




Re: [PATCH 1/2][AArch64] Add bfx attribute

2016-11-11 Thread Richard Earnshaw
On 10/11/16 17:11, Wilco Dijkstra wrote:
> Currently the SBFM, UBFM and BFM instructions all use the attribute "bfm".
> SBFM and UBFM include all shifts on AArch64, which are simpler than
> bitfield insert.  Add a new bfx attribute for these instructions so that
> they can be modelled more accurately in the future.  There is no difference
> in code generation.
> 
> ChangeLog:
> 2016-11-10  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3)
>   Use bfx attribute.
>   (aarch64_lshr_sisd_or_int_3): Likewise.
>   (aarch64_ashr_sisd_or_int_3): Likewise.
>   (si3_insn_uxtw): Likewise.
>   (3_insn): Likewise.
>   (_ashl): Likewise.
>   (zero_extend_lshr): Likewise.
>   (extend_ashr): Likewise.
>   (): Likewise.
>   (insv): Likewise.
>   (andim_ashift_bfiz): Likewise.
>   * config/aarch64/thunderx.md (thunderx_shift): Add bfx.
>   * config/arm/cortex-a53.md (cortex_a53_alu_shift): Likewise.
>   * config/arm/cortex-a57.md (cortex_a57_alu): Add bfx.
>   * config/arm/exynos-m1.md (exynos_m1_alu): Add bfx.
>   (exynos_m1_alu_p): Likewise.
>   * config/arm/types.md: Add bfx.
>   * config/arm/xgene1.md (xgene1_bfm): Add bfx.
> 

OK.

R.



Re: [PATCH] Fix PR31096

2016-11-11 Thread Hurugalawadi, Naveen
Hi,

Sorry for the very late reply; the mail was missed or overlooked.

>> could now move the test tree_expr_nonzero_p next to
>> tree_expr_nonnegative_p (it is redundant for the last case).

Done.

>> Often just a comment can really help here. 

Comments updated as per the suggestion.

>> when C is zero and verify this transformation doesn't fire on that case.

Updated test to check with zero.

>> verifying that the operand orders change appropriately when dealing 
>> with a negative constant.

Done.

>> verify nothing happens with floating point or vector types.

Done.

Please review the patch and let me know if any modifications are required.
Regression tested on X86 and AArch64.

Thanks,
Naveen

2016-11-11  Naveen H.S  
gcc
* fold-const.c (tree_expr_nonzero_p): Make non-static.
* fold-const.h (tree_expr_nonzero_p): Declare.
* match.pd ((cmp (mult:c @0 @1) (mult:c @2 @1))): New patterns.
gcc/testsuite
* gcc.dg/pr31096.c: New testcase.
* gcc.dg/pr31096-1.c: New testcase.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index e14471e..8f13807 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9015,7 +9015,7 @@ tree_expr_nonzero_warnv_p (tree t, bool *strict_overflow_p)
 /* Return true when T is an address and is known to be nonzero.
Handle warnings about undefined signed overflow.  */
 
-static bool
+bool
 tree_expr_nonzero_p (tree t)
 {
   bool ret, strict_overflow_p;
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 46dcd28..fbe1328 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -169,6 +169,7 @@ extern tree size_diffop_loc (location_t, tree, tree);
 #define non_lvalue(T) non_lvalue_loc (UNKNOWN_LOCATION, T)
 extern tree non_lvalue_loc (location_t, tree);
 
+extern bool tree_expr_nonzero_p (tree);
 extern bool tree_expr_nonnegative_p (tree);
 extern bool tree_expr_nonnegative_warnv_p (tree, bool *, int = 0);
 extern tree make_range (tree, int *, tree *, tree *, bool *);
diff --git a/gcc/match.pd b/gcc/match.pd
index 29ddcd8..eecfe23 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
zerop
CONSTANT_CLASS_P
tree_expr_nonnegative_p
+   tree_expr_nonzero_p
integer_valued_real_p
integer_pow2p
HONOR_NANS)
@@ -1017,7 +1018,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (convert (bit_and (bit_not @1) @0
 
+/* For integral types with undefined overflow and C != 0 fold
+   x * C EQ/NE y * C into x EQ/NE y.  */
+(for cmp (eq ne)
+ (simplify
+  (cmp (mult:c @0 @1) (mult:c @2 @1))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+   && tree_expr_nonzero_p (@1))
+   (cmp @0 @2
+
+/* For integral types with undefined overflow and C != 0 fold
+   x * C RELOP y * C into:
 
+   x RELOP y for nonnegative C
+   y RELOP x for negative C  */
+(for cmp (lt gt le ge)
+ (simplify
+  (cmp (mult:c @0 @1) (mult:c @2 @1))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
+   (if (tree_expr_nonnegative_p (@1) && tree_expr_nonzero_p (@1))
+(cmp @0 @2)
+   (if (TREE_CODE (@1) == INTEGER_CST
+	&& wi::lt_p (@1, 0, TYPE_SIGN (TREE_TYPE (@1
+(cmp @2 @0))
 
 /* ((X inner_op C0) outer_op C1)
With X being a tree where value_range has reasoned certain bits to always be
diff --git a/gcc/testsuite/gcc.dg/pr31096-1.c b/gcc/testsuite/gcc.dg/pr31096-1.c
new file mode 100644
index 000..e681f0f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr31096-1.c
@@ -0,0 +1,51 @@
+/* PR middle-end/31096 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define zero(name, op) \
+int name (int a, int b) \
+{ return a * 0 op b * 0; }
+
+zero(zeq, ==) zero(zne, !=) zero(zlt, <)
+zero(zgt, >)  zero(zge, >=) zero(zle, <=)
+
+#define unsign_pos(name, op) \
+int name (unsigned a, unsigned b) \
+{ return a * 4 op b * 4; }
+
+unsign_pos(upeq, ==) unsign_pos(upne, !=) unsign_pos(uplt, <)
+unsign_pos(upgt, >)  unsign_pos(upge, >=) unsign_pos(uple, <=)
+
+#define unsign_neg(name, op) \
+int name (unsigned a, unsigned b) \
+{ return a * -2 op b * -2; }
+
+unsign_neg(uneq, ==) unsign_neg(unne, !=) unsign_neg(unlt, <)
+unsign_neg(ungt, >)  unsign_neg(unge, >=) unsign_neg(unle, <=)
+
+#define float(name, op) \
+int name (float a, float b) \
+{ return a * 5 op b * 5; }
+
+float(feq, ==) float(fne, !=) float(flt, <)
+float(fgt, >)  float(fge, >=) float(fle, <=)
+
+#define float_val(name, op) \
+int name (int a, int b) \
+{ return a * 54.0 op b * 54.0; }
+
+float_val(fveq, ==) float_val(fvne, !=) float_val(fvlt, <)
+float_val(fvgt, >)  float_val(fvge, >=) float_val(fvle, <=)
+
+#define vec(name, op) \
+int name (int a, int b) \
+{ int c[10]; return a * c[1] op b * c[1]; }
+
+vec(veq, ==) vec(vne, !=) vec(vlt, <)
+vec(vgt, >)  vec(vge, >=) vec(vle, <=)
+
+/* { dg-f

Re: [PATCH 2/2][AArch64] Add bfx attribute

2016-11-11 Thread Richard Earnshaw (lists)
On 10/11/16 17:14, Wilco Dijkstra wrote:
> The second patch updates the Cortex-A57 scheduler now that we can 
> differentiate
> between shifts and bitfield inserts.  The Cortex-A57 Software Optimization 
> Guide
> indicates that BFM operations use the integer multi-cycle pipeline, while ARM
> UXTB/H instructions use the Integer 1 or Integer 0 pipelines, so swap the bfm
> and extend reservations.  This results in minor scheduling differences.
> 
> I think the XGene-1 scheduler might need a similar change as currently all 
> AArch64
> shifts are modelled as 2-cycle operations.
> 
> ChangeLog:
> 2016-11-10  Wilco Dijkstra  
> 
>   * config/arm/cortex-a57.md (cortex_a57_alu): Move extend here, move 
> bfm...
>   (cortex_a57_alu_shift): ...here.
> 

OK.

R.



Re: [PATCH][AArch64] Improve TI mode address offsets

2016-11-11 Thread Richard Earnshaw
On 10/11/16 17:16, Wilco Dijkstra wrote:
> Improve TI mode address offsets - these may either use LDP of 64-bit or
> LDR of 128-bit, so we need to use the correct intersection of offsets.
> When splitting a large offset into base and offset, use a signed 9-bit
> unscaled offset.
> 
> Remove the Ump constraint on movti and movtf instructions as this blocks
> the reload optimizer from merging address CSEs (is this supposed to work
> only on 'm' constraints?).  The result is improved codesize, especially
> for wrf and gamess in SPEC2006.
> 
> 
> int f (int x)
> {
>   __int128_t arr[100];
>   arr[31] = 0;
>   arr[48] = 0;
>   arr[79] = 0;
>   arr[65] = 0;
>   arr[70] = 0;
>   return arr[x];
> }
> 
> Before patch (note the multiple redundant add x1, sp, 1024):
> sub sp, sp, #1600
> sbfiz   x0, x0, 4, 32
> add x1, sp, 256
> stp xzr, xzr, [x1, 240]
> add x1, sp, 768
> stp xzr, xzr, [x1]
> add x1, sp, 1024
> stp xzr, xzr, [x1, 240]
> add x1, sp, 1024
> stp xzr, xzr, [x1, 16]
> add x1, sp, 1024
> stp xzr, xzr, [x1, 96]
> ldr w0, [sp, x0]
> add sp, sp, 1600
> ret
> 
> After patch:
> sub sp, sp, #1600
> sbfiz   x0, x0, 4, 32
> add x1, sp, 1024
> stp xzr, xzr, [sp, 496]
> stp xzr, xzr, [x1, -256]
> stp xzr, xzr, [x1, 240]
> stp xzr, xzr, [x1, 16]
> stp xzr, xzr, [x1, 96]
> ldr w0, [sp, x0]
> add sp, sp, 1600
> ret
> 
> 
> Bootstrap & regress OK.
> 
> ChangeLog:
> 2016-11-10  Wilco Dijkstra  
> 
> gcc/
>   * config/aarch64/aarch64.md (movti_aarch64): Change Ump to m.
>   (movtf_aarch64): Likewise.
>   * config/aarch64/aarch64.c (aarch64_classify_address):
>   Use correct intersection of offsets.
>   (aarch64_legitimize_address_displacement): Use 9-bit signed offsets.
>   (aarch64_legitimize_address): Use 9-bit signed offsets for TI/TF mode.
>   Use 7-bit signed scaled mode for modes > 16 bytes.
> 
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 3045e6d6447d5c1860feb51708eeb2a21d2caca9..45f44e96ba9e9d3c8c41d977aa509fa13398a8fd
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4066,7 +4066,8 @@ aarch64_classify_address (struct aarch64_address_info 
> *info,
>instruction memory accesses.  */
> if (mode == TImode || mode == TFmode)
>   return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
> - && offset_9bit_signed_unscaled_p (mode, offset));
> + && (offset_9bit_signed_unscaled_p (mode, offset)
> + || offset_12bit_unsigned_scaled_p (mode, offset)));
>  
> /* A 7bit offset check because OImode will emit a ldp/stp
>instruction (only big endian will get here).
> @@ -4270,18 +4271,19 @@ aarch64_legitimate_address_p (machine_mode mode, rtx 
> x,
>  /* Split an out-of-range address displacement into a base and offset.
> Use 4KB range for 1- and 2-byte accesses and a 16KB range otherwise
> to increase opportunities for sharing the base address of different sizes.
> -   For TI/TFmode and unaligned accesses use a 256-byte range.  */
> +   For unaligned accesses and TI/TF mode use the signed 9-bit range.  */
>  static bool
>  aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode 
> mode)
>  {
> -  HOST_WIDE_INT mask = GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3fff;
> +  HOST_WIDE_INT offset = INTVAL (*disp);
> +  HOST_WIDE_INT base = offset & ~(GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3ffc);
>  
> -  if (mode == TImode || mode == TFmode ||
> -  (INTVAL (*disp) & (GET_MODE_SIZE (mode) - 1)) != 0)
> -mask = 0xff;
> +  if (mode == TImode || mode == TFmode
> +  || (offset & (GET_MODE_SIZE (mode) - 1)) != 0)
> +base = (offset + 0x100) & ~0x1ff;
>  
> -  *off = GEN_INT (INTVAL (*disp) & ~mask);
> -  *disp = GEN_INT (INTVAL (*disp) & mask);
> +  *off = GEN_INT (base);
> +  *disp = GEN_INT (offset - base);
>return true;
>  }
>  
> @@ -5148,12 +5150,10 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
> machine_mode mode)
> x = gen_rtx_PLUS (Pmode, base, offset_rtx);
>   }
>  
> -  /* Does it look like we'll need a load/store-pair operation?  */
> +  /* Does it look like we'll need a 16-byte load/store-pair operation?  
> */
>HOST_WIDE_INT base_offset;
> -  if (GET_MODE_SIZE (mode) > 16
> -   || mode == TImode)
> - base_offset = ((offset + 64 * GET_MODE_SIZE (mode))
> -& ~((128 * GET_MODE_SIZE (mode)) - 1));
> +  if (GET_MODE_SIZE (mode) > 16)
> + base_offset = (offset + 0x400) & ~0x7f0;
>/* For offsets aren't a multiple of the access size, the limit is
>-256...255.  */
>else if (offset & (GET_MODE_S

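The new displacement split for TI/TF modes can be checked outside GCC.
This toy C mirror (names are mine, not the patch's) reproduces the
base/offset pairs visible in the "After patch" assembly quoted above.

```c
#include <assert.h>

static void
split_ti_offset (long offset, long *base, long *disp)
{
  /* Anchor the base so the leftover displacement fits the signed 9-bit
     unscaled range [-256, 255] shared by LDP of 64-bit and LDR of
     128-bit accesses.  */
  *base = (offset + 0x100) & ~0x1ffL;
  *disp = offset - *base;
  assert (*disp >= -256 && *disp <= 255);
}

int
main (void)
{
  long base, disp;

  /* arr[79] lives at sp+1264: base sp+1024, "stp xzr, xzr, [x1, 240]".  */
  split_ti_offset (1264, &base, &disp);
  assert (base == 1024 && disp == 240);

  /* arr[48] lives at sp+768: the same base, "stp xzr, xzr, [x1, -256]".  */
  split_ti_offset (768, &base, &disp);
  assert (base == 1024 && disp == -256);
  return 0;
}
```

Because several offsets round to the same 512-byte-aligned base, the
redundant "add x1, sp, 1024" instructions collapse into one.
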
Re: [PATCH][ARM] PR78255: wrong code generation for indirect sibling calls

2016-11-11 Thread Andre Vieira (lists)
On 10/11/16 14:54, Andre Vieira (lists) wrote:
> Hi,
> 
> As reported in PR78255 there is currently an issue with indirect sibling
> calls in ARM when the address of the sibling call is loaded into 'r3'
> and that same register is chosen to align the stack.  See the report for
> further information.
> 
> As I mentioned in the bugzilla ticket I am not sure this is the right
> approach, though it works... Bootstrapped on ARM and no regressions.
> 
> Do you think this is OK? Another solution would be to make sure that
> 'arm_get_frame_offsets' recalculates offsets after we know that the call
> is going to be indirect, i.e. after we know the address is going to be
> loaded into a register, but I do not know what a sane way would be to
> ensure this.
> 
> Regards,
> Andre
> 
> gcc/ChangeLog
> 2016-11-10  Andre Vieira  
> 
> * config/arm/arm.md (sibcall_internal): Add 'use' to pattern.
> (sibcall_value_internal): Likewise.
> (sibcall_insn): Likewise.
> (sibcall_value_insn): Likewise.
> 
> 
> gcc/testsuite/ChangeLog
> 2016-11-10  Andre Vieira  
> 
> * gcc.target/arm/pr78255.c: New.
> 

I was looking at the bootstrap results of the wrong patch.  This one
seems to break ARM bootstrap; I am looking into it...

Cheers,
Andre


Re: [Patch 3/5] OpenACC tile clause support, C/C++ front-end parts

2016-11-11 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 06:46:16PM +0800, Chung-Lin Tang wrote:
> 2016-XX-XX  Nathan Sidwell  
> 
> c/
> * c-parser.c (c_parser_omp_clause_collapse): Disallow tile.
> (c_parser_oacc_clause_tile): Disallow collapse. Fix parsing and
> semantic checking.
> * c-parser.c (c_parser_omp_for_loop): Accept tiling constructs.
> 
> cp/
>   * parser.c (cp_parser_oacc_clause_tile): Disallow collapse.  Fix
> parsing.  Parse constant expression. Remove semantic checking.
> (cp_parser_omp_clause_collapse): Disallow tile.
> (cp_parser_omp_for_loop): Deal with tile clause.  Don't emit a

Similarly to the previous patch, some lines have spaces instead of tabs.

>   parse error about missing for after already emitting one.
>   Use more conventional for idiom for unbounded loop.
>   * pt.c (tsubst_omp_clauses): Require integral constant expression
>   for COLLAPSE and TILE.  Remove broken TILE subst.
>   * semantics.c (finish_omp_clauses): Correct TILE semantic check.
>   (finish_omp_for): Deal with tile clause.
> 
> gcc/testsuite/
> * c-c++-common/goacc/loop-auto-1.c: Adjust and add additional
> case.
> * c-c++-common/goacc/loop-auto-2.c: New.
> * c-c++-common/goacc/tile.c: Include stdbool, fix expected errors.
> * g++.dg/goacc/template.C: Test tile subst.  Adjust erroneous
> uses.
>   * g++.dg/goacc/tile-1.C: Check tile subst.
>   * gcc.dg/goacc/loop-processing-1.c: Adjust dg-final pattern.

> +   if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
> +   || TREE_CODE (expr) != INTEGER_CST

No need to test for INTEGER_CST; tree_fits_shwi_p will test that.

> +   || !tree_fits_shwi_p (expr)
> +   || tree_to_shwi (expr) <= 0)
>   {
> -   warning_at (expr_loc, 0,"% value must be positive");
> -   expr = integer_one_node;
> +   error_at (expr_loc, "% argument needs positive"
> + " integral constant");
> +   expr = integer_zero_node;
>   }
>   }

> @@ -14713,6 +14713,7 @@ tsubst_omp_clauses (tree clauses, enum c_omp_regio
>nc = copy_node (oc);
>OMP_CLAUSE_CHAIN (nc) = new_clauses;
>new_clauses = nc;
> +  bool needs_ice = false;
>  
>switch (OMP_CLAUSE_CODE (nc))
>   {
> @@ -14742,10 +14743,16 @@ tsubst_omp_clauses (tree clauses, enum c_omp_regio
>   = tsubst_omp_clause_decl (OMP_CLAUSE_DECL (oc), args, complain,
> in_decl);
> break;
> + case OMP_CLAUSE_COLLAPSE:
> + case OMP_CLAUSE_TILE:
> +   /* These clauses really need a positive integral constant
> +  expression, but we don't have a predicate for that
> +  (yet).  */
> +   needs_ice = true;
> +   /* FALLTHRU */

As I said earlier on gcc-patches, there is no need to change anything for
OMP_CLAUSE_COLLAPSE: we already require at parsing time that the argument
is a constant integer, so it can't be e.g. a template integral parameter.
And for OMP_CLAUSE_TILE, please avoid the needs_ice variable; instead of
falling through into the tsubst_expr case, copy it over and change the
argument there; it is short enough.

> +   if (TREE_CODE (t) != INTEGER_CST
> +   || !tree_fits_shwi_p (t)

Again, no need to check for INTEGER_CST when tree_fits_shwi_p will do that.

Jakub


Re: [PATCH][ARM] Improve max_insns_skipped logic

2016-11-11 Thread Richard Earnshaw
On 10/11/16 17:19, Wilco Dijkstra wrote:
> Improve the logic when setting max_insns_skipped.  Limit the maximum size
> of an IT block to MAX_INSN_PER_IT_BLOCK, as otherwise multiple IT
> instructions are needed, increasing codesize.
> 

You don't provide any information about what benefits this brings.

> Given 4 works well for Thumb-2, use the same limit for ARM
> for consistency.

Why?  Logic might suggest that given thumb has to execute an IT
instruction first, then allowing ARM to have one more conditional
instruction is the best answer.

R.

> ChangeLog:
> 2016-11-04  Wilco Dijkstra  
> 
> * config/arm/arm.c (arm_option_params_internal): Improve setting of
> max_insns_skipped.
> --
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a
>  100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -2901,20 +2901,12 @@ arm_option_params_internal (void)
>targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
>  }
>  
> -  if (optimize_size)
> -{
> -  /* If optimizing for size, bump the number of instructions that we
> - are prepared to conditionally execute (even on a StrongARM).  */
> -  max_insns_skipped = 6;
> +  /* Increase the number of conditional instructions with -Os.  */
> +  max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped;
>  
> -  /* For THUMB2, we limit the conditional sequence to one IT block.  */
> -  if (TARGET_THUMB2)
> -max_insns_skipped = arm_restrict_it ? 1 : 4;
> -}
> -  else
> -/* When -mrestrict-it is in use tone down the if-conversion.  */
> -max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
> -  ? 1 : current_tune->max_insns_skipped;
> +  /* For THUMB2, we limit the conditional sequence to one IT block.  */
> +  if (TARGET_THUMB2)
> +max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
>  }
>  
>  /* True if -mflip-thumb should next add an attribute for the default
> 



Re: [Patch 4/5] OpenACC tile clause support, Fortran front-end parts

2016-11-11 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 06:46:46PM +0800, Chung-Lin Tang wrote:
> 
> Thanks,
> Chung-Lin
> 
> 2016-XX-XX  Cesar Philippidis  
> 
>   fortran/
>   * openmp.c (resolve_oacc_positive_int_expr): Promote the warning
>   to an error.
>   (resolve_oacc_loop_blocks): Use integer zero to represent the '*'
>   tile argument.
>   (resolve_omp_clauses): Error on directives containing both tile
>   and collapse clauses.
>   * trans-openmp.c (gfc_trans_omp_do): Lower tiled loops like
>   collapsed loops.
> 
> 
>   gcc/testsuite/
>   * gfortran.dg/goacc/loop-2.f95: Change expected tile clause
> warnings to errors.
> * gfortran.dg/goacc/loop-5.f95: Likewise.
> * gfortran.dg/goacc/sie.f95: Likewise.
> * gfortran.dg/goacc/tile-1.f90: New test.
> * gfortran.dg/goacc/tile-2.f90: New test
> * gfortran.dg/goacc/tile-lowering.f95: New test.

Again, 8 spaces in ChangeLog.  Missing full stop after "New test".

> --- fortran/openmp.c  (revision 241809)
> +++ fortran/openmp.c  (working copy)
> @@ -3024,8 +3024,8 @@ resolve_oacc_positive_int_expr (gfc_expr *expr, co
>resolve_oacc_scalar_int_expr (expr, clause);
>if (expr->expr_type == EXPR_CONSTANT && expr->ts.type == BT_INTEGER
>&& mpz_sgn(expr->value.integer) <= 0)
> -gfc_warning (0, "INTEGER expression of %s clause at %L must be positive",
> -  clause, &expr->where);
> +gfc_error ("INTEGER expression of %s clause at %L must be positive",
> +clause, &expr->where);
>  }

This can't be against current trunk.  The current routine is shared with
OpenMP and gfc_error is undesirable there.

> @@ -3859,6 +3859,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_claus
>  if (omp_clauses->wait_list)
>for (el = omp_clauses->wait_list; el; el = el->next)
>   resolve_oacc_scalar_int_expr (el->expr, "WAIT");
> +  if (omp_clauses->collapse && omp_clauses->tile_list)
> +gfc_error ("Incompatible use of TILE and COLLAPSE at %L", &code->loc);

Shouldn't you, in that case, clear collapse (or tile_list) for error recovery?

Jakub


[PATCH 0/3] MIPS/GCC: Changes for `.insn' assembly annotation

2016-11-11 Thread Maciej W. Rozycki
Hi,

 This small patch series addresses an issue uncovered by a recent binutils 
master branch update, scheduled for the upcoming 2.28 release, where we 
fail to annotate stray code labels -- generally produced for code marked 
as unreachable -- that have no instruction following, with the `.insn' 
pseudo-op, as explicitly required with the microMIPS ISA and also needed 
with MIPS16 code.  The missing annotation causes assembly failures if such 
a label is a target of a branch.

 As updates have turned out to be required to our test harness, I have 
prepared separate self-contained changes comprising this series.  See the 
individual submissions for detailed patch descriptions.

  Maciej


Re: [Patch 5/5] OpenACC tile clause support, libgomp testsuite patches

2016-11-11 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 06:47:07PM +0800, Chung-Lin Tang wrote:
> Some additional tests and adjustments to existing ones were made.
> 
> 2016-XX-XX  Nathan Sidwell  
>   Chung-Lin Tang  
> 
> libgomp/
> * testsuite/libgomp.oacc-c-c++-common/tile-1.c: New.
>   * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust and
> add additional case.
>   * testsuite/libgomp.oacc-c-c++-common/vprop.c: XFAIL under
> "openacc_nvidia_accel_selected".
> * libgomp.oacc-fortran/nested-function-1.f90 (test2): Add 
> num_workers(8)
> clause.

Again, please fix up the whitespace in ChangeLog.  Ok with that fixed
(once the other patches are in of course).

Jakub


[PATCH 1/3] MIPS/GCC/test: Implement `-mmicromips' option test

2016-11-11 Thread Maciej W. Rozycki
Add an assembly snippet for `mips_option_tests' to verify that the 
target board can indeed run microMIPS code, support for which is 
optional in the MIPS architecture.

Unlike with the `-mips16' option test for the MIPS16 ASE do not rely on 
a function attribute to switch to the regular MIPS mode first in the 
wrapper arranged by `mips_first_unsupported_option' -- like with the 
`nomips16' attribute used there -- because support for regular MIPS code 
is optional for microMIPS processors.  Consequently microMIPS execution 
may be all that the target board supports and therefore whatever the 
instruction encoding for the target board -- between the regular MIPS 
and the microMIPS sets -- has been selected as the default in board 
options, it has to be respected.  Instead of a function attribute use 
`.set push' and `.set pop' pseudo-ops around `.set micromips' then, 
ensuring the test code is assembled using the microMIPS instruction 
encoding, regardless of what encoding is used for the surrounding code, 
and making sure said surrounding code is consistently assembled.

Use the JRADDIUSP instruction for the actual check as an additional 
safety measure, as this hardware instruction is only present in the 
microMIPS encoding.  The option test otherwise corresponds to the 
`-mips16' one, as the ISA bit has to be similarly set and restored.

gcc/testsuite/
* gcc.target/mips/mips.exp (mips_option_tests): Add 
`-mmicromips' array element.
---
 NB it looks to me like some of the other tests ought to be using `.set 
push' and `.set pop' too, as it's generally unsafe to leave ISA/ASE 
setting overrides behind at the conclusion of an inline asm for the 
assembler to continue using with all the compiler-generated code which 
follows.

 OK to apply?

  Maciej

gcc-mips-test-option-tests-mmicromips.diff
Index: gcc/gcc/testsuite/gcc.target/mips/mips.exp
===
--- gcc.orig/gcc/testsuite/gcc.target/mips/mips.exp 2016-10-25 
05:37:13.0 +0100
+++ gcc/gcc/testsuite/gcc.target/mips/mips.exp  2016-11-09 17:23:15.985788415 
+
@@ -360,6 +360,19 @@ set mips_option_tests(-mips16) {
 jalr $3
 move $31,$2
 }
+set mips_option_tests(-mmicromips) {
+move $2,$31
+bal 1f
+.set push
+.set micromips
+jraddiusp 0
+.set pop
+.align 2
+1:
+ori $3,$31,1
+jalr $3
+move $31,$2
+}
 set mips_option_tests(-mpaired-single) {
 .set mips64
 lui $2,0x3f80


[PATCH 2/3] MIPS/GCC/test: Implement `-mcode-readable=yes' option test

2016-11-11 Thread Maciej W. Rozycki
Add an assembly snippet for `mips_option_tests' to verify that 
instructions executed on the target board may freely access executable 
sections.

Depending on target board's execution environment, which may map 
executable pages with the Read Inhibit (RI) bit set in the TLB or may 
have a split SPRAM memory which directs instruction fetch and data read 
accesses to different parts of the memory, code may or may not be able 
to access parts of executable sections as data.

Therefore verify that data reads from executable sections both complete 
successfully and that correct data is retrieved (though with a small 
probability of a false positive).  Use MIPS16 execution to access data 
required as the `-mcode-readable=yes' option only affects MIPS16 code 
generation anyway, which however simplifies handling a little bit as 
PC-relative addressing can be used to calculate the location of data 
required, avoiding any issues if using the PIE or PIC code generation 
model.

Use the BREAK instruction to trap on a comparison check failure so that 
the effect is immediate and there is no need to wait for the test 
program to time out; resort to an infinite loop though in the unlikely 
event a Bp exception handler resumes execution beyond the trapping 
instruction.

NB there is no need to explicitly align data emitted with the `.word' 
pseudo-op, which is self-aligning.

gcc/testsuite/
* gcc.target/mips/mips.exp (mips_option_tests): Add 
`-mcode-readable=yes' array element.
---
 OK to apply?

  Maciej

gcc-mips-test-option-tests-mcode-readable-yes.diff
Index: gcc/gcc/testsuite/gcc.target/mips/mips.exp
===
--- gcc.orig/gcc/testsuite/gcc.target/mips/mips.exp 2016-11-09 
18:56:48.283197623 +
+++ gcc/gcc/testsuite/gcc.target/mips/mips.exp  2016-11-11 08:49:53.345912380 
+
@@ -401,6 +401,27 @@ set mips_option_tests(-mdspr2) {
 .set dspr2
 prepend $2,$3,11
 }
+set mips_option_tests(-mcode-readable=yes) {
+move $2,$31
+bal 1f
+.set mips16
+la $3,0f
+lw $3,($3)
+jr $31
+0:
+.word 0xfacebead
+.set nomips16
+.align 2
+1:
+ori $3,$31,1
+jalr $3
+li $4,0xfacebead
+beq $3,$4,2f
+break
+b .
+2:
+move $31,$2
+}
 
 # Canonicalize command-line option OPTION.
 proc mips_canonicalize_option { option } {


Re: [PATCH] S390: Fix PR/77822.

2016-11-11 Thread Andreas Krebbel
On 11/11/2016 10:42 AM, Matthias Klose wrote:
> On 11.11.2016 09:58, Andreas Krebbel wrote:
>> On 11/08/2016 03:38 PM, Dominik Vogt wrote:
>>> The attached patch fixes PR/77822 on s390/s390x for gcc-6 *only*.
>>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822
>>>
>>> Bootstrapped and regression tested on s390 and s390x biarch on a
>>> zEC12.
>>>
>>> For gcc-7, there will be a different patch.
>>
>> Applied to GCC 6 branch.  Thanks!
>> Please remember to add the PR number to the changelog entries so that
>> bugzilla adds a comment to the PR.
>>
>> As discussed offlist, the range check for the position operand could be
>> moved to a predicate.  This will be part of the GCC head patch.
>>
>> I've just noticed that I had such checks for the insv patterns already,
>> and later added one to the expander as well.  So for zero_extract as a
>> target operand this appears to have been a problem even before GCC 6.
> 
> the gcc-6-branch now has the ChangeLog entry for gcc.target/s390/pr77822.c but
> not the test case.
> 
> Matthias
> 
Fixed.



[PATCH 3/3] MIPS/GCC: Mark trailing labels with `.insn'

2016-11-11 Thread Maciej W. Rozycki
Mark trailing labels, either at the end of a function or preceding a 
MIPS16 constant pool, with the `.insn' assembly pseudo-op so that they
are considered code rather than data labels and consequently have ISA
annotation added in the symbol table by the assembler.

Such labels are created for some cases of unreachable code like with the 
following example:

$ cat switch.c
int
foo (int i)
{
  static int j;

  j += i;
  switch (i)
{
case -5:
  return -2;
case -3:
  return -1;
case 0:
  return 0;
case 3:
  return 1;
case 5:
  break;
default:
  __builtin_unreachable ();
}
  return j;
}
$ gcc -S -mips16 -O2 -o switch-mips16.s switch.c
$ cat switch-mips16.s
.file   1 "switch.c"
.section .mdebug.abi32
.previous
.nan legacy
.module fp=xx
.module nooddspreg
.abicalls
.option pic0
.text
.align  2
.globl  foo
.set mips16
.set nomicromips
.ent foo
.type   foo, @function
foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x00000000,0
.fmask  0x00000000,0
lw  $3,$L11
lw  $2,0($3)
addu $2,$4,$2
sw  $2,0($3)
addiu   $3,$4,5
sltu    $3, 11
bteqz   $L2
sll $5, $3, 1
la  $3, $L4
addu$5, $3, $5
lh  $5, 0($5)
addu$3, $3, $5
j   $3
.align  1
.align  2
$L4:
.half   $L9-$L4
.half   $L2-$L4
.half   $L5-$L4
.half   $L2-$L4
.half   $L2-$L4
.half   $L6-$L4
.half   $L2-$L4
.half   $L2-$L4
.half   $L7-$L4
.half   $L2-$L4
.half   $L1-$L4
$L9:
li  $2,2
neg $2,$2
$L1:
jr  $31
$L7:
.set    noreorder
.set    nomacro
jr  $31
li  $2,1
.set    macro
.set    reorder

$L6:
.set    noreorder
.set    nomacro
jr  $31
move$2,$4
.set    macro
.set    reorder

$L5:
li  $2,1
.set    noreorder
.set    nomacro
jr  $31
neg $2,$2
.set    macro
.set    reorder

$L2:
.align  2
$L11:
.word   j.1474
.end    foo
.size   foo, .-foo
.local  j.1474
.comm   j.1474,4,4
.ident  "GCC: (GNU) 7.0.0 20160810 (experimental)"
$ 

where `$L2' is placed before MIPS16 constant pool data, or:

$ gcc -S -mmicromips -O2 -o switch-micromips.s switch.c
$ cat switch-micromips.s
.file   1 "switch.c"
.section .mdebug.abi32
.previous
.nan legacy
.module fp=xx
.module nooddspreg
.abicalls
.option pic0
.text
.align  2
.globl  foo
.set    nomips16
.set    micromips
.ent    foo
.type   foo, @function
foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x00000000,0
.fmask  0x00000000,0
.set    noreorder
.set    nomacro
lui $5,%hi(j.1401)
addiu   $3,$4,5
lw  $2,%lo(j.1401)($5)
sltu$6,$3,11
addu$2,$4,$2
beqz$6,$L2
sw  $2,%lo(j.1401)($5)

lui $5,%hi($L4)
addiu   $5,$5,%lo($L4)
lwxs$3,$3($5)
jrc $3
.rdata
.align  2
.align  2
$L4:
.word   $L9
.word   $L2
.word   $L5
.word   $L2
.word   $L2
.word   $L6
.word   $L2
.word   $L2
.word   $L7
.word   $L2
.word   $L1
.text
$L9:
li  $2,-2   # 0xfffffffe
$L1:
jrc $31
$L7:
jr  $31
li  $2,1# 0x1

$L6:
jr  $31
move$2,$4

$L5:
jr  $31
li  $2,-1   # 0xffffffff

$L2:
.set    macro
.set    reorder
.end    foo
.size   foo, .-foo
.local  j.1401
.comm   j.1401,4,4
.ident  "GCC: (GNU) 7.0.0 20160810 (experimental)"

where the same label is placed at the end of a function.  This in turn 
makes recent trunk versions of gas complain:

$ as -o switch-mips16.o switch-mips16.s
switch-mips16.s: Assembler messages:
switch-mips16.s:26: Error: branch to a symbol in another ISA mode
$ 

as commit 9d862524f6ae ("MIPS: Verify the ISA mode and alignment of 
branch and jump targets") closed a hole in branch processing, making 
relocation calculation respect the ISA mode of the symbol referred to.  
This allowed diagnosing the situation where an attempt is made to pass 
control from code assembled for one ISA mode to code assembled for a 
different ISA mode and either relaxing the branch to

Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-11 Thread Martin Liška
On 11/10/2016 06:31 PM, Nathan Sidwell wrote:
> On 11/10/2016 08:24 AM, Martin Liška wrote:
>> On 11/10/2016 05:17 PM, David Edelsohn wrote:
>>> Maybe instead of adding "maybe", we need to change the severity of the
>>> warning so that the warning is not emitted by default.
>>
>> Adding the warning option to -Wextra can be solution. Is it acceptable
>> approach?
> 
> I don't think that's good.  Now I understand the -pthreads thing, we have 
> different use cases.
> 
> 1) user explicitly said -fprofile-update=FOO.  They shouldn't have to enable 
> something else to get a diagnostic that FOO doesn't work.
> 
> 2) driver implicitly said -fprofile-update=FOO, because the user said 
> -pthreads but the driver doesn't know if FOO is acceptable.  We want to 
> silently fallback to the old behaviour.
> 
> The proposed solution addresses #2 by having the driver say 
> -fprofile-update=META-FOO.  My dislike is that we're exposing this to the 
> user and they're going to start using it.  That strikes me as undesirable.
> 
> How hard is it to implement the fprofile-update option value as a list. I.e. 
> '-fprofile-update=atomic,single', with semantics of 'pick the first one you 
> can do'? If that's straightforwards, then that seems to me as a better 
> solution for #2. [flyby-thought, have 'atomic,single' as an acceptable single 
> option value?]

Hello.

We use lists for options like -fsanitize=address,undefined; however, as
-fprofile-update has only 3 possible values (and passing 'single,atomic' does
not make sense), I would prefer to s/maybe-atomic/prefer-atomic.  I guess
handling the option list in gcc.c and doing substitutions would be very
inconvenient.
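The prefer-atomic semantics being converged on here can be sketched roughly as follows (a hypothetical illustration, not GCC's actual option-handling code): an explicit request is honoured as-is and diagnosed elsewhere if unsupported, while prefer-atomic silently falls back to single-threaded counter updates.

```cpp
#include <cassert>

// Hypothetical sketch (not GCC's actual code) of the prefer-atomic
// semantics under discussion: an explicit request is kept unchanged
// (and diagnosed elsewhere if unsupported), while prefer-atomic
// silently falls back to "single" when the target lacks atomic
// profile counter updates.
enum profile_update { PU_SINGLE, PU_ATOMIC, PU_PREFER_ATOMIC };

profile_update
resolve_profile_update (profile_update requested, bool target_has_atomics)
{
  if (requested == PU_PREFER_ATOMIC)
    return target_has_atomics ? PU_ATOMIC : PU_SINGLE;
  return requested;
}
```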

Thanks,
Martin

> 
> Failing that, Martin's solution is probably the sanest available solution, 
> but I'd like to rename 'maybe-atomic' to the more meaningful 'prefer-atomic'. 
>  With 'maybe-atomic', I'm left wondering if it looks at the phase of the moon.
> 
> nathan
> 



Re: [PATCH][ARM] Fix ldrd offsets

2016-11-11 Thread Wilco Dijkstra
Ramana Radhakrishnan wrote:
> On Thu, Nov 3, 2016 at 12:20 PM, Wilco Dijkstra  
> wrote:

>   HOST_WIDE_INT val = INTVAL (index);
> - /* ??? Can we assume ldrd for thumb2?  */
> - /* Thumb-2 ldrd only has reg+const addressing modes.  */
> - /* ldrd supports offsets of +-1020.
> -    However the ldr fallback does not.  */
> - return val > -256 && val < 256 && (val & 3) == 0;
> + /* Thumb-2 ldrd only has reg+const addressing modes.
> +    Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
> +    If vldr is selected it uses arm_coproc_mem_operand.  */
> + if (TARGET_LDRD)

> I suspect this should be : if (TARGET_LDRD && !fix_cm3_ldrd)  - I am a
> bit worried about this change because of the non-uniformity with ldr
> and the fallout with other places where things may break with this.  I
> would like a test with -mcpu=cortex-m3/-mthumb as well for an
> arm-none-eabi target to see what the fallout of this change is on that

Well it works fine given that Thumb-2 supports add/sub up to 4KB, so
the existing expansion into add+ldrd for the fix_cm3_ldrd case works fine.
I ran a bootstrap with fix_cm3_ldrd forced to true, and that completed without
any issues.

Wilco

Re: [PATCH] Do not simplify "(and (reg) (const bit))" to if_then_else.

2016-11-11 Thread Dominik Vogt
On Mon, Nov 07, 2016 at 09:29:26PM +0100, Bernd Schmidt wrote:
> On 10/31/2016 08:56 PM, Dominik Vogt wrote:
> 
> >combine_simplify_rtx() tries to replace rtx expressions with just two
> >possible values with an experession that uses if_then_else:
> >
> >  (if_then_else (condition) (value1) (value2))
> >
> >If the original expression is e.g.
> >
> >  (and (reg) (const_int 2))
> 
> I'm not convinced that if_then_else_cond is the right place to do
> this. That function is designed to answer the question of whether an
> rtx has exactly one of two values and under which condition; I feel
> it should continue to work this way.
> 
> Maybe simplify_ternary_expression needs to be taught to deal with this case?

But simplify_ternary_expression isn't called with the following
test program (only tried it on s390x):

  void bar(int, int); 
  int foo(int a, int *b) 
  { 
if (a) 
  bar(0, *b & 2); 
return *b; 
  } 

combine_simplify_rtx() is called with 

  (sign_extend:DI (and:SI (reg:SI 61) (const_int 2)))

In the switch it calls simplify_unary_operation(), which returns
NULL.  The next thing it does is call if_then_else_cond(), and
that calls itself with the sign_extend peeled off:

  (and:SI (reg:SI 61) (const_int 2))

takes the "BINARY_P (x)" path and returns false.  The problem
exists only if the (and ...) is wrapped in ..._extend, i.e. the
condition dealing with (and ...) directly can be removed from the
patch.

So, all recursive calls to if_then_else_cond() return false, and
finally the condition in

else if (HWI_COMPUTABLE_MODE_P (mode) 
   && pow2p_hwi (nz = nonzero_bits (x, mode)))

is true.
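For reference, the pow2p_hwi test in the quoted condition is just the standard single-set-bit check; a minimal standalone illustration (mirroring the GCC name, but not the actual implementation):

```cpp
#include <cassert>

// Standalone illustration (not the GCC implementation) of the check in
// the quoted condition: pow2p asks whether a value reduces to exactly
// one set bit, via the usual x & (x - 1) trick.  For the example in
// this thread, nonzero_bits of (and (reg) (const_int 2)) is 2, which
// is a power of two, so the condition fires.
bool pow2p (unsigned long long x)
{
  return x != 0 && (x & (x - 1)) == 0;
}
```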

Thus, if if_then_else_cond should remain unchanged, the only place
to fix this would be after the call to if_then_else_cond() in
combine_simplify_rtx().  Actually, there already is some special
case handling to override the return code of if_then_else_cond():

  cond = if_then_else_cond (x, &true_rtx, &false_rtx); 
  if (cond != 0 
  /* If everything is a comparison, what we have is highly unlikely 
 to be simpler, so don't use it.  */ 
--->  && ! (COMPARISON_P (x) 
&& (COMPARISON_P (true_rtx) || COMPARISON_P (false_rtx)))) 
{ 
  rtx cop1 = const0_rtx; 
  enum rtx_code cond_code = simplify_comparison (NE, &cond, &cop1); 
 
--->  if (cond_code == NE && COMPARISON_P (cond)) 
return x; 
  ...

Should be easy to duplicate the test in the if-body, if that is
what you prefer:

  ...
  if (HWI_COMPUTABLE_MODE_P (GET_MODE (x)) 
  && pow2p_hwi (nz = nonzero_bits (x, GET_MODE (x))) 
  && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND) 
&& GET_CODE (XEXP (x, 0)) == AND 
&& CONST_INT_P (XEXP (XEXP (x, 0), 0)) 
&& UINTVAL (XEXP (XEXP (x, 0), 0)) == nz)) 
return x; 

(untested)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-11 Thread Yuri Rumyantsev
Richard,

I prepared an updated patch 3, passing an additional argument to
vect_analyze_loop as you proposed (untested).
You wrote:
Btw, I wonder if you can produce a single patch containing just
epilogue vectorization, that is combine patches 1-3 but rip out
changes only needed by later patches?

Did you mean that I should exclude all support for vectorizing epilogues,
i.e. exclude from the 2nd patch all unrelated changes
like

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 11863af..32011c1 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
   LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
   LOOP_VINFO_PEELING_FOR_NITER (res) = false;
   LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
+  LOOP_VINFO_CAN_BE_MASKED (res) = false;
+  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
+  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
+  LOOP_VINFO_MASK_EPILOGUE (res) = false;
+  LOOP_VINFO_NEED_MASKING (res) = false;
+  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;

Did you also mean that the new combined patch must be a working patch, i.e.
can be integrated without the other patches?

Could you please look at updated patch?

Thanks.
Yuri.

2016-11-10 15:36 GMT+03:00 Richard Biener :
> On Thu, 10 Nov 2016, Richard Biener wrote:
>
>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>
>> > Richard,
>> >
>> > Here is updated 3 patch.
>> >
>> > I checked that all new tests related to epilogue vectorization passed with 
>> > it.
>> >
>> > Your comments will be appreciated.
>>
>> A lot better now.  Instead of the ->aux dance I now prefer to
>> pass the original loops loop_vinfo to vect_analyze_loop as
>> optional argument (if non-NULL we analyze the epilogue of that
>> loop_vinfo).  OTOH I remember we mainly use it to get at the
>> original vectorization factor?  So we can pass down an (optional)
>> forced vectorization factor as well?
>
> Btw, I wonder if you can produce a single patch containing just
> epilogue vectorization, that is combine patches 1-3 but rip out
> changes only needed by later patches?
>
> Thanks,
> Richard.
>
>> Richard.
>>
>> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>> > >
>> > >> Hi Richard,
>> > >>
>> > >> I did not understand your last remark:
>> > >>
>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> > >> >
>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> > >> >   && dump_enabled_p ())
>> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> > >> >"loop vectorized\n");
>> > >> > -   vect_transform_loop (loop_vinfo);
>> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
>> > >> > num_vectorized_loops++;
>> > >> >/* Now that the loop has been vectorized, allow it to be 
>> > >> > unrolled
>> > >> >   etc.  */
>> > >> >  loop->force_vectorize = false;
>> > >> >
>> > >> > +   /* Add new loop to a processing queue.  To make it easier
>> > >> > +  to match loop and its epilogue vectorization in dumps
>> > >> > +  put new loop as the next loop to process.  */
>> > >> > +   if (new_loop)
>> > >> > + {
>> > >> > +   loops.safe_insert (i + 1, new_loop->num);
>> > >> > +   vect_loops_num = number_of_loops (cfun);
>> > >> > + }
>> > >> >
>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> > >> > function which will set up stuff properly (and also perform
>> > >> > the if-conversion of the epilogue there).
>> > >> >
>> > >> > That said, if we can get in non-masked epilogue vectorization
>> > >> > separately that would be great.
>> > >>
>> > >> Could you please clarify your proposal.
>> > >
>> > > When a loop was vectorized set things up to immediately vectorize
>> > > its epilogue, avoiding changing the loop iteration and avoiding
>> > > the re-use of ->aux.
>> > >
>> > > Richard.
>> > >
>> > >> Thanks.
>> > >> Yuri.
>> > >>
>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> > >> >
>> > >> >> Hi All,
>> > >> >>
>> > >> >> I re-send all patches sent by Ilya earlier for review which support
>> > >> >> vectorization of loop epilogues and loops with low trip count. We
>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was 
>> > >> >> not
>> > >> >> approved by Jeff.
>> > >> >>
>> > >> >> I did re-base of all patches and performed bootstrapping and
>> > >> >> regression testing that did not show any new failures. Also all
>> > >> >> changes related to new vect_do_peeling algorithm have been changed
>> > >> >> accordingly.
>> > >> >>
>> > >> >> Is it OK for trunk?
>> > >> >
>> > >> > I would have prefered that the series up to -03-nomask-tails would
>> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
>> > >> > the patchset is oddly separated.
>> > >> >
>> > >> > I have a comment on that part nevertheless:
>> > >> >
>> >
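As background for readers of this thread: "epilogue vectorization" targets the scalar remainder loop that peeling for the vectorization factor leaves behind. A purely illustrative plain-C sketch of the main loop/epilogue split (not GCC internals), with vectorization factor 4:

```cpp
#include <cassert>

// Purely illustrative sketch (not GCC internals): a loop vectorized
// with factor 4, plus the scalar epilogue handling the remainder
// iterations -- the loop these patches aim to vectorize (possibly
// masked) instead of leaving it scalar.
void add_arrays (int *a, const int *b, int n)
{
  int i = 0;
  for (; i + 4 <= n; i += 4)    // main vectorized loop, VF = 4
    {
      a[i]     += b[i];
      a[i + 1] += b[i + 1];
      a[i + 2] += b[i + 2];
      a[i + 3] += b[i + 3];
    }
  for (; i < n; i++)            // epilogue: n % 4 leftover iterations
    a[i] += b[i];
}
```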

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-11 Thread Uros Bizjak
On Thu, Nov 10, 2016 at 6:18 PM, Andrew Senkevich
 wrote:
> 2016-11-10 19:36 GMT+03:00 Jakub Jelinek :
>> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>>> Hi,
>>>
>>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>>
>>> It requires additional patch for register allocator from Vladimir
>>> Makarov to be committed before.
>>
>> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
>> tabs), can you repost as attachment or configure your MUA not to do this?
>>
>> Just a couple of random nits follow:
>>
>>> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>>
>> This mentions an option that doesn't exist, is that s/dd// ?
>
> Yes.
> Attached fixed version.

A couple of questions and comments below.

You are introducing flag2 ABI option flags. There are no tests for the
corresponding __target__ attribute; please add some tests, similar to
gcc.target/i386/funcspec-?.c. These can be in a follow-up patch.

Please add new option to g++.dg/other/i386-{2,3}.C tests. These are
like gcc.target/i386/sse-{22,23}.c for c++.

Also, I guess we want to support these new options with
__builtin_cpu_supports. Please add this functionality in a follow-up
patch.

+(define_register_constraint "h" "TARGET_AVX512F ? MOD4_SSE_REGS : NO_REGS"
+ "Any EVEX encodable SSE register, which has number factor of four.")
+
No, we are extremely low on a single-letter constraints. We will use
these for possible future new register sets. Use Yv or something
similar instead.

+//additional structure for isa flags

Please use C comments throughout the patch.

@@ -1465,11 +1472,14 @@ enum reg_class
 {   0x11,0x1fe0,0x0 },   /* FLOAT_INT_REGS */\
 { 0x1ff100ff,0xffe0,   0x1f },   /* INT_SSE_REGS */  \
 { 0x1ff1,0xffe0,   0x1f },   /* FLOAT_INT_SSE_REGS */\
-   { 0x0,   0x0, 0x1fc0 },   /* MASK_EVEX_REGS */   \
+   { 0x0,   0x0, 0x1fc0 },   /* MASK_EVEX_REGS */\
{ 0x0,   0x0, 0x1fe0 },   /* MASK_REGS */ \
-{ 0x,0x,0x1 }\
+{ 0x1fe0,0xe000,   0x1f },   /* MOD4_SSE_REGS */ \
+{ 0x,0x,0x1 }\
 }

+/* { 0x0220,0x2000,   0x02 },*/   /* MOD4_SSE_REGS */
+

Please remove commented out code. Also, please fix whitespace at the new entry.

+mavx5124fmaps
+Target Report Mask(ISA_AVX5124FMAPS) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX5124FMAPS built-in functions and code generation.
+
+mavx5124vnniw
+Target Report Mask(ISA_AVX5124VNNIW) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX5124VNNIW built-in functions and code generation.

Too many "and"s in the description.

--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
--- a/gcc/machmode.h
+++ b/gcc/machmode.h

These are middle-end changes, you will need a separate review for these.

The x86 part of the patch is OK with the above changes and an additional
target attribute test for flags2 ISA features.

Uros.


Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)

2016-11-11 Thread Mike Stump
On Nov 11, 2016, at 2:15 AM, Rainer Orth  wrote:
> The patch passes fixincludes make check (this time for real ;-) and
> restores macOS 10.12 bootstrap.

No objections from me.



Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-11 Thread Jakub Jelinek
Hi!

I've noticed preexisting:

On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:

> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */
>  VECTOR_MODES (FLOAT, 32); /*V16HF V8SF V4DF */
>  VECTOR_MODES (FLOAT, 64); /*   V32HF V16SF V8DF */
>  VECTOR_MODES (FLOAT, 128);/*  V64HF V32SF V16DF */

The "VECTOR_MODES (FLOAT, ...)" comments don't really match reality; shall we
fix that?  None of them create a V*HF mode, but they do create V*TF modes.

Jakub


Re: [Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]

2016-11-11 Thread Andre Vehreschild
Hi Janus,

sorry if I stepped on your toes; that was not my intention.  While looking at
your patch and its environment those thoughts came to me.  Good that you could
dispel my doubts.  Thank you very much.

> In fact I have not thought about any further cases. Since you're not
> giving full examples, I can only guess what you mean: The cases in the
> attachment are working as expected. Anything else?

That was what I had thought about.  Thanks for testing; I was indeed mistaken.

- Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH][ARM] Improve max_insns_skipped logic

2016-11-11 Thread Wilco Dijkstra
Richard Earnshaw wrote:
> On 10/11/16 17:19, Wilco Dijkstra wrote:
> > Improve the logic when setting max_insns_skipped.  Limit the maximum size 
> > of IT
> > to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed,
> > increasing codesize.  
>
> You don't provide any information about what benefits this brings.

It reduces codesize and improves performance as you avoid emitting a
second IT instruction for sequences of more than 4 conditional instructions.

> > Given 4 works well for Thumb-2, use the same limit for ARM
> > for consistency.
>
> Why?  Logic might suggest that given thumb has to execute an IT
> instruction first, then allowing ARM to have one more conditional
> instruction is the best answer.

Long conditional sequences are slow on modern cores - the value 6 for
max_insns_skipped is a few decades out of date as it was meant for ARM2!
Even with -Os the performance loss for larger values is not worth the
small codesize gain (there are many better options to reduce codesize
that actually improve performance at the same time). So using the same
code generation heuristics for ARM and Thumb-2 is a good idea.

Wilco


Re: [PATCHv2 6/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call

2016-11-11 Thread Kyrill Tkachov


On 09/11/16 14:53, Andre Vieira (lists) wrote:

On 27/10/16 11:01, Andre Vieira (lists) wrote:

On 25/10/16 17:30, Andre Vieira (lists) wrote:

On 24/08/16 12:01, Andre Vieira (lists) wrote:

On 25/07/16 14:26, Andre Vieira (lists) wrote:

This patch extends support for the ARMv8-M Security Extensions
'cmse_nonsecure_call' to use a new library function
'__gnu_cmse_nonsecure_call'. This library function is responsible for
(without using r0-r3 or d0-d7):
1) saving and clearing all callee-saved registers using the secure stack
2) clearing the LSB of the address passed in r4 and using blxns to
'jump' to it
3) clearing the APSR, including the 'ge bits' if DSP is enabled
4) clearing FPSCR if using non-soft float-abi
5) restoring callee-saved registers.

The decisions whether to include DSP 'ge bits' clearing and floating
point registers (single/double precision) all depend on the multilib used.

See Section 5.5 of ARM®v8-M Security Extensions
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * config/arm/arm.c (detect_cmse_nonsecure_call): New.
 (cmse_nonsecure_call_clear_caller_saved): New.
 (arm_reorg): Use cmse_nonsecure_call_clear_caller_saved.
 * config/arm/arm-protos.h (detect_cmse_nonsecure_call): New.
 * config/arm/arm.md (call): Handle cmse_nonsecure_entry.
 (call_value): Likewise.
 (nonsecure_call_internal): New.
 (nonsecure_call_value_internal): New.
 * config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New.
 (*nonsecure_call_value_reg_thumb1_v5): New.
 * config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New.
 (*nonsecure_call_value_reg_thumb2): New.
 * config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New.

*** libgcc/ChangeLog ***
2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * config/arm/cmse_nonsecure_call.S: New.
* config/arm/t-arm: Compile cmse_nonsecure_call.S


*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * gcc.target/arm/cmse/cmse.exp: Run tests in mainline dir.
 * gcc.target/arm/cmse/cmse-9.c: Added some extra tests.
 * gcc.target/arm/cmse/baseline/bitfield-4.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-5.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-6.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-7.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-8.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-9.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-and-union-1.c: New.
 * gcc.target/arm/cmse/baseline/cmse-11.c: New.
* gcc.target/arm/cmse/baseline/cmse-13.c: New.
* gcc.target/arm/cmse/baseline/cmse-6.c: New.
 * gcc/testsuite/gcc.target/arm/cmse/baseline/union-1.c: New.
 * gcc/testsuite/gcc.target/arm/cmse/baseline/union-2.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New.


Updated this patch to correctly clear only the cumulative
exception-status (0-4,7) and the condition code bits (28-31) of the FPSCR.



This patch extends support for the ARMv8-M Security Extensions
'cmse_nonsecure_call' to use a new library function
'__gnu_cmse_nonsecure_call'. This library function is responsible for
(without using r0-r3 or d0-d7):
1) saving and clearing all callee-saved registers using the secure stack
2) clearing the LSB of the address passed in r4 and using blxns to
'jump' to it
3) clearing the APSR, including the 'ge bits' if DSP is enabled
4) clearing the cumulative exception-status (0-4, 7) and the condition
bits (28-31) of the FPSCR if using non-soft float-abi
5) restoring callee-saved registers.

The decisions whether to include DSP 'ge bits' clearing and floating
point registers (single/double precision) all depend on the multilib used.

See Section 5.5 of ARM®v8-M Security Extensions
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

*** gcc/ChangeLog ***
2016-07-xx  Andre Vieira
 Thomas Preud'homme  

 * config/arm/arm.

Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-11 Thread Richard Biener
On Thu, 10 Nov 2016, Joseph Myers wrote:

> On Fri, 28 Oct 2016, Richard Biener wrote:
> 
> > +/* Parse a gimple expression.
> > +
> > +   gimple-expression:
> > + gimple-unary-expression
> > + gimple-call-statement
> > + gimple-binary-expression
> > + gimple-assign-expression
> > + gimple-cast-expression
> 
> I don't see any comments expanding what the syntax is for most of these 
> constructs.

Fixed hopefully - I also made us no longer use c_parser_cast_expression
and for expression leaves (GIMPLE operands) generally use
c_parser_gimple_postfix_expression instead of c_parser_cast_expression.
This meant exporting c_parser_type_name instead of
c_parser_cast_expression to handle parsing casts.

> > +  if (c_parser_next_token_is (parser, CPP_EQ))
> > +c_parser_consume_token (parser);
> 
> That implies you're allowing an optional '=' at this point in the syntax.  
> That doesn't seem to make sense to me; I'd expect you to do if (=) { 
> process assignment; } else { other cases; } or similar.

Fixed.  I've renamed c_parser_gimple_expression to
c_parser_gimple_statement to make it clearer it deals with parsing
single GIMPLE statements.  The only stmts w/o LHS it parses are calls,
and thus if it is not a call we can simply require a '='.

> > +  /* GIMPLE PHI expression.  */
> > +  if (c_parser_next_token_is_keyword (parser, RID_PHI))
> 
> I don't see this mentioned in any of the syntax comments.

Fixed.

> > +  struct {
> > +/* The expression at this stack level.  */
> > +struct c_expr expr;
> > +/* The operation on its left.  */
> > +enum tree_code op;
> > +/* The source location of this operation.  */
> > +location_t loc;
> > +  } stack[2];
> > +  int sp;
> > +  /* Location of the binary operator.  */
> > +  location_t binary_loc = UNKNOWN_LOCATION;  /* Quiet warning.  */
> > +#define POP
> >   \
> 
> This all looks like excess complexity.  The syntax in the comment 
> indicates that in GIMPLE, the operands of a binary expression are unary 
> expressions.  So nothing related to precedence is needed at all, and you 
> shouldn't need this stack construct.

Indeed, we do not handle recursive expressions.  I dropped the whole
stack logic and always use build2 (we don't want any operand promotion
either).

The following is the incremental patch I am installing on the branch
(after bootstrap, GIMPLE testcases still all pass).

I hope to get the frontend in for GCC 7 because only if we actually
start to use it we'll figure out where we need to improve.

Is the original patch with the change below ok for trunk?

Thanks,
Richard.

2016-11-11  Richard Biener  

c/
* c-parser.h (c_parser_cast_expression): Remove.
(c_parser_type_name): Export.
* c-parser.c (c_parser_cast_expression): Revert exporting.
(c_parser_type_name): Export.
* gimple-parser.c (c_parser_gimple_expression): Rename to ...
(c_parser_gimple_statement): ... this.  Amend syntax comments,
require CPP_EQ if not a call without LHS.  Implement cast
parsing without c_parser_cast_expression.  Move INDIRECT_REF
gimplification to c_parser_gimple_unary_expression.  Properly
dispatch to c_parser_gimple_unary_expression for __real and __imag.
(c_parser_gimple_binary_expression): Simplify by not allowing
nested expressions.  Remove support for non-GIMPLE && and ||.
Use c_parser_gimple_postfix_expression instead of
c_parser_cast_expression for operand parsing.
(c_parser_gimple_unary_expression): Remove support for non-GIMPLE !.
Use c_parser_gimple_postfix_expression instead of
c_parser_cast_expression for operand parsing.  Build MEM_REFs
directly.
(c_parser_gimple_paren_condition): Do not call
c_objc_common_truthvalue_conversion.
(c_parser_gimple_switch_stmt): Use c_parser_gimple_postfix_expression
for operand and case label parsing.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index a27eece..08a7d78 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1205,7 +1205,6 @@ static struct c_arg_info *c_parser_parms_list_declarator 
(c_parser *, tree,
 static struct c_parm *c_parser_parameter_declaration (c_parser *, tree);
 static tree c_parser_simple_asm_expr (c_parser *);
 static tree c_parser_attributes (c_parser *);
-static struct c_type_name *c_parser_type_name (c_parser *);
 static struct c_expr c_parser_initializer (c_parser *);
 static struct c_expr c_parser_braced_init (c_parser *, tree, bool,
   struct obstack *);
@@ -1233,6 +1232,7 @@ static struct c_expr c_parser_conditional_expression 
(c_parser *,
  struct c_expr *, tree);
 static struct c_expr c_parser_binary_expression (c_parser *, struct c_expr *,
 tree);
+static struct

Re: [PATCH] Support no newline in print_gimple_stmt

2016-11-11 Thread Richard Biener
On Thu, Nov 10, 2016 at 4:36 PM, Martin Liška  wrote:
> I've just noticed that tree-ssa-dse wrongly prints a new line to dump file.
> For the next stage1, I'll go through usages of print_gimple_stmt and remove
> extra new lines like:
>
> gcc/auto-profile.c:  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> gcc/auto-profile.c-  fprintf (dump_file, "\n");
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Err, why not just remove the excess newlines (and drop the ' quotes)?

Richard.

> Martin


[C++ PATCH, ABI] Fix mangling of TLS init and wrapper fns (PR c++/77285)

2016-11-11 Thread Jakub Jelinek
Hi!

The following testcase fails to link, because in one TU
(pr77285-2.C) we first mangle the TLS symbols and only afterwards
the TLS wrapper fn symbol (and don't emit TLS init fn symbol at all,
as there are no uses of the TLS var), while in the other TU
we first mangle TLS init and TLS wrapper symbol (therefore check_abi_tags
has not been called).

Not really sure if we want to call just check_abi_tags (based on the idea
that likely any uses of the TLS var from other TUs will be broken),
or use:
  if (abi_version_at_least (11))
maybe_check_abi_tags (variable, NULL_TREE, 11);
(perhaps with some extra new argument to maybe_check_abi_tags, some
enum, that would allow it to say which symbol it is - as it is not
initialization guard variable, but thread_local wrapper or thread_local
initialization function symbol).  In favour of the latter is that if such a
var is exported, but not really used from other TUs, it is an ABI change.
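As background, the roles of the two symbols being mangled can be modelled roughly like this (a hedged conceptual sketch, not the code GCC generates):

```cpp
#include <cassert>

// Hedged conceptual model of the two per-variable functions whose
// mangled names this patch fixes: _ZTH* (thread_local initialization
// function) and _ZTW* (thread_local wrapper).  This is not the code
// GCC emits, only an illustration of the wrapper's role.
struct X { int i = 0; };

thread_local X var1;        // gets _ZTH4var1... / _ZTW4var1... symbols

X &var1_wrapper ()          // role of the _ZTW wrapper:
{
  // the real wrapper first invokes the _ZTH init function when the
  // variable needs dynamic initialization, then returns a reference
  return var1;
}
```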

2016-11-11  Jakub Jelinek  

PR c++/77285
* mangle.c (mangle_tls_init_fn, mangle_tls_wrapper_fn): Call
check_abi_tags.

* g++.dg/tls/pr77285-1.C: New test.
* g++.dg/tls/pr77285-2.C: New test.

--- gcc/cp/mangle.c.jj  2016-11-10 18:03:27.0 +0100
+++ gcc/cp/mangle.c 2016-11-11 12:53:55.657483383 +0100
@@ -4254,6 +4254,7 @@ mangle_guard_variable (const tree variab
 tree
 mangle_tls_init_fn (const tree variable)
 {
+  check_abi_tags (variable);
   start_mangling (variable);
   write_string ("_ZTH");
   write_guarded_var_name (variable);
@@ -4268,6 +4269,7 @@ mangle_tls_init_fn (const tree variable)
 tree
 mangle_tls_wrapper_fn (const tree variable)
 {
+  check_abi_tags (variable);
   start_mangling (variable);
   write_string (TLS_WRAPPER_PREFIX);
   write_guarded_var_name (variable);
--- gcc/testsuite/g++.dg/tls/pr77285-1.C.jj 2016-11-11 13:03:14.439409858 
+0100
+++ gcc/testsuite/g++.dg/tls/pr77285-1.C2016-11-11 12:58:58.663647680 
+0100
@@ -0,0 +1,7 @@
+// { dg-do link { target c++11 } }
+// { dg-require-effective-target tls }
+// { dg-additional-sources pr77285-2.C }
+
+struct __attribute__((abi_tag("tag"))) X { ~X () {} int i = 0; };
+thread_local X var1;
+X var2;
--- gcc/testsuite/g++.dg/tls/pr77285-2.C.jj 2016-11-11 13:03:17.725368262 
+0100
+++ gcc/testsuite/g++.dg/tls/pr77285-2.C2016-11-11 13:00:59.976112006 
+0100
@@ -0,0 +1,17 @@
+// PR c++/77285
+// { dg-do compile { target c++11 } }
+// { dg-require-effective-target tls }
+// { dg-final { scan-assembler "_Z4var1B3tag" } }
+// { dg-final { scan-assembler "_Z4var2B3tag" } }
+// { dg-final { scan-assembler "_ZTH4var1B3tag" } }
+// { dg-final { scan-assembler "_ZTW4var1B3tag" } }
+
+struct __attribute__((abi_tag("tag"))) X { ~X () {} int i = 0; };
+extern thread_local X var1;
+extern X var2;
+
+int
+main ()
+{
+ return var1.i + var2.i;
+}

Jakub


[PATCH] Fix PR71575

2016-11-11 Thread Richard Biener

The following fixes a graphite ICE because of a bogus assert.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-11-11  Richard Biener  

PR tree-optimization/71575
* graphite-isl-ast-to-gimple.c (copy_cond_phi_nodes): Remove
bogus assert.

* gcc.dg/graphite/pr71575-1.c: New testcase.
* gcc.dg/graphite/pr71575-2.c: Likewise.

Index: gcc/graphite-isl-ast-to-gimple.c
===================================================================
--- gcc/graphite-isl-ast-to-gimple.c(revision 242004)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -2505,9 +2505,6 @@ copy_cond_phi_nodes (basic_block bb, bas
   tree res = gimple_phi_result (phi);
   if (virtual_operand_p (res))
continue;
-  if (is_gimple_reg (res) && scev_analyzable_p (res, region->region))
-   /* Cond phi nodes should not be scev_analyzable_p.  */
-   gcc_unreachable ();
 
   gphi *new_phi = create_phi_node (SSA_NAME_VAR (res), new_bb);
   tree new_res = create_new_def_for (res, new_phi,
Index: gcc/testsuite/gcc.dg/graphite/pr71575-1.c
===================================================================
--- gcc/testsuite/gcc.dg/graphite/pr71575-1.c   (revision 0)
+++ gcc/testsuite/gcc.dg/graphite/pr71575-1.c   (working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+void w(int x, double *y)
+{
+  int i, j;
+  double a;
+  double c[32];
+
+  for (i = 0; i < x; i++) {
+  for (j = 0; j < x - i; j++) {
+ c[j] = y[i];
+  }
+  y[i] = a;
+  a += c[0] + y[i];
+  }
+}
+
+void v(int x, double *y)
+{
+  w(x, y);
+}
Index: gcc/testsuite/gcc.dg/graphite/pr71575-2.c
===================================================================
--- gcc/testsuite/gcc.dg/graphite/pr71575-2.c   (revision 0)
+++ gcc/testsuite/gcc.dg/graphite/pr71575-2.c   (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -floop-nest-optimize" } */
+
+int *a;
+int b, c, d, e, g;
+char f;
+
+void fn1() {
+for (; c;) {
+   b = 0;
+   for (; b <= 2; b++) {
+   unsigned **h = (unsigned **) &a[b];
+   *h = (unsigned *)(__UINTPTR_TYPE__)((g && (e = d)) != f++);
+   }
+}
+}


[PATCH] Fix PR78295

2016-11-11 Thread Richard Biener

The following fixes an unwanted uninit warning.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-11-11  Richard Biener  

PR middle-end/78295
* tree-ssa-uninit.c (warn_uninitialized_vars): Do not warn
about uninitialized destination arg of BIT_INSERT_EXPR.

* gcc.dg/uninit-pr78295.c: New testcase.

Index: gcc/tree-ssa-uninit.c
===================================================================
--- gcc/tree-ssa-uninit.c   (revision 242004)
+++ gcc/tree-ssa-uninit.c   (working copy)
@@ -212,6 +212,14 @@ warn_uninitialized_vars (bool warn_possi
 can warn about.  */
  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_USE)
{
+ /* BIT_INSERT_EXPR first operand should not be considered
+a use for the purpose of uninit warnings.  */
+ if (gassign *ass = dyn_cast  (stmt))
+   {
+ if (gimple_assign_rhs_code (ass) == BIT_INSERT_EXPR
+ && use_p->use == gimple_assign_rhs1_ptr (ass))
+   continue;
+   }
  use = USE_FROM_PTR (use_p);
  if (always_executed)
warn_uninit (OPT_Wuninitialized, use, SSA_NAME_VAR (use),
Index: gcc/testsuite/gcc.dg/uninit-pr78295.c
===================================================================
--- gcc/testsuite/gcc.dg/uninit-pr78295.c   (revision 0)
+++ gcc/testsuite/gcc.dg/uninit-pr78295.c   (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wall" } */
+
+typedef double vectype __attribute__ ((__vector_size__ (16)));
+
+vectype
+f (double x)
+{
+  vectype t;
+  for (int i = 0; i < 2; i++)
+t[i] = x; /* { dg-bogus "uninitialized" } */
+  return t;
+}


Re: [Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]

2016-11-11 Thread Janus Weil
Hi Andre,

> sorry, when I stepped on your toes. That was not my intention.

well, I kind of got the impression that me committing 'obvious'
patches was somehow getting in conflict with _your_ toes (though I
don't quite understand why). I care as much about the quality of
gfortran as you do, trust me. Sorry if I was sounding too harsh ...

Cheers,
Janus


Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-11-11 Thread Bernd Schmidt

On 10/19/2016 12:39 PM, Bernd Schmidt wrote:

I'll refrain from any further comments on the topic. The ptx patches
don't look unreasonable iff someone else decides that this version of
OpenMP support should be merged and I'll look into them in more detail
if that happens. Patch 2/8 is ok now.


Sounds like Jakub has made that decision. So I'll get out of the way and 
just approve all these.



Bernd



Re: [PATCH][AArch64] Improve TI mode address offsets

2016-11-11 Thread Wilco Dijkstra
Richard Earnshaw wrote:

> Has this patch been truncated?  The last line above looks to be part-way
> through a hunk.

Oops sorry, it seems the last few lines are missing. Here is the full version:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
3045e6d6447d5c1860feb51708eeb2a21d2caca9..45f44e96ba9e9d3c8c41d977aa509fa13398a8fd
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4066,7 +4066,8 @@ aarch64_classify_address (struct aarch64_address_info 
*info,
 instruction memory accesses.  */
  if (mode == TImode || mode == TFmode)
return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
-   && offset_9bit_signed_unscaled_p (mode, offset));
+   && (offset_9bit_signed_unscaled_p (mode, offset)
+   || offset_12bit_unsigned_scaled_p (mode, offset)));
 
  /* A 7bit offset check because OImode will emit a ldp/stp
 instruction (only big endian will get here).
@@ -4270,18 +4271,19 @@ aarch64_legitimate_address_p (machine_mode mode, rtx x,
 /* Split an out-of-range address displacement into a base and offset.
Use 4KB range for 1- and 2-byte accesses and a 16KB range otherwise
to increase opportunities for sharing the base address of different sizes.
-   For TI/TFmode and unaligned accesses use a 256-byte range.  */
+   For unaligned accesses and TI/TF mode use the signed 9-bit range.  */
 static bool
 aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode 
mode)
 {
-  HOST_WIDE_INT mask = GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3fff;
+  HOST_WIDE_INT offset = INTVAL (*disp);
+  HOST_WIDE_INT base = offset & ~(GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3ffc);
 
-  if (mode == TImode || mode == TFmode ||
-  (INTVAL (*disp) & (GET_MODE_SIZE (mode) - 1)) != 0)
-mask = 0xff;
+  if (mode == TImode || mode == TFmode
+  || (offset & (GET_MODE_SIZE (mode) - 1)) != 0)
+base = (offset + 0x100) & ~0x1ff;
 
-  *off = GEN_INT (INTVAL (*disp) & ~mask);
-  *disp = GEN_INT (INTVAL (*disp) & mask);
+  *off = GEN_INT (base);
+  *disp = GEN_INT (offset - base);
   return true;
 }
 
@@ -5148,12 +5150,10 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
  x = gen_rtx_PLUS (Pmode, base, offset_rtx);
}
 
-  /* Does it look like we'll need a load/store-pair operation?  */
+  /* Does it look like we'll need a 16-byte load/store-pair operation?  */
   HOST_WIDE_INT base_offset;
-  if (GET_MODE_SIZE (mode) > 16
- || mode == TImode)
-   base_offset = ((offset + 64 * GET_MODE_SIZE (mode))
-  & ~((128 * GET_MODE_SIZE (mode)) - 1));
+  if (GET_MODE_SIZE (mode) > 16)
+   base_offset = (offset + 0x400) & ~0x7f0;
   /* For offsets aren't a multiple of the access size, the limit is
 -256...255.  */
   else if (offset & (GET_MODE_SIZE (mode) - 1))
@@ -5167,6 +5167,8 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
   /* Small negative offsets are supported.  */
   else if (IN_RANGE (offset, -256, 0))
base_offset = 0;
+  else if (mode == TImode || mode == TFmode)
+   base_offset = (offset + 0x100) & ~0x1ff;
   /* Use 12-bit offset by access size.  */
   else
base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
24b7288976dd0452f41475e40f02750fc56a2a20..62eda569f9b642ac569a61718d7debf7eae1b59e
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1094,9 +1094,9 @@
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-"nonimmediate_operand"  "=r, *w,r ,*w,r  ,Ump,Ump,*w,m")
+"nonimmediate_operand"  "=r, *w,r ,*w,r,m,m,*w,m")
(match_operand:TI 1
-"aarch64_movti_operand" " rn,r ,*w,*w,Ump,r  ,Z  , m,*w"))]
+"aarch64_movti_operand" " rn,r ,*w,*w,m,r,Z, m,*w"))]
   "(register_operand (operands[0], TImode)
 || aarch64_reg_or_zero (operands[1], TImode))"
   "@
@@ -1211,9 +1211,9 @@
 
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
-"nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump,Ump")
+"nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r,m ,m")
(match_operand:TF 1
-"general_operand"  " w,?r, ?r,w ,Y,Y ,m,w,Ump,?r ,Y"))]
+"general_operand"  " w,?r, ?r,w ,Y,Y ,m,w,m ,?r,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], TFmode)
 || aarch64_reg_or_fp_zero (operands[1], TFmode))"
   "@
 

[Patch, Fortran, F03] PR 77501: ICE in gfc_match_generic, at fortran/decl.c:9429

2016-11-11 Thread Janus Weil
Hi all,

here is another rather simple patch for an ice-on-invalid problem. The
patch consists of three hunks, out of which the last two actually fix
two ICEs that occurred on the different test cases: The second one
removes an assert which seems unnecessary to me (since there are
anyway other cases where tb becomes NULL), and the third checks
whether the symtree exists already before creating a new one.

The first hunk is just cosmetics / code simplification, using
gfc_get_tbp_symtree instead of the (equivalent) combination of
gfc_find_symtree and gfc_new_symtree.

[Btw, speaking of gfc_get_tbp_symtree: Can anyone tell me by chance
why it is necessary to nullify 'result->n.tb' on a newly-created
symtree?]

The patch regtests cleanly on x86_64-linux-gnu. Ok for trunk?

Cheers,
Janus



2016-11-11  Janus Weil  

PR fortran/77501
* decl.c (gfc_match_decl_type_spec): Use gfc_get_tbp_symtree,
fix indentation.
(gfc_match_generic): Remove an unnecessary assert.
Use gfc_get_tbp_symtree to avoid ICE.

2016-11-11  Janus Weil  

PR fortran/77501
* gfortran.dg/typebound_generic_16.f90: New test.
Index: gcc/fortran/decl.c
===================================================================
--- gcc/fortran/decl.c  (Revision 242066)
+++ gcc/fortran/decl.c  (Arbeitskopie)
@@ -3198,13 +3198,11 @@ gfc_match_decl_type_spec (gfc_typespec *ts, int im
  upe->attr.zero_comp = 1;
  if (!gfc_add_flavor (&upe->attr, FL_DERIVED, NULL,
   &gfc_current_locus))
- return MATCH_ERROR;
-   }
+ return MATCH_ERROR;
+   }
  else
{
- st = gfc_find_symtree (gfc_current_ns->sym_root, "STAR");
- if (st == NULL)
-   st = gfc_new_symtree (&gfc_current_ns->sym_root, "STAR");
+ st = gfc_get_tbp_symtree (&gfc_current_ns->sym_root, "STAR");
  st->n.sym = upe;
  upe->refs++;
}
@@ -9731,14 +9729,7 @@ gfc_match_generic (void)
gfc_symtree* st;
 
st = gfc_find_symtree (is_op ? ns->tb_uop_root : ns->tb_sym_root, name);
-   if (st)
- {
-   tb = st->n.tb;
-   gcc_assert (tb);
- }
-   else
- tb = NULL;
-
+   tb = st ? st->n.tb : NULL;
break;
   }
 
@@ -9783,10 +9774,8 @@ gfc_match_generic (void)
case INTERFACE_USER_OP:
  {
const bool is_op = (op_type == INTERFACE_USER_OP);
-   gfc_symtree* st;
-
-   st = gfc_new_symtree (is_op ? &ns->tb_uop_root : &ns->tb_sym_root,
- name);
+   gfc_symtree* st = gfc_get_tbp_symtree (is_op ? &ns->tb_uop_root :
+  &ns->tb_sym_root, name);
gcc_assert (st);
st->n.tb = tb;
 
! { dg-do compile }
!
! PR 77501: [F03] ICE in gfc_match_generic, at fortran/decl.c:9429
!
! Contributed by Gerhard Steinmetz 

module m1
  type t
  contains
generic :: f => g  ! { dg-error "must target a specific binding" }
generic :: g => h  ! { dg-error "Undefined specific binding" }
  end type
end

module m2
  type t
  contains
generic :: f => g  ! { dg-error "must target a specific binding" }
generic :: g => f  ! { dg-error "Undefined specific binding" }
  end type
end


[PATCH] docs: constify argument to __builtin_object_size()

2016-11-11 Thread Jakub Kicinski
It's OK to pass const pointers to __builtin_object_size(),
correct the documentation.

Signed-off-by: Jakub Kicinski 
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0669f7999beb..4378ab84b5d8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10110,7 +10110,7 @@ follow pointer assignments through non-trivial control 
flow they rely
 on various optimization passes enabled with @option{-O2}.  However, to
 a limited extent, they can be used without optimization as well.
 
-@deftypefn {Built-in Function} {size_t} __builtin_object_size (void * 
@var{ptr}, int @var{type})
+@deftypefn {Built-in Function} {size_t} __builtin_object_size (const void * 
@var{ptr}, int @var{type})
 is a built-in construct that returns a constant number of bytes from
 @var{ptr} to the end of the object @var{ptr} pointer points to
 (if known at compile time).  @code{__builtin_object_size} never evaluates
-- 
1.9.1



[committed] Add testcase from PR c++/72774

2016-11-11 Thread Jakub Jelinek
Hi!

This PR has been fixed by r240148.  I've checked in the testcase as obvious,
so we can close it.

2016-11-11  Jakub Jelinek  

PR c++/72774
* g++.dg/parse/pr72774.C: New test.

--- gcc/testsuite/g++.dg/parse/pr72774.C.jj 2016-11-11 14:37:39.073659504 
+0100
+++ gcc/testsuite/g++.dg/parse/pr72774.C2016-11-11 14:00:42.0 
+0100
@@ -0,0 +1,10 @@
+// PR c++/72774
+// { dg-do compile }
+
+void baz ();
+namespace A { void foo (); }
+void bar ()
+{
+  using A::foo;
+  0 ? static_cast (0) : baz;  // { dg-error "does not name a type" }
+}

Jakub


Re: [PATCH TEST]Only drop xfail for gcc.dg/vect/vect-cond-2.c on targets supporting vect_max_reduc

2016-11-11 Thread Richard Biener
On Fri, Nov 11, 2016 at 10:36 AM, Bin Cheng  wrote:
> Hi,
> Test gcc.dg/vect/vect-cond-2.c still requires vect_max_reduc to be 
> vectorized, this patch adds the requirement.

Ok.

Richard.

> Thanks,
> bin
>
> gcc/testsuite/ChangeLog
> 2016-11-09  Bin Cheng  
>
> * gcc.dg/vect/vect-cond-2.c: Only drop xfail for targets supporting
> vect_max_reduc.


[PATCH] Fix PR sanitizer/78270 (part 2)

2016-11-11 Thread Martin Liška
Hello.

Due to a stupid mistake I made, the following patch is needed so that the
test-case properly saves the previous gimplify_ctxp->live_switch_vars.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
I was able to run asan bootstrap on x86_64-linux-gnu and kernel build with
allyesconfig works fine.

Ready to be installed?
Martin
>From 53dd3c035283863a25a24feb90bf359295999bca Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Nov 2016 11:21:29 +0100
Subject: [PATCH] Fix PR sanitizer/78270 (part 2)

gcc/ChangeLog:

2016-11-11  Martin Liska  

	PR sanitizer/78270
	* gimplify.c (gimplify_switch_expr): Always save previous
	gimplify_ctxp->live_switch_vars.

gcc/testsuite/ChangeLog:

2016-11-11  Martin Liska  

	PR sanitizer/78270
	* gcc.dg/asan/pr78270-2.c: New test.
---
 gcc/gimplify.c|  8 
 gcc/testsuite/gcc.dg/asan/pr78270-2.c | 17 +
 2 files changed, 21 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr78270-2.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 16573dd..c23888b 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2255,11 +2255,11 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
   gimplify_ctxp->case_labels.create (8);
 
   /* Do not create live_switch_vars if SWITCH_BODY is not a BIND_EXPR.  */
+  saved_live_switch_vars = gimplify_ctxp->live_switch_vars;
   if (TREE_CODE (SWITCH_BODY (switch_expr)) == BIND_EXPR)
-	{
-	  saved_live_switch_vars = gimplify_ctxp->live_switch_vars;
-	  gimplify_ctxp->live_switch_vars = new hash_set (4);
-	}
+	gimplify_ctxp->live_switch_vars = new hash_set (4);
+  else
+	gimplify_ctxp->live_switch_vars = NULL;
 
   bool old_in_switch_expr = gimplify_ctxp->in_switch_expr;
   gimplify_ctxp->in_switch_expr = true;
diff --git a/gcc/testsuite/gcc.dg/asan/pr78270-2.c b/gcc/testsuite/gcc.dg/asan/pr78270-2.c
new file mode 100644
index 000..d1f5d26
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr78270-2.c
@@ -0,0 +1,17 @@
+// { dg-do compile }
+// { dg-additional-options "-Wno-switch-unreachable" }
+
+int a;
+void
+fn1 ()
+{
+  switch (a)
+{
+  char b;
+case 8:
+  &b;
+  switch (0)
+	;
+}
+}
+
-- 
2.10.1



[PATCH] Dump probability for edges a frequency for BBs

2016-11-11 Thread Martin Liška
Hello.

I spent quite some time during this stage1 playing with predictors, and
together with Honza we found multiple situations where a prediction was
calculated oddly.  Thus, we're suggesting enhancing the default dump format
to show BB frequencies and edge probabilities, as follows:

main (int a)
{
  int _1;

   [100.0%]:
  if (a_2(D) == 123)
goto  (); [18.8%]
  else
goto  (sparta); [81.2%]

sparta [81.2%]:
  switch (a_2(D))  [33.3%], case 1 ... 2:  [66.7%]>

 [27.1%]:

  # _1 = PHI <2(2), 3(4), a_2(D)(3)>
 [100.0%]:
  return _1;

}

That would exhibit these numbers to people, which would eventually report
strange numbers seen in dump files.
I was quite surprised that the patch does not break many scanning tests.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Thoughts?
Martin
>From 5b7d8393564a0111698b58989ac74b45cf019701 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 14:11:48 +0100
Subject: [PATCH] Dump probability for edges a frequency for BBs

gcc/ChangeLog:

2016-11-11  Martin Liska  

	* gimple-pretty-print.c (dump_edge_probability): New function.
	(dump_gimple_switch): Dump label edge probabilities.
	(dump_gimple_cond): Likewise.
	(dump_gimple_label): Dump
	(dump_gimple_bb_header): Dump basic block frequency.
	(pp_cfg_jump): Replace e->dest argument with e.
	(dump_implicit_edges): Likewise.
	* tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at):
	Use gimple_bb (at) instead of at->bb.

gcc/testsuite/ChangeLog:

2016-11-11  Martin Liska  

	* gcc.dg/builtin-unreachable-6.c: Update test to not to scan
	parts for frequencies/probabilities.
	* gcc.dg/pr34027-1.c: Likewise.
	* gcc.dg/strict-overflow-2.c: Likewise.
	* gcc.dg/tree-ssa/20040703-1.c: Likewise.
	* gcc.dg/tree-ssa/builtin-sprintf-2.c: Likewise.
	* gcc.dg/tree-ssa/pr32044.c: Likewise.
	* gcc.dg/tree-ssa/vector-3.c: Likewise.
	* gcc.dg/tree-ssa/vrp101.c: Likewise.
	* gcc.dg/tree-ssa/dump-2.c: New test.
---
 gcc/gimple-pretty-print.c | 67 ---
 gcc/testsuite/gcc.dg/builtin-unreachable-6.c  |  2 +-
 gcc/testsuite/gcc.dg/pr34027-1.c  |  4 +-
 gcc/testsuite/gcc.dg/strict-overflow-2.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/20040703-1.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/dump-2.c|  9 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr32044.c   |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/vector-3.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp101.c|  2 +-
 gcc/tree-ssa-loop-ivopts.c|  2 +-
 11 files changed, 80 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/dump-2.c

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index f588f5e..b8239c3 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"	/* for dump_flags */
 #include "value-prof.h"
 #include "trans-mem.h"
+#include "cfganal.h"
 
 #define INDENT(SPACE)			\
   do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
@@ -71,6 +72,14 @@ debug_gimple_stmt (gimple *gs)
   print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS);
 }
 
+/* Dump E probability to BUFFER.  */
+
+static void
+dump_edge_probability (pretty_printer *buffer, edge e)
+{
+  pp_scalar (buffer, " [%.1f%%]",
+	 e->probability * 100.0 / REG_BR_PROB_BASE);
+}
 
 /* Print GIMPLE statement G to FILE using SPC indentation spaces and
FLAGS as in pp_gimple_stmt_1.  */
@@ -902,7 +911,20 @@ dump_gimple_switch (pretty_printer *buffer, gswitch *gs, int spc,
   gcc_checking_assert (case_label != NULL_TREE);
   dump_generic_node (buffer, case_label, spc, flags, false);
   pp_space (buffer);
-  dump_generic_node (buffer, CASE_LABEL (case_label), spc, flags, false);
+  tree label = CASE_LABEL (case_label);
+  dump_generic_node (buffer, label, spc, flags, false);
+
+  if (cfun && cfun->cfg)
+	{
+	  basic_block dest = label_to_block (label);
+	  if (dest)
+	{
+	  edge label_edge = find_edge (gimple_bb (gs), dest);
+	  if (label_edge)
+		dump_edge_probability (buffer, label_edge);
+	}
+	}
+
   if (i < gimple_switch_num_labels (gs) - 1)
 pp_string (buffer, ", ");
 }
@@ -932,6 +954,23 @@ dump_gimple_cond (pretty_printer *buffer, gcond *gs, int spc, int flags)
   dump_generic_node (buffer, gimple_cond_rhs (gs), spc, flags, false);
   if (!(flags & TDF_RHS_ONLY))
 	{
+	  edge_iterator ei;
+	  edge e, true_edge = NULL, false_edge = NULL;
+	  basic_block bb = gimple_bb (gs);
+
+	  if (bb)
+	{
+	  FOR_EACH_EDGE (e, ei, bb->succs)
+		{
+		  if (e->flags & EDGE_TRUE_VALUE)
+		true_edge = e;
+		  else if (e->flags & EDGE_FALSE_VALUE)
+		false_edge = e;
+		}
+	}
+
+	  bool has_edge_info = true_edge != NULL && false_edge != NULL;
+
 	  pp_right_paren (buffer);
 
 	  if (gimple_

Re: Go patch committed: copy signal code from Go 1.7 runtime

2016-11-11 Thread Rainer Orth
Hi Ian,

> This patch to the Go frontend and libgo copies the signal code from
> the Go 1.7 runtime.
>
> This adds a little shell script to auto-generate runtime.sigtable from
> the known signal names.
>
> This forces the main package to always import the runtime package.
> Otherwise some runtime package global variables may never be
> initialized.
>
> This sets the syscallsp and syscallpc fields of g when entering a
> syscall, so that the runtime package knows when a g is executing a
> syscall.
>
> This fixes runtime.funcPC to avoid dead store elimination of the
> interface value when the function is inlined.
>
> The signal code in C now has some target-specific code to return the
> PC where the signal occurred and to dump the registers on a hard
> crash.  This is what the gc toolchain does as well.  I wrote versions
> of that code for x86 GNU/Linux.  Other targets will fall back
> reasonably and display less information.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
> Bootstrapped and ran relevant tests on sparc-sun-solaris.  Committed
> to mainline.

this has caused a number of testsuite failures on Solaris 10/x86 only:
Solaris 11 and 12/x86 are fine, still waiting for Solaris 10/SPARC
results:

* 32-bit Solaris 10/x86:

+FAIL: os

+FAIL: net/http
+FAIL: os/exec

* 64-bit Solaris 10/x86:

+FAIL: os

mkdir: Failed to make directory "_obj"; File exists
mkdir: Failed to make directory "_test"; File exists
--- FAIL: TestStatStdin (0.16s)
os_test.go:1643: Failed to spawn child process: exit status 139 "Segmentation Fault\n"
/vol/gcc/src/hg/trunk/local/libgo/testsuite/gotest[638]: 20371 Segmentation Fault
/vol/gcc/src/hg/trunk/local/libgo/testsuite/gotest[643]: 20373 Terminated
FAIL: os

+FAIL: log/syslog
+FAIL: net/http
+FAIL: os/exec
+FAIL: os/signal

All new failures are SEGVs, it seems, e.g.

LD_LIBRARY_PATH=../../.libs:../../../libgcc ./a.out -test.v=true
[...]
--- SKIP: TestMkdirAllAtSlash (0.00s)
path_test.go:223: could not create /_go_os_test/dir: mkdir 
/_go_os_test: permission denied
=== RUN   TestEPIPE
Segmentation Fault

Thread 13 received signal SIGSEGV, Segmentation fault.
runtime.getSiginfo (info=0x0, context=0xdfc03dfc)
at /vol/gcc/src/hg/trunk/local/libgo/runtime/go-signal.c:190
190 ret.sigaddr = (uintptr)(info->si_addr);

  info is NULL here!

(gdb) where
#0  runtime.getSiginfo (info=0x0, context=0xdfc03dfc)
at /vol/gcc/src/hg/trunk/local/libgo/runtime/go-signal.c:190
#1  0xfe7e9c8b in runtime.sighandler (gp=0xde570a00, ctxt=0xdfc03dfc, 
info=0x0, sig=13)
at /vol/gcc/src/hg/trunk/local/libgo/go/runtime/signal_sighandler.go:37
#2  runtime.sigtrampgo (sig=13, info=0x0, ctx=0xdfc03dfc)
at /vol/gcc/src/hg/trunk/local/libgo/go/runtime/signal_sigtramp.go:37
#3  0xfe457f90 in runtime.sigtramp (sig=13, info=0x0, context=0xdfc03dfc)
at /vol/gcc/src/hg/trunk/local/libgo/runtime/go-signal.c:56
#4  0xfdd593df in __sighndlr () from /lib/libc.so.1
#5  0xfdd4f0a7 in call_user_handler () from /lib/libc.so.1
#6  
#7  0xfdd5cd85 in _write () from /lib/libc.so.1
#8  0xfdd4cc60 in write () from /lib/libc.so.1
#9  0xfe80ebf5 in syscall.write (param=..., fd=4) at libcalls.go:1572
#10 syscall.Write (fd=4, param=...)
at /vol/gcc/src/hg/trunk/local/libgo/go/syscall/syscall_unix.go:232
#11 0x08067880 in os.write.pN7_os.File (f=0xdfbaec28, b=...)
at file_unix.go:260
#12 0x08065e2f in os.Write.pN7_os.File (f=0xdfbaec28, b=...) at file.go:142
#13 0x0807f0bf in os_test.TestEPIPE (t=0xddb1cc00) at pipe_test.go:31
#14 0xfe819c0c in testing.tRunner (param=, 
fn=0x808e6f0 )
libgo/go/testing/testing.go:609
#15 0xfe819c83 in testing.$thunk22 (__go_thunk_parameter=0xdfbaebf8)
at /vol/gcc/src/hg/trunk/local/libgo/go/testing/testing.go:645
#16 0xfe463e27 in kickoff ()
at /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:257
#17 0xfdcd5b92 in makecontext () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The siginfo_t * argument being NULL suggests SA_SIGINFO not being set,
although go/runtime/signal_gccgo.go (setsig) does set it AFAICS.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [Patch, Fortran, F03] PR 77501: ICE in gfc_match_generic, at fortran/decl.c:9429

2016-11-11 Thread Janus Weil
2016-11-11 14:38 GMT+01:00 Janus Weil :
> [Btw, speaking of gfc_get_tbp_symtree: Can anyone tell me by chance
> why it is necessary to nullify 'result->n.tb' on a newly-created
> symtree?]

Removing the corresponding line does not do any harm to the testsuite,
as I just verified:

Index: gcc/fortran/class.c
===================================================================
--- gcc/fortran/class.c(Revision 242066)
+++ gcc/fortran/class.c(Arbeitskopie)
@@ -2970,7 +2970,6 @@ gfc_get_tbp_symtree (gfc_symtree **root, const cha
 {
   result = gfc_new_symtree (root, name);
   gcc_assert (result);
-  result->n.tb = NULL;
 }

   return result;


After all, n.tb should anyway be NULL in a symtree we just created,
right? So is it ok to remove that?

Cheers,
Janus


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-11 Thread Yuri Rumyantsev
Richard,

Sorry for the confusion, but my updated patch does not work properly, so I
need to fix it.

Yuri.

2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev :
> Richard,
>
> I prepare updated 3 patch with passing additional argument to
> vect_analyze_loop as you proposed (untested).
>
> You wrote:
> Btw, I wonder if you can produce a single patch containing just
> epilogue vectorization, that is combine patches 1-3 but rip out
> changes only needed by later patches?
>
> Did you mean that I exclude all support for vectorization epilogues,
> i.e. exclude from 2-nd patch all non-related changes
> like
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 11863af..32011c1 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> +  LOOP_VINFO_NEED_MASKING (res) = false;
> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>
> Did you mean also that new combined patch must be working patch, i.e.
> can be integrated without other patches?
>
> Could you please look at updated patch?
>
> Thanks.
> Yuri.
>
> 2016-11-10 15:36 GMT+03:00 Richard Biener :
>> On Thu, 10 Nov 2016, Richard Biener wrote:
>>
>>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>
>>> > Richard,
>>> >
>>> > Here is updated 3 patch.
>>> >
>>> > I checked that all new tests related to epilogue vectorization passed 
>>> > with it.
>>> >
>>> > Your comments will be appreciated.
>>>
>>> A lot better now.  Instead of the ->aux dance I now prefer to
>>> pass the original loops loop_vinfo to vect_analyze_loop as
>>> optional argument (if non-NULL we analyze the epilogue of that
>>> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>> original vectorization factor?  So we can pass down an (optional)
>>> forced vectorization factor as well?
>>
>> Btw, I wonder if you can produce a single patch containing just
>> epilogue vectorization, that is combine patches 1-3 but rip out
>> changes only needed by later patches?
>>
>> Thanks,
>> Richard.
>>
>>> Richard.
>>>
>>> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
>>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>> > >
>>> > >> Hi Richard,
>>> > >>
>>> > >> I did not understand your last remark:
>>> > >>
>>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>> > >> >
>>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>> > >> >   && dump_enabled_p ())
>>> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>>> > >> >"loop vectorized\n");
>>> > >> > -   vect_transform_loop (loop_vinfo);
>>> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
>>> > >> > num_vectorized_loops++;
>>> > >> >/* Now that the loop has been vectorized, allow it to be 
>>> > >> > unrolled
>>> > >> >   etc.  */
>>> > >> >  loop->force_vectorize = false;
>>> > >> >
>>> > >> > +   /* Add new loop to a processing queue.  To make it easier
>>> > >> > +  to match loop and its epilogue vectorization in dumps
>>> > >> > +  put new loop as the next loop to process.  */
>>> > >> > +   if (new_loop)
>>> > >> > + {
>>> > >> > +   loops.safe_insert (i + 1, new_loop->num);
>>> > >> > +   vect_loops_num = number_of_loops (cfun);
>>> > >> > + }
>>> > >> >
>>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>>> > >> > function which will set up stuff properly (and also perform
>>> > >> > the if-conversion of the epilogue there).
>>> > >> >
>>> > >> > That said, if we can get in non-masked epilogue vectorization
>>> > >> > separately that would be great.
>>> > >>
>>> > >> Could you please clarify your proposal.
>>> > >
>>> > > When a loop was vectorized set things up to immediately vectorize
>>> > > its epilogue, avoiding changing the loop iteration and avoiding
>>> > > the re-use of ->aux.
>>> > >
>>> > > Richard.
>>> > >
>>> > >> Thanks.
>>> > >> Yuri.
>>> > >>
>>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
>>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>> > >> >
>>> > >> >> Hi All,
>>> > >> >>
>>> > >> >> I re-send all patches sent by Ilya earlier for review which support
>>> > >> >> vectorization of loop epilogues and loops with low trip count. We
>>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was 
>>> > >> >> not
>>> > >> >> approved by Jeff.
>>> > >> >>
>>> > >> >> I did re-base of all patches and performed bootstrapping and
>>> > >> >> regression testing that did not show any new failures. Also all
>>> > >> >> changes related to new vect_do_peeling algorithm have been changed
>>> > >> >> accordingly.
>>> > >> >>
>>> > >

Re: [PATCH] shrink-wrap: New spread_components

2016-11-11 Thread Kyrill Tkachov


On 09/11/16 21:46, Segher Boessenkool wrote:

This patch changes spread_components to use a simpler algorithm that
puts prologue components as early as possible, and epilogue components
as late as possible.  This allows better scheduling, and also saves a
bit of code size.  The blocks that run with some specific component
enabled after this patch is a strict superset of those that had it
before the patch.

It does this by finding for every component the basic blocks where that
component is not needed on some path from the entry block (it reuses
head_components to store this), and similarly the blocks where the
component is not needed on some path to the exit block (or the exit
cannot be reached from that block) (stored in tail_components).  Blocks
that then are in neither of those two sets get the component active.

Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?


This also passes bootstrap and regtest on aarch64-none-linux-gnu with
the hooks implemented [1]. It doesn't fix the gobmk performance regression that
I reported in that patch, but I'm working on improving that patch in other
ways so there's still benchmarking and analysis to do.

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00945.html




Segher


2016-11-09  Segher Boessenkool  

* shrink-wrap.c (init_separate_shrink_wrap): Do not clear
head_components and tail_components.
(spread_components): New algorithm.
(emit_common_tails_for_components): Clear head_components and
tail_components.
(insert_prologue_epilogue_for_components): Write extra output to the
dump file for sibcalls and abnormal exits.

---
  gcc/shrink-wrap.c | 181 +++---
  1 file changed, 146 insertions(+), 35 deletions(-)

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 4395d8a..e480d4d 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -1131,8 +1131,6 @@ init_separate_shrink_wrap (sbitmap components)
SW (bb)->head_components = sbitmap_alloc (SBITMAP_SIZE (components));
SW (bb)->tail_components = sbitmap_alloc (SBITMAP_SIZE (components));
bitmap_clear (SW (bb)->has_components);
-  bitmap_clear (SW (bb)->head_components);
-  bitmap_clear (SW (bb)->tail_components);
  }
  }
  
@@ -1253,48 +1251,151 @@ place_prologue_for_one_component (unsigned int which, basic_block head)

  }
  }
  
-/* Mark HAS_COMPONENTS for every block dominated by at least one block with
-   HAS_COMPONENTS set for the respective components, starting at HEAD.  */
+/* Set HAS_COMPONENTS in every block to the maximum it can be set to without
+   setting it on any path from entry to exit where it was not already set
+   somewhere (or, for blocks that have no path to the exit, consider only
+   paths from the entry to the block itself).  */
  static void
-spread_components (basic_block head)
+spread_components (sbitmap components)
  {
-  basic_block bb = head;
-  bool first_visit = true;
-  /* This keeps a tally of all components active.  */
-  sbitmap components = SW (head)->has_components;
+  basic_block entry_block = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (cfun);
  
-  for (;;)

+  /* A stack of all blocks left to consider, and a bitmap of all blocks
+ on that stack.  */
+  vec todo;
+  todo.create (n_basic_blocks_for_fn (cfun));
+  bitmap seen = BITMAP_ALLOC (NULL);
+
+  sbitmap old = sbitmap_alloc (SBITMAP_SIZE (components));
+
+  /* Find for every block the components that are *not* needed on some path
+ from the entry to that block.  Do this with a flood fill from the entry
+ block.  Every block can be visited at most as often as the number of
+ components (plus one), and usually much less often.  */
+
+  if (dump_file)
+fprintf (dump_file, "Spreading down...\n");
+
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+bitmap_clear (SW (bb)->head_components);
+
+  bitmap_copy (SW (entry_block)->head_components, components);
+
+  edge e;
+  edge_iterator ei;
+
+  todo.quick_push (single_succ (entry_block));
+  bitmap_set_bit (seen, single_succ (entry_block)->index);
+  while (!todo.is_empty ())
  {
-  if (first_visit)
-   {
- bitmap_ior (SW (bb)->has_components, SW (bb)->has_components,
- components);
+  bb = todo.pop ();
  
-	  if (first_dom_son (CDI_DOMINATORS, bb))

-   {
- components = SW (bb)->has_components;
- bb = first_dom_son (CDI_DOMINATORS, bb);
- continue;
-   }
-   }
+  bitmap_copy (old, SW (bb)->head_components);
  
-  components = SW (bb)->has_components;

+  FOR_EACH_EDGE (e, ei, bb->preds)
+   bitmap_ior (SW (bb)->head_components, SW (bb)->head_components,
+   SW (e->src)->head_components);
  
-  if (next_dom_son (CDI_DOMINATORS, bb))

+  bitmap_and_compl (SW (bb)->head_components, SW (bb)->head_comp

[PING] [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

2016-11-11 Thread Bernd Edlinger
Ping...

the latest version of the patch was here:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00505.html

Thanks
Bernd.

On 11/02/16 22:15, Bernd Edlinger wrote:
> On 11/02/16 18:51, Jason Merrill wrote:
>> On 11/02/2016 02:11 AM, Bernd Edlinger wrote:
>>> On 11/01/16 19:15, Bernd Edlinger wrote:
 On 11/01/16 18:11, Jason Merrill wrote:
> On Tue, Nov 1, 2016 at 11:45 AM, Bernd Edlinger
>  wrote:
>> On 11/01/16 16:20, Jason Merrill wrote:
>>> On 10/17/2016 03:18 PM, Bernd Edlinger wrote:
>>> I'm not even sure we need a new warning.  Can we combine this
>>> warning
>>> with the block that currently follows?
>>
>> After 20 years of not having a warning on that,
>> an implicitly enabled warning would at least break lots of bogus
>> test cases.
>
> Would it, though?  Which test cases still break with the current
> patch?

 Less than before, but there are still at least a few of them.

 I can make a list and send it tomorrow.
>>>
>>> FAIL: g++.dg/cpp1y/lambda-generic-udt.C  -std=gnu++14 (test for excess
>>> errors)
>>> FAIL: g++.dg/cpp1y/lambda-generic-xudt.C  -std=gnu++14 (test for excess
>>> errors)
>>> FAIL: g++.dg/init/new15.C  -std=c++11 (test for excess errors)
>>> FAIL: g++.dg/init/new15.C  -std=c++14 (test for excess errors)
>>> FAIL: g++.dg/init/new15.C  -std=c++98 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.dg/tc1/dr20.C  -std=c++11 (test for excess errors)
>>> FAIL: g++.dg/tc1/dr20.C  -std=c++14 (test for excess errors)
>>> FAIL: g++.dg/tc1/dr20.C  -std=c++98 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
>>> -flto-partition=1to1 -fno-use-linker-plugin
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
>>> -flto-partition=none -fuse-linker-plugin
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
>>> -fuse-linker-plugin -fno-fat-lto-objects
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
>>> -flto-partition=1to1 -fno-use-linker-plugin
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
>>> -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
>>> -fuse-linker-plugin
>>> FAIL: g++.dg/lto/pr68811 cp_lto_pr68811_0.o assemble, -O2
>>> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++98 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++98 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/builtins10.C  -std=c++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/builtins10.C  -std=c++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++98 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++98 (test for excess
>>> errors)
>>>
>>>
>>> The lto test case does emit the warning when assembling, but
>>> it still produces an executable and even executes it.
>>>
>>> Also g++.dg/cpp1y/lambda-generic-udt.C, g++.dg/tc1/dr20.C
>>> and g++.old-deja/g++.other/vbase5.C are execution tests.
>>>
>>> So I was wrong to assume these were all compile-only tests.
>>>
>>> I think that list should be fixable, if we decide to enable
>>> the warning by default.
>>
>> Yes, either by fixing the prototypes or disabling the warning.
>>
>
> Yes, I am inclined to enable the warning by default now.
>
> Most of the tes

[PATCH] Introduce -fdump-ipa-clones dump output

2016-11-11 Thread Martin Liška
Hello.

The motivation for the patch is to dump the IPA clones that were created
by all inter-procedural optimizations. Such output can be used to track
the set of functions in which code from another function can eventually occur.
Usage of the dump file can be seen here: [1].

The patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

[1] https://github.com/marxin/kgraft-analysis-tool
From 700b9833771a5b646d3db44014af81c007dd48f4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 14:23:30 +0100
Subject: [PATCH] Introduce -fdump-ipa-clones dump output

gcc/ChangeLog:

2016-11-11  Martin Liska  

	* cgraph.c (symbol_table::initialize): Initialize
	ipa_clones_dump_file.
	(cgraph_node::remove): Report to ipa_clones_dump_file.
	* cgraph.h: Add new argument (suffix) to cloning methods.
	* cgraphclones.c (dump_callgraph_transformation): New function.
	(cgraph_node::create_clone): New argument.
	(cgraph_node::create_virtual_clone): Likewise.
	(cgraph_node::create_version_clone): Likewise.
	* dumpfile.c: Add .ipa-clones dump file.
	* dumpfile.h (enum tree_dump_index): Add TDI_clones.
	* ipa-inline-transform.c (clone_inlined_nodes): Report operation
	to dump_callgraph_transformation.
---
 gcc/cgraph.c   |  9 +
 gcc/cgraph.h   | 20 
 gcc/cgraphclones.c | 44 ++--
 gcc/dumpfile.c |  2 ++
 gcc/dumpfile.h |  1 +
 gcc/ipa-inline-transform.c |  3 +++
 6 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index b702a7c..867e371 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -263,6 +263,9 @@ symbol_table::initialize (void)
 {
   if (!dump_file)
 dump_file = dump_begin (TDI_cgraph, NULL);
+
+  if (!ipa_clones_dump_file)
+ipa_clones_dump_file = dump_begin (TDI_clones, NULL);
 }
 
 /* Allocate new callgraph node and insert it into basic data structures.  */
@@ -1815,6 +1818,12 @@ cgraph_node::remove (void)
   cgraph_node *n;
   int uid = this->uid;
 
+  if (symtab->ipa_clones_dump_file && symtab->cloned_nodes.contains (this))
+fprintf (symtab->ipa_clones_dump_file,
+	 "Callgraph removal;%s;%d;%s;%d;%d\n", asm_name (), order,
+	 DECL_SOURCE_FILE (decl), DECL_SOURCE_LINE (decl),
+	 DECL_SOURCE_COLUMN (decl));
+
   symtab->call_cgraph_removal_hooks (this);
   remove_callers ();
   remove_callees ();
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index cc730d2..2d59291 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -906,13 +906,14 @@ public:
  If the new node is being inlined into another one, NEW_INLINED_TO should be
  the outline function the new one is (even indirectly) inlined to.
  All hooks will see this in node's global.inlined_to, when invoked.
- Can be NULL if the node is not inlined.  */
+ Can be NULL if the node is not inlined.  SUFFIX is a string that is appended
+ to the original name.  */
   cgraph_node *create_clone (tree decl, gcov_type count, int freq,
 			 bool update_original,
 			 vec redirect_callers,
 			 bool call_duplication_hook,
 			 cgraph_node *new_inlined_to,
-			 bitmap args_to_skip);
+			 bitmap args_to_skip, const char *suffix = NULL);
 
   /* Create callgraph node clone with new declaration.  The actual body will
  be copied later at compilation stage.  */
@@ -933,11 +934,14 @@ public:
 
  If non-NULL BLOCK_TO_COPY determine what basic blocks
  was copied to prevent duplications of calls that are dead
- in the clone.  */
+ in the clone.
+
+ SUFFIX is a string that is appended to the original name.  */
 
   cgraph_node *create_version_clone (tree new_decl,
 vec redirect_callers,
-bitmap bbs_to_copy);
+bitmap bbs_to_copy,
+const char *suffix = NULL);
 
   /* Perform function versioning.
  Function versioning includes copying of the tree and
@@ -2223,6 +2227,10 @@ public:
   /* Return symbol used to separate symbol name from suffix.  */
   static char symbol_suffix_separator ();
 
+  FILE* GTY ((skip)) ipa_clones_dump_file;
+
+  hash_set  GTY ((skip)) cloned_nodes;
+
 private:
   /* Allocate new callgraph node.  */
   inline cgraph_node * allocate_cgraph_symbol (void);
@@ -2313,6 +2321,10 @@ tree clone_function_name (tree decl, const char *);
 void tree_function_versioning (tree, tree, vec *,
 			   bool, bitmap, bool, bitmap, basic_block);
 
+void dump_callgraph_transformation (const cgraph_node *original,
+const cgraph_node *clone,
+const char *suffix);
+
 /* In cgraphbuild.c  */
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 686c289..349892d 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -381,6 +381,28 @@ cgraph_node::expand_all_artificial_thunks ()
   e = e->next_caller;
 }
 
+void
+dump_callgraph_transformation (c

Re: [PATCH] Add AVX512 k-mask intrinsics

2016-11-11 Thread Uros Bizjak
Some quick remarks:

+(define_insn "kmovb"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
+ (unspec:QI
+  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
+  "@
+   kmovb\t{%k1, %0|%0, %k1}
+   kmovb\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "QI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovd"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
+ (unspec:SI
+  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
+  "@
+   kmovd\t{%k1, %0|%0, %k1}
+   kmovd\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "SI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovq"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
+ (unspec:DI
+  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
+  "@
+   kmovq\t{%k1, %0|%0, %k1}
+   kmovq\t{%1, %0|%0, %1}
+   kmovq\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "DI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])

- kmovd (and existing kmovw) should be using register_operand for
operand 0. In this case, there is no need for MEM_P checks at all.
- In the insn constraint, please check TARGET_AVX before checking MEM_P.
- please put these definitions above the corresponding *mov??_internal patterns.

+//case USI_FTYPE_UQI:
+//case USI_FTYPE_UHI:

No commented-out code without a good reason, please.

Uros.


Re: [PATCH, GCC/ARM, ping] Optional -mthumb for Thumb only targets

2016-11-11 Thread Kyrill Tkachov


On 08/11/16 13:36, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 25/10/16 18:07, Thomas Preudhomme wrote:

Hi,

Currently when a user compiles for a thumb-only target (such as Cortex-M
processors) without specifying the -mthumb option GCC throws the error "target
CPU does not support ARM mode". This is suboptimal from a usability point of
view: the -mthumb option could be deduced from the -march or -mcpu option when
there is no ambiguity.

This patch implements this behavior by extending the DRIVER_SELF_SPECS to
automatically append -mthumb to the command line for thumb-only targets. It does
so by checking the last -march option if any is given or the last -mcpu option
otherwise. There is no ordering issue because conflicting -mcpu and -march
options are already handled.

Note that the logic cannot be implemented in function arm_option_override
because we need to provide the modified command line to the GCC driver for
finding the right multilib path and the function arm_option_override is executed
too late for that effect.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2016-10-18  Terry Guo  
Thomas Preud'homme 

PR target/64802
* common/config/arm/arm-common.c (arm_target_thumb_only): New function.
* config/arm/arm-opts.h: Include arm-flags.h.
(struct arm_arch_core_flag): Define.
(arm_arch_core_flags): Define.
* config/arm/arm-protos.h: Include arm-flags.h.
(FL_NONE, FL_ANY, FL_CO_PROC, FL_ARCH3M, FL_MODE26, FL_MODE32,
FL_ARCH4, FL_ARCH5, FL_THUMB, FL_LDSCHED, FL_STRONG, FL_ARCH5E,
FL_XSCALE, FL_ARCH6, FL_VFPV2, FL_WBUF, FL_ARCH6K, FL_THUMB2, FL_NOTM,
FL_THUMB_DIV, FL_VFPV3, FL_NEON, FL_ARCH7EM, FL_ARCH7, FL_ARM_DIV,
FL_ARCH8, FL_CRC32, FL_SMALLMUL, FL_NO_VOLATILE_CE, FL_IWMMXT,
FL_IWMMXT2, FL_ARCH6KZ, FL2_ARCH8_1, FL2_ARCH8_2, FL2_FP16INST,
FL_TUNE, FL_FOR_ARCH2, FL_FOR_ARCH3, FL_FOR_ARCH3M, FL_FOR_ARCH4,
FL_FOR_ARCH4T, FL_FOR_ARCH5, FL_FOR_ARCH5T, FL_FOR_ARCH5E,
FL_FOR_ARCH5TE, FL_FOR_ARCH5TEJ, FL_FOR_ARCH6, FL_FOR_ARCH6J,
FL_FOR_ARCH6K, FL_FOR_ARCH6Z, FL_FOR_ARCH6ZK, FL_FOR_ARCH6KZ,
FL_FOR_ARCH6T2, FL_FOR_ARCH6M, FL_FOR_ARCH7, FL_FOR_ARCH7A,
FL_FOR_ARCH7VE, FL_FOR_ARCH7R, FL_FOR_ARCH7M, FL_FOR_ARCH7EM,
FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A, FL2_FOR_ARCH8_2A, FL_FOR_ARCH8M_BASE,
FL_FOR_ARCH8M_MAIN, arm_feature_set, ARM_FSET_MAKE,
ARM_FSET_MAKE_CPU1, ARM_FSET_MAKE_CPU2, ARM_FSET_CPU1, ARM_FSET_CPU2,
ARM_FSET_EMPTY, ARM_FSET_ANY, ARM_FSET_HAS_CPU1, ARM_FSET_HAS_CPU2,
ARM_FSET_HAS_CPU, ARM_FSET_ADD_CPU1, ARM_FSET_ADD_CPU2,
ARM_FSET_DEL_CPU1, ARM_FSET_DEL_CPU2, ARM_FSET_UNION, ARM_FSET_INTER,
ARM_FSET_XOR, ARM_FSET_EXCLUDE, ARM_FSET_IS_EMPTY,
ARM_FSET_CPU_SUBSET): Move to ...
* config/arm/arm-flags.h: This new file.
* config/arm/arm.h (TARGET_MODE_SPEC_FUNCTIONS): Define.
(EXTRA_SPEC_FUNCTIONS): Add TARGET_MODE_SPEC_FUNCTIONS to its value.
(TARGET_MODE_SPECS): Define.
(DRIVER_SELF_SPECS): Add TARGET_MODE_SPECS to its value.


*** gcc/testsuite/ChangeLog ***

2016-10-11  Thomas Preud'homme 

PR target/64802
* gcc.target/arm/optional_thumb-1.c: New test.
* gcc.target/arm/optional_thumb-2.c: New test.
* gcc.target/arm/optional_thumb-3.c: New test.


No regression when running the testsuite for -mcpu=cortex-m0 -mthumb,
-mcpu=cortex-m0 -marm and -mcpu=cortex-a8 -marm

Is this ok for trunk?



This looks like a useful usability improvement.
This is ok after a bootstrap on an arm-none-linux-gnueabihf target.

Sorry for the delay,
Kyrill



Best regards,

Thomas




Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-11 Thread Yuri Rumyantsev
Richard,

Here is a fixed version of the updated patch 3.

Any comments will be appreciated.

Thanks.
Yuri.

2016-11-11 17:15 GMT+03:00 Yuri Rumyantsev :
> Richard,
>
> Sorry for the confusion but my updated patch does not work properly, so I
> need to fix it.
>
> Yuri.
>
> 2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev :
>> Richard,
>>
>> I prepared an updated patch 3 passing an additional argument to
>> vect_analyze_loop as you proposed (untested).
>>
>> You wrote:
>> tw, I wonder if you can produce a single patch containing just
>> epilogue vectorization, that is combine patches 1-3 but rip out
>> changes only needed by later patches?
>>
>> Did you mean that I should exclude all support for vectorization epilogues,
>> i.e. exclude from the 2nd patch all unrelated changes
>> like
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 11863af..32011c1 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>> +  LOOP_VINFO_NEED_MASKING (res) = false;
>> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>
>> Did you also mean that the new combined patch must be a working patch, i.e.
>> that it can be integrated without the other patches?
>>
>> Could you please look at updated patch?
>>
>> Thanks.
>> Yuri.
>>
>> 2016-11-10 15:36 GMT+03:00 Richard Biener :
>>> On Thu, 10 Nov 2016, Richard Biener wrote:
>>>
 On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:

 > Richard,
 >
 > Here is updated 3 patch.
 >
 > I checked that all new tests related to epilogue vectorization passed 
 > with it.
 >
 > Your comments will be appreciated.

 A lot better now.  Instead of the ->aux dance I now prefer to
 pass the original loop's loop_vinfo to vect_analyze_loop as
 optional argument (if non-NULL we analyze the epilogue of that
 loop_vinfo).  OTOH I remember we mainly use it to get at the
 original vectorization factor?  So we can pass down an (optional)
 forced vectorization factor as well?
>>>
>>> Btw, I wonder if you can produce a single patch containing just
>>> epilogue vectorization, that is combine patches 1-3 but rip out
>>> changes only needed by later patches?
>>>
>>> Thanks,
>>> Richard.
>>>
 Richard.

 > 2016-11-08 15:38 GMT+03:00 Richard Biener :
 > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
 > >
 > >> Hi Richard,
 > >>
 > >> I did not understand your last remark:
 > >>
 > >> > That is, here (and avoid the FOR_EACH_LOOP change):
 > >> >
 > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
 > >> >   && dump_enabled_p ())
 > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
 > >> >"loop vectorized\n");
 > >> > -   vect_transform_loop (loop_vinfo);
 > >> > +   new_loop = vect_transform_loop (loop_vinfo);
 > >> > num_vectorized_loops++;
 > >> >/* Now that the loop has been vectorized, allow it to be 
 > >> > unrolled
 > >> >   etc.  */
 > >> >  loop->force_vectorize = false;
 > >> >
 > >> > +   /* Add new loop to a processing queue.  To make it easier
 > >> > +  to match loop and its epilogue vectorization in dumps
 > >> > +  put new loop as the next loop to process.  */
 > >> > +   if (new_loop)
 > >> > + {
 > >> > +   loops.safe_insert (i + 1, new_loop->num);
 > >> > +   vect_loops_num = number_of_loops (cfun);
 > >> > + }
 > >> >
 > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
 > >> > function which will set up stuff properly (and also perform
 > >> > the if-conversion of the epilogue there).
 > >> >
 > >> > That said, if we can get in non-masked epilogue vectorization
 > >> > separately that would be great.
 > >>
 > >> Could you please clarify your proposal.
 > >
 > > When a loop was vectorized set things up to immediately vectorize
 > > its epilogue, avoiding changing the loop iteration and avoiding
 > > the re-use of ->aux.
 > >
 > > Richard.
 > >
 > >> Thanks.
 > >> Yuri.
 > >>
 > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
 > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
 > >> >
 > >> >> Hi All,
 > >> >>
 > >> >> I re-send all patches sent by Ilya earlier for review which support
 > >> >> vectorization of loop epilogues and loops with low trip count. We
 > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was 
 > >> >> not
 > >> >> approved by Jeff

[PATCH, GCC] Recognize partial load of source expression on big endian targets

2016-11-11 Thread Thomas Preudhomme

Hi,

To fix PR69714, code was added to disable bswap when the resulting symbolic 
expression (a load or load + byte swap) is smaller than the source expression 
(eg. some of the bytes accessed in the source code gets bitwise ANDed with 0). 
As explained in [1], there were already two pieces of code written independently 
in bswap to deal with that case and that's the interaction of the two that 
caused the bug.


[1] https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00948.html

PR69714 proves that this pattern does occur in real code so this patch sets out to 
re-enable the optimization and remove the big endian adjustment in bswap_replace: 
the change in find_bswap_or_nop ensures that either we cancel the optimization 
or we don't, and there is no need for offset adjustment. As explained in [2], 
the current code only supports loss of bytes at the highest addresses because 
there is no code to adjust the address of the load. However, for little and big 
endian targets the bytes at the highest addresses translate into different byte 
significance in the result. This patch first separates the cmpxchg and cmpnop 
adjustment into 2 steps and then deals with endianness correctly in the second 
step.


[2] https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00119.html

Ideally we would still want to be able to do the adjustment to deal with a load or 
load+bswap at an offset from the byte at the lowest memory address accessed, but 
this would require more code to recognize it properly for both little endian and 
big endian targets and will thus have to wait for GCC 8 stage 1.


ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-11-10  Thomas Preud'homme  

* tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg
and cmpnop in two steps: first the ones not accessed in the original gimple
expression in an endian-independent way and then the ones not accessed
in the final result in an endian-specific way.
(bswap_replace): Stop doing big endian adjustment.


The testsuite does not show any regression on an armeb-none-eabi GCC cross-compiler 
targeting ARM Cortex-M3 or on an x86_64-linux-gnu bootstrapped native GCC 
compiler. Bootstrap on powerpc is in progress.


Is this ok for trunk provided that the powerpc bootstrap succeeds?

Best regards,

Thomas
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c315da88ce4feea1196a0416e4ea02e2a75a4377..b28c808c55489ae1ae16c173d66c561c1897e6ab 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2504,9 +2504,11 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number *n, int limit)
 static gimple *
 find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
 {
-  /* The number which the find_bswap_or_nop_1 result should match in order
- to have a full byte swap.  The number is shifted to the right
- according to the size of the symbolic number before using it.  */
+  unsigned rsize;
+  uint64_t tmpn, mask;
+  /* The number which the find_bswap_or_nop_1 result should match in order
+     to have a full byte swap.  The number is shifted to the right
+     according to the size of the symbolic number before using it.  */
   uint64_t cmpxchg = CMPXCHG;
   uint64_t cmpnop = CMPNOP;
 
@@ -2527,28 +2529,38 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
 
   /* Find real size of result (highest non-zero byte).  */
   if (n->base_addr)
-{
-  unsigned HOST_WIDE_INT rsize;
-  uint64_t tmpn;
-
-  for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
-  if (BYTES_BIG_ENDIAN && n->range != rsize)
-	/* This implies an offset, which is currently not handled by
-	   bswap_replace.  */
-	return NULL;
-  n->range = rsize;
-}
+for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
+  else
+rsize = n->range;
 
-  /* Zero out the extra bits of N and CMP*.  */
+  /* Zero out the bits corresponding to untouched bytes in original gimple
+ expression.  */
   if (n->range < (int) sizeof (int64_t))
 {
-  uint64_t mask;
-
   mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
   cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
   cmpnop &= mask;
 }
 
+  /* Zero out the bits corresponding to unused bytes in the result of the
+ gimple expression.  */
+  if (rsize < n->range)
+{
+  if (BYTES_BIG_ENDIAN)
+	{
+	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
+	  cmpxchg &= mask;
+	  cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
+	}
+  else
+	{
+	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
+	  cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
+	  cmpnop &= mask;
+	}
+  n->range = rsize;
+}
+
   /* A complete byte swap should make the symbolic number to start with
  the largest digit in the highest order byte. Unchanged symbolic
  number indicates a read with same endianness as target architecture.  */
@@ -2636,26 +2648,6 @@ bswap_replace (g

Re: [PATCH] libgo: Fix GOARCH_INT64ALIGN

2016-11-11 Thread Ian Lance Taylor
On Sat, Nov 5, 2016 at 6:01 AM, Andreas Schwab  wrote:
> The alignment of int64 is 8 everywhere except m68k, where the biggest
> alignment is 2, and x86-32, where the biggest field alignment is 4.
> This fixes all select tests on powerpc -m32.
>
> Andreas.
>
> diff --git a/libgo/configure b/libgo/configure
> index 7a9df58c21..adabb74baa 100755
> --- a/libgo/configure
> +++ b/libgo/configure
> @@ -13648,7 +13648,6 @@ case ${host} in
>  GOARCH_FAMILY=ARM
>  GOARCH_CACHELINESIZE=32
>  GOARCH_PCQUANTUM=4
> -GOARCH_INT64ALIGN=4
>  GOARCH_MINFRAMESIZE=4
>  ;;
>i[34567]86-*-* | x86_64-*-*)
> @@ -13685,7 +13684,7 @@ rm -f core conftest.err conftest.$ac_objext 
> conftest.$ac_ext
>  GOARCH_BIGENDIAN=1
>  GOARCH_CACHELINESIZE=16
>  GOARCH_PCQUANTUM=4
> -GOARCH_INT64ALIGN=4
> +GOARCH_INT64ALIGN=2
>  ;;
>mips*-*-*)
>  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> @@ -13747,7 +13746,6 @@ rm -f core conftest.err conftest.$ac_objext 
> conftest.$ac_ext
>  case "$mips_abi" in
>  "o32" | "n32")
>  GOARCH_FAMILY=MIPS
> -   GOARCH_INT64ALIGN=4
> GOARCH_MINFRAMESIZE=4
>  ;;
>  "n64" | "o64")
> @@ -13778,7 +13776,6 @@ if ac_fn_c_try_compile "$LINENO"; then :
>GOARCH=ppc
>  GOARCH_FAMILY=PPC
>  GOARCH_BIGENDIAN=1
> -GOARCH_INT64ALIGN=4
>
>  else
>
> @@ -13816,7 +13813,6 @@ _ACEOF
>  if ac_fn_c_try_compile "$LINENO"; then :
>GOARCH=s390
>  GOARCH_FAMILY=S390
> -GOARCH_INT64ALIGN=4
>  GOARCH_MINFRAMESIZE=4
>
>  else
> diff --git a/libgo/configure.ac b/libgo/configure.ac
> index ed2edd3b69..09add8d136 100644
> --- a/libgo/configure.ac
> +++ b/libgo/configure.ac
> @@ -230,7 +230,6 @@ case ${host} in
>  GOARCH_FAMILY=ARM
>  GOARCH_CACHELINESIZE=32
>  GOARCH_PCQUANTUM=4
> -GOARCH_INT64ALIGN=4
>  GOARCH_MINFRAMESIZE=4
>  ;;
>  changequote(,)dnl
> @@ -262,7 +261,7 @@ GOARCH_HUGEPAGESIZE="1 << 21"
>  GOARCH_BIGENDIAN=1
>  GOARCH_CACHELINESIZE=16
>  GOARCH_PCQUANTUM=4
> -GOARCH_INT64ALIGN=4
> +GOARCH_INT64ALIGN=2
>  ;;
>mips*-*-*)
>  AC_COMPILE_IFELSE([
> @@ -296,7 +295,6 @@ GOARCH_HUGEPAGESIZE="1 << 21"
>  case "$mips_abi" in
>  "o32" | "n32")
>  GOARCH_FAMILY=MIPS
> -   GOARCH_INT64ALIGN=4
> GOARCH_MINFRAMESIZE=4
>  ;;
>  "n64" | "o64")
> @@ -323,7 +321,6 @@ GOARCH_HUGEPAGESIZE="1 << 21"
>  [GOARCH=ppc
>  GOARCH_FAMILY=PPC
>  GOARCH_BIGENDIAN=1
> -GOARCH_INT64ALIGN=4
>  ],
>  [
>  GOARCH_FAMILY=PPC64
> @@ -347,7 +344,6 @@ GOARCH_BIGENDIAN=1
>  #endif],
>  [GOARCH=s390
>  GOARCH_FAMILY=S390
> -GOARCH_INT64ALIGN=4
>  GOARCH_MINFRAMESIZE=4
>  ], [GOARCH=s390x
>  GOARCH_FAMILY=S390X
> --


Thanks.

Committed to mainline.

Ian


Re: [PATCH] libiberty: Add Rust symbol demangling.

2016-11-11 Thread Ian Lance Taylor
On Thu, Nov 3, 2016 at 10:39 AM, Mark Wielaard  wrote:
>
> include/ChangeLog:
>
> 2016-11-03  David Tolnay 
>Mark Wielaard  
>
>* demangle.h (DMGL_RUST): New macro.
>(DMGL_STYLE_MASK): Add DMGL_RUST.
>(demangling_styles): Add dlang_rust.
>(RUST_DEMANGLING_STYLE_STRING): New macro.
>(RUST_DEMANGLING): New macro.
>(rust_demangle): New prototype.
>(rust_is_mangled): Likewise.
>(rust_demangle_sym): Likewise.
>
> libiberty/ChangeLog:
>
> 2016-11-03  David Tolnay 
>Mark Wielaard  
>
>* Makefile.in (CFILES): Add rust-demangle.c.
>(REQUIRED_OFILES): Add rust-demangle.o.
>* cplus-dem.c (libiberty_demanglers): Add rust_demangling case.
>(cplus_demangle): Handle RUST_DEMANGLING.
>(rust_demangle): New function.
>* rust-demangle.c: New file.
>* testsuite/Makefile.in (really-check): Add check-rust-demangle.
>(check-rust-demangle): New rule.
>* testsuite/rust-demangle-expected: New file.

Are you completely confident that Rust mangling will never change to
start requiring more space in the demangled string?  If that could
ever happen, you have chosen an unfortunate API.
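(For context: the submitted rust_demangle_sym rewrites the symbol in place, which only works as long as the demangled form never needs more bytes than the mangled one. A hypothetical allocation-based shape that avoids that size assumption could look like the sketch below — demangle_alloc and the growth factor are illustrative, not part of the submitted patch:)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alternative API: return a freshly allocated string so
   the demangled name may grow beyond the mangled one.  The caller
   frees the result.  The "demangling" below is a stand-in copy;
   real code would rewrite the name into OUT.  */
static char *
demangle_alloc (const char *mangled)
{
  size_t cap = strlen (mangled) * 2 + 1;   /* room to grow */
  char *out = (char *) malloc (cap);
  if (out != NULL)
    strcpy (out, mangled);
  return out;
}
```

With such an API, a future change to Rust mangling that lengthens demangled names would not invalidate existing callers.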

Has David Tolnay signed the FSF copyright agreement?  I don't see him
on the list.

Other than that, this patch looks OK.

Ian


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-11 Thread Nathan Sidwell

On 11/11/2016 02:47 AM, Martin Liška wrote:


We use option lists for e.g. -fsanitize=address,undefined; however, as -fprofile-update
has only three values (and passing 'single,atomic' does not make sense), I would prefer
s/maybe-atomic/prefer-atomic. I guess handling an option list in gcc.c and
doing the substitutions would be very inconvenient.
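The intended prefer-atomic behavior — use an atomic counter update where the target supports it cheaply, and fall back to a plain update otherwise — can be sketched as follows (a hypothetical helper for illustration, not GCC's actual instrumentation code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of -fprofile-update=prefer-atomic: prefer a
   lock-free atomic increment of the 64-bit counter, and fall back to
   a plain (potentially racy) update when lock-free atomics are not
   available for this object.  */
static void
profile_counter_update (uint64_t *counter)
{
  if (__atomic_always_lock_free (sizeof *counter, counter))
    __atomic_fetch_add (counter, 1, __ATOMIC_RELAXED);
  else
    *counter = *counter + 1;   /* non-atomic fallback */
}
```

The point of "prefer" over "maybe" in the name is exactly this: the atomic path is taken whenever possible, silently degrading only where it cannot be.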


ok.


--
Nathan Sidwell


Re: [PATCH] Support no newline in print_gimple_stmt

2016-11-11 Thread Martin Liška
On 11/11/2016 01:10 PM, Richard Biener wrote:
> On Thu, Nov 10, 2016 at 4:36 PM, Martin Liška  wrote:
>> I've just noticed that tree-ssa-dse wrongly prints a new line to dump file.
>> For the next stage1, I'll go through usages of print_gimple_stmt and remove
>> extra new lines like:
>>
>> gcc/auto-profile.c:  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>> gcc/auto-profile.c-  fprintf (dump_file, "\n");
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> Err, why not just remove the excess newlines (and drop the ' quotes)?

OK, let's keep it simple ;) The new output is:

  Deleted dead store: *p_2(D) = 0;

Ready to install the patch after it finishes regression tests?

Thanks,
Martin

> 
> Richard.
> 
>> Martin

>From c781ca49f502b205d55f411051a3e9881d2c9d7b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Nov 2016 15:51:22 +0100
Subject: [PATCH] Fix dump output in dse_optimize_stmt

gcc/ChangeLog:

2016-11-11  Martin Liska  

	* tree-ssa-dse.c (dse_optimize_stmt): Remove quotes and extra
	new line.
---
 gcc/tree-ssa-dse.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 372a0be..778b363 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -236,9 +236,9 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 		{
-		  fprintf (dump_file, "  Deleted dead call '");
+		  fprintf (dump_file, "  Deleted dead call: ");
 		  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
-		  fprintf (dump_file, "'\n");
+		  fprintf (dump_file, "\n");
 		}
 
 	  tree lhs = gimple_call_lhs (stmt);
@@ -292,9 +292,9 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 	{
-	  fprintf (dump_file, "  Deleted dead store '");
+	  fprintf (dump_file, "  Deleted dead store: ");
 	  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
-	  fprintf (dump_file, "'\n");
+	  fprintf (dump_file, "\n");
 	}
 
   /* Then we need to fix the operand of the consuming stmt.  */
-- 
2.10.1



Re: [PATCH] Add AVX512 k-mask intrinsics

2016-11-11 Thread Marc Glisse

On Fri, 11 Nov 2016, Andrew Senkevich wrote:


+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+}


(picking one random example)
Is a builtin really needed here? What would happen if you used

  return __A & __B;

?
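To make the question concrete, a builtin-free variant of the same intrinsic (hypothetical; it assumes __mmask32 is a plain 32-bit unsigned type, as in the AVX-512 headers) would be:

```c
#include <assert.h>

typedef unsigned int __mmask32;   /* matches the AVX-512 mask typedef */

/* Hypothetical builtin-free version of _kand_mask32: a plain bitwise
   AND on the mask type, which the compiler is then free to lower to
   a KANDD instruction when the operands live in mask registers.  */
static inline __mmask32
kand_mask32_generic (__mmask32 __A, __mmask32 __B)
{
  return __A & __B;
}
```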

--
Marc Glisse


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-11 Thread Kyrill Tkachov


On 11/11/16 10:17, Kyrill Tkachov wrote:


On 10/11/16 23:39, Segher Boessenkool wrote:

On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote:

On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
some interesting swings.
458.sjeng +1.45%
471.omnetpp   +2.19%
445.gobmk -2.01%

On SPECFP:
453.povray+7.00%


Wow, this looks really good.  Thank you for implementing this.  If I
get some time I am going to try it out on other processors than A72
but I doubt I have time any time soon.

I'd love to hear what causes the slowdown for gobmk as well, btw.


I haven't yet gotten a direct answer for that (through performance analysis tools),
but I have noticed that load/store pairs are not generated as aggressively as I hoped.
They are being merged by the sched fusion pass and peepholes (which run after this),
but it still misses cases. I've hacked the SWS hooks to generate pairs explicitly,
and that increases the number of pairs and helps code size to boot. It complicates
the logic of the hooks a bit, but not too much.

I'll make those changes and re-benchmark, hopefully that
will help performance.



And here's a version that explicitly emits pairs. I've looked at assembly 
codegen on SPEC2006
and it generates quite a few more LDP/STP pairs than the original version.
I kicked off benchmarks over the weekend to see the effect.
Andrew, if you want to try it out (more benchmarking and testing always 
welcome) this is the
one to try.

Thanks,
Kyrill




Thanks,
Kyrill



Segher




commit bedb71d6f6f772eed33ba35e93cc4104326675da
Author: Kyrylo Tkachov 
Date:   Tue Oct 11 09:25:54 2016 +0100

[AArch64] Separate shrink wrapping hooks implementation

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 325e725..15b5bdf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1138,7 +1138,7 @@ aarch64_is_extend_from_extract (machine_mode mode, rtx mult_imm,
 
 /* Emit an insn that's a simple single-set.  Both the operands must be
known to be valid.  */
-inline static rtx
+inline static rtx_insn *
 emit_set_insn (rtx x, rtx y)
 {
   return emit_insn (gen_rtx_SET (x, y));
@@ -3135,6 +3135,9 @@ aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
 	  || regno == cfun->machine->frame.wb_candidate2))
 	continue;
 
+  if (cfun->machine->reg_is_wrapped_separately[regno])
+   continue;
+
   reg = gen_rtx_REG (mode, regno);
   offset = start_offset + cfun->machine->frame.reg_offset[regno];
   mem = gen_mem_ref (mode, plus_constant (Pmode, stack_pointer_rtx,
@@ -3143,6 +3146,7 @@ aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
   regno2 = aarch64_next_callee_save (regno + 1, limit);
 
   if (regno2 <= limit
+	  && !cfun->machine->reg_is_wrapped_separately[regno2]
 	  && ((cfun->machine->frame.reg_offset[regno] + UNITS_PER_WORD)
 	  == cfun->machine->frame.reg_offset[regno2]))
 
@@ -3191,6 +3195,9 @@ aarch64_restore_callee_saves (machine_mode mode,
regno <= limit;
regno = aarch64_next_callee_save (regno + 1, limit))
 {
+  if (cfun->machine->reg_is_wrapped_separately[regno])
+   continue;
+
   rtx reg, mem;
 
   if (skip_wb
@@ -3205,6 +3212,7 @@ aarch64_restore_callee_saves (machine_mode mode,
   regno2 = aarch64_next_callee_save (regno + 1, limit);
 
   if (regno2 <= limit
+	  && !cfun->machine->reg_is_wrapped_separately[regno2]
 	  && ((cfun->machine->frame.reg_offset[regno] + UNITS_PER_WORD)
 	  == cfun->machine->frame.reg_offset[regno2]))
 	{
@@ -3224,6 +3232,273 @@ aarch64_restore_callee_saves (machine_mode mode,
 }
 }
 
+static inline bool
+offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
+			   HOST_WIDE_INT offset)
+{
+  return offset >= -256 && offset < 256;
+}
+
+static inline bool
+offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
+{
+  return (offset >= 0
+	  && offset < 4096 * GET_MODE_SIZE (mode)
+	  && offset % GET_MODE_SIZE (mode) == 0);
+}
+
+bool
+aarch64_offset_7bit_signed_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
+{
+  return (offset >= -64 * GET_MODE_SIZE (mode)
+	  && offset < 64 * GET_MODE_SIZE (mode)
+	  && offset % GET_MODE_SIZE (mode) == 0);
+}
+
+/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
+
+static sbitmap
+aarch64_get_separate_components (void)
+{
+  aarch64_layout_frame ();
+
+  sbitmap components = sbitmap_alloc (V31_REGNUM + 1);
+  bitmap_clear (components);
+
+  /* The registers we need saved to the frame.  */
+  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
+if (aarch64_register_saved_on_entry (regno))
+  {
+	HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
+	if (!frame_pointer_needed)
+	  offset += cfun->machine->frame.frame_size
+		- cfun->machine->frame.hard_fp_offset;
+	/* Check that we can access the st

[PATCH][PR sanitizer/78307] Fix missing symbols in libubsan after recent merge.

2016-11-11 Thread Maxim Ostapenko

Hi,

this patch fixes PR sanitizer/78307 by re-adding the interface functions
that were removed by the last merge (although they are unused in GCC):


__ubsan_handle_cfi_bad_icall
__ubsan_handle_cfi_bad_icall_abort
__ubsan_handle_cfi_bad_type
__ubsan_handle_cfi_bad_type_abort

The missing stubs are added via the corresponding argument-translation logic.
I've also added a new libsanitizer/LOCAL_PATCHES file to track GCC-local
changes in libsanitizer.


The abidiff output now looks like this:

Functions changes summary: 0 Removed, 0 Changed (1 filtered out), 7 Added functions

Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 0 Removed, 10 Added function symbols not referenced by debug info
Variable symbols changes summary: 0 Removed, 0 Added variable symbol not referenced by debug info


7 Added functions:

  'function void __sanitizer_cov_trace_pc_guard_init()' {__sanitizer_cov_trace_pc_guard_init}
  'function int __sanitizer_install_malloc_and_free_hooks(void (typedef __sanitizer::uptr)*, void ()*)' {__sanitizer_install_malloc_and_free_hooks}
  'function void __sanitizer_set_report_fd(void*)' {__sanitizer_set_report_fd}
  'function void __sanitizer_symbolize_global(__sanitizer::uptr, const char*, char*, __sanitizer::uptr)' {__sanitizer_symbolize_global}
  'function void __sanitizer_symbolize_pc(__sanitizer::uptr, const char*, char*, __sanitizer::uptr)' {__sanitizer_symbolize_pc}
  'function void __ubsan::__ubsan_handle_cfi_check_fail(__ubsan::CFICheckFailData*, __ubsan::ValueHandle, __sanitizer::uptr)' {__ubsan_handle_cfi_check_fail}
  'function void __ubsan::__ubsan_handle_cfi_check_fail_abort(__ubsan::CFICheckFailData*, __ubsan::ValueHandle, __sanitizer::uptr)' {__ubsan_handle_cfi_check_fail_abort}


10 Added function symbols not referenced by debug info:

  __sanitizer_cov_trace_cmp1
  __sanitizer_cov_trace_cmp2
  __sanitizer_cov_trace_cmp4
  __sanitizer_cov_trace_cmp8
  __sanitizer_cov_trace_div4
  __sanitizer_cov_trace_div8
  __sanitizer_cov_trace_gep
  __sanitizer_cov_trace_pc_guard
  __sanitizer_cov_trace_pc_indir
  internal_sigreturn

Tested on x86_64-unknown-linux-gnu. OK for mainline?

-Maxim
libsanitizer/ChangeLog:

2016-11-11  Maxim Ostapenko  

	PR sanitizer/78307
	* ubsan/ubsan_handlers.cc (__ubsan_handle_cfi_bad_icall): New function.
	(__ubsan_handle_cfi_bad_icall_abort): Likewise.
	* ubsan/ubsan_handlers.h (struct CFIBadIcallData): New type.
	* ubsan/ubsan_handlers_cxx.cc (__ubsan_handle_cfi_bad_type): New
	function.
	(__ubsan_handle_cfi_bad_type_abort): Likewise.
	* ubsan/ubsan_handlers_cxx.h (struct CFIBadTypeData): New type.
	(__ubsan_handle_cfi_bad_type): Export function.
	(__ubsan_handle_cfi_bad_type_abort): Likewise.
	* LOCAL_PATCHES: New file.
	* HOWTO_MERGE: Update documentation.

diff --git a/libsanitizer/HOWTO_MERGE b/libsanitizer/HOWTO_MERGE
index d0eca40..81121aa 100644
--- a/libsanitizer/HOWTO_MERGE
+++ b/libsanitizer/HOWTO_MERGE
@@ -11,7 +11,8 @@ general list of actions required to perform the merge:
   in corresponding CMakeLists.txt and config-ix.cmake files from compiler-rt source
   directory.
 * Apply all needed GCC-specific patches to libsanitizer (note that some of
-  them might be already included to upstream).
+  them might be already included to upstream).  The list of these patches is stored
+  into LOCAL_PATCHES file.
 * Apply all necessary compiler changes.  Be especially careful here, you must
   not break ABI between compiler and library.  You can reveal these changes by
   inspecting the history of AddressSanitizer.cpp and ThreadSanitizer.cpp files
@@ -37,3 +38,4 @@ general list of actions required to perform the merge:
   in libasan, configure/Makefile changes). The review process has O(N^2) complexity, so you
   would simplify and probably speed up the review process by doing this.
 * Send your patches for review to GCC Patches Mailing List (gcc-patches@gcc.gnu.org).
+* Update LOCAL_PATCHES file when you've committed the whole patch set with new revisions numbers.
diff --git a/libsanitizer/ubsan/ubsan_handlers.cc b/libsanitizer/ubsan/ubsan_handlers.cc
index 0e343d3..5631e45 100644
--- a/libsanitizer/ubsan/ubsan_handlers.cc
+++ b/libsanitizer/ubsan/ubsan_handlers.cc
@@ -558,6 +558,21 @@ static void HandleCFIBadType(CFICheckFailData *Data, ValueHandle Vtable,
 #endif
 }  // namespace __ubsan
 
+void __ubsan::__ubsan_handle_cfi_bad_icall(CFIBadIcallData *CallData,
+   ValueHandle Function) {
+  GET_REPORT_OPTIONS(false);
+  CFICheckFailData Data = {CFITCK_ICall, CallData->Loc, CallData->Type};
+  handleCFIBadIcall(&Data, Function, Opts);
+}
+
+void __ubsan::__ubsan_handle_cfi_bad_icall_abort(CFIBadIcallData *CallData,
+ ValueHandle Function) {
+  GET_REPORT_OPTIONS(true);
+  CFICheckFailData Data = {CFITCK_ICall, CallData->Loc, CallData->Type};
+  handleCFIBadIcall(&Data, Function, Opts);
+  Die();
+}
+
 void __ubsan::__ubsan_handle_cfi_check_

Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-11-11 Thread Alexander Monakov
On Fri, 11 Nov 2016, Bernd Schmidt wrote:
> On 10/19/2016 12:39 PM, Bernd Schmidt wrote:
> > I'll refrain from any further comments on the topic. The ptx patches
> > don't look unreasonable iff someone else decides that this version of
> > OpenMP support should be merged and I'll look into them in more detail
> > if that happens. Patch 2/8 is ok now.
> 
> Sounds like Jakub has made that decision. So I'll get out of the way and just
> approve all these.

For the avoidance of doubt, is this a statement of intent, or an actual approval
for the patchset?

After these backend modifications and the rest of libgomp/middle-end changes are
applied, trunk will need the following flip-the-switch patch to allow OpenMP
offloading for NVPTX.  OK?

Thanks.
Alexander

PR target/67822
* config/nvptx/mkoffload.c (main): Allow -fopenmp.

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index c8eed45..e99ef37 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -517,8 +524,8 @@ main (int argc, char **argv)
 fatal_error (input_location, "cannot open '%s'", ptx_cfile_name);

   /* PR libgomp/65099: Currently, we only support offloading in 64-bit
- configurations.  PR target/67822: OpenMP offloading to nvptx fails.  */
-  if (offload_abi == OFFLOAD_ABI_LP64 && !fopenmp)
+ configurations.  */
+  if (offload_abi == OFFLOAD_ABI_LP64)
 {
   ptx_name = make_temp_file (".mkoffload");
   obstack_ptr_grow (&argv_obstack, "-o");



[Patch v4 0/17] Add support for _Float16 to AArch64 and ARM

2016-11-11 Thread James Greenhalgh

Hi,

This patch set enables the _Float16 type specified in ISO/IEC TS 18661-3
for AArch64 and ARM. The patch set has been posted over the past two months,
with many of the target-independent changes approved. I'm reposting it in
its entirety, in the form I hope to commit it to trunk.

The patch set can be roughly split in three; first, hookization of
TARGET_FLT_EVAL_METHOD, and changes to the excess precision logic in the
compiler to handle the new values for FLT_EVAL_METHOD defined in
ISO/IEC TS-18661-3. Second, the AArch64 changes required to enable _Float16,
and finally the ARM changes required to enable _Float16.

The broad goals and an outline of each patch in the patch set were
described in https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02383.html .
As compared to the original submission, the patch set has grown an ARM
port, and has had several rounds of technical review on the target
independent aspects.

This has resulted in many of the patches already being approved, a full
summary of the status of each ticket is immediately below.

Clearly the focus for review of this patch set now needs to be the AArch64
and ARM ports, I hope the appropriate maintainers will be able to do so in
time for the patch set to be accepted for GCC 7.

I've built and tested the full patch set on ARM (cross and native),
AArch64 (cross and native) and x86_64 (native) with no identified issues.

Thanks,
James

--
Target independent changes

10 patches: 9 previously approved, 1 new patch implementing testsuite
changes to enable _Float16 tests in more circumstances on ARM.
--

[Patch 1/17] Add a new target hook for describing excess precision intentions

  Approved: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00781.html

[Patch 2/17] Implement TARGET_C_EXCESS_PRECISION for i386

  Blanket approved by Jeff in:
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html

[Patch 3/17] Implement TARGET_C_EXCESS_PRECISION for s390

  Approved: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01554.html

[Patch 4/17] Implement TARGET_C_EXCESS_PRECISION for m68k

  Blanket approved by Jeff in:
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html
  And by Andreas: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02414.html

  There was a typo in the original patch, fixed in:
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01173.html
  which I would apply as an "obvious" fix to the original patch.

[Patch 5/17] Add -fpermitted-flt-eval-methods=[c11|ts-18661-3]

  Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02405.html

  Joseph had a comment in
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00335.html that the tests
  should check FLT_EVAL_METHOD from <float.h> rather than
  __FLT_EVAL_METHOD__. Rather than implement that suggestion, I added tests
  to patch 6 which tested the <float.h> macro, and left the tests in this
  patch testing the internal macro.

[Patch 6/17] Migrate excess precision logic to use TARGET_EXCESS_PRECISION

  Approved (after removing a rebase bug):
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00231.html

[Patch 7/17] Delete TARGET_FLT_EVAL_METHOD and poison it.

  Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02401.html

[Patch 8/17] Make _Float16 available if HFmode is available

  Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02403.html

[Patch libgcc 9/17] Update soft-fp from glibc

  Self approved under policy that we can update libraries which GCC mirrors
  without further approval.

[Patch testsuite patch 10/17] Add options for floatN when checking effective 
target for support

  NEW!


AArch64 changes

3 patches, none reviewed


[Patch AArch64 11/17] Add floatdihf2 and floatunsdihf2 patterns

  Not reviewed, last pinged (^6):
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00584.html

[Patch libgcc AArch64 12/17] Enable hfmode soft-float conversions and 
truncations

  Not reviewed:
  https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02395.html

[Patch AArch64 13/17] Enable _Float16 for AArch64

  Not reviewed:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01176.html


ARM changes

4 patches, none OK'ed

[Patch 14/17] [libgcc, ARM] Generalise float-to-half conversion function.

  Respun in this submission to avoid a bug identified during extended
  multilib testing, original submission here:
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00714.html

[Patch 15/17 libgcc] Add double to half conversions.

  Reviewed, but not approved:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01975.html

  Changes were required to patch 16/17, and patch 10/17 was introduced to
  implement the suggested testsuite patch. The original patch still stands,
  and needs review:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01914.html

[Patch 16/17 ARM] Half to double precision conversions

  Resubmitted: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00233.html
  Needs review.

[Patch ARM 17/17] Enable _Float16 for

[Patch 3/17] Implement TARGET_C_EXCESS_PRECISION for s390

2016-11-11 Thread James Greenhalgh

---
This patch has already been approved:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01554.html
---

This patch ports the logic from s390's TARGET_FLT_EVAL_METHOD to the new
target hook TARGET_C_EXCESS_PRECISION, following the guidance in this thread
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01172.html

In this patch we handle EXCESS_PRECISION_TYPE_IMPLICIT like
EXCESS_PRECISION_TYPE_FAST. This has the consequence that float_t in
glibc is evaluated in a different precision (double) from the one indicated
by FLT_EVAL_METHOD.
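The mismatch is observable from user code; a minimal probe (what it reports depends on the target and on glibc's float_t typedef):

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Report the advertised evaluation method and the width glibc gives
   float_t.  On s390 with glibc, float_t is double even though float
   operations carry no implicit excess precision.  */
static void
report_float_eval (void)
{
  printf ("FLT_EVAL_METHOD = %d\n", (int) FLT_EVAL_METHOD);
  printf ("sizeof (float_t) = %zu, sizeof (float) = %zu\n",
          sizeof (float_t), sizeof (float));
}
```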

---
gcc/

2016-11-09  James Greenhalgh  

* config/s390/s390.c (s390_excess_precision): New.
(TARGET_C_EXCESS_PRECISION): Define.

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 3f98cd8..311a7f2 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -15260,6 +15260,43 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
   return NULL;
 }
 
+/* Implement TARGET_C_EXCESS_PRECISION.
+
+   FIXME: For historical reasons, float_t and double_t are typedef'ed to
+   double on s390, causing operations on float_t to operate in a higher
+   precision than is necessary.  However, it is not the case that SFmode
+   operations have implicit excess precision, and we generate more optimal
+   code if we let the compiler know no implicit extra precision is added.
+
+   That means when we are compiling with -fexcess-precision=fast, the value
+   we set for FLT_EVAL_METHOD will be out of line with the actual precision of
+   float_t (though they would be correct for -fexcess-precision=standard).
+
+   A complete fix would modify glibc to remove the unnecessary typedef
+   of float_t to double.  */
+
+static enum flt_eval_method
+s390_excess_precision (enum excess_precision_type type)
+{
+  switch (type)
+{
+  case EXCESS_PRECISION_TYPE_IMPLICIT:
+  case EXCESS_PRECISION_TYPE_FAST:
+	/* The fastest type to promote to will always be the native type,
+	   whether that occurs with implicit excess precision or
+	   otherwise.  */
+	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+  case EXCESS_PRECISION_TYPE_STANDARD:
+	/* Otherwise, when we are in a standards compliant mode, to
+	   ensure consistency with the implementation in glibc, report that
+	   float is evaluated to the range and precision of double.  */
+	return FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE;
+  default:
+	gcc_unreachable ();
+}
+  return FLT_EVAL_METHOD_UNPREDICTABLE;
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -15320,6 +15357,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
 #define TARGET_ASM_CAN_OUTPUT_MI_THUNK hook_bool_const_tree_hwi_hwi_const_tree_true
 
+#undef TARGET_C_EXCESS_PRECISION
+#define TARGET_C_EXCESS_PRECISION s390_excess_precision
+
 #undef  TARGET_SCHED_ADJUST_PRIORITY
 #define TARGET_SCHED_ADJUST_PRIORITY s390_adjust_priority
 #undef TARGET_SCHED_ISSUE_RATE


[Patch 2/17] Implement TARGET_C_EXCESS_PRECISION for i386

2016-11-11 Thread James Greenhalgh

---
This patch has been approved:
  https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html
---

This patch ports the logic from i386's TARGET_FLT_EVAL_METHOD to the new
target hook TARGET_C_EXCESS_PRECISION.

Bootstrapped and tested with no issues.

OK?

Thanks,
James

---
gcc/

2016-11-09  James Greenhalgh  

* config/i386/i386.c (ix86_excess_precision): New.
(TARGET_C_EXCESS_PRECISION): Define.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..794b149 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -50634,6 +50634,44 @@ ix86_expand_divmod_libfunc (rtx libfunc, machine_mode mode,
   *rem_p = rem;
 }
 
+/* Set the value of FLT_EVAL_METHOD in float.h.  When using only the
+   FPU, assume that the fpcw is set to extended precision; when using
+   only SSE, rounding is correct; when using both SSE and the FPU,
+   the rounding precision is indeterminate, since either may be chosen
+   apparently at random.  */
+
+static enum flt_eval_method
+ix86_excess_precision (enum excess_precision_type type)
+{
+  switch (type)
+{
+  case EXCESS_PRECISION_TYPE_FAST:
+	/* The fastest type to promote to will always be the native type,
+	   whether that occurs with implicit excess precision or
+	   otherwise.  */
+	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+  case EXCESS_PRECISION_TYPE_STANDARD:
+  case EXCESS_PRECISION_TYPE_IMPLICIT:
+	/* Otherwise, the excess precision we want when we are
+	   in a standards compliant mode, and the implicit precision we
+	   provide can be identical.  */
+	if (!TARGET_80387)
+	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+	else if (TARGET_MIX_SSE_I387)
+	  return FLT_EVAL_METHOD_UNPREDICTABLE;
+	else if (!TARGET_SSE_MATH)
+	  return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
+	else if (TARGET_SSE2)
+	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+	else
+	  return FLT_EVAL_METHOD_UNPREDICTABLE;
+  default:
+	gcc_unreachable ();
+}
+
+  return FLT_EVAL_METHOD_UNPREDICTABLE;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -50865,6 +50903,8 @@ ix86_run_selftests (void)
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST ix86_md_asm_adjust
 
+#undef TARGET_C_EXCESS_PRECISION
+#define TARGET_C_EXCESS_PRECISION ix86_excess_precision
 #undef TARGET_PROMOTE_PROTOTYPES
 #define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true
 #undef TARGET_SETUP_INCOMING_VARARGS


[Patch 1/17] Add a new target hook for describing excess precision intentions

2016-11-11 Thread James Greenhalgh

---
This patch has already been approved:

  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00781.html
---

This patch introduces TARGET_C_EXCESS_PRECISION. This hook takes a tri-state
argument, one of EXCESS_PRECISION_TYPE_IMPLICIT,
EXCESS_PRECISION_TYPE_STANDARD, or EXCESS_PRECISION_TYPE_FAST, which relate
to the implicit extra precision added by the target, the excess precision
that should be guaranteed for -fexcess-precision=standard, and the excess
precision that should be added for performance under -fexcess-precision=fast.
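A skeleton of what a target's implementation of the hook might look like (a hypothetical target whose FPU implicitly widens every float operation to double; the real i386, s390 and m68k implementations follow in later patches of this series):

```c
#include <assert.h>

/* Enum values mirror those this patch adds to gcc/coretypes.h.  */
enum flt_eval_method
{
  FLT_EVAL_METHOD_UNPREDICTABLE = -1,
  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT = 0,
  FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE = 1
};

enum excess_precision_type
{
  EXCESS_PRECISION_TYPE_IMPLICIT,
  EXCESS_PRECISION_TYPE_STANDARD,
  EXCESS_PRECISION_TYPE_FAST
};

/* Hypothetical hook: the hardware evaluates float in double, so both
   the implicit and the standards-mandated excess precision are double,
   while -fexcess-precision=fast promises nothing beyond float.  */
static enum flt_eval_method
example_excess_precision (enum excess_precision_type type)
{
  switch (type)
    {
    case EXCESS_PRECISION_TYPE_IMPLICIT:
    case EXCESS_PRECISION_TYPE_STANDARD:
      return FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE;
    case EXCESS_PRECISION_TYPE_FAST:
      return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
    }
  return FLT_EVAL_METHOD_UNPREDICTABLE;
}
```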

Thanks
James

---
gcc/

2016-11-09  James Greenhalgh  

* target.def (excess_precision): New hook.
* target.h (flt_eval_method): New.
(excess_precision_type): Likewise.
* targhooks.c (default_excess_precision): New.
* targhooks.h (default_excess_precision): New.
* doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): New.
* doc/tm.texi: Regenerate.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 869f858..09f8213 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -331,6 +331,24 @@ enum symbol_visibility
   VISIBILITY_INTERNAL
 };
 
+/* enums used by the targetm.excess_precision hook.  */
+
+enum flt_eval_method
+{
+  FLT_EVAL_METHOD_UNPREDICTABLE = -1,
+  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT = 0,
+  FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE = 1,
+  FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE = 2,
+  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 = 16
+};
+
+enum excess_precision_type
+{
+  EXCESS_PRECISION_TYPE_IMPLICIT,
+  EXCESS_PRECISION_TYPE_STANDARD,
+  EXCESS_PRECISION_TYPE_FAST
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 85341ae..fecb08c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -947,6 +947,10 @@ sign-extend the result to 64 bits.  On such machines, set
 Do not define this macro if it would never modify @var{m}.
 @end defmac
 
+@deftypefn {Target Hook} {enum flt_eval_method} TARGET_C_EXCESS_PRECISION (enum excess_precision_type @var{type})
+Return a value, with the same meaning as @code{FLT_EVAL_METHOD} C that describes which excess precision should be applied.  @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT}, @code{EXCESS_PRECISION_TYPE_FAST}, or @code{EXCESS_PRECISION_TYPE_STANDARD}.  For @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which precision and range operations will be implictly evaluated in regardless of the excess precision explicitly added.  For @code{EXCESS_PRECISION_TYPE_STANDARD} and @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the explicit excess precision that should be added depending on the value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_PROMOTE_FUNCTION_MODE (const_tree @var{type}, machine_mode @var{mode}, int *@var{punsignedp}, const_tree @var{funtype}, int @var{for_return})
 Like @code{PROMOTE_MODE}, but it is applied to outgoing function arguments or
 function return values.  The target hook should return the new mode
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 400d574..03758b5 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -921,6 +921,8 @@ sign-extend the result to 64 bits.  On such machines, set
 Do not define this macro if it would never modify @var{m}.
 @end defmac
 
+@hook TARGET_C_EXCESS_PRECISION
+
 @hook TARGET_PROMOTE_FUNCTION_MODE
 
 @defmac PARM_BOUNDARY
diff --git a/gcc/target.def b/gcc/target.def
index caeeff9..bf4fb29 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5452,6 +5452,23 @@ DEFHOOK_UNDOC
  machine_mode, (char c),
  default_mode_for_suffix)
 
+DEFHOOK
+(excess_precision,
+ "Return a value, with the same meaning as @code{FLT_EVAL_METHOD} C that\
+ describes which excess precision should be applied.  @var{type} is\
+ either @code{EXCESS_PRECISION_TYPE_IMPLICIT},\
+ @code{EXCESS_PRECISION_TYPE_FAST}, or\
+ @code{EXCESS_PRECISION_TYPE_STANDARD}.  For\
+ @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which\
+ precision and range operations will be implictly evaluated in regardless\
+ of the excess precision explicitly added.  For\
+ @code{EXCESS_PRECISION_TYPE_STANDARD} and\
+ @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the\
+ explicit excess precision that should be added depending on the\
+ value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.",
+ enum flt_eval_method, (enum excess_precision_type type),
+ default_excess_precision)
+
 HOOK_VECTOR_END (c)
 
 /* Functions specific to the C++ frontend.  */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 866747a..73e1c25 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2135,4 +2135,12 @@ default_min_arithmetic_precision (void)
   return WORD_REGISTER_OPERATIONS ? BITS_PER_WORD : BITS_PER_UNIT;
 }
 
+/* Default implementation of TARGET_

[Patch 4/17] Implement TARGET_C_EXCESS_PRECISION for m68k

2016-11-11 Thread James Greenhalgh

---
This patch was approved in the original form, and the delta to here would
apply under the obvious rule:
  https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02414.html
---

On Fri, Sep 30, 2016 at 11:28:28AM -0600, Jeff Law wrote:
> On 09/30/2016 11:01 AM, James Greenhalgh wrote:
> >
> >Hi,
> >
> >This patch ports the logic from m68k's TARGET_FLT_EVAL_METHOD to the new
> >target hook TARGET_C_EXCESS_PRECISION.
> >
> >Patch tested by building an m68k-none-elf toolchain and running
> >m68k.exp (without the ability to execute) with no regressions, and manually
> >inspecting the output assembly code when compiling
> >testsuite/gcc.target/i386/excess-precision* to show no difference in
> >code-generation.
> >
> >OK?
> >
> >Thanks,
> >James
> >
> >---
> >gcc/
> >
> >2016-09-30  James Greenhalgh  
> >
> > * config/m68k/m68k.c (m68k_excess_precision): New.
> > (TARGET_C_EXCESS_PRECISION): Define.
> OK when prereqs are approved.  Similarly for other targets where you
> needed to add this hook.

Thanks Jeff, Andreas,

I spotted a very silly bug when I was retesting this patch set - when I
swapped the namespace for the new traget macro it changed from
TARGET_EXCESS_PRECISION to TARGET_C_EXCESS_PRECISION but I failed to
update the m68k patch to reflect that.

This second revision fixes that (obvious) oversight.

Thanks,
James

---
gcc/

2016-11-09  James Greenhalgh  

* config/m68k/m68k.c (m68k_excess_precision): New.
(TARGET_C_EXCESS_PRECISION): Define.

diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index ce56692..22165d6 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -183,6 +183,8 @@ static rtx m68k_function_arg (cumulative_args_t, machine_mode,
 static bool m68k_cannot_force_const_mem (machine_mode mode, rtx x);
 static bool m68k_output_addr_const_extra (FILE *, rtx);
 static void m68k_init_sync_libfuncs (void) ATTRIBUTE_UNUSED;
+static enum flt_eval_method
+m68k_excess_precision (enum excess_precision_type);
 
 /* Initialize the GCC target structure.  */
 
@@ -323,6 +325,9 @@ static void m68k_init_sync_libfuncs (void) ATTRIBUTE_UNUSED;
 #undef TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA
 #define TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA m68k_output_addr_const_extra
 
+#undef TARGET_C_EXCESS_PRECISION
+#define TARGET_C_EXCESS_PRECISION m68k_excess_precision
+
 /* The value stored by TAS.  */
 #undef TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
 #define TARGET_ATOMIC_TEST_AND_SET_TRUEVAL 128
@@ -6532,4 +6537,36 @@ m68k_epilogue_uses (int regno ATTRIBUTE_UNUSED)
 	  == m68k_fk_interrupt_handler));
 }
 
+
+/* Implement TARGET_C_EXCESS_PRECISION.
+
+   Set the value of FLT_EVAL_METHOD in float.h.  When using 68040 fp
+   instructions, we get proper intermediate rounding, otherwise we
+   get extended precision results.  */
+
+static enum flt_eval_method
+m68k_excess_precision (enum excess_precision_type type)
+{
+  switch (type)
+{
+  case EXCESS_PRECISION_TYPE_FAST:
+	/* The fastest type to promote to will always be the native type,
+	   whether that occurs with implicit excess precision or
+	   otherwise.  */
+	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+  case EXCESS_PRECISION_TYPE_STANDARD:
+  case EXCESS_PRECISION_TYPE_IMPLICIT:
+	/* Otherwise, the excess precision we want when we are
+	   in a standards compliant mode, and the implicit precision we
+	   provide can be identical.  */
+	if (TARGET_68040 || ! TARGET_68881)
+	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+
+	return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
+  default:
+	gcc_unreachable ();
+}
+  return FLT_EVAL_METHOD_UNPREDICTABLE;
+}
+
 #include "gt-m68k.h"


[Patch 5/17] Add -fpermitted-flt-eval-methods=[c11|ts-18661-3]

2016-11-11 Thread James Greenhalgh

---
This patch was approved here:
  https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02405.html

Joseph had a comment in
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00335.html that the tests
should check FLT_EVAL_METHOD from <float.h> rather than
__FLT_EVAL_METHOD__. Rather than implement that suggestion, I added tests
to patch 6 which tested the <float.h> macro, and left the tests in this
patch testing the internal macro.
---

Hi,

This option is added to control which values of FLT_EVAL_METHOD the
compiler is allowed to set.

ISO/IEC TS 18661-3 defines new permissible values for
FLT_EVAL_METHOD that indicate that operations and constants with
a semantic type that is an interchange or extended format should be
evaluated to the precision and range of that type.  These new values are
a superset of those permitted under C99/C11, which does not specify the
meaning of other positive values of FLT_EVAL_METHOD.  As such, code
conforming to C11 may not have been written expecting the possibility of
the new values.

-fpermitted-flt-eval-methods specifies whether the compiler
should allow only the values of FLT_EVAL_METHOD specified in C99/C11,
or the extended set of values specified in ISO/IEC TS 18661-3.

The two possible values this option can take are "c11" or "ts-18661-3".

The default when in a standards compliant mode (-std=c11 or similar)
is -fpermitted-flt-eval-methods=c11.  The default when in a GNU
dialect (-std=gnu11 or similar) is -fpermitted-flt-eval-methods=ts-18661-3.

I've added two testcases which test that when this option, or a C standards
dialect, would restrict the range of values to {-1, 0, 1, 2}, those are
the only values we see. At this stage in the patch series this trivially
holds for all targets.

Bootstrapped on x86_64 with no issues.

Thanks,
James

---
gcc/c-family/

2016-11-09  James Greenhalgh  

* c-opts.c (c_common_post_options): Add logic to handle the default
case for -fpermitted-flt-eval-methods.

gcc/

2016-11-09  James Greenhalgh  

* common.opt (fpermitted-flt-eval-methods): New.
* doc/invoke.texi (-fpermitted-flt-eval-methods): Document it.
* flag_types.h (permitted_flt_eval_methods): New.

gcc/testsuite/

2016-11-09  James Greenhalgh  

* gcc.dg/fpermitted-flt-eval-methods_1.c: New.
* gcc.dg/fpermitted-flt-eval-methods_2.c: New.

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index de260e7..57717ff 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -788,6 +788,18 @@ c_common_post_options (const char **pfilename)
   && flag_unsafe_math_optimizations == 0)
 flag_fp_contract_mode = FP_CONTRACT_OFF;
 
+  /* If we are compiling C, and we are outside of a standards mode,
+ we can permit the new values from ISO/IEC TS 18661-3 for
+ FLT_EVAL_METHOD.  Otherwise, we must restrict the possible values to
+ the set specified in ISO C99/C11.  */
+  if (!flag_iso
+  && !c_dialect_cxx ()
+  && (global_options_set.x_flag_permitted_flt_eval_methods
+	  == PERMITTED_FLT_EVAL_METHODS_DEFAULT))
+flag_permitted_flt_eval_methods = PERMITTED_FLT_EVAL_METHODS_TS_18661;
+  else
+flag_permitted_flt_eval_methods = PERMITTED_FLT_EVAL_METHODS_C11;
+
   /* By default we use C99 inline semantics in GNU99 or C99 mode.  C99
  inline semantics are not supported in GNU89 or C89 mode.  */
   if (flag_gnu89_inline == -1)
diff --git a/gcc/common.opt b/gcc/common.opt
index 314145a..915c406 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1326,6 +1326,21 @@ Enum(excess_precision) String(fast) Value(EXCESS_PRECISION_FAST)
 EnumValue
 Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD)
 
+; Whether we permit the extended set of values for FLT_EVAL_METHOD
+; introduced in ISO/IEC TS 18661-3, or limit ourselves to those in C99/C11.
+fpermitted-flt-eval-methods=
+Common Joined RejectNegative Enum(permitted_flt_eval_methods) Var(flag_permitted_flt_eval_methods) Init(PERMITTED_FLT_EVAL_METHODS_DEFAULT)
+-fpermitted-flt-eval-methods=[c11|ts-18661]	Specify which values of FLT_EVAL_METHOD are permitted.
+
+Enum
+Name(permitted_flt_eval_methods) Type(enum permitted_flt_eval_methods) UnknownError(unknown specification for the set of FLT_EVAL_METHOD values to permit %qs)
+
+EnumValue
+Enum(permitted_flt_eval_methods) String(c11) Value(PERMITTED_FLT_EVAL_METHODS_C11)
+
+EnumValue
+Enum(permitted_flt_eval_methods) String(ts-18661-3) Value(PERMITTED_FLT_EVAL_METHODS_TS_18661)
+
 ffast-math
 Common Optimization
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f133b3a..75ff8ec 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -378,7 +378,8 @@ Objective-C and Objective-C++ Dialects}.
 -flto-partition=@var{alg} -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
 -fmove-loop-invariants -fno-branch-count-reg @gol
--fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse @gol
+-fno-defer-pop -fno-fp-int-builtin-inexact @gol
+-fpermitt

[Patch 6/17] Migrate excess precision logic to use TARGET_EXCESS_PRECISION

2016-11-11 Thread James Greenhalgh

---
This patch has already been approved:
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00784.html
---

This patch moves the logic for excess precision from using the
TARGET_FLT_EVAL_METHOD macro to the TARGET_EXCESS_PRECISION hook
introduced earlier in the patch series.

Briefly, we have four things to change.

  1) The logic in tree.c::excess_precision_type .
  Here we want to ask the target which excess precision it would like for
  whichever of -fexcess-precision=standard or -fexcess-precision=fast is
  in use, then apply that.

  2) The logic in c-family/c-cppbuiltin.c::c_cpp_flt_eval_method_iec_559 .
  We want to update this to return TRUE if the excess precision proposed
  by the front-end is the excess precision that will actually be used.

  3) The logic in c-family/c-cppbuiltin.c::c_cpp_builtin for setting
  __FLT_EVAL_METHOD__ .
  This is now a little more complicated, and makes use of
  -fpermitted-flt-eval-methods from patch 5. We also set
  __FLT_EVAL_METHOD_TS_18661_3__.

  4) The logic in c-family/c-cppbuiltin.c::c_cpp_builtin for setting
  __LIBGCC_*_EXCESS_PRECISION__ .

Having moved the logic in to those areas, we can simplify
toplev.c::init_excess_precision , which now only retains the assert that
-fexcess-precision=default has been rewritten by the language front-end, and
the set from the command-line variable to the internal variable.

Finally we need to update <float.h> to return the appropriate value for
FLT_EVAL_METHOD based on the new macros we add.
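As background for that (generic C, not part of the patch), FLT_EVAL_METHOD
also drives the float_t/double_t typedefs in the math header; the helper
below is invented for this sketch:

```c
#include <float.h>
#include <math.h>

/* FLT_EVAL_METHOD selects the evaluation format:
     0: each operation evaluates in the range/precision of its type
        (float_t == float, double_t == double);
     1: float and double operations evaluate as double;
     2: everything evaluates as long double;
    -1: indeterminate.  */

static int
eval_types_at_least_as_wide (void)
{
  /* Whatever the method, the evaluation types must be at least as wide
     as the semantic types they carry.  */
  return sizeof (float_t) >= sizeof (float)
	 && sizeof (double_t) >= sizeof (double);
}
```

This invariant is what lets libgcc and user code rely on the
__LIBGCC_*_EXCESS_PRECISION__ macros staying consistent with the chosen
evaluation method.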

The documentation in invoke.texi is not quite right for the impact of
-fexcess-precision, so I've rewritten the text to read a little more
generically.

Thanks,
James

---
gcc/

2016-11-09  James Greenhalgh  

* toplev.c (init_excess_precision): Delete most logic.
* tree.c (excess_precision_type): Rewrite to use
TARGET_EXCESS_PRECISION.
* doc/invoke.texi (-fexcess-precision): Document behaviour in a
more generic fashion.
* ginclude/float.h: Wrap definition of FLT_EVAL_METHOD in
__STDC_WANT_IEC_60559_TYPES_EXT__.

gcc/c-family/

2016-11-09  James Greenhalgh  

* c-common.c (excess_precision_mode_join): New.
(c_ts18661_flt_eval_method): New.
(c_c11_flt_eval_method): Likewise.
(c_flt_eval_method): Likewise.
* c-common.h (excess_precision_mode_join): New.
(c_flt_eval_method): Likewise.
* c-cppbuiltin.c (c_cpp_flt_eval_method_iec_559): New.
(cpp_iec_559_value): Call it.
(c_cpp_builtins): Modify logic for __LIBGCC_*_EXCESS_PRECISION__,
call c_flt_eval_method to set __FLT_EVAL_METHOD__ and
__FLT_EVAL_METHOD_TS_18661_3__.

gcc/testsuite/

2016-11-09  James Greenhalgh  

* gcc.dg/fpermitted-flt-eval-methods_3.c: New.
* gcc.dg/fpermitted-flt-eval-methods_4.c: Likewise.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 307862b..9f0b4a6 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -7944,4 +7944,86 @@ cb_get_suggestion (cpp_reader *, const char *goal,
   return bm.get_best_meaningful_candidate ();
 }
 
+/* Return the lattice point which is the wider of the two FLT_EVAL_METHOD
+   modes X, Y.  This isn't just >, as the FLT_EVAL_METHOD values added
+   by C TS 18661-3 for interchange types that are computed in their
+   native precision are larger than the C11 values for evaluating in the
+   precision of float/double/long double.  If either mode is
+   FLT_EVAL_METHOD_UNPREDICTABLE, return that.  */
+
+enum flt_eval_method
+excess_precision_mode_join (enum flt_eval_method x,
+			enum flt_eval_method y)
+{
+  if (x == FLT_EVAL_METHOD_UNPREDICTABLE
+  || y == FLT_EVAL_METHOD_UNPREDICTABLE)
+return FLT_EVAL_METHOD_UNPREDICTABLE;
+
+  /* GCC only supports one interchange type right now, _Float16.  If
+ we're evaluating _Float16 in 16-bit precision, then flt_eval_method
+ will be FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.  */
+  if (x == FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16)
+return y;
+  if (y == FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16)
+return x;
+
+  /* Other values for flt_eval_method are directly comparable, and we want
+ the maximum.  */
+  return MAX (x, y);
+}
+
+/* Return the value that should be set for FLT_EVAL_METHOD in the
+   context of ISO/IEC TS 18661-3.
+
+   This relates to the effective excess precision seen by the user,
+   which is the join point of the precision the target requests for
+   -fexcess-precision={standard,fast} and the implicit excess precision
+   the target uses.  */
+
+static enum flt_eval_method
+c_ts18661_flt_eval_method (void)
+{
+  enum flt_eval_method implicit
+= targetm.c.excess_precision (EXCESS_PRECISION_TYPE_IMPLICIT);
+
+  enum excess_precision_type flag_type
+= (flag_excess_precision_cmdline == EXCESS_PRECISION_STANDARD
+   ? EXCESS_PRECISION_TYPE_STANDARD
+   : EXCESS_PRECISION_TYPE_FAST);
+
+  enum flt_eval_method requested
+= targetm.c.excess_precision (flag_type);
+
+  retu

Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-11-11 Thread Bernd Schmidt

On 11/11/2016 04:35 PM, Alexander Monakov wrote:


> For the avoidance of doubt, is this a statement of intent, or an actual
> approval for the patchset?
>
> After these backend modifications and the rest of libgomp/middle-end
> changes are applied, trunk will need the following flip-the-switch patch
> to allow OpenMP offloading for NVPTX.  OK?

Ok for everything.


Bernd



[Patch 7/17] Delete TARGET_FLT_EVAL_METHOD and poison it.

2016-11-11 Thread James Greenhalgh

---
This patch was approved:

  https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02401.html
---

Hi,

We've removed all uses of TARGET_FLT_EVAL_METHOD, so we can remove it
and poison it.

Bootstrapped and tested on x86-64 and AArch64. Tested on s390 and m68k
to the best of my ability (no execute tests).

Thanks,
James

---
gcc/

2016-11-09  James Greenhalgh  

* config/s390/s390.h (TARGET_FLT_EVAL_METHOD): Delete.
* config/m68k/m68k.h (TARGET_FLT_EVAL_METHOD): Delete.
* config/i386/i386.h (TARGET_FLT_EVAL_METHOD): Delete.
* defaults.h (TARGET_FLT_EVAL_METHOD): Delete.
* doc/tm.texi.in (TARGET_FLT_EVAL_METHOD): Delete.
* doc/tm.texi: Regenerate.
* system.h (TARGET_FLT_EVAL_METHOD): Poison.

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..a2fcdcc 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -686,17 +686,6 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
   SUBTARGET_EXTRA_SPECS
 
 
-/* Set the value of FLT_EVAL_METHOD in float.h.  When using only the
-   FPU, assume that the fpcw is set to extended precision; when using
-   only SSE, rounding is correct; when using both SSE and the FPU,
-   the rounding precision is indeterminate, since either may be chosen
-   apparently at random.  */
-#define TARGET_FLT_EVAL_METHOD		\
-  (TARGET_80387\
-   ? (TARGET_MIX_SSE_I387 ? -1		\
-  : (TARGET_SSE_MATH ? (TARGET_SSE2 ? 0 : -1) : 2))			\
-   : 0)
-
 /* Whether to allow x87 floating-point arithmetic on MODE (one of
SFmode, DFmode and XFmode) in the current excess precision
configuration.  */
diff --git a/gcc/config/m68k/m68k.h b/gcc/config/m68k/m68k.h
index 2aa858f..2021e9d 100644
--- a/gcc/config/m68k/m68k.h
+++ b/gcc/config/m68k/m68k.h
@@ -281,11 +281,6 @@ along with GCC; see the file COPYING3.  If not see
 #define LONG_DOUBLE_TYPE_SIZE			\
   ((TARGET_COLDFIRE || TARGET_FIDOA) ? 64 : 80)
 
-/* Set the value of FLT_EVAL_METHOD in float.h.  When using 68040 fp
-   instructions, we get proper intermediate rounding, otherwise we
-   get extended precision results.  */
-#define TARGET_FLT_EVAL_METHOD ((TARGET_68040 || ! TARGET_68881) ? 0 : 2)
-
 #define BITS_BIG_ENDIAN 1
 #define BYTES_BIG_ENDIAN 1
 #define WORDS_BIG_ENDIAN 1
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 501c8e4..6be4d34 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -247,11 +247,6 @@ extern const char *s390_host_detect_local_cpu (int argc, const char **argv);
 #define S390_TDC_INFINITY (S390_TDC_POSITIVE_INFINITY \
 			  | S390_TDC_NEGATIVE_INFINITY )
 
-/* This is used by float.h to define the float_t and double_t data
-   types.  For historical reasons both are double on s390 what cannot
-   be changed anymore.  */
-#define TARGET_FLT_EVAL_METHOD 1
-
 /* Target machine storage layout.  */
 
 /* Everything is big-endian.  */
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 9c40002..2536f76 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -942,9 +942,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define REG_WORDS_BIG_ENDIAN WORDS_BIG_ENDIAN
 #endif
 
-#ifndef TARGET_FLT_EVAL_METHOD
-#define TARGET_FLT_EVAL_METHOD 0
-#endif
 
 #ifndef TARGET_DEC_EVAL_METHOD
 #define TARGET_DEC_EVAL_METHOD 2
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index fecb08c..13e66f7 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -1566,13 +1566,6 @@ uses this macro should also arrange to use @file{t-gnu-prefix} in
 the libgcc @file{config.host}.
 @end defmac
 
-@defmac TARGET_FLT_EVAL_METHOD
-A C expression for the value for @code{FLT_EVAL_METHOD} in @file{float.h},
-assuming, if applicable, that the floating-point control word is in its
-default state.  If you do not define this macro the value of
-@code{FLT_EVAL_METHOD} will be zero.
-@end defmac
-
 @defmac WIDEST_HARDWARE_FP_SIZE
 A C expression for the size in bits of the widest floating-point format
 supported by the hardware.  If you define this macro, you must specify a
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 03758b5..7f75a44 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1402,13 +1402,6 @@ uses this macro should also arrange to use @file{t-gnu-prefix} in
 the libgcc @file{config.host}.
 @end defmac
 
-@defmac TARGET_FLT_EVAL_METHOD
-A C expression for the value for @code{FLT_EVAL_METHOD} in @file{float.h},
-assuming, if applicable, that the floating-point control word is in its
-default state.  If you do not define this macro the value of
-@code{FLT_EVAL_METHOD} will be zero.
-@end defmac
-
 @defmac WIDEST_HARDWARE_FP_SIZE
 A C expression for the size in bits of the widest floating-point format
 supported by the hardware.  If you define this macro, you must specify a
diff --git a/gcc/system.h b/gcc/system.h
index 8c6127c..6319f57 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -896,7 +896,7 @@ extern void fancy_abort (const char *, int, const char 

[Patch 8/17] Make _Float16 available if HFmode is available

2016-11-11 Thread James Greenhalgh

---
This patch was approved:
  https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02403.html
---

Hi,

Now that we've worked on -fexcess-precision, the comment in targhooks.c
no longer holds. We can now permit _Float16 on any target which provides
HFmode and supports HFmode in libgcc.

Bootstrapped and tested on x86-64, and in series on AArch64.

Thanks,
James

---
2016-11-09  James Greenhalgh  

* targhooks.c (default_floatn_mode): Enable _Float16 if a target
provides HFmode.

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 73e1c25..a80b301 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -514,10 +514,12 @@ default_floatn_mode (int n, bool extended)
   switch (n)
 	{
 	case 16:
-	  /* We do not use HFmode for _Float16 by default because the
-	 required excess precision support is not present and the
-	 interactions with promotion of the older __fp16 need to
-	 be worked out.  */
+	  /* Always enable _Float16 if we have basic support for the mode.
+	 Targets can control the range and precision of operations on
+	 the _Float16 type using TARGET_C_EXCESS_PRECISION.  */
+#ifdef HAVE_HFmode
+	  cand = HFmode;
+#endif
 	  break;
 
 	case 32:


[Patch testsuite patch 10/17] Add options for floatN when checking effective target for support

2016-11-11 Thread James Greenhalgh

Hi,

As Joseph and I discussed in
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00239.html the
check_effective_target_floatN tests should add any options which the
target requires to enable those float types (add_options_for_floatN).

This patch ensures those options get added, which presently only changes
the testsuite behaviour for the (out-of-tree but submitted for review)
ARM support for _Float16.

Tested on arm-none-linux-gnueabihf with those patches applied, to show that
the _Float16 tests now run.

OK?

Thanks,
James

---

gcc/testsuite/

2016-11-07  James Greenhalgh  

* lib/target-supports.exp (check_effective_target_float16): Add
options for _Float16.
(check_effective_target_float32): Add options for _Float32.
(check_effective_target_float64): Add options for _Float64.
(check_effective_target_float128): Add options for _Float128.
(check_effective_target_float32x): Add options for _Float32x.
(check_effective_target_float64x): Add options for _Float64x.
(check_effective_target_float128x): Add options for _Float128x.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b683c09..b917250 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2483,43 +2483,43 @@ proc check_effective_target_has_q_floating_suffix { } {
 proc check_effective_target_float16 {} {
 return [check_no_compiler_messages_nocache float16 object {
 _Float16 x;
-}]
+} [add_options_for_float16 ""]]
 }
 
 proc check_effective_target_float32 {} {
 return [check_no_compiler_messages_nocache float32 object {
 _Float32 x;
-}]
+} [add_options_for_float32 ""]]
 }
 
 proc check_effective_target_float64 {} {
 return [check_no_compiler_messages_nocache float64 object {
 _Float64 x;
-}]
+} [add_options_for_float64 ""]]
 }
 
 proc check_effective_target_float128 {} {
 return [check_no_compiler_messages_nocache float128 object {
 _Float128 x;
-}]
+} [add_options_for_float128 ""]]
 }
 
 proc check_effective_target_float32x {} {
 return [check_no_compiler_messages_nocache float32x object {
 _Float32x x;
-}]
+} [add_options_for_float32x ""]]
 }
 
 proc check_effective_target_float64x {} {
 return [check_no_compiler_messages_nocache float64x object {
 _Float64x x;
-}]
+} [add_options_for_float64x ""]]
 }
 
 proc check_effective_target_float128x {} {
 return [check_no_compiler_messages_nocache float128x object {
 _Float128x x;
-}]
+} [add_options_for_float128x ""]]
 }
 
 # Likewise, but runtime support for any special options used as well


[Patch libgcc 9/17] Update soft-fp from glibc

2016-11-11 Thread James Greenhalgh

---
This patch can be self-approved
---

Hi,

This patch merges in the support added to glibc for HFmode conversions in
this patch:

commit 87ab10d6524fe4faabd7eb3eac5868165ecfb323
Author: James Greenhalgh 
Date:   Wed Sep 21 21:02:54 2016 +

[soft-fp] Add support for various half-precision conversion routines.

This patch adds conversion routines required for _Float16 support in
AArch64.

These are one-step conversions to and from TImode and TFmode. We need
these on AArch64 regardless of presence of the ARMv8.2-A 16-bit
floating-point extensions.

In the patch, soft-fp/half.h is derived from soft-fp/single.h .  The
conversion routines are derivatives of their respective SFmode
variants.
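For reference, the IEEE binary16 layout that soft-fp/half.h describes has
1 sign bit, 5 exponent bits (bias 15), and 10 fraction bits.  A
hypothetical host-side decoder (illustrative only, not part of soft-fp)
makes the encoding concrete:

```c
#include <math.h>
#include <stdint.h>

/* 2^e by repeated multiplication, to keep the sketch free of libm
   dependencies.  */
static double
two_pow (int e)
{
  double r = 1.0;
  while (e > 0) { r *= 2.0; e--; }
  while (e < 0) { r *= 0.5; e++; }
  return r;
}

/* Decode an IEEE binary16 value: 1 sign bit, 5 exponent bits (bias 15),
   10 fraction bits.  */
static double
half_to_double (uint16_t h)
{
  int sign = h >> 15;
  int exp = (h >> 10) & 0x1f;
  int frac = h & 0x3ff;
  double v;

  if (exp == 0)
    v = frac * two_pow (-24);		/* zero or subnormal */
  else if (exp == 31)
    v = frac ? NAN : INFINITY;		/* NaN or infinity */
  else
    v = (1024 + frac) * two_pow (exp - 25); /* (1 + frac/1024) * 2^(exp-15) */

  return sign ? -v : v;
}
```

The largest finite half value is 0x7BFF = 65504.0, which is why the AArch64
patches later in the series can saturate wide integers before converting.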

* soft-fp/extendhftf2.c: New.
* soft-fp/fixhfti.c: Likewise.
* soft-fp/fixunshfti.c: Likewise.
* soft-fp/floattihf.c: Likewise.
* soft-fp/floatuntihf.c: Likewise.
* soft-fp/half.h: Likewise.
* soft-fp/trunctfhf2.c: Likewise.

Any patch merging from upstream is preapproved according to our commit
policies, but I'll hold off on committing it until the others in this
series have been approved.

Thanks,
James

---
libgcc/

2016-11-09  James Greenhalgh  

* soft-fp/extendhftf2.c: New.
* soft-fp/fixhfti.c: Likewise.
* soft-fp/fixunshfti.c: Likewise.
* soft-fp/floattihf.c: Likewise.
* soft-fp/floatuntihf.c: Likewise.
* soft-fp/half.h: Likewise.
* soft-fp/trunctfhf2.c: Likewise.

diff --git a/libgcc/soft-fp/extendhftf2.c b/libgcc/soft-fp/extendhftf2.c
new file mode 100644
index 000..6ff6438
--- /dev/null
+++ b/libgcc/soft-fp/extendhftf2.c
@@ -0,0 +1,53 @@
+/* Software floating-point emulation.
+   Return an IEEE half converted to IEEE quad
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "quad.h"
+
+TFtype
+__extendhftf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_Q (R);
+  TFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+  FP_EXTEND (Q, H, 4, 1, R, A);
+#else
+  FP_EXTEND (Q, H, 2, 1, R, A);
+#endif
+  FP_PACK_RAW_Q (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/libgcc/soft-fp/fixhfti.c b/libgcc/soft-fp/fixhfti.c
new file mode 100644
index 000..3610f4c
--- /dev/null
+++ b/libgcc/soft-fp/fixhfti.c
@@ -0,0 +1,45 @@
+/* Software floating-point emulation.
+   Convert IEEE half to 128bit signed integer
+   Copyright (C) 2007-2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTI

[Patch libgcc AArch64 12/17] Enable hfmode soft-float conversions and truncations

2016-11-11 Thread James Greenhalgh

Hi,

This patch enables the conversion functions we need for AArch64's _Float16
support. To do that we need to implement TARGET_SCALAR_MODE_SUPPORTED_P,
so do that now.

OK?

Thanks,
James

---
gcc/

2016-11-09  James Greenhalgh  

* config/aarch64/aarch64-c.c (aarch64_scalar_mode_supported_p): New.
(TARGET_SCALAR_MODE_SUPPORTED_P): Define.

libgcc/

2016-11-09  James Greenhalgh  

* config/aarch64/sfp-machine.h (_FP_NANFRAC_H): Define.
(_FP_NANSIGN_H): Likewise.
* config/aarch64/t-softfp (softfp_extensions): Add hftf.
(softfp_truncations): Add tfhf.
(softfp_extras): Add required conversion functions.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b7d4640..ec17af4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14168,6 +14168,17 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 }
 }
 
+/* Implement TARGET_SCALAR_MODE_SUPPORTED_P - return TRUE
+   if MODE is HFmode, and punt to the generic implementation otherwise.  */
+
+static bool
+aarch64_scalar_mode_supported_p (machine_mode mode)
+{
+  return (mode == HFmode
+	  ? true
+	  : default_scalar_mode_supported_p (mode));
+}
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
@@ -14378,6 +14389,9 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS aarch64_rtx_costs_wrapper
 
+#undef TARGET_SCALAR_MODE_SUPPORTED_P
+#define TARGET_SCALAR_MODE_SUPPORTED_P aarch64_scalar_mode_supported_p
+
 #undef TARGET_SCHED_ISSUE_RATE
 #define TARGET_SCHED_ISSUE_RATE aarch64_sched_issue_rate
 
diff --git a/libgcc/config/aarch64/sfp-machine.h b/libgcc/config/aarch64/sfp-machine.h
index 5efa245..da154dd 100644
--- a/libgcc/config/aarch64/sfp-machine.h
+++ b/libgcc/config/aarch64/sfp-machine.h
@@ -42,9 +42,11 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
 
 #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_H		((_FP_QNANBIT_H << 1) - 1)
 #define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
 #define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1)
 #define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1
+#define _FP_NANSIGN_H		0
 #define _FP_NANSIGN_S		0
 #define _FP_NANSIGN_D		0
 #define _FP_NANSIGN_Q		0
diff --git a/libgcc/config/aarch64/t-softfp b/libgcc/config/aarch64/t-softfp
index 586dca2..c4ce0dc 100644
--- a/libgcc/config/aarch64/t-softfp
+++ b/libgcc/config/aarch64/t-softfp
@@ -1,8 +1,9 @@
 softfp_float_modes := tf
 softfp_int_modes := si di ti
-softfp_extensions := sftf dftf
-softfp_truncations := tfsf tfdf
+softfp_extensions := sftf dftf hftf
+softfp_truncations := tfsf tfdf tfhf
 softfp_exclude_libgcc2 := n
+softfp_extras := fixhfti fixunshfti floattihf floatuntihf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
 


[Patch AArch64 11/17] Add floatdihf2 and floatunsdihf2 patterns

2016-11-11 Thread James Greenhalgh

Hi,

This patch adds patterns for conversion from 64-bit integer to 16-bit
floating-point values under AArch64 targets which don't have support for
the ARMv8.2-A 16-bit floating point extensions.

We implement these by first saturating to SImode (we know that any
values >= 65504 will round to infinity after conversion to HFmode), then
converting to DFmode (unsigned conversions could go through SFmode, but
there is no performance benefit to this), and finally converting to HFmode.

Having added these patterns, the expansion path in "expand_float" will
now try to use them for conversions from SImode to HFmode as there is no
floatsihf2 pattern. expand_float first tries widening the integer size and
looking for a match, so it will try SImode -> DImode.  But our DImode
pattern would then saturate back to SImode, which is wasteful.

Better, would be for us to provide float(uns)sihf2 patterns directly.
So that's what this patch does.
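A C-level sketch of the non-F16INST emulation path described above
(illustrative only; the function names here are invented, and the real
lowering happens in RTL via the expanders below):

```c
#include <stdint.h>

/* Step 1: saturate the 64-bit input to SImode.  This is safe because
   every magnitude >= 65504 already rounds to infinity in IEEE half, and
   INT32_MAX/INT32_MIN lie far beyond that.  On AArch64 this step is the
   SIMD saturating narrow (the aarch64_qmovndi pattern).  */
static int32_t
saturate_to_si (int64_t x)
{
  if (x > INT32_MAX)
    return INT32_MAX;
  if (x < INT32_MIN)
    return INT32_MIN;
  return (int32_t) x;
}

/* Step 2: exact int32 -> double conversion.  The target then truncates
   this double to HFmode; double stands in for the final value here since
   _Float16 is not portably available in host C.  */
static double
dihf2_emulation (int64_t x)
{
  return (double) saturate_to_si (x);
}
```

Every SImode value converts exactly to DFmode, so the only rounding happens
in the final DFmode -> HFmode truncation, which is what makes the two-step
scheme correct.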

The testcase added in this patch would fail on trunk for AArch64. There is
no libgcc routine to make the conversion, and we don't provide appropriate
patterns in the backend, so we get a link-time error.

Bootstrapped and tested on aarch64-none-linux-gnu

OK for trunk?

James

---
2016-11-09  James Greenhalgh  

* config/aarch64/aarch64.md (sihf2): Convert to expand.
(dihf2): Likewise.
(aarch64_fp16_hf2): New.

2016-11-09  James Greenhalgh  

* gcc.target/aarch64/floatdihf2_1.c: New.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 46eaa30..b818968 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4627,7 +4627,14 @@
   [(set_attr "type" "f_cvti2f")]
 )
 
-(define_insn "hf2"
+;; If we do not have ARMv8.2-A 16-bit floating point extensions, the
+;; midend will arrange for an SImode conversion to HFmode to first go
+;; through DFmode, then to HFmode.  But first it will try converting
+;; to DImode then down, which would match our DImode pattern below and
+;; give very poor code-generation.  So, we must provide our own emulation
+;; of the mid-end logic.
+
+(define_insn "aarch64_fp16_hf2"
   [(set (match_operand:HF 0 "register_operand" "=w")
 	(FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))]
   "TARGET_FP_F16INST"
@@ -4635,6 +4642,53 @@
   [(set_attr "type" "f_cvti2f")]
 )
 
+(define_expand "<optab>sihf2"
+  [(set (match_operand:HF 0 "register_operand")
+	(FLOATUORS:HF (match_operand:SI 1 "register_operand")))]
+  "TARGET_FLOAT"
+{
+  if (TARGET_FP_F16INST)
+    emit_insn (gen_aarch64_fp16_<optab>sihf2 (operands[0], operands[1]));
+  else
+    {
+      rtx convert_target = gen_reg_rtx (DFmode);
+      emit_insn (gen_<optab>sidf2 (convert_target, operands[1]));
+      emit_insn (gen_truncdfhf2 (operands[0], convert_target));
+    }
+  DONE;
+}
+)
+
+;; For DImode there is no wide enough floating-point mode that we
+;; can convert through natively (TFmode would work, but requires a library
+;; call).  However, we know that any value >= 65504 will be rounded
+;; to infinity on conversion.  This is well within the range of SImode, so
+;; we can:
+;;   Saturate to SImode.
+;;   Convert from that to DFmode
+;;   Convert from that to HFmode (phew!).
+;; Note that the saturation to SImode requires the SIMD extensions.  If
+;; we ever need to provide this pattern where the SIMD extensions are not
+;; available, we would need a different approach.
+
+(define_expand "<optab>dihf2"
+  [(set (match_operand:HF 0 "register_operand")
+	(FLOATUORS:HF (match_operand:DI 1 "register_operand")))]
+  "TARGET_FLOAT && (TARGET_FP_F16INST || TARGET_SIMD)"
+{
+  if (TARGET_FP_F16INST)
+    emit_insn (gen_aarch64_fp16_<optab>dihf2 (operands[0], operands[1]));
+  else
+    {
+      rtx sat_target = gen_reg_rtx (SImode);
+      emit_insn (gen_aarch64_<su>qmovndi (sat_target, operands[1]));
+      emit_insn (gen_<optab>sihf2 (operands[0], sat_target));
+    }
+
+  DONE;
+}
+)
+
 ;; Convert between fixed-point and floating-point (scalar modes)
 
 (define_insn "3"
diff --git a/gcc/testsuite/gcc.target/aarch64/floatdihf2_1.c b/gcc/testsuite/gcc.target/aarch64/floatdihf2_1.c
new file mode 100644
index 000..9eaa4ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/floatdihf2_1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Test that conversion from 32-bit and 64-bit integers can be done
+   without a call to the support library.  */
+
+#pragma GCC target ("arch=armv8.2-a+nofp16")
+
+__fp16
+foo (int x)
+{
+  return x;
+}
+
+__fp16
+bar (unsigned int x)
+{
+  return x;
+}
+
+__fp16
+fool (long long x)
+{
+  return x;
+}
+
+__fp16
+barl (unsigned long long x)
+{
+  return x;
+}
+
+
+/* { dg-final { scan-assembler-not "__float\\\[ds\\\]ihf2" } } */
+/* { dg-final { scan-assembler-not "__floatun\\\[ds\\\]ihf2" } } */


[Patch AArch64 13/17] Enable _Float16 for AArch64

2016-11-11 Thread James Greenhalgh

 Hi,

This patch adds the back-end wiring to get AArch64 support for
the _Float16 type working.

Bootstrapped on AArch64 with no issues.

OK?

Thanks,
James

---
2016-11-09  James Greenhalgh  

* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Update
__FLT_EVAL_METHOD__ and __FLT_EVAL_METHOD_C99__ when we switch
architecture levels.
* config/aarch64/aarch64.c (aarch64_promoted_type): Only promote
the aarch64_fp16_type_node, not all HFmode types.
(aarch64_libgcc_floating_mode_supported_p): Support HFmode.
(aarch64_scalar_mode_supported_p): Likewise.
(aarch64_excess_precision): New.
(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Define.
(TARGET_SCALAR_MODE_SUPPORTED_P): Likewise.
(TARGET_C_EXCESS_PRECISION): Likewise.

2016-11-09  James Greenhalgh  

* gcc.target/aarch64/_Float16_1.c: New.
* gcc.target/aarch64/_Float16_2.c: Likewise.
* gcc.target/aarch64/_Float16_3.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 422e322..320b912 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -133,6 +133,16 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
 
   aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
   aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
+
+  /* Not for ACLE, but required to keep "float.h" correct if we switch
+     target between implementations that do or do not support ARMv8.2-A
+     16-bit floating-point extensions.  */
+  cpp_undef (pfile, "__FLT_EVAL_METHOD__");
+  builtin_define_with_int_value ("__FLT_EVAL_METHOD__",
+				 c_flt_eval_method (true));
+  cpp_undef (pfile, "__FLT_EVAL_METHOD_C99__");
+  builtin_define_with_int_value ("__FLT_EVAL_METHOD_C99__",
+				 c_flt_eval_method (false));
 }
 
 /* Implement TARGET_CPU_CPP_BUILTINS.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ec17af4..824b27c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14143,12 +14143,20 @@ aarch64_vec_fpconst_pow_of_2 (rtx x)
   return firstval;
 }
 
-/* Implement TARGET_PROMOTED_TYPE to promote __fp16 to float.  */
+/* Implement TARGET_PROMOTED_TYPE to promote 16-bit floating point types
+   to float.
+
+   __fp16 always promotes through this hook.
+   _Float16 may promote if TARGET_FLT_EVAL_METHOD is 16, but we do that
+   through the generic excess precision logic rather than here.  */
+
 static tree
 aarch64_promoted_type (const_tree t)
 {
-  if (SCALAR_FLOAT_TYPE_P (t) && TYPE_PRECISION (t) == 16)
+
+  if (TYPE_P (t) && TYPE_MAIN_VARIANT (t) == aarch64_fp16_type_node)
 return float_type_node;
+
   return NULL_TREE;
 }
 
@@ -14168,6 +14176,17 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 }
 }
 
+/* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE
+   if MODE is HFmode, and punt to the generic implementation otherwise.  */
+
+static bool
+aarch64_libgcc_floating_mode_supported_p (machine_mode mode)
+{
+  return (mode == HFmode
+	  ? true
+	  : default_libgcc_floating_mode_supported_p (mode));
+}
+
 /* Implement TARGET_SCALAR_MODE_SUPPORTED_P - return TRUE
if MODE is HFmode, and punt to the generic implementation otherwise.  */
 
@@ -14179,6 +14198,47 @@ aarch64_scalar_mode_supported_p (machine_mode mode)
 	  : default_scalar_mode_supported_p (mode));
 }
 
+/* Set the value of FLT_EVAL_METHOD.
+   ISO/IEC TS 18661-3 defines two values that we'd like to make use of:
+
+0: evaluate all operations and constants, whose semantic type has at
+   most the range and precision of type float, to the range and
+   precision of float; evaluate all other operations and constants to
+   the range and precision of the semantic type;
+
+N, where _FloatN is a supported interchange floating type
+   evaluate all operations and constants, whose semantic type has at
+   most the range and precision of _FloatN type, to the range and
+   precision of the _FloatN type; evaluate all other operations and
+   constants to the range and precision of the semantic type;
+
+   If we have the ARMv8.2-A extensions then we support _Float16 in native
+   precision, so we should set this to 16.  Otherwise, we support the type,
+   but want to evaluate expressions in float precision, so set this to
+   0.  */
+
+static enum flt_eval_method
+aarch64_excess_precision (enum excess_precision_type type)
+{
+  switch (type)
+{
+  case EXCESS_PRECISION_TYPE_FAST:
+  case EXCESS_PRECISION_TYPE_STANDARD:
+	/* We can calculate either in 16-bit range and precision or
+	   32-bit range and precision.  Make that decision based on whether
+	   we have native support for the ARMv8.2-A 16-bit floating-point
+	   instructions or not.  */
+	return (TARGET_FP_F16INST
+		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
+		: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
+  ca

[Patch 14/17] [libgcc, ARM] Generalise float-to-half conversion function.

2016-11-11 Thread James Greenhalgh

Hi,

I'm adapting this patch from work started by Matthew Wahab.

Conversions from double precision floats to the ARM __fp16 are required
to round only once. A conversion function from double to __fp16 is
needed to support this on soft-fp targets. This and the following patch
add that conversion function by reusing the existing float to __fp16
function config/arm/fp16.c:__gnu_f2h_internal.

This patch generalizes __gnu_f2h_internal by adding a specification of
the source format and reworking the code to make use of it. Initially,
only the binary32 format is supported.

A previous version of this patch had a bug in its rounding logic; the
update in this patch fixes it,

replacing:

>   else
> mask = 0x1fff;

With:

 mask = (point - 1) >> 10;

I've tested that fix by throwing semi-random bit-patterns at the
conversion function, confirming that the software implementation now
matches the hardware behaviour for this routine.
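As a quick sanity check on that expression, the generalized mask
reproduces the old hard-coded binary32 constant (a sketch only;
drop_mask is a hypothetical helper wrapping the patch's expression, not
part of libgcc):

```c
#include <assert.h>

/* A hypothetical helper reproducing the patch's generalized rounding
   mask: with SIGNIFICAND explicitly stored significand bits in the
   source format, the bits dropped when narrowing to half precision's
   10 stored bits are exactly those covered by (point - 1) >> 10.  */
static unsigned long long
drop_mask (unsigned int significand)
{
  unsigned long long point = 1ULL << significand;
  return (point - 1) >> 10;
}
```

For binary32 (23 stored bits) this gives 0x1fff, matching the constant
the old code used; for binary64 (52 stored bits) it gives a 42-bit mask,
which is what the double-to-half path in the next patch needs.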

Additionally, bootstrapped again, and cross-tested with no issues.

OK?

Thanks,
James



libgcc/

2016-11-09  James Greenhalgh  
Matthew Wahab  

* config/arm/fp16.c (struct format): New.
(binary32): New.
(__gnu_float2h_internal): New.  Body moved from
__gnu_f2h_internal and generalized.
(__gnu_f2h_internal): Move body to function __gnu_float2h_internal.
Call it with binary32.

diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c
index 39c863c..ba89796 100644
--- a/libgcc/config/arm/fp16.c
+++ b/libgcc/config/arm/fp16.c
@@ -22,40 +22,74 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+struct format
+{
+  /* Number of bits.  */
+  unsigned long long size;
+  /* Exponent bias.  */
+  unsigned long long bias;
+  /* Exponent width in bits.  */
+  unsigned long long exponent;
+  /* Significand precision in explicitly stored bits.  */
+  unsigned long long significand;
+};
+
+static const struct format
+binary32 =
+{
+  32,   /* size.  */
+  127,  /* bias.  */
+  8,	/* exponent.  */
+  23	/* significand.  */
+};
+
 static inline unsigned short
-__gnu_f2h_internal(unsigned int a, int ieee)
+__gnu_float2h_internal (const struct format* fmt,
+			unsigned long long a, int ieee)
 {
-  unsigned short sign = (a >> 16) & 0x8000;
-  int aexp = (a >> 23) & 0xff;
-  unsigned int mantissa = a & 0x007fffff;
-  unsigned int mask;
-  unsigned int increment;
+  unsigned long long point = 1ULL << fmt->significand;
+  unsigned short sign = (a >> (fmt->size - 16)) & 0x8000;
+  int aexp;
+  unsigned long long mantissa;
+  unsigned long long mask;
+  unsigned long long increment;
+
+  /* Get the exponent and mantissa encodings.  */
+  mantissa = a & (point - 1);
+
+  mask = (1 << fmt->exponent) - 1;
+  aexp = (a >> fmt->significand) & mask;
 
-  if (aexp == 0xff)
+  /* Infinity, NaN and alternative format special case.  */
+  if (((unsigned int) aexp) == mask)
 {
   if (!ieee)
 	return sign;
   if (mantissa == 0)
 	return sign | 0x7c00;	/* Infinity.  */
   /* Remaining cases are NaNs.  Convert SNaN to QNaN.  */
-  return sign | 0x7e00 | (mantissa >> 13);
+  return sign | 0x7e00 | (mantissa >> (fmt->significand - 10));
 }
 
+  /* Zero.  */
   if (aexp == 0 && mantissa == 0)
 return sign;
 
-  aexp -= 127;
+  /* Construct the exponent and mantissa.  */
+  aexp -= fmt->bias;
+
+  /* Decimal point is immediately after the significand.  */
+  mantissa |= point;
 
-  /* Decimal point between bits 22 and 23.  */
-  mantissa |= 0x00800000;
   if (aexp < -14)
 {
-  mask = 0x00ffffff;
+  mask = point | (point - 1);
+  /* Minimum exponent for half-precision is 2^-24.  */
   if (aexp >= -25)
 	mask >>= 25 + aexp;
 }
   else
-    mask = 0x1fff;
+mask = (point - 1) >> 10;
 
   /* Round.  */
   if (mantissa & mask)
@@ -64,8 +98,8 @@ __gnu_f2h_internal(unsigned int a, int ieee)
   if ((mantissa & mask) == increment)
 	increment = mantissa & (increment << 1);
   mantissa += increment;
-  if (mantissa >= 0x01000000)
-   	{
+  if (mantissa >= (point << 1))
+	{
 	  mantissa >>= 1;
 	  aexp++;
 	}
@@ -93,7 +127,13 @@ __gnu_f2h_internal(unsigned int a, int ieee)
 
   /* We leave the leading 1 in the mantissa, and subtract one
  from the exponent bias to compensate.  */
-  return sign | (((aexp + 14) << 10) + (mantissa >> 13));
+  return sign | (((aexp + 14) << 10) + (mantissa >> (fmt->significand - 10)));
+}
+
+static inline unsigned short
+__gnu_f2h_internal (unsigned int a, int ieee)
+{
+  return __gnu_float2h_internal (&binary32, (unsigned long long) a, ieee);
 }
 
 unsigned int


[Patch 16/17 libgcc ARM] Half to double precision conversions

2016-11-11 Thread James Greenhalgh

Hi,

This patch adds the half-to-double conversions, both as library functions
and, where supported in hardware, using the appropriate instructions.

That means adding support for the __gnu_d2h_{ieee/alternative} library calls
added in patch 2/4, and providing a more aggressive truncdfhf2 where we can.

This also lets us remove the implementation of TARGET_CONVERT_TO_TYPE.

Bootstrapped on an ARMv8-A machine, and cross-tested with no issues.

OK?

Thanks,
James

---
gcc/

2016-11-09  James Greenhalgh  

* config/arm/arm.c (arm_convert_to_type): Delete.
(TARGET_CONVERT_TO_TYPE): Delete.
(arm_init_libfuncs): Enable trunc_optab from DFmode to HFmode.
(arm_libcall_uses_aapcs_base): Add trunc_optab from DF- to HFmode.
* config/arm/arm.h (TARGET_FP16_TO_DOUBLE): New.
* config/arm/arm.md (truncdfhf2): Only convert through SFmode if we
are in fast math mode, and have no single step hardware instruction.
(extendhfdf2): Only expand through SFmode if we don't have a
single-step hardware instruction.
* config/arm/vfp.md (*truncdfhf2): New.
(extendhfdf2): Likewise.

gcc/testsuite/

2016-11-09  James Greenhalgh  

* gcc.target/arm/fp16-rounding-alt-1.c (ROUNDED): Change expected
result.
* gcc.target/arm/fp16-rounding-ieee-1.c (ROUNDED): Change expected
result.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 239117f..b9097c5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -245,7 +245,6 @@ static bool arm_output_addr_const_extra (FILE *, rtx);
 static bool arm_allocate_stack_slots_for_args (void);
 static bool arm_warn_func_return (tree);
 static tree arm_promoted_type (const_tree t);
-static tree arm_convert_to_type (tree type, tree expr);
 static bool arm_scalar_mode_supported_p (machine_mode);
 static bool arm_frame_pointer_required (void);
 static bool arm_can_eliminate (const int, const int);
@@ -654,9 +653,6 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_PROMOTED_TYPE
 #define TARGET_PROMOTED_TYPE arm_promoted_type
 
-#undef TARGET_CONVERT_TO_TYPE
-#define TARGET_CONVERT_TO_TYPE arm_convert_to_type
-
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
 
@@ -2535,6 +2531,11 @@ arm_init_libfuncs (void)
 			 ? "__gnu_h2f_ieee"
 			 : "__gnu_h2f_alternative"));
 
+  set_conv_libfunc (trunc_optab, HFmode, DFmode,
+			(arm_fp16_format == ARM_FP16_FORMAT_IEEE
+			 ? "__gnu_d2h_ieee"
+			 : "__gnu_d2h_alternative"));
+
   /* Arithmetic.  */
   set_optab_libfunc (add_optab, HFmode, NULL);
   set_optab_libfunc (sdiv_optab, HFmode, NULL);
@@ -5262,6 +5263,8 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
 			SFmode));
   add_libcall (libcall_htab, convert_optab_libfunc (trunc_optab, SFmode,
 			DFmode));
+  add_libcall (libcall_htab,
+		   convert_optab_libfunc (trunc_optab, HFmode, DFmode));
 }
 
   return libcall && libcall_htab->find (libcall) != NULL;
@@ -22517,23 +22520,6 @@ arm_promoted_type (const_tree t)
   return NULL_TREE;
 }
 
-/* Implement TARGET_CONVERT_TO_TYPE.
-   Specifically, this hook implements the peculiarity of the ARM
-   half-precision floating-point C semantics that requires conversions between
-   __fp16 to or from double to do an intermediate conversion to float.  */
-
-static tree
-arm_convert_to_type (tree type, tree expr)
-{
-  tree fromtype = TREE_TYPE (expr);
-  if (!SCALAR_FLOAT_TYPE_P (fromtype) || !SCALAR_FLOAT_TYPE_P (type))
-return NULL_TREE;
-  if ((TYPE_PRECISION (fromtype) == 16 && TYPE_PRECISION (type) > 32)
-  || (TYPE_PRECISION (type) == 16 && TYPE_PRECISION (fromtype) > 32))
-return convert (type, convert (float_type_node, expr));
-  return NULL_TREE;
-}
-
 /* Implement TARGET_SCALAR_MODE_SUPPORTED_P.
This simply adds HFmode as a supported mode; even though we don't
implement arithmetic on this type directly, it's supported by
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index c8d7462..e759720 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -179,6 +179,11 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_FP16			\
   (ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_FP16))
 
+/* FPU supports converting between HFmode and DFmode in a single hardware
+   step.  */
+#define TARGET_FP16_TO_DOUBLE		\
+  (TARGET_HARD_FLOAT && (TARGET_FP16 && TARGET_VFP5))
+
 /* FPU supports fused-multiply-add operations.  */
 #define TARGET_FMA (TARGET_FPU_REV >= 4)
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8393f65..4074773 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5177,20 +5177,35 @@
   ""
 )
 
-;; DFmode to HFmode conversions have to go through SFmode.
+;; DFmode to HFmode conversions on targets without a single-step hardware
+;; instruction for it would have to go through SFmode.  This is dangerous
+;; as i

[Patch 15/17 libgcc ARM] Add double to half conversions.

2016-11-11 Thread James Greenhalgh

Hi,

Conversions from double precision floats to the ARM __fp16 are required
to round only once.

This patch adds functions named __gnu_d2h_ieee and
__gnu_d2h_alternative for double to __fp16 conversions in the IEEE and
ARM alternative formats. They make use of the existing
__gnu_float2h_internal conversion function, which rounds only once.
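Why rounding once matters can be demonstrated with a small sketch
(assuming round-to-nearest-even and positive normal inputs only;
d2h_once is an illustrative helper, not the libgcc routine):

```c
#include <stdint.h>
#include <string.h>

/* Round a positive, normal double directly to binary16 bits with
   round-to-nearest-even.  No subnormal/Inf/NaN handling -- this is an
   illustration of single rounding, not the library implementation.  */
static uint16_t
d2h_once (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  int exp = (int) ((bits >> 52) & 0x7ff) - 1023;	/* unbiased */
  uint64_t mant = (bits & ((1ULL << 52) - 1)) | (1ULL << 52);
  uint64_t keep = mant >> 42;			/* 11 significant bits */
  uint64_t rem = mant & ((1ULL << 42) - 1);	/* 42 discarded bits */
  uint64_t half = 1ULL << 41;
  if (rem > half || (rem == half && (keep & 1)))
    keep++;			/* round to nearest, ties to even */
  if (keep >> 11)
    {
      keep >>= 1;		/* mantissa overflowed: renormalize */
      exp++;
    }
  return (uint16_t) (((exp + 15) << 10) | (keep & 0x3ff));
}
```

For example, d = 1 + 2^-11 + 2^-40 rounds directly to 0x3c01
(1 + 2^-10), but narrowing to float first lands exactly on the halfway
point 1 + 2^-11, which then ties-to-even down to 0x3c00 (1.0): two
roundings give a different answer than one.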

Bootstrapped on an ARMv8-A machine with no issues, and cross-tested with
a range of multilibs.

OK?

Thanks,
James
---

libgcc/

2016-11-09  James Greenhalgh  
Matthew Wahab  

* config/arm/fp16.c (binary64): New.
(__gnu_d2h_internal): New.
(__gnu_d2h_ieee): New.
(__gnu_d2h_alternative): New.

diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c
index ba89796..a656988 100644
--- a/libgcc/config/arm/fp16.c
+++ b/libgcc/config/arm/fp16.c
@@ -43,6 +43,15 @@ binary32 =
   23/* significand.  */
 };
 
+static const struct format
+binary64 =
+{
+  64,	/* size.  */
+  1023,	/* bias.  */
+  11,	/* exponent.  */
+  52	/* significand.  */
+};
+
 static inline unsigned short
 __gnu_float2h_internal (const struct format* fmt,
 			unsigned long long a, int ieee)
@@ -136,6 +145,12 @@ __gnu_f2h_internal (unsigned int a, int ieee)
   return __gnu_float2h_internal (&binary32, (unsigned long long) a, ieee);
 }
 
+static inline unsigned short
+__gnu_d2h_internal (unsigned long long a, int ieee)
+{
+  return __gnu_float2h_internal (&binary64, a, ieee);
+}
+
 unsigned int
 __gnu_h2f_internal(unsigned short a, int ieee)
 {
@@ -184,3 +199,15 @@ __gnu_h2f_alternative(unsigned short a)
 {
   return __gnu_h2f_internal(a, 0);
 }
+
+unsigned short
+__gnu_d2h_ieee (unsigned long long a)
+{
+  return __gnu_d2h_internal (a, 1);
+}
+
+unsigned short
+__gnu_d2h_alternative (unsigned long long x)
+{
+  return __gnu_d2h_internal (x, 0);
+}


[Patch ARM 17/17] Enable _Float16 for ARM.

2016-11-11 Thread James Greenhalgh

Hi,

Finally, having added support for single-step DFmode to HFmode conversions,
this patch adds support for _Float16 to the ARM back-end.

That means ensuring that only __fp16 promotes, adding hooks similar to
those used in the AArch64 port to give the excess precision rules, and
marking HFmode as supported in libgcc.

Bootstrapped on an ARMv8-A machine, and crosstested with no issues.

OK?

Thanks,
James

---
gcc/

2016-11-09  James Greenhalgh  

PR target/63250
* config/arm/arm-builtins.c (arm_simd_floatHF_type_node): Rename to...
(arm_fp16_type_node): ...This, make visible.
(arm_simd_builtin_std_type): Rename arm_simd_floatHF_type_node to
arm_fp16_type_node.
(arm_init_simd_builtin_types): Likewise.
(arm_init_fp16_builtins): Likewise.
* config/arm/arm.c (arm_excess_precision): New.
(arm_floatn_mode): Likewise.
(TARGET_C_EXCESS_PRECISION): Likewise.
(TARGET_FLOATN_MODE): Likewise.
(arm_promoted_type): Only promote arm_fp16_type_node.
* config/arm/arm.h (arm_fp16_type_node): Declare.

gcc/testsuite/

2016-11-09  James Greenhalgh  

* lib/target-supports.exp (add_options_for_float16): Add
-mfp16-format=ieee when testing arm*-*-*.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index e73043d..5ed38d1 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -652,7 +652,8 @@ static struct arm_simd_type_info arm_simd_types [] = {
 };
 #undef ENTRY
 
-static tree arm_simd_floatHF_type_node = NULL_TREE;
+/* The user-visible __fp16 type.  */
+tree arm_fp16_type_node = NULL_TREE;
 static tree arm_simd_intOI_type_node = NULL_TREE;
 static tree arm_simd_intEI_type_node = NULL_TREE;
 static tree arm_simd_intCI_type_node = NULL_TREE;
@@ -739,7 +740,7 @@ arm_simd_builtin_std_type (enum machine_mode mode,
 case XImode:
   return arm_simd_intXI_type_node;
 case HFmode:
-  return arm_simd_floatHF_type_node;
+  return arm_fp16_type_node;
 case SFmode:
   return float_type_node;
 case DFmode:
@@ -840,8 +841,8 @@ arm_init_simd_builtin_types (void)
   /* Continue with standard types.  */
   /* The __builtin_simd{64,128}_float16 types are kept private unless
  we have a scalar __fp16 type.  */
-  arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
-  arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node;
+  arm_simd_types[Float16x4_t].eltype = arm_fp16_type_node;
+  arm_simd_types[Float16x8_t].eltype = arm_fp16_type_node;
   arm_simd_types[Float32x2_t].eltype = float_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
@@ -1754,11 +1755,11 @@ arm_init_iwmmxt_builtins (void)
 static void
 arm_init_fp16_builtins (void)
 {
-  arm_simd_floatHF_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (arm_simd_floatHF_type_node) = GET_MODE_PRECISION (HFmode);
-  layout_type (arm_simd_floatHF_type_node);
+  arm_fp16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (arm_fp16_type_node) = GET_MODE_PRECISION (HFmode);
+  layout_type (arm_fp16_type_node);
   if (arm_fp16_format)
-(*lang_hooks.types.register_builtin_type) (arm_simd_floatHF_type_node,
+(*lang_hooks.types.register_builtin_type) (arm_fp16_type_node,
 	   "__fp16");
 }
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b9097c5..4e1d47b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -266,6 +266,7 @@ static bool arm_builtin_support_vector_misalignment (machine_mode mode,
 		 int misalignment,
 		 bool is_packed);
 static void arm_conditional_register_usage (void);
+static enum flt_eval_method arm_excess_precision (enum excess_precision_type);
 static reg_class_t arm_preferred_rename_class (reg_class_t rclass);
 static unsigned int arm_autovectorize_vector_sizes (void);
 static int arm_default_branch_cost (bool, bool);
@@ -299,6 +300,7 @@ static bool arm_asm_elf_flags_numeric (unsigned int flags, unsigned int *num);
 static unsigned int arm_elf_section_type_flags (tree decl, const char *name,
 		int reloc);
 static void arm_expand_divmod_libfunc (rtx, machine_mode, rtx, rtx, rtx *, rtx *);
+static machine_mode arm_floatn_mode (int, bool);
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -444,6 +446,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef  TARGET_ASM_INTERNAL_LABEL
 #define TARGET_ASM_INTERNAL_LABEL arm_internal_label
 
+#undef TARGET_FLOATN_MODE
+#define TARGET_FLOATN_MODE arm_floatn_mode
+
 #undef  TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL arm_function_ok_for_sibcall
 
@@ -734,6 +739,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_EXPAND_DIVMOD_LIBFUNC
 #define TARGET_EXPAND_DIVMOD_LIBFUNC arm_expand_divmod_libfunc
 
+#undef TARGET_C_EXCESS_PRECISION
+#define TARGET_C_EXCESS_PRECISION arm_excess_precision
+
 struct g
