Re: [gomp4 0/8] NVPTX: initial OpenMP offloading

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 11:24:16PM +0300, Alexander Monakov wrote:
> > These patches provide stub functionality, which is easy enough, but I
> > can't tell whether there's a credible plan to provide a full
> > implementation. GPUs really need a different programming model than
> > normal CPUs, which is something I learned the hard way, and I'm not terribly
> > optimistic about porting libgomp to ptx. (I may be wrong.)
> 
> Right, libgomp running on ptx would have to do many things differently from
> how it does now (and some drop entirely, like affinity).  Thankfully it can be

Sure, affinity doesn't have to be supported.  And eventually some
simpler constructs can be e.g. inlined by the compiler if that is desirable.
Some constructs, like tasking, are just too complex to handle
without sharing code in the library.  Static scheduling loops are already
expanded inline by the compiler, except for ordered loops (which are again
hard to handle without the library side); the other scheduling kinds IMHO can
just be shared with the CPU implementation, etc.

> implemented piecemeal in config/nvptx, without #ifdef butchery in the primary
> source files.  The plan towards providing a full implementation is thus to

We really don't need to avoid all #ifdef stuff, just keep it to a reasonable
maintainable level.

> > In one patch you mention newlib pthread type definitions - are you aware
> > that there is no real pthreads implementation in the ptx newlib? The ptx
> > newlib is really only provided for a minimal subset of libc functionality.
> 
> Sure, I'm aware.  The point was to make libgomp.h valid to be included into
> the rest of to-be-ported source files, keeping modifications to it to a
> minimum.  If the idea is that relying on #include <pthread.h> available on
> nvptx in the first place is too much of a hack, we can discuss alternatives :)

I'd say for e.g. libgomp.h it is acceptable to use what I've posted in
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01418.html, so HAVE_PTHREAD_H
and LIBGOMP_USE_PTHREAD guards.  It is likely some other offloading target
in the future (somebody has been talking about e.g. ARM offloading to
Epiphany (Parallella board)) will have the same need (i.e. no pthreads, and
either a dummy pthread.h around, or not at all).
Plus of course we need an NVPTX version of gomp_thread (), which can be guarded
with an __nvptx__ ifdef if the implementation is small (I'd hope it is: some
CTA-local pointer and pointer arithmetic, indexed by %tid.x / WARP_SZ or
something similar).
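
A rough host-side sketch of the lookup described above, assuming one OpenMP
thread per warp.  Everything prefixed sketch_/SKETCH_ is hypothetical: the
array stands in for CTA-local (PTX .shared) storage, tid_x stands in for
%tid.x, and the struct is a placeholder rather than the real gomp_thread.

```c
#include <stddef.h>

#define SKETCH_WARP_SZ   32   /* warp size; the PTX constant is WARP_SZ */
#define SKETCH_MAX_WARPS 32   /* hypothetical per-CTA limit (1024 / 32) */

/* Placeholder for the per-thread state; not the real struct gomp_thread.  */
struct sketch_gomp_thread
{
  int team_id;
  void *task;
};

/* Stand-in for a CTA-local array with one slot per warp.  */
static struct sketch_gomp_thread sketch_cta_threads[SKETCH_MAX_WARPS];

/* All lanes of a warp map to the same slot.  The caller is assumed to
   pass tid_x < SKETCH_WARP_SZ * SKETCH_MAX_WARPS; no bounds check here.  */
static inline struct sketch_gomp_thread *
sketch_gomp_thread (unsigned tid_x)
{
  return &sketch_cta_threads[tid_x / SKETCH_WARP_SZ];
}
```

With this layout, lanes 0..31 share slot 0 and lane 32 starts slot 1, so the
lookup is a single division plus pointer arithmetic.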

> > My other concern would be not to approve changes to the gomp-4_0-branch that
> > could derail or slow down the effort to implement OpenACC, which has a much
> > better chance of being in gcc-6 than this effort. You might want to make a
> > private branch for your work.
> 
> I'm unclear how this work might hurt the OpenACC efforts, and in any case I
> intend to be careful.  I don't imagine there will be conflicting requirements
> to source code changes along the way.  In defense of the idea of working on
> gomp4 branch, I expect that interleaving OpenACC and OpenMP work on a common
> branch will cause less pain in case of inadvertent breakage than a merge
> afterward.  Jakub, since you suggested submitting for gomp-4_0-branch, what's
> your recommendation here?

My reason for suggesting gomp-4_0-branch rather than e.g.
gomp-4_1-branch or trunk directly is that even at the beginning this work has
some dependencies on stuff that has not been merged into trunk yet, in
particular the nvptx changes to libgomp that are on the branch and the code
to link libgcc and/or libgomp statically into the nvptx offloaded chunks.

Once those pieces are merged into trunk, obviously it could be developed on
some other branch, but I'd hope none of the changes will actually be
problematic for the OpenACC effort: OpenACC uses only a minimal set of files
from libgomp, and I bet that is not going to change much with the patches.

As for merging plans, the OpenMP 4.1 standard is approaching its final form
quickly, so I expect to merge gomp-4_1-branch to trunk around October 15th.
It would be nice if the gomp-4_0-branch stuff (at least the parts
Thomas/Nathan want to see in GCC 6) were in the process of being merged
shortly after that (I know I'm behind with patch review and am very sorry
for that, will try to find more time for that in the second half of October
and early November).  As for this NVPTX OpenMP 4.1 port, I'd say it really
depends on how invasive it is to other parts of the compiler.  Parts of it
that can't destabilize OpenMP 4.1 host or XeonPhi/XeonPhi-emul nor OpenACC
support can go even during stage3 (of course on a case by case basis).

So I'd like to ask Thomas/Nathan if they are ok with this stuff being on
the gomp-4_0-branch for now; once all the prerequisites it needs are on the
trunk, it can go into its own branch.

Jakub


Re: [gomp4 1/8] nvptx: remove assumption of OpenACC attrs presence

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 08:22:15PM +0300, Alexander Monakov wrote:
> This patch makes one OpenACC-specific path in nvptx_record_offload_symbol
> optional.
> 
>   * config/nvptx/nvptx.c (nvptx_record_offload_symbol): Allow missing
> OpenACC attributes.

LGTM, but as it is an nvptx backend change, please check with the nvptx
maintainers (Bernd/Nathan), and for the whole patch series, please wait for
Thomas/Nathan to say whether they are ok with having the stuff on their branch.

Jakub


Re: [gomp4 2/8] nvptx mkoffload: do not restrict to OpenACC

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 08:22:16PM +0300, Alexander Monakov wrote:
> This patch allows to meaningfully invoke mkoffload with -fopenmp.  The check
> for -fopenacc flag is specific to gomp4 branch: trunk does not have it.
> 
>   * config/nvptx/mkoffload.c (main): Do not check for -fopenacc.
> ---
>  gcc/config/nvptx/mkoffload.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)

LGTM.

Jakub


Re: [gomp4 3/8] libgomp: provide target-to-host fallback diagnostic

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 08:22:17PM +0300, Alexander Monakov wrote:
> This patch allows to see when target regions are executed on host with
> GOMP_DEBUG=1 in the environment.
> 
>   * target.c (GOMP_target): Use gomp_debug on fallback path.
> ---
>  libgomp/target.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libgomp/target.c b/libgomp/target.c
> index 6ca80ad..1cc2098 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1008,6 +1008,7 @@ GOMP_target (int device, void (*fn) (void *), const 
> void *unused,
>|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
>  {
>/* Host fallback.  */
> +  gomp_debug (0, "%s: target region executing on host\n", __FUNCTION__);
>struct gomp_thread old_thr, *thr = gomp_thread ();
>old_thr = *thr;
>memset (thr, '\0', sizeof (*thr));

Ok.
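
For readers unfamiliar with the mechanism, here is a minimal sketch of a
printf-style debug helper gated on the GOMP_DEBUG environment variable.  It is
not libgomp's gomp_debug; the sketch_ names are hypothetical, and re-reading
the environment on every call is a simplification for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 1 when the given GOMP_DEBUG value asks for debug output,
   i.e. it is present and parses to a non-zero integer.  */
static int
sketch_debug_enabled (const char *value)
{
  return value != NULL && strtol (value, NULL, 10) != 0;
}

/* Print to stderr only when debugging is enabled.
   Returns 1 if the message was printed, 0 otherwise.  */
static int
sketch_debug (const char *fmt, ...)
{
  va_list ap;

  if (!sketch_debug_enabled (getenv ("GOMP_DEBUG")))
    return 0;
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  return 1;
}
```

A call such as sketch_debug ("%s: target region executing on host\n",
__func__) would then stay silent unless GOMP_DEBUG=1 is in the environment.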

Jakub


Re: [gomp4 4/8] libgomp: minimal OpenMP support in plugin-nvptx.c

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 08:22:18PM +0300, Alexander Monakov wrote:
> This is a minimal patch for NVPTX OpenMP offloading, using Jakub's initial
> implementation.  It allows to successfully run '#pragma omp target', without
> any parallel execution: 1 team of 1 thread is spawned on the device, and
> target regions with '#pragma omp parallel' will fail with a link error.
> 
>   * plugin/plugin-nvptx.c (nvptx_host2dev): Allow NULL 'nvthd'.
> (nvptx_dev2host): Ditto.
> (GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400.
> (GOMP_OFFLOAD_run): New.

Ok.

Jakub


Re: [gomp4 5/8] libgomp: provide sem.h, mutex.h, ptrlock.h on nvptx

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 08:22:19PM +0300, Alexander Monakov wrote:
> This patch provides minimal non-stub implementations for libgomp
> mutex/ptrlock/semaphore, using atomic ops and busy waiting.  The goal here is
> to at least provide stub struct declarations necessary to unbreak libgomp.h.
> 
> Atomics with busy waiting seems to be the only way to provide such primitives
> for inter-team synchronizations, but for intra-team ops a more efficient
> implementation may be possible.

I expect almost all the synchronization primitives can be just intra-team,
the only possible exception (though not required by the standard) would be
the locks used in atomic.c I'd say.  But I guess this is ok for now as the
first step.
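
To make "atomic ops and busy waiting" concrete, here is a minimal spin mutex
sketch using C11 atomics.  It is an illustration only, not the contents of the
proposed config/nvptx/mutex.h; the sketch_ names are hypothetical.

```c
#include <stdatomic.h>

typedef atomic_int sketch_mutex_t;   /* 0 = unlocked, 1 = locked */

static inline void
sketch_mutex_init (sketch_mutex_t *m)
{
  atomic_init (m, 0);
}

/* Try to take the lock once; returns 1 on success, 0 if already held.  */
static inline int
sketch_mutex_trylock (sketch_mutex_t *m)
{
  int expected = 0;
  return atomic_compare_exchange_strong_explicit (m, &expected, 1,
						  memory_order_acquire,
						  memory_order_relaxed);
}

/* Busy-wait until the lock is acquired.  */
static inline void
sketch_mutex_lock (sketch_mutex_t *m)
{
  while (!sketch_mutex_trylock (m))
    ;
}

static inline void
sketch_mutex_unlock (sketch_mutex_t *m)
{
  atomic_store_explicit (m, 0, memory_order_release);
}
```

The acquire/release pairing is what makes writes done under the lock visible
to the next holder; a semaphore or ptrlock can be built on the same
compare-and-swap loop.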
> 
> (all functionality is unused since consumers are stubbed out in config/nvptx)
> 
>   * config/nvptx/mutex.h: New file.
>   * config/nvptx/ptrlock.h: New file.
>   * config/nvptx/sem.h: New file.

Jakub


Re: [gomp4 6/8] libgomp: provide stub bar.h on nvptx

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 08:22:20PM +0300, Alexander Monakov wrote:
> This stub header only provides empty struct gomp_barrier_t.  For now I've
> punted on providing a minimally-correct implementation.
> 
>   * config/nvptx/bar.h: New file.
> ---
>  libgomp/config/nvptx/bar.h | 38 ++
>  1 file changed, 38 insertions(+)
>  create mode 100644 libgomp/config/nvptx/bar.h

Ok.  The barrier is complicated by the need to handle explicit tasks and
cancellation, so it will not be just a bar.sync insn alone (I'd bet on
bar.arrive followed by task handling/cancellation checking and finally
bar.sync or so?).

Jakub


Re: [gomp4 7/8] libgomp: work around missing pthread_attr_t on nvptx

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 08:22:21PM +0300, Alexander Monakov wrote:
> Although newlib headers define most pthreads types, pthread_attr_t is not
> available.  Macro-replace it by 'void' to keep the prototype of
> gomp_init_thread_affinity unchanged, and do not declare gomp_thread_attr.
> 
>   * libgomp.h: Define pthread_attr_t to void on NVPTX.
> ---
>  libgomp/libgomp.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
> index d51b08b..f4255b4 100644
> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -510,8 +510,13 @@ static inline struct gomp_task_icv *gomp_icv (bool write)
>  return &gomp_global_icv;
>  }
>  
> +#ifdef __nvptx__
> +/* pthread_attr_t is not provided by newlib on NVPTX.  */
> +#define pthread_attr_t void
> +#else
>  /* The attributes to be used during thread creation.  */
>  extern pthread_attr_t gomp_thread_attr;
> +#endif
>  
>  /* Function prototypes.  */
>  

I'd prefer here the https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01418.html
changes to libgomp.h and associated configury changes.
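
A sketch of what such guards could look like.  The macro names
HAVE_PTHREAD_H and LIBGOMP_USE_PTHREAD are the ones mentioned earlier in this
thread; how the linked patch actually wires them up is not shown here, so the
arrangement below (and the sketch_uses_pthreads helper) is an assumption.

```c
/* Assumption: HAVE_PTHREAD_H would be defined by configure when a usable
   <pthread.h> exists; targets without pthreads leave it undefined.  */
#if defined (HAVE_PTHREAD_H) && !defined (LIBGOMP_USE_PTHREAD)
# define LIBGOMP_USE_PTHREAD 1
#endif

#ifdef LIBGOMP_USE_PTHREAD
# include <pthread.h>
/* The attributes to be used during thread creation.  */
extern pthread_attr_t gomp_thread_attr;
#endif

/* Hypothetical helper so callers can branch without repeating the #ifdef.  */
static inline int
sketch_uses_pthreads (void)
{
#ifdef LIBGOMP_USE_PTHREAD
  return 1;
#else
  return 0;
#endif
}
```

Compared with defining pthread_attr_t to void, this keeps the pthread-specific
declarations out of sight entirely on targets that have no pthreads.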

Jakub


Re: [gomp4 8/8] libgomp: provide ICVs via env.c on nvptx

2015-09-24 Thread Jakub Jelinek
On Wed, Sep 23, 2015 at 08:22:22PM +0300, Alexander Monakov wrote:
> This patch ports env.c to NVPTX.  It drops all environment parsing routines
> since there's no "environment" on the device.  For now, the useful effect of
> the patch is providing 'omp_is_initial_device' to distinguish host execution
> from target execution in user code.
> 
> Several functions use gomp_icv, which is not adjusted for NVPTX and thus will
> try to use EMUTLS.  The intended way forward is to provide a custom
> implementation of gomp_icv on NVPTX, likely via pre-allocating storage prior
> to spawning a team.
> 
>   * config/nvptx/env.c: New file.

I don't like this; there is just too much code duplication in this case and
it is going to be a maintenance nightmare going forward (e.g.
gomp-4_1-branch adds further functions, etc.).
I'd suggest splitting the toplevel env.c into two files: icv.c, which would
contain the global variables and most of the small API functions, and env.c,
which would contain the global constructor, env var parsing, printing and
perhaps omp_is_initial_device ().  Then nvptx would use the toplevel icv.c
and provide its own env.c with just omp_is_initial_device () (which of
course eventually can be inlined by the compiler on the NVPTX target or perhaps
any ACCEL_COMPILER, but we need to provide a library version anyway, since you
can take the address of the function etc.).
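
The proposed split can be sketched in one listing, with comments marking which
file each part would live in.  All sketch_ names are hypothetical stand-ins
for the real ICV globals and omp_* entry points.

```c
/* ---- icv.c (shared by host and nvptx): globals and small API functions ---- */
struct sketch_icv
{
  int nthreads;
  int dyn;
};

struct sketch_icv sketch_global_icv = { 1, 0 };

int
sketch_get_dynamic (void)
{
  return sketch_global_icv.dyn;
}

void
sketch_set_dynamic (int val)
{
  sketch_global_icv.dyn = val != 0;
}

/* ---- config/nvptx/env.c: no environment parsing on the device ---- */
int
sketch_is_initial_device (void)
{
  return 0;   /* code built for the accelerator never runs on the host */
}

/* An out-of-line definition is still needed even if the compiler inlines
   direct calls, because user code can take the function's address.  */
typedef int (*sketch_query_fn) (void);

int
sketch_call (sketch_query_fn fn)
{
  return fn ();
}
```

The host env.c would keep the constructor and environment parsing and return 1
from its own version of the last query function.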

Are you ok with that?

Jakub


Re: [RFC] Try vector<bool> as a new representation for vector masks

2015-09-24 Thread Richard Biener
On Wed, Sep 23, 2015 at 8:44 PM, Richard Henderson  wrote:
> On 09/23/2015 06:53 AM, Richard Biener wrote:
>> I think independent improvements are
>>
>>  1) remove (most) of the bool patterns from the vectorizer
>>
>>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded
>>
>> (same for COND_EXPR?)
>
> Careful.
>
> The reason that COND_EXPR have embedded comparisons is to handle flags
> registers.  You can't separate the setting of the flags from the using of the
> flags on most targets, because there's only one flags register.
>
> The same is true for VEC_COND_EXPR with respect to MIPS.  The base
> architecture has 8 floating-point comparison result flags, and the vector
> compare instructions are fixed to set fcc[0:width-1].  So again there's only
> one possible output location for the result of the compare.
>
> MIPS is going to present a problem if we attempt to generalize logical
> combinations of these vector, since one has to use several instructions
> (or one insn and pre-load constants into two registers) to get the fcc bits
> out into a form we can manipulate.

Both are basically a (target) restriction on how we should expand a conditional
move (and its condition).  It's technically convenient to tie both together by
having them in the same statement but it's also technically very inconvenient
in other places.  I'd say for targets where

tem_1 = a_2 < b_3;
res_4 = tem_1 ? c_5 : d_6;
res_7 = tem_1 ? x_8 : z_9;

presents a serious issue ("re-using" the flags register), out-of-SSA should
duplicate the conditionals so that TER can do its job (and RTL expansion
should use TER to get at the flags setter).  I imagine that if we expand the
above to adjacent statements the CPUs can re-use the condition code.
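
For reference, C source of roughly the following shape is the kind of input
that could give one comparison feeding two conditional moves.  The function
and struct names are hypothetical, and the GIMPLE a given GCC version actually
produces depends on the optimization passes that run.

```c
struct sketch_pair
{
  int res_4;
  int res_7;
};

/* One comparison result (tem) is consumed by two selects, mirroring
   tem_1 = a_2 < b_3; res_4 = tem_1 ? c_5 : d_6; res_7 = tem_1 ? x_8 : z_9.  */
struct sketch_pair
sketch_select2 (int a, int b, int c, int d, int x, int z)
{
  int tem = a < b;
  struct sketch_pair r;

  r.res_4 = tem ? c : d;
  r.res_7 = tem ? x : z;
  return r;
}
```

On a target with a single flags register, duplicating the comparison next to
each select lets each conditional move see its own flags setter.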

To me where the condition is in GIMPLE is an implementation detail and the
inconveniences outweigh the benefits.

Maybe we should make the effects of TER on the statement schedule explicitly
visible to make debugging that easier and remove the implicit scheduling from
the SSA name expansion code (basically require SSA names to have expanded defs).
That way we have the chance to perform pre-expansion "scheduling" in a more
predictable way leaving only the parts of the expansion using TER that want to
see a bigger expression (like [VEC_]COND_EXPR expansion eventually).

Richard.

>
> r~


Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-24 Thread Richard Biener
On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm  wrote:
> On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
>> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz  wrote:
>> > Hi,
>> >
>> > On Tue, 22 Sep 2015, David Malcolm wrote:
>> >
>> >> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
>> >> table ever get smaller, or does it only ever get inserted into?
>> >
>> > It only ever grows.
>> >
>> >> An idea I had is that we could stash short ranges directly into the 32
>> >> bits of location_t, by offsetting the per-column-bits somewhat.
>> >
>> > It's certainly worth an experiment: let's say you restrict yourself to
>> > tokens less than 8 characters, you need an additional 3 bits (using one
>> > value, e.g. zero, as the escape value).  That leaves 20 bits for the line
>> > numbers (for the normal 8 bit columns), which might be enough for most
>> > single-file compilations.  For LTO compilation this often won't be enough.
>> >
>> >> My plan is to investigate the impact these patches have on the time and
>> >> memory consumption of the compiler,
>> >
>> > When you do so, make sure you're also measuring an LTO compilation with
>> > debug info of something big (firefox).  I know that we already had issues
>> > with the size of the linemap data in the past for these cases (probably
>> > when we added columns).
>>
>> The issue we have with LTO is that the linemap gets populated in quite
>> random order and thus we repeatedly switch files (we've mitigated this
>> somewhat for GCC 5).  We also considered dropping column info
>> (and would drop range info) as diagnostics are from optimizers only
>> with LTO and we keep locations merely for debug info.
>
> Thanks.  Presumably the mitigation you're referring to is the
> lto_location_cache class in lto-streamer-in.c?
>
> Am I right in thinking that, right now, the LTO code doesn't support
> ad-hoc locations? (presumably the block pointers only need to exist
> during optimization, which happens after the serialization)

LTO code does support ad-hoc locations but they are "restored" only
when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).

> The obvious simplification would be, as you suggest, to not bother
> storing range information with LTO, falling back to just the existing
> representation.  Then there's no need to extend LTO to serialize ad-hoc
> data; simply store the underlying locus into the bit stream.  I think
> that this happens already: lto-streamer-out.c calls expand_location and
> stores the result, so presumably any ad-hoc location_t values made by
> the v2 patches would have dropped their range data there when I ran the
> test suite.

Yep.  We only preserve BLOCKs, so if you don't add extra code to
preserve ranges they'll be "dropped".

> If it's acceptable to not bother with ranges for LTO, one way to do the
> "stashing short ranges into the location_t" idea might be for the
> bits-per-range of location_t values to be a property of the line_table
> (or possibly the line map), set up when the struct line_maps is created.
> For non-LTO it could be some tuned value (maybe from a param?); for LTO
> it could be zero, so that we have as many bits as before for line/column
> data.

That could be a possibility (likewise for column info?)
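
The bit budget discussed in the quoted text (3 range bits with zero as the
escape value, 8 column bits, 20 line bits) can be sketched as follows.  This
is a toy layout only: real location_t values are resolved through line maps
rather than holding raw line numbers, the top bit is simply left unused here,
and all sketch_ names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_RANGE_BITS 3u    /* token length 1..7; 0 = escape */
#define SKETCH_COL_BITS   8u
#define SKETCH_LINE_BITS  20u

typedef uint32_t sketch_loc_t;

/* Pack line/column/length into 31 bits.  Returns 0 when line or column
   does not fit (the caller would fall back to the ad-hoc table).  A length
   that does not fit in the range bits is stored as the escape value 0.  */
static int
sketch_pack (uint32_t line, uint32_t col, uint32_t len, sketch_loc_t *out)
{
  if (line >= (1u << SKETCH_LINE_BITS) || col >= (1u << SKETCH_COL_BITS))
    return 0;
  if (len >= (1u << SKETCH_RANGE_BITS))
    len = 0;
  *out = (line << (SKETCH_COL_BITS + SKETCH_RANGE_BITS))
	 | (col << SKETCH_RANGE_BITS) | len;
  return 1;
}

static uint32_t
sketch_line (sketch_loc_t loc)
{
  return loc >> (SKETCH_COL_BITS + SKETCH_RANGE_BITS);
}

static uint32_t
sketch_col (sketch_loc_t loc)
{
  return (loc >> SKETCH_RANGE_BITS) & ((1u << SKETCH_COL_BITS) - 1);
}

static uint32_t
sketch_len (sketch_loc_t loc)
{
  return loc & ((1u << SKETCH_RANGE_BITS) - 1);
}
```

Making the three widths run-time properties of the line table, as suggested
above, would let LTO set the range width to zero and keep all bits for
line/column data.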

Richard.

> Hope this sounds sane
> Dave
>


Re: [PATCH] update a few places for the change from gimple_statement_base to gimple

2015-09-24 Thread Richard Biener
On Thu, Sep 24, 2015 at 4:25 AM,   wrote:
> From: Trevor Saunders 
>
> Hi,
>
> This fixes up a few remaining references to gimple_statement_base that were 
> just brought up.
>
> bootstrapped on x86_64-linux-gnu, but the only non comment / doc change is 
> gdbhooks.py, ok?

Ok.

Richard.

> Trev
>
> gcc/ChangeLog:
>
> 2015-09-23  Trevor Saunders  
>
> * doc/gimple.texi: Update references to gimple_statement_base.
> * gdbhooks.py: Likewise.
> * gimple.h: Likewise.
> ---
>  gcc/doc/gimple.texi | 12 ++--
>  gcc/gdbhooks.py |  2 +-
>  gcc/gimple.h| 10 +-
>  3 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> index 543de90..d089d4f 100644
> --- a/gcc/doc/gimple.texi
> +++ b/gcc/doc/gimple.texi
> @@ -92,8 +92,8 @@ groups: a header describing the instruction and its 
> locations,
>  and a variable length body with all the operands. Tuples are
>  organized into a hierarchy with 3 main classes of tuples.
>
> -@subsection @code{gimple_statement_base} (gsbase)
> -@cindex gimple_statement_base
> +@subsection @code{gimple} (gsbase)
> +@cindex gimple
>
>  This is the root of the hierarchy, it holds basic information
>  needed by most GIMPLE statements. There are some fields that
> @@ -223,7 +223,7 @@ is then inherited from the other two tuples.
>
>  @itemize @bullet
>  @item @code{gsbase}
> -Inherited from @code{struct gimple_statement_base}.
> +Inherited from @code{struct gimple}.
>
>  @item @code{def_ops}
>  Array of pointers into the operand array indicating all the slots that
> @@ -300,7 +300,7 @@ kinds, along with their relationships to @code{GSS_} 
> values (layouts) and
>  @code{GIMPLE_} values (codes):
>
>  @smallexample
> -   gimple_statement_base
> +   gimple
>   |layout: GSS_BASE
>   |used for 4 codes: GIMPLE_ERROR_MARK
>   |  GIMPLE_NOP
> @@ -2654,7 +2654,7 @@ any new basic blocks which are necessary.
>
>  The first step in adding a new GIMPLE statement code, is
>  modifying the file @code{gimple.def}, which contains all the GIMPLE
> -codes.  Then you must add a corresponding gimple_statement_base subclass
> +codes.  Then you must add a corresponding gimple subclass
>  located in @code{gimple.h}.  This in turn, will require you to add a
>  corresponding @code{GTY} tag in @code{gsstruct.def}, and code to handle
>  this tag in @code{gss_for_code} which is located in @code{gimple.c}.
> @@ -2667,7 +2667,7 @@ in @code{gimple.c}.
>  You will probably want to create a function to build the new
>  gimple statement in @code{gimple.c}.  The function should be called
>  @code{gimple_build_@var{new-tuple-name}}, and should return the new tuple
> -as a pointer to the appropriate gimple_statement_base subclass.
> +as a pointer to the appropriate gimple subclass.
>
>  If your new statement requires accessors for any members or
>  operands it may have, put simple inline accessors in
> diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
> index 3a62a2d..2b9a94c 100644
> --- a/gcc/gdbhooks.py
> +++ b/gcc/gdbhooks.py
> @@ -484,7 +484,7 @@ def build_pretty_printer():
>   'cgraph_node', CGraphNodePrinter)
>  pp.add_printer_for_types(['dw_die_ref'],
>   'dw_die_ref', DWDieRefPrinter)
> -pp.add_printer_for_types(['gimple', 'gimple_statement_base *',
> +pp.add_printer_for_types(['gimple', 'gimple *',
>
># Keep this in the same order as gimple.def:
>'gimple_cond', 'const_gimple_cond',
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 91c26b6..30b1041 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -123,7 +123,7 @@ enum gimple_rhs_class
>  };
>
>  /* Specific flags for individual GIMPLE statements.  These flags are
> -   always stored in gimple_statement_base.subcode and they may only be
> +   always stored in gimple.subcode and they may only be
> defined for statement codes that do not use subcodes.
>
> Values for the masks can overlap as long as the overlapping values
> @@ -380,7 +380,7 @@ struct GTY((tag("GSS_BIND")))
>tree vars;
>
>/* [ WORD 8 ]
> - This is different than the BLOCK field in gimple_statement_base,
> + This is different than the BLOCK field in gimple,
>   which is analogous to TREE_BLOCK (i.e., the lexical block holding
>   this statement).  This field is the equivalent of BIND_EXPR_BLOCK
>   in tree land (i.e., the lexical scope defined by this bind).  See
> @@ -744,7 +744,7 @@ struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
>
>
>  /* GIMPLE_OMP_ATOMIC_LOAD.
> -   Note: This is based on gimple_statement_base, not g_s_omp, because g_s_omp
> +   Note: This is based on gimple, not g_s_omp, because g_s_omp
> contains a sequence, which we don't need here.  */
>
>  struct GTY((tag("GSS_OMP_ATOMIC_LOAD")))
> @@ -1813,7 +1813,7 @@ gimple_set_no_warning (gimple *stmt, bool no_warning)
>
> You can learn m

Re: Openacc launch API

2015-09-24 Thread Jakub Jelinek
On Fri, Sep 18, 2015 at 11:13:03AM +0200, Bernd Schmidt wrote:
> On 09/17/2015 04:40 PM, Nathan Sidwell wrote:
> 
> >Added call to gomp_fatal, indicating libgomp is out of date. Also added
> >a default to the switch following with the same effect.  The trouble
> >with implementing handling of device_type here now, is difficulty in
> >testing its correctness.  If it were  buggy we'd be in a worse position
> >than not having it.
> 
> Is that so difficult though? See if nvptx ignores (let's say) intelmic
> arguments in favour of the default and accepts nvptx ones.
> 
> >+  if (num_waits > 8)
> >+gomp_fatal ("too many waits for legacy interface");
> >+
> >+  va_start (ap, num_waits);
> >+  for (ix = 0; ix != num_waits; ix++)
> >+waits[ix] = va_arg (ap, int);
> >+  waits[ix] = 0;
> >+  va_end (ap);
> 
> I still don't like this. I think there are at least two better alternatives:
> add a new GOMP_LAUNCH_key which makes GOACC_parallel read a number of waits
> from a va_list * pointer passed after it, or just admit that the legacy
> function always does host fallback and just truncate the current version
> after
> 
>   if (host_fallback)
> {
>   goacc_save_and_set_bind (acc_device_host);
>   fn (hostaddrs);
>   goacc_restore_bind ();
>   return;
> }
> 
> (which incidentally ignores all the wait arguments).

If GCC 5-compiled offloaded OpenACC/PTX code will always do host fallback
anyway because of the incompatible PTX version, then why don't you just
do
  goacc_save_and_set_bind (acc_device_host);
  fn (hostaddrs);
  goacc_restore_bind ();
and nothing else in GOACC_parallel?  If it doesn't always do host fallback,
then I wonder if e.g. the waits wouldn't be better represented as an array
of ints, GOMP_LAUNCH_WAIT op would then encode num_waits and be followed
by a va_arg (ap, int *) with num_waits entries in it.  No need to pass
va_list around, instead just the pointer, and the compatibility entry point
would alloca an array, stuff the waits in it and pass GOMP_LAUNCH_WAIT with
the allocated array.
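
A sketch of that array-based scheme, under stated assumptions: the key
encoding (tag in the upper 16 bits, count in the lower 16), the SKETCH_ tags
and both entry points are hypothetical, a C99 variable-length array stands in
for alloca, and the "new" entry point merely records the waits it sees
instead of launching anything.

```c
#include <stdarg.h>

#define SKETCH_LAUNCH_END  0u
#define SKETCH_LAUNCH_WAIT 1u

static int sketch_seen[16];
static int sketch_seen_count;

/* Encode a tag together with an element count into one key.  */
static unsigned
sketch_launch_op (unsigned tag, unsigned count)
{
  return tag << 16 | (count & 0xffffu);
}

/* New-style entry point: a key list terminated by SKETCH_LAUNCH_END.
   A WAIT key is followed by an int * holding `count' entries.  */
static void
sketch_parallel_keyed (int device, ...)
{
  va_list ap;

  (void) device;
  sketch_seen_count = 0;
  va_start (ap, device);
  for (;;)
    {
      unsigned op = va_arg (ap, unsigned);
      unsigned tag = op >> 16, count = op & 0xffffu;

      if (tag != SKETCH_LAUNCH_WAIT)
	break;			/* END, or an unknown key: stop parsing */
      const int *waits = va_arg (ap, int *);
      for (unsigned i = 0; i < count && sketch_seen_count < 16; i++)
	sketch_seen[sketch_seen_count++] = waits[i];
    }
  va_end (ap);
}

/* Compatibility entry point: collect the variadic waits into a local
   array and forward only the pointer; no va_list is passed along.  */
static void
sketch_parallel_legacy (int device, int num_waits, ...)
{
  if (num_waits < 0)
    num_waits = 0;

  int waits[num_waits > 0 ? num_waits : 1];
  va_list ap;

  va_start (ap, num_waits);
  for (int i = 0; i < num_waits; i++)
    waits[i] = va_arg (ap, int);
  va_end (ap);

  sketch_parallel_keyed (device,
			 sketch_launch_op (SKETCH_LAUNCH_WAIT,
					   (unsigned) num_waits),
			 waits, sketch_launch_op (SKETCH_LAUNCH_END, 0));
}
```

Because the count travels in the key, the legacy limit of 8 waits and the
zero terminator are no longer needed on the new path.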

Other than that, I think Bernd has covered all the issues I had.

Jakub


Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg

2015-09-24 Thread Thomas Schwinge
Hi Tom!

On Thu, 24 Sep 2015 08:36:27 +0200, Tom de Vries  wrote:
> On 24/09/15 08:23, Thomas Schwinge wrote:
> > On Tue, 11 Aug 2015 20:53:39 +0200, Tom de Vries  
> > wrote:
> >> Don't create superfluous parm in expand_omp_taskreg
> >>
> >> 2015-08-11  Tom de Vries  
> >>
> >>* omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy stmt to
> >>parm_decl, rather than generating a dummy default def in cfun.
> >>* tree-cfg.c (replace_ssa_name): Assume no default defs.  Make sure
> >>ssa_name from cfun and child_fn do not share a stmt as def stmt.
> >>(move_stmt_op): Handle PARM_DECl.
> >>(gather_ssa_name_hash_map_from): New function.
> >>(move_sese_region_to_fn): Add default defs for function params, and add
> >>them to vars_map.  Release copied ssa names.
> >>* tree-cfg.h (gather_ssa_name_hash_map_from): Declare.
> >
> > Do I understand correct that with this change present on trunk (which I'm
> > currently merging into gomp-4_0-branch), the changes you've earlier done
> > on gomp-4_0-branch to gcc/omp-low.c:release_dangling_ssa_names,
> > gcc/tree-cfg.c:replace_ssa_name, should now be reverted?  That is, how
> > much of the following patches can be reverted now (listed backwards in
> > time)?
> 
> indeed, in the above commit we release the dangling ssa names in 
> move_sese_region_to_fn. So after committing this patch to the 
> gomp-4_0-branch, the call to release_dangling_ssa_names is no longer 
> necessary, and the function release_dangling_ssa_names can be removed.

From IRC:

 vries: Are you totally busy right now, or could you spend
  an hour on backporting to gomp-4_0-branch your trunk commit that I
  mentioned earlier today?
 tschwinge: shouldn't be a problem
 Well, I'm asking because in my merge tree, I'm running
  into an assertion that you added there -- not sure yet whether I've
  done something wrong, though.
 vries: ^
 tschwinge: it would be useful for me to know which assertion

So far, I only looked at libgomp test results, and there, none of the
OpenMP but quite a number of the OpenACC tests fail as follows, for
example libgomp.oacc-c-c++-common/kernels-1.c:

[...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ 
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/kernels-1.c
 -B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/ 
-B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs 
-I[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp 
-I[...]/source-gcc/libgomp/testsuite/../../include 
-I[...]/source-gcc/libgomp/testsuite/.. 
-I/usr/local/cuda-5.5/targets/x86_64-linux/include -fmessage-length=0 
-fno-diagnostics-show-caret -fdiagnostics-color=never 
-B[...]/install/offload-nvptx-none/libexec/gcc/x86_64-pc-linux-gnu/6.0.0 
-B[...]/install/offload-nvptx-none/bin 
-B[...]/install/offload-x86_64-intelmicemul-linux-gnu/libexec/gcc/x86_64-pc-linux-gnu/6.0.0
 -B[...]/install/offload-x86_64-intelmicemul-linux-gnu/bin -fopenacc 
-I[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 
-L[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs -lm -o ./kernels-1.exe
In file included from 
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/kernels-1.c:6:0:

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses.h:
 In function 'main':

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses.h:182:9:
 internal compiler error: in replace_ssa_name, at tree-cfg.c:6423
0xb0518a replace_ssa_name
[...]/source-gcc/gcc/tree-cfg.c:6423
0xb05407 move_stmt_op
[...]/source-gcc/gcc/tree-cfg.c:6501
0xd75e43 walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), 
void*, hash_set >*, tree_node* 
(*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, 
hash_set >*))
[...]/source-gcc/gcc/tree.c:11341
0x89832c walk_gimple_op(gimple*, tree_node* (*)(tree_node**, int*, void*), 
walk_stmt_info*)
[...]/source-gcc/gcc/gimple-walk.c:204
0x898814 walk_gimple_stmt(gimple_stmt_iterator*, tree_node* 
(*)(gimple_stmt_iterator*, bool*, walk_stmt_info*), tree_node* (*)(tree_node**, 
int*, void*), walk_stmt_info*)
[...]/source-gcc/gcc/gimple-walk.c:562
0xb088e3 move_block_to_fn
[...]/source-gcc/gcc/tree-cfg.c:6774
0xb088e3 move_sese_region_to_fn(function*, basic_block_def*, 
basic_block_def*, tree_node*)
[...]/source-gcc/gcc/tree-cfg.c:7238
0x9c1133 expand_omp_target
[...]/source-gcc/gcc/omp-low.c:9802
0x9c3a1c expand_omp
[...]/source-gcc/gcc/omp-low.c:10240
0x9cc3fe execute_expand_omp
[...]/source-gcc/gcc/omp-low.c:10486
0x9cc568 execute
[...]/source-gcc/gcc/omp-low.c:10609

source-gcc/gcc/tree-cfg.c:

[...]
  6405  /* Creates an ssa name in 

Re: [build] Support PIE on Solaris

2015-09-24 Thread Rainer Orth
Rainer Orth  writes:

> Beyond the reasons for the bundled Solaris CRTs already cited in
>
>   https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01638.html
>
> they need to be PIC to support position independent executables (PIE).
>
> While linker support for PIE has existed in Solaris ld since at least
> Solaris 11.2 and GNU ld has just gotten the last (mostly cosmetic) bit
> for binutils 2.26, there were no usable CRTs before.
>
> Now those pieces are in place, this patch enables PIE if the necessary
> support (linker and CRTs) is detected.  It's mostly straightforward,
> adapting specs changes in gnu-user.h and allowing for differences in
> linker options.
>
> crtp.o, crtpg.o, and gmon.o are now compiled as PIC to also work with
> PIE.  I don't think there's any point in having separate PIC and non-PIC
> versions here.
>
> During early development of the patch, I found that gmon.c includes the
> trailing NULs in error messages it prints.  Now corrected, though not
> strictly related to the patch.
>
> Contrary to other targets, where -pie seems to be silently ignored if
> PIE support is missing, I've decided to have gcc error out on Solaris in
> this situation.  This also allows to easily distinguish between
> configurations with and without PIE support in the testsuite.  
>
> Tested on i386-pc-solaris2.1[012] and sparc-sun-solaris2.1[012] with
> both as/ld and gas/gld, and x86_64-unknown-linux-gnu.
>
> I've also bootstrapped on i386-pc-solaris2.12 and sparc-sun-solaris2.12
> with --enable-default-pie.  There are a couple of new failures, but they
> also occur on Linux/x86_64 and I've already filed PRs for (most of?)
> them.
>
> Again, perhaps with exception of the obvious hunk in gcc.c, this patch
> is purely Solaris-specific, so I'll commit it in a couple of days.  I'd
> also like to backport it to the gcc-5 branch after some soak time on
> mainline.

I've now installed both the previous Solaris CRTs patch and this one.  A
final round of testing revealed a problem with gld PIE support
detection, though: by mistake I initially did the gas/gld testing with
an unmodified gld 2.25.  While this works just fine on Solaris/x86 (with
the exception of the PIE executables not being marked with DF_1_PIE in
DF_FLAGS_1, a purely informational thing), Solaris/SPARC was different:
even in a default build, many PIE tests failed with

gld-2.25: read-only segment has dynamic relocations.

which doesn't happen with a gld 2.25.51 with the Solaris PIE patch.
Also, Solaris ld links those exact same objects just fine.

Therefore I'm now requiring gld 2.26 on Solaris for PIE support, as in
the following patchlet.  Tested by configuring with ld, gld 2.25, and a
gld 2.25.51 faked to call itself gld 2.26 on i386-pc-solaris2.11, and with
the bundled gld 2.23.2 on x86_64-pc-linux-gnu, and checking that
HAVE_LD_PIE is set correctly.

diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4751,7 +4751,12 @@ AC_MSG_RESULT($gcc_cv_ld_eh_frame_ciev3)
 AC_MSG_CHECKING(linker position independent executable support)
 gcc_cv_ld_pie=no
 if test $in_tree_ld = yes ; then
-  if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 15 -o "$gcc_cv_gld_major_version" -gt 2 \
+  case "$target" in
+# Full PIE support on Solaris was only introduced in gld 2.26.
+*-*-solaris2*)  gcc_gld_pie_min_version=26 ;;
+*) 		gcc_gld_pie_min_version=15 ;;
+  esac
+  if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge "$gcc_gld_pie_min_version" -o "$gcc_cv_gld_major_version" -gt 2 \
  && test $in_tree_ld_is_elf = yes; then
 gcc_cv_ld_pie=yes
   fi
@@ -4759,6 +4764,14 @@ elif test x$gcc_cv_ld != x; then
   # Check if linker supports -pie option
   if $gcc_cv_ld --help 2>/dev/null | grep -- -pie > /dev/null; then
 gcc_cv_ld_pie=yes
+case "$target" in
+  *-*-solaris2*)
+	if echo "$ld_ver" | grep GNU > /dev/null \
+	  && test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -lt 26; then
+	  gcc_cv_ld_pie=no
+	fi
+	;;
+esac
   else
 case "$target" in
   *-*-solaris2.1[[1-9]]*)
@@ -4772,7 +4785,7 @@ elif test x$gcc_cv_ld != x; then
 fi
 if test x"$gcc_cv_ld_pie" = xyes; then
 	AC_DEFINE(HAVE_LD_PIE, 1,
-[Define if your linker supports -pie option.])
+[Define if your linker supports PIE option.])
 fi
 AC_MSG_RESULT($gcc_cv_ld_pie)
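
For readers who do not want to trace the shell logic, the version gate in the
patchlet above can be summarised as a small predicate.  This is only an
illustrative C sketch, not part of the patch: the real check is the
configure.ac code, and the function name gld_supports_pie and its parameters
are made up for the example.

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the configure.ac logic for GNU ld:
   PIE needs gld 2.26 or later on Solaris and gld 2.15 or later
   elsewhere.  Any major version above 2 is accepted.  */
bool
gld_supports_pie (bool is_solaris, int major, int minor)
{
  int min_minor = is_solaris ? 26 : 15;

  return major > 2 || (major == 2 && minor >= min_minor);
}
```

With these assumptions, gld 2.25 is rejected on Solaris but accepted on other
targets, which matches the configurations listed in the testing note.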
 

Despite that patch, with --enable-default-pie, there are many failures
on sparc-sun-solaris2.12 with gas/gld 2.26, both the same error as above
and execution failures in boehm-gc, libgomp, and libjava.  Given that
those errors don't occur with as/ld or on i386-pc-solaris2.12, ISTM that
there's something amiss with gld on Solaris/SPARC.  Given that this is a
non-recommended and niche configuration, I'm committing the patch
anyway.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, i386, AVX-512] Fix register name while checking for AVX-512VBMI presence.

2015-09-24 Thread Kirill Yukhin
On 24 Sep 11:20, Uros Bizjak wrote:
> On Thu, Sep 24, 2015 at 11:07 AM, Kirill Yukhin wrote:
> > Hello Uroš,
> > I've committed (to trunk) the patch at the bottom, which
> > checks the AVX-512VBMI bit in ecx (instead of ebx) to verify whether
> > AVX-512VBMI is present on the system (according to the SDM).
> >
> > It makes those tests `SKIPPED' on KNL.
> >
> > Is it OK to check it into gcc-5-branch?
> 
> Yes. Usually, I backport my patches after a couple of days of "soaking"
> in mainline, so autotesters have a chance to pick the patch and put it
> through their tests.
> 
> BR,
> Uros.

Thanks! I'll wait for the end of the WW.

CC-ing gcc-patches, which I forgot.

--
K

> 
> > gcc/testsuite/
> > * gcc.target/i386/avx512vbmi-check.h (main): Fix register
> > name while checking for AVX-512VBMI presence.
> >
> > --
> > Thanks, K
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h 
> > b/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h
> > index 591ff06..97aca27 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/avx512vbmi-check.h
> > @@ -25,7 +25,7 @@ main ()
> >
> >__cpuid_count (7, 0, eax, ebx, ecx, edx);
> >
> > -  if ((avx512f_os_support ()) && ((ebx & bit_AVX512VBMI) == 
> > bit_AVX512VBMI))
> > +  if ((avx512f_os_support ()) && ((ecx & bit_AVX512VBMI) == 
> > bit_AVX512VBMI))
> > {
> >   do_test ();
> >  #ifdef DEBUG
> >
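
The corrected condition from the quoted patch can be sketched as a standalone
predicate.  This is a minimal sketch under stated assumptions: the register
value is passed in rather than read with __cpuid_count so that it does not
depend on an x86 host, the helper name is hypothetical, and the bit value is
redefined locally on the assumption that bit_AVX512VBMI is bit 1 of ECX for
CPUID leaf 7, subleaf 0.

```c
#include <stdbool.h>

/* Assumed value: bit 1 of ECX for CPUID leaf 7, subleaf 0.  Redefined
   locally so the sketch does not need <cpuid.h>.  */
#define SKETCH_BIT_AVX512VBMI (1u << 1)

/* Hypothetical helper: decide from an already-fetched ECX value and the
   OS support flag whether the AVX-512VBMI test should run.  */
bool
vbmi_test_should_run (bool os_supports_avx512f, unsigned int ecx)
{
  return os_supports_avx512f
	 && (ecx & SKETCH_BIT_AVX512VBMI) == SKETCH_BIT_AVX512VBMI;
}
```

Passing the EBX value by mistake would test an unrelated feature bit, which is
the bug the patch fixes.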


Re: [ubsan PATCH] Fix uninitialized var issue (PR sanitizer/64906)

2015-09-24 Thread Marek Polacek
On Wed, Sep 23, 2015 at 08:55:53PM +0200, Bernd Schmidt wrote:
> On 09/23/2015 06:07 PM, Marek Polacek wrote:
> >Given that the code above seems to be useless now, I think let's put this
> >patch in as-is, backport it to gcc-5, then remove those redundant hunks on
> >trunk and add the testcase above.  Do you agree?
> 
> Sounds reasonable. If you can find a point in the history where that code
> wasn't useless, it would be good to help us understand why it's there.

I did some archeology.  The code wasn't useless since it was added (r211859)
till r226110 where I added some unshare_exprs.  On the testcase I posted
earlier in the thread that makes a difference:

@@ -11,7 +11,7 @@
   else
 {
   <<< Unknown tree: void_cst >>>
-}, (long unsigned int) (s->a[i] << SAVE_EXPR );;
+}, (long unsigned int) (s->a[UBSAN_BOUNDS (0B, SAVE_EXPR , 0);,
SAVE_EXPR ;] << SAVE_EXPR );;
 }

So we instrument the array multiple times as it's not shared anymore.

Ok to proceed with the plan I mentioned above?

Marek


Re: (patch,rfc) s/gimple/gimple */

2015-09-24 Thread Thomas Schwinge
Hi!

On Sat, 19 Sep 2015 20:55:35 -0400, Trevor Saunders wrote:
> On Fri, Sep 18, 2015 at 09:32:37AM -0600, Jeff Law wrote:
> > On 09/18/2015 07:32 AM, Trevor Saunders wrote:
> > >On Wed, Sep 16, 2015 at 03:11:14PM -0400, David Malcolm wrote:
> > >>On Wed, 2015-09-16 at 09:16 -0400, Trevor Saunders wrote:
> > >>>I gave changing from gimple to gimple * a shot last week.

> ok, it's committed now :)

[...]/source-gcc/gcc/tree-object-size.c:62:13: warning: 'bool 
plus_stmt_object_size(object_size_info*, tree, gimple)' declared 'static' but 
never defined [-Wunused-function]
 static bool plus_stmt_object_size (struct object_size_info *, tree, 
gimple);
 ^
[...]/source-gcc/gcc/tree-object-size.c:63:13: warning: 'bool 
cond_expr_object_size(object_size_info*, tree, gimple)' declared 'static' but 
never defined [-Wunused-function]
 static bool cond_expr_object_size (struct object_size_info *, tree, 
gimple);
 ^

Not sure why your automation didn't catch these?  Anyway, in r228080 I
now committed these additional changes (as obvious):

commit 24500bbaac87c5e55ded55cb1d4aabca89be1649
Author: tschwinge 
Date:   Thu Sep 24 09:27:12 2015 +

Additional changes to switch from gimple to gimple *

gcc/
* tree-object-size.c (plus_stmt_object_size)
(cond_expr_object_size): Change the formal parameters from gimple
to gimple *.
* tree-ssa-sccvn.h (vn_nary_op_insert_stmt): Likewise.
* tree-ssa-sccvn.c (vn_nary_op_insert_stmt): Make it static.
* tree-ssa-sccvn.h (vn_nary_op_insert_stmt): Don't declare.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228080 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |9 +
 gcc/tree-object-size.c |4 ++--
 gcc/tree-ssa-alias.c   |4 ++--
 gcc/tree-ssa-sccvn.c   |2 +-
 gcc/tree-ssa-sccvn.h   |1 -
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 9c2ad9d..7bc8e91 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-09-24  Thomas Schwinge  
+
+   * tree-object-size.c (plus_stmt_object_size)
+   (cond_expr_object_size): Change the formal parameters from gimple
+   to gimple *.
+   * tree-ssa-sccvn.h (vn_nary_op_insert_stmt): Likewise.
+   * tree-ssa-sccvn.c (vn_nary_op_insert_stmt): Make it static.
+   * tree-ssa-sccvn.h (vn_nary_op_insert_stmt): Don't declare.
+
 2015-09-24  Rainer Orth  
 
* configure.ac (gcc_cv_ld_pie): Check for gld >= 2.26 on Solaris.
diff --git gcc/tree-object-size.c gcc/tree-object-size.c
index f76f160..230761b 100644
--- gcc/tree-object-size.c
+++ gcc/tree-object-size.c
@@ -59,8 +59,8 @@ static void collect_object_sizes_for (struct object_size_info 
*, tree);
 static void expr_object_size (struct object_size_info *, tree, tree);
 static bool merge_object_sizes (struct object_size_info *, tree, tree,
unsigned HOST_WIDE_INT);
-static bool plus_stmt_object_size (struct object_size_info *, tree, gimple);
-static bool cond_expr_object_size (struct object_size_info *, tree, gimple);
+static bool plus_stmt_object_size (struct object_size_info *, tree, gimple *);
+static bool cond_expr_object_size (struct object_size_info *, tree, gimple *);
 static void init_offset_limit (void);
 static void check_for_plus_in_loops (struct object_size_info *, tree);
 static void check_for_plus_in_loops_1 (struct object_size_info *, tree,
diff --git gcc/tree-ssa-alias.c gcc/tree-ssa-alias.c
index f3674ae..5ff2275 100644
--- gcc/tree-ssa-alias.c
+++ gcc/tree-ssa-alias.c
@@ -76,12 +76,12 @@ along with GCC; see the file COPYING3.  If not see
 
The main alias-oracle entry-points are
 
-   bool stmt_may_clobber_ref_p (gimple, tree)
+   bool stmt_may_clobber_ref_p (gimple *, tree)
 
  This function queries if a statement may invalidate (parts of)
  the memory designated by the reference tree argument.
 
-   bool ref_maybe_used_by_stmt_p (gimple, tree)
+   bool ref_maybe_used_by_stmt_p (gimple *, tree)
 
  This function queries if a statement may need (parts of) the
  memory designated by the reference tree argument.
diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
index 57c1b55..ce79842 100644
--- gcc/tree-ssa-sccvn.c
+++ gcc/tree-ssa-sccvn.c
@@ -2684,7 +2684,7 @@ vn_nary_op_insert (tree op, tree result)
 /* Insert the rhs of STMT into the current hash table with a value number of
RESULT.  */
 
-vn_nary_op_t
+static vn_nary_op_t
 vn_nary_op_insert_stmt (gimple *stmt, tree result)
 {
   vn_nary_op_t vno1
diff --git gcc/tree-ssa-sccvn.h gcc/tree-ssa-sccvn.h
index 92ca85a..d0a911f 100644
--- gcc/tree-ssa-sccvn.h
+++ gcc/tree-ssa-sccvn.h
@@ -204,7 +204,6 @@ tree vn_nary_op_lookup_stmt (gimple *, vn_nary_op_t *);
 tree vn_nary_op_lookup_pieces (unsigned int, enum tree_code,
   tree, tree *, vn_nary_op_t *);
 vn_nary_op_t vn_nary_op_insert (tree, tree);
-vn_nary_op

ARM: fp16 Fix PR 67624 - Incorrect conversion of float Infinity to __fp16

2015-09-24 Thread Richard Earnshaw
This patch fixes the bug reported in PR67624 where a single-precision
infinity value was being incorrectly converted to a half-precision NaN
value.
Since we need to preserve the semantics of converting signaling NaNs to
quiet NaNs, we test explicitly for this case.

Built and tested on an arm-eabi cross.  Applied to trunk.
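
To illustrate the case the fix distinguishes, here is a minimal sketch of the
Inf/NaN branch only, assuming IEEE half-precision format and an input whose
exponent field is already known to be all ones.  The function name is made up
for the example; the real code is __gnu_f2h_internal in the patch below.

```c
#include <stdint.h>

/* Sketch: half-precision bit pattern for a single-precision input A
   whose exponent field is all ones (infinity or NaN).  */
uint16_t
sketch_f2h_inf_nan (uint32_t a)
{
  uint16_t sign = (a >> 16) & 0x8000;
  uint32_t mantissa = a & 0x007fffff;

  if (mantissa == 0)
    return sign | 0x7c00;	/* Infinity stays infinity.  */

  /* Remaining cases are NaNs: set the quiet bit and keep the top
     mantissa bits, so a signaling NaN becomes a quiet NaN.  */
  return sign | 0x7e00 | (mantissa >> 13);
}
```

Without the mantissa == 0 test, an infinity would fall through to the last
return and come out as 0x7e00, a quiet NaN, which is the behaviour reported in
the PR.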


libgcc:
PR libgcc/67624
* config/arm/fp16.c (__gnu_f2h_internal): Handle infinity correctly.

gcc/testsuite:
PR libgcc/67624
* gcc.target/arm/fp16-inf.c: New test.

diff --git a/gcc/testsuite/gcc.target/arm/fp16-inf.c 
b/gcc/testsuite/gcc.target/arm/fp16-inf.c
new file mode 100644
index 000..ce5c197
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/fp16-inf.c
@@ -0,0 +1,14 @@
+/* { dg-do run } */
+/* { dg-options "-O -mfp16-format=ieee" } */
+
+/* Conversion of infinity to __fp16 and back again should preserve the
+   value.  */
+volatile float f = __builtin_inf ();
+
+int main ()
+{
+  __fp16 h = f;
+  if (h != __builtin_inf ())
+__builtin_abort ();
+  return 0;
+}
diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c
index 86a6348..8eaae2b 100644
--- a/libgcc/config/arm/fp16.c
+++ b/libgcc/config/arm/fp16.c
@@ -35,9 +35,12 @@ __gnu_f2h_internal(unsigned int a, int ieee)
 {
   if (!ieee)
return sign;
+  if (mantissa == 0)
+   return sign | 0x7c00;   /* Infinity.  */
+  /* Remaining cases are NaNs.  Convert SNaN to QNaN.  */
   return sign | 0x7e00 | (mantissa >> 13);
 }
-  
+
   if (aexp == 0 && mantissa == 0)
 return sign;
 
@@ -49,7 +52,7 @@ __gnu_f2h_internal(unsigned int a, int ieee)
 {
   mask = 0x00ff;
   if (aexp >= -25)
-mask >>= 25 + aexp;
+   mask >>= 25 + aexp;
 }
   else
 mask = 0x1fff;


Re: Openacc launch API

2015-09-24 Thread Bernd Schmidt

On 09/24/2015 10:40 AM, Jakub Jelinek wrote:
> Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> anyway because of the incompatible PTX version, then why don't you just
> do
>    goacc_save_and_set_bind (acc_device_host);
>    fn (hostaddrs);
>    goacc_restore_bind ();
> and nothing else in GOACC_parallel?

That was essentially my suggestion.

> Other than that, I think Bernd has covered all the issues I had.

What is your opinion on the forward compatibility issue? Is it something
we care about?



Bernd



Re: Openacc launch API

2015-09-24 Thread Jakub Jelinek
On Thu, Sep 24, 2015 at 11:50:56AM +0200, Bernd Schmidt wrote:
> On 09/24/2015 10:40 AM, Jakub Jelinek wrote:
> >Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> >anyway because of the incompatible PTX version, then why don't you just
> >do
> >   goacc_save_and_set_bind (acc_device_host);
> >   fn (hostaddrs);
> >   goacc_restore_bind ();
> >and nothing else in GOACC_parallel?
> 
> That was essentially my suggestion.
> 
> >Other than that, I think Bernd has covered all the issues I had.
> 
> What is your opinion on the forward compatibility issue? Is it something we
> care about?

For the (unlikely) case of using binaries or libraries compiled by a newer
GCC with an older libgomp, I'd prefer something other than a silent crash.
Often it will just not start at all, because the binary needs newer symbols
from the library.  If that is not the case, then presumably ignoring some
newer features is fine; gomp_fatal is acceptable too, but an assert
failure is not.

Jakub


Re: [PATCH 0/4] bb-reorder: Add the "simple" algorithm

2015-09-24 Thread Bernd Schmidt

On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> The current basic block reordering always uses the "software trace cache"
> algorithm.  That has a few problems:
>
> 1) It increases code size substantially; this makes it not suitable for
> -O1 or -Os, and not at all for some architectures;
> 2) but it is enabled for -Os and all targets;
> 3) and -O1 gets nothing, resulting in pretty jumpy code.


A general question first, I see code in bb-reorder.c (in copy_bb_p) that 
limits the amount of code growth if not optimizing for speed. Is that 
not working as expected or not sufficient?


Your code looks like a nice clean algorithm so I have no objections to 
it (detailed comments to follow), but I want to make sure it is 
necessary to add it.



Bernd


Re: Openacc launch API

2015-09-24 Thread Bernd Schmidt

On 09/24/2015 11:56 AM, Jakub Jelinek wrote:
> On Thu, Sep 24, 2015 at 11:50:56AM +0200, Bernd Schmidt wrote:
> > On 09/24/2015 10:40 AM, Jakub Jelinek wrote:
> > > Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> > > anyway because of the incompatible PTX version, then why don't you just
> > > do
> > >    goacc_save_and_set_bind (acc_device_host);
> > >    fn (hostaddrs);
> > >    goacc_restore_bind ();
> > > and nothing else in GOACC_parallel?
> >
> > That was essentially my suggestion.
> >
> > > Other than that, I think Bernd has covered all the issues I had.
> >
> > What is your opinion on the forward compatibility issue? Is it something we
> > care about?
>
> For the (unlikely) case of using binaries or libraries compiled by a newer
> GCC with an older libgomp, I'd prefer something other than a silent crash.
> Often it will just not start at all, because the binary needs newer symbols
> from the library.  If that is not the case, then presumably ignoring some
> newer features is fine; gomp_fatal is acceptable too, but an assert
> failure is not.

In that case the patch is OK with the change suggested above.


Bernd


Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up

2015-09-24 Thread Bernd Schmidt

On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
> as default.  For -O2 and up (except -Os) it is switched to "stc" instead.
> Targets that never want STC can override this.  This changes -freorder-blocks
> to be on at -O1 and up (was -O2 and up).
>
> In effect, the changes are for -O1 (which now gets "simple" instead of
> nothing), -Os (which now gets "simple" instead of "stc", since STC results
> in much bigger code), and for targets that wish to never use STC (not in
> this patch though).


This should be merged with its documentation in 4/4, and personally I'd 
have no problem reviewing a patch with 2/3/4 all in one. Splitting 
patches is most helpful if there are parts that rearrange things such as 
your 1/4, or if there are multiple independent functional changes. I'm 
not saying you did anything wrong by splitting, just that maybe you made 
unnecessary work for yourself.


No objections to 3/4 and 4/4 otherwise.


Bernd


Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm

2015-09-24 Thread Bernd Schmidt

On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> This is the meat of this series: a new algorithm to do basic block
> reordering.  It uses the simple greedy approach to maximum weighted
> matching, where the weights are the predicted execution frequency of
> the edges.  This always finds a solution that is within a factor two
> of optimal, if you disregard loops (which we cannot allow) and the
> complications of block partitioning.

Looks really good for the most part.

The comment at the top of the file should be updated to mention both
algorithms.

> +  /* Sort the edges, the most desirable first.  */
> +
> +  std::stable_sort (edges, edges + n, edge_order);


Any thoughts on this vs qsort? Do you need a stable sort?
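
For context, here is a rough sketch of the greedy matching described in the
quoted patch summary, written in plain C.  It is not the code from the patch:
the struct, the function names, and the block/edge representation are all
invented for illustration.  It also shows one way to address the stability
question: the C standard leaves the relative order of equal elements after
qsort unspecified, so the sketch adds the original edge position as a
tie-break to get a deterministic order without a stable sort.

```c
#include <stdlib.h>

/* Hypothetical edge representation for the sketch.  */
struct sketch_edge
{
  int src, dst;		/* Block indices.  */
  int freq;		/* Predicted execution frequency (the weight).  */
  int index;		/* Original position, used only as a tie-break.  */
};

/* Order edges by descending frequency; equal frequencies keep their
   original relative order thanks to the index tie-break.  */
static int
sketch_edge_cmp (const void *pa, const void *pb)
{
  const struct sketch_edge *a = pa;
  const struct sketch_edge *b = pb;

  if (a->freq != b->freq)
    return a->freq > b->freq ? -1 : 1;
  return (a->index > b->index) - (a->index < b->index);
}

/* Greedily choose fallthrough edges.  On return NEXT[b] is the block
   placed directly after block B, or -1.  PREV is scratch space of the
   same size as NEXT.  */
void
sketch_greedy_chain (struct sketch_edge *edges, int n_edges,
		     int n_blocks, int *next, int *prev)
{
  for (int b = 0; b < n_blocks; b++)
    next[b] = prev[b] = -1;
  for (int i = 0; i < n_edges; i++)
    edges[i].index = i;

  qsort (edges, n_edges, sizeof *edges, sketch_edge_cmp);

  for (int i = 0; i < n_edges; i++)
    {
      int s = edges[i].src, d = edges[i].dst;

      /* Each block gets at most one successor and one predecessor.  */
      if (next[s] != -1 || prev[d] != -1)
	continue;

      /* Reject the edge if it would close a loop: walk from D to the
	 end of its chain and see whether that end is S.  */
      int tail = d;
      while (next[tail] != -1)
	tail = next[tail];
      if (tail == s)
	continue;

      next[s] = d;
      prev[d] = s;
    }
}
```

The chain walk makes this sketch quadratic in the worst case; it is meant to
show the selection rule, not an efficient implementation.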


> +  int j;
> +  for (j = 0; j < n; j++)

for (int j ...
here and in the other loop that uses j.

> +  /* If the entry edge no longer falls through we have to make a new
> + block so it can do so again.  */
> +
> +  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> +  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
> +{
> +  force_nonfallthru (e);
> +  e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
> +  BB_COPY_PARTITION (e->src, e->dest);
> +}
> +}


That's a bit odd, can this situation be prevented earlier? Why wouldn't 
we force the entry edge to fall thru?



Bernd



Re: [PATCH] Add new hooks ASM_OUTPUT_START_FUNCTION_HEADER ...

2015-09-24 Thread Bernd Schmidt

On 09/23/2015 04:48 PM, Dominik Vogt wrote:
> On Tue, Sep 22, 2015 at 01:56:15PM -0600, Jeff Law wrote:
> > Is there some good reason these aren't hooks?
>
> No, that was just inobservance.  New version attached.  Would it be
> preferable to initialize the hooks with a NULL pointer and test
> the pointer before calling them?  (That way the changes to
> hooks.[ch] could be dropped.)


There are already several hooks/macros in use for this kind of thing, 
have you checked that they are not usable for your purpose? There's 
ASM_DECLARE_FUNCTION_NAME, which is used by nvptx for example, and 
there's also ASM_OUTPUT_FUNCTION_PREFIX, which is apparently used by 
nothing in the current tree. For the end you could use 
ASM_DECLARE_FUNCTION_SIZE.


FWIW I prefer the initialization with functions rather than NULL.
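
To make the two options concrete, here is a small generic C sketch of a hook
initialized with a do-nothing default function versus a hook that may be NULL
and must be tested at each call site.  None of these names come from GCC; the
struct, the hook type, and the functions are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical hook type: emit something at the start of a function.  */
typedef void (*sketch_hook_fn) (FILE *, const char *);

struct sketch_hooks
{
  sketch_hook_fn start_function_header;
};

/* Counts calls to the example target hook, for demonstration only.  */
int sketch_hook_calls;

/* Option 1: a default that does nothing, so the hook is never NULL.  */
void
sketch_default_hook (FILE *stream, const char *name)
{
  (void) stream;
  (void) name;
}

/* An example target-specific hook.  */
void
sketch_target_hook (FILE *stream, const char *name)
{
  sketch_hook_calls++;
  fprintf (stream, "# function header for %s\n", name);
}

/* Caller for option 1: call unconditionally.  */
void
sketch_emit_unconditional (const struct sketch_hooks *h, FILE *stream,
			   const char *name)
{
  h->start_function_header (stream, name);
}

/* Caller for option 2: the hook may be NULL, so every call site has to
   test it first.  */
void
sketch_emit_checked (const struct sketch_hooks *h, FILE *stream,
		     const char *name)
{
  if (h->start_function_header != NULL)
    h->start_function_header (stream, name);
}
```

The default-function style keeps call sites free of NULL tests, at the cost of
one extra trivial function per hook signature.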


Bernd



[PATCH] Fix PR67699, remove abstract origin streaming

2015-09-24 Thread Richard Biener

The following patch removes streaming of abstract origins into ltrans
boundaries.  This was introduced by rev. 201468 but I can't find the
post of the change on the mailing list and thus its reasoning.  Fact is
we never stream DECL_ABSTRACT_ORIGIN, so doing the abstract origin
handling must have had other side-effects.

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-09-24  Richard Biener  

PR lto/67699
* lto-cgraph.c (compute_ltrans_boundary): Do not stream
abstract origins.

* g++.dg/pr67699.C: New testcase.

Index: gcc/lto-cgraph.c
===
--- gcc/lto-cgraph.c(revision 228074)
+++ gcc/lto-cgraph.c(working copy)
@@ -895,14 +895,6 @@ compute_ltrans_boundary (lto_symtab_enco
   add_node_to (encoder, node, true);
   lto_set_symtab_encoder_in_partition (encoder, node);
   create_references (encoder, node);
-  /* For proper debug info, we need to ship the origins, too.  */
-  if (DECL_ABSTRACT_ORIGIN (node->decl))
-   {
- struct cgraph_node *origin_node
- = cgraph_node::get_create (DECL_ABSTRACT_ORIGIN (node->decl));
- origin_node->used_as_abstract_origin = true;
- add_node_to (encoder, origin_node, true);
-   }
 }
   for (lsei = lsei_start_variable_in_partition (in_encoder);
!lsei_end_p (lsei); lsei_next_variable_in_partition (&lsei))
@@ -914,13 +906,6 @@ compute_ltrans_boundary (lto_symtab_enco
   lto_set_symtab_encoder_in_partition (encoder, vnode);
   lto_set_symtab_encoder_encode_initializer (encoder, vnode);
   create_references (encoder, vnode);
-  /* For proper debug info, we need to ship the origins, too.  */
-  if (DECL_ABSTRACT_ORIGIN (vnode->decl))
-   {
- varpool_node *origin_node
-   = varpool_node::get (DECL_ABSTRACT_ORIGIN (vnode->decl));
- lto_set_symtab_encoder_in_partition (encoder, origin_node);
-   }
 }
   /* Pickle in also the initializer of all referenced readonly variables
  to help folding.  Constant pool variables are not shared, so we must


Re: [AArch64] Fix Prefetch ICE

2015-09-24 Thread Marcus Shawcroft
On 24 September 2015 at 07:47, Hurugalawadi, Naveen wrote:
> Hi,
>
> Please find attached the patch that fixes an ICE for prefetch.
>
> The predicate is too loose for the constraints. Hence, the patch tightens
> up the predicate to be exactly what the constraint allows, avoids a “reload”
> and allows better code generation.
>
> Submitted on behalf of Andrew Pinski.
>
> Thanks,
> Naveen
>
> 2015-09-24  Andrew Pinski  
>
> ChangeLog
>
> * config/aarch64/aarch64.md (prefetch):
> Change the predicate of operand 0 to register_operand.

Hi, OK. Can you backport to 5? Thanks /Marcus


[PATCH, fortran] Revival of AUTOMATIC patch

2015-09-24 Thread Jim MacArthur
Hi all, I'm following up on some old work my colleague Mark Doffman did to try 
and get support for the AUTOMATIC keyword into trunk. In the enclosed patch 
I've addressed the problem with it accepting 'automatic' outside -std=gnu (it 
will now only accept AUTOMATIC under -std=gnu or -std=legacy). I've also added 
some test cases and documentation.

To address some of the other questions about this patch:

* AUTOMATIC isn't in any official standard, but is supported by the Sun/Oracle 
Fortran compiler: 
http://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vn79/index.html#z400073dc651 
and the IBM XL compiler: 
https://www-304.ibm.com/support/docview.wss?uid=swg27018978&aid=1

* Making this patch is our second choice after modifying our source code. The 
scale of our source means it's not practical to manually modify it. For other 
legacy features we've been able to do some automated transforms, but we can't 
figure out any way to do this for AUTOMATIC. There's a chance there will be 
some other people out there stuck with legacy code who will benefit from this 
change.

* I agree that 'automatic' can be easily confused with automatic objects. We 
could rename the keyword to something else (perhaps 'stack'), but then that 
removes compatibility with the Sun and IBM compilers.

This has been tested with check-gfortran for x86_64-pc-linux-gnu host & 
target; there are no unexpected failures and the new test cases pass.

Mark Doffman's original emails were in January and February 2014 in case you 
want to review them.

I am in the process of arranging copyright assignment. In the meantime, does 
this look remotely OK?

2015-09-23  Jim MacArthur  

   * decl.c (match_attr_spec): Add DECL_AUTOMATIC to enum. Recognise
   the 'automatic' keyword and call gfc_add_automatic when it is used.
   (gfc_match_automatic): New function. Match 'automatic' as a
   statement and call gfc_add_automatic when it is used.
   * gfortran.h (symbol_attribute): Add 'automatic' to bitfield.
   (gfc_add_automatic): Add declaration.
   * gfortran.texi: Document AUTOMATIC statement.
   * match.h (gfc_match_automatic): Add declaration.
   * symbol.c (check_conflict): Check for conflict between AUTOMATIC
   and SAVE attributes.
   * symbol.c (gfc_add_automatic): New function. Add automatic attribute,
   if the current standard allows it, otherwise fail.
   (gfc_copy_attr): Copy automatic attribute.
   * trans-decl.c (gfc_finish_var_decl): Do not make variables static
   if they have the 'automatic' attribute.
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c  (revision 228055)
+++ gcc/fortran/decl.c  (working copy)
@@ -3445,9 +3445,9 @@
 DECL_ALLOCATABLE = GFC_DECL_BEGIN, DECL_DIMENSION, DECL_EXTERNAL,
 DECL_IN, DECL_OUT, DECL_INOUT, DECL_INTRINSIC, DECL_OPTIONAL,
 DECL_PARAMETER, DECL_POINTER, DECL_PROTECTED, DECL_PRIVATE,
-DECL_PUBLIC, DECL_SAVE, DECL_TARGET, DECL_VALUE, DECL_VOLATILE,
-DECL_IS_BIND_C, DECL_CODIMENSION, DECL_ASYNCHRONOUS, DECL_CONTIGUOUS,
-DECL_NONE, GFC_DECL_END /* Sentinel */
+DECL_PUBLIC, DECL_SAVE, DECL_AUTOMATIC, DECL_TARGET, DECL_VALUE,
+DECL_VOLATILE, DECL_IS_BIND_C, DECL_CODIMENSION, DECL_ASYNCHRONOUS,
+DECL_CONTIGUOUS, DECL_NONE, GFC_DECL_END /* Sentinel */
   };
 
 /* GFC_DECL_END is the sentinel, index starts at 0.  */
@@ -3508,6 +3508,14 @@
  d = DECL_ASYNCHRONOUS;
}
  break;
+
+   case 'u':
+ if (match_string_p ("tomatic"))
+   {
+ /* Matched "automatic".  */
+ d = DECL_AUTOMATIC;
+   }
+ break;
}
  break;
 
@@ -3774,6 +3782,9 @@
  case DECL_SAVE:
attr = "SAVE";
break;
+ case DECL_AUTOMATIC:
+   attr = "AUTOMATIC";
+   break;
  case DECL_TARGET:
attr = "TARGET";
break;
@@ -3942,6 +3953,10 @@
  t = gfc_add_save (&current_attr, SAVE_EXPLICIT, NULL, &seen_at[d]);
  break;
 
+   case DECL_AUTOMATIC:
+ t = gfc_add_automatic (&current_attr, NULL, &seen_at[d]);
+ break;
+
case DECL_TARGET:
  t = gfc_add_target (&current_attr, &seen_at[d]);
  break;
@@ -7389,7 +7404,41 @@
   return MATCH_ERROR;
 }
 
+match
+gfc_match_automatic (void)
+{
+  gfc_symbol *sym;
+  match m;
 
+  gfc_match (" ::");
+
+  for (;;)
+{
+  m = gfc_match_symbol (&sym, 0);
+  switch (m)
+   {
+   case MATCH_YES:
+ if (!gfc_add_automatic (&sym->attr, sym->name,
+&gfc_current_locus))
+   return MATCH_ERROR;
+ if (gfc_match_eos () == MATCH_YES)
+   return MATCH_YES;
+ if (gfc_match_char (',') != MATCH_YES)
+   goto syntax;
+ break;
+   case MATCH_NO:
+ got

Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-24 Thread Bernd Schmidt

On 09/24/2015 01:53 PM, Mark Wielaard wrote:
> Even if there are such constructs in header files and they aren't
> actually bugs or people are unwilling to fix the issue with something
> that is more idiomatic C then there are various ways to suppress the
> warning. Either just don't use -Wunused-variable or add
> -Wno-unused-const-variable. Add an explicit __attribute__((used)) or
> just add a #pragma GCC system_header to the .h file.
>
> If we are concerned that this generates warnings that aren't easy to
> avoid then we might want to add that particular check behind -Wextra.
> But is that really necessary? I am not against implementing an extra
> warning exception/flag if it really is necessary. But it does introduce
> more complexity and makes the warning less consistent. So what would be
> a good way to find out one way or another whether the extra complexity
> is needed?


I think at this point we have reports of just two packages generating 
extra warnings, with the warnings at least justifiable in both cases. So 
my vote would be to leave things as-is for now and see if more reports 
come in. It is after all expected that a new warning option generates 
new warnings.



Bernd



Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-24 Thread Mark Wielaard
On Wed, 2015-09-23 at 12:25 -0600, Jeff Law wrote:
> On 09/18/2015 08:29 PM, Martin Sebor wrote:
> >> I guess it is not the 'const' I think should be handled special but the
> >> 'static'.  Having unused static variables (const or not) declared in a
> >> header file but unused seems reasonable since the header file may be
> >> included in multiple .c files each of which uses a subset of the static
> >> variables.
> >
> > I tend to agree. I suppose diagnosing unused non-const static
> > definitions might be helpful but I can't think of a good reason
> > to diagnose unused initialized static consts in C. Especially
> > since they're not diagnosed in C++.
> >
> > Would diagnosing them in source files while avoiding the warning
> > for static const definitions in headers be an acceptable compromise?
> It's probably worth a try.

I am a little concerned that would hide some real issues. In the case of
glibc the header files were actually only used by one main file, so the
variables were indeed unused and needed to be investigated why they were
there in the first place. In the case of wine the issue was that the
header file contained non-idiomatic and somewhat unreadable C constructs
that could easily be replaced by more readable defines for 16bit char
string constants.

Even if there are such constructs in header files and they aren't
actually bugs or people are unwilling to fix the issue with something
that is more idiomatic C then there are various ways to suppress the
warning. Either just don't use -Wunused-variable or add
-Wno-unused-const-variable. Add an explicit __attribute__((used)) or
just add a #pragma GCC system_header to the .h file.
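
As a small illustration of the attribute-based option, here is a sketch of
what such a header-style definition might look like.  The variable and
function names are invented for the example, and it assumes a GCC-compatible
compiler that accepts __attribute__ ((used)) on a static variable.

```c
/* A static const that some includers never reference.  Marking it
   "used" tells the compiler it is kept on purpose, so it should not
   be reported as unused.  */
static const char sketch_banner[] __attribute__ ((used)) = "sketch";

/* A static const that is referenced, so no warning applies to it.  */
static const int sketch_line_width = 80;

int
sketch_get_line_width (void)
{
  return sketch_line_width;
}
```

Note that the attribute also forces the variable to be emitted in the object
file, so it is not free; the command-line options mentioned above avoid that.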

If we are concerned that this generates warnings that aren't easy to
avoid then we might want to add that particular check behind -Wextra.
But is that really necessary? I am not against implementing an extra
warning exception/flag if it really is necessary. But it does introduce
more complexity and makes the warning less consistent. So what would be
a good way to find out one way or another whether the extra complexity
is needed?

Cheers,

Mark


[hsa] Gridification via whole construct nest cloning

2015-09-24 Thread Martin Jambor
Hi,

this is a rewrite of a major portion of gridification code.  The
previous, loop-only copying quickly proved to be just too hacky.  The
new method uses already existing copy_gimple_seq_and_replace_locals to
copy the whole nest including the parallel and possibly teams and
distribute statements and only marks them as phony so that the
statements are deleted at the end of the lowering phase, so semantics
of most of sharing clauses just work without duplicating all the code
processing them.  Reductions don't, but I hope I only need to pass them
by_ref to get an initial version (using atomic instructions) to work.

I have already committed the patch to the hsa branch.  You need the
new version of HSA run-time which was released yesterday to use the
branch, at least at -O0 (but because it got released I do not need any
extra copy-propagation like I thought I would just a couple of days
ago).

Thanks,

Martin


2015-09-24  Martin Jambor  

* gsstruct.def (GSS_OMP_TEAMS_LAYOUT): New.
* gimple.def (GIMPLE_OMP_TEAMS): Change layout.
* gimple.h (gomp_for): New field kernel_phony.
(gimple_statement_omp_parallel_layout): Likewise.
(gimple_statement_omp_single_layout): Fixed offset in comment.
(gomp_teams): New field kernel_phony.
(gimple_omp_for_kernel_phony): New function.
(gimple_omp_for_set_kernel_phony): Likewise.
(gimple_omp_parallel_kernel_phony): Likewise.
(gimple_omp_parallel_set_kernel_phony): Likewise.
(gimple_omp_teams_kernel_phony): Likewise.
(gimple_omp_teams_set_kernel_phony): Likewise.
* omp-low.c (omp_context): Removed field kernel_inner_loop, added
field kernel_seq.
(fixup_child_record_type): Make sure receiver_decl exists before
modifying it.
(scan_omp_parallel): Only create child function if statement is
not phony.
(single_stmt_in_seq_skip_bind): Add asserts.
(kernel_remap_info): Removed.
(gather_inner_locals): Likewise.
(target_follows_kernelizable_pattern): Removed kri argument,
return bool.
(find_mark_kernel_components): New function.
(attempt_target_kernelization): Removed kri parameter, use
copy_gimple_seq_and_replace_locals for copying, and
find_mark_kernel_components for marking.  Fixup blocks.
(remap_kernel_blocks): Removed.
(scan_omp_kernel_loop): Likewise.
(scan_omp_target): Removed kri variable, scan kernel_seq as any
other gimple_seq.
(expand_target_kernel_body): Get block from appropriate place.  Remove
the correct edge.  Make sure also all sibling regions of inner for
loop are expanded.
(lower_omp_for): Do not emit phony constructs.
(lower_omp_taskreg): Likewise.
(lower_omp_target): Adjusted to use sequence in context.
(lower_omp_teams): Do not emit phony constructs.

diff --git a/gcc/gimple.def b/gcc/gimple.def
index ba1f0e5..a3a4eca 100644
--- a/gcc/gimple.def
+++ b/gcc/gimple.def
@@ -373,7 +373,7 @@ DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", 
GSS_OMP_PARALLEL_LAYOUT)
 /* GIMPLE_OMP_TEAMS  represents #pragma omp teams
BODY is the sequence of statements inside the single section.
CLAUSES is an OMP_CLAUSE chain holding the associated clauses.  */
-DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_SINGLE_LAYOUT)
+DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_TEAMS_LAYOUT)
 
 /* GIMPLE_OMP_GPUKERNEL  represents a parallel loop lowered for execution
on a GPU.  It is an artificial statement created by omp lowering.  */
diff --git a/gcc/gimple.h b/gcc/gimple.h
index d7eb7fc..6f6d8cf 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -615,6 +615,12 @@ struct GTY((tag("GSS_OMP_FOR")))
   /* [ WORD 11 ]
  Pre-body evaluated before the loop body begins.  */
   gimple_seq pre_body;
+
+  /* [ WORD 12 ]
+ If set, this statement is part of a gridified kernel, its clauses need to
+ be scanned and lowered but the statement should be discarded after
+ lowering.  */
+  bool kernel_phony;
 };
 
 
@@ -637,7 +643,7 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT")))
  Shared data argument.  */
   tree data_arg;
 
-  /* TODO: These are only good for omp target, move there when the changes are
+  /* TODO: Revisit placement of the following three fields when the changes are
  final.  Also, add getter and setter methods.  */
 
   /* [ WORD 11 ] */
@@ -650,6 +656,12 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT")))
 
   /* [ WORD 13 ] */
   struct gimple_omp_for_iter * GTY((length ("%h.kernel_collapse"))) 
kernel_iter;
+
+  /* [ WORD 9 ]  */
+  /* If set, this statement is part of a gridified kernel, its clauses need to
+ be scanned and lowered but the statement should be discarded after
+ lowering.  */
+  bool kernel_phony;
 };
 
 /* GIMPLE_OMP_PARALLEL or GIMPLE_TASK */
@@ -732,14 +744,14 @@ struct GTY((tag("GSS_OMP_CONTINUE")))
   tree control_use;
 }

Re: [RS6000] Don't pass --oformat to ld

2015-09-24 Thread Alan Modra
On Thu, Sep 24, 2015 at 02:24:25PM +1000, Michael Ellerman wrote:
> On Wed, 2015-09-02 at 11:05 +0930, Alan Modra wrote:
> > bugzilla.redhat.com/show_bug_cgi?id=1255946 shows that gcc built with
> > both powerpc64-linux and powerpc64le-linux support passes wrong linker
> > options when trying to link in the non-default endian.  A --oformat
> > option coming from LINK_TARGET_SPEC is only correct for 32-bit.
> > 
> > It turns out that GNU ld -m options select a particular ld emulation
> > (e*.c file in ld build dir) which provides compiled-in scripts or
> > selects a script from ldscripts/.  Each of these has an OUTPUT_FORMAT
> > statement, which does the same thing as --oformat.  --oformat is
> > therefore redundant when using GNU ld built this century, except
> > possibly when a user overrides the default ld script with -Wl,-T and
> > their script neglects OUTPUT_FORMAT, and it isn't the default output.
> > I don't think it's worth fixing this possible use case.
> > 
> > Bootstrap and testing in progress.  OK for mainline assuming all is
> > OK?
> > 
> > * config/rs6000/sysv4le.h (LINK_TARGET_SPEC): Don't define.
> > * config/rs6000/sysv4.h (LINK_TARGET_SPEC): Likewise.
> > (LINK_SPEC, SUBTARGET_EXTRA_SPECS): Delete link_target.
> 
> Hi Alan,
> 
> If you could please backport this to the gcc-5-branch, that would be helpful for
> us (kernel folks).

Bootstrapped and regression tested powerpc64le-linux.  Is this OK for
the branch too, David?

-- 
Alan Modra
Australia Development Lab, IBM


Re: fdiagnostics-color=never does not disable color for some diagnostics

2015-09-24 Thread Jason Merrill

On 09/22/2015 04:23 PM, Manuel López-Ibáñez wrote:

+error_at (loc, "-Werror=%s: no option -%s", arg, new_option);
+  else if (!(cl_options[option_index].flags & CL_WARNING))
+error_at (loc, "-Werror=%s: -%s is not an option that controls warnings",


Won't these incorrectly start with "-Werror=Wsomething:" rather than the 
"-Werror=something" that the user wrote?


Jason



Re: [gomp4 8/8] libgomp: provide ICVs via env.c on nvptx

2015-09-24 Thread Alexander Monakov
On Thu, 24 Sep 2015, Jakub Jelinek wrote:

> On Wed, Sep 23, 2015 at 08:22:22PM +0300, Alexander Monakov wrote:
> > This patch ports env.c to NVPTX.  It drops all environment parsing routines
> > since there's no "environment" on the device.  For now, the useful effect of
> > the patch is providing 'omp_is_initial_device' to distinguish host execution
> > from target execution in user code.
> > 
> > Several functions use gomp_icv, which is not adjusted for NVPTX and thus 
> > will
> > try to use EMUTLS.  The intended way forward is to provide a custom
> > implementation of gomp_icv on NVPTX, likely via pre-allocating storage prior
> > to spawning a team.
> > 
> > * config/nvptx/env.c: New file.
> 
> I don't like this, there is just too much code duplication in this case and
> it is going to be a maintenance nightmare going forward (e.g.
> gomp-4_1-branch adds further functions, etc.).
> I'd suggest splitting the toplevel env.c into two files, icv.c which would
> contain the global variables and most of the small API functions, and env.c
> which would contain the global constructor, env var parsing, printing and
> perhaps omp_is_initial_device ().  Then nvptx.c would use the toplevel icv.c
> and provide its own env.c with just omp_is_initial_device () (which of
> course eventually can be inlined by the compiler on NVPTX target or perhaps
> any ACCEL_COMPILER, but we need to provide a library version anyway, you can
> take address of the function etc.).
> 
> Are you ok with that?

Definitely, thanks for the suggestion!  While implementing that, I considered
that it should be more natural to keep only env processing in env.c, and split
device-related functionality in another file, icv-device.c.  That way, nvptx
can keep a zero-sized env.c, use generic icv.c, and provide its overrides in
icv-device.c.  If that's too fancy I can revert to your suggested approach.
How does the following patch look?


[gomp4] libgomp: split ICV functionality out of env.c

Split env.c, leaving only processing of environment variables in the original
file.  Move most of ICV definitions and associated API entrypoints into icv.c,
except target-related API entrypoints, which are moved into icv-device.c.  The
intention is to allow offload-only architectures to use the generic icv.c.

* Makefile.am (libgomp_la_SOURCES): Add icv.c and icv-device.c.
* Makefile.in: Regenerate.
* env.c: Split out ICV definitions into...
* icv.c: ...here (new file) and...
* icv-device.c: ...here. New file.
---
 libgomp/Makefile.am  |  16 ++--
 libgomp/Makefile.in  |  34 +
 libgomp/env.c| 204 +--
 libgomp/icv-device.c |  78 
 libgomp/icv.c| 181 +
 5 files changed, 291 insertions(+), 222 deletions(-)
 create mode 100644 libgomp/icv-device.c
 create mode 100644 libgomp/icv.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 5411278..b3a09b0 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -58,12 +58,12 @@ libgomp_la_LDFLAGS = $(libgomp_version_info) 
$(libgomp_version_script) \
 libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
 libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 
-libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
-   iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
-   task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-   time.c fortran.c affinity.c target.c splay-tree.c libgomp-plugin.c \
-   oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \
-   oacc-plugin.c oacc-cuda.c
+libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c icv.c \
+   icv-device.c iter.c iter_ull.c loop.c loop_ull.c ordered.c parallel.c \
+   sections.c single.c task.c team.c work.c lock.c mutex.c proc.c sem.c \
+   bar.c ptrlock.c time.c fortran.c affinity.c target.c splay-tree.c \
+   libgomp-plugin.c oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
+   oacc-async.c oacc-plugin.c oacc-cuda.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
@@ -95,6 +95,10 @@ fortran.lo: libgomp_f.h
 fortran.o: libgomp_f.h
 env.lo: libgomp_f.h
 env.o: libgomp_f.h
+icv.lo: libgomp_f.h
+icv.o: libgomp_f.h
+icv-device.lo: libgomp_f.h
+icv-device.o: libgomp_f.h
 
 
 # Automake Documentation:
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 79745ce..e2e0e42 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -162,13 +162,13 @@ libgomp_plugin_nvptx_la_LINK = $(LIBTOOL) --tag=CC \
 libgomp_la_LIBADD =
 @USE_FORTRAN_TRUE@am__objects_1 = openacc.lo
 am_libgomp_la_OBJECTS = alloc.lo barrier.lo critical.lo env.lo \
-   error.lo iter.lo iter_ull.lo loop.lo loop_ull.lo ordered.lo \
-   parallel.lo sections.lo single.lo task.lo team.lo work.lo \
-   lock.lo mutex.lo proc.lo sem.lo bar.lo ptrlock.lo time.lo \
-   fortran.lo affinity.lo target.lo

Re: [PATCH 0/4] bb-reorder: Add the "simple" algorithm

2015-09-24 Thread Segher Boessenkool
On Thu, Sep 24, 2015 at 11:56:22AM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >The current basic block reordering always uses the "software trace cache"
> >algorithm.  That has a few problems:
> >
> >1) It increases code size substantially; this makes it not suitable for
> >-O1 or -Os, and not at all for some architectures;
> >2) but it is enabled for -Os and all targets;
> >3) and -O1 gets nothing, resulting in pretty jumpy code.
> 
> A general question first, I see code in bb-reorder.c (in copy_bb_p) that 
> limits the amount of code growth if not optimizing for speed. Is that 
> not working as expected or not sufficient?

It works.  The "simple" algorithm generates slightly smaller code though
(less than a percent).  Defaulting -Os to STC is easy of course; do you
prefer that?

> Your code looks like a nice clean algorithm so I have no objections to 
> it (detailed comments to follow), but I want to make sure it is 
> necessary to add it.

It's not just for -Os, but also for -O1 (where we currently don't reorder
at all, although various passes leave the config in a pretty sorry state --
like, we run shrink-wrapping at -O1, and it can make quite a mess if some
blocks are copied and others not; but this is just an example, it was the
trigger for me though).

And, when I wrote the original for this, it was for a target where STC
does not help at all (there is no instruction cache); "simple" saves a
lot of space at -O2.  Quite important for embedded targets.

Finally, it lets us easily plug in other algorithms.


Segher


Re: [C++ Patch] PR 53856

2015-09-24 Thread Jason Merrill

On 09/22/2015 03:31 PM, Paolo Carlini wrote:

 msg = G_("default template arguments may not be used in "
 "partial specializations");
+  else if (current_class_type && !CLASSTYPE_IS_TEMPLATE (current_class_type))
+/* Per [temp.param]/9, "A default template-argument shall not be
+   specified in the template-parameter-lists of the definition of
+   a member of a class template that appears outside of the member's
+   class.", thus if we aren't handling a member of a class template
+   there is no need to examine the parameters.  */
+last_level_to_check = template_class_depth (current_class_type) + 1;
   else
 msg = G_("default argument for template parameter for class enclosing 
%qD");


Why not handle this below, with the other code that sets 
last_level_to_check?


Jason



Re: libgo patch committed: rewrite lfstack to look more like gc code

2015-09-24 Thread Rainer Orth
Ian Lance Taylor  writes:

> This patch by Michael Hudson-Doyle rewrites the lfstack code in libgo
> to look more like that in the gc library.  It also fixes it for arm64.
> Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
> Committed to mainline.

This patch broke Solaris/x86 bootstrap: the amd64 lfstack.goc fails to
compile:

$ /bin/ksh ./libtool --tag=CC   --mode=compile 
/var/gcc/gcc-6.0.0-20150924/12-gcc/./gcc/xgcc 
-B/var/gcc/gcc-6.0.0-20150924/12-gcc/./gcc/ 
-B/usr/local/i386-pc-solaris2.12/bin/ -B/usr/local/i386-pc-solaris2.12/lib/ 
-isystem /usr/local/i386-pc-solaris2.12/include -isystem 
/usr/local/i386-pc-solaris2.12/sys-include  -m64 -DHAVE_CONFIG_H -I. 
-I/vol/gcc/src/hg/trunk/solaris/libgo  -I 
/vol/gcc/src/hg/trunk/solaris/libgo/runtime 
-I/vol/gcc/src/hg/trunk/solaris/libgo/../libffi/include -I../libffi/include 
-pthread  -fexceptions -fnon-call-exceptions -fplan9-extensions  -Wall -Wextra 
-Wwrite-strings -Wcast-qual -Werror -minline-all-stringops -D_GNU_SOURCE 
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I 
/vol/gcc/src/hg/trunk/solaris/libgo/../libgcc -I 
/vol/gcc/src/hg/trunk/solaris/libgo/../libbacktrace -I ../../../gcc/include -g 
-O2 -MT lfstack.lo -MD -MP -MF .deps/lfstack.Tpo -c -o lfstack.lo lfstack.c
libtool: compile:  /var/gcc/gcc-6.0.0-20150924/12-gcc/./gcc/xgcc 
-B/var/gcc/gcc-6.0.0-20150924/12-gcc/./gcc/ 
-B/usr/local/i386-pc-solaris2.12/bin/ -B/usr/local/i386-pc-solaris2.12/lib/ 
-isystem /usr/local/i386-pc-solaris2.12/include -isystem 
/usr/local/i386-pc-solaris2.12/sys-include -m64 -DHAVE_CONFIG_H -I. 
-I/vol/gcc/src/hg/trunk/solaris/libgo -I 
/vol/gcc/src/hg/trunk/solaris/libgo/runtime 
-I/vol/gcc/src/hg/trunk/solaris/libgo/../libffi/include -I../libffi/include 
-pthread -fexceptions -fnon-call-exceptions -fplan9-extensions -Wall -Wextra 
-Wwrite-strings -Wcast-qual -Werror -minline-all-stringops -D_GNU_SOURCE 
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I 
/vol/gcc/src/hg/trunk/solaris/libgo/../libgcc -I 
/vol/gcc/src/hg/trunk/solaris/libgo/../libbacktrace -I ../../../gcc/include -g 
-O2 -MT lfstack.lo -MD -MP -MF .deps/lfstack.Tpo -c lfstack.c  -fPIC -DPIC -o 
.libs/lfstack.o
/vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:32:22: error: 
redefinition of 'lfPack'
/vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:16:22: note: previous 
definition of 'lfPack' was here
 static inline uint64 lfPack(LFNode *node, uintptr cnt) {
  ^
/vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc: In function 'lfPack':
/vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:33:38: error: 
'PTR_BITS' undeclared (first use in this function)
/vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:33:38: note: each 
undeclared identifier is reported only once for each function it appears in
/vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc: At top level:
/vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:35:23: error: 
redefinition of 'lfUnpack'
  return ((uint64)(node)<<(64-PTR_BITS)) | (cnt&(((1<

Re: libgo patch committed: rewrite lfstack to look more like gc code

2015-09-24 Thread Ian Lance Taylor
On Thu, Sep 24, 2015 at 6:25 AM, Rainer Orth
 wrote:
> Ian Lance Taylor  writes:
>
>> This patch by Michael Hudson-Doyle rewrites the lfstack code in libgo
>> to look more like that in the gc library.  It also fixes it for arm64.
>> Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
>> Committed to mainline.
>
> This patch broke Solaris/x86 bootstrap: the amd64 lfstack.goc fails to
> compile:
>
> $ /bin/ksh ./libtool --tag=CC   --mode=compile 
> /var/gcc/gcc-6.0.0-20150924/12-gcc/./gcc/xgcc 
> -B/var/gcc/gcc-6.0.0-20150924/12-gcc/./gcc/ 
> -B/usr/local/i386-pc-solaris2.12/bin/ -B/usr/local/i386-pc-solaris2.12/lib/ 
> -isystem /usr/local/i386-pc-solaris2.12/include -isystem 
> /usr/local/i386-pc-solaris2.12/sys-include  -m64 -DHAVE_CONFIG_H -I. 
> -I/vol/gcc/src/hg/trunk/solaris/libgo  -I 
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime 
> -I/vol/gcc/src/hg/trunk/solaris/libgo/../libffi/include -I../libffi/include 
> -pthread  -fexceptions -fnon-call-exceptions -fplan9-extensions  -Wall 
> -Wextra -Wwrite-strings -Wcast-qual -Werror -minline-all-stringops 
> -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I 
> /vol/gcc/src/hg/trunk/solaris/libgo/../libgcc -I 
> /vol/gcc/src/hg/trunk/solaris/libgo/../libbacktrace -I ../../../gcc/include 
> -g -O2 -MT lfstack.lo -MD -MP -MF .deps/lfstack.Tpo -c -o lfstack.lo lfstack.c
> libtool: compile:  /var/gcc/gcc-6.0.0-20150924/12-gcc/./gcc/xgcc 
> -B/var/gcc/gcc-6.0.0-20150924/12-gcc/./gcc/ 
> -B/usr/local/i386-pc-solaris2.12/bin/ -B/usr/local/i386-pc-solaris2.12/lib/ 
> -isystem /usr/local/i386-pc-solaris2.12/include -isystem 
> /usr/local/i386-pc-solaris2.12/sys-include -m64 -DHAVE_CONFIG_H -I. 
> -I/vol/gcc/src/hg/trunk/solaris/libgo -I 
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime 
> -I/vol/gcc/src/hg/trunk/solaris/libgo/../libffi/include -I../libffi/include 
> -pthread -fexceptions -fnon-call-exceptions -fplan9-extensions -Wall -Wextra 
> -Wwrite-strings -Wcast-qual -Werror -minline-all-stringops -D_GNU_SOURCE 
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I 
> /vol/gcc/src/hg/trunk/solaris/libgo/../libgcc -I 
> /vol/gcc/src/hg/trunk/solaris/libgo/../libbacktrace -I ../../../gcc/include 
> -g -O2 -MT lfstack.lo -MD -MP -MF .deps/lfstack.Tpo -c lfstack.c  -fPIC -DPIC 
> -o .libs/lfstack.o
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:32:22: error: 
> redefinition of 'lfPack'
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:16:22: note: previous 
> definition of 'lfPack' was here
>  static inline uint64 lfPack(LFNode *node, uintptr cnt) {
>   ^
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc: In function 'lfPack':
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:33:38: error: 
> 'PTR_BITS' undeclared (first use in this function)
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:33:38: note: each 
> undeclared identifier is reported only once for each function it appears in
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc: At top level:
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:35:23: error: 
> redefinition of 'lfUnpack'
>   return ((uint64)(node)<<(64-PTR_BITS)) | (cnt&(((1<^
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:19:23: note: previous 
> definition of 'lfUnpack' was here
>  static inline LFNode* lfUnpack(uint64 val) {
>^
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc: In function 
> 'lfUnpack':
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:31:26: error: 
> 'PTR_BITS' undeclared (first use in this function)
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:36:31: note: in 
> expansion of macro 'CNT_BITS'
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc: In function 'lfPack':
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:34:1: error: control 
> reaches end of non-void function [-Werror=return-type]
>  static inline uint64 lfPack(LFNode *node, uintptr cnt) {
>  ^
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc: In function 
> 'lfUnpack':
> /vol/gcc/src/hg/trunk/solaris/libgo/runtime/lfstack.goc:37:1: error: control 
> reaches end of non-void function [-Werror=return-type]
>  static inline LFNode* lfUnpack(uint64 val) {
>  ^
> cc1: all warnings being treated as errors
> make: *** [lfstack.lo] Error 1

Bother.  Sorry about that.  Should be fixed with this patch.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 228064)
+++ gcc/go/gofrontend/MER

Re: [RS6000] Don't pass --oformat to ld

2015-09-24 Thread David Edelsohn
On Thu, Sep 24, 2015 at 8:54 AM, Alan Modra  wrote:
> On Thu, Sep 24, 2015 at 02:24:25PM +1000, Michael Ellerman wrote:
>> On Wed, 2015-09-02 at 11:05 +0930, Alan Modra wrote:
>> > bugzilla.redhat.com/show_bug_cgi?id=1255946 shows that gcc built with
>> > both powerpc64-linux and powerpc64le-linux support passes wrong linker
>> > options when trying to link in the non-default endian.  A --oformat
>> > option coming from LINK_TARGET_SPEC is only correct for 32-bit.
>> >
>> > It turns out that GNU ld -m options select a particular ld emulation
>> > (e*.c file in ld build dir) which provides compiled-in scripts or
>> > selects a script from ldscripts/.  Each of these has an OUTPUT_FORMAT
>> > statement, which does the same thing as --oformat.  --oformat is
>> > therefore redundant when using GNU ld built this century, except
>> > possibly when a user overrides the default ld script with -Wl,-T and
>> > their script neglects OUTPUT_FORMAT, and it isn't the default output.
>> > I don't think it's worth fixing this possible use case.
>> >
>> > Bootstrap and testing in progress.  OK for mainline assuming all is
>> > OK?
>> >
>> > * config/rs6000/sysv4le.h (LINK_TARGET_SPEC): Don't define.
>> > * config/rs6000/sysv4.h (LINK_TARGET_SPEC): Likewise.
>> > (LINK_SPEC, SUBTARGET_EXTRA_SPECS): Delete link_target.
>>
>> Hi Alan,
>>
>> If you could please backport this to the gcc-5-branch, that would be helpful for
>> us (kernel folks).
>
> Bootstrapped and regression tested powerpc64le-linux.  Is this OK for
> the branch too, David?

Backporting this bug fix is fine with me.

Thanks!
David


Re: libgo patch committed: rewrite lfstack to look more like gc code

2015-09-24 Thread Rainer Orth
Hi Ian,

>> cc1: all warnings being treated as errors
>> make: *** [lfstack.lo] Error 1
>
> Bother.  Sorry about that.  Should be fixed with this patch.
> Committed to mainline.

compiles again, thanks.  I just lost it in the maze of ifdefs...
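
For readers who have not met the trick that the failing functions implement:
a lock-free stack can pack a node pointer and a small counter into one
64-bit word so that both are updated by a single compare-and-swap.  Below is
a minimal sketch of that idea only.  The names sketch_pack, sketch_unpack and
sketch_count, and the 48-bit pointer width, are assumptions made for
illustration; this is not the libgo code and not the fix that was committed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node type; only its address matters here.
struct LFNode {
  LFNode *next;
};

// Assumption: user-space addresses fit in 48 bits on this platform.
constexpr unsigned kPtrBits = 48;
constexpr unsigned kCntBits = 64 - kPtrBits;

// Pack the address into the high bits and a truncated counter into the
// low bits of a single 64-bit word.
inline std::uint64_t sketch_pack(LFNode *node, std::uintptr_t cnt) {
  std::uint64_t addr = reinterpret_cast<std::uintptr_t>(node);
  assert((addr >> kPtrBits) == 0);  // the address must fit in kPtrBits
  std::uint64_t mask = (std::uint64_t{1} << kCntBits) - 1;
  return (addr << kCntBits) | (static_cast<std::uint64_t>(cnt) & mask);
}

// Recover the node pointer from a packed word.
inline LFNode *sketch_unpack(std::uint64_t val) {
  return reinterpret_cast<LFNode *>(
      static_cast<std::uintptr_t>(val >> kCntBits));
}

// Recover the (truncated) counter from a packed word.
inline std::uint64_t sketch_count(std::uint64_t val) {
  return val & ((std::uint64_t{1} << kCntBits) - 1);
}
```

Keeping one definition parameterised by constants, rather than one definition
per #if branch, is one way to avoid the kind of redefinition error shown in
the log above.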

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [gomp4 8/8] libgomp: provide ICVs via env.c on nvptx

2015-09-24 Thread Jakub Jelinek
On Thu, Sep 24, 2015 at 04:15:28PM +0300, Alexander Monakov wrote:
> Definitely, thanks for the suggestion!  While implementing that, I considered
> that it should be more natural to keep only env processing in env.c, and split
> device-related functionality in another file, icv-device.c.  That way, nvptx
> can keep a zero-sized env.c, use generic icv.c, and provide its overrides in
> icv-device.c.  If that's too fancy I can revert to your suggested approach.
> How does the following patch look?

icv-device.c looks reasonable.  Note, the wording is that it is UB if (some
of those) functions are called from target regions.  That means the routines
still should be defined somewhere, but can be just stubbed.
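
To make the "defined somewhere, but can be just stubbed" point concrete, here
is a rough sketch of the shape such a split could take.  Everything in it is
hypothetical: SKETCH_ACCEL_BUILD and the sketch_* functions are illustrative
stand-ins, not libgomp symbols, and the stub return values are placeholders.

```cpp
#include <cassert>
#include <string>

// SKETCH_ACCEL_BUILD is a hypothetical macro standing in for "this
// translation unit is being built for the accelerator".
#ifdef SKETCH_ACCEL_BUILD
// Device flavour: the query that matters gives the device answer; the
// other entry point is only a stub so that its address can still be taken.
int sketch_is_initial_device() { return 0; }
int sketch_get_num_devices() { return 0; }
#else
// Host flavour: placeholder bodies; a real host library would consult
// its runtime state here.
int sketch_is_initial_device() { return 1; }
int sketch_get_num_devices() { return 0; }
#endif

// User-style code that distinguishes host from target execution.
std::string where_am_i() {
  int initial = sketch_is_initial_device();
  assert(initial == 0 || initial == 1);
  return initial ? "host" : "device";
}
```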

> [gomp4] libgomp: split ICV functionality out of env.c
> 
> Split env.c, leaving only processing of environment variables in the original
> file.  Move most of ICV definitions and associated API entrypoints into icv.c,
> except target-related API entrypoints, which are moved into icv-device.c.  The
> intention is to allow offload-only architectures to use the generic icv.c.
> 
>   * Makefile.am (libgomp_la_SOURCES): Add icv.c and icv-device.c.
> * Makefile.in: Regenerate.
> * env.c: Split out ICV definitions into...
> * icv.c: ...here (new file) and...
> * icv-device.c: ...here. New file.

LGTM, except:

> @@ -95,6 +95,10 @@ fortran.lo: libgomp_f.h
>  fortran.o: libgomp_f.h
>  env.lo: libgomp_f.h
>  env.o: libgomp_f.h
> +icv.lo: libgomp_f.h
> +icv.o: libgomp_f.h
> +icv-device.lo: libgomp_f.h
> +icv-device.o: libgomp_f.h

You don't really want this, it is enough to include it in env.c only.

> +/* This file defines OpenMP API entry points that accelerator targets are
> +   expected to replace.  */
> +
> +#include "libgomp.h"
> +#include "libgomp_f.h"

And please leave out the libgomp_f.h include here.

> --- /dev/null
> +++ b/libgomp/icv.c
> @@ -0,0 +1,181 @@
> +/* Copyright (C) 2015 Free Software Foundation, Inc.

I'd say as the file is a copy of the source of env.c that originates back to
2005, it should be 2005-2015 (both files).

> +/* This file defines the OpenMP internal control variables and associated
> +   OpenMP API entry points.  */
> +
> +#include "libgomp.h"
> +#include "libgomp_f.h"

Like above.

Just to make sure, ChangeLog entries on the gomp-4*-branch branches go into
ChangeLog.gomp.

Jakub


Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm

2015-09-24 Thread Segher Boessenkool
On Thu, Sep 24, 2015 at 12:32:59PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >This is the meat of this series: a new algorithm to do basic block
> >reordering.  It uses the simple greedy approach to maximum weighted
> >matching, where the weights are the predicted execution frequency of
> >the edges.  This always finds a solution that is within a factor two
> >of optimal, if you disregard loops (which we cannot allow) and the
> >complications of block partitioning.
> 
> Looks really good for the most part.
> 
> The comment at the top of the file should be updated to mention both 
> algorithms.

Will do.

> >+  /* Sort the edges, the most desirable first.  */
> >+
> >+  std::stable_sort (edges, edges + n, edge_order);
> 
> Any thoughts on this vs qsort? Do you need a stable sort?

We always need stable sorts in GCC; things are not reproducible across
targets with qsort (not every qsort is the same).

Also, you sometimes have two edges back-to-back, with the same weights;
a stable sort ensures we don't put a jump in the middle of that if we
can help it.
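
A small standalone illustration of the tie-breaking point, using a
hypothetical Edge struct rather than GCC's edge type: with a comparator that
only looks at the weight, std::stable_sort keeps equal-weight edges in their
input order, so the result does not depend on the library's sort
implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a CFG edge: only what the sort needs.
struct Edge {
  int src;
  int dest;
  int weight;  // predicted execution frequency
};

// Order edges by descending weight; ties deliberately compare as equal.
static bool edge_order(const Edge &a, const Edge &b) {
  return a.weight > b.weight;
}

// Return the edges sorted most desirable first.  Equal-weight edges keep
// their original relative order because the sort is stable.
std::vector<Edge> sort_edges(std::vector<Edge> edges) {
  std::stable_sort(edges.begin(), edges.end(), edge_order);
  assert(std::is_sorted(edges.begin(), edges.end(), edge_order));
  return edges;
}
```

With input weights 10, 50, 50, 10, 70 the two weight-50 edges come out in the
order they went in, and so do the two weight-10 edges.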

> >+  int j;
> >+  for (j = 0; j < n; j++)
> 
> for (int j ...
> here and in the other loop that uses j.

That is so ugly.  Will change though :-)

> >+  /* If the entry edge no longer falls through we have to make a new
> >+ block so it can do so again.  */
> >+
> >+  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> >+  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
> >+{
> >+  force_nonfallthru (e);
> >+  e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
> >+  BB_COPY_PARTITION (e->src, e->dest);
> >+}
> >+}
> 
> That's a bit odd, can this situation be prevented earlier?

We could always add an extra jump at the start, but that's about the
same code so not helpful.

> Why wouldn't we force the entry edge to fall thru?

Because it pessimises code.  If the model thinks the first block after
entry belongs somewhere in the middle of a fall-through sequence, it
usually is right (typical examples are loops that start with the loop
test).  This algorithm does not peel loops (it does no duplication at
all).

All the optimisable blocks end with an unconditional jump, and this algo
tries to remove as many of those as it can; logically, the start block
has such a jump as well.
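
For anyone following along without the patch at hand, the greedy idea from
the patch description can be sketched roughly as below.  This is not the
code from the patch: blocks are plain integers, Edge and greedy_layout are
hypothetical names, and the sketch simply refuses edges into block 0 so that
the entry stays first, whereas the patch instead creates a new block when
the entry edge no longer falls through.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical edge: block indices plus a predicted frequency.
struct Edge {
  int src;
  int dest;
  int weight;
};

// Union-find root lookup with path halving, used to detect loops.
static int find_root(std::vector<int> &parent, int x) {
  while (parent[x] != x) {
    parent[x] = parent[parent[x]];
    x = parent[x];
  }
  return x;
}

// Greedily pick the heaviest edges as fall-throughs.  Each block gets at
// most one chosen successor and one chosen predecessor, and an edge that
// would close a loop is rejected.  Returns the resulting block order.
std::vector<int> greedy_layout(int n_blocks, std::vector<Edge> edges) {
  assert(n_blocks > 0);
  std::stable_sort(edges.begin(), edges.end(),
                   [](const Edge &a, const Edge &b) {
                     return a.weight > b.weight;
                   });

  std::vector<int> next(n_blocks, -1), prev(n_blocks, -1), parent(n_blocks);
  std::iota(parent.begin(), parent.end(), 0);

  for (const Edge &e : edges) {
    assert(e.src >= 0 && e.src < n_blocks);
    assert(e.dest >= 0 && e.dest < n_blocks);
    if (e.dest == 0)
      continue;  // simplification: block 0 (entry) stays a chain head
    if (next[e.src] != -1 || prev[e.dest] != -1)
      continue;  // one of the two ends is already matched
    int a = find_root(parent, e.src);
    int b = find_root(parent, e.dest);
    if (a == b)
      continue;  // same chain already: this edge would form a loop
    next[e.src] = e.dest;
    prev[e.dest] = e.src;
    parent[a] = b;
  }

  // Emit each chain from its head; block 0 is always a head, so it is first.
  std::vector<int> order;
  for (int head = 0; head < n_blocks; head++) {
    if (prev[head] != -1)
      continue;
    for (int b = head; b != -1; b = next[b])
      order.push_back(b);
  }
  return order;
}
```

On a diamond with a hot left arm (0->1 and 1->3 at weight 90, 0->2 and 2->3
at weight 10) this yields the order 0, 1, 3, 2: the hot path falls through
and the cold block is placed last.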


Segher


Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up

2015-09-24 Thread Segher Boessenkool
On Thu, Sep 24, 2015 at 12:28:08PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
> >as default.  For -O2 and up (except -Os) it is switched to "stc" instead.
> >Targets that never want STC can override this.  This changes 
> >-freorder-blocks
> >to be on at -O1 and up (was -O2 and up).
> >
> >In effect, the changes are for -O1 (which now gets "simple" instead of
> >nothing), -Os (which now gets "simple" instead of "stc", since STC results
> >in much bigger code), and for targets that wish to never use STC (not in
> >this patch though).
> 
> This should be merged with its documentation in 4/4, and personally I'd 
> have no problem reviewing a patch with 2/3/4 all in one. Splitting 
> patches is most helpful if there are parts that rearrange things such as 
> your 1/4, or if there are multiple independent functional changes. I'm 
> not saying you did anything wrong by splitting, just that maybe you made 
> unnecessary work for yourself.

I had the patches like that in my git tree, so I figured I'd send it like
that, makes review slightly easier (not a big deal for small patches like
this of course).  I did not waste time splitting things up, don't worry :-)
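
As a quick summary of the defaults described in the quoted patch text, here
is a toy model.  ReorderSettings and default_reorder_settings are made-up
names, -Os is modelled as level 2 with a size flag, and the per-target
override mentioned above is left out; the real wiring lives in the option
handling of the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of the two settings under discussion.
struct ReorderSettings {
  bool reorder_blocks;    // whether -freorder-blocks is on
  std::string algorithm;  // "simple" or "stc"
};

// Defaults as described in the patch mail: reordering is on from -O1,
// "simple" is the default, and -O2 and up (except -Os) switch to "stc".
ReorderSettings default_reorder_settings(int opt_level, bool optimize_size) {
  assert(opt_level >= 0);
  ReorderSettings s;
  s.reorder_blocks = opt_level >= 1;
  s.algorithm = (opt_level >= 2 && !optimize_size) ? "stc" : "simple";
  return s;
}
```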


Segher


[PATCH][committed] Fix aarch64/target_attr_10.c test following r227997

2015-09-24 Thread Szabolcs Nagy

gcc commit r227997 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01455.html
changed "error:" to "note:" for "called from here" messages,
but missed adjusting an aarch64 target test.

committed as obvious.

gcc/testsuite/ChangeLog:

2015-09-24  Szabolcs Nagy  

* gcc.target/aarch64/target_attr_10.c (foo): Use dg-message for note.

diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_10.c b/gcc/testsuite/gcc.target/aarch64/target_attr_10.c
index b2c48c4..6d05771 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_10.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_10.c
@@ -10,7 +10,7 @@ __attribute__ ((target ("+nosimd")))
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, uint8x16_t c)
 {
-  return vbslq_u8 (a, b, c); /* { dg-error "called from here" } */
+  return vbslq_u8 (a, b, c); /* { dg-message "called from here" } */
 }
 
 /* { dg-error "inlining failed in call to always_inline" "" { target *-*-* } 0 } */


Re: [PATCH] Add new hooks ASM_OUTPUT_START_FUNCTION_HEADER ...

2015-09-24 Thread Dominik Vogt
On Thu, Sep 24, 2015 at 12:41:24PM +0200, Bernd Schmidt wrote:
> On 09/23/2015 04:48 PM, Dominik Vogt wrote:
> >On Tue, Sep 22, 2015 at 01:56:15PM -0600, Jeff Law wrote:
> >> Is there some good reason these aren't hooks?
> >
> >No, that was just an oversight.  New version attached.  Would it be
> >preferable to initialize the hooks with a NULL pointer and test
> >the pointer before calling them?  (That way the changes to
> >hooks.[ch] could be dropped.)
> 
> There are already several hooks/macros in use for this kind of
> thing, have you checked that they are not usable for your purpose?
> There's ASM_DECLARE_FUNCTION_NAME, which is used by nvptx for
> example, and there's also ASM_OUTPUT_FUNCTION_PREFIX, which is
> apparently used by nothing in the current tree. For the end you
> could use ASM_DECLARE_FUNCTION_SIZE.

Hm, ASM_DECLARE_FUNCTION_NAME, ASM_DECLARE_FUNCTION_LABEL and
ASM_OUTPUT_FUNCTION_PREFIX are all called too late in the code to
be useful.  The new hook (or any replacement for it) must be
emitted:

 * before any alignment of the function is done,
 * before the constant pool is generated (however, s390 places the
   constant pool after the function anyway).

ASM_DECLARE_FUNCTION_SIZE on the other hand is called too early;
the end hook must be called after generating the constant pool.
For example, activating or deactivating the vector extension on a
z13 affects alignment of vector type constants.  So,
unfortunately, I see no way to make use of the existing hooks.
You're right that there are plenty already.

Hm, I wonder whether wrapping all these section switches in
assemble_start/end_function in ".machine" pseudoops (that's what
we need the hooks for; similar to .arch for ix86) has any real
effect.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [gomp4 0/8] NVPTX: initial OpenMP offloading

2015-09-24 Thread Nathan Sidwell

On 09/24/15 03:21, Jakub Jelinek wrote:


So I'd like to ask Thomas/Nathan if they are ok with this stuff being on
the gomp-4_0-branch for now, once all the prerequisites it needs are on the
trunk, it can go into its own branch.


Let Thomas and me think about it.  Now that the new launch API is approved (I'm
working on the changes requested), I expect to merge another chunk of
ptx-specific bits regarding mkoffloads and friends next week[*].  That might
solve the branch dependency problem Jakub discusses.



nathan

[*] I think they'll be patches I can self approve, given what they'll be 
affecting.


Re: [PATCH, PR67405, committed] Avoid NULL pointer dereference

2015-09-24 Thread Ilya Enkovich
2015-09-15 14:01 GMT+03:00 Ilya Enkovich :
> 2015-09-15 13:32 GMT+03:00 Richard Biener :
>> On Tue, Sep 15, 2015 at 11:28 AM, Ilya Enkovich  
>> wrote:
>>
>> I see.  I wonder why we even call chkp_find_bound_slots if seen_errors().
>
> Even with errors we still gimplify function. Function parameters
> gimplification checks where parameters are passed to possibly copy
> some of them. It triggers ix86_function_arg_advance which uses
> chkp_find_bound_slots to skip required amount of bounds registers.
>
>> I suppose only recursing for COMPLETE_TYPE_P () would work?
>
> Yep, it should work. I'll rework my fix.

It turned out to be wrong. For this test

struct S
{
  S f;
};

void fn1 (S p1) {}

Structure S is considered as complete (has size 8 for some reason) at
fn1 gimplification. Thus even with complete type check I still hit
this field with error_node instead of a type and NULL at
DECL_FIELD_BIT_OFFSET. Should my current fix be OK then?

Thanks,
Ilya


Re: [PATCH, PR67405, committed] Avoid NULL pointer dereference

2015-09-24 Thread Richard Biener
On Thu, Sep 24, 2015 at 4:07 PM, Ilya Enkovich  wrote:
> 2015-09-15 14:01 GMT+03:00 Ilya Enkovich :
>> 2015-09-15 13:32 GMT+03:00 Richard Biener :
>>> On Tue, Sep 15, 2015 at 11:28 AM, Ilya Enkovich  
>>> wrote:
>>>
>>> I see.  I wonder why we even call chkp_find_bound_slots if seen_errors().
>>
>> Even with errors we still gimplify function. Function parameters
>> gimplification checks where parameters are passed to possibly copy
>> some of them. It triggers ix86_function_arg_advance which uses
>> chkp_find_bound_slots to skip required amount of bounds registers.
>>
>>> I suppose only recursing for COMPLETE_TYPE_P () would work?
>>
>> Yep, it should work. I'll rework my fix.
>
> It turned out to be wrong. For this test
>
> struct S
> {
>   S f;
> };
>
> void fn1 (S p1) {}
>
> Structure S is considered as complete (has size 8 for some reason) at
> fn1 gimplification. Thus even with complete type check I still hit
> this field with error_node instead of a type and NULL at
> DECL_FIELD_BIT_OFFSET. Should my current fix be OK then?

What's the current fix again?  The NULL check on DECL_FIELD_BIT_OFFSET?

I still don't like that.  The frontend should leave us with something
easier here :/

And I wonder if we really need to gimplify when we've seen errors (yeah, we'll
get more diagnostics but also ICE-after-errors like this).

Richard.

> Thanks,
> Ilya


Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm

2015-09-24 Thread Segher Boessenkool
On Thu, Sep 24, 2015 at 08:39:30AM -0500, Segher Boessenkool wrote:
> > Any thoughts on this vs qsort? Do you need a stable sort?
> 
> We always need stable sorts in GCC; things are not reproducible across
> targets with qsort (not every qsort is the same).

s/targets/hosts/
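
To illustrate the reproducibility point: one common way to get host-independent output from an unstable sort is to give the comparator a tie-breaker so that it defines a total order. The sketch below is an assumption-laden illustration, not the code from the patch; the `Edge` struct and `sort_edges_deterministically` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record: a sort key plus the element's original position.
struct Edge {
  int weight;         // sort key
  std::size_t index;  // position in the input, used only to break ties
};

// Sort by descending weight.  Ties fall back to the original index, so the
// comparator never reports two distinct elements as equivalent and the
// result does not depend on which sort implementation the host provides.
inline void sort_edges_deterministically(std::vector<Edge> &edges) {
  for (std::size_t i = 0; i < edges.size(); ++i)
    edges[i].index = i;
  std::sort(edges.begin(), edges.end(), [](const Edge &a, const Edge &b) {
    if (a.weight != b.weight)
      return a.weight > b.weight;
    return a.index < b.index;
  });
}
```

The same trick works with plain qsort: store the original index in each element and compare it last.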


Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up

2015-09-24 Thread Andi Kleen
Segher Boessenkool  writes:
>
> In effect, the changes are for -O1 (which now gets "simple" instead of
> nothing), -Os (which now gets "simple" instead of "stc", since STC results
> in much bigger code), and for targets that wish to never use STC (not in
> this patch though).

Do you have some data on the code size differences with -Os?

-Andi


Re: [gomp4 7/8] libgomp: work around missing pthread_attr_t on nvptx

2015-09-24 Thread Alexander Monakov
> I'd prefer here the https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01418.html
> changes to libgomp.h and associated configury changes.

OK, like the following?

[gomp4] libgomp: guard pthreads usage by LIBGOMP_USE_PTHREADS

This avoids referencing pthread types and functions on nvptx.

* configure.ac [nvptx*-*-*] (libgomp_use_pthreads): Set and use it...
(LIBGOMP_USE_PTHREADS): ...here; new define.
* configure: Regenerate.
* config.h.in: Likewise.
* libgomp.h: Guard pthread.h inclusion.
(gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS.
(gomp_init_thread_affinity): Ditto.
---
 libgomp/config.h.in  | 3 +++
 libgomp/configure| 7 +++
 libgomp/configure.ac | 6 ++
 libgomp/libgomp.h| 6 ++
 4 files changed, 22 insertions(+)

diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 7685bfb..ba64fd7 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -91,6 +91,9 @@
 /* Define to 1 if GNU symbol versioning is used for libgomp. */
 #undef LIBGOMP_GNU_SYMBOL_VERSIONING
 
+/* Define to 1 if libgomp should use POSIX threads. */
+#undef LIBGOMP_USE_PTHREADS
+
 /* Define to the sub-directory in which libtool stores uninstalled libraries.
*/
 #undef LT_OBJDIR
diff --git a/libgomp/configure b/libgomp/configure
index 7407b4c..de87d4a 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15043,6 +15043,7 @@ case "$host" in
 ;;
   nvptx*-*-*)
 # NVPTX does not support Pthreads, has its own code replacement.
+libgomp_use_pthreads=no
 ;;
   *)
 # Check to see if -pthread or -lpthread is needed.  Prefer the former.
@@ -15088,6 +15089,12 @@ rm -f core conftest.err conftest.$ac_objext \
 conftest$ac_exeext conftest.$ac_ext
 esac
 
+if test x$libgomp_use_pthreads != xno; then
+
+$as_echo "#define LIBGOMP_USE_PTHREADS 1" >>confdefs.h
+
+fi
+
 # Plugins for offload execution, configure.ac fragment.  -*- mode: autoconf -*-
 #
 # Copyright (C) 2014-2015 Free Software Foundation, Inc.
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index b1696d0..3bce745 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -181,6 +181,7 @@ case "$host" in
 ;;
   nvptx*-*-*)
 # NVPTX does not support Pthreads, has its own code replacement.
+libgomp_use_pthreads=no
 ;;
   *)
 # Check to see if -pthread or -lpthread is needed.  Prefer the former.
@@ -202,6 +203,11 @@ case "$host" in
[AC_MSG_ERROR([Pthreads are required to build libgomp])])])
 esac
 
+if test x$libgomp_use_pthreads != xno; then
+  AC_DEFINE(LIBGOMP_USE_PTHREADS, 1,
+[Define to 1 if libgomp should use POSIX threads.])
+fi
+
 m4_include([plugin/configfrag.ac])
 
 # Check for functions needed.
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d51b08b..1454adf 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -40,7 +40,9 @@
 #include "gstdint.h"
 #include "libgomp-plugin.h"
 
+#ifdef HAVE_PTHREAD_H
 #include <pthread.h>
+#endif
 #include 
 #include 
 #include 
@@ -510,15 +512,19 @@ static inline struct gomp_task_icv *gomp_icv (bool write)
 return &gomp_global_icv;
 }
 
+#ifdef LIBGOMP_USE_PTHREADS
 /* The attributes to be used during thread creation.  */
 extern pthread_attr_t gomp_thread_attr;
+#endif
 
 /* Function prototypes.  */
 
 /* affinity.c */
 
 extern void gomp_init_affinity (void);
+#ifdef LIBGOMP_USE_PTHREADS
 extern void gomp_init_thread_affinity (pthread_attr_t *, unsigned int);
+#endif
 extern void **gomp_affinity_alloc (unsigned long, bool);
 extern void gomp_affinity_init_place (void *);
 extern bool gomp_affinity_add_cpus (void *, unsigned long, unsigned long,
-- 
1.8.3.1
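
For readers unfamiliar with the pattern, here is a minimal standalone sketch of what the guard buys: code that mentions pthread types is only compiled when the configure-time macro is defined. Everything except the macro name LIBGOMP_USE_PTHREADS is hypothetical (`demo_thread_attr`, `demo_init_thread_attr`), and the macro is defined by hand here, whereas in the patch it comes from config.h.

```cpp
// Normally emitted into config.h by configure; defined by hand here
// (an assumption for this sketch) to model a host that has pthreads.
#define LIBGOMP_USE_PTHREADS 1

#ifdef LIBGOMP_USE_PTHREADS
#include <pthread.h>
// Only declared when pthreads are in use, in the spirit of gomp_thread_attr.
static pthread_attr_t demo_thread_attr;
#endif

// Hypothetical helper: returns true if thread attributes were initialized,
// false on a target configured without pthreads.
inline bool demo_init_thread_attr() {
#ifdef LIBGOMP_USE_PTHREADS
  return pthread_attr_init(&demo_thread_attr) == 0;
#else
  return false;
#endif
}
```

Removing the #define at the top models the nvptx configuration: the file still compiles, and no pthread symbol is referenced.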



Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up

2015-09-24 Thread Segher Boessenkool
On Thu, Sep 24, 2015 at 08:12:55AM -0700, Andi Kleen wrote:
> Segher Boessenkool  writes:
> >
> > In effect, the changes are for -O1 (which now gets "simple" instead of
> > nothing), -Os (which now gets "simple" instead of "stc", since STC results
> > in much bigger code), and for targets that wish to never use STC (not in
> > this patch though).
> 
> Do you have some data on the code size differences with -Os?

It's about 0.1% for a quick combine.ii sniff test; I don't have big test
results for -Os.

"Much bigger code" is a mischaracterisation; that is true for -O2, not -Os.


Segher


Re: fdiagnostics-color=never does not disable color for some diagnostics

2015-09-24 Thread Manuel López-Ibáñez
On 24 September 2015 at 15:06, Jason Merrill  wrote:
> On 09/22/2015 04:23 PM, Manuel López-Ibáñez wrote:
>>
>> +error_at (loc, "-Werror=%s: no option -%s", arg, new_option);
>> +  else if (!(cl_options[option_index].flags & CL_WARNING))
>> +error_at (loc, "-Werror=%s: -%s is not an option that controls
>> warnings",
>
>
> Won't these incorrectly start with "-Werror=Wsomething:" rather than the
> "-Werror=something" that the user wrote?

They follow the pattern of the code they replace:

-{
-  error_at (loc, "-Werror=%s: no option -%s", arg, new_option);
-}

where 'arg' is what the user wrote after '=', and new_option is:

   new_option[0] = 'W';
   strcpy (new_option + 1, arg);

Or am I misunderstanding you?

Cheers,

Manuel.
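
A small standalone sketch of the string handling described above may help. It builds new_option from arg exactly as in the quoted snippet and formats the first diagnostic. `no_option_message` is a hypothetical stand-in for error_at that returns the text instead of emitting it.

```cpp
#include <cstring>
#include <string>

// Mirrors the quoted snippet: prefix the user's argument with 'W'.
// The caller must supply a buffer of at least strlen (arg) + 2 bytes.
inline void build_new_option(char *new_option, const char *arg) {
  new_option[0] = 'W';
  std::strcpy(new_option + 1, arg);
}

// Hypothetical stand-in for error_at: returns the formatted message.
inline std::string no_option_message(const char *arg) {
  std::string new_option(std::strlen(arg) + 2, '\0');
  build_new_option(&new_option[0], arg);
  new_option.resize(std::strlen(arg) + 1);  // drop the copied terminator
  return std::string("-Werror=") + arg + ": no option -" + new_option;
}
```

So the 'W' only shows up in the second placeholder; the text after "-Werror=" is what the user typed.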


[PATCH] Clear variables with stale SSA_NAME_RANGE_INFO (PR tree-optimization/67690)

2015-09-24 Thread Marek Polacek
As Richi said in ,
using recorded SSA name range infos in VRP is likely to expose errors in the
ranges.  This PR is such a case.  As discussed in the PR, after tail merging
via PRE the range infos cannot be relied upon anymore, so we need to clear
them.

Since tree-ssa-ifcombine.c already had code to clean up the flow data in a BB,
I've factored it out to a common function.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?

2015-09-24  Marek Polacek  

PR tree-optimization/67690
* tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): Call
reset_flow_sensitive_info_in_bb.
* tree-ssa-tail-merge.c (replace_block_by): Likewise.
* tree-ssanames.c: Include "gimple-iterator.h".
(reset_flow_sensitive_info_in_bb): New function.
* tree-ssanames.h (reset_flow_sensitive_info_in_bb): Declare.

* gcc.dg/torture/pr67690.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr67690.c 
gcc/testsuite/gcc.dg/torture/pr67690.c
index e69de29..491de51 100644
--- gcc/testsuite/gcc.dg/torture/pr67690.c
+++ gcc/testsuite/gcc.dg/torture/pr67690.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+const int c1 = 1;
+const int c2 = 2;
+
+int
+check (int i)
+{
+  int j;
+  if (i >= 0)
+j = c2 - i;
+  else
+j = c2 - i;
+  return c2 - c1 + 1 > j;
+}
+
+int invoke (int *pi) __attribute__ ((noinline,noclone));
+int
+invoke (int *pi)
+{
+  return check (*pi);
+}
+
+int
+main ()
+{
+  int i = c1;
+  int ret = invoke (&i);
+  if (!ret)
+__builtin_abort ();
+  return 0;
+}
diff --git gcc/tree-ssa-ifcombine.c gcc/tree-ssa-ifcombine.c
index 9f04174..66be430 100644
--- gcc/tree-ssa-ifcombine.c
+++ gcc/tree-ssa-ifcombine.c
@@ -769,16 +769,7 @@ pass_tree_ifcombine::execute (function *fun)
  {
/* Clear range info from all stmts in BB which is now executed
   conditional on a always true/false condition.  */
-   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
-!gsi_end_p (gsi); gsi_next (&gsi))
- {
-   gimple *stmt = gsi_stmt (gsi);
-   ssa_op_iter i;
-   tree op;
-   FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
- reset_flow_sensitive_info (op);
- }
-
+   reset_flow_sensitive_info_in_bb (bb);
cfg_changed |= true;
  }
 }
diff --git gcc/tree-ssa-tail-merge.c gcc/tree-ssa-tail-merge.c
index 0ce59e8..487961e 100644
--- gcc/tree-ssa-tail-merge.c
+++ gcc/tree-ssa-tail-merge.c
@@ -1534,6 +1534,10 @@ replace_block_by (basic_block bb1, basic_block bb2)
   e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
 }
 
+  /* Clear range info from all stmts in BB2 -- this transformation
+ could make them out of date.  */
+  reset_flow_sensitive_info_in_bb (bb2);
+
   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);
   same_succ_flush_bb (bb1);
diff --git gcc/tree-ssanames.c gcc/tree-ssanames.c
index 4199290..5393865 100644
--- gcc/tree-ssanames.c
+++ gcc/tree-ssanames.c
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backend.h"
 #include "tree.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
 #include "hard-reg-set.h"
 #include "ssa.h"
 #include "alias.h"
@@ -544,6 +545,21 @@ reset_flow_sensitive_info (tree name)
 SSA_NAME_RANGE_INFO (name) = NULL;
 }
 
+/* Clear all flow sensitive data from all statements in BB.  */
+
+void
+reset_flow_sensitive_info_in_bb (basic_block bb)
+{
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+   gsi_next (&gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+  ssa_op_iter i;
+  tree op;
+  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
+   reset_flow_sensitive_info (op);
+}
+}
 
 /* Release all the SSA_NAMEs created by STMT.  */
 
diff --git gcc/tree-ssanames.h gcc/tree-ssanames.h
index 22ff609..5688ca5 100644
--- gcc/tree-ssanames.h
+++ gcc/tree-ssanames.h
@@ -95,6 +95,7 @@ extern tree duplicate_ssa_name_fn (struct function *, tree, 
gimple *);
 extern void duplicate_ssa_name_range_info (tree, enum value_range_type,
   struct range_info_def *);
 extern void reset_flow_sensitive_info (tree);
+extern void reset_flow_sensitive_info_in_bb (basic_block);
 extern void release_defs (gimple *);
 extern void replace_ssa_name_symbol (tree, tree);
 

Marek


[patch] libstdc++/67707 Leave moved-from std::deque in a valid state

2015-09-24 Thread Jonathan Wakely

Apparently when adding allocator propagation to std::deque I didn't
read my own comment in this constructor:

 _Deque_base(const allocator_type& __a)
 : _M_impl(__a)
 { /* Caller must initialize map. */ }

This adds the missing initialization.

Tested ppc64le-linux, committed to trunk.

This needs to be backported to gcc-5-branch too.

commit 386b9e9d927e5d0bcca8815b6219581ea58723bf
Author: Jonathan Wakely 
Date:   Thu Sep 24 16:10:27 2015 +0100

Leave moved-from std::deque in a valid state

	PR libstdc++/67707
	* include/bits/stl_deque.h (_Deque_base::_M_move_impl): Initialize
	empty object.
	* testsuite/23_containers/deque/allocator/move.cc: Check moved-from
	deque.

diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index f674245..f81ffd9 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -644,6 +644,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	_Tp_alloc_type __sink __attribute((__unused__)) {std::move(__alloc)};
 	// Create an empty map that allocates using the moved-from allocator.
 	_Deque_base __empty{__alloc};
+	__empty._M_initialize_map(0);
 	// Now safe to modify current allocator and perform non-throwing swaps.
 	_Deque_impl __ret{std::move(_M_get_Tp_allocator())};
 	_M_impl._M_swap_data(__ret);
diff --git a/libstdc++-v3/testsuite/23_containers/deque/allocator/move.cc b/libstdc++-v3/testsuite/23_containers/deque/allocator/move.cc
index c858437..1b8a0e4 100644
--- a/libstdc++-v3/testsuite/23_containers/deque/allocator/move.cc
+++ b/libstdc++-v3/testsuite/23_containers/deque/allocator/move.cc
@@ -36,6 +36,11 @@ void test01()
   VERIFY(1 == v1.get_allocator().get_personality());
   VERIFY(1 == v2.get_allocator().get_personality());
   VERIFY( it == v2.begin() );
+
+  // PR libstdc++/67707
+  VERIFY( v1.size() == 0 );
+  v1 = test_type();
+  VERIFY( v1.size() == 0 );
 }
 
 void test02()
@@ -47,6 +52,11 @@ void test02()
   test_type v2(std::move(v1), alloc_type(2));
   VERIFY(1 == v1.get_allocator().get_personality());
   VERIFY(2 == v2.get_allocator().get_personality());
+
+  // PR libstdc++/67707
+  VERIFY( v1.size() == 0 );
+  v1 = test_type();
+  VERIFY( v1.size() == 0 );
 }
 
 int main()
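
As a rough illustration of what the new test lines check, the sketch below moves from a deque and then assigns to and reuses the moved-from object. It uses std::allocator, so it does not reproduce the PR (which needs a test allocator that is not always equal); `moved_from_deque_is_reusable` is a hypothetical name.

```cpp
#include <deque>
#include <utility>

// Returns true if a moved-from deque can be assigned to and used again,
// and the destination of the move holds the original elements.
inline bool moved_from_deque_is_reusable() {
  std::deque<int> v1 = {1, 2, 3};
  std::deque<int> v2(std::move(v1));

  v1 = std::deque<int>();  // the kind of assignment the new test performs
  v1.push_back(42);        // the moved-from object must still be usable

  return v2.size() == 3 && v1.size() == 1 && v1.front() == 42;
}
```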


Re: [PATCH 1/4] bb-reorder: Split out STC

2015-09-24 Thread Steven Bosscher
On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> This first patch simply factors code a little bit.
>
>
> 2015-09-23   Segher Boessenkool  
>
> * bb-reorder.c (reorder_basic_blocks_software_trace_cache): New
> function, factored out from ...
> (reorder_basic_blocks): ... here.

OK.

Ciao!
Steven


Re: [gomp4 7/8] libgomp: work around missing pthread_attr_t on nvptx

2015-09-24 Thread Jakub Jelinek
On Thu, Sep 24, 2015 at 06:18:10PM +0300, Alexander Monakov wrote:
> > I'd prefer here the https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01418.html
> > changes to libgomp.h and associated configury changes.
> 
> OK, like the following?
> 
> [gomp4] libgomp: guard pthreads usage by LIBGOMP_USE_PTHREADS
> 
> This avoids referencing pthread types and functions on nvptx.
> 
>   * configure.ac [nvptx*-*-*] (libgomp_use_pthreads): Set and use it...
> (LIBGOMP_USE_PTHREADS): ...here; new define.
> * configure: Regenerate.
> * config.h.in: Likewise.
> * libgomp.h: Guard pthread.h inclusion.
> (gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS.
> (gomp_init_thread_affinity): Ditto.

Yeah, thanks.

Jakub


Re: fdiagnostics-color=never does not disable color for some diagnostics

2015-09-24 Thread Jason Merrill

On 09/24/2015 11:32 AM, Manuel López-Ibáñez wrote:

> On 24 September 2015 at 15:06, Jason Merrill  wrote:
>> On 09/22/2015 04:23 PM, Manuel López-Ibáñez wrote:
>>>
>>> +error_at (loc, "-Werror=%s: no option -%s", arg, new_option);
>>> +  else if (!(cl_options[option_index].flags & CL_WARNING))
>>> +error_at (loc, "-Werror=%s: -%s is not an option that controls
>>> warnings",
>>
>> Won't these incorrectly start with "-Werror=Wsomething:" rather than the
>> "-Werror=something" that the user wrote?
>
> They follow the pattern of the code they replace:
>
> -{
> -  error_at (loc, "-Werror=%s: no option -%s", arg, new_option);
> -}
>
> where 'arg' is what the user wrote after '=', and new_option is:
>
>    new_option[0] = 'W';
>    strcpy (new_option + 1, arg);
>
> Or am I misunderstanding you?


No, you're right, I was misreading.  The patch is OK.

Jason




Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm

2015-09-24 Thread Steven Bosscher
On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> +  /* First, collect all edges that can be optimized by reordering blocks:
> + simple jumps and conditional jumps, as well as the function entry edge. 
>  */
> +
> +  int n = 0;
> +  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +{
> +  rtx_insn *end = BB_END (bb);
> +
> +  if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
> +   continue;

Should handle ASM jumps.


> +  FOR_ALL_BB_FN (bb, cfun)
> +bb->aux = bb;

Bit tricky for the ENTRY and EXIT blocks, that are not really basic
blocks. After the pass, EXIT should not end up pointing to itself.
Maybe use FOR_EACH_BB_FN and set ENTRY separately?


Other than that, looks good to me.

Ciao!
Steven


[gomp4] Merge trunk r228054 (2015-09-23) into gomp-4_0-branch

2015-09-24 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r228091:

commit 25dd642bd78afedeb6f94c98ed4fe767133b65ec
Merge: a203723 3bf38a0
Author: tschwinge 
Date:   Thu Sep 24 16:00:49 2015 +

svn merge -r 226769:228054 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228091 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße,
 Thomas




Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-24 Thread Steve Ellcey
On Thu, 2015-09-24 at 13:56 +0200, Bernd Schmidt wrote:

> I think at this point we have reports of just two packages generating 
> extra warnings, with the warnings at least justifiable in both cases. So 
> my vote would be to leave things as-is for now and see if more reports 
> come in. It is after all expected that a new warning option generates 
> new warnings.
> 
> 
> Bernd

At least one of the warnings in glibc is not justified (in my opinion).
The header file timezone/private.h defines time_t_min and time_t_max.
These are not used in any of the timezone files built by glibc but if
you look at the complete tz package they are used when building other
objects that are not part of the glibc tz component and that include
private.h.

I would make two arguments about why I don't think we should warn.

One is that 'static int const foo = 1' seems a lot like '#define foo 1'
and we don't complain about the macro foo not being used.  If we
complain about the unused const, why not complain about the unused
macro?  We don't complain because we know it would result in too many
warnings in existing code.  If we want people to move away from macros,
and I think we do, then we should not make it harder to do so by
introducing new warnings when they change.

The other is that C++ does not complain about this.  I know that C and
C++ are different languages with different rules but it seems like this
difference is a difference that doesn't have to exist.  Either both
should complain or neither should complain.  I can't think of any valid
reason for one to complain and the other not to.

I think using the used attribute is probably a reasonable way to address
this issue if we continue to generate the warning but I still feel it is
a bad warning in that it will (sometimes) warn about a coding style that
seems perfectly reasonable to me.

Steve Ellcey
sell...@imgtec.com



Re: [ubsan PATCH] Fix uninitialized var issue (PR sanitizer/64906)

2015-09-24 Thread Bernd Schmidt

On 09/24/2015 11:32 AM, Marek Polacek wrote:

> On Wed, Sep 23, 2015 at 08:55:53PM +0200, Bernd Schmidt wrote:
>> On 09/23/2015 06:07 PM, Marek Polacek wrote:
>>> Given that the code above seems to be useless now, I think let's put this
>>> patch in as-is, backport it to gcc-5, then remove those redundant hunks on
>>> trunk and add the testcase above.  Do you agree?
>>
>> Sounds reasonable. If you can find a point in the history where that code
>> wasn't useless, it would be good to help us understand why it's there.
>
> I did some archeology.  The code wasn't useless since it was added (r211859)
> till r226110 where I added some unshare_exprs.  On the testcase I posted
> earlier in the thread that makes a difference:
>
> @@ -11,7 +11,7 @@
> else
>   {
> <<< Unknown tree: void_cst >>>
> -}, (long unsigned int) (s->a[i] << SAVE_EXPR );;
> +}, (long unsigned int) (s->a[UBSAN_BOUNDS (0B, SAVE_EXPR , 0);,
> SAVE_EXPR ;] << SAVE_EXPR );;
>   }
>
> So we instrument the array multiple times as it's not shared anymore.
>
> Ok to proceed with the plan I mentioned above?


Yes.


Bernd



Re: [RFC] Try vector as a new representation for vector masks

2015-09-24 Thread Richard Henderson
On 09/24/2015 01:09 AM, Richard Biener wrote:
> Both are basically a (target) restriction on how we should expand a 
> conditional
> move (and its condition).  It's technically convenient to tie both together by
> having them in the same statement but it's also technically very inconvenient
> in other places.  I'd say for targets where
> 
> tem_1 = a_2 < b_3;
> res_4 = tem_1 ? c_5 : d_6;
> res_7 = tem_1 ? x_8 : z_9;
> 
> presents a serious issue ("re-using" the flags register) out-of-SSA should
> duplicate the conditionals so that TER can do its job (and RTL expansion
> should use TER to get at the flags setter).

Sure it's a target restriction, but it's an extremely common one.  Essentially
all of our production platforms have it.  What do we gain by adding some sort
of target hook for this?

> I imagine that if we expand the above to adjacent statements the CPUs can
> re-use the condition code.

Sure, but IMO it should be the job of RTL CSE to make that decision, after all
of the uses (and clobbers) of the flags register have been exposed.

> To me where the condition is in GIMPLE is an implementation detail and the
> inconveniences outweigh the benefits.

Why is a 3-operand gimple statement fine, but a 4-operand gimple statement
inconvenient?


r~


Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm

2015-09-24 Thread Segher Boessenkool
On Thu, Sep 24, 2015 at 06:03:33PM +0200, Steven Bosscher wrote:
> On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> > +  /* First, collect all edges that can be optimized by reordering blocks:
> > + simple jumps and conditional jumps, as well as the function entry 
> > edge.  */
> > +
> > +  int n = 0;
> > +  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> > +
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +{
> > +  rtx_insn *end = BB_END (bb);
> > +
> > +  if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
> > +   continue;
> 
> Should handle ASM jumps.

Right, those are considered as optimisable now, although they are not.
Will fix.

> > +  FOR_ALL_BB_FN (bb, cfun)
> > +bb->aux = bb;
> 
> Bit tricky for the ENTRY and EXIT blocks, that are not really basic
> blocks. After the pass, EXIT should not end up pointing to itself.

But it doesn't, the next line already takes care of it.


Segher


Re: [PATCH] DWARF support for AIX v4

2015-09-24 Thread Richard Henderson
On 09/23/2015 07:39 PM, David Edelsohn wrote:
> Richard and Richard,
> 
> Appended is the updated version of the DWARF support patch for AIX.  I
> still can split out the length computation into a separate helper
> function, but, as I mentioned, it won't apply to the instance that
> uses a delta of two labels.
> 
> This version sets have_macinfo to False and disables add_AT_loc_list.
> It also defines XCOFF_DEBUGGING_INFO to 0 by default in dwarf2out.c and
> dwarf2asm.c.
> 
> Thanks, David
> 
> * dwarf2out.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
> (have_macinfo): Force to False for XCOFF_DEBUGGING_INFO.
> (add_AT_loc_list): Return early if XCOFF_DEBUGGING_INFO.
> (output_compilation_unit_header): Don't output length on AIX.
> (output_pubnames): Don't output length on AIX.
> (output_aranges): Delete argument. Compute length locally. Don't
> output length on AIX.
> (output_line_info): Don't output length on AIX.
> (dwarf2out_finish): Don't compute aranges_length.
> * dwarf2asm.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
> (dw2_asm_output_nstring): Emit .byte not .ascii on AIX.
> * config/rs6000/rs6000.c (rs6000_output_dwarf_dtprel): Emit correct
> symbol decoration for AIX.
> (rs6000_xcoff_debug_unwind_info): New.
> (rs6000_xcoff_asm_named_section): Emit .dwsect pseudo-op
> for SECTION_DEBUG.
> (rs6000_xcoff_declare_function_name): Emit different
> .function pseudo-op when DWARF2_DEBUG. Don't call
> xcoffout_declare_function for DWARF2_DEBUG.
> * config/rs6000/xcoff.h (TARGET_DEBUG_UNWIND_INFO):
> Redefine.
> * config/rs6000/aix71.h (DWARF2_DEBUGGING_INFO): Define.
> (PREFERRED_DEBUGGING_TYPE): Define.
> (DEBUG_INFO_SECTION): Define.
> (DEBUG_ABBREV_SECTION): Define.
> (DEBUG_ARANGES_SECTION): Define.
> (DEBUG_LINE_SECTION): Define.
> (DEBUG_PUBNAMES_SECTION): Define.
> (DEBUG_PUBTYPES_SECTION): Define.
> (DEBUG_STR_SECTION): Define.
> (DEBUG_RANGES_SECTION): Define.

Ok.

> +  else if (TARGET_XCOFF && GET_CODE (x) == SYMBOL_REF
> +  && SYMBOL_REF_TLS_MODEL (x) != 0)
> +{
> +  if (SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_EXEC)
> +   fputs ("@le", file);
> +  else if (SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_INITIAL_EXEC)
> +   fputs ("@ie", file);
> +  else if (SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_GLOBAL_DYNAMIC
> +  || SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_DYNAMIC)
> +   fputs ("@m", file);
> +}

FWIW, I would have written this:


else if (TARGET_XCOFF && GET_CODE (x) == SYMBOL_REF)
  {
    switch (SYMBOL_REF_TLS_MODEL (x))
      {
      case 0:
        break;
      case TLS_MODEL_LOCAL_EXEC:
        fputs ("@le", file);
        break;
      case TLS_MODEL_INITIAL_EXEC:
        fputs ("@ie", file);
        break;
      case TLS_MODEL_GLOBAL_DYNAMIC:
      case TLS_MODEL_LOCAL_DYNAMIC:
        fputs ("@m", file);
        break;
      default:
        gcc_unreachable ();
      }
  }


r~


Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-24 Thread Bernd Schmidt

On 09/24/2015 06:11 PM, Steve Ellcey wrote:

> At least one of the warnings in glibc is not justified (in my opinion).
> The header file timezone/private.h defines time_t_min and time_t_max.
> These are not used in any of the timezone files built by glibc but if
> you look at the complete tz package they are used when building other
> objects that are not part of the glibc tz component and that include
> private.h.

The standard C way of writing this would be to declare time_t_min in the 
header and have its definition in another file, or use a TIME_T_MIN 
macro as glibc does in mktime.c. That file even has a local redefinition:

  time_t time_t_min = TIME_T_MIN;

So at the very least the warning points at code that has some oddities.


> I would make two arguments about why I don't think we should warn.
>
> One is that 'static int const foo = 1' seems a lot like '#define foo 1'
> and we don't complain about the macro foo not being used.  If we
> complain about the unused const, why not complain about the unused
> macro?  We don't complain because we know it would result in too many
> warnings in existing code.  If we want people to move away from macros,
> and I think we do, then we should not make it harder to do so by
> introducing new warnings when they change.
>
> The other is that C++ does not complain about this.  I know that C and
> C++ are different languages with different rules but it seems like this
> difference is a difference that doesn't have to exist.  Either both
> should complain or neither should complain.  I can't think of any valid
> reason for one to complain and the other not to.


Well, they _are_ different languages, and handling of const is one place 
where they differ. For example, C++ consts can be used in places where 
constant expressions are required. The following is a valid C++ program 
but not a C program:


const int v = 200;
int t[v];

The result is that the typical programming style for C is to have 
constants #defined, while for C++ you can find more examples like the 
above; I recall Stroustrup explicitly advocating that in the 
introductory books I read 20 years ago, and using it as a selling point 
for C++. Existing practice is important when deciding what to warn 
about, and for the moment I remain convinced that C practice is 
sufficiently different from C++.



Bernd
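
To make the language difference concrete, here is a short C++ translation unit built around the two-line program above. It compiles as C++ because a const int initialized with a constant can appear in constant expressions; a C compiler rejects the file-scope array declaration. The static_assert and the `table_size` helper are additions for illustration only.

```cpp
#include <cstddef>

const int v = 200;  // usable in constant expressions in C++
int t[v];           // array bound taken from the const object

// The bound is known at compile time, so it can be checked statically.
static_assert(sizeof(t) / sizeof(t[0]) == 200,
              "array bound comes from a compile-time constant");

// Hypothetical helper that reports the element count of t.
inline std::size_t table_size() { return sizeof(t) / sizeof(t[0]); }
```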


[PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-24 Thread Teresa Johnson
This patch unsets -freorder-blocks-and-partition when -fprofile-use
is not specified. Function splitting was not actually being performed
in that case, as probably_never_executed_bb_p does not distinguish
any basic blocks as being cold vs hot when there is no profile data.

Leaving it enabled, however, causes the assembly code generator to create
(empty) cold sections and labels, leading to unnecessary size overhead.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2015-09-24  Teresa Johnson  

* opts.c (finish_options): Unset -freorder-blocks-and-partition
if not using profile.

Index: opts.c
===
--- opts.c  (revision 228062)
+++ opts.c  (working copy)
@@ -821,7 +821,17 @@ finish_options (struct gcc_options *opts, struct g
   opts->x_flag_reorder_blocks = 1;
 }

+  /* Disable -freorder-blocks-and-partition when -fprofile-use is not in
+ effect. Function splitting was not actually being performed in that case,
+ as probably_never_executed_bb_p does not distinguish any basic blocks as
+ being cold vs hot when there is no profile data. Leaving it enabled,
+ however, causes the assembly code generator to create (empty) cold
+ sections and labels, leading to unnecessary size overhead.  */
   if (opts->x_flag_reorder_blocks_and_partition
+  && !opts_set->x_flag_profile_use)
+opts->x_flag_reorder_blocks_and_partition = 0;
+
+  if (opts->x_flag_reorder_blocks_and_partition
   && !opts_set->x_flag_reorder_functions)
 opts->x_flag_reorder_functions = 1;


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
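
The ordering of the two checks in the hunk matters, and a reduced model may make that easier to see. The struct and function below are hypothetical and only loosely model gcc_options; in particular, the real patch consults opts_set for the profile-use flag, which this sketch flattens into a plain boolean.

```cpp
// Hypothetical, reduced model of the flags touched by the patch.
struct DemoFlags {
  bool reorder_blocks_and_partition;
  bool profile_use;
  bool reorder_functions;
  bool reorder_functions_set;  // true if the user gave the option explicitly
};

// First drop partitioning when there is no profile, then (only if it is
// still on) let partitioning imply function reordering.
inline void demo_finish_options(DemoFlags &f) {
  if (f.reorder_blocks_and_partition && !f.profile_use)
    f.reorder_blocks_and_partition = false;

  if (f.reorder_blocks_and_partition && !f.reorder_functions_set)
    f.reorder_functions = true;
}
```

With the checks in this order, a build without profile data no longer turns on function reordering as a side effect of the partitioning default.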


Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-24 Thread pinskia


> On Sep 24, 2015, at 10:16 AM, Teresa Johnson  wrote:
> 
> This patch unsets -freorder-blocks-and-partition when -fprofile-use
> is not specified. Function splitting was not actually being performed
> in that case, as probably_never_executed_bb_p does not distinguish
> any basic blocks as being cold vs hot when there is no profile data.
> 
> Leaving it enabled, however, causes the assembly code generator to create
> (empty) cold sections and labels, leading to unnecessary size overhead.
> 
> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?

This might be ok for now, but there is an intention to enable it for the
non-profile case.

Thanks,
Andrew




> 
> Thanks,
> Teresa
> 
> 2015-09-24  Teresa Johnson  
> 
>* opts.c (finish_options): Unset -freorder-blocks-and-partition
>if not using profile.
> 
> Index: opts.c
> ===
> --- opts.c  (revision 228062)
> +++ opts.c  (working copy)
> @@ -821,7 +821,17 @@ finish_options (struct gcc_options *opts, struct g
>   opts->x_flag_reorder_blocks = 1;
> }
> 
> +  /* Disable -freorder-blocks-and-partition when -fprofile-use is not in
> + effect. Function splitting was not actually being performed in that 
> case,
> + as probably_never_executed_bb_p does not distinguish any basic blocks as
> + being cold vs hot when there is no profile data. Leaving it enabled,
> + however, causes the assembly code generator to create (empty) cold
> + sections and labels, leading to unnecessary size overhead.  */
>   if (opts->x_flag_reorder_blocks_and_partition
> +  && !opts_set->x_flag_profile_use)
> +opts->x_flag_reorder_blocks_and_partition = 0;
> +
> +  if (opts->x_flag_reorder_blocks_and_partition
>   && !opts_set->x_flag_reorder_functions)
> opts->x_flag_reorder_functions = 1;
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
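
The option handling in the quoted patch can be sketched in isolation as
follows.  This is only an illustration under stated assumptions: struct
sketch_opts and sketch_finish_options are hypothetical stand-ins for the
relevant gcc_options fields and for finish_options, and the two boolean
parameters model the opts_set checks from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two gcc_options fields involved.  */
struct sketch_opts
{
  bool reorder_blocks_and_partition;
  bool reorder_functions;
};

/* Mirror the patched logic.  PROFILE_USE_SET and REORDER_FUNCTIONS_SET say
   whether the user gave the corresponding option explicitly.  */
static void
sketch_finish_options (struct sketch_opts *opts, bool profile_use_set,
                       bool reorder_functions_set)
{
  assert (opts != 0);

  /* Without profile data no block is classified as cold, so partitioning
     would only add empty cold sections and labels.  */
  if (opts->reorder_blocks_and_partition && !profile_use_set)
    opts->reorder_blocks_and_partition = false;

  /* Partitioning that is still enabled implies function reordering unless
     the user chose otherwise.  */
  if (opts->reorder_blocks_and_partition && !reorder_functions_set)
    opts->reorder_functions = true;
}
```

With partitioning requested but no -fprofile-use, both flags end up off;
with -fprofile-use given, partitioning stays on and turns on function
reordering.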


[gomp4.1] Doacross library implementation

2015-09-24 Thread Jakub Jelinek
Hi!

This patch implements DOACROSS in the library, so far only as busy waiting
and even without exponential (or some guess based on distance) backoff.
Torvald, can you please have a look at it, if I got all the atomics / memory
models right?  The testcase obviously is not a good benchmark, we'll need
some more realistic one.  But obviously when asking for oversubscription, it
is quite expensive.  The question is how to implement a non-busy waiting
fallback, whether we put some mutex and queue guarded by the mutex into the
same (or some other?) cache-line, or just use atomics to queue it and how to
make it cheap for the case where busy waiting is sufficient.  I'd say
it should be sufficient to implement non-busy waiting in the flattened
variant.

As for the compiler side, I'll first adjust for the pending ticket (which
changes meaning of the ordered(n) clause if collapse(m) m > 1 is present),
then there is a bug with ordered loops that have noreturn body (need to add
some edge for that case and condition checking), lastprivate also needs
checking for all the cases, and finally more thinking on the conservative
dependence folding, where there are just too many issues unresolved right
now.
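
For readers unfamiliar with the approach, the busy-waiting scheme described
at the top of this message can be sketched with C11 atomics.  This is not the
libgomp code from the patch: the per-iteration flag array, the fixed
SKETCH_MAX_ITERS bound and all sketch_* names are assumptions made for
illustration, and there is no backoff, matching the "so far only busy
waiting" state described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_MAX_ITERS 64   /* assumed bound, for illustration only */

/* One "finished" flag per flattened iteration number.  */
struct sketch_doacross
{
  atomic_int done[SKETCH_MAX_ITERS];
};

static void
sketch_init (struct sketch_doacross *d)
{
  for (size_t i = 0; i < SKETCH_MAX_ITERS; i++)
    atomic_init (&d->done[i], 0);
}

/* Source side: publish that iteration ITER has completed.  The release
   store makes earlier writes of that iteration visible to waiters.  */
static void
sketch_post (struct sketch_doacross *d, size_t iter)
{
  assert (iter < SKETCH_MAX_ITERS);
  atomic_store_explicit (&d->done[iter], 1, memory_order_release);
}

/* Non-blocking check, paired with the release store above.  */
static int
sketch_try_wait (struct sketch_doacross *d, size_t iter)
{
  assert (iter < SKETCH_MAX_ITERS);
  return atomic_load_explicit (&d->done[iter], memory_order_acquire) != 0;
}

/* Sink side: spin until ITER has been posted; returns the spin count.  */
static unsigned long
sketch_wait (struct sketch_doacross *d, size_t iter)
{
  unsigned long spins = 0;
  while (!sketch_try_wait (d, iter))
    spins++;
  return spins;
}
```

A non-busy-waiting fallback of the kind asked about above would replace the
loop in sketch_wait with some form of blocking once a spin budget is
exhausted; that part is deliberately left out here.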

2015-09-24  Jakub Jelinek  

* gimplify.c (gimplify_omp_for): Don't adjust lastprivate
on ordered loops above collapse.
* omp-low.c (expand_omp_ordered_source): Rewritten to pass
address of an array of indices.
(expand_omp_ordered_source_sink): Create the VAR_DECL for it.
(expand_omp_for_ordered_loops): Initialize and update the
array elements.
(expand_omp_for_generic): Likewise.  Move counts array one
element back, so that collapsed loops are multiplied by correct
counts.
(lower_omp_ordered): Avoid the conservative dependence folding
for now, it has too many issues.
* omp-builtins.def (BUILT_IN_GOMP_DOACROSS_POST): Change
type to BT_FN_VOID_PTR.
gcc/testsuite/
* gcc.dg/gomp/sink-fold-1.c: Xfail.
* gcc.dg/gomp/sink-fold-2.c: Likewise.
libgomp/
* ordered.c: Include string.h and doacross.h.
(gomp_doacross_init): New function.
(GOMP_doacross_wait): Implement.
(GOMP_doacross_post): Likewise.  Change arguments to
pointer to long array.
* loop.c (gomp_loop_doacross_static_start,
gomp_loop_doacross_dynamic_start,
gomp_loop_doacross_guided_start): Call gomp_doacross_init.
* libgomp_g.h (GOMP_doacross_post): Adjust prototype.
* libgomp.h (struct gomp_doacross_work_share): New type.
(struct gomp_work_share): Put ordered_team_ids into anonymous
union with new doacross field.
* config/linux/doacross.h: New file.
* config/posix/doacross.h: New file.
* testsuite/libgomp.c/doacross-1.c: New test.

--- gcc/gimplify.c.jj   2015-09-18 18:38:17.0 +0200
+++ gcc/gimplify.c  2015-09-24 19:05:43.607556246 +0200
@@ -7788,6 +7788,10 @@ gimplify_omp_for (tree *expr_p, gimple_s
 (OMP_FOR_INIT (for_stmt))
   * 2);
 }
+  int collapse = 1;
+  c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_COLLAPSE);
+  if (c)
+collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
 {
   t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);
@@ -8104,8 +8108,9 @@ gimplify_omp_for (tree *expr_p, gimple_s
  OMP_CLAUSE_LINEAR_STEP (c2) = OMP_CLAUSE_LINEAR_STEP (c);
}
 
-  if ((var != decl || TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) > 1)
- && orig_for_stmt == for_stmt)
+  if ((var != decl || collapse > 1)
+ && orig_for_stmt == for_stmt
+ && i < collapse)
{
  for (c = OMP_FOR_CLAUSES (for_stmt); c ; c = OMP_CLAUSE_CHAIN (c))
if (((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE
--- gcc/omp-low.c.jj2015-09-18 18:38:17.0 +0200
+++ gcc/omp-low.c   2015-09-24 18:06:31.174495644 +0200
@@ -7071,26 +7071,11 @@ static void
 expand_omp_ordered_source (gimple_stmt_iterator *gsi, struct omp_for_data *fd,
   tree *counts, location_t loc)
 {
-  auto_vec args;
   enum built_in_function source_ix = BUILT_IN_GOMP_DOACROSS_POST;
-  tree t;
-  int i;
-
-  for (i = fd->collapse - 1; i < fd->collapse + fd->ordered - 1; i++)
-if (i == fd->collapse - 1 && fd->collapse > 1)
-  args.quick_push (fd->loop.v);
-else if (counts[i])
-  args.safe_push (counts[i]);
-else
-  {
-   t = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
-fd->loops[i].v, fd->loops[i].n1);
-   t = fold_convert_loc (loc, fd->iter_type, t);
-   t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE,
- true, GSI_SAME_STMT);
-   args.safe_push (t);
-  }
-  gimple g = g

Re: [PATCH] DWARF support for AIX v4

2015-09-24 Thread David Edelsohn
>> +  else if (TARGET_XCOFF && GET_CODE (x) == SYMBOL_REF
>> +  && SYMBOL_REF_TLS_MODEL (x) != 0)
>> +{
>> +  if (SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_EXEC)
>> +   fputs ("@le", file);
>> +  else if (SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_INITIAL_EXEC)
>> +   fputs ("@ie", file);
>> +  else if (SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_GLOBAL_DYNAMIC
>> +  || SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_DYNAMIC)
>> +   fputs ("@m", file);
>> +}
>
> FWIW, I would have written this:
>
>
> else if (TARGET_XCOFF && GET_CODE (x) == SYMBOL_REF)
>   {
> switch (SYMBOL_REF_TLS_MODEL (x))
>   {
>   case 0:
> break;
>   case TLS_MODEL_LOCAL_EXEC:
> fputs ("@le", file);
> break;
>   case TLS_MODEL_INITIAL_EXEC:
> fputs ("@ie", file);
> break;
>   case TLS_MODEL_GLOBAL_DYNAMIC:
>   case TLS_MODEL_LOCAL_DYNAMIC:
> fputs ("@m", file);
> break;
>   default:
> gcc_unreachable ();
>   }
>   }

Okay, I will retest with that.
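
As a standalone illustration of the suggested switch, the mapping from TLS
model to operand suffix can be written as a small function.  The enum and
function names below are hypothetical stand-ins rather than the GCC
definitions, and assert (0) takes the place of gcc_unreachable ().

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's TLS model enumeration.  */
enum sketch_tls_model
{
  SKETCH_TLS_NONE = 0,
  SKETCH_TLS_GLOBAL_DYNAMIC,
  SKETCH_TLS_LOCAL_DYNAMIC,
  SKETCH_TLS_INITIAL_EXEC,
  SKETCH_TLS_LOCAL_EXEC
};

/* Return the suffix that the quoted code prints for MODEL.  */
static const char *
sketch_tls_suffix (enum sketch_tls_model model)
{
  switch (model)
    {
    case SKETCH_TLS_NONE:
      return "";
    case SKETCH_TLS_LOCAL_EXEC:
      return "@le";
    case SKETCH_TLS_INITIAL_EXEC:
      return "@ie";
    case SKETCH_TLS_GLOBAL_DYNAMIC:
    case SKETCH_TLS_LOCAL_DYNAMIC:
      return "@m";
    default:
      /* An unknown model is a bug, as with gcc_unreachable ().  */
      assert (0);
      return "";
    }
}
```

The switch form makes the "no TLS model" case explicit and turns any model
added later into a hard failure instead of silently printing nothing.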

I separately have some good news and some bad news.
Good news: AIX added support for the initially missing DWARF sections.
Bad news: The support is in an AIX service pack whose presence on a
system requires effort to determine.

Thanks, David


Re: [PATCH] DWARF support for AIX v4

2015-09-24 Thread Richard Henderson
On 09/24/2015 11:40 AM, David Edelsohn wrote:
> Good news: AIX added support for the initially missing DWARF sections.

Yay!

> Bad news: The support is in an AIX service pack whose presence on a
> system requires effort to determine.

Boo!

Well, we've had worse problems with Solaris in the past.  We should certainly
put a note about the service pack in the installation instructions.

Is it reasonable to require the service pack be installed before making use of
any of this?  My thinking is that, without location lists, anything except -O0
-g2 is going to be unusable, since most local variables will no longer have any
location data.  At which point you might as well just stick with xcoff
debugging, yes?


r~


Re: [PATCH] DWARF support for AIX v4

2015-09-24 Thread David Edelsohn
On Thu, Sep 24, 2015 at 2:49 PM, Richard Henderson  wrote:
> On 09/24/2015 11:40 AM, David Edelsohn wrote:
>> Good news: AIX added support for the initially missing DWARF sections.
>
> Yay!
>
>> Bad news: The support is in an AIX service pack whose presence on a
>> system requires effort to determine.
>
> Boo!
>
> Well, we've had worse problems with Solaris in the past.  We should certainly
> put a note about the service pack in the installation instructions.
>
> Is it reasonable to require the service pack be installed before making use of
> any of this?  My thinking is that, without location lists, anything except -O0
> -g2 is going to be unusable, since most local variables will no longer have any
> location data.  At which point you might as well just stick with xcoff
> debugging, yes?

I agree that debugging without location lists is fairly useless.  I
need to find out what happens without the AIX service packs.  If the
assembler and linker pass the sections with the additional numbers
through and older DBX doesn't know about the sections, it doesn't
matter.  If the assembler and linker generate errors, it further
delays deployment in GCC.

Thanks, David


Re: [PATCH, fortran] Revival of AUTOMATIC patch

2015-09-24 Thread Jerry DeLisle
On 09/24/2015 04:52 AM, Jim MacArthur wrote:
> Hi all, I'm following up on some old work my colleague Mark Doffman did to try
> and get support for the AUTOMATIC keyword into trunk. In the enclosed patch
> I've addressed the problem with it accepting 'automatic' outside -std=gnu (it
> will now only accept AUTOMATIC under -std=gnu or -std=legacy). I've also added
> some test cases and documentation.
> 
> To address some of the other questions about this patch:
> 
> * AUTOMATIC isn't in any official standard, but is supported by the Sun/Oracle
> Fortran compiler:
> http://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vn79/index.html#z400073dc651
> and the IBM XL compiler:
> https://www-304.ibm.com/support/docview.wss?uid=swg27018978&aid=1
> 
> * Making this patch is our second choice after modifying our source code. The 
> scale of our source means it's not practical to manually modify it. For other 
> legacy features we've been able to do some automated transforms, but we can't 
> figure out any way to do this for AUTOMATIC. There's a chance there will be 
> some other people out there stuck with legacy code who will benefit from this 
> change.
> 

I think I appreciate what you are trying to do here.  I don't intend to sound
negative here, but if the keyword AUTOMATIC does nothing how difficult is it
really to just run a script on all your source code using something like sed and
just strip it out.  5 minutes to develop the script, 13 seconds to run it.

Or maybe a preprocessor directive that defines AUTOMATIC as ''

I must be missing something here.

Regards,

Jerry



Re: [PATCH] DWARF support for AIX v4

2015-09-24 Thread Richard Henderson
On 09/24/2015 12:02 PM, David Edelsohn wrote:
> If the assembler and linker pass the sections with the additional numbers 
> through and older DBX doesn't know about the sections, it doesn't matter.

Agreed.  In that case we simply emit the data all the time and let capable
debuggers pick it up.

> If the assembler and linker generate errors, it further
> delays deployment in GCC.

If the assembler generates errors, then surely it's easy to error out at
configure time, printing a note about the service pack.

So, assuming this is true, when configuring for aix7.1,

 * configure --without-dwarf2 disables support for dwarf2 entirely,
   avoiding problems with any missing SP.  Presumably by *not* including
   config/rs6000/aix71.h.

 * configure checks as/ld as required, erroring out if the SP isn't present;
   dwarf2out always emits all of the usual dwarf data.

Sound good?


r~


Re: New power of 2 hash policy

2015-09-24 Thread François Dumont
On 11/09/2015 15:23, Jonathan Wakely wrote:
> On 11/09/15 14:18 +0100, Jonathan Wakely wrote:
>> On 11/09/15 15:11 +0200, Michael Matz wrote:
>>> Hi,
>>>
>>> On Thu, 10 Sep 2015, François Dumont wrote:
>>>
>>>> Here is a patch to offer an alternative hash policy. This one is
>>>> using power of 2 number of buckets allowing a faster modulo operation.
>>>> This is obvious when running the performance test that I have adapted to
>>>> use this alternative policy. Something between current implementation
>>>> and the tr1 one, the old std one.
>>>>
>>>> Of course with this hash policy the lower bits of the hash code are
>>>> more important. For pointers it would require to change the std::hash
>>>> implementation to remove the lower 0 bits like in the patch I proposed
>>>> some weeks ago.
>>>>
>>>> What do you think ?
>>>
>>> No comment on if it should be included (except that it seems useful to
>>> me), but one observation of the patch:
>>>
>>>> +1ul << 31,
>>>> +#if __SIZEOF_LONG__ != 8
>>>> +1ul << 32
>>>> +#else
>>>
>>> This is wrong, 1ul<<32 is zero on a 32bit machine, and is also the 33rd
>>> entry in that table, when you want only 32.  Like you also (correctly)
>>> stop with 1ul<<63 for a 64bit machine.
>>
>> I'd prefer to see that table disappear completely, replaced by a
>> constexpr function. We need a static table of prime numbers because
>> they can't be computed instantly, but we don't need to store powers of
>> two in the library.
>>
>> I agree the extension is useful, and would like to see it included,
>> but I wonder if we can do it without adding any new symbols to the
>> shared library. We certainly don't need the table, and the few other
>> functions added to the DSO could probably be defined inline in
>> headers.
>
>
> Also there are several comments that talk about finding "the next prime"
> which should talk about powers of two, and the smaller table for fast
> lookup of the next "prime" may not be needed for powers of two. There
> are fast tricks for finding the next power of two using bitwise
> operations.
>
> So I'm in favour of the change in general, but it needs a little bit
> of reworking where the prime number code has been copy&pasted.
>
Hi

Here is the new patch then.

Working on it I realised that despite the comment on _M_next_bkt
saying "no smaller than n" the method can return a value smaller for big
n values. This is not likely to happen but I prefer to take care of it.
I just make sure we won't try to rehash again even if the drawback is
that we won't respect max_load_factor anymore at those levels. But as I
said we will surely have a memory issue before that.

Ok to commit ?

François
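
The bitwise trick mentioned in the review, and the saturation for very large
n described above, can be sketched in plain C as below.  This is an
illustration only, not the libstdc++ code from the patch: sketch_next_pow2
and sketch_bucket are hypothetical names, and the loop replaces the recursive
_NextPower2 template.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Smallest power of two that is >= N, saturating at the largest power of
   two representable in size_t instead of wrapping around to zero.  */
static size_t
sketch_next_pow2 (size_t n)
{
  const size_t bits = sizeof (size_t) * CHAR_BIT;
  const size_t max_pow2 = (size_t) 1 << (bits - 1);

  if (n <= 1)
    return 1;
  if (n > max_pow2)
    return max_pow2;   /* cannot honour "no smaller than n" here */

  /* Smear the highest set bit of n - 1 into every lower position,
     then add one.  */
  n--;
  for (size_t shift = 1; shift < bits; shift <<= 1)
    n |= n >> shift;
  return n + 1;
}

/* Range hashing for a power-of-two bucket count: a mask replaces the
   modulo operation.  */
static size_t
sketch_bucket (size_t hash, size_t bucket_count)
{
  assert (bucket_count != 0 && (bucket_count & (bucket_count - 1)) == 0);
  return hash & (bucket_count - 1);
}
```

Computing the value on the fly avoids both the table of powers of two and
the 1ul << 32 overflow on 32-bit targets pointed out earlier in the thread.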



diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index a9ad7dd..4e1bc29 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -457,6 +457,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// smallest prime that keeps the load factor small enough.
   struct _Prime_rehash_policy
   {
+using __has_load_factor = std::true_type;
+
 _Prime_rehash_policy(float __z = 1.0) noexcept
 : _M_max_load_factor(__z), _M_next_resize(0) { }
 
@@ -501,6 +503,129 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 mutable std::size_t	_M_next_resize;
   };
 
+  /// Range hashing function considering that second args is a power of 2.
+  struct _Mask_range_hashing
+  {
+typedef std::size_t first_argument_type;
+typedef std::size_t second_argument_type;
+typedef std::size_t result_type;
+
+result_type
+operator()(first_argument_type __num,
+	   second_argument_type __den) const noexcept
+{ return __num & (__den - 1); }
+  };
+
+
+  /// Helper type to compute next power of 2.
+  template<std::size_t _N>
+struct _NextPower2
+{
+  static std::size_t
+  _Get(std::size_t __n)
+  {
+	std::size_t __next = _NextPower2<(_N >> 1)>::_Get(__n);
+	return __next |= __next >> _N;
+  }
+};
+
+  template<>
+struct _NextPower2<1>
+{
+  static std::size_t
+  _Get(std::size_t __n)
+  { return __n |= __n >> 1; }
+};
+
+  /// Rehash policy providing power of 2 bucket numbers. Ease modulo
+  /// operations.
+  struct _Power2_rehash_policy
+  {
+using __has_load_factor = std::true_type;
+
+_Power2_rehash_policy(float __z = 1.0) noexcept
+: _M_max_load_factor(__z), _M_next_resize(0) { }
+
+float
+max_load_factor() const noexcept
+{ return _M_max_load_factor; }
+
+// Return a bucket size no smaller than n (as long as n is not above the
+// highest power of 2).
+std::size_t
+_M_next_bkt(std::size_t __n) const
+{
+  constexpr auto __max_bkt
+	= (std::size_t(1) << (sizeof(std::size_t) * 8 - 1));
+
+  std::size_t __res
+	= _NextPower2<((sizeof(std::size_t) * 8) >> 1)>::_Get(--__n) + 1;
+
+  if (__res == 0)
+	__res = __max_bkt;
+
+  if (__res == __max_bkt)
+	// Set next resize to the max va

Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit

2015-09-24 Thread Ilya Verbin
On Thu, Aug 27, 2015 at 21:44:50 +0800, Chung-Lin Tang wrote:
> We've discovered that, for several of the libgomp plugin interface routines,
> if the target specific routine calls exit() (usually upon a fatal condition),
> deadlock ensues. We found this using nvptx, but it's possible on intelmic as
> well.
> 
> This is because many of the plugin routines are called with the device lock
> held, and when exit() is called inside the plugin code, the
> GOMP_unregister_var() destructor tries to iterate through and acquire all
> device locks to clean up. Since we already hold one of the device locks, this
> just gets stuck.  Also, because gomp_mutex_t is a simple futex-based lock
> implementation (instead of pthreads), we don't have a trylock mechanism to
> use either.
> 
> So this patch tries to alleviate this problem by changing the plugin
> interface; the plugin routines that are called while holding the device lock
> are adjusted to never exit fatally, but to return a value back to libgomp
> proper to indicate execution results. The core libgomp code may then unlock
> and call gomp_fatal().
> 
> We believe this is the right route to solve the problem, since there are only
> two accel target plugins so far. Besides the nvptx plugin, I have made some
> effort to update the intelmic plugin as well, though it's not as thoroughly
> audited. Intel folks might want to further make sure your plugin code is free
> of this problem as well.
> 
> This patch contains the libgomp proper changes. The nvptx and intelmic
> patches follow. I have tested the libgomp testsuite without regressions for
> both accel targets; is this okay for trunk?

(I have no objections)

However, in case of intelmic, these exit()s are just the tip of the iceberg,
because underlying liboffloadmic contains other exit()s at fatal errors.
And I don't know what to do with such deadlocks.

  -- Ilya
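
The interface change described in the quoted message can be sketched as
follows.  Everything here is a hypothetical model, not the libgomp plugin
API: the lock is a plain flag, sketch_run stands for a core libgomp routine,
and the negative return value marks the point where the real code would call
gomp_fatal () after unlocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device descriptor: a lock flag plus one plugin hook that
   reports failure through its return value instead of calling exit ().  */
struct sketch_device
{
  int locked;
  bool (*plugin_op) (void);
};

/* Example hooks standing in for target-specific plugin routines.  */
static bool sketch_op_ok (void) { return true; }
static bool sketch_op_fail (void) { return false; }

/* Core-side wrapper: take the lock, call the hook, and always release the
   lock before acting on a failure.  */
static int
sketch_run (struct sketch_device *dev)
{
  assert (!dev->locked);
  dev->locked = 1;
  bool ok = dev->plugin_op ();
  dev->locked = 0;

  if (!ok)
    return -1;   /* the real code would report a fatal error here */
  return 0;
}
```

Because the hook never exits while the lock is held, an exit-time destructor
that needs every device lock can no longer find one held by the exiting
thread.  As noted above, this does not help with exit () calls made deeper
inside a library that the plugin itself depends on.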


Re: [gomp4] Another oacc reduction simplification

2015-09-24 Thread Cesar Philippidis
On 09/22/2015 08:29 AM, Nathan Sidwell wrote:

> 1) Don't have a fake gang reduction outside of worker & vector loops. 
> Deal with the receiver object directly.  I.e. 'ref_to_res' need not be a
> null pointer for vector and worker loops.

What happens when there is no receiver object, e.g. a reduction inside a
routine? Specifically, inside lower_oacc_reductions, you're doing this:

/* This is the outermost construct with this reduction,
   see if there's a mapping for it.  */
if (maybe_lookup_field (orig, outer))
  ref_to_res = build_receiver_ref (orig, false, outer);

That's going to ICE inside a routine.

> 2) Create a local private instance for all cases of reference var
> reductions, not just those in vector & worker loops

Good. I was about to make a similar change to fix a gang reduction bug.

Cesar



patch for PR61578

2015-09-24 Thread Vladimir Makarov

  The following patch solves the 2nd case of

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

  I did a lot of benchmarking of different heuristics for hard reg cost 
propagation in IRA.  This is the best I found.  The patch consistently 
improves SPEC2000 code size and score, although the improvement is not 
that significant.


  The patch was tested and bootstrapped on x86-64.

  Committed as rev. 228097.

2015-09-24  Vladimir Makarov  

PR target/61578
* ira-color.c (update_allocno_cost): Add parameter.
(update_costs_from_allocno): Decrease conflict cost.  Pass the new
parameter.

Index: ira-color.c
===
--- ira-color.c	(revision 227495)
+++ ira-color.c	(working copy)
@@ -1311,10 +1311,12 @@ get_next_update_cost (ira_allocno_t *all
   return true;
 }
 
-/* Increase costs of HARD_REGNO by UPDATE_COST for ALLOCNO.  Return
-   true if we really modified the cost.  */
+/* Increase costs of HARD_REGNO by UPDATE_COST and conflict cost by
+   UPDATE_CONFLICT_COST for ALLOCNO.  Return true if we really
+   modified the cost.  */
 static bool
-update_allocno_cost (ira_allocno_t allocno, int hard_regno, int update_cost)
+update_allocno_cost (ira_allocno_t allocno, int hard_regno,
+		 int update_cost, int update_conflict_cost)
 {
   int i;
   enum reg_class aclass = ALLOCNO_CLASS (allocno);
@@ -1330,7 +1332,7 @@ update_allocno_cost (ira_allocno_t alloc
 (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (allocno),
  aclass, 0, ALLOCNO_CONFLICT_HARD_REG_COSTS (allocno));
   ALLOCNO_UPDATED_HARD_REG_COSTS (allocno)[i] += update_cost;
-  ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (allocno)[i] += update_cost;
+  ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (allocno)[i] += update_conflict_cost;
   return true;
 }
 
@@ -1342,7 +1344,7 @@ static void
 update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
 			   int divisor, bool decr_p, bool record_p)
 {
-  int cost, update_cost;
+  int cost, update_cost, update_conflict_cost;
   machine_mode mode;
   enum reg_class rclass, aclass;
   ira_allocno_t another_allocno, from = NULL;
@@ -1383,11 +1385,20 @@ update_costs_from_allocno (ira_allocno_t
 	  if (decr_p)
 	cost = -cost;
 
-	  update_cost = cp->freq * cost / divisor;
+	  update_conflict_cost = update_cost = cp->freq * cost / divisor;
+
+	  if (ALLOCNO_COLOR_DATA (another_allocno) != NULL
+	  && (ALLOCNO_COLOR_DATA (allocno)->first_thread_allocno
+		  != ALLOCNO_COLOR_DATA (another_allocno)->first_thread_allocno))
+	/* Decrease conflict cost of ANOTHER_ALLOCNO if it is not
+	   in the same allocation thread.  */
+	update_conflict_cost /= COST_HOP_DIVISOR;
+
 	  if (update_cost == 0)
 	continue;
 
-	  if (! update_allocno_cost (another_allocno, hard_regno, update_cost))
+	  if (! update_allocno_cost (another_allocno, hard_regno,
+ update_cost, update_conflict_cost))
 	continue;
 	  queue_update_cost (another_allocno, allocno, divisor * COST_HOP_DIVISOR);
 	  if (record_p && ALLOCNO_COLOR_DATA (another_allocno) != NULL)
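
The arithmetic that the patch changes can be shown in isolation.  This is a
simplified sketch under stated assumptions: the struct and function names are
hypothetical, the divisor value 4 is assumed to match COST_HOP_DIVISOR in
ira-color.c, and the same_thread flag stands for the comparison of
first_thread_allocno in the diff (the NULL check on the color data is left
out).

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_COST_HOP_DIVISOR 4   /* assumed value of COST_HOP_DIVISOR */

/* The two cost deltas applied to the other allocno of a copy.  */
struct sketch_costs
{
  int update_cost;
  int update_conflict_cost;
};

/* Compute both deltas for one copy with frequency FREQ.  */
static struct sketch_costs
sketch_propagated_costs (int freq, int cost, int divisor, bool same_thread)
{
  struct sketch_costs c;

  assert (divisor > 0);
  c.update_cost = freq * cost / divisor;
  c.update_conflict_cost = c.update_cost;

  /* The conflict cost is weakened when the two allocnos are not in the
     same allocation thread.  */
  if (!same_thread)
    c.update_conflict_cost /= SKETCH_COST_HOP_DIVISOR;
  return c;
}
```

For freq = 8, cost = 6 and divisor = 2, both deltas are 24 within one thread,
while across threads the conflict delta drops to 6 and the hard register
cost delta stays at 24.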


Re: [PATCH, fortran] Revival of AUTOMATIC patch

2015-09-24 Thread FX
> I think I appreciate what you are trying to do here.  I don't intend to sound
> negative here, but if the keyword AUTOMATIC does nothing

The testcase given is not an example of useful AUTOMATIC. I think it is meant 
to be used to oppose an implied SAVE attribute, e.g. a variable with explicit 
initialization or the BIND attribute.
Indeed, in the case of implied SAVE by initialization, it is a little bit 
more work because you have to move the initialization to the executable part of 
the code. But that’s not impossible.

All in all I’m skeptical of adding even more old language extensions with 
little demand when we have a hard time filling up gaps in the standard. Each 
addition adds to maintenance load, especially as they might not interact too 
well with more modern features. (For example coarrays or BIND attribute, which 
were not around when AUTOMATIC was in use.)

I don’t find any request for this feature in the whole bugzilla database.

FX

[gomp4] adjust worker reduction allocation

2015-09-24 Thread Nathan Sidwell
I've committed this patch to reduce the number of worker reduction allocation 
builtins.  We now pass in  the (constant) allocation size and alignment and 
return a void ptr.
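
Since the builtin now receives a size and an alignment, the allocation step
amounts to an aligned bump allocation.  The following standalone sketch shows
that step; struct sketch_pool and sketch_alloc are hypothetical names, with
hwm and max_align playing the roles of loop.hwm and worker_red_align in the
patch below.

```c
#include <assert.h>

/* Hypothetical pool state: high-water mark and largest alignment seen.  */
struct sketch_pool
{
  unsigned hwm;
  unsigned max_align;
};

/* Reserve SIZE bytes aligned to ALIGN (a power of two) and return the
   offset of the reservation within the pool.  */
static unsigned
sketch_alloc (struct sketch_pool *pool, unsigned size, unsigned align)
{
  unsigned off;

  assert (align != 0 && (align & (align - 1)) == 0);

  if (align > pool->max_align)
    pool->max_align = align;

  /* Round the current high-water mark up to the requested alignment.  */
  off = (pool->hwm + align - 1) & ~(align - 1);
  pool->hwm = off + size;
  return off;
}
```

Starting from an empty pool, a 4-byte request aligned to 4 lands at offset 0,
a following 8-byte request aligned to 8 lands at offset 8, and a 1-byte
request then lands at offset 16.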


nathan
2015-09-24  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_expand_work_red_addr): Args 0 & 1
	are size and alignment of allocation.
	(nvptx_types): Delete  NT_UINTPTR_UINT_UINT, NT_ULLPTR_UINT_UINT,
	NT_FLTPTR_UINT_UINT, NT_DBLPTR_UINT_UINT.  Add
	NT_PTR_UINT_UINT_UINT_UINT.
	(nvptx_builtins): Delete __builtin_nvptx_work_red_addrll,
	__builtin_nvptx_work_red_addrf,
	__builtin_nvptx_work_red_addrd. Adjust
	__builtin_nvptx_work_red_addr type.
	(nvptx_init_builtins): Adjust.
	(nvptx_get_worker_addr_fn): Rename to ...
	(nvptx_get_worker_red_addr): ... here.  Use single builtin and
	cast return type.
	(nvptx_goacc_reduction_setup, nvptx_goacc_reduction_fini,
	nvptx_goacc_reduction_teardown): Adjust.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 228094)
+++ config/nvptx/nvptx.c	(working copy)
@@ -4159,18 +4159,9 @@ nvptx_expand_work_red_addr (tree exp, rt
 {
   if (ignore)
 return target;
-  
-  rtx loop_id = expand_expr (CALL_EXPR_ARG (exp, 0),
-			 NULL_RTX, mode, EXPAND_NORMAL);
-  rtx red_id = expand_expr (CALL_EXPR_ARG (exp, 1),
-			 NULL_RTX, mode, EXPAND_NORMAL);
-  gcc_assert (GET_CODE (loop_id) == CONST_INT
-	  && GET_CODE (red_id) == CONST_INT);
-  gcc_assert (REG_P (target));
-
-  unsigned lid = (unsigned)UINTVAL (loop_id);
-  unsigned rid = (unsigned)UINTVAL (red_id);
 
+  unsigned lid = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 2));
+  unsigned rid = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 3));
   unsigned ix;
 
   for (ix = 0; ix != loop_reds.length (); ix++)
@@ -4186,15 +4177,14 @@ nvptx_expand_work_red_addr (tree exp, rt
 
   /* Allocate a new var. */
   {
-tree type = TREE_TYPE (TREE_TYPE (exp));
-enum machine_mode mode = TYPE_MODE (type);
-unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
+unsigned size = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 0));
+unsigned align = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1));
 unsigned off = loop.hwm;
 
 if (align > worker_red_align)
   worker_red_align = align;
 off = (off + align - 1) & ~(align -1);
-loop.hwm = off + GET_MODE_SIZE (mode);
+loop.hwm = off + size;
 loop.vars.safe_push (var_red_t (rid, off));
   }
  found_rid:
@@ -4221,10 +4211,7 @@ enum nvptx_types
 NT_ULL_ULL_INT,
 NT_FLT_FLT_INT,
 NT_DBL_DBL_INT,
-NT_UINTPTR_UINT_UINT,
-NT_ULLPTR_UINT_UINT,
-NT_FLTPTR_UINT_UINT,
-NT_DBLPTR_UINT_UINT,
+NT_PTR_UINT_UINT_UINT_UINT,
 NT_MAX
   };
 
@@ -4236,9 +4223,6 @@ enum nvptx_builtins
   NVPTX_BUILTIN_SHUFFLE_DOWNF,
   NVPTX_BUILTIN_SHUFFLE_DOWND,
   NVPTX_BUILTIN_WORK_RED_ADDR,
-  NVPTX_BUILTIN_WORK_RED_ADDRLL,
-  NVPTX_BUILTIN_WORK_RED_ADDRF,
-  NVPTX_BUILTIN_WORK_RED_ADDRD,
   NVPTX_BUILTIN_MAX
 };
 
@@ -4252,13 +4236,7 @@ static const struct builtin_description
nvptx_expand_shuffle_down},
   {"__builtin_nvptx_shuffle_downd", NT_DBL_DBL_INT,
nvptx_expand_shuffle_down},
-  {"__builtin_nvptx_work_red_addr", NT_UINTPTR_UINT_UINT,
-   nvptx_expand_work_red_addr},
-  {"__builtin_nvptx_work_red_addrll", NT_ULLPTR_UINT_UINT,
-   nvptx_expand_work_red_addr},
-  {"__builtin_nvptx_work_red_addrf", NT_FLTPTR_UINT_UINT,
-   nvptx_expand_work_red_addr},
-  {"__builtin_nvptx_work_red_addrd", NT_DBLPTR_UINT_UINT,
+  {"__builtin_nvptx_work_red_addr", NT_PTR_UINT_UINT_UINT_UINT,
nvptx_expand_work_red_addr},
 };
 
@@ -4294,24 +4272,9 @@ nvptx_init_builtins (void)
   types[NT_DBL_DBL_INT]
 = build_function_type_list (double_type_node, double_type_node,
 integer_type_node, NULL_TREE);
-  types[NT_UINTPTR_UINT_UINT]
-= build_function_type_list (build_pointer_type (unsigned_type_node),
+  types[NT_PTR_UINT_UINT_UINT_UINT]
+= build_function_type_list (ptr_type_node,
 unsigned_type_node, unsigned_type_node,
-NULL_TREE);
-
-  types[NT_ULLPTR_UINT_UINT]
-= build_function_type_list (build_pointer_type
-(long_long_unsigned_type_node),
-unsigned_type_node, unsigned_type_node,
-NULL_TREE);
-
-  types[NT_FLTPTR_UINT_UINT]
-= build_function_type_list (build_pointer_type (float_type_node),
-unsigned_type_node, unsigned_type_node,
-NULL_TREE);
-
-  types[NT_DBLPTR_UINT_UINT]
-= build_function_type_list (build_pointer_type (double_type_node),
 unsigned_type_node, unsigned_type_node,
 NULL_TREE);
 
@@ -4440,37 +4403,18 @@ nvptx_xform_lock (gcall *call, const int
 }
 
 static tree
-nvptx_get_worker_red_addr_fn (tree var, tree rid, tree lid)
+nvptx_get_worker_red_addr (tree type, tree rid, tree lid)
 {
-  tree vartype = TREE_TYPE (var);
-  tree fndecl, call;
-  enum nvptx_builtins fn;
-  machine_mode mode = TYPE_MODE (vartype);
-
-  switch (mode)
-{
-case QImode:
-case HImode:
-case SImode:
-  fn = NVPTX_BUILTIN_WORK_RED_ADDR;
- 

Re: patch for PR61578

2015-09-24 Thread Jeff Law

On 09/24/2015 02:41 PM, Vladimir Makarov wrote:

>   The following patch solves the 2nd case of
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578
>
>   I did a lot of benchmarking of different heuristics in hard reg cost
> propagation in IRA.  This is the best what I found.  The patch improves
> stably code size of SPEC2000 and its score although it is not that
> significant.

Didn't we have a handful of missed optimization BZs that we speculated 
might be helped by propagation of hard register costs?  Might not be a 
bad idea to review those and see if any are magically fixed now.


jeff



Re: [PATCH] New attribute to create target clones

2015-09-24 Thread Evgeny Stupachenko
I've fixed ICE and review issues.
x86 make check and bootstrap passed.

Thanks,
Evgeny

ChangeLog

2015-09-25  Evgeny Stupachenko  

gcc/
* Makefile.in (OBJS): Add multiple_target.o.
* multiple_target.c (make_attribute): New.
(create_dispatcher_calls): Ditto.
(expand_target_clones): Ditto.
(ipa_target_clone): Ditto.
* passes.def (pass_target_clone): New ipa pass.
* tree-pass.h (make_pass_target_clone): Ditto.

gcc/c-family
* c-common.c (handle_target_clones_attribute): New.
* (c_common_attribute_table): Add handle_target_clones_attribute.
* (handle_always_inline_attribute): Add check on target_clones
attribute.
* (handle_target_attribute): Ditto.

gcc/testsuite
* gcc.dg/mvc1.c: New test for multiple targets cloning.
* gcc.dg/mvc2.c: Ditto.
* gcc.dg/mvc3.c: Ditto.
* gcc.dg/mvc4.c: Ditto.
* gcc.dg/mvc5.c: Ditto.
* gcc.dg/mvc6.c: Ditto.
* gcc.dg/mvc7.c: Ditto.
* g++.dg/ext/mvc1.C: Ditto.
* g++.dg/ext/mvc2.C: Ditto.
* g++.dg/ext/mvc3.C: Ditto.

gcc/doc
* doc/extend.texi (target_clones): New attribute description.


On Wed, Sep 23, 2015 at 1:49 AM, Evgeny Stupachenko  wrote:
> Thank you for the review.
> The patch still works with gcc 5, but the fail reproduced on trunk
> (looks like it appeared while patch was at review). I'll debug it and
> fix.
> As a workaround to test the feature...
> Removing
> "gimple_call_set_fndecl (call, idecl);" from multiple_target.c
> should resolve the ICE
>
> I'll fix the patch for trunk and send an update.
>
> Thanks,
> Evgeny
>
>
> On Wed, Sep 23, 2015 at 12:09 AM, Bernd Schmidt  wrote:
>> On 09/22/2015 09:41 PM, Jeff Law wrote:
>>>
>>> Essentially it allows us to more easily support
>>> per-microarchitecture-optimized versions of functions.   You list just
>>> have to list the microarchitectures and the compiler handles the rest.
>>> Very simple, very easy.  I'd think it'd be particularly helpful for
>>> vectorization.
>>>
>>> You could emulate this with compiling the same source multiple times
>>> with different flags/defines and wire up on ifunc by hand.  But Evgeny's
>>> approach is vastly simpler.
>>
>>
>> As far as I can tell the ifunc is generated automatically (and the
>> functionality is documented as such), so the new target_clone doesn't buy
>> much. But one thing I didn't realize was that the existing support is only available
>> in C++, while Evgeny's patch works for C. That is probably an argument that
>> could be made for its inclusion.
>>
>> Or at least, it's supposed to work. As I said, I get verify_ssa failures on
>> the included testcases, and for a simpler one I just tried I get the clones
>> of the function, but not the resolver that ought to be generated.
>>
>>
>> Bernd


target_clones.patch
Description: Binary data


Re: New post-LTO OpenACC pass

2015-09-24 Thread Nathan Sidwell

On 09/23/15 14:58, Nathan Sidwell wrote:

On 09/23/15 14:51, Bernd Schmidt wrote:

On 09/23/2015 08:42 PM, Nathan Sidwell wrote:


As I feared, builtin folding occurs in several places.  In particular
its first call is very early on in the host compiler, which is far too
soon.

We have to defer folding until we know whether we're doing host or
device compilation.


Doesn't something like "symtab->state >= EXPANSION" give you that?


I've tried limiting expansion by checking symtab->state.  I have been unable to 
succeed.


It either expands too early in the host compiler, or it doesn't get expanded at 
 all and one ends up with an RTL call to the library function.   For instance 
there doesn't appear to be call to fold builtins when state == EXPANSION. 
lesser values are present in the host compiler before LTO write out, AFAICT.


nathan


Re: [PATCH] DWARF support for AIX v4

2015-09-24 Thread David Edelsohn
On Thu, Sep 24, 2015 at 4:05 PM, Richard Henderson  wrote:

>> If the assembler and linker generate errors, it further
>> delays deployment in GCC.
>
> If the assembler generates errors, then surely it's easy to error out at
> configure time, printing a note about the service pack.
>
> So, assuming this is true, when configuring for aix7.1,
>
>  * configure --without-dwarf2 disables support for dwarf2 entirely,
>    avoiding problems with any missing SP.  Presumably by *not* including
>    config/rs6000/aix71.h.
>
>  * configure checks as/ld as required, erroring out if the SP isn't present;
>    dwarf2out always emits all of the usual dwarf data.
>
> Sound good?

Older assemblers produce an error when presented with the new sections.

I was planning to commit the current patch, which is supported by AIX
7.1.  And follow up with another patch that adds a configure test for
the additional sections support (HAVE_XCOFF_DWARF_EXTRA?) to tweak the
output.  As you mentioned, the minimal sections should allow debugging
with -O0 -g2, which is equivalent to the stabs functionality.

Thanks, David


[gomp4] rework ptx builtins ... again

2015-09-24 Thread Nathan Sidwell
I've committed this to rework the ptx builtin machinery.  We don't use the float 
mode shuffles any more, so they're not needed.  Also, each builtin has a unique 
prototype, so the indirection of the type array is useless.  Added swap and 
cmp_swap builtins that I'll be using shortly.
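
As a rough illustration of what swap and cmp_swap operations provide, the following C11 sketch models them with <stdatomic.h>. The function names swap_u32 and cmp_swap_u32 are hypothetical, and the sketch assumes that both builtins return the previous memory contents, as the expanders' use of the atomic exchange and compare-and-swap patterns with the builtin's target as output suggests. It is not the PTX code the backend emits.

```c
#include <stdatomic.h>

/* Atomically store SRC to *MEM and return the previous contents.  */
static unsigned
swap_u32 (_Atomic unsigned *mem, unsigned src)
{
  return atomic_exchange (mem, src);
}

/* Atomically store SRC to *MEM if *MEM equals CMP.  The previous
   contents are returned either way, so the caller detects success by
   comparing the result with CMP.  */
static unsigned
cmp_swap_u32 (_Atomic unsigned *mem, unsigned cmp, unsigned src)
{
  unsigned expected = cmp;
  /* On failure EXPECTED is overwritten with the value found in *MEM;
     on success it still holds CMP, which was the old value.  */
  atomic_compare_exchange_strong (mem, &expected, src);
  return expected;
}
```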



nathan
2015-09-24  Nathan Sidwell  

	* config/nvptx/nvptx.c (struct builtin_description): Delete.
	(nvptx_expand_shuffle_down): Rename to ...
	(nvptx_expand_shuffle): ... here.  Add additional arg for type of
	shuffle.
	(nvptx_expand_work_red_addr): Rename to ...
	(nvptx_expand_worker_addr): ... here.
	(nvptx_expand_swap): New.
	(nvptx_expand_cmp_swap): New.
	(enum nvptx_types): Delete.
	(enum nvptx_builtins): Rename builtins.  Remove float mode
	shuffles, add SWAP and CMP_SWAP.
	(builtins): Delete.
	(nvptx_init_builtins): Reimplement.
	(nvptx_expand_builtin): Likewise.
	(nvptx_get_worker_red_addr): Adjust.
	(nvptx_generate_vector_shuffle): Adjust.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 228100)
+++ config/nvptx/nvptx.c	(working copy)
@@ -4115,36 +4115,28 @@ nvptx_file_end (void)
   }
 }
 
-/* Descriptor for a builtin.  */
+/* Expander for the shuffle builtins.  */
 
-struct builtin_description
-{
-  const char *name;
-  unsigned short type;
-  rtx (*expander) (tree, rtx, machine_mode, int);
-};
-
-/* Expander for the shuffle down builtins.  */
 static rtx
-nvptx_expand_shuffle_down (tree exp, rtx target, machine_mode mode, int ignore)
+nvptx_expand_shuffle (tree exp, rtx target, machine_mode mode, int ignore)
 {
   if (ignore)
 return target;
   
-  if (! target)
-target = gen_reg_rtx (mode);
-
   rtx src = expand_expr (CALL_EXPR_ARG (exp, 0),
 			 NULL_RTX, mode, EXPAND_NORMAL);
   if (!REG_P (src))
 src = copy_to_mode_reg (mode, src);
 
   rtx idx = expand_expr (CALL_EXPR_ARG (exp, 1),
+			 NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx op = expand_expr (CALL_EXPR_ARG  (exp, 2),
 			NULL_RTX, SImode, EXPAND_NORMAL);
+  
   if (!REG_P (idx) && GET_CODE (idx) != CONST_INT)
 idx = copy_to_mode_reg (SImode, idx);
 
-  rtx pat = nvptx_gen_shuffle (target, src, idx, SHUFFLE_DOWN);
+  rtx pat = nvptx_gen_shuffle (target, src, idx, INTVAL (op));
   if (pat)
 emit_insn (pat);
 
@@ -4152,10 +4144,10 @@ nvptx_expand_shuffle_down (tree exp, rtx
 }
 
 /* Worker reduction address expander.  */
+
 static rtx
-nvptx_expand_work_red_addr (tree exp, rtx target,
-			machine_mode ARG_UNUSED (mode),
-			int ignore)
+nvptx_expand_worker_addr (tree exp, rtx target,
+			  machine_mode ARG_UNUSED (mode), int ignore)
 {
   if (ignore)
 return target;
@@ -4205,41 +4197,71 @@ nvptx_expand_work_red_addr (tree exp, rt
   return target;
 }
 
-enum nvptx_types
-  {
-NT_UINT_UINT_INT,
-NT_ULL_ULL_INT,
-NT_FLT_FLT_INT,
-NT_DBL_DBL_INT,
-NT_PTR_UINT_UINT_UINT_UINT,
-NT_MAX
-  };
+static rtx
+nvptx_expand_swap (tree exp, rtx target,
+		   machine_mode mode, int ARG_UNUSED (ignore))
+{
+  if (!target)
+target = gen_reg_rtx (mode);
+
+  rtx mem = expand_expr  (CALL_EXPR_ARG (exp, 0),
+			  NULL_RTX, Pmode, EXPAND_NORMAL);
+  rtx src = expand_expr (CALL_EXPR_ARG (exp, 1),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+
+  rtx pat;
+  
+  if (mode == SImode)
+pat = gen_atomic_exchangesi (target, mem, src, const0_rtx);
+  else
+pat = gen_atomic_exchangedi (target, mem, src, const0_rtx);
+
+  emit_insn (pat);
+
+  return target;
+}
+
+static rtx
+nvptx_expand_cmp_swap (tree exp, rtx target,
+		   machine_mode mode, int ARG_UNUSED (ignore))
+{
+  if (!target)
+target = gen_reg_rtx (mode);
+
+  rtx mem = expand_expr (CALL_EXPR_ARG (exp, 0),
+			 NULL_RTX, Pmode, EXPAND_NORMAL);
+  rtx cmp = expand_expr (CALL_EXPR_ARG (exp, 1),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+  rtx src = expand_expr (CALL_EXPR_ARG (exp, 2),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+  rtx pat;
+
+  mem = gen_rtx_MEM (mode, mem);
+  
+  if (mode == SImode)
+pat = gen_atomic_compare_and_swapsi_1 (target, mem, cmp, src, const0_rtx);
+  else
+pat = gen_atomic_compare_and_swapdi_1 (target, mem, cmp, src, const0_rtx);
+
+  emit_insn (pat);
+
+  return target;
+}
+
 
 /* Codes for all the NVPTX builtins.  */
 enum nvptx_builtins
 {
-  NVPTX_BUILTIN_SHUFFLE_DOWN,
-  NVPTX_BUILTIN_SHUFFLE_DOWNLL,
-  NVPTX_BUILTIN_SHUFFLE_DOWNF,
-  NVPTX_BUILTIN_SHUFFLE_DOWND,
-  NVPTX_BUILTIN_WORK_RED_ADDR,
+  NVPTX_BUILTIN_SHUFFLE,
+  NVPTX_BUILTIN_SHUFFLELL,
+  NVPTX_BUILTIN_WORKER_ADDR,
+  NVPTX_BUILTIN_SWAP,
+  NVPTX_BUILTIN_SWAPLL,
+  NVPTX_BUILTIN_CMP_SWAP,
+  NVPTX_BUILTIN_CMP_SWAPLL,
   NVPTX_BUILTIN_MAX
 };
 
-static const struct builtin_description builtins[] =
-{
-  {"__builtin_nvptx_shuffle_down", NT_UINT_UINT_INT,
-   nvptx_expand_shuffle_down},
-  {"__builtin_nvptx_shuffle_downll", NT_ULL_ULL_INT,
-   nvptx_expand_shuffle_down},
-  {"__builtin_nvptx_shuffle_downf", NT_FLT_FLT_INT,
-   nvptx_expand_shuffle_down},
-  {"__builtin_nvptx_shuffle_dow

Re: [PATCH] DWARF support for AIX v4

2015-09-24 Thread Mike Stump
On Sep 24, 2015, at 11:40 AM, David Edelsohn  wrote:
> AIX added support for the initially missing DWARF sections.
> Bad news: The support is in an AIX service pack whose presence on a
> system requires effort to determine.

So, we faced this problem at Apple, and we just required that people apply the 
minor updates.  Sometimes life is too short.  I’d just add a line in the doc 
that says, X isn’t supported without Y.

Re: [PATCH] DWARF support for AIX v4

2015-09-24 Thread David Edelsohn
On Thu, Sep 24, 2015 at 8:01 PM, Mike Stump  wrote:
> On Sep 24, 2015, at 11:40 AM, David Edelsohn  wrote:
>> AIX added support for the initially missing DWARF sections.
>> Bad news: The support is in an AIX service pack whose presence on a
>> system requires effort to determine.
>
> So, we faced this problem at Apple, and we just required that people apply 
> the minor updates.  Sometimes life is too short.  I’d just add a line in the 
> doc that says, X isn’t supported without Y.

The DWARF support is useful with the minimal sections.  In fact, most
of the AIX systems to which I personally have access do not have the
support installed.  I don't have control over the patches installed on
most of the systems.  This is the difference between a server
operating system and a personal operating system: most people can
install an update on their Mac OS X system, but not everyone can
install an update on their AIX system.  Many more people can use and
test the DWARF support on AIX without requiring the latest update.

Thanks, David


[committed, pa] Update atomic support in PA backend

2015-09-24 Thread John David Anglin
The attached change updates the atomic support in the PA backend for hppa-linux.

1) It enables 64-bit sync builtin support,
2) It adds new atomic store patterns that are atomic with respect to the LWS
 compare and swap builtins in linux-atomic.c, and
3) It revises existing atomic load and store patterns to also use the LWS builtins.

Tested on hppa-unknown-linux-gnu with no observed regressions.  Committed
to trunk and gcc-5.
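
The exchange helper that this change adds to pa.c builds an atomic exchange out of a compare-and-swap loop. A minimal user-level sketch of that loop shape in C11 follows. The name exchange_via_cas is hypothetical, and the code only models the idea with <stdatomic.h>; it is not the RTL sequence the backend generates.

```c
#include <stdatomic.h>

/* Write VAL to *MEM and return the previous contents, using only
   compare-and-swap.  */
static long
exchange_via_cas (_Atomic long *mem, long val)
{
  /* The plain load happens once, before the loop.  */
  long old = atomic_load (mem);

  /* On failure the compare-and-swap refreshes OLD with the value it
     found, so later iterations need no separate load.  */
  while (!atomic_compare_exchange_weak (mem, &old, val))
    ;

  return old;
}
```

The loop repeats only while the compare-and-swap fails, that is, while another thread changed the location between the read and the update.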

Dave
--
John David Anglin   dave.ang...@bell.net


2015-09-24  John David Anglin  

* config/pa/pa-linux.h (HAVE_sync_compare_and_swapdi): Define.
* config/pa/pa-protos.h (pa_maybe_emit_compare_and_swap_exchange_loop):
Declare.
* config/pa/pa.c (pa_init_libfuncs): Init sync libfuncs up to 8 bytes.
(pa_expand_compare_and_swap_loop): New.
(pa_maybe_emit_compare_and_swap_exchange_loop): New.
* config/pa/pa.md (atomic_storeqi, atomic_storehi, atomic_storesi,
atomic_storesf, atomic_loaddf, atomic_storedf): New expanders.
(atomic_loaddf_1, atomic_storedf_1): New insn patterns.
(atomic_loaddi, atomic_loaddi_1, atomic_storedi, atomic_storedi_1):
Revise.

Index: config/pa/pa-linux.h
===
--- config/pa/pa-linux.h(revision 228100)
+++ config/pa/pa-linux.h(working copy)
@@ -140,3 +140,4 @@
 #define HAVE_sync_compare_and_swapqi 1
 #define HAVE_sync_compare_and_swaphi 1
 #define HAVE_sync_compare_and_swapsi 1
+#define HAVE_sync_compare_and_swapdi 1
Index: config/pa/pa-protos.h
===
--- config/pa/pa-protos.h   (revision 228100)
+++ config/pa/pa-protos.h   (working copy)
@@ -79,6 +79,7 @@
 #endif /* ARGS_SIZE_RTX */
 extern int pa_insn_refs_are_delayed (rtx_insn *);
 extern rtx pa_get_deferred_plabel (rtx);
+extern rtx pa_maybe_emit_compare_and_swap_exchange_loop (rtx, rtx, rtx);
 #endif /* RTX_CODE */
 
 extern int pa_and_mask_p (unsigned HOST_WIDE_INT);
Index: config/pa/pa.c
===
--- config/pa/pa.c  (revision 228100)
+++ config/pa/pa.c  (working copy)
@@ -5749,7 +5749,7 @@
 }
 
   if (TARGET_SYNC_LIBCALL)
-init_sync_libfuncs (UNITS_PER_WORD);
+init_sync_libfuncs (8);
 }
 
 /* HP's millicode routines mean something special to the assembler.
@@ -10555,4 +10555,79 @@
 fputs ("\t.end_brtab\n", asm_out_file);
 }
 
+/* This is a helper function for the other atomic operations.  This function
+   emits a loop that contains SEQ that iterates until a compare-and-swap
+   operation at the end succeeds.  MEM is the memory to be modified.  SEQ is
+   a set of instructions that takes a value from OLD_REG as an input and
+   produces a value in NEW_REG as an output.  Before SEQ, OLD_REG will be
+   set to the current contents of MEM.  After SEQ, a compare-and-swap will
+   attempt to update MEM with NEW_REG.  The function returns true when the
+   loop was generated successfully.  */
+
+static bool
+pa_expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
+{
+  machine_mode mode = GET_MODE (mem);
+  rtx_code_label *label;
+  rtx cmp_reg, success, oldval;
+
+  /* The loop we want to generate looks like
+
+cmp_reg = mem;
+  label:
+old_reg = cmp_reg;
+seq;
+(success, cmp_reg) = compare-and-swap(mem, old_reg, new_reg)
+if (!success)
+  goto label;
+
+ Note that we only do the plain load from memory once.  Subsequent
+ iterations use the value loaded by the compare-and-swap pattern.  */
+
+  label = gen_label_rtx ();
+  cmp_reg = gen_reg_rtx (mode);
+
+  emit_move_insn (cmp_reg, mem);
+  emit_label (label);
+  emit_move_insn (old_reg, cmp_reg);
+  if (seq)
+emit_insn (seq);
+
+  success = NULL_RTX;
+  oldval = cmp_reg;
+  if (!expand_atomic_compare_and_swap (&success, &oldval, mem, old_reg,
+   new_reg, false, MEMMODEL_SYNC_SEQ_CST,
+   MEMMODEL_RELAXED))
+return false;
+
+  if (oldval != cmp_reg)
+emit_move_insn (cmp_reg, oldval);
+
+  /* Mark this jump predicted not taken.  */
+  emit_cmp_and_jump_insns (success, const0_rtx, EQ, const0_rtx,
+   GET_MODE (success), 1, label, 0);
+  return true;
+}
+
+/* This function tries to implement an atomic exchange operation using a 
+   compare_and_swap loop. VAL is written to *MEM.  The previous contents of
+   *MEM are returned, using TARGET if possible.  No memory model is required
+   since a compare_and_swap loop is seq-cst.  */
+
+rtx
+pa_maybe_emit_compare_and_swap_exchange_loop (rtx target, rtx mem, rtx val)
+{
+  machine_mode mode = GET_MODE (mem);
+
+  if (can_compare_and_swap_p (mode, true))
+{
+  if (!target || !register_operand (target, mode))
+target = gen_reg_rtx (mode);
+  if (pa_expand_compare_and_swap_loop (mem, target, val, NUL

[committed, PATCH] Change IA MCU processor from iamcu to lakemount

2015-09-24 Thread H.J. Lu
The first IA MCU processor will be Lakemount.  This patch changes IA MCU
processor name from iamcu to lakemount.

Tested on Linux/x86-64 with -m32.  Checked into trunk.
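
One user-visible effect of the rename is the predefined tuning macro, which changes from __tune_iamcu__ to __tune_lakemount__. A small sketch of code that checks for either spelling is shown below. The function name tuned_for is hypothetical, and the sketch assumes nothing beyond the two macro names that appear in the ChangeLog.

```c
/* Report which of the two tuning macros, if any, the compiler
   predefined for this translation unit.  */
static const char *
tuned_for (void)
{
#if defined (__tune_lakemount__)
  return "lakemount";   /* compilers that include this change */
#elif defined (__tune_iamcu__)
  return "iamcu";       /* compilers that predate the rename */
#else
  return "other";
#endif
}
```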

H.J.
--
gcc/

* config.gcc (x86_archs): Replace iamcu with lakemount.
(with_cpu): Likewise.
(with_arch): Likewise.
* doc/invoke.texi: Likewise.
* config/i386/i386-c.c (ix86_target_macros_internal): Replace
PROCESSOR_IAMCU with PROCESSOR_LAKEMOUNT.  Replace
__tune_iamcu__ with __tune_lakemount__.
* config/i386/i386.c (iamcu_cost): Renamed to ...
(lakemount_cost): This.
(m_IAMCU): Renamed to ...
(m_LAKEMOUNT): This.
(initial_ix86_arch_features): Replace m_IAMCU with m_LAKEMOUNT.
(processor_target_table): Replace "iamcu" with "lakemount".
(processor_alias_table): Likewise.
(ix86_issue_rate): Replace PROCESSOR_IAMCU with
PROCESSOR_LAKEMOUNT.
(ix86_adjust_cost): Likewise.
(ia32_multipass_dfa_lookahead): Likewise.
* config/i386/i386.h (processor_type): Likewise.
* config/i386/x86-tune.def: Replace m_IAMCU with m_LAKEMOUNT.

gcc/testsuite/

* gcc.target/i386/pr66749.c (dg-options): Replace -mtune=iamcu
with -mtune=lakemount.
* gcc.target/i386/pr66821.c (dg-options): Likewise.
* gcc.target/i386/pr67329.c (dg-options): Likewise.
---
 gcc/config.gcc  |  6 +++---
 gcc/config/i386/i386-c.c|  6 +++---
 gcc/config/i386/i386.c  | 16 
 gcc/config/i386/i386.h  |  2 +-
 gcc/config/i386/x86-tune.def| 26 +-
 gcc/doc/invoke.texi |  4 ++--
 gcc/testsuite/gcc.target/i386/pr66749.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr66821.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr67329.c |  2 +-
 9 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f060e2f..41814b8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -587,7 +587,7 @@ tm_defines="$tm_defines LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3 LIBC_MUSL=4"
 x86_archs="athlon athlon-4 athlon-fx athlon-mp athlon-tbird \
 athlon-xp k6 k6-2 k6-3 geode c3 c3-2 winchip-c6 winchip2 i386 i486 \
 i586 i686 pentium pentium-m pentium-mmx pentium2 pentium3 pentium3m \
-pentium4 pentium4m pentiumpro prescott iamcu"
+pentium4 pentium4m pentiumpro prescott lakemount"
 
 # 64-bit x86 processors supported by --with-arch=.  Each processor
 # MUST be separated by exactly one space.
@@ -3287,7 +3287,7 @@ esac
 if test x$with_cpu = x ; then
   case ${target} in
 i[34567]86-*-elfiamcu)
-  with_cpu=iamcu
+  with_cpu=lakemount
   ;;
 i[34567]86-*-*|x86_64-*-*)
   with_cpu=$cpu
@@ -3385,7 +3385,7 @@ if test x$with_arch = x ; then
   # and TARGET_SUBTARGET64_ISA_DEFAULT in config/i386/darwin.h.
   ;;
 i[34567]86-*-elfiamcu)
-  with_arch=iamcu
+  with_arch=lakemount
   ;;
 i[34567]86-*-*)
   # --with-fpmath sets the default ISA to SSE2, which is the same
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 5e583ae..86f2426 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -63,7 +63,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
   def_or_undef (parse_in, "__i486");
   def_or_undef (parse_in, "__i486__");
   break;
-case PROCESSOR_IAMCU:
+case PROCESSOR_LAKEMOUNT:
   /* Intel MCU is based on Intel Pentium CPU.  */
 case PROCESSOR_PENTIUM:
   def_or_undef (parse_in, "__i586");
@@ -293,8 +293,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 case PROCESSOR_SKYLAKE_AVX512:
   def_or_undef (parse_in, "__tune_skylake_avx512__");
   break;
-case PROCESSOR_IAMCU:
-  def_or_undef (parse_in, "__tune_iamcu__");
+case PROCESSOR_LAKEMOUNT:
+  def_or_undef (parse_in, "__tune_lakemount__");
   break;
 case PROCESSOR_INTEL:
 case PROCESSOR_GENERIC:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8a26f68..193cabf 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -417,7 +417,7 @@ struct processor_costs pentium_cost = {
 };
 
 static const
-struct processor_costs iamcu_cost = {
+struct processor_costs lakemount_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1) + 1,   /* cost of a lea instruction */
   COSTS_N_INSNS (1),   /* variable shift costs */
@@ -2085,7 +2085,7 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_386 (1<

Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-24 Thread Trevor Saunders
On Thu, Sep 24, 2015 at 06:55:11PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 06:11 PM, Steve Ellcey wrote:
> >At least one of the warnings in glibc is not justified (in my opinion).
> >The header file timezone/private.h defines time_t_min and time_t_max.
> >These are not used in any of the timezone files built by glibc but if
> >you look at the complete tz package they are used when building other
> >objects that are not part of the glibc tz component and that include
> >private.h.
> 
> The standard C way of writing this would be to declare time_t_min in the
> header and have its definition in another file, or use a TIME_T_MIN macro as
> glibc does in mktime.c. That file even has a local redefinition:
>   time_t time_t_min = TIME_T_MIN;
> So at the very least the warning points at code that has some oddities.

I can believe it's an odd way to write C, but is it actually a bad one?
I expect if I got warnings for code like that I'd be pretty unhappy
about either moving the constant out where the compiler can't always see
it, or making it a macro.

> >I would make two arguments about why I don't think we should warn.
> >
> >One is that 'static int const foo = 1' seems a lot like '#define foo 1'
> >and we don't complain about the macro foo not being used.  If we
> >complain about the unused const, why not complain about the unused
> >macro?  We don't complain because we know it would result in too many
> >warnings in existing code.  If we want people to move away from macros,
> >and I think we do, then we should not make it harder to do so by
> >introducing new warnings when they change.
> >
> >The other is that C++ does not complain about this.  I know that C and
> >C++ are different languages with different rules but it seems like this
> >difference is a difference that doesn't have to exist.  Either both
> >should complain or neither should complain.  I can't think of any valid
> >reason for one to complain and the other not to.
> 
> Well, they _are_ different languages, and handling of const is one place
> where they differ. For example, C++ consts can be used in places where
> constant expressions are required. The following is a valid C++ program but
> not a C program:
> 
> const int v = 200;
> int t[v];
> 
> The result is that the typical programming style for C is to have constants
> #defined, while for C++ you can find more examples like the above; I recall
> Stroustrup explicitly advocating that in the introductory books I read 20
> years ago, and using it as a selling point for C++. Existing practice is
> important when deciding what to warn about, and for the moment I remain
> convinced that C practice is sufficiently different from C++.
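
To make the language difference concrete, here is a small C sketch of the usual workarounds. The names V_CONST, V, t and table_matches_const are hypothetical. In ISO C a const-qualified object is not an integer constant expression, so it cannot be used as a file-scope array bound, whereas an enumeration constant (or a macro) can.

```c
/* In C, V_CONST is a read-only object, not an integer constant
   expression, so `int t[V_CONST];` is not allowed at file scope.  */
static const int V_CONST = 200;

/* An enumeration constant is an integer constant expression in C.  */
enum { V = 200 };

int t[V];

/* Reference V_CONST so the example does not itself trip the
   unused-constant warning under discussion.  */
int
table_matches_const (void)
{
  return (int) (sizeof t / sizeof t[0]) == V_CONST;
}
```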

Existing practice is certainly important, but I would say that what is
good practice is also very important.  It seems to me that warning for
these constants is basically making it hard to follow a better practice
than the existing one.  That seems pretty unfortunate.

On the other hand, I've become much more of a C++ programmer than a C one,
so I'm probably not the best judge.

Trev

> 
> 
> Bernd