Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.

2014-11-14 Thread Richard Henderson
On 11/13/2014 09:34 PM, Iain Sandoe wrote:
>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
>> target region that's at issue, not the size of the lock itself.
> 
> The algorithm I've used is intentionally different from the pthreads-based 
> posix one...

All that would be fine if ...

> +/* The granularity at which locks are applied when n > CACHLINE_SIZE.
> +   We follow the posix pthreads implementation here.  */
> +#ifndef WATCH_SIZE
> +#  define WATCH_SIZE CACHLINE_SIZE
> +#endif

... you hadn't just said right here that the granularity at which you
want to lock is WATCH_SIZE,

> +#define LOCK_SIZE sizeof(LOCK_TYPE)
> +#define NLOCKS (PAGE_SIZE / LOCK_SIZE)
> +/* An array of locks, that should fill one physical page.  */
> +static LOCK_TYPE locks[NLOCKS] __attribute__((aligned(PAGE_SIZE)));

... but then go and use LOCK_SIZE instead.


r~


Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 2/5] Instrument builtin calls

2014-11-14 Thread Ilya Enkovich
2014-11-14 9:49 GMT+03:00 Jeff Law :
> On 11/06/14 05:10, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch enables instrumentation of chosen builtin calls.
>>
>> Thanks,
>> Ilya
>> --
>> 2014-11-06  Ilya Enkovich  
>>
>> * ipa-chkp.c (chkp_versioning): Clone builtin functions.
>> (chkp_instrument_normal_builtin): New.
>> (chkp_add_bounds_to_call_stmt): Support builtin functions.
>> (chkp_replace_function_pointer): Likewise.
>>
>>
>>
>> diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
>> index df7d425..9e2efdb 100644
>> --- a/gcc/tree-chkp.c
>> +++ b/gcc/tree-chkp.c
>> @@ -1586,6 +1586,50 @@ chkp_find_bound_slots (const_tree type, bitmap res)
>> chkp_find_bound_slots_1 (type, res, 0);
>>   }
>>
>> +/* Return 1 if call to FNDECL should be instrumented
>> +   and 0 otherwise.  */
>> +
>> +static bool
>> +chkp_instrument_normal_builtin (tree fndecl)
>> +{
>> +  switch (DECL_FUNCTION_CODE (fndecl))
>> +{
>> +case BUILT_IN_STRLEN:
>> +case BUILT_IN_STRCPY:
>> +case BUILT_IN_STRNCPY:
>> +case BUILT_IN_STPCPY:
>> +case BUILT_IN_STPNCPY:
>> +case BUILT_IN_STRCAT:
>> +case BUILT_IN_STRNCAT:
>> +case BUILT_IN_MEMCPY:
>> +case BUILT_IN_MEMPCPY:
>> +case BUILT_IN_MEMSET:
>> +case BUILT_IN_MEMMOVE:
>> +case BUILT_IN_BZERO:
>> +case BUILT_IN_STRCMP:
>> +case BUILT_IN_STRNCMP:
>> +case BUILT_IN_BCMP:
>> +case BUILT_IN_MEMCMP:
>> +case BUILT_IN_MEMCPY_CHK:
>> +case BUILT_IN_MEMPCPY_CHK:
>> +case BUILT_IN_MEMMOVE_CHK:
>> +case BUILT_IN_MEMSET_CHK:
>> +case BUILT_IN_STRCPY_CHK:
>> +case BUILT_IN_STRNCPY_CHK:
>> +case BUILT_IN_STPCPY_CHK:
>> +case BUILT_IN_STPNCPY_CHK:
>> +case BUILT_IN_STRCAT_CHK:
>> +case BUILT_IN_STRNCAT_CHK:
>> +case BUILT_IN_MALLOC:
>> +case BUILT_IN_CALLOC:
>> +case BUILT_IN_REALLOC:
>> +  return 1;
>> +
>> +default:
>> +  return 0;
>> +}
>> +}
>
> OK, this gates creation of the additional builtin and ensures we don't try
> to create an instrumention clone for anything outside the list above.
>
>
>> @@ -1686,11 +1730,18 @@ chkp_add_bounds_to_call_stmt (gimple_stmt_iterator
>> *gsi)
>> if (!flag_chkp_instrument_calls)
>>   return;
>>
>> -  /* Avoid instrumented builtin functions for now.  Due to IPA
>> - it also means we have to avoid instrumentation of indirect
>> - calls.  */
>> -  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN)
>> -return;
>> +  /* We instrument only some subset of builtins.  We also instrument
>> + builtin calls to be inlined.  */
>> +  if (fndecl
>> +  && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
>> +  && !chkp_instrument_normal_builtin (fndecl))
>> +{
>> +  struct cgraph_node *clone = chkp_maybe_create_clone (fndecl);
>> +  if (!clone
>> + || !gimple_has_body_p (clone->decl)
>> + || !lookup_attribute ("always_inline", DECL_ATTRIBUTES
>> (fndecl)))
>> +   return;
>> +}
>
> Is that outer conditional right?  If we have a fndecl and it's a normal
> builtin, but it's _not_ one of hte builtins we're instrumenting, then call
> chkp_maybe_create_clone?

Some builtin functions (especially their *_chk version) are defined as
always_inline functions in headers.  In this case we handle them as
regular functions (clone and instrument) because they will be inlined
anyway. Seems gimple_has_body_p should be applied to fndecl and moved
into outer if-statement along with attribute check.  Thus unneeded
clones would be avoided.

Thanks,
Ilya

>
>
> Will reserve OK/Not OK decision until after you respond to that issue.
>
> jeff


[AArch64, Patch] Add range-check for Symbol + offset addressing.

2014-11-14 Thread Tejas Belagod


Hi,

Following the discussion here 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02237.html, this has been 
tracked down to a range-checking bug with symbol + offset style 
addressing with adrp where we allowed any arbitrary offset and this 
would cause truncation of final addresses when relocations were being 
resolved by ld.
When we retreive symbol + offset address, we have to make sure the 
offset does not cause overflow of the final address.  But we have no way 
of knowing the address of symbol at compile time
so we can't accurately say if the distance between the PC and symbol + 
offset is outside the addressible range of +/-1M in the TINY code model. 
 So we rely on images not being greater than 1M and cap the offset at 
1M and anything beyond 1M will have to be loaded using an alternate 
mechanism. Similarly for the SMALL code model the offset has been capped 
at 4G.


The cap value for the offset in each code model is open to debate.

All testing done with Alan's workaround 
patch(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01509.html) reversed.


bootstrapped aarch64-linux.

OK for trunk?

Thanks,
Tejas.

2014-11-14  Tejas Belagod  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_classify_symbol):
Fixup prototype.
* config/aarch64/aarch64.c (aarch64_expand_mov_immediate,
aarch64_cannot_force_const_mem, aarch64_classify_address,
aarch64_classify_symbolic_expression): Fixup call to
aarch64_classify_symbol.
(aarch64_classify_symbol): Add range-checking for
symbol + offset addressing for tiny and small models.

testsuite/
* gcc.target/aarch64/symbol-range.c: New.
* gcc.target/aarch64/symbol-range-tiny.c: New.diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 470b9eb..2cf4292 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -218,7 +218,7 @@ const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 const char *aarch64_rewrite_selected_cpu (const char *name);
 
-enum aarch64_symbol_type aarch64_classify_symbol (rtx,
+enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx,
  enum aarch64_symbol_context);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 736ad90..d6ecf6c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1072,7 +1072,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 before we start classifying the symbol.  */
   split_const (imm, &base, &offset);
 
-  sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR);
+  sty = aarch64_classify_symbol (base, offset, SYMBOL_CONTEXT_ADR);
   switch (sty)
{
case SYMBOL_FORCE_TO_MEM:
@@ -2963,7 +2963,7 @@ aarch64_cannot_force_const_mem (machine_mode mode 
ATTRIBUTE_UNUSED, rtx x)
   split_const (x, &base, &offset);
   if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF)
 {
-  if (aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)
+  if (aarch64_classify_symbol (base, offset, SYMBOL_CONTEXT_ADR)
  != SYMBOL_FORCE_TO_MEM)
return true;
   else
@@ -3377,7 +3377,7 @@ aarch64_classify_address (struct aarch64_address_info 
*info,
  rtx sym, offs;
  split_const (info->offset, &sym, &offs);
  if (GET_CODE (sym) == SYMBOL_REF
- && (aarch64_classify_symbol (sym, SYMBOL_CONTEXT_MEM)
+ && (aarch64_classify_symbol (sym, offs, SYMBOL_CONTEXT_MEM)
  == SYMBOL_SMALL_ABSOLUTE))
{
  /* The symbol and offset must be aligned to the access size.  */
@@ -3434,7 +3434,7 @@ aarch64_classify_symbolic_expression (rtx x,
   rtx offset;
 
   split_const (x, &x, &offset);
-  return aarch64_classify_symbol (x, context);
+  return aarch64_classify_symbol (x, offset, context);
 }
 
 
@@ -6594,7 +6594,7 @@ aarch64_classify_tls_symbol (rtx x)
LABEL_REF X in context CONTEXT.  */
 
 enum aarch64_symbol_type
-aarch64_classify_symbol (rtx x,
+aarch64_classify_symbol (rtx x, rtx offset,
 enum aarch64_symbol_context context ATTRIBUTE_UNUSED)
 {
   if (GET_CODE (x) == LABEL_REF)
@@ -6628,12 +6628,25 @@ aarch64_classify_symbol (rtx x,
   switch (aarch64_cmodel)
{
case AARCH64_CMODEL_TINY:
- if (SYMBOL_REF_WEAK (x))
+ /* When we retreive symbol + offset address, we have to make sure
+the offset does not cause overflow of the final address.  But
+we have no way of knowing the address of symbol at compile time
+so we can't accurately say if the distance between the PC and
+symbol + offset is outside the addressible range of +/-1M in the
+TINY code model.  So we rely on ima

Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.

2014-11-14 Thread Iain Sandoe
Hello Richard,

On 14 Nov 2014, at 08:01, Richard Henderson wrote:

> On 11/13/2014 09:34 PM, Iain Sandoe wrote:
>>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
>>> target region that's at issue, not the size of the lock itself.
>> 
>> The algorithm I've used is intentionally different from the pthreads-based 
>> posix one...
> 
> All that would be fine if ...
> 
>> +/* The granularity at which locks are applied when n > CACHLINE_SIZE.
>> +   We follow the posix pthreads implementation here.  */
>> +#ifndef WATCH_SIZE
>> +#  define WATCH_SIZE CACHLINE_SIZE
>> +#endif
> 
> ... you hadn't just said right here that the granularity at which you
> want to lock is WATCH_SIZE,

That granularity *is* applied to items >= on cache line in size.

>> +#define LOCK_SIZE sizeof(LOCK_TYPE)
>> +#define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>> +/* An array of locks, that should fill one physical page.  */
>> +static LOCK_TYPE locks[NLOCKS] __attribute__((aligned(PAGE_SIZE)));
> 
> ... but then go and use LOCK_SIZE instead.

my locks are only 4 bytes [whereas they are 
rounded-up-to-n-cachlines(sizeof(pthreads mutext)) for the posix 
implementation].
The items that they are locking are of arbitrary size (at least up to one page).

hmmm .. there's something I'm not following about what you are seeing as a 
problem here.

In the posix implementation the granularity calculation is also used to round 
up the space allocated in the locks table for each pthreads mutex (i.e. it has 
two uses, AFAICT).

thanks
Iain



Re: [AArch64, Patch] Add range-check for Symbol + offset addressing.

2014-11-14 Thread Andrew Pinski
On Fri, Nov 14, 2014 at 12:12 AM, Tejas Belagod  wrote:
>
> Hi,
>
> Following the discussion here
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02237.html, this has been
> tracked down to a range-checking bug with symbol + offset style addressing
> with adrp where we allowed any arbitrary offset and this would cause
> truncation of final addresses when relocations were being resolved by ld.
> When we retreive symbol + offset address, we have to make sure the offset
> does not cause overflow of the final address.  But we have no way of knowing
> the address of symbol at compile time
> so we can't accurately say if the distance between the PC and symbol +
> offset is outside the addressible range of +/-1M in the TINY code model.  So
> we rely on images not being greater than 1M and cap the offset at 1M and
> anything beyond 1M will have to be loaded using an alternate mechanism.
> Similarly for the SMALL code model the offset has been capped at 4G.
>
> The cap value for the offset in each code model is open to debate.
>
> All testing done with Alan's workaround
> patch(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01509.html) reversed.
>
> bootstrapped aarch64-linux.
>
> OK for trunk?

This looks like a better fix than I would have came up with.
Since you are touching this area you might want to look into this issue:
I notice SYMBOL_REF_WEAK (x) is true for references(decls) which are
comdat's which are declared in the translation unit.  So force them to
memory when really we know they are declared and don't have a value of
zero so they will fit in the medium code model.  This happens with
vtables and we lose some performance because of this.

Thanks,
Andrew Pinski


>
> Thanks,
> Tejas.
>
> 2014-11-14  Tejas Belagod  
>
> gcc/
> * config/aarch64/aarch64-protos.h (aarch64_classify_symbol):
> Fixup prototype.
> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate,
> aarch64_cannot_force_const_mem, aarch64_classify_address,
> aarch64_classify_symbolic_expression): Fixup call to
> aarch64_classify_symbol.
> (aarch64_classify_symbol): Add range-checking for
> symbol + offset addressing for tiny and small models.
>
> testsuite/
> * gcc.target/aarch64/symbol-range.c: New.
> * gcc.target/aarch64/symbol-range-tiny.c: New.


Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 1/5] Builtin codes and decls

2014-11-14 Thread Ilya Enkovich
2014-11-14 9:43 GMT+03:00 Jeff Law :
> On 11/06/14 04:48, Ilya Enkovich wrote:
>>
>> --
>> 2014-11-06  Ilya Enkovich  
>>
>> * tree-core.h (built_in_class): Add builtin codes to be used
>> by Pointer Bounds Checker for instrumented builtin functions.
>> * tree-streamer-in.c: Include ipa-chkp.h.
>> (streamer_get_builtin_tree): Create instrumented decl if
>> required.
>> * ipa-chkp.h (chkp_maybe_clone_builtin_fndecl): New.
>> * ipa-chkp.c (chkp_build_instrumented_fndecl): Support builtin
>> function decls.
>> (chkp_maybe_clone_builtin_fndecl): New.
>> (chkp_maybe_create_clone): Support builtin function decls.
>
> Looks much better than prior versions.
>>
>>
>>
>> @@ -355,6 +365,30 @@ chkp_add_bounds_params_to_function (tree fndecl)
>>   chkp_copy_function_type_adding_bounds (TREE_TYPE (fndecl));
>>   }
>>
>> +tree
>> +chkp_maybe_clone_builtin_fndecl (tree fndecl)
>
> Need a function comment here.
>
>
>>   /* Return clone created for instrumentation of NODE or NULL.  */
>>
>>   cgraph_node *
>> @@ -365,6 +399,52 @@ chkp_maybe_create_clone (tree fndecl)
>>
>> gcc_assert (!node->instrumentation_clone);
>>
>> +  if (DECL_BUILT_IN (fndecl)
>> +  && (DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL
>> + || DECL_FUNCTION_CODE (fndecl) >= BEGIN_CHKP_BUILTINS))
>> +return NULL;
>
> Just so I'm sure, the only way to get into chkp_maybe_clone_builtin_decl is
> if this test is false.  Right?
>
> Can we ultimately end up with checking clones for any normal builtin? What
> filters out builtins that don't need a checking variant?

As you already noticed the filter is in the second patch.

>
>
>> +
>> +  clone = node->instrumented_version;
>> +
>> +  /* For builtin functions we may loose and recreate
>> + cgraph node.  We should check if we already have
>> + instrumented version.  */
>
> Can you describe to me under what circumstances this happens?  It seems like
> we may be papering over an issue that would be better fixed elsewhere.

I don't think I'm hiding some problem here.  Builtin function calls
may be removed during various optimizations.  Therefore we may remove
all calls to some instrumented builtins and corresponding cgraph_node
is removed as unreachable (but fndecl still exists).  Later calls to
removed function may be created again.  IIRC in my test case it
happened in strlen pass which may replace builtin calls with another
ones.  In this case cgraph_node is recreated but fndecl recreation
should be avoided, existing one should be used instead.

Thanks,
Ilya

>
>
>> @@ -409,6 +489,15 @@ chkp_maybe_create_clone (tree fndecl)
>>  actually copies args list from the original decl.  */
>> chkp_add_bounds_params_to_function (new_decl);
>>
>> +  /* Remember builtin fndecl.  */
>> +  if (DECL_BUILT_IN_CLASS (clone->decl) == BUILT_IN_NORMAL
>> + && fndecl == builtin_decl_explicit (DECL_FUNCTION_CODE
>> (fndecl)))
>> +   {
>> + gcc_assert (!builtin_decl_explicit (DECL_FUNCTION_CODE
>> (clone->decl)));
>> + set_builtin_decl (DECL_FUNCTION_CODE (clone->decl),
>> +   clone->decl, false);
>> +   }
>
> I'm not a big fan of slamming in a new DECL like this, but it may be OK.
> I'm not going to object to that now, but I worry about downstream impacts.
>
>
> Tentatively OK after adding the missing function comment.  Please wait for
> entire kit to be approved before committing anything.  I may come back to
> something as I dig deeper into the other patches in the series.
>
> jeff


Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.

2014-11-14 Thread Richard Henderson
On 11/14/2014 09:12 AM, Iain Sandoe wrote:
> my locks are only 4 bytes [whereas they are 
> rounded-up-to-n-cachlines(sizeof(pthreads mutext)) for the posix 
> implementation].
> The items that they are locking are of arbitrary size (at least up to one 
> page).
> 
> hmmm .. there's something I'm not following about what you are seeing as a 
> problem here.
> 
> In the posix implementation the granularity calculation is also used to
> round up the space allocated in the locks table for each pthreads mutex
> (i.e. it has two uses, AFAICT).

No, there's only one use: How large an area is *protected* by the lock.

Since we need to protect one page of these areas, we need NLOCKS = PAGE_SIZE /
WATCH_SIZE locks, which are then allocated in an array.  We do not care how
large that array is.

So if you'd like to differ from the posix implementation in protecting
4 bytes at a time, rather than one cacheline at a time, then just change
WATCH_SIZE to 4.  The fact that WATCH_SIZE happens to equal to the lock size is
simply a coincidence.


r~


The nvptx port

2014-11-14 Thread Jakub Jelinek
Hi!

I have some questions about nvptx:
1) you've said that alloca isn't supported, but it seems
   to be wired up and uses the %alloca documented in the PTX
   manual, what is the issue with that?  %alloca not being actually
   implemented by the current PTX assembler or translator?  Or
   some local vs. global address space issues?  If the latter,
   could at least VLAs be supported?
2) what is the reason why TLS isn't supported by the port (well,
   __emutls is emitted, but I doubt pthread_[gs]etspecific is
   implementable and thus it will not really do anything.
   Can't the port just emit all DECL_THREAD_LOCAL_P variables
   into .local instead of .global address space?  Would one
   need to convert those pointers to generic any way?
   I'm asking because e.g. libgomp uses __thread heavily and
   it would be nice to be able to use that.
3) in assembly emitted by the nvptx port, I've noticed:
.visible .func (.param.u32 %out_retval)foo(.param.u64 %in_ar1, .param.u32 
%in_ar2)
{
.reg.u64 %ar1;
.reg.u32 %ar2;
.reg.u32 %retval;
.reg.u64 %hr10;
.reg.u32 %r22;
.reg.u64 %r25;
   is the missing \t before the %retval line intentional?
4) I had a brief look at what it would take to port libgomp to PTX,
   which is needed for OpenMP offloading.  OpenMP offloaded kernels
   should start with 1 team and 1 thread in it, if we ignore
   GOMP_teams for now, I think the major things are:
   - right now libgomp is heavily pthread_* based, which is a no-go
 for nvptx I assume, I think we'll need some ifdefs in the sources
   - the main thing is that I believe we just have to replace
 gomp_team_start for nvptx; seems there are
 cudaLaunchDevice (and cudaGetParameterBuffer) functions one can use
 to spawn selected kernel in selected number of threads (and teams),
 from the docs it isn't exactly clear what the calling thread will do,
 if it is suspended and the HW core given to it is reused by something
 else (e.g. one of the newly spawned threads), then I think it should
 be usable.  Not sure what happens with .local memory of the parent
 task, if the children all have different .local memory, then
 perhaps one could just copy over what is needed from the
 invoking to the first invoked thread at start.  The question is
 how to figure out what to pass to cudeLaunchDevice (e.g. how to
 get handle of the current stream), and how to query how many
 teams and/or threads it is reasonable to ask for if the program
 wants defaults (and how many teams/threads are hard limits beyond which
 one can't go)
   - is it worth to reuse cudaLaunchDevice "threads" or are they cheap
 enough to start that any "thread" pooling should be removed for nvptx?
   - we'll need some synchronization primitives, I see atomic support is
 there, we need mutexes and semaphores I think, is that implementable
 using bar instruction?
   - the library uses __attribute__((constructor)) in 3 places or so,
 initialize_team is pthread specific and can be probably ifdefed out,
 we won't support dlclose in nvptx anyway, but at least we need some
 way to initialize the nvptx libgomp; if the initialization is done
 in global memory, would it persist in between different kernels,
 so can the initialization as separate kernel be run once, something
 else?
   - is there any way to do any affinity management, or shall we just
 ignore affinity strategies?
   - the target/offloading stuff should be most likely stubbed in the
 library for nvptx, target data/target regions inside of target
 regions are undefined behavior in OpenMP, no need to bloat things
   - any way how to query time?
   Other thoughts?

Jakub


Re: [patch] OpenACC fortran front end

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 08:56:53AM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 13 Nov 2014 17:44:40 -0800, Cesar Philippidis 
>  wrote:
> > On 11/13/2014 08:43 AM, Jakub Jelinek wrote:
> > > Can you please avoid the TODOs in the source?  If it is not the right
> > > thing, either do something better, or file a PR to schedule such work for
> > > the future.
> 
> Should we use the existing openmp keyword for this,
> , or get a new openacc
> keyword added?

Please add openacc.

Jakub


[PATCH] Merge predicate iteration from match-and-simplify

2014-11-14 Thread Richard Biener

This merges a genmatch IL feature from the branch.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-11-14  Richard Biener  

* genmatch.c (add_operator): Allow CONSTRUCTOR.
(dt_node::gen_kids): Handle CONSTRUCTOR not as GENERIC.
(parser::parse_op): Allow to iterate over predicates.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 217504)
+++ gcc/genmatch.c  (working copy)
@@ -310,7 +310,9 @@ add_operator (enum tree_code code, const
   /* For {REAL,IMAG}PART_EXPR and VIEW_CONVERT_EXPR.  */
   && strcmp (tcc, "tcc_reference") != 0
   /* To have INTEGER_CST and friends as "predicate operators".  */
-  && strcmp (tcc, "tcc_constant") != 0)
+  && strcmp (tcc, "tcc_constant") != 0
+  /* And allow CONSTRUCTOR for vector initializers.  */
+  && !(code == CONSTRUCTOR))
 return;
   operator_id *op = new operator_id (code, id, nargs, tcc);
   id_base **slot = operators->find_slot_with_hash (op, op->hashval, INSERT);
@@ -2013,7 +2015,8 @@ dt_node::gen_kids (FILE *f, bool gimple)
  dt_operand *op = as_a (kids[i]);
  if (expr *e = dyn_cast  (op->op))
{
- if (e->ops.length () == 0)
+ if (e->ops.length () == 0
+ && (!gimple || !(*e->operation == CONSTRUCTOR)))
generic_exprs.safe_push (op);
  else if (e->operation->kind == id_base::FN)
{
@@ -3026,6 +3029,14 @@ parser::parse_op ()
{
  if (code->nargs != 0)
fatal_at (token, "using an operator with operands as 
predicate");
+ /* Parse the zero-operand operator "predicates" as
+expression.  */
+ op = new expr (opr);
+   }
+ else if (user_id *code = dyn_cast  (opr))
+   {
+ if (code->nargs != 0)
+   fatal_at (token, "using an operator with operands as 
predicate");
  /* Parse the zero-operand operator "predicates" as
 expression.  */
  op = new expr (opr);


Re: [PATCH] -fsanitize=unreachable overhaul (PR sanitizer/63839)

2014-11-14 Thread Richard Biener
On Thu, 13 Nov 2014, Marek Polacek wrote:

> As Richi pointed in the pr audit trail, instrumenting via folding is
> bad.  In this case we changed __builtin_unreachable, created by the
> inliner, into BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE which requires
> VOPS, which is a no-no in folding.  So this patch:
> - marks BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE as const to match
>   BUILT_IN_UNREACHABLE,
> - moves the __builtin_unreachable instrumentation into sanopt pass,
> - disables optimize_unreachable when doing the __builtin_unreachable
>   instrumentation,
> - marks BUILT_IN_UNREACHABLE as "cold" (I don't see how this could
>   make a difference?).  Now, BUILT_IN_TRAP should probably be also
>   marked as const+cold; I'm happy to do that as a follow-up.

It's not necessary to mark any of them as cold as seen from predict.c:

  if (is_gimple_call (stmt))
{
  if ((gimple_call_flags (stmt) & ECF_NORETURN)
  && has_return_edges)
predict_paths_leading_to (bb, PRED_NORETURN,
  NOT_TAKEN);
  decl = gimple_call_fndecl (stmt);
  if (decl
  && lookup_attribute ("cold",
   DECL_ATTRIBUTES (decl)))
predict_paths_leading_to (bb, PRED_COLD_FUNCTION,
  NOT_TAKEN);

so anything with a noreturn attribute behaves exactly the same as cold.

So please leave existing non-cold things as non-cold.

> Bootstrapped/regtested on power8-linux, ok for trunk?

Ok with that change.

Thanks,
Richard.

> 2014-11-13  Marek Polacek  
> 
>   PR sanitizer/63839
>   * asan.c (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST,
>   ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST): Define.
>   * builtin-attrs.def (ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST):
>   Define.
>   * builtins.c (fold_builtin_0): Don't include ubsan.h.  Don't
>   instrument BUILT_IN_UNREACHABLE here.
>   * builtins.def (BUILT_IN_UNREACHABLE): Make cold.
>   * sanitizer.def (BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE): Make
>   const.
>   * sanopt.c (pass_sanopt::execute): Instrument BUILT_IN_UNREACHABLE.
>   * tree-ssa-ccp.c (optimize_unreachable): Bail out if
>   SANITIZE_UNREACHABLE.
>   * ubsan.c (ubsan_instrument_unreachable): Rewrite for GIMPLE.
>   * ubsan.h (ubsan_instrument_unreachable): Adjust declaration.
> testsuite/
>   * c-c++-common/ubsan/pr63839.c: New test.
>   * c-c++-common/ubsan/unreachable-2.c: New test.
> 
> diff --git gcc/asan.c gcc/asan.c
> index 79dede7..2961b44 100644
> --- gcc/asan.c
> +++ gcc/asan.c
> @@ -2346,6 +2346,9 @@ initialize_sanitizer_builtins (void)
>  #define ATTR_TMPURE_NOTHROW_LEAF_LIST ECF_TM_PURE | ATTR_NOTHROW_LEAF_LIST
>  #undef ATTR_NORETURN_NOTHROW_LEAF_LIST
>  #define ATTR_NORETURN_NOTHROW_LEAF_LIST ECF_NORETURN | ATTR_NOTHROW_LEAF_LIST
> +#undef ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST
> +#define ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST \
> +  ECF_CONST | ATTR_NORETURN_NOTHROW_LEAF_LIST
>  #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
>  #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
>ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
> @@ -2355,6 +2358,9 @@ initialize_sanitizer_builtins (void)
>  #undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST
>  #define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \
>/* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST
> +#undef ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST
> +#define ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST \
> +  /* ECF_COLD missing */ ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST
>  #undef DEF_SANITIZER_BUILTIN
>  #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,
> \
> diff --git gcc/builtin-attrs.def gcc/builtin-attrs.def
> index 9c05a94..c707367 100644
> --- gcc/builtin-attrs.def
> +++ gcc/builtin-attrs.def
> @@ -145,6 +145,8 @@ DEF_ATTR_TREE_LIST (ATTR_SENTINEL_NOTHROW_LIST, 
> ATTR_SENTINEL,\
>   ATTR_NULL, ATTR_NOTHROW_LIST)
>  DEF_ATTR_TREE_LIST (ATTR_SENTINEL_NOTHROW_LEAF_LIST, ATTR_SENTINEL,  \
>   ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
> +DEF_ATTR_TREE_LIST (ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
> + ATTR_NULL, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
>  
>  /* Functions whose pointer parameter(s) are all nonnull.  */
>  DEF_ATTR_TREE_LIST (ATTR_NONNULL_LIST, ATTR_NONNULL, ATTR_NULL, ATTR_NULL)
> diff --git gcc/builtins.c gcc/builtins.c
> index 1cd65ed..311c0e3 100644
> --- gcc/builtins.c
> +++ gcc/builtins.c
> @@ -64,7 +64,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "diagnostic-core.h"
>  #include "builtins.h"
>  #include "asan.h"
> -#include "ubsan.h"
>  #include "cilk.h"
>  #include "ipa-ref.h"
>  #include "lto-streamer.h"
> @@ -9803,14 +9802,6 @@ fold_builtin_0 (location_t loc, tree fndecl, bool 
> ignore A

Re: [PATCH] Remove doubled ECF_LEAF

2014-11-14 Thread Richard Biener
On Thu, 13 Nov 2014, Marek Polacek wrote:

> Bootstrapped/regtested on power8-linux, ok for trunk?

Looks obvious ;)

Thanks,
Richard.

> 2014-11-13  Marek Polacek  
> 
>   * tree.c (build_common_builtin_nodes): Remove doubled ECF_LEAF.
> 
> diff --git gcc/tree.c gcc/tree.c
> index cf37a19..5c6fe0b 100644
> --- gcc/tree.c
> +++ gcc/tree.c
> @@ -9935,7 +9935,7 @@ build_common_builtin_nodes (void)
>local_define_builtin ("__builtin_unreachable", ftype, 
> BUILT_IN_UNREACHABLE,
>   "__builtin_unreachable",
>   ECF_NOTHROW | ECF_LEAF | ECF_NORETURN
> - | ECF_CONST | ECF_LEAF);
> + | ECF_CONST);
>  }
>  
>if (!builtin_decl_explicit_p (BUILT_IN_MEMCPY)
> 
>   Marek
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.

2014-11-14 Thread Iain Sandoe

On 14 Nov 2014, at 08:25, Richard Henderson wrote:

> On 11/14/2014 09:12 AM, Iain Sandoe wrote:
>> my locks are only 4 bytes [whereas they are 
>> rounded-up-to-n-cachlines(sizeof(pthreads mutext)) for the posix 
>> implementation].
>> The items that they are locking are of arbitrary size (at least up to one 
>> page).
>> 
>> hmmm .. there's something I'm not following about what you are seeing as a 
>> problem here.
>> 
>> In the posix implementation the granularity calculation is also used to
>> round up the space allocated in the locks table for each pthreads mutex
>> (i.e. it has two uses, AFAICT).
> 
> No, there's only one use: How large an area is *protected* by the lock.
> 
> Since we need to protect one page of these areas, we need NLOCKS = PAGE_SIZE /
> WATCH_SIZE locks, which are then allocated in an array.  We do not care how
> large that array is.
> 
> So if you'd like to differ from the posix implementation in protecting
> 4 bytes at a time, rather than one cacheline at a time, then just change
> WATCH_SIZE to 4.  The fact that WATCH_SIZE happens to equal to the lock size 
> is
> simply a coincidence.

Indeed (the use to round up the space allocated for the mutex also happens to 
be another co-incidence)

However, my intention is to have a variable-sized area protected by each locks.
The nummber of locks allocated exceeds (page-size/watch-size) [unless 
watch-sizes was reduced to 4bytes, of course]

Only when the size of the area to be protected exceeds one cache-line do I 
split it up into cache-line-sized chunks.

I happened to allocate one-page-worth of locks (as a somewhat arbitrary choice 
in the absence of metrics to guide otherwise) - which is another source of 
co-incidence.

Perhaps some re-naming of things would help, or do you think that a scheme to 
lock variable-sized chunks cannot work?

Iain

Re: LTO streaming of TARGET_OPTIMIZE_NODE

2014-11-14 Thread Richard Biener
On Fri, 14 Nov 2014, Jan Hubicka wrote:

> Hi,
> here is upated version with bitfields and also tested on PPC64-linux/aix.
> I hacked configury to use system awk instead of gawk, so the changes are 
> hopefully safe.
> 
> OK?

Ok.

Thanks,
Richard.

> Honza
> 
>   * optc-save-gen.awk: Output cl_target_option_eq,
>   cl_target_option_hash, cl_target_option_stream_out,
>   cl_target_option_stream_in functions.
>   * opth-gen.awk: Output prototypes for
>   cl_target_option_eq and cl_target_option_hash.
>   * lto-streamer.h (cl_target_option_stream_out,
>   cl_target_option_stream_in): Declare.
>   * tree.c (cl_option_hash_hash): Use cl_target_option_hash.
>   (cl_option_hash_eq): Use cl_target_option_eq.
>   * tree-streamer-in.c (unpack_value_fields): Stream in
>   TREE_TARGET_OPTION.
>   * lto-streamer-out.c (DFS::DFS_write_tree_body): Follow
>   DECL_FUNCTION_SPECIFIC_TARGET.
>   (hash_tree): Hash TREE_TARGET_OPTION; visit
>   DECL_FUNCTION_SPECIFIC_TARGET.
>   * tree-streamer-out.c (streamer_pack_tree_bitfields): Skip
>   TS_TARGET_OPTION.
>   (streamer_write_tree_body): Output TS_TARGET_OPTION.
> 
>   * lto.c (compare_tree_sccs_1): Compare cl_target_option_eq.
> Index: lto/lto.c
> ===
> --- lto/lto.c (revision 217513)
> +++ lto/lto.c (working copy)
> @@ -1377,7 +1377,8 @@
>return false;
>  
>if (CODE_CONTAINS_STRUCT (code, TS_TARGET_OPTION))
> -gcc_unreachable ();
> +if (!cl_target_option_eq (TREE_TARGET_OPTION (t1), TREE_TARGET_OPTION 
> (t2)))
> +  return false;
>  
>if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
>  if (memcmp (TREE_OPTIMIZATION (t1), TREE_OPTIMIZATION (t2),
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 217513)
> +++ lto-streamer-out.c(working copy)
> @@ -594,7 +594,7 @@
>  {
>DFS_follow_tree_edge (DECL_VINDEX (expr));
>DFS_follow_tree_edge (DECL_FUNCTION_PERSONALITY (expr));
> -  /* Do not DECL_FUNCTION_SPECIFIC_TARGET.  They will be regenerated.  */
> +  DFS_follow_tree_edge (DECL_FUNCTION_SPECIFIC_TARGET (expr));
>DFS_follow_tree_edge (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr));
>  }
>  
> @@ -945,7 +945,7 @@
>   strlen (TRANSLATION_UNIT_LANGUAGE (t)));
>  
>if (CODE_CONTAINS_STRUCT (code, TS_TARGET_OPTION))
> -gcc_unreachable ();
> +hstate.add_wide_int (cl_target_option_hash (TREE_TARGET_OPTION (t)));
>  
>if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
>  hstate.add (t, sizeof (struct cl_optimization));
> @@ -1028,7 +1028,7 @@
>  {
>visit (DECL_VINDEX (t));
>visit (DECL_FUNCTION_PERSONALITY (t));
> -  /* Do not follow DECL_FUNCTION_SPECIFIC_TARGET.  */
> +  visit (DECL_FUNCTION_SPECIFIC_TARGET (t));
>visit (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (t));
>  }
>  
> Index: lto-streamer.h
> ===
> --- lto-streamer.h(revision 217513)
> +++ lto-streamer.h(working copy)
> @@ -836,7 +836,15 @@
>  lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
>  void select_what_to_stream (bool);
>  
> +/* In options-save.c.  */
> +void cl_target_option_stream_out (struct output_block *, struct bitpack_d *,
> +   struct cl_target_option *);
>  
> +void cl_target_option_stream_in (struct data_in *,
> +  struct bitpack_d *,
> +  struct cl_target_option *);
> +
> +
>  /* In lto-symtab.c.  */
>  extern void lto_symtab_merge_decls (void);
>  extern void lto_symtab_merge_symbols (void);
> Index: optc-save-gen.awk
> ===
> --- optc-save-gen.awk (revision 217513)
> +++ optc-save-gen.awk (working copy)
> @@ -39,6 +39,18 @@
>  print ""
>  print "#include " quote "flags.h" quote
>  print "#include " quote "target.h" quote
> +print "#include " quote "inchash.h" quote
> +print "#include " quote "tree.h" quote
> +print "#include " quote "tree-ssa-alias.h" quote
> +print "#include " quote "is-a.h" quote
> +print "#include " quote "predict.h" quote
> +print "#include " quote "function.h" quote
> +print "#include " quote "basic-block.h" quote
> +print "#include " quote "gimple-expr.h" quote
> +print "#include " quote "gimple.h" quote
> +print "#include " quote "data-streamer.h" quote
> +print "#include " quote "ipa-ref.h" quote
> +print "#include " quote "cgraph.h" quote
>  print ""
>  
>  if (n_extra_c_includes > 0) {
> @@ -417,4 +429,126 @@
>  
>  print "}";
>  
> +print "";
> +print "/* Compare two target options  */";
> +print "bool";
> +print "cl_target_option_eq (struct cl_target_option const *ptr1 
> ATTRIBUTE_UNUSED,";
> +print " struct cl_target_option const *ptr

Re: [AArch64, Patch] Add range-check for Symbol + offset addressing.

2014-11-14 Thread Tejas Belagod

On 14/11/14 08:19, Andrew Pinski wrote:

On Fri, Nov 14, 2014 at 12:12 AM, Tejas Belagod  wrote:


Hi,

Following the discussion here
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02237.html, this has been
tracked down to a range-checking bug with symbol + offset style addressing
with adrp where we allowed any arbitrary offset and this would cause
truncation of final addresses when relocations were being resolved by ld.
When we retreive symbol + offset address, we have to make sure the offset
does not cause overflow of the final address.  But we have no way of knowing
the address of symbol at compile time
so we can't accurately say if the distance between the PC and symbol +
offset is outside the addressible range of +/-1M in the TINY code model.  So
we rely on images not being greater than 1M and cap the offset at 1M and
anything beyond 1M will have to be loaded using an alternate mechanism.
Similarly for the SMALL code model the offset has been capped at 4G.

The cap value for the offset in each code model is open to debate.

All testing done with Alan's workaround
patch(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01509.html) reversed.

bootstrapped aarch64-linux.

OK for trunk?


This looks like a better fix than I would have came up with.
Since you are touching this area you might want to look into this issue:
I notice SYMBOL_REF_WEAK (x) is true for references(decls) which are
comdat's which are declared in the translation unit.  So force them to
memory when really we know they are declared and don't have a value of
zero so they will fit in the medium code model.  This happens with
vtables and we lose some performance because of this.



Do you have a bugzilla ticket open for this?

Thanks,
Tejas.




Re: [Ping] [PATCH, 1/10] two hooks for conditional compare (ccmp)

2014-11-14 Thread Marcus Shawcroft
On 29 October 2014 10:28, Zhenqiang Chen  wrote:
>
>
>> -Original Message-
>> From: Richard Henderson [mailto:r...@redhat.com]
>> Sent: Monday, October 27, 2014 10:56 PM
>> To: Zhenqiang Chen
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [Ping] [PATCH, 1/10] two hooks for conditional compare (ccmp)
>>
>> On 10/27/2014 12:47 AM, Zhenqiang Chen wrote:
>> > + @var{NULL} if the combination of @var{prev} and this comparison is\n\
>>
>> @code{NULL}
>
> Thanks! Patch is updated.
>
> -Zhenqiang


Zhenqiang,  This patch series now has explicit OK from Richard for all
the generic and aarch64 specific parts.  You should go ahead and
commit.
Cheers
/Marcus


Re: [AArch64, Patch] Add range-check for Symbol + offset addressing.

2014-11-14 Thread Marcus Shawcroft
On 14 November 2014 08:19, Andrew Pinski  wrote:
> On Fri, Nov 14, 2014 at 12:12 AM, Tejas Belagod  wrote:
>>
>> Hi,
>>
>> Following the discussion here
>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02237.html, this has been
>> tracked down to a range-checking bug with symbol + offset style addressing
>> with adrp where we allowed any arbitrary offset and this would cause
>> truncation of final addresses when relocations were being resolved by ld.
>> When we retreive symbol + offset address, we have to make sure the offset
>> does not cause overflow of the final address.  But we have no way of knowing
>> the address of symbol at compile time
>> so we can't accurately say if the distance between the PC and symbol +
>> offset is outside the addressible range of +/-1M in the TINY code model.  So
>> we rely on images not being greater than 1M and cap the offset at 1M and
>> anything beyond 1M will have to be loaded using an alternate mechanism.
>> Similarly for the SMALL code model the offset has been capped at 4G.
>>
>> The cap value for the offset in each code model is open to debate.
>>
>> All testing done with Alan's workaround
>> patch(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01509.html) reversed.
>>
>> bootstrapped aarch64-linux.
>>
>> OK for trunk?
>
> This looks like a better fix than I would have came up with.
> Since you are touching this area you might want to look into this issue:
> I notice SYMBOL_REF_WEAK (x) is true for references(decls) which are
> comdat's which are declared in the translation unit.  So force them to
> memory when really we know they are declared and don't have a value of
> zero so they will fit in the medium code model.  This happens with
> vtables and we lose some performance because of this.

Andrew, do you mind if we take that as a separate patch, I'd like to
take Tejas' patch sooner rather than later since it gates building a
variety of stuff some folks care about.
Cheers
/Marcus


Re: [AArch64, Patch] Add range-check for Symbol + offset addressing.

2014-11-14 Thread pinskia




> On Nov 14, 2014, at 12:54 AM, Marcus Shawcroft  
> wrote:
> 
>> On 14 November 2014 08:19, Andrew Pinski  wrote:
>>> On Fri, Nov 14, 2014 at 12:12 AM, Tejas Belagod  
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> Following the discussion here
>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02237.html, this has been
>>> tracked down to a range-checking bug with symbol + offset style addressing
>>> with adrp where we allowed any arbitrary offset and this would cause
>>> truncation of final addresses when relocations were being resolved by ld.
>>> When we retreive symbol + offset address, we have to make sure the offset
>>> does not cause overflow of the final address.  But we have no way of knowing
>>> the address of symbol at compile time
>>> so we can't accurately say if the distance between the PC and symbol +
>>> offset is outside the addressible range of +/-1M in the TINY code model.  So
>>> we rely on images not being greater than 1M and cap the offset at 1M and
>>> anything beyond 1M will have to be loaded using an alternate mechanism.
>>> Similarly for the SMALL code model the offset has been capped at 4G.
>>> 
>>> The cap value for the offset in each code model is open to debate.
>>> 
>>> All testing done with Alan's workaround
>>> patch(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01509.html) reversed.
>>> 
>>> bootstrapped aarch64-linux.
>>> 
>>> OK for trunk?
>> 
>> This looks like a better fix than I would have came up with.
>> Since you are touching this area you might want to look into this issue:
>> I notice SYMBOL_REF_WEAK (x) is true for references(decls) which are
>> comdat's which are declared in the translation unit.  So force them to
>> memory when really we know they are declared and don't have a value of
>> zero so they will fit in the medium code model.  This happens with
>> vtables and we lose some performance because of this.
> 
> Andrew, do you mind if we take that as a separate patch, I'd like to
> take Tejas' patch sooner rather than later since it gates building a
> variety of stuff some folks care about.

Yes that is ok.  This was more of since you were looking into this area kind of 
thing but it can wait until later. 

Thanks,
Andrew

> Cheers
> /Marcus


Re: [PATCH 1/3] [AARCH64] Add macro fusion support for cmp/b.X for ThunderX

2014-11-14 Thread Kyrill Tkachov

Hi Andrew,

On 14/11/14 00:56, Andrew Pinski wrote:

In ThunderX, any 1 cycle arthemantic instruction that produces the flags
register, will be fused with a branch.  This patch depends on
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01508.html.
Note I know bit 1 is going is already going to be used and that is why I
proposed this being bit 2.

Build and tested for aarch64-elf with no regressions.

ChangeLog:
* config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
(thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
(aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.
---
  gcc/config/aarch64/aarch64.c |   15 ++-
  1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a258f40..5216ac0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -304,6 +304,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
  
  #define AARCH64_FUSE_NOTHING	(0)

  #define AARCH64_FUSE_MOV_MOVK (1 << 0)
+#define AARCH64_FUSE_CMP_BRANCH(1 << 2)
  
  #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007

  __extension__
@@ -349,7 +350,7 @@ static const struct tune_params thunderx_tunings =
&generic_vector_cost,
NAMED_PARAM (memmov_cost, 6),
NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
  };
  
  /* A processor implementing AArch64.  */

@@ -10036,6 +10037,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
  }
  }
  
+  if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH)

+  && any_condjump_p (curr))
+{
+  /* FIXME: this misses some which is considered simple arthematic
+ instructions for ThunderX.  Simple shifts are missed here.  */

s/is/are


+  if (get_attr_type (prev) == TYPE_ALUS_SREG
+  || get_attr_type (prev) == TYPE_ALUS_IMM
+  || get_attr_type (prev) == TYPE_LOGICS_REG
+  || get_attr_type (prev) == TYPE_LOGICS_IMM)
+   return true;


IIRC the get_attr_* functions can call recog_memoized on prev which can 
potentially change
the recog_data for the insn, sometimes resulting in corruption. Is this 
definitely safe to do?


Kyrill


+}
+
return false;
  }
  





Re: [Patch AArch64] Fix PR 63724 - Improve immediate generation

2014-11-14 Thread Marcus Shawcroft
On 12 November 2014 16:46, Ramana Radhakrishnan
 wrote:

> v2 , based on Richard's suggestion as well as fixing a bug that I hit in
> some more testing at O1. aarch64_internal_mov_immediate should not generate
> a temporary for subtarget when not actually "generating" code.
>
> Tested again on aarch64-none-elf and with a bootstrap / reg test. Ok ?
>
>
>   Ramana Radhakrishnan  
>
> PR target/63724
> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Split out
>numerical immediate handling to...
> (aarch64_internal_mov_immediate): ...this. New.
> (aarch64_rtx_costs): Use aarch64_internal_mov_immediate.
> (aarch64_mov_operand_p): Relax predicate.
> * config/aarch64/aarch64.md (mov:GPI): Do not expand
> CONST_INTs.
> (*movsi_aarch64): Turn into define_insn_and_split and new
> alternative
> for 'n'.
> (*movdi_aarch64): Likewise.


OK.

We should be able to remove aarch64_build_constant now.

/Marcus


Re: [patch] OpenACC fortran front end

2014-11-14 Thread Thomas Schwinge
Hi!

On Fri, 14 Nov 2014 09:33:13 +0100, Jakub Jelinek  wrote:
> On Fri, Nov 14, 2014 at 08:56:53AM +0100, Thomas Schwinge wrote:
> > On Thu, 13 Nov 2014 17:44:40 -0800, Cesar Philippidis 
> >  wrote:
> > > On 11/13/2014 08:43 AM, Jakub Jelinek wrote:
> > > > Can you please avoid the TODOs in the source?  If it is not the right
> > > > thing, either do something better, or file a PR to schedule such work 
> > > > for
> > > > the future.
> > 
> > Should we use the existing openmp keyword for this,
> > , or get a new openacc
> > keyword added?
> 
> Please add openacc.

Turns out that I could do that myself --
 created.


Grüße,
 Thomas


pgptnEphu_gNv.pgp
Description: PGP signature


[PATCH][21/n] Merge from match-and-simplify, more binary patterns

2014-11-14 Thread Richard Biener

The following patch merges binary patterns from the branch that
are exercised by fold_stmt and gimple_fold_stmt_to_constant in the
process of removing the dispatch of fold_binary from those.

I filed PR63862 for a C frontend bug which I noticed because of
fixing a fold bug which happily looked through sign-changing
conversions when asking tree_expr_nonnegative_p for the shift
amount.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2014-11-14  Richard Biener  

* match.pd: Implement more binary patterns exercised by
fold_stmt.
* fold-const.c (sing_bit_p): Export.
(exact_inverse): Likewise.
(fold_binary_loc): Remove patterns here.
(tree_unary_nonnegative_warnv_p): Use CASE_CONVERT.
* fold-const.h (sing_bit_p): Declare.
(exact_inverse): Likewise.

* gcc.c-torture/execute/shiftopt-1.c: XFAIL invalid parts.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c.orig   2014-11-14 10:16:47.845424237 +0100
--- gcc/fold-const.c2014-11-14 10:17:45.005421735 +0100
*** static tree decode_field_reference (loca
*** 130,136 
HOST_WIDE_INT *,
machine_mode *, int *, int *,
tree *, tree *);
- static tree sign_bit_p (tree, const_tree);
  static int simple_operand_p (const_tree);
  static bool simple_operand_p_2 (tree);
  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
--- 130,135 
*** all_ones_mask_p (const_tree mask, unsign
*** 3651,3657 
 The return value is the (sub)expression whose sign bit is VAL,
 or NULL_TREE otherwise.  */
  
! static tree
  sign_bit_p (tree exp, const_tree val)
  {
int width;
--- 3650,3656 
 The return value is the (sub)expression whose sign bit is VAL,
 or NULL_TREE otherwise.  */
  
! tree
  sign_bit_p (tree exp, const_tree val)
  {
int width;
*** fold_addr_of_array_ref_difference (locat
*** 9474,9480 
  /* If the real or vector real constant CST of type TYPE has an exact
 inverse, return it, else return NULL.  */
  
! static tree
  exact_inverse (tree type, tree cst)
  {
REAL_VALUE_TYPE r;
--- 9473,9479 
  /* If the real or vector real constant CST of type TYPE has an exact
 inverse, return it, else return NULL.  */
  
! tree
  exact_inverse (tree type, tree cst)
  {
REAL_VALUE_TYPE r;
*** fold_binary_loc (location_t loc,
*** 9963,9987 
}
else
{
- /* See if ARG1 is zero and X + ARG1 reduces to X.  */
- if (fold_real_zero_addition_p (TREE_TYPE (arg0), arg1, 0))
-   return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0));
- 
- /* Likewise if the operands are reversed.  */
- if (fold_real_zero_addition_p (TREE_TYPE (arg1), arg0, 0))
-   return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1));
- 
- /* Convert X + -C into X - C.  */
- if (TREE_CODE (arg1) == REAL_CST
- && REAL_VALUE_NEGATIVE (TREE_REAL_CST (arg1)))
-   {
- tem = fold_negate_const (arg1, type);
- if (!TREE_OVERFLOW (arg1) || !flag_trapping_math)
-   return fold_build2_loc (loc, MINUS_EXPR, type,
-   fold_convert_loc (loc, type, arg0),
-   fold_convert_loc (loc, type, tem));
-   }
- 
  /* Fold __complex__ ( x, 0 ) + __complex__ ( 0, y )
 to __complex__ ( x, y ).  This is not the same for SNaNs or
 if signed zeros are involved.  */
--- 9962,9967 
*** fold_binary_loc (location_t loc,
*** 10023,10034 
  && (tem = distribute_real_division (loc, code, type, arg0, arg1)))
return tem;
  
- /* Convert x+x into x*2.0.  */
- if (operand_equal_p (arg0, arg1, 0)
- && SCALAR_FLOAT_TYPE_P (type))
-   return fold_build2_loc (loc, MULT_EXPR, type, arg0,
-   build_real (type, dconst2));
- 
/* Convert a + (b*c + d*e) into (a + b*c) + d*e.
   We associate floats only if the user has specified
   -fassociative-math.  */
--- 10003,10008 
*** fold_binary_loc (location_t loc,
*** 10381,10389 
  
if (! FLOAT_TYPE_P (type))
{
- if (integer_zerop (arg0))
-   return negate_expr (fold_convert_loc (loc, type, arg1));
- 
  /* Fold A - (A & B) into ~B & A.  */
  if (!TREE_SIDE_EFFECTS (arg0)
  && TREE_CODE (arg1) == BIT_AND_EXPR)
--- 10355,10360 
*** fold_binary_loc (location_t loc,
*** 10428,10443 
}
}
  
-   /* See if ARG1 is zero and X - ARG1 reduces to X.  */
-   else if (fold_real_zero_addition_p (TREE_TYPE (arg0), arg1, 1)

Re: [Committed] Fix bug 61997

2014-11-14 Thread Marcus Shawcroft
On 11 November 2014 23:39, Andrew Pinski  wrote:
> Hi,
>   The problem here is that aarch64-builtins.c contains gty markers but
> does not include gt-aarch64-builtins.h and is not included in the
> target_gtfiles list in config.gcc.  So sometimes the builtins get
> garbage collected when they should not be.
>
> Committed as obvious after a build and test on aarch64-elf.

Thankyou.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> Bug target/61997
> * config.gcc (aarch64*-*-*): Set target_gtfiles to include
> aarch64-builtins.c.
> * config/aarch64/aarch64-builtins.c: Include gt-aarch64-builtins.h
> at the end of the file.


Looks like this is an issue in 4.9 Andrew do you have the time to back port it?
/Marcus


Re: [patch, aarch64] additional bics patterns

2014-11-14 Thread Richard Earnshaw
On 13/11/14 17:42, Sandra Loosemore wrote:
> On 11/13/2014 10:27 AM, Richard Earnshaw wrote:
>> On 13/11/14 17:05, Ramana Radhakrishnan wrote:
>>> On Thu, Nov 13, 2014 at 4:55 PM, Sandra Loosemore
>>>  wrote:
 This patch to the AArch64 back end adds a couple of additional bics 
 patterns
 to match code of the form

if ((x & y) == x) ...;

 This is testing whether the bits set in x are a subset of the bits set in 
 y;
 or, that no bits in x are set that are not set in y.  So, it is equivalent
 to

if ((x & ~y) == 0) ...;

 Presently this generates code like
and x21, x21, x20
cmp x21, x20
b.eqc0 

 and this patch allows it to be written more concisely as:
bics x21, x20, x21
b.eq c0 

 Since the bics instruction sets the condition codes itself, no explicit
 comparison is required and the result of the bics computation can be
 discarded.

 Regression-tested on aarch64-linux-gnu.  OK to commit?
>>>
>>> Is this not a duplicate of
>>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00943.html ?
>>>
>>>
>> I don't think so.  However, I think it is something that should be
>> caught in generic simplification code
>>
>> ie map  ((a & b) == b) ==> ((~a & b) == 0), etc
>>
>> Bit-clear operations are not that uncommon.  Furthermore, A may be a
>> constant.
> 
> Alex posted his patch when I already had Chris's in my regression test 
> queue, but I've just confirmed that it does not fix the test case I 
> included.

Alex's patch is adding a proper pattern for the BICS instruction.  I
wouldn't expect it to directly achieve what you are trying to do - but I
do think the compiler should transform what you have into the form that
Alex has just added the patterns for.

> 
> I already thought a little about making this a generic simplification, 
> but it seemed to me like it was only useful on targets that have a 
> bit-clear instruction that happens to set condition codes, and that it 
> would pessimize code on targets that don't have a bit-clear instruction 
> at all (by inserting the extra complement operation).  So to me it 
> seemed reasonable to do it in the back end.

I doubt that.  I'd be surprised if any target had a direct ((A & B) ==
A) type comparison, so all you're doing is canonicalizing something that
no target is likely to have into something some targets have.  And
ulitimately, if the pattern doesn't match, combine will just undo all
the changes.  Furthermore, as I previously said, if B is a constant then
you're going to get a real optimization for that case that is widely
applicable.  Eg

A & ~1 == A

will simplify into

A & 1 == 0

Many targets will have a flag-setting AND operation.

R.



Re: [patch, aarch64] additional bics patterns

2014-11-14 Thread Richard Earnshaw
On 13/11/14 22:44, Sandra Loosemore wrote:
> On 11/13/2014 10:47 AM, Andrew Pinski wrote:
>> On Thu, Nov 13, 2014 at 9:42 AM, Sandra Loosemore
>>  wrote:
>>> On 11/13/2014 10:27 AM, Richard Earnshaw wrote:

 On 13/11/14 17:05, Ramana Radhakrishnan wrote:
>
> On Thu, Nov 13, 2014 at 4:55 PM, Sandra Loosemore
>  wrote:
>>
>> This patch to the AArch64 back end adds a couple of additional bics
>> patterns
>> to match code of the form
>>
>> if ((x & y) == x) ...;
>>
>> This is testing whether the bits set in x are a subset of the bits set
>> in y;
>> or, that no bits in x are set that are not set in y.  So, it is
>> equivalent
>> to
>>
>> if ((x & ~y) == 0) ...;
>>
>> Presently this generates code like
>> and x21, x21, x20
>> cmp x21, x20
>> b.eqc0 
>>
>> and this patch allows it to be written more concisely as:
>> bics x21, x20, x21
>> b.eq c0 
>>
>> Since the bics instruction sets the condition codes itself, no explicit
>> comparison is required and the result of the bics computation can be
>> discarded.
>>
>> Regression-tested on aarch64-linux-gnu.  OK to commit?
>
>
> Is this not a duplicate of
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00943.html ?
>
>
 I don't think so.  However, I think it is something that should be
 caught in generic simplification code

 ie map  ((a & b) == b) ==> ((~a & b) == 0), etc

 Bit-clear operations are not that uncommon.  Furthermore, A may be a
 constant.
>>>
>>>
>>> Alex posted his patch when I already had Chris's in my regression test
>>> queue, but I've just confirmed that it does not fix the test case I
>>> included.
>>>
>>> I already thought a little about making this a generic simplification, but
>>> it seemed to me like it was only useful on targets that have a bit-clear
>>> instruction that happens to set condition codes, and that it would pessimize
>>> code on targets that don't have a bit-clear instruction at all (by inserting
>>> the extra complement operation).  So to me it seemed reasonable to do it in
>>> the back end.
>>
>> But can't you do this in simplify-rtx.c and allow for the cost model
>> to do the correct thing?
> 
> I could give that a shot, but it seems unlikely that I will be able to 
> complete the patch rewrite and testing before we are in Stage 3.  If I 
> have something ready next week, will it be too late for consideration in 
> GCC 5?
> 

You're following up on a previously submitted patch and you're not
rewriting large chunks of the compiler.  I don't think this is really
that complicated so I'd be surprised if anyone strongly objected because
it wasn't in final form before the end of stage1.

I would recommend you try to get it at least re-submitted before the end
of the year, though.

R.




Re: [PATCH 1/3] [AARCH64] Add macro fusion support for cmp/b.X for ThunderX

2014-11-14 Thread Andrew Pinski
On Fri, Nov 14, 2014 at 1:08 AM, Kyrill Tkachov  wrote:
> Hi Andrew,
>
>
> On 14/11/14 00:56, Andrew Pinski wrote:
>>
>> In ThunderX, any 1 cycle arthemantic instruction that produces the flags
>> register, will be fused with a branch.  This patch depends on
>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01508.html.
>> Note I know bit 1 is going is already going to be used and that is why I
>> proposed this being bit 2.
>>
>> Build and tested for aarch64-elf with no regressions.
>>
>> ChangeLog:
>> * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
>> (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
>> (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.
>> ---
>>   gcc/config/aarch64/aarch64.c |   15 ++-
>>   1 files changed, 14 insertions(+), 1 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index a258f40..5216ac0 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -304,6 +304,7 @@ static const struct cpu_vector_cost
>> cortexa57_vector_cost =
>> #define AARCH64_FUSE_NOTHING(0)
>>   #define AARCH64_FUSE_MOV_MOVK (1 << 0)
>> +#define AARCH64_FUSE_CMP_BRANCH(1 << 2)
>> #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
>>   __extension__
>> @@ -349,7 +350,7 @@ static const struct tune_params thunderx_tunings =
>> &generic_vector_cost,
>> NAMED_PARAM (memmov_cost, 6),
>> NAMED_PARAM (issue_rate, 2),
>> -  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
>> +  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
>>   };
>> /* A processor implementing AArch64.  */
>> @@ -10036,6 +10037,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev,
>> rtx_insn *curr)
>>   }
>>   }
>>   +  if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH)
>> +  && any_condjump_p (curr))
>> +{
>> +  /* FIXME: this misses some which is considered simple arthematic
>> + instructions for ThunderX.  Simple shifts are missed here.  */
>
> s/is/are
>
>> +  if (get_attr_type (prev) == TYPE_ALUS_SREG
>> +  || get_attr_type (prev) == TYPE_ALUS_IMM
>> +  || get_attr_type (prev) == TYPE_LOGICS_REG
>> +  || get_attr_type (prev) == TYPE_LOGICS_IMM)
>> +   return true;
>
>
> IIRC the get_attr_* functions can call recog_memoized on prev which can
> potentially change
> the recog_data for the insn, sometimes resulting in corruption. Is this
> definitely safe to do?

Safe in this context, yes.  I used the similar pattern as what is done for x86:
In the sched-deps.c before calling this function we have the following
(if before reload):
  extract_insn (insn);

extract_insn already will call recog_memoized.

Thanks,
Andrew

>
> Kyrill
>
>> +}
>> +
>> return false;
>>   }
>>
>
>
>


Re: The nvptx port

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 09:29:48AM +0100, Jakub Jelinek wrote:
> I have some questions about nvptx:

Oh, and
5) I have noticed gcc doesn't generate the .uni suffixes anywhere,
   while llvm generates them; are those appropriate only when a function
   is guaranteed to be run unconditionally from the toplevel kernel,
   or even in spots in arbitrary functions which might not be run
   unconditionally by all threads in thread block, but all threads
   that encounter the particular function will run the specific spot
   unconditionally?  I mean, if we have arbitrary function:
void foo (void) { something; bar (); something; }
   then the call is unconditional in there, but there is no guarantee
   somebody will not do
void baz (int x) { if (x > 20) foo (); }
   and run foo only in a subset of the threads.

Jakub


[Patch, ARM]Fix pattern that is missed for Thumb-1 UAL

2014-11-14 Thread Terry Guo
Hi there,

Attached patch intends to fix a pattern that is found still non-UAL when do
gcc thumb-1 bootstrap. A test case is reduced and attached. Tested with gcc
regression test on pre-v6 thumb1 and v6 thumb1. No regression. Multilib can
be built for both of them.
Is it OK to trunk?

BR,
Terry

gcc/ChangeLog:
2014-11-14  Terry Guo  

 * config/arm/thumb1.md (*addsi3_cbranch_scratch): Updated to UAL
format.

gcc/testsuite/ChangeLog:
2014-11-14  Terry Guo  

 * gcc.target/arm/thumb1-ual-1.c: New test.diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 3d6f80b..ddedc39 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -1420,13 +1420,13 @@
 if (INTVAL (operands[2]) < 0)
   output_asm_insn (\"subs\t%0, %1, %2\", operands);
 else
-  output_asm_insn (\"add\t%0, %1, %2\", operands);
+  output_asm_insn (\"adds\t%0, %1, %2\", operands);
 break;
case 3:
 if (INTVAL (operands[2]) < 0)
   output_asm_insn (\"subs\t%0, %0, %2\", operands);
 else
-  output_asm_insn (\"add\t%0, %0, %2\", operands);
+  output_asm_insn (\"adds\t%0, %0, %2\", operands);
 break;
}
 
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-ual-1.c 
b/gcc/testsuite/gcc.target/arm/thumb1-ual-1.c
new file mode 100644
index 000..a2e439c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-ual-1.c
@@ -0,0 +1,87 @@
+/* Test Thumb1 insn pattern addsi3_cbranch_scratch.  */
+/* { dg-options "-O2" } */
+/* { dg-skip-if "" { ! { arm_thumb1 } } } */
+
+struct real_value {
+
+  unsigned int cl : 2;
+  unsigned int decimal : 1;
+  unsigned int sign : 1;
+  unsigned int signalling : 1;
+  unsigned int canonical : 1;
+  unsigned int uexp : (32 - 6);
+  unsigned long sig[((128 + (8 * 4)) / (8 * 4))];
+};
+
+enum real_value_class {
+  rvc_zero,
+  rvc_normal,
+  rvc_inf,
+  rvc_nan
+};
+
+extern void exit(int);
+extern int foo(long long *, int, int);
+
+int
+real_to_integer (const struct real_value *r, int *fail, int precision)
+{
+  long long val[2 * (((64*(8)) + 64) / 64)];
+  int exp;
+  int words, w;
+  int result;
+
+  switch (r->cl)
+{
+case rvc_zero:
+underflow:
+  return 100;
+
+case rvc_inf:
+case rvc_nan:
+overflow:
+  *fail = 1;
+
+  if (r->sign)
+ return 200;
+  else
+ return 300;
+
+case rvc_normal:
+  if (r->decimal)
+ return 400;
+
+  exp = ((int)((r)->uexp ^ (unsigned int)(1 << ((32 - 6) - 1))) - (1 << 
((32 - 6) - 1)));
+  if (exp <= 0)
+ goto underflow;
+
+
+  if (exp > precision)
+ goto overflow;
+  words = (precision + 64 - 1) / 64;
+  w = words * 64;
+  for (int i = 0; i < words; i++)
+ {
+   int j = ((128 + (8 * 4)) / (8 * 4)) - (words * 2) + (i * 2);
+   if (j < 0)
+ val[i] = 0;
+   else
+ val[i] = r->sig[j];
+   j += 1;
+   if (j >= 0)
+ val[i] |= (unsigned long long) r->sig[j] << (8 * 4);
+ }
+
+
+  result = foo(val, words, w);
+
+  if (r->sign)
+ return -result;
+  else
+ return result;
+
+default:
+  exit(2);
+}
+}
+


Re: [PATCH] plugin event for C/C++ function definitions

2014-11-14 Thread Richard Biener
On Thu, Nov 13, 2014 at 6:42 PM, Andres Tiraboschi
 wrote:
> Hi, this patch adds a new plugin event PLUGIN_START_FUNCTION and 
> PLUGIN_FINISH_FUNCTION that are invoked at start_function and finish_function 
> respectively in the C and C++ frontends.
> PLUGIN_START_FUNCTION is called before parsing the function body.
> PLUGIN_FINISH_FUNCTION is called after parsing a function definition.

Can you name them more specifically, like
PLUGIN_START/FINISH_PARSE_FUNCTION please?

Thanks,
Richard.

> 2014-11-04 Andrés Tiraboschi 
>
> changelog:
>
>  gcc/c/c-decl.c: Invoke callbacks in start_function and finish_function.
>  gcc/cp/decl.c: Invoke callbacks in start_function and finish_function.
>
>  gcc/doc/plugins.texi: Add documentation about PLUGIN_START_FUNCTION and 
> PLUGIN_FINISH_FUNCTION
>  gcc/plugin.def: Add events for start_function and finish_function.
>  gcc/plugin.c (register_callback, invoke_plugin_callbacks): Same.
>
>  gcc/testsuite/g++.dg/plugin/def_plugin.c: New test plugin.
>  gcc/testsuite/g++.dg/plugin/def-plugin-test.C: Testcase for above plugin.
>  gcc/testsuite/g++.dg/plugin/plugin.exp
>
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index e23284a..b349a24 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -8073,6 +8073,7 @@ start_function (struct c_declspecs *declspecs, struct 
> c_declarator *declarator,
>
>decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, true, NULL,
>   &attributes, NULL, NULL, DEPRECATED_NORMAL);
> +  invoke_plugin_callbacks (PLUGIN_START_FUNCTION, decl1);
>
>/* If the declarator is not suitable for a function definition,
>   cause a syntax error.  */
> @@ -8886,6 +8887,7 @@ finish_function (void)
>   It's still in DECL_STRUCT_FUNCTION, and we'll restore it in
>   tree_rest_of_compilation.  */
>set_cfun (NULL);
> +  invoke_plugin_callbacks (PLUGIN_FINISH_FUNCTION, current_function_decl);
>current_function_decl = NULL;
>  }
>
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index d4adbeb..a8c6ebe 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -13631,6 +13631,7 @@ start_function (cp_decl_specifier_seq *declspecs,
>tree decl1;
>
>decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, &attrs);
> +  invoke_plugin_callbacks (PLUGIN_START_FUNCTION, decl1);
>if (decl1 == error_mark_node)
>  return false;
>/* If the declarator is not suitable for a function definition,
> @@ -14260,6 +14261,7 @@ finish_function (int flags)
>vec_free (deferred_mark_used_calls);
>  }
>
> +  invoke_plugin_callbacks (PLUGIN_FINISH_FUNCTION, fndecl);
>return fndecl;
>  }
>
> diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi
> index 4a839b8..b4a20e1 100644
> --- a/gcc/doc/plugins.texi
> +++ b/gcc/doc/plugins.texi
> @@ -174,6 +174,8 @@ Callbacks can be invoked at the following pre-determined 
> events:
>  @smallexample
>  enum plugin_event
>  @{
> +  PLUGIN_START_FUNCTION /* Called before parsing the body of a 
> function. */
> +  PLUGIN_FINISH_FUNCTION/* After finishing parsing a function. */
>PLUGIN_PASS_MANAGER_SETUP,/* To hook into pass manager.  */
>PLUGIN_FINISH_TYPE,   /* After finishing parsing a type.  */
>PLUGIN_FINISH_DECL,   /* After finishing parsing a declaration. */
> diff --git a/gcc/plugin.c b/gcc/plugin.c
> index 8debc09..c048d93 100644
> --- a/gcc/plugin.c
> +++ b/gcc/plugin.c
> @@ -433,6 +433,8 @@ register_callback (const char *plugin_name,
> return;
>   }
>/* Fall through.  */
> +  case PLUGIN_START_FUNCTION:
> +  case PLUGIN_FINISH_FUNCTION:
>case PLUGIN_FINISH_TYPE:
>case PLUGIN_FINISH_DECL:
>case PLUGIN_START_UNIT:
> @@ -511,6 +513,8 @@ invoke_plugin_callbacks_full (int event, void *gcc_data)
> gcc_assert (event >= PLUGIN_EVENT_FIRST_DYNAMIC);
> gcc_assert (event < event_last);
>/* Fall through.  */
> +  case PLUGIN_START_FUNCTION:
> +  case PLUGIN_FINISH_FUNCTION:
>case PLUGIN_FINISH_TYPE:
>case PLUGIN_FINISH_DECL:
>case PLUGIN_START_UNIT:
> diff --git a/gcc/plugin.def b/gcc/plugin.def
> index df5d383..a4131ad 100644
> --- a/gcc/plugin.def
> +++ b/gcc/plugin.def
> @@ -17,6 +17,11 @@ You should have received a copy of the GNU General Public 
> License
>  along with GCC; see the file COPYING3.  If not see
>  .  */
>
> +/* Called before parsing the body of a function.  */
> +DEFEVENT (PLUGIN_START_FUNCTION)
> +
> +/* After finishing parsing a function definition. */
> +DEFEVENT (PLUGIN_FINISH_FUNCTION)
>
>  /* To hook into pass manager.  */
>  DEFEVENT (PLUGIN_PASS_MANAGER_SETUP)
> diff --git a/gcc/testsuite/g++.dg/plugin/def-plugin-test.C 
> b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
> new file mode 100644
> index 000..b7f2d3d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
> @@ -0,0 +1,13 @@
> +int global = 12;
> +
> +int function1(void);
>

Re: [PATCH 2/4] New data structure for cgraph_summary introduced.

2014-11-14 Thread Richard Biener
On Thu, Nov 13, 2014 at 4:50 PM, Jan Hubicka  wrote:
>> gcc/ChangeLog:
>>
>> 2014-11-12  Martin Liska  
>>
>>   * Makefile.in: New object file is added.
>>   * cgraph.h (symbol_table::allocate_cgraph_symbol): Summary UID
>>   is filled up.
>>   * cgraph_summary.c: New file.
>>   * cgraph_summary.h: New file.
>
> Since I am trying to get rid of the cgraph prefixes for symbols (keep it for
> the graph only) and the summaries can be annotated to variables too. Even if 
> it
> not necessarily supported by your current implementation, lets keep API
> prepared for it. So I would call it symtab-summary.* for source files and
> symtab_summary for base type  (probably function_summary for annotating
> functions/cgraph_edge_summary for annotating edges?)

Also please don't use underscores in filenames but dashes, thus
cgraph-summary.[ch], not cgraph_summary.[ch].

Richard.


>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>> index e2becb9..588b6d5 100644
>> --- a/gcc/cgraph.h
>> +++ b/gcc/cgraph.h
>> @@ -1225,6 +1225,8 @@ public:
>>int count_materialization_scale;
>>/* Unique id of the node.  */
>>int uid;
>> +  /* Summary unique id of the node.  */
>> +  int summary_uid;
>
> What makes summary_uid better than uid?
>
>> diff --git a/gcc/cgraph_summary.c b/gcc/cgraph_summary.c
>> new file mode 100644
>> index 000..9af1d7e
>> --- /dev/null
>> +++ b/gcc/cgraph_summary.c
>
> And why do we need this file?  It will need license header if really needed.
>
> The implementation seems sane - I will check the actual uses :)
> Please send the updated patch though.
>
> Honza


Re: PATCH: PR bootstrap/63784: [5 Regression] profiledbootstrap failure with bootstrap-lto

2014-11-14 Thread Richard Biener
On Fri, Nov 14, 2014 at 12:15 AM, H.J. Lu  wrote:
> On Tue, Nov 11, 2014 at 8:02 AM, H.J. Lu  wrote:
>> On Mon, Nov 10, 2014 at 11:42 AM, H.J. Lu  wrote:
>>> On Mon, Nov 10, 2014 at 5:44 AM, Richard Biener
>>>  wrote:
 On Mon, Nov 10, 2014 at 2:43 PM, Jakub Jelinek  wrote:
> On Mon, Nov 10, 2014 at 05:32:32AM -0800, H.J. Lu wrote:
>> On Mon, Nov 10, 2014 at 4:05 AM, Jakub Jelinek  wrote:
>> > On Mon, Nov 10, 2014 at 12:50:44PM +0100, Richard Biener wrote:
>> >> On Sun, Nov 9, 2014 at 5:46 PM, H.J. Lu  wrote:
>> >> > Hi,
>> >> >
>> >> > r216964 disables bootstrap for libcc1 which exposed 2 things:
>> >> >
>> >> > 1. libcc1 isn't compiled with LTO even when GCC is configured with
>> >> > "--with-build-config=bootstrap-lto".  It may be intentional since
>> >> > libcc1 is disabled for bootstrap.
>> >> > 2. -fPIC isn't used to created libcc1.so, which is OK if libcc1 is
>> >> > compiled with LTO which remembers PIC option.
>> >>
>> >> Why is this any special to LTO?  If it is then it looks like a LTO
>> >> (driver) issue to me?  Why are we linking the pic libibterty into
>> >> a non-pic libcc1?
>> >
>> > I admit I haven't tried LTO bootstrap, but from normal bootstrap logs,
>> > libcc1 is built normally using libtool using -fPIC only, and linked 
>> > into
>> > libcc1.so.0.0.0 and libcc1plugin.so.0.0.0, and of course against the
>> > pic/libiberty.a, because we need PIC code in the shared libraries.
>> > So, I don't understand the change at all.
>> >
>> > Jakub
>>
>> This is the command line to build libcc1.la:
>
> Sure, but there was -fPIC used to compile all the *.o files that are being
> linked into libcc1.so, so LTO should know that.

 And it does.  If not please file a bug with a smaller testcase than libcc1
 and libiberty.

>>>
>>> There is nothing wrong with linker.  It is a slm-lto bug in libtool.  I 
>>> uploaded
>>> a testcase at
>>>
>>> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33931
>>>
>>
>> My patch is a backport of libtool LTO support:
>>
>> commit b81fd4ef009c24a86a7e64727ea09efb410ea149
>> Author: Ralf Wildenhues 
>> Date:   Sun Aug 29 17:31:29 2010 +0200
>>
>> Support GCC LTO on GNU/Linux.
>>
>> * libltdl/config/ltmain.m4sh (func_mode_link): Allow through
>> flags matching -O*, -flto*, -fwhopr, -fuse-linker-plugin.
>> * libltdl/m4/libtool.m4 (_LT_CMD_GLOBAL_SYMBOLS): Drop symbols
>> starting with __gnu_lto.
>> (_LT_LINKER_SHLIBS) [linux] :
>> Add $pic_flag for GCC.
>> (_LT_LANG_CXX_CONFIG) [linux] :
>> Likewise.
>> (_LT_SYS_HIDDEN_LIBDEPS): Ignore files matching *.lto.o.
>> * NEWS: Update.
>>
>> Signed-off-by: Ralf Wildenhues 
>>
>> OK to install?
>>
>
> Ping.
>
> Stage 1 will be closed tomorrow.  I'd like to restore LTO bootstrap.

Bugfixing is still possible after that date.  I suppose you don't call
LTO bootstrap a new feature ;)

Richard.

>
> --
> H.J.


"openacc" Bugzilla keyword (was: [patch] OpenACC fortran front end)

2014-11-14 Thread Thomas Schwinge
Hi!

On Fri, 14 Nov 2014 10:21:47 +0100, I wrote:
>  created.

Committed to gomp-4_0-branch in r217549:

commit 4a4f1ca48781ac08d2d51ba1a08210d8c8ca7528
Author: tschwinge 
Date:   Fri Nov 14 10:19:53 2014 +

libgomp documentation, Reporting Bug: Mention the "openacc" Bugzilla 
keyword.

libgomp/
* libgomp.texi (Reporting Bugs): Mention the "openacc" Bugzilla
keyword.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217549 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 5 +
 libgomp/libgomp.texi   | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index a5a58a0..492393b 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-11-14  Thomas Schwinge  
+
+   * libgomp.texi (Reporting Bugs): Mention the "openacc" Bugzilla
+   keyword.
+
 2014-11-13  Thomas Schwinge  
 
* testsuite/libgomp.oacc-c-c++-common/context-2.c: Fix data
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 26c65a6..6c2673b 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -2686,8 +2686,9 @@ becomes
 @chapter Reporting Bugs
 
 Bugs in the GNU OpenACC or OpenMP implementation should be reported via
-@uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  For OpenMP cases, please add
-"openmp" to the keywords field in the bug report.
+@uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  As appropriate, please
+add "openacc", or "openmp", or both to the keywords field in the bug
+report.
 
 
 


Grüße,
 Thomas


pgpBd1QzTGJdS.pgp
Description: PGP signature


Re: [PATCH, Libatomic, Darwin] Initial libatomic port for *darwin*.

2014-11-14 Thread Richard Henderson
On 11/14/2014 09:44 AM, Iain Sandoe wrote:
> or do you think that a scheme to lock variable-sized chunks cannot work?

It certainly can't work while

> +void
> +libat_lock_1 (void *ptr)
> +{
> +  LockLock (&locks[addr_hash (ptr, 1)]);
> +}

doesn't have the true size.


r~


Re: bitmap fix for current

2014-11-14 Thread Richard Biener
On Fri, Nov 14, 2014 at 2:10 AM, Jeff Law  wrote:
> On 11/13/14 12:37, Mike Stump wrote:
>>
>> I was doing a merge, and it failed to even compile the runtime
>> libraries due to checking in bitmap.  bitmap goes to remove set bits
>> from the bitmap (the second hunk in a two hunk set), and it fails to
>> update the current pointer.  That memory is freed and then
>> reallocated and a new index is put into it, and then we fail a
>> consistency check later on due to the mismatch between head->index
>> and head->current->indx, because current was not properly maintained.
>> This patch removes the old value of current when we remove what it
>> points to from the bitmap.
>
> Was the calling code iterating through the bit with a form like
>
> EXECUTE_IF_SET_IN_BITMAP (something, 0, i, bi)
>  {
>bitmap_clear_bit (something, i)
>[ ... whatever code we want to process i, ... ]
>  }
>
> If so, that's the real issue and we'd really like to identify & fix any code
> that has that kind of structure.
>
> See:
>
> https://gcc.gnu.org/ml/gcc/2009-06/msg00482.html

Indeed.  I can't see how this can have triggered:

  prev = elt->prev;
  if (prev)
{
  prev->next = NULL;
  if (head->current->indx > prev->indx)
{
  head->current = prev;
  head->indx = prev->indx;

so if there was elt->prev then if current == elt current->indx should
better be > prev->indx.

Sth else must be wrong (and I doubt it's the above bogus use of
bitmaps).

Richard.

> jeff


Re: [PATCH] Add a way to mark regions of code which assume that the GC won't run

2014-11-14 Thread Richard Biener
On Fri, Nov 14, 2014 at 2:19 AM, Jeff Law  wrote:
> On 11/13/14 08:28, David Malcolm wrote:
>
 It was pointed out to me on IRC that I could instead use RAII for this,
 so here's an alternative version of the patch that puts it in a class,
 so that you can put:

 auto_assert_no_gc no_gc_here;

 into a scope to get the assertion failure if someone uses ggc_collect
 somewhere inside.
>>>
>>>
>>> I think rather than "assert-no-gc-here" its name should reflect that
>>> the caller wants to protect a region from GC (just as if we had
>>> a thread-safe collector).  Thus better name it 'protect_gc'?
>>> I'd add explicit protect_gc () / unprotect_gc () calls to the RAII
>>> interface as well - see TODO_do_not_ggc_collect which is probably
>>> hard to reflect with RAII (the TODO prevents a collection between
>>> the current and the next pass).
>>
>>
>> Thanks.
>>
>> Here's an updated patch that adds protect_gc / unprotect_gc inline fns
>> to ggc.h, and renames the RAII class to "auto_protect_gc", calling them.
>
> RAII is good.
>
>
>>
>> My original intention here was an assertion i.e. something that merely
>> adds checking to a non-release build, which is what the patch does - I
>> use it to mark a routine in the JIT that is written with the assumption
>> that nothing in it can lead to a gcc_collect call (and for which
>> currently it can't).
>>
>> However, "protect" to me suggests that this could instead affect the
>> behavior of ggc_collect, making it immediately return, rather than
>> merely being an assertion that it wasn't called.
>>
>> That approach would make ggc_collect safe in such a region, rather than
>> the attached patch's approach of making it be an assertion failure
>> (though either approach is better than the status quo of having
>> unpredictable heap corruption if somehow there is a ggc_collect call in
>> such a region).
>>
>> Is this OK for trunk as-is (assuming usual testing), or would you prefer
>> the "ggc_collect bails out if we're in a protected region" behavior?
>> (in which case the ENABLE_CHECKING bits of it needs to go away - we
>> don't want differences between a release vs checked build, especially in
>> GC, right?).
>
> I'd tend to want an assert so that any such code could be identified rather
> than allowing folks to be lazy.

Well - we do have that scary TODO_do_not_ggc_collect.  It would be
nice to be able to call protect_gc () from ira.c and unprotect_gc ()
from the reload/lra pass and get rid of that TODO.

And yes, a ggc_collect () should be a no-op inside such region (maybe
set a flag, do_ggc_collect_at_unprotect_gc and do what it suggests?)

[un]protect_gc () should also nest properly.

> As for whether or not to change behaviour of the GC system in release vs
> checked builds -- I don't think it's a big deal, at least not in this case.
> If folks think it's a big deal, then just remove the ENABLE_CHECKING bits --
> they're so little overhead compared to the actual GC system that I wouldn't
> worry about them at all.
>
> I think the patch is fine as-is.

So yes, the patchis fine as-is but I'd like to see the above done - removal
of the TODO and making the thing really protect stuff.  Incrementally
I'd like to identify passes that are safe GC-wise, make the collector
thread-safe and push collection to a thread.

Thanks,
Richard.

> jeff


Re: OpenACC middle end changes

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 11:28:15AM +0100, Richard Biener wrote:
> > This patch is based on the last merge of trunk into gomp-4_0-branch,
> > 9be82689 (trunk r216846, 2014-10-29), and still includes an old version
> > of the offloading patches, as currently present on gomp-4_0-branch.
> > We're already working on rebasing onto the set of offloading patches that
> > has just been committed to trunk, but I didn't want to have this delay
> > any further (it seems, the rebase/merge is not always trivial) the


> >   * ChangeLog snippets still need to be written.
> 
> Badly needed - I wonder why you need changes to LTO files at all.

I think he doesn't, but the LTO changes that were committed to trunk by
Intel haven't been integrated yet into the branch AFAIK; at least
I've skipped all those bits I expect to be in already.  See above
Thomas' comment.

Jakub


[RFC: AArch64] Parametrically set defaults for function and jump alignment

2014-11-14 Thread James Greenhalgh

Hi,

We currently do not set any interesting default values for jump and function
alignment in AArch64. I've made the formula for these values derive from
the issue rate of the processor as so:

  jumps: 4 * processor issue-rate (rounded down to nearest power of two)
  functions: 4 * processor issue-rate (rounded up to nearest power of two)

This is sensible for the ARMv8-a implementations I tested on. An
alternative patch would make these values new fields in the tuning
tables.

This happens to work well for some benchmarks and doesn't harm others.
The benefit swings depending on the existing alignment and the knock-on
effects.

Bootstrapped on aarch64-none-linux-gnu with no issues.

Does anyone have any thoughts or preferences as to how we set these
values in future? If not, OK For trunk?

Thanks,
James

---
2014-11-14  James Greenhalgh  

* config/aarch64/aarch64.c (aarch64_override_options): Set default
alignments for functions and jumps.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d4a8a2f..6b51885 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6493,6 +6493,32 @@ aarch64_override_options (void)
 #endif
 }
 
+  /* If we haven't been asked for any particular alignment for loops, jumps
+ and functions, choose defaults for the user.  We pick an alignment
+ which is word-size * issue-rate, rounded up to the nearest power of
+ two up for functions and down to the nearest power of two for
+ jumps.  */
+  if (!optimize_size)
+{
+  if (align_jumps <= 0)
+	{
+	  /* Ideally, we want to be aligned to a block at least as large
+	 as the issue width of the processor, but too much padding
+	 risks wasting cache space.  Settle for the nearest power
+	 of below what we wanted.  */
+	  align_jumps = aarch64_tune_params->issue_rate * 4;
+	  align_jumps = 1 << floor_log2 (align_jumps);
+	}
+  if (align_functions <= 0)
+	{
+	  /* We want to be aligned to a block at least as large as the issue
+	 width of the processor.  */
+	  align_functions = aarch64_tune_params->issue_rate * 4;
+	  /* Round up to the nearest power of two.  */
+	  align_functions = 1 << ceil_log2 (align_functions);
+	}
+}
+
   aarch64_override_options_after_change ();
 }
 

Re: [PATCH 5/5] use vec in lto_tree_ref_table

2014-11-14 Thread Richard Biener
On Thu, Nov 13, 2014 at 6:55 AM,   wrote:
> From: Trevor Saunders 
>
> Hi,
>
> gengtype fails to create valid user marking functions for this type, which is
> fixed by using vec here (which seems cleaner anyway).
>
> bootstrapped + regtested powerpc64-linux (gcc 110 since gcc20 died) ok?

Ok.

Thanks,
Richard.

> Trev
>
>
> gcc/ChangeLog:
>
> 2014-11-13  Trevor Saunders  
>
> * lto-section-in.c (lto_delete_in_decl_state): Adjust.
> (lto_free_function_in_decl_state): Likewise.
> * lto-streamer-out.c (copy_function_or_variable): Likewise.
> * lto-streamer.h (lto_file_decl_data_get_ ## name): Likewise.
> (lto_file_decl_data_num_ ## name ## s): Likewise.
> (struct lto_tree_ref_table): Remove.
> (struct lto_in_decl_state): Replace lto_tree_ref_table with vec.
>
> gcc/lto/ChangeLog:
>
> 2014-11-13  Trevor Saunders  
>
> * lto.c (lto_read_in_decl_state): Adjust.
> (lto_fixup_state): Likewise.
>
>
> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> index 042dd99..75f394d 100644
> --- a/gcc/lto-section-in.c
> +++ b/gcc/lto-section-in.c
> @@ -379,8 +379,7 @@ lto_delete_in_decl_state (struct lto_in_decl_state *state)
>int i;
>
>for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> -if (state->streams[i].trees)
> -  ggc_free (state->streams[i].trees);
> +vec_free (state->streams[i]);
>ggc_free (state);
>  }
>
> @@ -429,7 +428,7 @@ lto_free_function_in_decl_state (struct lto_in_decl_state 
> *state)
>  {
>int i;
>for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> -ggc_free (state->streams[i].trees);
> +vec_free (state->streams[i]);
>ggc_free (state);
>  }
>
> diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
> index dc406da..bc18a9c 100644
> --- a/gcc/lto-streamer-out.c
> +++ b/gcc/lto-streamer-out.c
> @@ -2186,8 +2186,8 @@ copy_function_or_variable (struct symtab_node *node)
>
>for (i = 0; i < LTO_N_DECL_STREAMS; i++)
>  {
> -  size_t n = in_state->streams[i].size;
> -  tree *trees = in_state->streams[i].trees;
> +  size_t n = vec_safe_length (in_state->streams[i]);
> +  vec *trees = in_state->streams[i];
>struct lto_tree_ref_encoder *encoder = &(out_state->streams[i]);
>
>/* The out state must have the same indices and the in state.
> @@ -2196,7 +2196,7 @@ copy_function_or_variable (struct symtab_node *node)
>gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
>encoder->trees.reserve_exact (n);
>for (j = 0; j < n; j++)
> -   encoder->trees.safe_push (trees[j]);
> +   encoder->trees.safe_push ((*trees)[j]);
>  }
>
>lto_free_section_data (file_data, LTO_section_function_body, name,
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 4b875a2..9d6d7a0 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -274,15 +274,14 @@ lto_file_decl_data_get_ ## name (struct 
> lto_file_decl_data *data, \
>  unsigned int idx) \
>  { \
>struct lto_in_decl_state *state = data->current_decl_state; \
> -  gcc_assert (idx < state->streams[LTO_DECL_STREAM_## UPPER_NAME].size); \
> -  return state->streams[LTO_DECL_STREAM_## UPPER_NAME].trees[idx]; \
> +   return (*state->streams[LTO_DECL_STREAM_## UPPER_NAME])[idx]; \
>  } \
>  \
>  static inline unsigned int \
>  lto_file_decl_data_num_ ## name ## s (struct lto_file_decl_data *data) \
>  { \
>struct lto_in_decl_state *state = data->current_decl_state; \
> -  return state->streams[LTO_DECL_STREAM_## UPPER_NAME].size; \
> +  return vec_safe_length (state->streams[LTO_DECL_STREAM_## UPPER_NAME]); \
>  }
>
>
> @@ -420,18 +419,6 @@ struct lto_symtab_encoder_iterator
>
>
>
> -
> -/* Mapping from indices to trees.  */
> -struct GTY(()) lto_tree_ref_table
> -{
> -  /* Array of referenced trees . */
> -  tree * GTY((length ("%h.size"))) trees;
> -
> -  /* Size of array. */
> -  unsigned int size;
> -};
> -
> -
>  /* The lto_tree_ref_encoder struct is used to encode trees into indices. */
>
>  struct lto_tree_ref_encoder
> @@ -445,7 +432,7 @@ struct lto_tree_ref_encoder
>  struct GTY(()) lto_in_decl_state
>  {
>/* Array of lto_in_decl_buffers to store type and decls streams. */
> -  struct lto_tree_ref_table streams[LTO_N_DECL_STREAMS];
> +  vec *streams[LTO_N_DECL_STREAMS];
>
>/* If this in-decl state is associated with a function. FN_DECL
>   point to the FUNCTION_DECL. */
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index d8519d9..cdd2331 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -260,13 +260,15 @@ lto_read_in_decl_state (struct data_in *data_in, const 
> uint32_t *data,
>for (i = 0; i < LTO_N_DECL_STREAMS; i++)
>  {
>uint32_t size = *data++;
> -  tree *decls = ggc_vec_alloc (size);
> +  vec *decls = NULL;
> +  vec_alloc (decls, size);
>
>for (j = 0; j < size; j++)
> -   decls[j] = streamer_tree_cache_get_tree (data_in->reader_cache, 
> data[j]);
> +   vec_safe_push 

Re: libsanitizer merge from upstream r221802

2014-11-14 Thread Christophe Lyon
On 13 November 2014 21:44, Konstantin Serebryany
 wrote:
> On Thu, Nov 13, 2014 at 1:16 AM, Jakub Jelinek  wrote:
>> On Wed, Nov 12, 2014 at 05:35:48PM -0800, Konstantin Serebryany wrote:
>>> Here is one more merge of libsanitizer (last one was in Sept).
>>>
>>> Tested on x86_64 Ubuntu 14.04 like this:
>>> rm -rf */{*/,}libsanitizer && make -j 50
>>> make -j 40 -C gcc check-g{cc,++}
>>> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp' && \
>>> make -j 40 -C gcc check-g{cc,++}
>>> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} tsan.exp' && \
>>> make -j 40 -C gcc check
>>> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp' && \
>>> echo PASS
>>>
>>> Expected ChangeLog entry:
>>>
>>> 2014-11-12  Kostya Serebryany  
>>>
>>> * All source files: Merge from upstream r221802.
>>> * sanitizer_common/sanitizer_symbolizer_libbacktrace.cc
>>>   (LibbacktraceSymbolizer::SymbolizeData): replace 'address'
>>>   with 'start' to follow the new interface.
>>
>> Capital R in Replace.  All lines are indented by single tab, not tab
>> and two spaces.
>>
>>> * asan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>
>> Capital A in Added.  Also, I wonder if we shouldn't use -std=gnu++11
>> instead.  As the sources are compiled by newly built compiler, it should be
>> generally fine to use extensions in there.
>
> in llvm we use -std=c++11, so I use it here for consistency.
>
>>
>>> * interception/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>> * libbacktrace/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>> * lsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>> * sanitizer_common/Makefile.am (sanitizer_common_files): Added new
>>>   files.
>>>   (AM_CXXFLAGS): added -std=c++11.
>>> * tsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>> * ubsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>
>> Ditto.
>>
>>> * asan/Makefile.in: Regenerate.
>>> * interception/Makefile.in: Regenerate.
>>> * libbacktrace/Makefile.in: Regenerate.
>>> * lsan/Makefile.in: Regenerate.
>>> * sanitizer_common/Makefile.in: Regenerate.
>>> * tsan/Makefile.in: Regenerate.
>>> * ubsan/Makefile.in: Regenerate.
>>
>> Other than that, it looks good to me, I've bootstrapped/regtested
>> it on x86_64-linux and i686-linux too.  So, with those changes ok for trunk
>> (how do you decide about c++11 vs. gnu++11 I'll leave to you).
>
> Fixed all, committed. r217518.
>

Hmm
So as already reported on the llvm lists, this has the side effect of
breaking the build for aarch64 when using "old" kernel headers.
I wish the discussion at
http://reviews.llvm.org/D6026
had converged before merging incorrect things into GCC.

>
>>
>> A few questions regarding possible changes on the compiler side:
>> 1) is __asan_poison_intra_object_redzone/__asan_unpoison_intra_object_redzone
>>just for the ABI incompatible putting of red zones in between fields
>>in structures?  How do you handle whole struct copying in that case?
>
> This is all highly experimental:
> https://code.google.com/p/address-sanitizer/wiki/IntraObjectOverflow
> Currently we apply this instrumentation only to C++ classes that are
>   a) non-standard-layout, i.e. we are allowed by the standard to
> reshuffle the fields and add paddings.
>   b) have a DTOR, where we can do the unpoison.
> Even with this strict limitation we hit lots of failures where users
> make assumptions about the layout or size of non-standard-layout
> types.
> We do find juicy bugs in this mode so we'll likely continue the
> investigation and try to reduce the current limitations.
>
>>Could it be done without changing ABI for a subset of structs
>>which have natural padding in them?
> Quite likely. But we will need to figure out where to unpoison the paddings.
>
>> 2) regarding the tsan memory layout changes, is it now possible to support
>>non-pie binaries?  If yes, we should probably remove the:
>> %{!pie:%{!shared:%e-fsanitize=thread linking must be done with -pie or 
>> -shared}}}\
>>and add testcases that would test that.
>
> Yes, that was one of the reasons for the change.
> But let's hear from Dmitry if he is ready to remove -pie now or wants
> to do some more testing.
>
> --kcc
>
>>
>> Jakub


Re: [RFC: AArch64] Parametrically set defaults for function and jump alignment

2014-11-14 Thread Andrew Pinski
On Fri, Nov 14, 2014 at 2:35 AM, James Greenhalgh
 wrote:
>
> Hi,
>
> We currently do not set any interesting default values for jump and function
> alignment in AArch64. I've made the formula for these values derive from
> the issue rate of the processor as so:
>
>   jumps: 4 * processor issue-rate (rounded down to nearest power of two)
>   functions: 4 * processor issue-rate (rounded up to nearest power of two)
>
> This is sensible for the ARMv8-a implementations I tested on. An
> alternative patch would make these values new fields in the tuning
> tables.

I had submitted an alternative patch a few hours ago which allows the
tuning structure say what alignment is wanted for all three:
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01615.html

>
> This happens to work well for some benchmarks and doesn't harm others.
> The benefit swings depending on the existing alignment and the knock-on
> effects.
>
> Bootstrapped on aarch64-none-linux-gnu with no issues.
>
> Does anyone have any thoughts or preferences as to how we set these
> values in future? If not, OK For trunk?

Also I notice you don't align loops, was that an oversight or just you
did not think it was needed?

Thanks,
Andrew


>
> Thanks,
> James
>
> ---
> 2014-11-14  James Greenhalgh  
>
> * config/aarch64/aarch64.c (aarch64_override_options): Set default
> alignments for functions and jumps.


[PATCH 0/3][AArch64]More intrinsics/builtins improvements

2014-11-14 Thread Alan Lawrence
These three are logically independent, but all on a common theme, and I've 
tested them all together by


bootstrapped + check-gcc on aarch64-none-elf
cross-tested check-gcc on aarch64_be-none-elf

Ok for trunk?



Re: Follow-up to PR51471

2014-11-14 Thread Eric Botcazou
> I wonder how many other problems of this nature are lurking in reorg.c.
> For example steal_delay_list_from_{target,fallthrough} or the code
> which searches for arithmetic at the branch target, and puts the
> opposite insn in a delay slot.

Right, and the latter has already been dealt with by Richard:

2012-02-11  Richard Sandiford  

PR rtl-optimization/52175
* reorg.c (fill_slots_from_thread): Don't apply add/sub optimization
to frame-related instructions.

> In fact, I really wonder if we should be allowing anything frame related
> outside fill_simple_delay_slots.

That could well be the end result after a few more years of tweaking. :-)

-- 
Eric Botcazou


[PATCH 1/3][AArch64]Replace __builtin_aarch64_createv1df with a cast, cleanup

2014-11-14 Thread Alan Lawrence
Now that float64x1_t is a vector, casting to it from a unit64_t causes the bit 
pattern to be reinterpreted, just as vcreate_f64 should. (Previously when 
float64x1_t was still a scalar, casting caused a conversion.) Hence, replace the 
__builtin with a cast. None of the other variants of the aarch64_create pattern 
were used, so remove it, and associated guff.


Also have to inhibit optimization of some testcases, as the midend can see 
through casts, whereas it couldn't see builtins ;).


The affected intrinsics are all covered by tests gcc.target/aarch64/vrnd_f64_1, 
vreinterpret_f64_1.c, vget_high_1.c.


gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (TYPES_CREATE): Remove.
* config/aarch64/aarch64-simd-builtins.def (create): Remove.
* config/aarch64/aarch64-simd.md (aarch64_create): Remove.
* config/aarch64/arm_neon.h (vcreate_f64, vreinterpret_f64_s64,
vreinterpret_f64_u64): Replace __builtin_aarch64_createv1df with C 
casts.
* config/aarch64/iterators.md (VD1): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/vfma_f64.c: Add asm volatile memory.
* gcc.target/aarch64/simd/vfms_f64.c: Likewise.diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 527445c5c7788bc37f41d9c3428f59a18410a93a..c130f80b869304087205e21aa644d76c06749309 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -136,7 +136,6 @@ static enum aarch64_type_qualifiers
 aarch64_types_unopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned };
 #define TYPES_UNOPU (aarch64_types_unopu_qualifiers)
-#define TYPES_CREATE (aarch64_types_unop_qualifiers)
 static enum aarch64_type_qualifiers
 aarch64_types_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_maybe_immediate };
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 62b7f3357d12f2a4a483588e3ccf027c3f957c20..8cdb9609520a227f33008efa9201d7771e241755 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -39,7 +39,6 @@
1-9 - CODE_FOR_<1-9>
10 - CODE_FOR_.  */
 
-  BUILTIN_VD1 (CREATE, create, 0)
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
   BUILTIN_VDQF (UNOP, sqrt, 2)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ef196e4b6fb39c0d2fd9ebfee76abab8369b1e92..00b59d3a352325e77632daa9723f3df4850cf922 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2315,16 +2315,6 @@
 
 ;; Patterns for AArch64 SIMD Intrinsics.
 
-(define_expand "aarch64_create"
-  [(match_operand:VD1 0 "register_operand" "")
-   (match_operand:DI 1 "general_operand" "")]
-  "TARGET_SIMD"
-{
-  rtx src = gen_lowpart (mode, operands[1]);
-  emit_move_insn (operands[0], src);
-  DONE;
-})
-
 ;; Lane extraction with sign extension to general purpose register.
 (define_insn "*aarch64_get_lane_extend"
   [(set (match_operand:GPI 0 "register_operand" "=r")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 0ec1a24a52d81a6f8a2d45c0a931e771972d5eef..4a0d718642f8a3cb56281a70435b1b6445ee35be 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -2662,7 +2662,7 @@ vcreate_u64 (uint64_t __a)
 __extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
 vcreate_f64 (uint64_t __a)
 {
-  return __builtin_aarch64_createv1df (__a);
+  return (float64x1_t) __a;
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
@@ -3262,7 +3262,7 @@ vreinterpret_f64_s32 (int32x2_t __a)
 __extension__ static __inline float64x1_t __attribute__((__always_inline__))
 vreinterpret_f64_s64 (int64x1_t __a)
 {
-  return __builtin_aarch64_createv1df ((uint64_t) vget_lane_s64 (__a, 0));
+  return (float64x1_t) __a;
 }
 
 __extension__ static __inline float64x1_t __attribute__((__always_inline__))
@@ -3286,7 +3286,7 @@ vreinterpret_f64_u32 (uint32x2_t __a)
 __extension__ static __inline float64x1_t __attribute__((__always_inline__))
 vreinterpret_f64_u64 (uint64x1_t __a)
 {
-  return __builtin_aarch64_createv1df (vget_lane_u64 (__a, 0));
+  return (float64x1_t) __a;
 }
 
 __extension__ static __inline float64x2_t __attribute__((__always_inline__))
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 74c71fcc8047f221f28cedaba8fca80995576cc7..c5abc3af79405fa4cd5ab2fd6f9e756b5907a3ae 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -147,9 +147,6 @@
 ;; Double vector modes for combines.
 (define_mode_iterator VDIC [V8QI V4HI V2SI])
 
-;; Double vector modes inc V1DF
-(define_mode_iterator VD1 [V8QI V4HI V2SI V2SF V1DF])
-
 ;; Vector modes except double int.
 (define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
 
dif

Re: [PATCH][sched-deps] Remove needless check for modified_in_p when trying to fuse two non-conditional jump insns

2014-11-14 Thread Kyrill Tkachov


On 14/11/14 05:17, Jeff Law wrote:

On 11/13/14 07:09, Kyrill Tkachov wrote:


I've updated the documentation for the hook.
The testcase I was looking at involves fusing the AArch64 adrp+add
instructions and depends on the backend implementation of the matching
code, under review currently at
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01263.html.

I'm attaching a reduced testcase that generates an adrp and an add and
due to the restriction addressed in this patch it doesn't call the
backend hook for a pair of adrp and add instructions, causing them to be
scheduled apart.
I don't think it's a good candidate for the testsuite though because
it's not easy to scan for consequent assembly instructions from Dejagnu
and the instruction pair may end up scheduled together for some tuning
parameters even though the macro fusion hook does not trigger for them
as it should.

It's painful, but certainly possible to check for consecutive assembly
instructions.  You just have to match the first instruction, its
operands, a newline, then the 2nd instruction & operands.  The
difficulty is in getting all the escape sequences right!

There are some examples of this.  For example mips/umips-branch-1.c

/* { dg-final { scan-assembler "\tjr?\t\\\$31\n\tmove\t\\\$2,\\\$0" } } */


Which looks for a jr instruction, some operands, then a move instruction
on the next line.

As for instability, that's an inherent problem in some of this kind of
stuff.  Just run it for the target you care about, possibly giving
explicit tuning parameters that are known to make it trigger.

OK with a testcase.

Hi Jeff,

I did manage to integrate it (hopefully it doesn't become fragile).
I'll commit the attached patch after the prerequisite aarch64 fusion 
patch goes in.


Thanks again,
Kyrill


2014-11-14  Kyrylo Tkachov  

* sched-deps.c (sched_macro_fuse_insns): Do not check modified_in_p
in the not conditional jump case.
* doc/tm.texi (TARGET_SCHED_MACRO_FUSION_PAIR_P): Update description.
* target.def (TARGET_SCHED_MACRO_FUSION_PAIR_P): Update description.

2014-11-14  Kyrylo Tkachov  

* gcc.target/aarch64/fuse_adrp_add_1.c: New test.



jeff


commit 399f71dca4f7c3d678b8f986319841b7da0ce86f
Author: Kyrylo Tkachov 
Date:   Thu Nov 6 12:05:03 2014 +

[sched-deps] remove overly strict check on macro fusion

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8d137f5..762 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6484,11 +6484,13 @@ cycle.  These other insns can then be taken into account properly.
 This hook is used to check whether target platform supports macro fusion.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_SCHED_MACRO_FUSION_PAIR_P (rtx_insn *@var{condgen}, rtx_insn *@var{condjmp})
-This hook is used to check whether two insns could be macro fused for
-target microarchitecture. If this hook returns true for the given insn pair
-(@var{condgen} and @var{condjmp}), scheduler will put them into a sched
-group, and they will not be scheduled apart.
+@deftypefn {Target Hook} bool TARGET_SCHED_MACRO_FUSION_PAIR_P (rtx_insn *@var{prev}, rtx_insn *@var{curr})
+This hook is used to check whether two insns should be macro fused for
+a target microarchitecture. If this hook returns true for the given insn pair
+(@var{prev} and @var{curr}), the scheduler will put them into a sched
+group, and they will not be scheduled apart.  The two insns will be either
+two SET insns or a compare and a conditional jump and this hook should
+validate any dependencies needed to fuse the two insns together.
 @end deftypefn
 
 @deftypefn {Target Hook} void TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK (rtx_insn *@var{head}, rtx_insn *@var{tail})
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index a4ea836..ee534b0 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -2877,8 +2877,7 @@ sched_macro_fuse_insns (rtx_insn *insn)
   prev = prev_nonnote_nondebug_insn (insn);
   if (!prev
   || !insn_set
-  || !single_set (prev)
-  || !modified_in_p (SET_DEST (insn_set), prev))
+  || !single_set (prev))
 return;
 
 }
diff --git a/gcc/target.def b/gcc/target.def
index c329b2a..0732a90 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1067,11 +1067,13 @@ DEFHOOK
 
 DEFHOOK
 (macro_fusion_pair_p,
- "This hook is used to check whether two insns could be macro fused for\n\
-target microarchitecture. If this hook returns true for the given insn pair\n\
-(@var{condgen} and @var{condjmp}), scheduler will put them into a sched\n\
-group, and they will not be scheduled apart.",
- bool, (rtx_insn *condgen, rtx_insn *condjmp), NULL)
+ "This hook is used to check whether two insns should be macro fused for\n\
+a target microarchitecture. If this hook returns true for the given insn pair\n\
+(@var{prev} and @var{curr}), the scheduler will put them into a sched\n\
+group, and they will not be scheduled apart.  The two insns will be either\n\
+two SET insns or a compare and a 

Re: [PATCH, trivial][AArch64] Fix mode iterator for *aarch64_simd_ld1r pattern

2014-11-14 Thread Marcus Shawcroft
On 13 November 2014 06:14, Yangfei (Felix)  wrote:
> Hi,
>
>   We find that the VALLDI mode iterator used in *aarch64_simd_ld1r 
> pattern is not appropriate.
>   The reason is that it's impossible to get a new operand of DImode by 
> vec_duplicating an operand of the same mode.
>   So this patch just excludes the DImode and uses VALL instead.
>   Reg-tested for aarch64-linux-gnu with QEMU.  OK for the trunk?

OK, can you back port it to 4.9?
Thanks
/Marcus


[PATCH 2/3][AArch64] Extend aarch64_simd_vec_set pattern, replace asm for vld1_lane

2014-11-14 Thread Alan Lawrence
The vld1_lane intrinsic is currently implemented using inline asm. This patch 
replaces that with a load and a straightforward use of vset_lane (this gives us 
correct bigendian lane-flipping in a simple manner).


Naively this would produce assembler along the lines of (for vld1_lane_u8):
ldrbw0, [x0]
ins v0.b[5], w0
Hence, the patch also extends the aarch64_simd_vec_set pattern, adding a variant 
that reads from a memory operand, producing the expected:

ld1 {v0.b}[5], [x0]
...and thus we'll also get that assembler from a programmer writing natively in 
GCC vector extensions and not using intrinsics :).


I've also added a testcase, as existing tests in aarch64 and advsimd-intrinsics 
seemed only to cover vld{2,3,4}_lane, not vld1_lane.


gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set): Add
variant reading from memory and assembling to ld1.

* config/aarch64/arm_neon.h (vld1_lane_f32, vld1_lane_f64, vld1_lane_p8,
vld1_lane_p16, vld1_lane_s8, vld1_lane_s16, vld1_lane_s32,
vld1_lane_s64, vld1_lane_u8, vld1_lane_u16, vld1_lane_u32,
vld1_lane_u64, vld1q_lane_f32, vld1q_lane_f64, vld1q_lane_p8,
vld1q_lane_p16, vld1q_lane_s8, vld1q_lane_s16, vld1q_lane_s32,
vld1q_lane_s64, vld1q_lane_u8, vld1q_lane_u16, vld1q_lane_u32,
vld1q_lane_u64): Replace asm with vset_lane and pointer dereference.

gcc/testsuite/ChangeLog:

gcc.target/aarch64/vld1_lane.c: New test.diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 00b59d3a352325e77632daa9723f3df4850cf922..b77a4f831c44df9df8fac609216ee3c501e0e54a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -455,12 +455,12 @@
 )
 
 (define_insn "aarch64_simd_vec_set"
-  [(set (match_operand:VQ_S 0 "register_operand" "=w,w")
+  [(set (match_operand:VQ_S 0 "register_operand" "=w,w,w")
 (vec_merge:VQ_S
 	(vec_duplicate:VQ_S
-		(match_operand: 1 "register_operand" "r,w"))
-	(match_operand:VQ_S 3 "register_operand" "0,0")
-	(match_operand:SI 2 "immediate_operand" "i,i")))]
+		(match_operand: 1 "aarch64_simd_general_operand" "r,w,Utv"))
+	(match_operand:VQ_S 3 "register_operand" "0,0,0")
+	(match_operand:SI 2 "immediate_operand" "i,i,i")))]
   "TARGET_SIMD"
   {
int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2])));
@@ -471,11 +471,13 @@
 	return "ins\\t%0.[%p2], %w1";
  case 1:
 	return "ins\\t%0.[%p2], %1.[0]";
+ case 2:
+return "ld1\\t{%0.}[%p2], %1";
  default:
 	gcc_unreachable ();
  }
   }
-  [(set_attr "type" "neon_from_gp, neon_ins")]
+  [(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")]
 )
 
 (define_insn "aarch64_simd_lshr"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 4a0d718642f8a3cb56281a70435b1b6445ee35be..f036f7c0ba2733a822661027b815e7c3654db1bc 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -628,7 +628,7 @@ typedef struct poly16x8x4_t
 #define __aarch64_vdupq_laneq_u64(__a, __b) \
__aarch64_vdup_lane_any (u64, q, q, __a, __b)
 
-/* vset_lane internal macro.  */
+/* vset_lane and vld1_lane internal macro.  */
 
 #ifdef __AARCH64EB__
 /* For big-endian, GCC's vector indices are the opposite way around
@@ -6275,162 +6275,6 @@ vld1_dup_u64 (const uint64_t * a)
   return result;
 }
 
-#define vld1_lane_f32(a, b, c)  \
-  __extension__ \
-({  \
-   float32x2_t b_ = (b);\
-   const float32_t * a_ = (a);  \
-   float32x2_t result;  \
-   __asm__ ("ld1 {%0.s}[%1], %2"\
-: "=w"(result)  \
-: "i" (c), "Utv"(*a_), "0"(b_)  \
-: /* No clobbers */);   \
-   result;  \
- })
-
-#define vld1_lane_f64(a, b, c)  \
-  __extension__ \
-({  \
-   float64x1_t b_ = (b);\
-   const float64_t * a_ = (a);  \
-   float64x1_t result;  \
-   __asm__ ("ld1 {%0.d}[%1], %2"\
-: "=w"(result)  \
-: "i" (c), "Utv"(*a_), "0"(b_)  \
-: /* No clobbers */);

[PATCH 3/3][AArch64]Replace temporary assembler for vld1_dup

2014-11-14 Thread Alan Lawrence
This patch replaces the inline asm for vld1_dup intrinsics with a vdup_n_ and a 
load from the pointer. The existing *aarch64_simd_ld1r insn, combiner, 
etc., are quite capable of generating the expected single ld1r instruction from 
this. (I've verified by inspecting assembler output.)


gcc/ChangeLog:

* config/aarch64/arm_neon.h (vld1_dup_f32, vld1_dup_f64, vld1_dup_p8,
vld1_dup_p16, vld1_dup_s8, vld1_dup_s16, vld1_dup_s32, vld1_dup_s64,
vld1_dup_u8, vld1_dup_u16, vld1_dup_u32, vld1_dup_u64, vld1q_dup_f32,
vld1q_dup_f64, vld1q_dup_p8, vld1q_dup_p16, vld1q_dup_s8, vld1q_dup_s16,
vld1q_dup_s32, vld1q_dup_s64, vld1q_dup_u8, vld1q_dup_u16,
vld1q_dup_u32, vld1q_dup_u64): Replace inline asm with vdup_n_ and
pointer dereference.diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index f036f7c0ba2733a822661027b815e7c3654db1bc..61a3bd3ab59c427522087f10ddd5679d6d46019d 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -6144,270 +6144,6 @@ vhsubq_u32 (uint32x4_t a, uint32x4_t b)
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vld1_dup_f32 (const float32_t * a)
-{
-  float32x2_t result;
-  __asm__ ("ld1r {%0.2s}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
-vld1_dup_f64 (const float64_t * a)
-{
-  float64x1_t result;
-  __asm__ ("ld1r {%0.1d}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vld1_dup_p8 (const poly8_t * a)
-{
-  poly8x8_t result;
-  __asm__ ("ld1r {%0.8b}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vld1_dup_p16 (const poly16_t * a)
-{
-  poly16x4_t result;
-  __asm__ ("ld1r {%0.4h}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vld1_dup_s8 (const int8_t * a)
-{
-  int8x8_t result;
-  __asm__ ("ld1r {%0.8b}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vld1_dup_s16 (const int16_t * a)
-{
-  int16x4_t result;
-  __asm__ ("ld1r {%0.4h}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vld1_dup_s32 (const int32_t * a)
-{
-  int32x2_t result;
-  __asm__ ("ld1r {%0.2s}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
-vld1_dup_s64 (const int64_t * a)
-{
-  int64x1_t result;
-  __asm__ ("ld1r {%0.1d}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vld1_dup_u8 (const uint8_t * a)
-{
-  uint8x8_t result;
-  __asm__ ("ld1r {%0.8b}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vld1_dup_u16 (const uint16_t * a)
-{
-  uint16x4_t result;
-  __asm__ ("ld1r {%0.4h}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vld1_dup_u32 (const uint32_t * a)
-{
-  uint32x2_t result;
-  __asm__ ("ld1r {%0.2s}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vld1_dup_u64 (const uint64_t * a)
-{
-  uint64x1_t result;
-  __asm__ ("ld1r {%0.1d}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vld1q_dup_f32 (const float32_t * a)
-{
-  float32x4_t result;
-  __asm__ ("ld1r {%0.4s}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vld1q_dup_f64 (const float64_t * a)
-{
-  float64x2_t result;
-  __asm__ ("ld1r {%0.2d}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vld1q_dup_p8 (const poly8_t * a)
-{
-  poly8x16_t result;
-  __asm__ ("ld1r {%0.16b}, %1"
-	   : "=w"(result)
-	   : "Utv"(*a)
-	   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_

Re: [PATCH 8/9] Negative numbers added for sreal class.

2014-11-14 Thread Richard Biener
On Thu, Nov 13, 2014 at 1:35 PM, mliska  wrote:
> gcc/ChangeLog:
>
> 2014-11-13  Martin Liska  
>
> * predict.c (propagate_freq): More elegant sreal API is used.
> (estimate_bb_frequencies): New static constants defined by sreal
> replace precomputed ones.
> * sreal.c (sreal::normalize): New function.
> (sreal::to_int): Likewise.
> (sreal::operator+): Likewise.
> (sreal::operator-): Likewise.
> * sreal.h: Definition of new functions added.

Please use gcc_checking_assert()s everywhere.  sreal is supposed
to be fast... (I see it has current uses of gcc_assert - you may want
to mass-convert them as a followup).

> ---
>  gcc/predict.c | 30 +++-
>  gcc/sreal.c   | 56 
>  gcc/sreal.h   | 75 
> ---
>  3 files changed, 126 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 0215e91..0f640f5 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -82,7 +82,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  /* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE,
>1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX.  */
> -static sreal real_zero, real_one, real_almost_one, real_br_prob_base,
> +static sreal real_almost_one, real_br_prob_base,
>  real_inv_br_prob_base, real_one_half, real_bb_freq_max;
>
>  static void combine_predictions_for_insn (rtx_insn *, basic_block);
> @@ -2528,13 +2528,13 @@ propagate_freq (basic_block head, bitmap tovisit)
> bb->count = bb->frequency = 0;
>  }
>
> -  BLOCK_INFO (head)->frequency = real_one;
> +  BLOCK_INFO (head)->frequency = sreal::one ();
>last = head;
>for (bb = head; bb; bb = nextbb)
>  {
>edge_iterator ei;
> -  sreal cyclic_probability = real_zero;
> -  sreal frequency = real_zero;
> +  sreal cyclic_probability = sreal::zero ();
> +  sreal frequency = sreal::zero ();
>
>nextbb = BLOCK_INFO (bb)->next;
>BLOCK_INFO (bb)->next = NULL;
> @@ -2559,13 +2559,13 @@ propagate_freq (basic_block head, bitmap tovisit)
>   * BLOCK_INFO (e->src)->frequency /
>   REG_BR_PROB_BASE);  */
>
> -   sreal tmp (e->probability, 0);
> +   sreal tmp = e->probability;
> tmp *= BLOCK_INFO (e->src)->frequency;
> tmp *= real_inv_br_prob_base;
> frequency += tmp;
>   }
>
> - if (cyclic_probability == real_zero)
> + if (cyclic_probability == sreal::zero ())
> {
>   BLOCK_INFO (bb)->frequency = frequency;
> }
> @@ -2577,7 +2577,7 @@ propagate_freq (basic_block head, bitmap tovisit)
>   /* BLOCK_INFO (bb)->frequency = frequency
>   / (1 - cyclic_probability) */
>
> - cyclic_probability = real_one - cyclic_probability;
> + cyclic_probability = sreal::one () - cyclic_probability;
>   BLOCK_INFO (bb)->frequency = frequency / cyclic_probability;
> }
> }
> @@ -2591,7 +2591,7 @@ propagate_freq (basic_block head, bitmap tovisit)
>  = ((e->probability * BLOCK_INFO (bb)->frequency)
>  / REG_BR_PROB_BASE); */
>
> - sreal tmp (e->probability, 0);
> + sreal tmp = e->probability;
>   tmp *= BLOCK_INFO (bb)->frequency;
>   EDGE_INFO (e)->back_edge_prob = tmp * real_inv_br_prob_base;
> }
> @@ -2873,13 +2873,11 @@ estimate_bb_frequencies (bool force)
>if (!real_values_initialized)
>  {
>   real_values_initialized = 1;
> - real_zero = sreal (0, 0);
> - real_one = sreal (1, 0);
> - real_br_prob_base = sreal (REG_BR_PROB_BASE, 0);
> - real_bb_freq_max = sreal (BB_FREQ_MAX, 0);
> + real_br_prob_base = REG_BR_PROB_BASE;
> + real_bb_freq_max = BB_FREQ_MAX;
>   real_one_half = sreal (1, -1);
> - real_inv_br_prob_base = real_one / real_br_prob_base;
> - real_almost_one = real_one - real_inv_br_prob_base;
> + real_inv_br_prob_base = sreal::one () / real_br_prob_base;
> + real_almost_one = sreal::one () - real_inv_br_prob_base;
> }
>
>mark_dfs_back_edges ();
> @@ -2897,7 +2895,7 @@ estimate_bb_frequencies (bool force)
>
>   FOR_EACH_EDGE (e, ei, bb->succs)
> {
> - EDGE_INFO (e)->back_edge_prob = sreal (e->probability, 0);
> + EDGE_INFO (e)->back_edge_prob = e->probability;
>   EDGE_INFO (e)->back_edge_prob *= real_inv_br_prob_base;
> }
> }
> @@ -2906,7 +2904,7 @@ estimate_bb_frequencies (bool force)
>   to outermost to examine frequencies for back edges.  */
>estimate_loops ();
>
> -  freq_max = real_zero;
> +  freq_max = s

Re: [RFC: AArch64] Parametrically set defaults for function and jump alignment

2014-11-14 Thread James Greenhalgh
On Fri, Nov 14, 2014 at 10:42:27AM +, Andrew Pinski wrote:
> On Fri, Nov 14, 2014 at 2:35 AM, James Greenhalgh
>  wrote:
> >
> > Hi,
> >
> > We currently do not set any interesting default values for jump and function
> > alignment in AArch64. I've made the formula for these values derive from
> > the issue rate of the processor as so:
> >
> >   jumps: 4 * processor issue-rate (rounded down to nearest power of two)
> >   functions: 4 * processor issue-rate (rounded up to nearest power of two)
> >
> > This is sensible for the ARMv8-a implementations I tested on. An
> > alternative patch would make these values new fields in the tuning
> > tables.
> 
> I had submitted an alternative patch a few hours ago which allows the
> tuning structure say what alignment is wanted for all three:
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01615.html

D'oh! I should have flicked through gcc-patches before hitting send!
I imagine I'm encoding similar logic to that you used when writing
this patch.

I'm happy with either approach, so I'll leave it to the maintainers to
decide which they prefer.

> >
> > This happens to work well for some benchmarks and doesn't harm others.
> > The benefit swings depending on the existing alignment and the knock-on
> > effects.
> >
> > Bootstrapped on aarch64-none-linux-gnu with no issues.
> >
> > Does anyone have any thoughts or preferences as to how we set these
> > values in future? If not, OK For trunk?
> 
> Also I notice you don't align loops, was that an oversight or just you
> did not think it was needed?

I didn't think it was needed, and saw some performance issues when
benchmarking with it on. I found these parameters to be very fickle
in the way they change performance. It is easy to end up with a
bunch of cold loops needlessly padding out the binary and therefore
the cache.

It might be that hard-coding loop alignment to 8 for all cores is
sensible, but I don't have data either way.

Cheers,
James

> > ---
> > 2014-11-14  James Greenhalgh  
> >
> > * config/aarch64/aarch64.c (aarch64_override_options): Set default
> > alignments for functions and jumps.
> 


Re: [PATCH 2/3] [AARCH64] Add scheduler for ThunderX

2014-11-14 Thread Marcus Shawcroft
On 14 November 2014 00:56, Andrew Pinski  wrote:
> This adds the schedule model for ThunderX. There are a few TODOs in that
> not all of the SIMD is model currently.  Also the idea of a simple
> shift/extend is not modeled and all cases where there is a shift/extend
> is considered as non simple and take up two cycles rather than correct
> value of one cycle.  Also the 32bit divide and the 64bit divide
> have different cycle counts but there is no way to model that currently.
> Also multiply high takes one cycle more than the normal multiply but
> there is no way to model that currently either.
>
> Build and tested for aarch64-elf with no regressions.
>
> ChangeLog:
> * config/aarch64/aarch64-cores.def (thunderx): Change the scheduler
> over to thunderx.
> * config/aarch64/aarch64.md: Include thunderx.md.
> (generic_sched): Set to no for thunderx.
> * config/aarch64/thunderx.md: New file.

OK /Marcus


Re: system.h vs. C++ STL headers again

2014-11-14 Thread Richard Biener
On Fri, Nov 14, 2014 at 8:18 AM, Jakub Jelinek  wrote:
> On Tue, Nov 11, 2014 at 11:46:55AM +0100, Richard Biener wrote:
>> > BTW: There are lots of places where std::swap can be used, a nice
>> > search-and-replace task for someone to start with gcc development. ;)
>>
>> Agreed ;)  Note that we have to be careful to avoid pulling all of libstdc++
>> into all files via system.h (system.h is so a bad thing... :/).
>
> Apparently the nvptx port is another thing that fails to build because of
> the header ordering issues, this time not just when starting with clang, but
> just when building cross-compiler starting from gcc 4.9.1 on x86_64.
>
> There is:
>
> #include "config.h"
> #include "system.h"
> ...
> #include "hashtab.h"
> #include 
>
> and I get:
> g++ -c   -g  -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions -fno-rtti 
> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
> -DHAVE_CONFIG_H -I. -I. -I../../gcc -I../../gcc/. -I../../gcc/../include 
> -I../../gcc/../libcpp/include  -I../../gcc/../libdecnumber 
> -I../../gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc/../libbacktrace 
>-o nvptx.o -MT nvptx.o -MMD -MP -MF ./.deps/nvptx.TPo 
> ../../gcc/config/nvptx/nvptx.c
> In file included from /usr/include/c++/4.9.1/bits/basic_ios.h:37:0,
>  from /usr/include/c++/4.9.1/ios:44,
>  from /usr/include/c++/4.9.1/istream:38,
>  from /usr/include/c++/4.9.1/sstream:38,
>  from ../../gcc/config/nvptx/nvptx.c:54:
> /usr/include/c++/4.9.1/bits/locale_facets.h:240:53: error: macro "toupper" 
> passed 2 arguments, but takes just 1
>toupper(char_type *__lo, const char_type* __hi) const
>  ^
> and many more.
> As you said that we don't want to pull all of libstdc++
> headers via system.h, looking at system.h, I see just the
>
> /* Define this so that inttypes.h defines the PRI?64 macros even
>when compiling with a C++ compiler.  Define it here so in the
>event inttypes.h gets pulled in by another header it is already
>defined.  */
> #define __STDC_FORMAT_MACROS
>
> macro being critical to be included before any system headers,
> can't we move that into config.in/auto-host.h and just suggest
> header ordering of
>
> #include "config.h"
> #include  // or any other extra STL headers not provided by system.h 
> you need
> #include "system.h"
> all other includes

Ick.

> ?  There are also some comments about stdarg.h and stdio.h ordering,
> dunno what it comes from and if it is still relevant when we require
> C++ compiler.

I think we should simply discourage people from using sstream for
example.

But I don't see how we can live without system.h with all the weird
host systems still around - thus your solution above will very likely
not work.

Eventually we can split system.h into a c-system.h and cxx-system.h
so we can distinguish between uses in files compiled with a C and
a C++ compiler?

Richard.

> Jakub


Re: [changes.html] Document -fdiagnostics-color= default changes

2014-11-14 Thread Richard Biener
On Fri, Nov 14, 2014 at 8:42 AM, Jakub Jelinek  wrote:
> On Thu, Nov 13, 2014 at 05:54:46PM -0700, Jeff Law wrote:
>> On 11/12/14 15:33, Jakub Jelinek wrote:
>> >This patch makes the -fdiagnostics-color= default configurable, and
>> >changes the default (if no configure option is specified for it)
>> >to --with-diagnostics-color=auto.  The previous behavior can be
>> >restored with --with-diagnostics-color=auto-if-env , the 4.8
>> >behavior (never coloring anything) with --with-diagnostics-color=never .
>> >Bootstrapped/regtested on x86_64-linux and i686-linux and tested with
>> >all 5 different configure options (the four explicit one and without).
>> >Ok for trunk?
>> >
>> >2014-11-12  Jakub Jelinek  
>> >
>> > * configure.ac (--with-diagnostics-color): New configure
>> > option, default to --with-diagnostics-color=auto.
>> > * toplev.c (process_options): Use DIAGNOSTICS_COLOR_DEFAULT
>> > to determine -fdiagnostics-color= option default.
>> > * doc/invoke.texi (-fdiagnostics-color=): Document new
>> > default.
>> > * configure: Regenerated.
>> > * config.in: Regenerated.
>> OK.  Should the change in default behaviour be mentioned somewhere?
>
> Like this?
>
> --- gcc-5/changes.html  12 Nov 2014 14:40:55 -  1.22
> +++ gcc-5/changes.html  14 Nov 2014 07:39:36 -
> @@ -43,6 +43,17 @@
>
>  C family
>
> +The -fdiagnostics-color= option default is now
> +   configurable at GCC configury time using
> +   --with-diagnostics-color=, can default to
> +   auto - the new default unless configured otherwise,
> +   where diagnostics is colorized by default when emitted to terminal,
> +   never, always or auto-if-env,
> +   which is the default of GCC 4.9 - auto if non-empty
> +   GCC_COLORS is in the environment, never
> +   otherwise.  Note, as before, having empty GCC_COLORS
> +   variable in the environment will always turn the coloring off, no
> +   matter what the default is or what command line options are used.
>  A new command-line option -Wswitch-bool has been added 
> for
> the C and C++ compilers, which warns whenever a switch
> statement has an index of boolean type.

Ok.

> doc/invoke.texi also talks about it, wonder if it shouldn't be mentioned
> elsewhere, install.texi?

Sure - the configure option should be mentioned in install.texi as well.

Thanks,
Richard.

> Jakub


Re: [PATCH] -fsanitize=unreachable overhaul (PR sanitizer/63839)

2014-11-14 Thread Marek Polacek
On Fri, Nov 14, 2014 at 09:36:40AM +0100, Richard Biener wrote:
> So please leave existing non-cold things as non-cold.
> 
> > Bootstrapped/regtested on power8-linux, ok for trunk?
> 
> Ok with that change.

Thanks, this is the version with the builtins.def hunk dropped.

Applying to trunk.

2014-11-14  Marek Polacek  

PR sanitizer/63839
* asan.c (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST,
ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST): Define.
* builtin-attrs.def (ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST):
Define.
* builtins.c (fold_builtin_0): Don't include ubsan.h.  Don't
instrument BUILT_IN_UNREACHABLE here.
* sanitizer.def (BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE): Make
const.
* sanopt.c (pass_sanopt::execute): Instrument BUILT_IN_UNREACHABLE.
* tree-ssa-ccp.c (optimize_unreachable): Bail out if
SANITIZE_UNREACHABLE.
* ubsan.c (ubsan_instrument_unreachable): Rewrite for GIMPLE.
* ubsan.h (ubsan_instrument_unreachable): Adjust declaration.
testsuite/
* c-c++-common/ubsan/pr63839.c: New test.
* c-c++-common/ubsan/unreachable-2.c: New test.

diff --git gcc/asan.c gcc/asan.c
index 79dede7..2961b44 100644
--- gcc/asan.c
+++ gcc/asan.c
@@ -2346,6 +2346,9 @@ initialize_sanitizer_builtins (void)
 #define ATTR_TMPURE_NOTHROW_LEAF_LIST ECF_TM_PURE | ATTR_NOTHROW_LEAF_LIST
 #undef ATTR_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_NORETURN_NOTHROW_LEAF_LIST ECF_NORETURN | ATTR_NOTHROW_LEAF_LIST
+#undef ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST
+#define ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST \
+  ECF_CONST | ATTR_NORETURN_NOTHROW_LEAF_LIST
 #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
   ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
@@ -2355,6 +2358,9 @@ initialize_sanitizer_builtins (void)
 #undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \
   /* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST
+#undef ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST
+#define ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST \
+  /* ECF_COLD missing */ ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST
 #undef DEF_SANITIZER_BUILTIN
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,  \
diff --git gcc/builtin-attrs.def gcc/builtin-attrs.def
index 9c05a94..c707367 100644
--- gcc/builtin-attrs.def
+++ gcc/builtin-attrs.def
@@ -145,6 +145,8 @@ DEF_ATTR_TREE_LIST (ATTR_SENTINEL_NOTHROW_LIST, 
ATTR_SENTINEL,  \
ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_SENTINEL_NOTHROW_LEAF_LIST, ATTR_SENTINEL,\
ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
+   ATTR_NULL, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
 
 /* Functions whose pointer parameter(s) are all nonnull.  */
 DEF_ATTR_TREE_LIST (ATTR_NONNULL_LIST, ATTR_NONNULL, ATTR_NULL, ATTR_NULL)
diff --git gcc/builtins.c gcc/builtins.c
index 1cd65ed..311c0e3 100644
--- gcc/builtins.c
+++ gcc/builtins.c
@@ -64,7 +64,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "builtins.h"
 #include "asan.h"
-#include "ubsan.h"
 #include "cilk.h"
 #include "ipa-ref.h"
 #include "lto-streamer.h"
@@ -9803,14 +9802,6 @@ fold_builtin_0 (location_t loc, tree fndecl, bool ignore 
ATTRIBUTE_UNUSED)
 case BUILT_IN_CLASSIFY_TYPE:
   return fold_builtin_classify_type (NULL_TREE);
 
-case BUILT_IN_UNREACHABLE:
-  if (flag_sanitize & SANITIZE_UNREACHABLE
- && (current_function_decl == NULL
- || !lookup_attribute ("no_sanitize_undefined",
-   DECL_ATTRIBUTES (current_function_decl
-   return ubsan_instrument_unreachable (loc);
-  break;
-
 default:
   break;
 }
diff --git gcc/sanitizer.def gcc/sanitizer.def
index cddc5ea..3fc8c83 100644
--- gcc/sanitizer.def
+++ gcc/sanitizer.def
@@ -394,7 +394,7 @@ 
DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS,
 DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE,
  "__ubsan_handle_builtin_unreachable",
  BT_FN_VOID_PTR,
- ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+ ATTR_COLD_CONST_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_MISSING_RETURN,
  "__ubsan_handle_missing_return",
  BT_FN_VOID_PTR,
diff --git gcc/sanopt.c gcc/sanopt.c
index 0fc032a..fe2e42d 100644
--- gcc/sanopt.c
+++ gcc/sanopt.c
@@ -312,6 +312,21 @@ pass_sanopt::execute (function *fun)
  break;
}
}
+ else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+   {
+ tree callee = gimple_call_fndecl (stmt);
+ switch (DECL_FUNCTION_

Re: system.h vs. C++ STL headers again

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 11:57:57AM +0100, Richard Biener wrote:
> > ?  There are also some comments about stdarg.h and stdio.h ordering,
> > dunno what it comes from and if it is still relevant when we require
> > C++ compiler.
> 
> I think we should simply discourage people from using sstream for
> example.

That would be my preference too of course, unfortunately Bernd chose to use
it everywhere (grep '<<' nvptx.c').

> But I don't see how we can live without system.h with all the weird
> host systems still around - thus your solution above will very likely
> not work.

Well, I wasn't suggesting without system.h, I was suggesting to include
config.h first (that is required anyway), then C++ STL headers, then
system.h and then other GCC headers.

> Eventually we can split system.h into a c-system.h and cxx-system.h
> so we can distinguish between uses in files compiled with a C and
> a C++ compiler?

That wouldn't help here.

Jakub


Re: [RFC: AArch64] Parametrically set defaults for function and jump alignment

2014-11-14 Thread Marcus Shawcroft
On 14 November 2014 10:50, James Greenhalgh  wrote:
> On Fri, Nov 14, 2014 at 10:42:27AM +, Andrew Pinski wrote:
>> On Fri, Nov 14, 2014 at 2:35 AM, James Greenhalgh
>>  wrote:
>> >
>> > Hi,
>> >
>> > We currently do not set any interesting default values for jump and 
>> > function
>> > alignment in AArch64. I've made the formula for these values derive from
>> > the issue rate of the processor as so:
>> >
>> >   jumps: 4 * processor issue-rate (rounded down to nearest power of two)
>> >   functions: 4 * processor issue-rate (rounded up to nearest power of two)
>> >
>> > This is sensible for the ARMv8-a implementations I tested on. An
>> > alternative patch would make these values new fields in the tuning
>> > tables.
>>
>> I had submitted an alternative patch a few hours ago which allows the
>> tuning structure say what alignment is wanted for all three:
>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01615.html
>
> D'oh! I should have flicked through gcc-patches before hitting send!
> I imagine I'm encoding similar logic to that you used when writing
> this patch.
>
> I'm happy with either approach, so I'll leave it to the maintainers to
> decide which they prefer.

I think Andrews approach of making it adjustable per core makes sense.
Andrew can you split the alignment part of your patch from the fusion
part of your patch?

Cheers
/Marcus


Re: The nvptx port

2014-11-14 Thread Bernd Schmidt

Hi Jakub,


I have some questions about nvptx:
1) you've said that alloca isn't supported, but it seems
to be wired up and uses the %alloca documented in the PTX
manual, what is the issue with that?  %alloca not being actually
implemented by the current PTX assembler or translator?


Yes, it's unimplemented. There's an internal declaration for it but that 
seems to be as far as it goes, and that declaration is 32-bit only anyway.



2) what is the reason why TLS isn't supported by the port (well,
__emutls is emitted, but I doubt pthread_[gs]etspecific is
implementable and thus it will not really do anything.
Can't the port just emit all DECL_THREAD_LOCAL_P variables
into .local instead of .global address space?


.local is stack frame memory, not TLS. The ptx docs mention the use of 
.local at file-scope as occurring only in "legacy" ptx code and I get 
the impression it's discouraged.


(As an aside, there's a question of how to represent a different 
concept, gang-local memory, in gcc. That would be .shared memory. We're 
currently going with just using an internal attribute)



3) in assembly emitted by the nvptx port, I've noticed:
.visible .func (.param.u32 %out_retval)foo(.param.u64 %in_ar1, .param.u32 
%in_ar2)
{
.reg.u64 %ar1;
.reg.u32 %ar2;
.reg.u32 %retval;
.reg.u64 %hr10;
.reg.u32 %r22;
.reg.u64 %r25;
is the missing \t before the %retval line intentional?


No, I can fix that up.


4) I had a brief look at what it would take to port libgomp to PTX,
which is needed for OpenMP offloading.  OpenMP offloaded kernels
should start with 1 team and 1 thread in it, if we ignore
GOMP_teams for now, I think the major things are:
- right now libgomp is heavily pthread_* based, which is a no-go
  for nvptx I assume, I think we'll need some ifdefs in the sources


I haven't looked into whether libpthread is doable. I suspect it's a 
poor match. I also haven't really looked into OpenMP, so I'm feeling a 
bit uncertain about answering your further questions.



- the main thing is that I believe we just have to replace
  gomp_team_start for nvptx; seems there are
  cudaLaunchDevice (and cudaGetParameterBuffer) functions one can use
  to spawn selected kernel in selected number of threads (and teams),
  from the docs it isn't exactly clear what the calling thread will do,
  if it is suspended and the HW core given to it is reused by something
  else (e.g. one of the newly spawned threads), then I think it should
  be usable.  Not sure what happens with .local memory of the parent
  task, if the children all have different .local memory, then
  perhaps one could just copy over what is needed from the
  invoking to the first invoked thread at start.


I'm a bit confused here, it sounds as if you want to call 
cudaLaunchDevice from ptx code? These are called from the host. As 
mentioned above, .local is probably not useful for what you want.



- is it worth to reuse cudaLaunchDevice "threads" or are they cheap
  enough to start that any "thread" pooling should be removed for nvptx?


Sorry, I don't understand the question.


- we'll need some synchronization primitives, I see atomic support is
  there, we need mutexes and semaphores I think, is that implementable
  using bar instruction?


It's probably membar you need.


- the library uses __attribute__((constructor)) in 3 places or so,
  initialize_team is pthread specific and can be probably ifdefed out,
  we won't support dlclose in nvptx anyway, but at least we need some
  way to initialize the nvptx libgomp; if the initialization is done
  in global memory, would it persist in between different kernels,
  so can the initialization as separate kernel be run once, something
  else?


I think that it would persist, and this would be my scheme for 
implementing constructors, but I haven't actually tried.



- is there any way to do any affinity management, or shall we just
  ignore affinity strategies?


Not sure what they do in libgomp. It's probably not a match for GPU 
architectures.



- any way how to query time?


There are %clock and %clock64 cycle counters.


Bernd



Re: [arm][patch] fix arm_neon_ok check on !arm_arch7

2014-11-14 Thread Andrew Stubbs

On 07/11/14 10:35, Andrew Stubbs wrote:

   if armv6 never co-exist with NEON, personally I think your original
patch is better
   because TARGET_NEON generally will be used when all options are
processed.

   any way, this needs gate keeper's approval.


Ping, Richard.


Ping.



Re: system.h vs. C++ STL headers again

2014-11-14 Thread Richard Biener
On Fri, Nov 14, 2014 at 12:03 PM, Jakub Jelinek  wrote:
> On Fri, Nov 14, 2014 at 11:57:57AM +0100, Richard Biener wrote:
>> > ?  There are also some comments about stdarg.h and stdio.h ordering,
>> > dunno what it comes from and if it is still relevant when we require
>> > C++ compiler.
>>
>> I think we should simply discourage people from using sstream for
>> example.
>
> That would be my preference too of course, unfortunately Bernd chose to use
> it everywhere (grep '<<' nvptx.c').
>
>> But I don't see how we can live without system.h with all the weird
>> host systems still around - thus your solution above will very likely
>> not work.
>
> Well, I wasn't suggesting without system.h, I was suggesting to include
> config.h first (that is required anyway), then C++ STL headers, then
> system.h and then other GCC headers.

I'm quite sure that we'll find a system where that won't work?  But
sure - maybe it's worth a try...

Richard.

>> Eventually we can split system.h into a c-system.h and cxx-system.h
>> so we can distinguish between uses in files compiled with a C and
>> a C++ compiler?
>
> That wouldn't help here.
>
> Jakub


Re: [Patch ARM-AArch64/testsuite] More Neon intrinsics executable tests

2014-11-14 Thread Marcus Shawcroft
On 12 November 2014 13:11, Christophe Lyon  wrote:
> Hi,
>
> The attached patch adds a few more tests to the recently added
> advsimd-intrinsics series.
>
> OK for trunk?
>
> Christophe.
>
> 2014-11-12  Christophe Lyon  
>
> * gcc.target/aarch64/advsimd-intrinsics/vcls.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/vcnt.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/vcombine.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/vcreate.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/vcvt.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/vext.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/vget_high.c: New test.
> * gcc.target/aarch64/advsimd-intrinsics/vget_low.c: New test.

Do they pass for both BE and LE ?


Re: The nvptx port

2014-11-14 Thread Bernd Schmidt

On 11/14/2014 11:01 AM, Jakub Jelinek wrote:

On Fri, Nov 14, 2014 at 09:29:48AM +0100, Jakub Jelinek wrote:

I have some questions about nvptx:


Oh, and
5) I have noticed gcc doesn't generate the .uni suffixes anywhere,
while llvm generates them; are those appropriate only when a function
is guaranteed to be run unconditionally from the toplevel kernel,
or even in spots in arbitrary functions which might not be run
unconditionally by all threads in thread block, but all threads
that encounter the particular function will run the specific spot
unconditionally?  I mean, if we have arbitrary function:
void foo (void) { something; bar (); something; }
then the call is unconditional in there, but there is no guarantee
somebody will not do
void baz (int x) { if (x > 20) foo (); }
and run foo only in a subset of the threads.


It's unclear to me what the .uni suffix even does on calls. Google finds 
this:


  http://divmap.wordpress.com/home/divopt/

which suggests that it says that the call's predicate will evaluate to 
the same value on all threads. So I think for an unconditional call 
instruction it's just meaningless.



Bernd



Re: [PATCH] Optimize ASAN_CHECK checks

2014-11-14 Thread Dodji Seketeli

Jakub Jelinek  writes:

> On Wed, Nov 12, 2014 at 02:09:59PM +0300, Yury Gribov wrote:
> > >For asan you're right, we can have addresses of decls there etc.
> > >If you have spare cycles, feel free to take over the patch and adjust it.
> > 
> > I guess I'd wait when this gets to trunk?
> 
> Ok, in that case I've bootstrapped/regtested on x86_64-linux/i686-linux what 
> I have with
> the ASAN_CHECK_NON_ZERO_LEN stuff removed from it (all non-INTEGER_CST
> lengths ignored).  Dodji, is this ok for trunk?

[...]

> +++ gcc/sanopt.c  2014-11-12 21:04:50.007325020 +0100
> 
>  /* This is used to carry information about basic blocks.  It is
> @@ -56,7 +57,29 @@ along with GCC; see the file COPYING3.
>  
>  struct sanopt_info
>  {

[...]

> +  /* True if there is a block with HAS_FREEING_CALL_P flag set
> + on any a path between imm(BB) and BB.  */

s/a//.

Also, I'd rather say "on any path between an immediate dominator of
BB, denoted imm(BB), and BB".  That way, subsequent uses of imm(BB)
makes sense more for the new comer.  This is only a suggestion.  If
you feel that formulation is obvious enough, then please ignore my
comment.

> +  bool imm_dom_path_with_freeing_call_p;

[...]

 };

[...]
 
> +/* Return true if there might be any call to free/munmap operation
> +   on any path in between DOM (which should be imm(BB)) and BB.  */
> +
> +static bool
> +imm_dom_path_with_freeing_call (basic_block bb, basic_block dom)
> +{

To ease maintainability, maybe we could assert that:

   gcc_assert (dom == get_immediate_dominator(CDI_DOMINATORS, bb));

?

And thus remove the assert that is at the caller site of this
function, later in maybe_optimize_asan_check_ifn:

> +   basic_block imm = get_immediate_dominator (CDI_DOMINATORS, last_bb);
> +   gcc_assert (imm);
> +   if (imm_dom_path_with_freeing_call (last_bb, imm))
> + break;


Also, when the 'dom' basic block is NULL, couldn't we just return
immediately?

[...]

> +}


[...]

> +/* Optimize away redundant ASAN_CHECK calls.  */
> +
> +static bool
> +maybe_optimize_asan_check_ifn (struct sanopt_ctx *ctx, gimple stmt)
> +{
> +  gcc_assert (gimple_call_num_args (stmt) == 4);
> +  tree ptr = gimple_call_arg (stmt, 1);
> +  tree len = gimple_call_arg (stmt, 2);
> +  basic_block bb = gimple_bb (stmt);
> +  sanopt_info *info = (sanopt_info *) bb->aux;
> +
> +  if (TREE_CODE (len) != INTEGER_CST)
> +return false;
> +  if (integer_zerop (len))
> +return false;
> +
> +  gimple_set_uid (stmt, info->freeing_call_events);

I am not sure, but I am wondering if we shouldn't save the previous uid
of 'stmt' here before setting it, and then restore it before getting out
of this function.

[...]

> +}
> +
>  /* Try to optimize away redundant UBSAN_NULL checks.
> 
> We walk blocks in the CFG via a depth first search of the dominator
> @@ -89,111 +402,77 @@ sanopt_optimize_walker (basic_block bb,

I think the comment of this sanopt_optimize_walker function should now
be adapted to say that it optimizes away redundant UBSAN_NULL *and*
ASAN_CHECK internal function calls.

>  {

[...]

>  }

[...]

OK with those changes.

Thanks.

-- 
Dodji


Re: system.h vs. C++ STL headers again

2014-11-14 Thread Bernd Schmidt

On 11/14/2014 12:03 PM, Jakub Jelinek wrote:

On Fri, Nov 14, 2014 at 11:57:57AM +0100, Richard Biener wrote:

?  There are also some comments about stdarg.h and stdio.h ordering,
dunno what it comes from and if it is still relevant when we require
C++ compiler.


I think we should simply discourage people from using sstream for
example.


That would be my preference too of course, unfortunately Bernd chose to use
it everywhere (grep '<<' nvptx.c').


I needed a way to output something but defer it to the end of the 
assembly file. sstream seemed like the tool for the job; if we go to the 
trouble of converting the compiler to C++ we might as well use the 
language as intended.


FWIW Cesar also ran into this problem and moving the sstream include 
first seems to solve it, so I'll be committing that fix.



Bernd



Re: libsanitizer merge from upstream r221802

2014-11-14 Thread Uros Bizjak
Hello!

> Here is one more merge of libsanitizer (last one was in Sept).
>
> Tested on x86_64 Ubuntu 14.04 like this:
> rm -rf */{*/,}libsanitizer && make -j 50
> make -j 40 -C gcc check-g{cc,++}
> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp' && \
> make -j 40 -C gcc check-g{cc,++}
> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} tsan.exp' && \
> make -j 40 -C gcc check
> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp' && \
> echo PASS
>
> Expected ChangeLog entry:
>
> 2014-11-12  Kostya Serebryany  
>
>* All source files: Merge from upstream r221802.
>* sanitizer_common/sanitizer_symbolizer_libbacktrace.cc
>  (LibbacktraceSymbolizer::SymbolizeData): replace 'address'
>  with 'start' to follow the new interface.
>* asan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>* interception/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>* libbacktrace/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>* lsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>* sanitizer_common/Makefile.am (sanitizer_common_files): Added new
>  files.
>  (AM_CXXFLAGS): added -std=c++11.
>* tsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>* ubsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>* asan/Makefile.in: Regenerate.
>* interception/Makefile.in: Regenerate.
>* libbacktrace/Makefile.in: Regenerate.
>* lsan/Makefile.in: Regenerate.
>* sanitizer_common/Makefile.in: Regenerate.
>* tsan/Makefile.in: Regenerate.
>* ubsan/Makefile.in: Regenerate.

This patch breaks CENTOS5 build with:

In file included from /usr/include/asm-x86_64/byteorder.h:30:0,
 from /usr/include/asm/byteorder.h:5,
 from /usr/include/linux/aio_abi.h:30,
 from
/home/uros/gcc-svn/trunk/libsanitizer/include/system/linux/aio_abi.h:2,
 from
../../../../gcc-svn/trunk/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:49:
/usr/include/linux/byteorder/little_endian.h:43:19: error: ‘__le64’
does not name a type
 static __inline__ __le64 __cpu_to_le64p(const __u64 *p)
   ^
/usr/include/linux/byteorder/little_endian.h:47:46: error: ‘__le64’
does not name a type
 static __inline__ __u64 __le64_to_cpup(const __le64 *p)
  ^
/usr/include/linux/byteorder/little_endian.h:67:19: error: ‘__be64’
does not name a type
 static __inline__ __be64 __cpu_to_be64p(const __u64 *p)
   ^
/usr/include/linux/byteorder/little_endian.h:71:46: error: ‘__be64’
does not name a type
 static __inline__ __u64 __be64_to_cpup(const __be64 *p)
  ^
gmake[4]: *** [sanitizer_platform_limits_linux.lo] Error 1
gmake[4]: Leaving directory
`/home/uros/gcc-build/x86_64-unknown-linux-gnu/libsanitizer/sanitizer_common'
gmake[3]: *** [all-recursive] Error 1
gmake[3]: Leaving directory
`/home/uros/gcc-build/x86_64-unknown-linux-gnu/libsanitizer'
gmake[2]: *** [all] Error 2
gmake[2]: Leaving directory
`/home/uros/gcc-build/x86_64-unknown-linux-gnu/libsanitizer'
gmake[1]: *** [all-target-libsanitizer] Error 2

Uros.


Re: [RFC: AArch64] Parametrically set defaults for function and jump alignment

2014-11-14 Thread Ramana Radhakrishnan
On Fri, Nov 14, 2014 at 11:11 AM, Marcus Shawcroft
 wrote:
> On 14 November 2014 10:50, James Greenhalgh  wrote:
>> On Fri, Nov 14, 2014 at 10:42:27AM +, Andrew Pinski wrote:
>>> On Fri, Nov 14, 2014 at 2:35 AM, James Greenhalgh
>>>  wrote:
>>> >
>>> > Hi,
>>> >
>>> > We currently do not set any interesting default values for jump and 
>>> > function
>>> > alignment in AArch64. I've made the formula for these values derive from
>>> > the issue rate of the processor as so:
>>> >
>>> >   jumps: 4 * processor issue-rate (rounded down to nearest power of two)
>>> >   functions: 4 * processor issue-rate (rounded up to nearest power of two)
>>> >
>>> > This is sensible for the ARMv8-a implementations I tested on. An
>>> > alternative patch would make these values new fields in the tuning
>>> > tables.
>>>
>>> I had submitted an alternative patch a few hours ago which allows the
>>> tuning structure say what alignment is wanted for all three:
>>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01615.html
>>
>> D'oh! I should have flicked through gcc-patches before hitting send!
>> I imagine I'm encoding similar logic to that you used when writing
>> this patch.
>>
>> I'm happy with either approach, so I'll leave it to the maintainers to
>> decide which they prefer.
>
> I think Andrews approach of making it adjustable per core makes sense.
> Andrew can you split the alignment part of your patch from the fusion
> part of your patch?

FWIW, I agree parameterizing this off issue width appears an interesting metric
but having a handle on these independently would probably be more
useful in the long run. And I think it would make sense for generic to have
8 bytes alignment where sensible.

I'm not sure about align-loops from benchmarking experiments in AArch32 land
a couple of years ago. IIRC it was very hard to stop the compiler from
just willy
nilly inserting too many nops. The situation in A64 may be different to this but
doesn't sound like it from James's experimentation so far.

Ramana

>
> Cheers
> /Marcus


Re: The nvptx port

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 12:09:03PM +0100, Bernd Schmidt wrote:
> >I have some questions about nvptx:
> >1) you've said that alloca isn't supported, but it seems
> >to be wired up and uses the %alloca documented in the PTX
> >manual, what is the issue with that?  %alloca not being actually
> >implemented by the current PTX assembler or translator?
> 
> Yes, it's unimplemented. There's an internal declaration for it but that
> seems to be as far as it goes, and that declaration is 32-bit only anyway.

:(.  Does NVidia plan to fix that in next version?

> >2) what is the reason why TLS isn't supported by the port (well,
> >__emutls is emitted, but I doubt pthread_[gs]etspecific is
> >implementable and thus it will not really do anything.
> >Can't the port just emit all DECL_THREAD_LOCAL_P variables
> >into .local instead of .global address space?
> 
> .local is stack frame memory, not TLS. The ptx docs mention the use of
> .local at file-scope as occurring only in "legacy" ptx code and I get the
> impression it's discouraged.

:(.  So what other option one has to implement something like TLS, even
using inline asm or similar?  There is %tid, so perhaps indexing some array
with %tid?  The trouble with that is that some thread can do
#pragma omp parallel again, and I bet the %tid afterwards would be
again 0-(n-1), and if it is an index into a global array, it wouldn't work
well then.  Maybe without anything like TLS we can't really support nested
parallelism, only one level of #pragma omp parallel inside of nvptx regions.
But, if we add support for #pragma omp team, we'd either need the array
in gang-local memory, or some other special register to give us gang id.

BTW, one can still invoke OpenMP target regions (even OpenACC regions) from
multiple host threads, so the question is how without local TLS we can
actually do anything at all.  Sure, we can pass parameters to the kernel,
but we'd need to propagate it through all functions.  Or can
cudaGetParameterBuffer be used for that?

> >4) I had a brief look at what it would take to port libgomp to PTX,
> >which is needed for OpenMP offloading.  OpenMP offloaded kernels
> >should start with 1 team and 1 thread in it, if we ignore
> >GOMP_teams for now, I think the major things are:
> >- right now libgomp is heavily pthread_* based, which is a no-go
> >  for nvptx I assume, I think we'll need some ifdefs in the sources
> 
> I haven't looked into whether libpthread is doable. I suspect it's a poor
> match. I also haven't really looked into OpenMP, so I'm feeling a bit
> uncertain about answering your further questions.

What OpenMP needs is essentially:
- some way to spawn multiple threads (fork-join model), where the parent
  thread is the first one among those other threads, or, if that isn't
  possible, the first thread pretends to be the same as the first thread
  and the parent thread sleeps
- something like pthread_mutex_lock/unlock (only basic; or say atomic ops + 
futex
  we use for Linux)
- something like sem_* semaphore
- and some TLS or something similar (pthread_[gs]etspecific etc.)

> >- the main thing is that I believe we just have to replace
> >  gomp_team_start for nvptx; seems there are
> >  cudaLaunchDevice (and cudaGetParameterBuffer) functions one can use
> >  to spawn selected kernel in selected number of threads (and teams),
> >  from the docs it isn't exactly clear what the calling thread will do,
> >  if it is suspended and the HW core given to it is reused by something
> >  else (e.g. one of the newly spawned threads), then I think it should
> >  be usable.  Not sure what happens with .local memory of the parent
> >  task, if the children all have different .local memory, then
> >  perhaps one could just copy over what is needed from the
> >  invoking to the first invoked thread at start.
> 
> I'm a bit confused here, it sounds as if you want to call cudaLaunchDevice
> from ptx code? These are called from the host. As mentioned above, .local is
> probably not useful for what you want.

In CUDA_Dynamic_Parallelism_Programming_Guide.pdf in C.3.2 it is mentioned
it should be possible, there is:
.extern .func(.param .b32 func_retval0) cudaLaunchDevice
(
.param .b64 func,
.param .b64 parameterBuffer,
.param .align 4 .b8 gridDimension[12],
.param .align 4 .b8 blockDimension[12],
.param .b32 sharedMemSize,
.param .b64 stream
)
;
(or s/.b64/.b32/ for -m32) that should be usable from within PTX.
The Liao-OpenMP-Accelerator-Model-2013.pdf paper also mentions using dynamic
parallelism (because all other variants are just bad for OpenMP, you'd need
to preallocate all the gangs/threads (without knowing how many you'll need),
and perhaps let them sleep on some barrier until you have work for them.

> >- is it worth to reuse cudaLaunchDevice "threads" or are they cheap
> >  enough to start that any "thread" pooling should be removed for nvptx?
> 
> Sorry, 

Re: system.h vs. C++ STL headers again

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 12:27:52PM +0100, Bernd Schmidt wrote:
> On 11/14/2014 12:03 PM, Jakub Jelinek wrote:
> >On Fri, Nov 14, 2014 at 11:57:57AM +0100, Richard Biener wrote:
> >>>?  There are also some comments about stdarg.h and stdio.h ordering,
> >>>dunno what it comes from and if it is still relevant when we require
> >>>C++ compiler.
> >>
> >>I think we should simply discourage people from using sstream for
> >>example.
> >
> >That would be my preference too of course, unfortunately Bernd chose to use
> >it everywhere (grep '<<' nvptx.c').
> 
> I needed a way to output something but defer it to the end of the assembly
> file. sstream seemed like the tool for the job; if we go to the trouble of
> converting the compiler to C++ we might as well use the language as
> intended.
> 
> FWIW Cesar also ran into this problem and moving the sstream include first
> seems to solve it, so I'll be committing that fix.

Please include it after config.h though.

Jakub


Re: [PATCH] Peg down -(-A) -> A transformation

2014-11-14 Thread Marek Polacek
On Wed, Nov 12, 2014 at 11:53:19AM +0100, Richard Biener wrote:
> Err - please adjust fold_negate_expr instead.

Like this?

(It's not best that for -trapv/-fsanitize=s-i-o we don't emit
compile-time warning "integer overflow in expression" for -INT_MIN,
because the warning relies on the folding.)

Bootstrapped/regtested on power8-linux.

2014-11-14  Marek Polacek  

* fold-const.c (fold_negate_expr): Don't fold INTEGER_CST if
that overflows when SANITIZE_SI_OVERFLOW is on.  Guard -(-A)
folding with TYPE_OVERFLOW_SANITIZED.

* c-c++-common/ubsan/overflow-negate-3.c: New test.

diff --git gcc/fold-const.c gcc/fold-const.c
index ee9ed7b..8994aa4 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -555,7 +555,8 @@ fold_negate_expr (location_t loc, tree t)
 case INTEGER_CST:
   tem = fold_negate_const (t, type);
   if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
- || !TYPE_OVERFLOW_TRAPS (type))
+ || (!TYPE_OVERFLOW_TRAPS (type)
+ && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0))
return tem;
   break;
 
@@ -612,7 +613,9 @@ fold_negate_expr (location_t loc, tree t)
   break;
 
 case NEGATE_EXPR:
-  return TREE_OPERAND (t, 0);
+  if (!TYPE_OVERFLOW_SANITIZED (type))
+   return TREE_OPERAND (t, 0);
+  break;
 
 case PLUS_EXPR:
   if (!HONOR_SIGN_DEPENDENT_ROUNDING (TYPE_MODE (type))
diff --git gcc/testsuite/c-c++-common/ubsan/overflow-negate-3.c 
gcc/testsuite/c-c++-common/ubsan/overflow-negate-3.c
index e69de29..e6db394 100644
--- gcc/testsuite/c-c++-common/ubsan/overflow-negate-3.c
+++ gcc/testsuite/c-c++-common/ubsan/overflow-negate-3.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=signed-integer-overflow" } */
+
+#define INT_MIN (-__INT_MAX__ - 1)
+
+int
+main ()
+{
+  int x = INT_MIN;
+  int y;
+  asm ("" : "+g" (x));
+  y = -(-x);
+  asm ("" : "+g" (y));
+  y = -(-INT_MIN);
+  asm ("" : "+g" (y));
+}
+
+/* { dg-output "negation of -2147483648 cannot be represented in type 
'int'\[^\n\r]*; cast to an unsigned type to negate this value to 
itself\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*negation of -2147483648 cannot be represented in type 
'int'\[^\n\r]*; cast to an unsigned type to negate this value to 
itself\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*negation of -2147483648 cannot be represented in type 
'int'\[^\n\r]*; cast to an unsigned type to negate this value to 
itself\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*negation of -2147483648 cannot be represented in type 
'int'\[^\n\r]*; cast to an unsigned type to negate this value to 
itself\[^\n\r]*(\n|\r\n|\r)" } */

Marek


Re: [PATCH] Peg down -(-A) -> A transformation

2014-11-14 Thread Richard Biener
On Fri, 14 Nov 2014, Marek Polacek wrote:

> On Wed, Nov 12, 2014 at 11:53:19AM +0100, Richard Biener wrote:
> > Err - please adjust fold_negate_expr instead.
> 
> Like this?
> 
> (It's not best that for -trapv/-fsanitize=s-i-o we don't emit
> compile-time warning "integer overflow in expression" for -INT_MIN,
> because the warning relies on the folding.)

Well - the warning implementation is clearly bougs then ;)

> Bootstrapped/regtested on power8-linux.

Ok.

Thanks,
Richard.

> 2014-11-14  Marek Polacek  
> 
>   * fold-const.c (fold_negate_expr): Don't fold INTEGER_CST if
>   that overflows when SANITIZE_SI_OVERFLOW is on.  Guard -(-A)
>   folding with TYPE_OVERFLOW_SANITIZED.
> 
>   * c-c++-common/ubsan/overflow-negate-3.c: New test.
> 
> diff --git gcc/fold-const.c gcc/fold-const.c
> index ee9ed7b..8994aa4 100644
> --- gcc/fold-const.c
> +++ gcc/fold-const.c
> @@ -555,7 +555,8 @@ fold_negate_expr (location_t loc, tree t)
>  case INTEGER_CST:
>tem = fold_negate_const (t, type);
>if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
> -   || !TYPE_OVERFLOW_TRAPS (type))
> +   || (!TYPE_OVERFLOW_TRAPS (type)
> +   && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0))
>   return tem;
>break;
>  
> @@ -612,7 +613,9 @@ fold_negate_expr (location_t loc, tree t)
>break;
>  
>  case NEGATE_EXPR:
> -  return TREE_OPERAND (t, 0);
> +  if (!TYPE_OVERFLOW_SANITIZED (type))
> + return TREE_OPERAND (t, 0);
> +  break;
>  
>  case PLUS_EXPR:
>if (!HONOR_SIGN_DEPENDENT_ROUNDING (TYPE_MODE (type))
> diff --git gcc/testsuite/c-c++-common/ubsan/overflow-negate-3.c 
> gcc/testsuite/c-c++-common/ubsan/overflow-negate-3.c
> index e69de29..e6db394 100644
> --- gcc/testsuite/c-c++-common/ubsan/overflow-negate-3.c
> +++ gcc/testsuite/c-c++-common/ubsan/overflow-negate-3.c
> @@ -0,0 +1,21 @@
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=signed-integer-overflow" } */
> +
> +#define INT_MIN (-__INT_MAX__ - 1)
> +
> +int
> +main ()
> +{
> +  int x = INT_MIN;
> +  int y;
> +  asm ("" : "+g" (x));
> +  y = -(-x);
> +  asm ("" : "+g" (y));
> +  y = -(-INT_MIN);
> +  asm ("" : "+g" (y));
> +}
> +
> +/* { dg-output "negation of -2147483648 cannot be represented in type 
> 'int'\[^\n\r]*; cast to an unsigned type to negate this value to 
> itself\[^\n\r]*(\n|\r\n|\r)" } */
> +/* { dg-output "\[^\n\r]*negation of -2147483648 cannot be represented in 
> type 'int'\[^\n\r]*; cast to an unsigned type to negate this value to 
> itself\[^\n\r]*(\n|\r\n|\r)" } */
> +/* { dg-output "\[^\n\r]*negation of -2147483648 cannot be represented in 
> type 'int'\[^\n\r]*; cast to an unsigned type to negate this value to 
> itself\[^\n\r]*(\n|\r\n|\r)" } */
> +/* { dg-output "\[^\n\r]*negation of -2147483648 cannot be represented in 
> type 'int'\[^\n\r]*; cast to an unsigned type to negate this value to 
> itself\[^\n\r]*(\n|\r\n|\r)" } */
> 
>   Marek
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


Re: [PATCH] Optimize ASAN_CHECK checks

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 12:25:37PM +0100, Dodji Seketeli wrote:
> > +  /* True if there is a block with HAS_FREEING_CALL_P flag set
> > + on any a path between imm(BB) and BB.  */
> 
> s/a//.
> 
> Also, I'd rather say "on any path between an immediate dominator of
> BB, denoted imm(BB), and BB".  That way, subsequent uses of imm(BB)
> makes sense more for the new comer.  This is only a suggestion.  If
> you feel that formulation is obvious enough, then please ignore my
> comment.

Ok.

> > +/* Return true if there might be any call to free/munmap operation
> > +   on any path in between DOM (which should be imm(BB)) and BB.  */
> > +
> > +static bool
> > +imm_dom_path_with_freeing_call (basic_block bb, basic_block dom)
> > +{
> 
> To ease maintainability, maybe we could assert that:
> 
>gcc_assert (dom == get_immediate_dominator(CDI_DOMINATORS, bb));

Well, that would make the dom argument useless, the point of passing it
around was that because we have to call it in the caller already,
there is no point in calling it again the the callee.
So if we as well call it (most people don't use --disable-checking),
we could just drop the argument.

> ?
> 
> And thus remove the assert that is at the caller site of this
> function, later in maybe_optimize_asan_check_ifn:
> 
> > + basic_block imm = get_immediate_dominator (CDI_DOMINATORS, last_bb);
> > + gcc_assert (imm);

The assert is very much intentional here, want to double-check the algorithm
that we indeed always reach gbb through the imm dominator walk (unless
bailing out early, but if we wouldn't, we'd reach it).

> > + if (imm_dom_path_with_freeing_call (last_bb, imm))
> > +   break;
> 
> 
> Also, when the 'dom' basic block is NULL, couldn't we just return
> immediately?

get_immediate_dominator (CDI_DOMINATORS, ) should return NULL
just for the ENTRY block, it would be a bug to reach that bb.

> > +/* Optimize away redundant ASAN_CHECK calls.  */
> > +
> > +static bool
> > +maybe_optimize_asan_check_ifn (struct sanopt_ctx *ctx, gimple stmt)
> > +{
> > +  gcc_assert (gimple_call_num_args (stmt) == 4);
> > +  tree ptr = gimple_call_arg (stmt, 1);
> > +  tree len = gimple_call_arg (stmt, 2);
> > +  basic_block bb = gimple_bb (stmt);
> > +  sanopt_info *info = (sanopt_info *) bb->aux;
> > +
> > +  if (TREE_CODE (len) != INTEGER_CST)
> > +return false;
> > +  if (integer_zerop (len))
> > +return false;
> > +
> > +  gimple_set_uid (stmt, info->freeing_call_events);
> 
> I am not sure, but I am wondering if we shouldn't save the previous uid
> of 'stmt' here before setting it, and then restore it before getting out
> of this function.

No, gimple uids are AFAIK undefined at the start of passes, passes that use
them are supposed to initialize them before use (new statements created
during the pass will get 0 there by default), and don't have to clean them
up anyway at the end of pass.

> > @@ -89,111 +402,77 @@ sanopt_optimize_walker (basic_block bb,
> 
> I think the comment of this sanopt_optimize_walker function should now
> be adapted to say that it optimizes away redundant UBSAN_NULL *and*
> ASAN_CHECK internal function calls.

Ok, will do.

Jakub


[gomp4] Merge trunk r217483 (2014-11-13) into gomp-4_0-branch

2014-11-14 Thread Thomas Schwinge
Hi!

In r217554, I have committed a merge from trunk r217483 (2014-11-13) into
gomp-4_0-branch.

This is just before the first "offloading" commit in trunk, so this
revision of gomp-4_0-branch can now serve as a basis for merging the
gomp-4_0-branch changes with the offloading changes, which are the
commits following on trunk after r217483.  Cesar, you've already been
working on such a merge -- I'll have a look at the patch you sent me,
thanks!


Grüße,
 Thomas


pgpprE2OKkLNC.pgp
Description: PGP signature


Re: [Patch ARM-AArch64/testsuite] More Neon intrinsics executable tests

2014-11-14 Thread Christophe Lyon
On 14 November 2014 12:17, Marcus Shawcroft  wrote:
> On 12 November 2014 13:11, Christophe Lyon  wrote:
>> Hi,
>>
>> The attached patch adds a few more tests to the recently added
>> advsimd-intrinsics series.
>>
>> OK for trunk?
>>
>> Christophe.
>>
>> 2014-11-12  Christophe Lyon  
>>
>> * gcc.target/aarch64/advsimd-intrinsics/vcls.c: New test.
>> * gcc.target/aarch64/advsimd-intrinsics/vcnt.c: New test.
>> * gcc.target/aarch64/advsimd-intrinsics/vcombine.c: New test.
>> * gcc.target/aarch64/advsimd-intrinsics/vcreate.c: New test.
>> * gcc.target/aarch64/advsimd-intrinsics/vcvt.c: New test.
>> * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: New test.
>> * gcc.target/aarch64/advsimd-intrinsics/vext.c: New test.
>> * gcc.target/aarch64/advsimd-intrinsics/vget_high.c: New test.
>> * gcc.target/aarch64/advsimd-intrinsics/vget_low.c: New test.
>
> Do they pass for both BE and LE ?

Yes.
I tested on
aarch64_be-none-elf
aarch64-none-elf
using the Foundation Model, and on
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-linux-gnueabi
armeb-none-linux-gnueabi
arm-none-eabi
using qemu.

Christophe.


[patch] Fix mangling of ABI-tagged std::wstring

2014-11-14 Thread Jonathan Wakely

Jason approved this yesterday on IRC.

Tested powerpc64-linux, committed to trunk.
commit 9a8efcc0e2068abb51aaf513b4a7ed262454cd1e
Author: Jonathan Wakely 
Date:   Fri Nov 14 11:00:24 2014 +

gcc/cp:
	* mangle.c (find_substitution): Look for abi_tag on class templates.

gcc/testsuite:
	* g++.dg/abi/abi-tag11.C: New.

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 048c957..576ad1d 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -609,7 +609,7 @@ find_substitution (tree node)
 }
 
   tree tags = NULL_TREE;
-  if (OVERLOAD_TYPE_P (node))
+  if (OVERLOAD_TYPE_P (node) || DECL_CLASS_TEMPLATE_P (node))
 tags = lookup_attribute ("abi_tag", TYPE_ATTRIBUTES (type));
   /* Now check the list of available substitutions for this mangling
  operation.  */
diff --git a/gcc/testsuite/g++.dg/abi/abi-tag11.C b/gcc/testsuite/g++.dg/abi/abi-tag11.C
new file mode 100644
index 000..36c1c9f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/abi-tag11.C
@@ -0,0 +1,12 @@
+// { dg-final { scan-assembler "_Z1fSbB3fooIwSt11char_traitsIwESaIwEES3_" } }
+
+namespace std {
+  template  struct char_traits {};
+  template  struct allocator {};
+  template 
+  struct __attribute ((abi_tag ("foo"))) basic_string { };
+  typedef basic_string,allocator >
+wstring;
+}
+
+void f(std::wstring,std::wstring) {}


Re: [PATCH] Optimize ASAN_CHECK checks

2014-11-14 Thread Dodji Seketeli
Jakub Jelinek  writes:

>> I am not sure, but I am wondering if we shouldn't save the previous uid
>> of 'stmt' here before setting it, and then restore it before getting out
>> of this function.
>
> No, gimple uids are AFAIK undefined at the start of passes, passes that use
> them are supposed to initialize them before use (new statements created
> during the pass will get 0 there by default), and don't have to clean them
> up anyway at the end of pass.

Yeah, this is what I figured by grepping other passes, but I wasn't sure
:-)

Maybe I should follow up with a doc patch for the (otherwise very terse)
comment of gimple_set_uid and gimple_uid accessors.

Thanks.

-- 
Dodji


Re: [PATCH] Optimize ASAN_CHECK checks

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 01:06:52PM +0100, Dodji Seketeli wrote:
> Jakub Jelinek  writes:
> 
> >> I am not sure, but I am wondering if we shouldn't save the previous uid
> >> of 'stmt' here before setting it, and then restore it before getting out
> >> of this function.
> >
> > No, gimple uids are AFAIK undefined at the start of passes, passes that use
> > them are supposed to initialize them before use (new statements created
> > during the pass will get 0 there by default), and don't have to clean them
> > up anyway at the end of pass.
> 
> Yeah, this is what I figured by grepping other passes, but I wasn't sure
> :-)
> 
> Maybe I should follow up with a doc patch for the (otherwise very terse)
> comment of gimple_set_uid and gimple_uid accessors.

That would be indeed nice (similarly for other stuff that we expect to be
undefined on pass boundaries, or expect to be in certain state at pass
boundaries; in the former case set before uses, and don't care about what
state we leave it in, in the latter case can assume some state first (say 0)
and have to put it back into the same state).  There are various visited
flags and the like, Richard, any ideas what other things might be nice to
document?

Jakub


[changes.html] Document -fdiagnostics-color= default changes

2014-11-14 Thread Manuel López-Ibáñez
>
>  C family
>
> +The -fdiagnostics-color= option default is now
> +configurable at GCC configury time using
> +--with-diagnostics-color=, can default to
> +auto - the new default unless configured otherwise,
> +where diagnostics is colorized by default when emitted to terminal,
> +never, always or auto-if-env,
> +which is the default of GCC 4.9 - auto if non-empty
> +GCC_COLORS is in the environment, never
> +otherwise.  Note, as before, having empty GCC_COLORS
> +variable in the environment will always turn the coloring off, no
> +matter what the default is or what command line options are used.

This not only affects 'C family' but also Fortran now, so perhaps it
should go in the Caveats section at the top or on a "Building GCC"
section. Apart from that, do you think the following is a bit clearer?

The default setting of the -fdiagnostics-color= option is now
configurable https://gcc.gnu.org/install/configure.html";>when building GCC
using configuration option --with-diagnostics-color=. The
possible values are:
never, always, auto and
auto-if-env.
The new default auto means to use color only when the
standard error is a terminal.
The default in GCC 4.9 was auto-if-env, which defaults to
auto if there is a non-empty GCC_COLORS
environment
variable, and never otherwise. As in GCC 4.9, an empty
GCC_COLORS variable in the environment will always
disable colors, no
matter what the default is or what command line options are used.

Cheers,

Manuel.


Re: The nvptx port

2014-11-14 Thread Bernd Schmidt
I'm adding Thomas and Cesar to the Cc list, they may have more insight 
into CUDA library questions as I haven't really looked into that part 
all that much.


On 11/14/2014 12:39 PM, Jakub Jelinek wrote:

On Fri, Nov 14, 2014 at 12:09:03PM +0100, Bernd Schmidt wrote:

I have some questions about nvptx:
1) you've said that alloca isn't supported, but it seems
to be wired up and uses the %alloca documented in the PTX
manual, what is the issue with that?  %alloca not being actually
implemented by the current PTX assembler or translator?


Yes, it's unimplemented. There's an internal declaration for it but that
seems to be as far as it goes, and that declaration is 32-bit only anyway.


:(.  Does NVidia plan to fix that in next version?


I very much doubt it. It was like this in CUDA 5.0 when we started 
working on it, and it's still like this in CUDA 6.5.



2) what is the reason why TLS isn't supported by the port (well,
__emutls is emitted, but I doubt pthread_[gs]etspecific is
implementable and thus it will not really do anything.
Can't the port just emit all DECL_THREAD_LOCAL_P variables
into .local instead of .global address space?


.local is stack frame memory, not TLS. The ptx docs mention the use of
.local at file-scope as occurring only in "legacy" ptx code and I get the
impression it's discouraged.


:(.  So what other option one has to implement something like TLS, even
using inline asm or similar?  There is %tid, so perhaps indexing some array
with %tid?


That ought to work. For performance you'd want that array in .shared 
memory but I believe that's limited in size.



BTW, one can still invoke OpenMP target regions (even OpenACC regions) from
multiple host threads, so the question is how without local TLS we can
actually do anything at all.  Sure, we can pass parameters to the kernel,
but we'd need to propagate it through all functions.  Or can
cudaGetParameterBuffer be used for that?


Presumably a kernel could copy its arguments out to memory somewhere 
when it's called?



4) I had a brief look at what it would take to port libgomp to PTX,
which is needed for OpenMP offloading.  OpenMP offloaded kernels
should start with 1 team and 1 thread in it, if we ignore
GOMP_teams for now, I think the major things are:
- right now libgomp is heavily pthread_* based, which is a no-go
  for nvptx I assume, I think we'll need some ifdefs in the sources


I haven't looked into whether libpthread is doable. I suspect it's a poor
match. I also haven't really looked into OpenMP, so I'm feeling a bit
uncertain about answering your further questions.


What OpenMP needs is essentially:
- some way to spawn multiple threads (fork-join model), where the parent
   thread is the first one among those other threads, or, if that isn't
   possible, the first thread pretends to be the same as the first thread
   and the parent thread sleeps
- something like pthread_mutex_lock/unlock (only basic; or say atomic ops + 
futex
   we use for Linux)
- something like sem_* semaphore
- and some TLS or something similar (pthread_[gs]etspecific etc.)


- the main thing is that I believe we just have to replace
  gomp_team_start for nvptx; seems there are
  cudaLaunchDevice (and cudaGetParameterBuffer) functions one can use
  to spawn selected kernel in selected number of threads (and teams),
  from the docs it isn't exactly clear what the calling thread will do,
  if it is suspended and the HW core given to it is reused by something
  else (e.g. one of the newly spawned threads), then I think it should
  be usable.  Not sure what happens with .local memory of the parent
  task, if the children all have different .local memory, then
  perhaps one could just copy over what is needed from the
  invoking to the first invoked thread at start.


I'm a bit confused here, it sounds as if you want to call cudaLaunchDevice
from ptx code? These are called from the host. As mentioned above, .local is
probably not useful for what you want.


In CUDA_Dynamic_Parallelism_Programming_Guide.pdf in C.3.2 it is mentioned
it should be possible, there is:
.extern .func(.param .b32 func_retval0) cudaLaunchDevice
(
.param .b64 func,
.param .b64 parameterBuffer,
.param .align 4 .b8 gridDimension[12],
.param .align 4 .b8 blockDimension[12],
.param .b32 sharedMemSize,
.param .b64 stream
)
;
(or s/.b64/.b32/ for -m32) that should be usable from within PTX.
The Liao-OpenMP-Accelerator-Model-2013.pdf paper also mentions using dynamic
parallelism (because all other variants are just bad for OpenMP, you'd need
to preallocate all the gangs/threads (without knowing how many you'll need),
and perhaps let them sleep on some barrier until you have work for them.


The latter would have been essentially the model I'd have tried to use 
(instead of sleeping, conditionalize on %tid==0). I didn't know there 
was a way to launch kernels from ptx code and haven't thought about

Re: [changes.html] Document -fdiagnostics-color= default changes

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 01:14:02PM +0100, Manuel López-Ibáñez wrote:
> >
> >  C family
> >
> > +The -fdiagnostics-color= option default is now
> > +configurable at GCC configury time using
> > +--with-diagnostics-color=, can default to
> > +auto - the new default unless configured otherwise,
> > +where diagnostics is colorized by default when emitted to terminal,
> > +never, always or auto-if-env,
> > +which is the default of GCC 4.9 - auto if non-empty
> > +GCC_COLORS is in the environment, never
> > +otherwise.  Note, as before, having empty GCC_COLORS
> > +variable in the environment will always turn the coloring off, no
> > +matter what the default is or what command line options are used.
> 
> This not only affects 'C family' but also Fortran now, so perhaps it
> should go in the Caveats section at the top or on a "Building GCC"

Well, it doesn't look like something for Caveats section IMHO, it is an
extension of the previous behavior.  The reason I put it into the C family
part is that that is where it was documented for 4.9, and for Fortran IMHO
gcc-5/changes.html just should document that now it supports diagnostics
colors like the C family of frontends.

> section. Apart from that, do you think the following is a bit clearer?
> 
> The default setting of the -fdiagnostics-color= option is now
> configurable  ="https://gcc.gnu.org/install/configure.html";>when building GCC
> using configuration option --with-diagnostics-color=. The
> possible values are:
> never, always, auto and
> auto-if-env.
> The new default auto means to use color only when the
> standard error is a terminal.
> The default in GCC 4.9 was auto-if-env, which defaults to
> auto if there is a non-empty GCC_COLORS
> environment
> variable, and never otherwise. As in GCC 4.9, an empty
> GCC_COLORS variable in the environment will always
> disable colors, no
> matter what the default is or what command line options are used.

Indeed, it does.  So feel free to turn that into patch form.

Jakub


[patch] Define new std::ios_base::failure with abi_tag("cxx11")

2014-11-14 Thread Jonathan Wakely

This adds system_error support to iostreams, including the required
base class changes to std::ios_base::failure. The abi_tag is used to
make it a distinct type.  This changes the type of I/O exceptions
thrown by the library but exceptions are very rarely used with
iostreams.

Tested powerpc64-linux and x86_64-linux
commit 8f8279579e72423450eb3ff744d9102f7b891d8d
Author: Jonathan Wakely 
Date:   Thu Nov 13 19:30:15 2014 +

Define C++11 version of std::ios_base::failure.

	* config/abi/pre/gnu.ver: Add new exports.
	* include/bits/ios_base.h (ios_base::failure): New definition using
	abi_tag.
	(io_errc, make_error_code, make_error_category, iostream_category):
	Define.
	* include/std/system_error (system_error): Add char* constructors.
	* src/c++11/Makefile.am: Add new file.
	* src/c++11/Makefile.in: Regenerate.
	* src/c++11/cxx11-ios_failure.cc: New file.
	* src/c++98/ios_failure.cc: Compile old definition without abi_tag.
	* testsuite/27_io/ios_base/failure/cxx11.cc: New.
	* testsuite/27_io/ios_base/failure/what-1.cc: Allow string returned by
	ios_base::failure::what() to contain additional data.
	* testsuite/27_io/ios_base/failure/what-2.cc: Likewise..
	* testsuite/27_io/ios_base/failure/what-3.cc: Likewise..
	* testsuite/27_io/ios_base/failure/what-big.cc: Likewise..

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index bd44bcc..78f3e77 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1473,6 +1473,18 @@ GLIBCXX_3.4.21 {
 # std::basic_ios::operator bool() const
 _ZNKSt9basic_iosI[cw]St11char_traitsI[cw]EEcvbEv;
 
+# C++11 version of std::ios_base::failure
+_ZNKSt8ios_base7failureB5cxx114whatEv;
+_ZNSt8ios_base7failureB5cxx11C[12]ERKSs;
+_ZNSt8ios_base7failureB5cxx11C[12]EPKcRKSt10error_code;
+_ZNSt8ios_base7failureB5cxx11C[12]ERKSsB5cxx11;
+_ZNSt8ios_base7failureB5cxx11C[12]ERKSsB5cxx11RKSt10error_code;
+_ZNSt8ios_base7failureB5cxx11D[012]Ev;
+_ZTINSt8ios_base7failureB5cxx11E;
+_ZTSNSt8ios_base7failureB5cxx11E;
+_ZTVNSt8ios_base7failureB5cxx11E;
+_ZSt17iostream_categoryv;
+
 # std::ctype_base::blank
 _ZNSt10ctype_base5blankE;
 
diff --git a/libstdc++-v3/include/bits/ios_base.h b/libstdc++-v3/include/bits/ios_base.h
index 5e33b81..8e60059 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -40,6 +40,12 @@
 #include 
 #include 
 
+#if __cplusplus < 201103L
+# include 
+#else
+# include 
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -186,6 +192,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _S_ios_seekdir_end = 1L << 16 
 };
 
+#if __cplusplus >= 201103L
+  /// I/O error code
+  enum class io_errc { stream = 1 };
+
+  template <> struct is_error_code_enum : public true_type { };
+
+  const error_category& iostream_category() noexcept;
+
+  inline error_code
+  make_error_code(io_errc e) noexcept
+  { return error_code(static_cast(e), iostream_category()); }
+
+  inline error_condition
+  make_error_condition(io_errc e) noexcept
+  { return error_condition(static_cast(e), iostream_category()); }
+#endif
+
   // 27.4.2  Class ios_base
   /**
*  @brief  The base of the I/O class hierarchy.
@@ -198,6 +221,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   */
   class ios_base
   {
+#if _GLIBCXX_USE_CXX11_ABI
+#if __cplusplus < 201103L
+// Type that is layout-compatible with std::system_error
+struct system_error : std::runtime_error
+{
+  // Type that is layout-compatible with std::error_code
+  struct error_code
+  {
+	error_code() { }
+  private:
+	int		_M_value;
+	const void*	_M_cat;
+  } _M_code;
+};
+#endif
+#endif
   public:
 
 /** 
@@ -206,6 +245,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  *
  *  27.4.2.1.1  Class ios_base::failure
  */
+#if _GLIBCXX_USE_CXX11_ABI
+class _GLIBCXX_ABI_TAG_CXX11 failure : public system_error
+{
+public:
+  explicit
+  failure(const string& __str);
+
+#if __cplusplus >= 201103L
+  explicit
+  failure(const string&, const error_code&);
+
+  explicit
+  failure(const char*, const error_code& = io_errc::stream);
+#endif
+
+  virtual
+  ~failure() throw();
+
+  virtual const char*
+  what() const throw();
+};
+#else
 class failure : public exception
 {
 public:
@@ -225,6 +286,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   string _M_msg;
 };
+#endif
 
 // 27.4.2.1.2  Type ios_base::fmtflags
 /**
diff --git a/libstdc++-v3/include/std/system_error b/libstdc++-v3/include/std/system_error
index 4ec83d7..ed17f55 100644
--- a/libstdc++-v3/include/std/system_error
+++ b/libstdc++-v3/include/std/system_error
@@ -321,16 +321,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 system_error(error_code __ec, const string& __what)
 : runtime_error(__

Re: [PATCH] Optimize ASAN_CHECK checks

2014-11-14 Thread Richard Biener
On Fri, 14 Nov 2014, Jakub Jelinek wrote:

> On Fri, Nov 14, 2014 at 01:06:52PM +0100, Dodji Seketeli wrote:
> > Jakub Jelinek  writes:
> > 
> > >> I am not sure, but I am wondering if we shouldn't save the previous uid
> > >> of 'stmt' here before setting it, and then restore it before getting out
> > >> of this function.
> > >
> > > No, gimple uids are AFAIK undefined at the start of passes, passes that 
> > > use
> > > them are supposed to initialize them before use (new statements created
> > > during the pass will get 0 there by default), and don't have to clean them
> > > up anyway at the end of pass.
> > 
> > Yeah, this is what I figured by grepping other passes, but I wasn't sure
> > :-)
> > 
> > Maybe I should follow up with a doc patch for the (otherwise very terse)
> > comment of gimple_set_uid and gimple_uid accessors.
> 
> That would be indeed nice (similarly for other stuff that we expect to be
> undefined on pass boundaries, or expect to be in certain state at pass
> boundaries; in the former case set before uses, and don't care about what
> state we leave it in, in the latter case can assume some state first (say 0)
> and have to put it back into the same state).  There are various visited
> flags and the like, Richard, any ideas what other things might be nice to
> document?

aux pointers on CFG structures which I believe have to be cleared
after each pass that uses them (maybe already documented).

Nothing else off the top of my head.

Richard.



[PATCH] Fix patch mangling with --inline option in mklog

2014-11-14 Thread Tom de Vries

Diego,

I noticed that a patch processed with mklog --inline got mangled.

In mklog, first we read the .diff file into array diff_lines.  Then, in the case 
of --inline, at the end we expect diff_lines still to contain the .diff file. 
That's not the case however, and that causes the mangling.


The patch fixes this by copying the diff_lines before processing, and using the 
copy at the end to reproduce the .diff file.


Committed as obvious.

Thanks,
- Tom
2014-11-14  Tom de Vries  

	* mklog: Move reading of .diff file up and add comment.  Copy diff_lines
	to orig_diff_lines.  Use orig_diff_lines when appending patch.
---
 contrib/mklog | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/contrib/mklog b/contrib/mklog
index 8412d38..840f6f8 100755
--- a/contrib/mklog
+++ b/contrib/mklog
@@ -132,15 +132,23 @@ sub is_top_level {
 	return $function && $function !~ /^[\s{]/;
 }
 
+# Read contents of .diff file
+open (DFILE, $diff) or die "Could not open file $diff for reading";
+chomp (my @diff_lines = );
+close (DFILE);
+
+# Array diff_lines is modified by the log generation, so save a copy in
+# orig_diff_lines if needed.
+if ($inline) {
+@orig_diff_lines = @diff_lines;
+}
+
 # For every file in the .diff print all the function names in ChangeLog
 # format.
 %cl_entries = ();
 $change_msg = undef;
 $look_for_funs = 0;
 $clname = get_clname('');
-open (DFILE, $diff) or die "Could not open file $diff for reading";
-chomp (my @diff_lines = );
-close (DFILE);
 $line_idx = 0;
 foreach (@diff_lines) {
 # Stop processing functions if we found a new file.
@@ -313,7 +321,7 @@ foreach my $clname (keys %cl_entries) {
 
 if ($inline) {
 	# Append the patch to the log
-	foreach (@diff_lines) {
+	foreach (@orig_diff_lines) {
 		print OUTPUTFILE "$_\n";
 	}
 }
-- 
1.9.1



Re: [PATCH 2/9] New template fibonacci_heap class introduced.

2014-11-14 Thread Florian Weimer

On 11/13/2014 09:06 PM, mliska wrote:

+   Insert: O(2) amortized. O(1) actual.


This does not make much sense.  Typo?

--
Florian Weimer / Red Hat Product Security


Re: [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR

2014-11-14 Thread Alan Lawrence
Ah, I didn't realize Loongson was little-endian only. In that case (with mid-end 
reductions-via-shifts changes pushed) I don't think I have actually broken 
anything, or at least, no MIPS platform that exists :).


However, yes, that would seem a safe bet (and simpler than my linked patch that 
provided a BE version too!).


Cheers, Alan

Matthew Fortune wrote:

(for MIPS) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html,
although I have not been able to test this as there doesn't seem to be
any working MIPS/Loongson hardware in the Compile Farm;


I will post a patch to remove vec_shl and only support vec_shr for
little endian. This is on the basis that loongson only supports
little endian anyway.

I believe this is a safe thing to do regardless of whether your change
is in place or not.

Matthew






Re: [PATCH] Optimize ASAN_CHECK checks

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 01:16:36PM +0100, Richard Biener wrote:
> > That would be indeed nice (similarly for other stuff that we expect to be
> > undefined on pass boundaries, or expect to be in certain state at pass
> > boundaries; in the former case set before uses, and don't care about what
> > state we leave it in, in the latter case can assume some state first (say 0)
> > and have to put it back into the same state).  There are various visited
> > flags and the like, Richard, any ideas what other things might be nice to
> > document?
> 
> aux pointers on CFG structures which I believe have to be cleared
> after each pass that uses them (maybe already documented).
> 
> Nothing else off the top of my head.

There is at least gimple_plf GF_PLF_{1,2} too, gimple_visited_p, BB_VISITED, 
... 

Jakub


Re: The nvptx port

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 01:12:40PM +0100, Bernd Schmidt wrote:
> >:(.  So what other option one has to implement something like TLS, even
> >using inline asm or similar?  There is %tid, so perhaps indexing some array
> >with %tid?
> 
> That ought to work. For performance you'd want that array in .shared memory
> but I believe that's limited in size.

Any way to query those limits?  Size of .shared memory, number of threads in
warp, number of warps, etc.?  In OpenACC, are all workers in a single gang
the same warp?

> >BTW, one can still invoke OpenMP target regions (even OpenACC regions) from
> >multiple host threads, so the question is how without local TLS we can
> >actually do anything at all.  Sure, we can pass parameters to the kernel,
> >but we'd need to propagate it through all functions.  Or can
> >cudaGetParameterBuffer be used for that?
> 
> Presumably a kernel could copy its arguments out to memory somewhere when
> it's called?

The question is where.  If it is global memory, then how would you find out
what value is for your team and what value is for some other team?

> >>>- we'll need some synchronization primitives, I see atomic support is
> >>>  there, we need mutexes and semaphores I think, is that implementable
> >>>  using bar instruction?
> >>
> >>It's probably membar you need.
> >
> >That is a memory barrier, I need threads to wait on each other, wake up one
> >another etc.
> 
> Hmm. It's worthwhile to keep in mind that GPU threads really behave somewhat
> differently from CPUs (they don't really execute independently); the OMP
> model may just be a poor match for the architecture in general.
> One could busywait on a spinlock, but AFAIK there isn't really a way to put
> a thread to sleep. By not executing independently, I mean this: I believe if
> one thread in a warp is waiting on the spinlock, all the other ones are also
> busywaiting. There may be other effects that seem odd if one approaches it
> from a CPU perspective - for example you probably want only one thread in a
> warp to try to take the spinlock.

So, for a warp, if some threads perform one branch of an if and other
threads another one, all threads perform the first one first (with some
maybe not doing anything), then all the threads the others (again, other
threads not doing anything)?

As for the match, OpenMP isn't written for a particular accelerator, though
supposedly the addition of #pragma omp teams construct was done for NVidia.
So, some OpenMP code may be efficient on PTX, while other code might not be
that much (e.g. if all threads in a warp need to execute the same thing,
supposedly #pragma omp task isn't very good idea for the devices).

Jakub


[PATCH] Fix gimple_fold_stmt_to_constant regression

2014-11-14 Thread Richard Biener

Following up https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01233.html and
fixing the regressions this caused as soon as I removed the dispatch
to fold_unary (and more regressions it would have caused if I managed
to finish the idea to also remove the dispatches to fold_binary
and fold_ternary...) the following patch makes CCP and VRP follow
selected SSA edges again when gimple_fold_stmt_to_constant_1
dispatches to gimple_simplify.

The valueization for gimple_simplify of SSA propagator users may
both valueize to anything (in particular constants) and it may
signal to follow SSA edges if the destination will never be
visited again by the propagator (thus its lattice value is stable).
Esp. cutting out valueizing SSA names to constants is what caused
the regressions.

Note that this highlights the fact that overloading the valueization
result with the signal to (not) follow SSA edges isn't the very
best thing to do - for example we can't valueize to a SSA name
(like for looking through SSA copies) but at the same time say
that gimple_simplify shouldn't follow the edge to its definition.
This shouldn't be a serious limitation for CCP and VRP which
care about constants only - but it shows a defect in the
gimple_simplify interface.  I haven't yet concluded on a better
one though - options go from adding a secondary return to
the valueize hook to adding a second hook maybe with additionally
adding a simple flag to turn off SSA edge following globally.

Anyway - the following patch should fix the immediate regression
and allows to go forward with removing GENERIC folding from
both fold_stmt and gimple_fold_stmt_to_constant.  Just not
for this stage1 which will end too soon.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Thanks,
Richard.
 
2014-11-14  Richard Biener  

* gimple-fold.h (gimple_fold_stmt_to_constant_1): Add 2nd
valueization hook defaulted to no_follow_ssa_edges.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Pass
2nd valueization hook to gimple_simplify.
* tree-ssa-ccp.c (valueize_op_1): New function to be
used for gimple_simplify called via gimple_fold_stmt_to_constant_1.
(ccp_fold): Adjust.
* tree-vrp.c (vrp_valueize_1): New function to be
used for gimple_simplify called via gimple_fold_stmt_to_constant_1.
(vrp_visit_assignment_or_call): Adjust.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   
(svn+ssh://rgue...@gcc.gnu.org/svn/gcc/trunk/gcc/gimple-fold.c) (revision 
217545)
+++ gcc/gimple-fold.c   (.../gcc/gimple-fold.c) (working copy)
@@ -4467,7 +4411,8 @@ maybe_fold_or_comparisons (enum tree_cod
to avoid the indirect function call overhead.  */
 
 tree
-gimple_fold_stmt_to_constant_1 (gimple stmt, tree (*valueize) (tree))
+gimple_fold_stmt_to_constant_1 (gimple stmt, tree (*valueize) (tree),
+   tree (*gvalueize) (tree))
 {
   code_helper rcode;
   tree ops[3] = {};
@@ -4475,7 +4420,7 @@ gimple_fold_stmt_to_constant_1 (gimple s
  edges if there are intermediate VARYING defs.  For this reason
  do not follow SSA edges here even though SCCVN can technically
  just deal fine with that.  */
-  if (gimple_simplify (stmt, &rcode, ops, NULL, no_follow_ssa_edges)
+  if (gimple_simplify (stmt, &rcode, ops, NULL, gvalueize)
   && rcode.is_tree_code ()
   && (TREE_CODE_LENGTH ((tree_code) rcode) == 0
  || ((tree_code) rcode) == ADDR_EXPR)
Index: gcc/gimple-fold.h
===
--- gcc/gimple-fold.h   
(svn+ssh://rgue...@gcc.gnu.org/svn/gcc/trunk/gcc/gimple-fold.h) (revision 
217545)
+++ gcc/gimple-fold.h   (.../gcc/gimple-fold.h) (working copy)
@@ -36,7 +36,8 @@ extern bool arith_overflowed_p (enum tre
const_tree);
 extern tree no_follow_ssa_edges (tree);
 extern tree follow_single_use_edges (tree);
-extern tree gimple_fold_stmt_to_constant_1 (gimple, tree (*) (tree));
+extern tree gimple_fold_stmt_to_constant_1 (gimple, tree (*) (tree),
+   tree (*) (tree) = 
no_follow_ssa_edges);
 extern tree gimple_fold_stmt_to_constant (gimple, tree (*) (tree));
 extern tree fold_const_aggregate_ref_1 (tree, tree (*) (tree));
 extern tree fold_const_aggregate_ref (tree);
Index: gcc/tree-ssa-ccp.c
===
--- gcc/tree-ssa-ccp.c  
(svn+ssh://rgue...@gcc.gnu.org/svn/gcc/trunk/gcc/tree-ssa-ccp.c)
(revision 217545)
+++ gcc/tree-ssa-ccp.c  (.../gcc/tree-ssa-ccp.c)(working copy)
@@ -1126,6 +1126,27 @@ valueize_op (tree op)
   return op;
 }
 
+/* Return the constant value for OP, but signal to not follow SSA
+   edges if the definition may be simulated again.  */
+
+static tree
+valueize_op_1 (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)
+{
+  tree tem = get_constant_value (op);
+  if (t

Re: The nvptx port

2014-11-14 Thread Bernd Schmidt

On 11/14/2014 01:36 PM, Jakub Jelinek wrote:

Any way to query those limits?  Size of .shared memory, number of threads in
warp, number of warps, etc.?


I'd have to google most of that. There seems to be a WARP_SZ constant 
available in ptx to get the size of the warp.



In OpenACC, are all workers in a single gang
the same warp?


No, warps are a relatively small size (32 threads).


So, for a warp, if some threads perform one branch of an if and other
threads another one, all threads perform the first one first (with some
maybe not doing anything), then all the threads the others (again, other
threads not doing anything)?


I believe that's what happens.


Bernd



Re: Add used_by_single_function flag for static variables

2014-11-14 Thread Bernhard Reutner-Fischer
On 14 November 2014 00:44, Jan Hubicka  wrote:
>> Honza,
>>
>> On 23 June 2014 06:24, Jan Hubicka  wrote:
>>
>> > --- lto-cgraph.c(revision 211881)
>> > +++ lto-cgraph.c(working copy)
>> > @@ -614,6 +614,7 @@ lto_output_varpool_node (struct lto_simp
>> >   /* in_other_partition.  */
>> >  }
>> >bp_pack_value (&bp, node->tls_model, 3);
>> > +  bp_pack_value (&bp, node->used_by_single_function, 1);
>> >streamer_write_bitpack (&bp);
>> >
>> >group = node->get_comdat_group ();
>> > @@ -1275,6 +1276,7 @@ input_varpool_node (struct lto_file_decl
>> >if (node->alias && !node->analyzed && node->weakref)
>> >  node->alias_target = get_alias_symbol (node->decl);
>> >node->tls_model = (enum tls_model)bp_unpack_value (&bp, 3);
>> > +  node->used_by_single_function = (enum tls_model)bp_unpack_value (&bp, 
>> > 1);
>> >group = read_identifier (ib);
>>
>> Let's please remove the (wrong) cast to tls_model for the
>> used_by_single_function bit.
>
> Yep, it is obiovus pasto :)
>
>> PS: lto-cgraph should seemingly be switched to use bp_unpack_enum(), no?
>
> Yes, in genral lto-cgraph needs a lot of cleanups (most of that code was
> written in early LTO days and needs a rewrite, it just never broke badly 
> enough
> to force it), I will try to schedule these early next stage 1.
>
>> PPS: input_ref() speculative setting should also remove the wrong enum
>> ipa_ref_use cast.
>> I better stop reading here ;)
> Hehe, just go ahead and keep me posted ;)

lto_output_node does not really like
bp_pack_enum (&bp, node_frequency, 2+1, node->frequency);
though. I'll try to have a look in the evening.

btw, i've seen that struct symtab_node has two huge gaps, 30bit and 4bytes
[initially meant to play around with a simple pahole-like plugin to
point those out].
So i started to play around in layout_struct to automatically reorder
member elts to fill
eventual gaps, just to see how/if offsetof and addressof and ada break
if i put a
2D packing step there.
But that raises the question if we have hit-rate data, perhaps in
profile-mode?, for struct
member-access yet? Would be nice to be able to weight the hotter
members higher, towards
the start of the struct. Even neglecting ABI concerns, I fear this
obvious idea is a bit
more time-consuming than i'd like it to be..


[match-and-simplify] Merge from trunk

2014-11-14 Thread Richard Biener

Brings back the last stage1 merge.  I'll followup on the trunk
status of match-and-simplify on monday - development will now
continue on the branch.

Richard.

2014-11-14  Richard Biener  

Merge from trunk r217546 through r217560.

Brings back next merge piece.


[PATCH] Add force option to find_best_rename_reg in regrename pass

2014-11-14 Thread Thomas Preud'homme
We are planning to introduce a new optimization in aarch64 backend, similar to 
the FP load balancing pass in the LLVM project [1]. This pass would be core 
specific and involve doing some register renaming. An RFC version of this patch 
should be posted later today. As part of this pass, we want to rename a 
register to any register following some specific constraints. We wanted to 
reuse the global (non static) find_best_rename_reg function as does the c6x 
backend but this function is a bit too specific to the register renaming pass.

[1] 
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64A57FPLoadBalancing.cpp?view=markup

It looks at register that respect the constraints of all the instructions in 
the set and tries to pick one in the preferred class for all the instructions 
involved. This is generally useful for any pass that wants to do register 
renaming. However it also contains some logic to only select the register that 
also haven't been used for a longer time than the register that should be 
replaced. This bit is specific to the register renaming pass and makes the 
function unusable for this new pass as a result which forces us to do a copy of 
the function.

This patch adds an extra parameter to skip this check and only consider the 
constraints and tries to pick a register in the preferred class.


ChangeLog entry is as follows:

2014-11-14  Thomas Preud'homme  

* regrename.c (find_best_rename_reg): Rename to ...
(find_rename_reg): This. Also add a parameter to skip tick check.
* regrename.h: Likewise.
* config/c6x/c6x.c: Adapt to above renaming.


diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c
index 06319d0..6aca1e3 100644
--- a/gcc/config/c6x/c6x.c
+++ b/gcc/config/c6x/c6x.c
@@ -3513,7 +3513,8 @@ try_rename_operands (rtx_insn *head, rtx_insn *tail, 
unit_req_table reqs,
   COMPL_HARD_REG_SET (unavailable, reg_class_contents[(int) super_class]);
 
   old_reg = this_head->regno;
-  best_reg = find_best_rename_reg (this_head, super_class, &unavailable, 
old_reg);
+  best_reg = find_rename_reg (this_head, super_class, &unavailable, old_reg,
+ true);
 
   regrename_do_replace (this_head, best_reg);
 
diff --git a/gcc/regrename.h b/gcc/regrename.h
index 03b7164..05c78ad 100644
--- a/gcc/regrename.h
+++ b/gcc/regrename.h
@@ -89,8 +89,8 @@ extern void regrename_init (bool);
 extern void regrename_finish (void);
 extern void regrename_analyze (bitmap);
 extern du_head_p regrename_chain_from_id (unsigned int);
-extern int find_best_rename_reg (du_head_p, enum reg_class, HARD_REG_SET *,
-int);
+extern int find_rename_reg (du_head_p, enum reg_class, HARD_REG_SET *, int,
+   bool);
 extern void regrename_do_replace (du_head_p, int);
 
 #endif
diff --git a/gcc/regrename.c b/gcc/regrename.c


index 66f562b..5de7826 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -357,11 +357,13 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 /* For the chain THIS_HEAD, compute and return the best register to
rename to.  SUPER_CLASS is the superunion of register classes in
the chain.  UNAVAILABLE is a set of registers that cannot be used.
-   OLD_REG is the register currently used for the chain.  */
+   OLD_REG is the register currently used for the chain.  BEST_RENAME
+   controls whether the register chosen must be better than the
+   current one or just respect the given constraint.  */
 
 int
-find_best_rename_reg (du_head_p this_head, enum reg_class super_class,
- HARD_REG_SET *unavailable, int old_reg)
+find_rename_reg (du_head_p this_head, enum reg_class super_class,
+HARD_REG_SET *unavailable, int old_reg, bool best_rename)
 {
   bool has_preferred_class;
   enum reg_class preferred_class;
@@ -408,8 +410,13 @@ find_best_rename_reg (du_head_p this_head, enum reg_class 
super_class,
  && ((pass == 0
   && !TEST_HARD_REG_BIT (reg_class_contents[preferred_class],
  best_new_reg))
- || tick[best_new_reg] > tick[new_reg]))
-   best_new_reg = new_reg;
+ || !best_rename || tick[best_new_reg] > tick[new_reg]))
+   {
+ if (best_rename)
+   best_new_reg = new_reg;
+ else
+   return new_reg;
+   }
}
   if (pass == 0 && best_new_reg != old_reg)
break;
@@ -480,8 +487,8 @@ rename_chains (void)
   if (n_uses < 2)
continue;
 
-  best_new_reg = find_best_rename_reg (this_head, super_class,
-  &this_unavailable, reg);
+  best_new_reg = find_rename_reg (this_head, super_class,
+ &this_unavailable, reg, true);
 
   if (dump_file)
{

=== Testing ===

c6x backend is the only user beyond regrename itself to use this API.
* Therefore

Re: [C++ Patch] Add maybe_constant_folded_value

2014-11-14 Thread Paolo Carlini

Hi,

On 11/14/2014 03:35 AM, Jason Merrill wrote:

On 11/13/2014 04:31 PM, Paolo Carlini wrote:

I think this should be replaced with fold_ if
(processing_template_decl) in build_enumerator.

Ok. The value can be NULL_TREE, thus in a straightforward change (per
the below) I have to check for that, otherwise we crash in
maybe_constant_value. Either that or just check for NULL_TREE at the
beginning of maybe_constant_value itself, I guess.


The current fold_ already checks for NULL_TREE; I think we want to 
preserve that behavior.

Ok, I added it to the new fold_.



This is the relatively most tricky change: we have a regression for
init/array11.C, because the gcc_assert at the end of
maybe_constant_value (called by maybe_constant_init) triggers:

   gcc_assert (r == t
   || CONVERT_EXPR_P (t)
   || (TREE_CONSTANT (t) && !TREE_CONSTANT (r))
   || !cp_tree_equal (r, t));

we have VIEW_CONVERT_EXPRs, neither is constant, r != t and
cp_tree_equal is true. Wild guess: are VIEW_CONVERT_EXPRs also Ok?


Yes.

Ok.

What did you think about avoiding the duplicate 
instantiation_dependent_expression_p and potential_constant_expression 
checks?

Frankly at some point I forgot that, sorry.

Today I figured out the below: the new fold_non_dependent_expr is much 
bigger but definitely calls instantiation_dependent_expression_p and 
potential_constant_expression at most once and should be always 
logically equivalent to instantiate_non_dependent_expr_sfinae followed 
by maybe_constant_value, even in the special cases of those 
TREE_OVERFLOW_Ps. Note: among various other simplifications, I tried 
removing the early return via the conditional:


  if (type_unknown_p (t)
  || BRACE_ENCLOSED_INITIALIZER_P (t))

but it's actually used by eg, g++.dg/cpp0x/constexpr-initlist5.C.

I'm attaching what passed testing on x86_64-linux.

Thanks!
Paolo.

//
Index: cp/call.c
===
--- cp/call.c   (revision 217547)
+++ cp/call.c   (working copy)
@@ -572,7 +572,7 @@ null_ptr_cst_p (tree t)
 {
   /* Core issue 903 says only literal 0 is a null pointer constant.  */
   if (cxx_dialect < cxx11)
-   t = maybe_constant_value (fold_non_dependent_expr_sfinae (t, tf_none));
+   t = fold_non_dependent_expr (t);
   STRIP_NOPS (t);
   if (integer_zerop (t) && !TREE_OVERFLOW (t))
return true;
@@ -7437,8 +7437,8 @@ build_over_call (struct z_candidate *cand, int fla
 return error_mark_node;
 
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0
-  /* Don't mess with virtual lookup in fold_non_dependent_expr; virtual
-functions can't be constexpr.  */
+  /* Don't mess with virtual lookup in instantiate_non_dependent_expr;
+virtual functions can't be constexpr.  */
   && !in_template_function ())
 {
   tree t;
@@ -9361,7 +9361,7 @@ perform_implicit_conversion_flags (tree type, tree
 type of non-dependent expressions, so we do not have to
 perform the actual conversion.  But for initializers, we
 need to be able to perform it at instantiation
-(or fold_non_dependent_expr) time.  */
+(or instantiate_non_dependent_expr) time.  */
   expr = build1 (IMPLICIT_CONV_EXPR, type, expr);
   if (!(flags & LOOKUP_ONLYCONVERTING))
IMPLICIT_CONV_EXPR_DIRECT_INIT (expr) = true;
Index: cp/class.c
===
--- cp/class.c  (revision 217547)
+++ cp/class.c  (working copy)
@@ -359,9 +359,9 @@ build_base_path (enum tree_code code,
 
   /* Don't bother with the calculations inside sizeof; they'll ICE if the
  source type is incomplete and the pointer value doesn't matter.  In a
- template (even in fold_non_dependent_expr), we don't have vtables set
- up properly yet, and the value doesn't matter there either; we're just
- interested in the result of overload resolution.  */
+ template (even in instantiate_non_dependent_expr), we don't have vtables
+ set up properly yet, and the value doesn't matter there either; we're
+ just interested in the result of overload resolution.  */
   if (cp_unevaluated_operand != 0
   || in_template_function ())
 {
@@ -6933,7 +6933,8 @@ resolves_to_fixed_type_p (tree instance, int* nonn
   tree fixed;
 
   /* processing_template_decl can be false in a template if we're in
- fold_non_dependent_expr, but we still want to suppress this check.  */
+ instantiate_non_dependent_expr, but we still want to suppress
+ this check.  */
   if (in_template_function ())
 {
   /* In a template we only care about the type of the result.  */
Index: cp/constexpr.c
===
--- cp/constexpr.c  (revision 217547)
+++ cp/constexpr.c  (working copy)
@@ -2908,6 +2908,7 @@ maybe_constant_value (tree t, tree decl)
   /* cp_tree_equal 

Re: [C++ Patch] Add maybe_constant_folded_value

2014-11-14 Thread Jason Merrill

On 11/14/2014 08:46 AM, Paolo Carlini wrote:

+/* Like maybe_constant_value but first fully constant fold the argument.  */


"but first fully instantiate the argument."

OK with that change, thanks.

Jason



Re: [PATCH 2/4] New data structure for cgraph_summary introduced.

2014-11-14 Thread Martin Liška

On 11/13/2014 04:50 PM, Jan Hubicka wrote:

gcc/ChangeLog:

2014-11-12  Martin Liska  

* Makefile.in: New object file is added.
* cgraph.h (symbol_table::allocate_cgraph_symbol): Summary UID
is filled up.
* cgraph_summary.c: New file.
* cgraph_summary.h: New file.


Since I am trying to get rid of the cgraph prefixes for symbols (keep it for
the graph only) and the summaries can be annotated to variables too. Even if it
not necessarily supported by your current implementation, lets keep API
prepared for it. So I would call it symtab-summary.* for source files and
symtab_summary for base type  (probably function_summary for annotating
functions/cgraph_edge_summary for annotating edges?)


Hello.

I followed your remarks, new class is called function_summary and is located
in symbol-summary.h.




diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index e2becb9..588b6d5 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1225,6 +1225,8 @@ public:
int count_materialization_scale;
/* Unique id of the node.  */
int uid;
+  /* Summary unique id of the node.  */
+  int summary_uid;


What makes summary_uid better than uid?


Because cgraph_node::uid is not a unique ID, it's recycled. As I can see,
there are two remaining usages of the fact that cgraph::uid are quite 
consecutive:

a) node_growth_cache vector is resized according to cgraph_max_uid
b) lto-partition.c: lto_balanced_map

If we change ipa-related stuff to annotations and lto_balanced_map with be 
rewritten,
we can finally unify uid and summary_uid. As Martin correctly pointed out, we 
should
unify cgraph_node dumps, we combine uid and order.




diff --git a/gcc/cgraph_summary.c b/gcc/cgraph_summary.c
new file mode 100644
index 000..9af1d7e
--- /dev/null
+++ b/gcc/cgraph_summary.c


And why do we need this file?  It will need license header if really needed.


Sure, the file can be removed.

Martin



The implementation seems sane - I will check the actual uses :)
Please send the updated patch though.

Honza



>From d7c149edea20850e95fde2e2e332895f5b5a8594 Mon Sep 17 00:00:00 2001
From: mliska 
Date: Thu, 13 Nov 2014 15:11:05 +0100
Subject: [PATCH 1/3] New data structure for function_summary introduced.

gcc/ChangeLog:

2014-11-12  Martin Liska  

	* cgraph.h (symbol_table::allocate_cgraph_symbol): Summary UID
	is filled up.
	* symbol-summary.h: New file.
	* gengtype.c (open_base_files): Add symbol-summary.h.
	* toplev.c (general_init): Call constructor of symbol_table.
---
 gcc/cgraph.h |   8 ++
 gcc/gengtype.c   |   4 +-
 gcc/symbol-summary.h | 313 +++
 gcc/toplev.c |   3 +-
 4 files changed, 325 insertions(+), 3 deletions(-)
 create mode 100644 gcc/symbol-summary.h

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index e2becb9..588b6d5 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1225,6 +1225,8 @@ public:
   int count_materialization_scale;
   /* Unique id of the node.  */
   int uid;
+  /* Summary unique id of the node.  */
+  int summary_uid;
   /* ID assigned by the profiling.  */
   unsigned int profile_id;
   /* Time profiler: first run of function.  */
@@ -1786,6 +1788,10 @@ public:
   friend class cgraph_node;
   friend class cgraph_edge;
 
+  symbol_table (): cgraph_max_summary_uid (1)
+  {
+  }
+
   /* Initialize callgraph dump file.  */
   void initialize (void);
 
@@ -1982,6 +1988,7 @@ public:
 
   int cgraph_count;
   int cgraph_max_uid;
+  int cgraph_max_summary_uid;
 
   int edges_count;
   int edges_max_uid;
@@ -2310,6 +2317,7 @@ symbol_table::allocate_cgraph_symbol (void)
   node->uid = cgraph_max_uid++;
 }
 
+  node->summary_uid = cgraph_max_summary_uid++;
   return node;
 }
 
diff --git a/gcc/gengtype.c b/gcc/gengtype.c
index fac83ee..1e2db27 100644
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -1842,8 +1842,8 @@ open_base_files (void)
   "tree-ssa-loop-niter.h", "tree-into-ssa.h", "tree-dfa.h", 
   "tree-ssa.h", "reload.h", "cpp-id-data.h", "tree-chrec.h",
   "except.h", "output.h",  "cfgloop.h", "target.h", "lto-streamer.h",
-  "target-globals.h", "ipa-ref.h", "cgraph.h", "ipa-prop.h", 
-  "ipa-inline.h", "dwarf2out.h", NULL
+  "target-globals.h", "ipa-ref.h", "cgraph.h", "function-summary.h",
+  "ipa-prop.h", "ipa-inline.h", "dwarf2out.h", NULL
 };
 const char *const *ifp;
 outf_p gtype_desc_c;
diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
new file mode 100644
index 000..893f065
--- /dev/null
+++ b/gcc/symbol-summary.h
@@ -0,0 +1,313 @@
+/* Callgraph summary data structure.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Martin Liska
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be usef

Re: [PATCH 4/4] Data structure is used for inline_summary struct.

2014-11-14 Thread Martin Liška

On 11/13/2014 05:04 PM, Jan Hubicka wrote:

+  if (!inline_summary_summary)
+inline_summary_summary = (inline_summary_cgraph_summary *) 
inline_summary_cgraph_summary::create_ggc (symtab);


Hehe, this is funny naming scheme.
Peraps inline_summary_d and inline_summary_t for the data and type?


Hello.

I adopted suggested naming scheme.


-
-static void
-inline_node_duplication_hook (struct cgraph_node *src,
- struct cgraph_node *dst,
- ATTRIBUTE_UNUSED void *data)
+void
+inline_summary_cgraph_summary::duplication_hook (cgraph_node *src,
+ cgraph_node *dst,
+ inline_summary *,
+ inline_summary *info)


Becuase those are no longer "hooks" but virtual function, I guess we could call 
them
simply duplicate/insert/remove.


Agree with the change.



In a way I would like to see these to be methods of the underlying type rather 
than
virtual methods of the summary, becuase these are operations on the data 
themselves.
I was thinking to model these by specual constructor and copy constructor
(taking the extra node pointer parameters) and standard destructor.  I am not 
sure this
would be more understandable this way?


Motivation for this implementation is:
a) it's useful to have an access to cgraph_node that is associated with a sumary
b) with GTY, we cannot call destructors


-/* Need a typedef for inline_summary because of inline function
-   'inline_summary' below.  */
-typedef struct inline_summary inline_summary_t;
-extern GTY(()) vec *inline_summary_vec;
+class GTY((user)) inline_summary_cgraph_summary: public cgraph_summary 

+{
+public:
+  inline_summary_cgraph_summary (symbol_table *symtab, bool ggc):
+cgraph_summary  (symtab, ggc) {}
+
+  static inline_summary_cgraph_summary *create_ggc (symbol_table *symtab)
+  {
+inline_summary_cgraph_summary *summary = new (ggc_cleared_alloc 
 ()) inline_summary_cgraph_summary(symtab, true);
+summary->disable_insertion_hook ();
+return summary;
+  }
+
+
+  virtual void insertion_hook (cgraph_node *, inline_summary *);
+  virtual void removal_hook (cgraph_node *node, inline_summary *);
+  virtual void duplication_hook (cgraph_node *src, cgraph_node *dst, 
inline_summary *src_data, inline_summary *dst_data);
+};
+
+extern GTY(()) cgraph_summary  *inline_summary_summary;


All in all it looks better than original code.  If we moved insert/


  /* Information kept about parameter of call site.  */
  struct inline_param_summary
@@ -249,10 +265,10 @@ void clone_inlined_nodes (struct cgraph_edge *e, bool, 
bool, int *,
  extern int ncalls_inlined;
  extern int nfunctions_inlined;

-static inline struct inline_summary *
-inline_summary (struct cgraph_node *node)
+static inline inline_summary *
+get_inline_summary (const struct cgraph_node *node)
  {
-  return &(*inline_summary_vec)[node->uid];
+  return (*inline_summary_summary)[node->summary_uid];


Hmm, i guess there is no way to avoid the (*...)? Otherwise it would be cleaner
to use inline_summary[...] instead of get_inline_summary IMO.


I added function_summary::get method, where the usage looks cleaner:
inline_summary_d->get (node).

Thanks,
Martin
 

Thanks for working on this!
Honza



>From 6e8531d8d3659524e337c7c1d96596952c3ff0e8 Mon Sep 17 00:00:00 2001
From: mliska 
Date: Fri, 14 Nov 2014 14:54:12 +0100
Subject: [PATCH 3/3] Data structure is used for inline_summary struct.

gcc/ChangeLog:

2014-11-12  Martin Liska  

	* cgraphunit.c (symbol_table::process_new_functions):
	inline_summary_vec is replaced with inline_summary_t.
	* ipa-cp.c (ipcp_cloning_candidate_p): Usage of inline_summary_d::get.
	(devirtualization_time_bonus): Likewise.
	(estimate_local_effects): Likewise.
	(ipcp_propagate_stage): Likewise.
	* ipa-inline-analysis.c (evaluate_conditions_for_known_args): Likewise.
	(evaluate_properties_for_edge): Likewise.
	(inline_summary_alloc): Deletion of old hook holders.
	(reset_inline_summary): inline_summary is added as argument.
	(inline_summary_cgraph_summary::removal_hook): New function.
	(inline_summary_cgraph_summary::duplication_hook): Likewise.
	(dump_inline_edge_summary): Struct keyword removed.
	(dump_inline_summary): Likewise.
	(estimate_function_body_sizes): Usage of inline_summary_d::get.
	(compute_inline_parameters): Likewise.
	(estimate_edge_devirt_benefit): Struct keyword removed.
	(estimate_node_size_and_time): Likewise.
	(inline_update_callee_summaries): Likewise.
	(inline_merge_summary): Usage of inline_summary_d::get.
	(inline_update_overall_summary): Likewise.
	(simple_edge_hints): Likewise.
	(do_estimate_edge_time): Likewise.
	(estimate_time_after_inlining): Likewise.
	(estimate_size_after_inlining): Likewise.
	(do_estimate_growth): Likewise.
	(growth_likely_positive): Likewise.
	(inline_generate_summary): inline_summary_t is registered.
	(inline_read_section): Struct keyword removed.
	(inline_read_summary): Likewise.
	(inline

Re: libsanitizer merge from upstream r221802

2014-11-14 Thread Uros Bizjak
On Fri, Nov 14, 2014 at 12:31 PM, Uros Bizjak  wrote:

>> Here is one more merge of libsanitizer (last one was in Sept).
>>
>> Tested on x86_64 Ubuntu 14.04 like this:
>> rm -rf */{*/,}libsanitizer && make -j 50
>> make -j 40 -C gcc check-g{cc,++}
>> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp' && \
>> make -j 40 -C gcc check-g{cc,++}
>> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} tsan.exp' && \
>> make -j 40 -C gcc check
>> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp' && \
>> echo PASS
>>
>> Expected ChangeLog entry:
>>
>> 2014-11-12  Kostya Serebryany  
>>
>>* All source files: Merge from upstream r221802.
>>* sanitizer_common/sanitizer_symbolizer_libbacktrace.cc
>>  (LibbacktraceSymbolizer::SymbolizeData): replace 'address'
>>  with 'start' to follow the new interface.
>>* asan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>* interception/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>* libbacktrace/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>* lsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>* sanitizer_common/Makefile.am (sanitizer_common_files): Added new
>>  files.
>>  (AM_CXXFLAGS): added -std=c++11.
>>* tsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>* ubsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
>>* asan/Makefile.in: Regenerate.
>>* interception/Makefile.in: Regenerate.
>>* libbacktrace/Makefile.in: Regenerate.
>>* lsan/Makefile.in: Regenerate.
>>* sanitizer_common/Makefile.in: Regenerate.
>>* tsan/Makefile.in: Regenerate.
>>* ubsan/Makefile.in: Regenerate.
>
> This patch breaks CENTOS5 build with:
>
> In file included from /usr/include/asm-x86_64/byteorder.h:30:0,
>  from /usr/include/asm/byteorder.h:5,
>  from /usr/include/linux/aio_abi.h:30,
>  from
> /home/uros/gcc-svn/trunk/libsanitizer/include/system/linux/aio_abi.h:2,
>  from
> ../../../../gcc-svn/trunk/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:49:
> /usr/include/linux/byteorder/little_endian.h:43:19: error: ‘__le64’
> does not name a type
>  static __inline__ __le64 __cpu_to_le64p(const __u64 *p)
>^
> /usr/include/linux/byteorder/little_endian.h:47:46: error: ‘__le64’
> does not name a type
>  static __inline__ __u64 __le64_to_cpup(const __le64 *p)
>   ^
> /usr/include/linux/byteorder/little_endian.h:67:19: error: ‘__be64’
> does not name a type
>  static __inline__ __be64 __cpu_to_be64p(const __u64 *p)
>^
> /usr/include/linux/byteorder/little_endian.h:71:46: error: ‘__be64’
> does not name a type
>  static __inline__ __u64 __be64_to_cpup(const __be64 *p)
>   ^
> gmake[4]: *** [sanitizer_platform_limits_linux.lo] Error 1
> gmake[4]: Leaving directory
> `/home/uros/gcc-build/x86_64-unknown-linux-gnu/libsanitizer/sanitizer_common'
> gmake[3]: *** [all-recursive] Error 1
> gmake[3]: Leaving directory
> `/home/uros/gcc-build/x86_64-unknown-linux-gnu/libsanitizer'
> gmake[2]: *** [all] Error 2
> gmake[2]: Leaving directory
> `/home/uros/gcc-build/x86_64-unknown-linux-gnu/libsanitizer'
> gmake[1]: *** [all-target-libsanitizer] Error 2

The missing definition in system's /usr/include/linux/types.h is protected with:

typedef __u16 __bitwise __le16;
typedef __u16 __bitwise __be16;
typedef __u32 __bitwise __le32;
typedef __u32 __bitwise __be32;
#if defined(__GNUC__) && !defined(__STRICT_ANSI__)
typedef __u64 __bitwise __le64;
typedef __u64 __bitwise __be64;
#endif

which doesn't work with -std=c++11, but works without problems with
-std=gnu++11.

As proposed by Jakub at [1], -std=gnu++11 fixes the problem with old
kernels. Attached patch implements this proposal.

2014-11-14  Uros Bizjak  

* sanitizer_common/Makefile.am (AM_CXXFLAGS): Use -std=gnu++11.
* asan/Makefile.am (AM_CXXFLAGS): Ditto.
* lsan/Makefile.am (AM_CXXFLAGS): Ditto.
* interception/Makefile.am (AM_CXXFLAGS): Ditto.
* tsan/Makefile.am (AM_CXXFLAGS): Ditto.
* libbacktrace/Makefile.am (AM_CXXFLAGS): Ditto.
* ubsan/Makefile.am (AM_CXXFLAGS): Ditto.
* sanitizer_common/Makefile.in: Regenerate.
* asan/Makefile.in: Ditto.
* lsan/Makefile.in: Ditto.
* interception/Makefile.in: Ditto.
* tsan/Makefile.in: Ditto.
* libbacktrace/Makefile.in: Ditto.
* ubsan/Makefile.in: Ditto.

Bootstrapped and regtested on CENTOS 5 x86_64-linux-gnu.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01420.html

Uros.
Index: sanitizer_common/Makefile.in
===
--- sanitizer_common/Makefile.in(revision 217538)
+++ sanitizer_common/Makefile.in(working copy)
@@ -253,8 +253,8 @@
 AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic \
-Wno-long-long

Re: [Patch, ARM]Fix pattern that is missed for Thumb-1 UAL

2014-11-14 Thread Ramana Radhakrishnan



On 14/11/14 10:02, Terry Guo wrote:

Hi there,

Attached patch intends to fix a pattern that is found still non-UAL when do
gcc thumb-1 bootstrap. A test case is reduced and attached. Tested with gcc
regression test on pre-v6 thumb1 and v6 thumb1. No regression. Multilib can
be built for both of them.
Is it OK to trunk?


This is OK.

Ramana



BR,
Terry

gcc/ChangeLog:
2014-11-14  Terry Guo  

  * config/arm/thumb1.md (*addsi3_cbranch_scratch): Updated to UAL
format.

gcc/testsuite/ChangeLog:
2014-11-14  Terry Guo  

  * gcc.target/arm/thumb1-ual-1.c: New test.



Re: libsanitizer merge from upstream r221802

2014-11-14 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 03:11:16PM +0100, Uros Bizjak wrote:
> The missing definition in system's /usr/include/linux/types.h is protected 
> with:
> 
> typedef __u16 __bitwise __le16;
> typedef __u16 __bitwise __be16;
> typedef __u32 __bitwise __le32;
> typedef __u32 __bitwise __be32;
> #if defined(__GNUC__) && !defined(__STRICT_ANSI__)
> typedef __u64 __bitwise __le64;
> typedef __u64 __bitwise __be64;
> #endif
> 
> which doesn't work with -std=c++11, but works without problems with
> -std=gnu++11.
> 
> As proposed by Jakub at [1], -std=gnu++11 fixes the problem with old
> kernels. Attached patch implements this proposal.
> 
> 2014-11-14  Uros Bizjak  
> 
> * sanitizer_common/Makefile.am (AM_CXXFLAGS): Use -std=gnu++11.
> * asan/Makefile.am (AM_CXXFLAGS): Ditto.
> * lsan/Makefile.am (AM_CXXFLAGS): Ditto.
> * interception/Makefile.am (AM_CXXFLAGS): Ditto.
> * tsan/Makefile.am (AM_CXXFLAGS): Ditto.
> * libbacktrace/Makefile.am (AM_CXXFLAGS): Ditto.
> * ubsan/Makefile.am (AM_CXXFLAGS): Ditto.
> * sanitizer_common/Makefile.in: Regenerate.
> * asan/Makefile.in: Ditto.
> * lsan/Makefile.in: Ditto.
> * interception/Makefile.in: Ditto.
> * tsan/Makefile.in: Ditto.
> * libbacktrace/Makefile.in: Ditto.
> * ubsan/Makefile.in: Ditto.

Ok, thanks.  Really no reason for pedantic checking.

Jakub


[PATCH][AArch64] Adjust generic move costs

2014-11-14 Thread Wilco Dijkstra
Hi,

This patch adjusts the generic move costs to better reflect the INT<->FP move 
costs used in the
various core specific cost tables.

The intention of these generic costs is that they provide reasonable 
performance across a range of
cores without unduly pessimizing any one specific core.

This adjustment is sufficient to prevent the register allocator inserting huge 
amounts of
unnecessary int<->FP moves. The GP2FP/FP2GP costs must be larger than the 
memory cost to avoid the
worst of this issue. In sha2 this replaces all 642 redundant fp moves with 35 
load and 7 store
spills.

OK for commit?

ChangeLog:
2014-11-14  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (generic_regmove_cost):
Increase FP move cost.

---
 gcc/config/aarch64/aarch64.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index de53c94..cd30724 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -218,8 +218,10 @@ __extension__
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   NAMED_PARAM (GP2GP, 1),
-  NAMED_PARAM (GP2FP, 2),
-  NAMED_PARAM (FP2GP, 2),
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+ their cost higher than memmov_cost.  */
+  NAMED_PARAM (GP2FP, 5),
+  NAMED_PARAM (FP2GP, 5),
   NAMED_PARAM (FP2FP, 2)
 };
 
-- 
1.9.1




Re: PATCH: Don't assume modern glibc for x86 Android targets

2014-11-14 Thread H.J. Lu
On Thu, Nov 13, 2014 at 09:32:07PM -0800, H.J. Lu wrote:
> For i[34567]86-*-linux* and x86_64-*-linux* targets, config.gcc assumes
> modern glibc and set default_gnu_indirect_function to yes.  That is wrong
> for i[34567]86-*-linux-android* and x86_64-*-linux-android* targets.
> This patch fixes it.  Tested on Linux/x86-64, Linux/ia32 and cross-teste
> for i686-linux-android and x86_64-linux-android.  OK for trunk?
> 
> Thanks.
> 
> 
> H.J.
> 
> 2014-11-13  H.J. Lu  
> 
>   * config.gcc (default_gnu_indirect_function): Don't assume
>   modern glibc for i[34567]86-*-linux* and x86_64-*-linux* when
>   targeting Android.
> 

Here is the updated patch to cover uclibc.  OK for trunk?

Thanks.

H.J.
---
2014-11-14  H.J. Lu  

* config.gcc (default_gnu_indirect_function): Set to yes
for i[34567]86-*-linux* and x86_64-*-linux* if not targeting
Android nor uclibc.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a6b37d8..a2c502e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1404,8 +1404,11 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | 
i[34567]86-*-knetbsd*-gnu | i
i[34567]86-*-linux*)
tm_file="${tm_file} linux.h linux-android.h"
extra_options="${extra_options} linux-android.opt"
-   # Assume modern glibc
-   default_gnu_indirect_function=yes
+   # Assume modern glibc if not targeting Android nor uclibc.
+   case ${target} in
+   *-*-*android*|*-*-*uclibc*) ;;
+   *) default_gnu_indirect_function=yes ;;
+   esac
if test x$enable_targets = xall; then
tm_file="${tm_file} i386/x86-64.h 
i386/gnu-user-common.h i386/gnu-user64.h i386/linux-common.h i386/linux64.h"
tm_defines="${tm_defines} TARGET_BI_ARCH=1"
@@ -1467,8 +1470,11 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu | 
x86_64-*-knetbsd*-gnu)
x86_64-*-linux*)
tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h 
i386/linux64.h"
extra_options="${extra_options} linux-android.opt"
-   # Assume modern glibc
-   default_gnu_indirect_function=yes
+   # Assume modern glibc if not targeting Android nor uclibc.
+   case ${target} in
+   *-*-*android*|*-*-*uclibc*) ;;
+   *) default_gnu_indirect_function=yes ;;
+   esac
;;
x86_64-*-kfreebsd*-gnu)
tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h"


Re: [PATCH] DW_AT_APPLE_* DWARF extensions.

2014-11-14 Thread Andrew Burgess
* Jakub Jelinek  [2014-11-13 14:13:42 +0100]:

> On Thu, Nov 13, 2014 at 01:21:21PM +0100, Andrew Burgess wrote:
> > I had a look around and couldn't find anything helpful.  The best I
> > can offer would be the current path within the llvm source code where
> > these are defined.  Would that be sufficient?
>
> That is not useful.  The point is not to suggest where those constants come
> from but what they mean, see e.g. the
> http://www.dwarfstd.org/ShowIssue.php?issue=100909.2&type=open
> http://gcc.gnu.org/wiki/TemplateParmsDwarf
> etc. links that describe what the extensions do.
> If LLVM doesn't have any documentation of their extensions, then just
> /* Apple extensions.  */
> is good enough.

I agree, hence my comment about not finding anything useful :)

I've look again, and still can't find anything suitable, so could we
go with just "/* Apple extensions.  */" then?

Patch & Changelog below.

Thanks,
Andrew

2014-11-14  Shinichiro Hamaji  

* dwarf2.def (DW_AT_APPLE_optimized, DW_AT_APPLE_flags)
(DW_AT_APPLE_isa, DW_AT_APPLE_block)
(DW_AT_APPLE_major_runtime_vers, DW_AT_APPLE_runtime_class)
(DW_AT_APPLE_omit_frame_ptr, DW_AT_APPLE_property_name)
(DW_AT_APPLE_property_getter, DW_AT_APPLE_property_setter)
(DW_AT_APPLE_property_attribute, DW_AT_APPLE_objc_complete_type)
(DW_AT_APPLE_property): New macros.


diff --git a/include/dwarf2.def b/include/dwarf2.def
index 71a37b3..878c5c6 100644
--- a/include/dwarf2.def
+++ b/include/dwarf2.def
@@ -406,6 +406,20 @@ DW_AT (DW_AT_upc_threads_scaled, 0x3210)
 DW_AT (DW_AT_PGI_lbase, 0x3a00)
 DW_AT (DW_AT_PGI_soffset, 0x3a01)
 DW_AT (DW_AT_PGI_lstride, 0x3a02)
+/* Apple extensions.  */
+DW_AT (DW_AT_APPLE_optimized, 0x3fe1)
+DW_AT (DW_AT_APPLE_flags, 0x3fe2)
+DW_AT (DW_AT_APPLE_isa, 0x3fe3)
+DW_AT (DW_AT_APPLE_block, 0x3fe4)
+DW_AT (DW_AT_APPLE_major_runtime_vers, 0x3fe5)
+DW_AT (DW_AT_APPLE_runtime_class, 0x3fe6)
+DW_AT (DW_AT_APPLE_omit_frame_ptr, 0x3fe7)
+DW_AT (DW_AT_APPLE_property_name, 0x3fe8)
+DW_AT (DW_AT_APPLE_property_getter, 0x3fe9)
+DW_AT (DW_AT_APPLE_property_setter, 0x3fea)
+DW_AT (DW_AT_APPLE_property_attribute, 0x3feb)
+DW_AT (DW_AT_APPLE_objc_complete_type, 0x3fec)
+DW_AT (DW_AT_APPLE_property, 0x3fed)
 DW_END_AT
 
 DW_FIRST_OP (DW_OP_addr, 0x03)


Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler

2014-11-14 Thread Maxim Kuvyrkov
On Nov 14, 2014, at 4:57 AM, Vladimir Makarov  wrote:

> On 2014-10-21 12:06 AM, Maxim Kuvyrkov wrote:
> 
...
> I'd prefer symbolic constants for dont_delay.  Also the address can contains 
> other parts, e.g. index for some targets.  It is not necessary to change the 
> code but a comment would be nice that right now it is oriented for machine 
> with base+disp only addressing.
> 
> Although it is probably matter of taste.  So you are free to commit it 
> without any change.

I'll add an enum with symbolic constants for dont_delay and a comment about 
handled memory address types.

Thanks for the review!

--
Maxim Kuvyrkov
www.linaro.org



  1   2   3   >