date:20151210

On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> >
> >In addition to that, how about we split up gcc/omp-low.c into several
> >files?  Would it make sense (I have not yet looked in detail) to do so
> >along the borders of the several passes defined therein?  Or, can you
> >tell already that there would be too many cross-references between the
> >several files to make this infeasible?
> 
> It would be nice to get rid of all the code duplication in that file. That
> alone could reduce the size by quite a bit, and hopefully make it easier to
> read.

What exact code duplication do you mean?

> I suspect a split along the ompexp/omplow boundary would be quite easy to
> achieve.

Yeah, that might be the possible splitting boundary (have omp-low.c,
omp-exp.c).

> >I'd suggest to do this shortly before GCC 6 is released, so that
> >backports from trunk to gcc-6-branch will be easy.  (I assume we don't
> >have to care for gcc-5-branch backports too much any longer.)
> 
> I'll declare myself agnostic as to whether such a change is appropriate for
> gcc-6 at this stage. I guess it kind of depends on the specifics.

Certainly.  On one side I'd say it is too late now in stage3, on the other
side when would be better time to do that, during stage1 people will have
more likely out of the tree branches with more changes (I'm aware we even
now have the HSA, OpenMP -> PTX and OpenACC branches).

So, if somebody wants to try that, we can see if the result would be
appropriate.

Jakub

Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-10 Thread Alessandro Fanfarillo

2015-12-09 23:16 GMT+01:00 Tobias Burnus :
> Thanks. Committed as r231476.

Thanks.

>
> Do we need to do anything about GCC 5 or is this only a GCC 6 issue?
>

Yes, the patch should be applied to GCC 5 too.

> That can be changed: Simply fill out the form and list me (burnus (at]
> gcc.gnu.org) as sponsor: https://sourceware.org/cgi-bin/pdw/ps_form.cgi –
> and see https://gcc.gnu.org/svnwrite.html

Done. Thanks.

>
> Tobias

[committed, obvious] Remove invalid assert in find_func_aliases_for_builtin_call

2015-12-10 Thread Tom de Vries

[ was : Re: [PATCH, PR68716] Fix GOMP/GOACC_parallel handling in 
find_func_clobbers ]


On 09/12/15 11:01, Tom de Vries wrote:

[ was: Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta ]

On 30/11/15 13:11, Tom de Vries wrote:

On 30/11/15 10:16, Richard Biener wrote:

On Mon, 30 Nov 2015, Tom de Vries wrote:


Hi,

this patch fixes PR46032.

It handles a call:
...
   __builtin_GOMP_parallel (fn, data, num_threads, flags)
...
as:
...
   fn (data)
...
in ipa-pta.

This improves ipa-pta alias analysis in the parallelized function fn,
and
allows vectorization in the testcase without a runtime alias test.

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?


+ /* Assign the passed argument to the appropriate incoming
+parameter of the function.  */
+ struct constraint_expr lhs ;
+ lhs = get_function_part_constraint (fi, fi_parm_base + 0);
+ auto_vec rhsc;
+ struct constraint_expr *rhsp;
+ get_constraint_for_rhs (arg, &rhsc);
+ while (rhsc.length () != 0)
+   {
+ rhsp = &rhsc.last ();
+ process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.pop ();
+   }

please use style used elsewhere with

  FOR_EACH_VEC_ELT (rhsc, j, rhsp)
process_constraint (new_constraint (lhs, *rhsp));
  rhsc.truncate (0);



That code was copied from find_func_aliases_for_call.
I've factored out the bit that I copied as
find_func_aliases_for_call_arg, and fixed the style there (and dropped
'rhsc.truncate (0)' since AFAIU it's redundant at the end of a function).


+ /* Parameter passed by value is used.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ struct constraint_expr *rhsp;
+ get_constraint_for_address_of (arg, &rhsc);

This isn't correct - you want to use get_constraint_for (arg, &rhsc).
After all rhs is already an ADDR_EXPR.



Can we add an assert somewhere to detect this incorrect usage?


+ FOR_EACH_VEC_ELT (rhsc, j, rhsp)
+   process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.truncate (0);
+
+ /* The caller clobbers what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_clobbers);
+ rhs = get_function_part_constraint (cfi, fi_clobbers);
+ process_constraint (new_constraint (lhs, rhs));
+
+ /* The caller uses what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ rhs = get_function_part_constraint (cfi, fi_uses);
+ process_constraint (new_constraint (lhs, rhs));

I don't see why you need those.  The solver should compute these
in even better precision (context sensitive on the call side).

The same is true for the function parameter.  That is, the only
needed part of the patch should be that making sure we see
the "direct" call and assign parameters correctly.



Dropped this bit.


Dropping the find_func_clobbers bit (in particular, the part copying the
clobbers) caused PR68716.

Basically the find_func_clobbers bit tries to emulate the generic
handling of calls in find_func_clobbers, but pretends the call is
   fn (data)
when handling
   __builtin_GOMP_parallel (fn, data, num_threads, flags)
.

I used the same comments as in the generic call handling to make it
clear what part of the generic call handling is emulated.

Compared to the originally posted patch, the changes are:
- removed 'gcc_assert (TREE_CODE (arg) == ADDR_EXPR)'. Ran into a case
   where arg is NULL.


And for that same reason, I've removed the same assert from 
find_func_aliases_for_builtin_call.


Committed to trunk as obvious.

Thanks,
- Tom
Remove invalid assert in find_func_aliases_for_builtin_call

2015-12-10  Tom de Vries  

	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call): Remove
	invalid assert.

---
 gcc/tree-ssa-structalias.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 15e351e..c350862 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4533,7 +4533,6 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
 	  gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	  tree fndecl = TREE_OPERAND (fnarg, 0);
 	  tree arg = gimple_call_arg (t, argpos);
-	  gcc_assert (TREE_CODE (arg) == ADDR_EXPR);
 
 	  varinfo_t fi = get_vi_for_tree (fndecl);
 	  find_func_aliases_for_call_arg (fi, 0, arg);

[PATCH] Fix PR68721


This fixes a wrong-code case after my IPA split fix (with PR68672 still
unfixed).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-12-10  Richard Biener  

PR ipa/68721
* ipa-split.c (split_function): Record return value properly
when the split part doesn't set it.

* gcc.dg/torture/pr68721.c: New testcase.

Index: gcc/ipa-split.c
===
--- gcc/ipa-split.c (revision 231396)
+++ gcc/ipa-split.c (working copy)
@@ -1281,7 +1281,7 @@ split_function (basic_block return_bb, s
  to return void instead of just outputting function with undefined return
  value.  For structures this affects quality of codegen.  */
   else if (!split_point->split_part_set_retval
-   && find_retval (return_bb))
+   && (retval = find_retval (return_bb)))
 {
   bool redirected = true;
   basic_block new_return_bb = create_basic_block (NULL, 0, return_bb);
@@ -1305,6 +1305,7 @@ split_function (basic_block return_bb, s
   e->count = new_return_bb->count;
   add_bb_to_loop (new_return_bb, current_loops->tree_root);
   bitmap_set_bit (split_point->split_bbs, new_return_bb->index);
+  retbnd = find_retbnd (return_bb);
 }
   /* When we pass around the value, use existing return block.  */
   else
Index: gcc/testsuite/gcc.dg/torture/pr68721.c
===
--- gcc/testsuite/gcc.dg/torture/pr68721.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr68721.c  (working copy)
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+int a, b, c, *d, **e = &d;
+
+int *
+fn1 ()
+{
+  for (;;)
+{
+  for (; a;)
+   if (b)
+ abort ();
+  break;
+}
+  for (; c;)
+;
+  return &a;
+}
+
+int
+main ()
+{
+  *e = fn1 ();
+
+  if (!d)
+abort ();
+
+  return 0;
+}

[PATCH] Make tree if-conversion honor PARAM_ALLOW_STORE_DATA_RACES


The following makes us honor PARAM_ALLOW_STORE_DATA_RACES instead of
making -ftree-loop-if-convert-stores an if-conversion specific "alias"
of that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

I've also verified it passes SPEC CPU 2006 on x86_64.

Richard.

2015-12-10  Richard Biener  

* tree-if-conv.c: Include params.h.
(ifcvt_memrefs_wont_trap): Use PARAM_ALLOW_STORE_DATA_RACES
instead of flag_tree_loop_if_convert_stores to guard cases
we'd introduce store-data-races.

Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 231493)
+++ gcc/tree-if-conv.c  (working copy)
@@ -112,6 +112,7 @@ along with GCC; see the file COPYING3.
 #include "tree-hash-traits.h"
 #include "varasm.h"
 #include "builtins.h"
+#include "params.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
@@ -714,7 +715,7 @@ ifcvt_memrefs_wont_trap (gimple *stmt, v
  to unconditionally.  */
   if (base_master_dr
  && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
-   return flag_tree_loop_if_convert_stores;
+   return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
   else
{
  /* or the base is know to be not readonly.  */
@@ -722,7 +723,7 @@ ifcvt_memrefs_wont_trap (gimple *stmt, v
  if (DECL_P (base_tree)
  && decl_binds_to_current_def_p (base_tree)
  && ! TREE_READONLY (base_tree))
-   return flag_tree_loop_if_convert_stores;
+   return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
}
 }
   return false;

[PATCH] Fix wrong-code with IPA PTA


This fixes a missing constraint for by-reference DECL_RESULT in
nonlocal_p mode.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-10  Richard Biener  

* tree-ssa-structalias.c (create_function_info_for): Add missing
constraint from nonlocal for DECL_RESULT.

Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 231494)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -5631,6 +5630,11 @@ create_function_info_for (tree decl, con
   if (DECL_RESULT (decl))
insert_vi_for_tree (DECL_RESULT (decl), resultvi);
 
+  if (nonlocal_p
+ && DECL_RESULT (decl)
+ && DECL_BY_REFERENCE (DECL_RESULT (decl)))
+   make_constraint_from (resultvi, nonlocal_id);
+
   gcc_assert (prev_vi->offset < resultvi->offset);
   prev_vi->next = resultvi->id;
   prev_vi = resultvi;

[PATCH] Fix IPA PTA vs. PTA regressions


The following patch takes a stab at fixing the most obvious regression
of IPA PTA vs. PTA - dumbing down of pt_solution_includes_global
with

-   /* ???  This predicate is not correct for the IPA-PTA solution
-  as we do not properly distinguish between unit escape points
-  and global variables.  */
-   if (cfun->gimple_df->ipa_pta)
- return true;

by making the semantics of pt_solution->vars_contains_nonlocal
(or what composes the NONLOCAL solution) not depend on
the PTA mode (IPA or not).

Optimization-wise this fixes missing DSE and overly conservative
IPA pure-const discovery (only in its less important late invocation).

Note that using IPA PTA still causes optimization regressions because
it does not handle restrict nor the heapvar escape trick and its
escape logic is still overly conservative (probably the biggest issue).

Anyway - the following patch passed bootstrap with -fipa-pta and
regtest on x86_64-unknown-linux-gnu.

Richard.

2015-12-10  Richard Biener  

PR ipa/68331
* tree-ssa-structalias.c (set_uids_in_ptset): Add fndecl
parameter and make vars_contains_nonlocal properly have
function-scope semantics in IPA mode.
(find_what_var_points_to): Add fndecl parameter.
(find_what_p_points_to): Likewise.
(pt_solution_includes_global): Remove IPA PTA early out.
(compute_points_to_sets): Adjust.
(ipa_pta_execute): Likewise.  Clear final_solutions after
each function.

* gcc.dg/torture/ipa-pta-3.c: New testcase.
* g++.dg/ipa/ipa-pta-1.C: Likewise.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 231357)
--- gcc/tree-ssa-structalias.c  (working copy)
***
*** 159,167 
  
 The is_global_var bit which marks escape points is overly conservative
 in IPA mode.  Split it to is_escape_point and is_global_var - only
!externally visible globals are escape points in IPA mode.  This is
!also needed to fix the pt_solution_includes_global predicate
!(and thus ptr_deref_may_alias_global_p).
  
 The way we introduce DECL_PT_UID to avoid fixing up all points-to
 sets in the translation unit when we copy a DECL during inlining
--- 159,165 
  
 The is_global_var bit which marks escape points is overly conservative
 in IPA mode.  Split it to is_escape_point and is_global_var - only
!externally visible globals are escape points in IPA mode.
  
 The way we introduce DECL_PT_UID to avoid fixing up all points-to
 sets in the translation unit when we copy a DECL during inlining
***
*** 186,191 
--- 184,190 
 propagating it simply like the clobber / uses solutions.  The
 solution can go alongside the non-IPA espaced solution and be
 used to query which vars escape the unit through a function.
+This is also required to make the escaped-HEAP trick work in IPA mode.
  
 We never put function decls in points-to sets so we do not
 keep the set of called functions for indirect calls.
*** shared_bitmap_add (bitmap pt_vars)
*** 6101,6107 
  /* Set bits in INTO corresponding to the variable uids in solution set FROM.  
*/
  
  static void
! set_uids_in_ptset (bitmap into, bitmap from, struct pt_solution *pt)
  {
unsigned int i;
bitmap_iterator bi;
--- 6100,6107 
  /* Set bits in INTO corresponding to the variable uids in solution set FROM.  
*/
  
  static void
! set_uids_in_ptset (bitmap into, bitmap from, struct pt_solution *pt,
!  tree fndecl)
  {
unsigned int i;
bitmap_iterator bi;
*** set_uids_in_ptset (bitmap into, bitmap f
*** 6139,6145 
  /* Add the decl to the points-to set.  Note that the points-to
 set contains global variables.  */
  bitmap_set_bit (into, DECL_PT_UID (vi->decl));
! if (vi->is_global_var)
pt->vars_contains_nonlocal = true;
}
  }
--- 6139,6157 
  /* Add the decl to the points-to set.  Note that the points-to
 set contains global variables.  */
  bitmap_set_bit (into, DECL_PT_UID (vi->decl));
! if (vi->is_global_var
! /* In IPA mode the escaped_heap trick doesn't work as
!ESCAPED is escaped from the unit but
!pt_solution_includes_global needs to answer true for
!all variables not automatic within a function.
!For the same reason is_global_var is not the
!correct flag to track - local variables from other
!functions also need to be considered global.
!Conveniently all HEAP vars are not put in function
!scope.  */
! || (in_ipa_mode
! && fndecl
! && ! auto_var_in_fn_p (vi->decl, fndecl)))
pt->vars_contains_nonlocal = true;
}
  }
***

Re: [PATCH] Fix up fold_ctor_reference and fully_constant_vn_reference_p (PR tree-optimization/68785)

On Wed, 9 Dec 2015, Jakub Jelinek wrote:

> Hi!
> 
> On a testcase like below which would trigger UB at runtime we trigger
> UB in the compiler, by reading uninitialized bytes.
> 
> The VCE folding for which native_{encode,interpret}_expr has been originally
> written passes the length from the first one to the second one, so that
> the latter can return NULL_TREE (not fold) if not enough bytes in the buffer
> were filled.  I believe this is the shortest fix for this issue and makes
> the code consistent with what is used in VCE folding.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> 2015-12-09  Jakub Jelinek  
> 
>   PR tree-optimization/68785
>   * gimple-fold.c (fold_ctor_reference): Pass return value from
>   native_encode_expr to native_interpret_expr.
>   * tree-ssa-sccvn.c (fully_constant_vn_reference_p): Likewise.
> 
>   * gcc.dg/pr68785.c: New test.
> 
> --- gcc/gimple-fold.c.jj  2015-11-24 11:43:35.0 +0100
> +++ gcc/gimple-fold.c 2015-12-09 10:48:06.824975709 +0100
> @@ -5495,9 +5495,10 @@ fold_ctor_reference (tree type, tree cto
>&& size <= MAX_BITSIZE_MODE_ANY_MODE)
>  {
>unsigned char buf[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT];
> -  if (native_encode_expr (ctor, buf, size / BITS_PER_UNIT,
> -   offset / BITS_PER_UNIT) > 0)
> - return native_interpret_expr (type, buf, size / BITS_PER_UNIT);
> +  int len = native_encode_expr (ctor, buf, size / BITS_PER_UNIT,
> + offset / BITS_PER_UNIT);
> +  if (len > 0)
> + return native_interpret_expr (type, buf, len);
>  }
>if (TREE_CODE (ctor) == CONSTRUCTOR)
>  {
> --- gcc/tree-ssa-sccvn.c.jj   2015-12-04 17:19:12.0 +0100
> +++ gcc/tree-ssa-sccvn.c  2015-12-09 10:50:30.329960789 +0100
> @@ -1370,8 +1370,9 @@ fully_constant_vn_reference_p (vn_refere
> else
>   {
> unsigned char buf[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT];
> -   if (native_encode_expr (ctor, buf, size, off) > 0)
> - return native_interpret_expr (ref->type, buf, size);
> +   int len = native_encode_expr (ctor, buf, size, off);
> +   if (len > 0)
> + return native_interpret_expr (ref->type, buf, len);
>   }
>   }
>  }
> --- gcc/testsuite/gcc.dg/pr68785.c.jj 2015-12-09 10:52:00.232698487 +0100
> +++ gcc/testsuite/gcc.dg/pr68785.c2015-12-09 10:50:54.0 +0100
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/68785 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int
> +foo (void)
> +{
> +  return *(int *) "";
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching



On 10/12/15 09:26, Christian Bruel wrote:

Hi Kyrill,

On 12/09/2015 06:32 PM, Kyrill Tkachov wrote:

Hi Christian,

On 08/12/15 12:53, Christian Bruel wrote:

Hi,

The order of the NEON builtins construction has led to complications since the 
attribute target support. This was not a problem when driven from the command 
line, but was causing various issues when the builtins was mixed between fpu
configurations or when used with LTO.

Firstly the builtin functions was not initialized before the parsing of 
functions, leading to wrong type initializations.

Then error catching code when a builtin was used without the proper fpu flags 
was incomprehensible for the user, for instance

#include "arm_neon.h"

int8x8_t a, b;
int16x8_t e;

void
main()
{
   e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
}

compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages 
of

/arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name 
'__simd64_int8_t'
  typedef __simd64_int8_t int8x8_t;
...
...
arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka 
__vector(4) int}' to type 'int' which has different size
return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) 
__b, __c);
^~
...
... and one for each arm_neon.h lines..

by postponing the check into arm_expand_builtin, we now emit something more 
useful:

testo.c: In function 'main':
testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported 
in this configuration.
e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
^~~

One small side effect to note: The total memory allocated is 370k bigger when 
neon is not used, so this support will have a follow-up to make their 
initialization lazy. But I'd like first to stabilize the stuff for stage3 (or 
get it
pre-approved if the memory is an issue)

tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
(a few tests that was fail are now unsupported)



I agree, the vector types (re)initialisation is a tricky part.
I've seen similar issues in the aarch64 work for target attributes

   bool
   arm_vector_mode_supported_p (machine_mode mode)
   {
-  /* Neon also supports V2SImode, etc. listed in the clause below.  */
-  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
+  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
 || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
-  || mode == V2DImode || mode == V8HFmode))
-return true;
-
-  if ((TARGET_NEON || TARGET_IWMMXT)
-  && ((mode == V2SImode)
-  || (mode == V4HImode)
-  || (mode == V8QImode)))
+  || mode == V2DImode || mode == V8HFmode
+  || mode == V2SImode || mode == V4HImode || mode == V8QImode)
   return true;


So this allows vector modes unconditionally for all targets/fpu configurations?
I was tempted to do that in aarch64 when I was encountering similar issues.
In the end what worked for me was re-laying out the vector types in 
SET_CURRENT_FUNCTION
if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html)


yes my assumption was that arm_init_neon_builtins () is now called for all targets, since the check is done at expand time and that the builtins need to be known by lto, with the vector type initialization, before they are expanded. 
However at that time, lto streaming-in have not yet processed the attributes and TARGET_NEON is not set for the function.


I had a look at your re-layout, but I'm not sure. it feels like a hack. I think this should be solved first place during the builtin construction. Also set_current_function is too late, builtin_expand that will explode because of the 
unknown modes.


But raise the point. In fact I was not really happy with this 
arm_vector_mode_supported_p neither as I was not sure about other contexts it 
can be called from and I cannot clearly claim that this change is always 
correct.



So the main usage of targetm.vector_mode_supported_p is in stor-layout.c and 
vector_type_mode in particular seems
to have a relevant comment:
 /* Vector types need to re-check the target flags each time we report
the machine mode.  We need to do this because attribute target can
change the result of vector_mode_supported_p and have_regs_of_mode
on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
change on a per-function basis.  */

I think that implies that it expects targetm.vector_mode_supported_p to reject 
vector modes in
contexts that don't support NEON...


I'd like to think about other way to set the vector modes from 
arm_init_neon_builtins before the target flags are known. I'm thinking about 
the lazy initialization at expand time, or using a contextual boolean flags. 
how does that sound ?



Laying out the vector types during arm_init_neon_builtins sounds more promising 
to me.
Changing layout of types during expand is risky, from

Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching



On 10/12/15 10:11, Christian Bruel wrote:



On 12/10/2015 10:59 AM, Kyrill Tkachov wrote:


On 10/12/15 09:26, Christian Bruel wrote:

Hi Kyrill,

On 12/09/2015 06:32 PM, Kyrill Tkachov wrote:

Hi Christian,

On 08/12/15 12:53, Christian Bruel wrote:

Hi,

The order of the NEON builtins construction has led to complications since the 
attribute target support. This was not a problem when driven from the command 
line, but was causing various issues when the builtins was mixed between fpu
configurations or when used with LTO.

Firstly the builtin functions was not initialized before the parsing of 
functions, leading to wrong type initializations.

Then error catching code when a builtin was used without the proper fpu flags 
was incomprehensible for the user, for instance

#include "arm_neon.h"

int8x8_t a, b;
int16x8_t e;

void
main()
{
e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
}

compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages 
of

/arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name 
'__simd64_int8_t'
   typedef __simd64_int8_t int8x8_t;
...
...
arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka 
__vector(4) int}' to type 'int' which has different size
 return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) 
__b, __c);
 ^~
...
... and one for each arm_neon.h lines..

by postponing the check into arm_expand_builtin, we now emit something more 
useful:

testo.c: In function 'main':
testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported 
in this configuration.
 e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
 ^~~

One small side effect to note: The total memory allocated is 370k bigger when 
neon is not used, so this support will have a follow-up to make their 
initialization lazy. But I'd like first to stabilize the stuff for stage3 (or 
get it
pre-approved if the memory is an issue)

tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
(a few tests that was fail are now unsupported)



I agree, the vector types (re)initialisation is a tricky part.
I've seen similar issues in the aarch64 work for target attributes

bool
arm_vector_mode_supported_p (machine_mode mode)
{
-  /* Neon also supports V2SImode, etc. listed in the clause below.  */
-  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
+  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
-  || mode == V2DImode || mode == V8HFmode))
-return true;
-
-  if ((TARGET_NEON || TARGET_IWMMXT)
-  && ((mode == V2SImode)
-  || (mode == V4HImode)
-  || (mode == V8QImode)))
+  || mode == V2DImode || mode == V8HFmode
+  || mode == V2SImode || mode == V4HImode || mode == V8QImode)
return true;


So this allows vector modes unconditionally for all targets/fpu configurations?
I was tempted to do that in aarch64 when I was encountering similar issues.
In the end what worked for me was re-laying out the vector types in 
SET_CURRENT_FUNCTION
if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html)


yes my assumption was that arm_init_neon_builtins () is now called for all 
targets, since the check is done at expand time and that the builtins need to 
be known by lto, with the vector type initialization, before they are expanded.
However at that time, lto streaming-in have not yet processed the attributes 
and TARGET_NEON is not set for the function.

I had a look at your re-layout, but I'm not sure. it feels like a hack. I think 
this should be solved first place during the builtin construction. Also 
set_current_function is too late, builtin_expand that will explode because of 
the
unknown modes.

But raise the point. In fact I was not really happy with this 
arm_vector_mode_supported_p neither as I was not sure about other contexts it 
can be called from and I cannot clearly claim that this change is always 
correct.



So the main usage of targetm.vector_mode_supported_p is in stor-layout.c and 
vector_type_mode in particular seems
to have a relevant comment:
   /* Vector types need to re-check the target flags each time we report
  the machine mode.  We need to do this because attribute target can
  change the result of vector_mode_supported_p and have_regs_of_mode
  on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
  change on a per-function basis.  */

I think that implies that it expects targetm.vector_mode_supported_p to reject 
vector modes in
contexts that don't support NEON...


yes, thanks for this clarification, that settles it. this part of my patch is 
rubbish :-)




I'd like to think about other way to set the vector modes from 
arm_init_neon_builtins before the target flags are known. I'm thinking about 
the lazy initialization at expand

Re: [AArch64] Emit square root using the Newton series



On 09/12/15 18:50, Evandro Menezes wrote:

On 12/09/2015 11:16 AM, Kyrill Tkachov wrote:


On 09/12/15 17:02, Kyrill Tkachov wrote:


On 09/12/15 16:59, Evandro Menezes wrote:

On 12/09/2015 10:52 AM, Kyrill Tkachov wrote:

Hi Evandro,

On 08/12/15 21:35, Evandro Menezes wrote:

Emit square root using the Newton series

   2015-12-03  Evandro Menezes 

   gcc/
* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
   Declare new
function.
* config/aarch64/aarch64-simd.md (sqrt2): New
   expansion and
insn definitions.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
   new function.
* config/aarch64/aarch64.md (sqrt2): New expansion
   and insn
definitions.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
   Expand option
description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch extends the patch that added support for implementing x^-1/2 using 
the Newton series by adding support for x^1/2 as well.

Is it OK at this point of stage 3?



A comment on the patch itself from me...

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 6f7dbce..11c6c9a 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,4 @@

 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)

That seems like a misleading name to me.
If we're doing this, that means that the sqrt instruction is not faster
than doing the inverse sqrt estimation followed by a multiply.
I think a name like "synth_sqrt" or "estimate_sqrt" or something along those 
lines
is more appropriate.


Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. 
:-(  So, other targets when this is also true, using the "fast_sqrt" option 
might make sense.



Sure, but the way your patch is written, we will emit the series when 
"fast_sqrt" is set, rather
than the other way around, unless I'm misreading the logic in:



Sorry, what I meant to say is it would be clearer, IMO, to describe the 
compiler action that is being taken
(e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using 
a series.

Kyrill


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 030a101..f6d2da4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4280,7 +4280,23 @@

 ;; sqrt

-(define_insn "sqrt2"
+(define_expand "sqrt2"
+  [(set (match_operand:VDQF 0 "register_operand")
+(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+  "TARGET_SIMD"
+{
+  if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags)
+  && !optimize_function_for_size_p (cfun)
+  && flag_finite_math_only
+  && !flag_trapping_math
+  && flag_unsafe_math_optimizations)
+{
+  aarch64_emit_swsqrt (operands[0], operands[1]);
+  DONE;
+}
+})



Kyrill,

How about "approx_sqrt" for, you guessed it, approximate square root?  The same adjective 
would perhaps describe "recip_sqrt" better too.



Sounds good to me.
Sorry for the bikeshedding.

Kyrill


Thanks,

[PATCH][Testsuite]Cleanup logs from gdb tests by adding newlines

2015-12-10 Thread Alan Lawrence

Runs of the guality testsuite can sometimes end up with gcc.log containing
malformed lines like:

A debugging session is active.PASS: gcc.dg/guality/pr36728-1.c   -O2  line 18 
arg4 == 4
A debugging session is active.PASS: gcc.dg/guality/restrict.c   -O2  line 30 
type:ip == int *
Inferior 1 [process 27054] will be killed.PASS: 
gcc.dg/guality/restrict.c   -O2  line 30 type:cicrp == const int * const 
restrict
Inferior 1 [process 27160] will be killed.PASS: 
gcc.dg/guality/restrict.c   -O2  line 30 type:cvirp == int * const volatile 
restrict

This patch just makes sure the PASS/FAIL comes at the beginning of a line.  (At
the slight cost of adding some extra newlines not in the actual test output.)

I moved the remote_close target calls earlier, to avoid any possible race
condition of extra output being generated after the newline - this may not be
strictly necessary.

Tested on aarch64-none-linux-gnu and x86_64-none-linux-gnu.

I think this is reasonable for stage 3 - OK for trunk?

gcc/testsuite/ChangeLog:
* lib/gcc-gdb-test.exp (gdb-test): call remote_close earlier, and send
newline to log, before calling pass/fail/unsupported.
* lib/gcc-simulate-thread.exp (simulate-thread): Likewise.
---
 gcc/testsuite/lib/gcc-gdb-test.exp| 15 ++-
 gcc/testsuite/lib/gcc-simulate-thread.exp | 10 +++---
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-gdb-test.exp 
b/gcc/testsuite/lib/gcc-gdb-test.exp
index d3ba6e4..f60cabf 100644
--- a/gcc/testsuite/lib/gcc-gdb-test.exp
+++ b/gcc/testsuite/lib/gcc-gdb-test.exp
@@ -84,8 +84,9 @@ proc gdb-test { args } {
 remote_expect target [timeout_value] {
# Too old GDB
-re "Unhandled dwarf expression|Error in sourced command file|

Re: [PATCH 2/7][ARM] Multilib support for ARMv8.1.

On Mon, Dec 7, 2015 at 4:05 PM, Matthew Wahab
 wrote:
> Ping. Updated patch attached.
> Matthew
>
>
> On 26/11/15 15:58, Matthew Wahab wrote:
>>
>> This patch sets up multilib support for ARMv8.1, treating it as a
>> synonym for ARMv8. Since ARMv8.1 integer, FP or SIMD
>> instructions are only generated for the new, instruction-specific
>> instrinsics, mapping to ARMv8 rather than adding a new multilib variant
>> is sufficient.
>>
>> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
>> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
>> bootstrap and make check.
>>
>> Ok for trunk?
>> Matthew
>>
>> gcc/
>> 2015-11-26  Matthew Wahab  
>>
>>  * config/arm/t-aprofile: Make "armv8.1-a" and "armv8.1-a+crc"
>>  matches for "armv8-a".
>>


OK.

Ramana
>

Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

On Mon, Dec 7, 2015 at 4:04 PM, Matthew Wahab
 wrote:
> Ping. Updated patch attached.
> Matthew
>
>
> On 26/11/15 15:55, Matthew Wahab wrote:
>>
>> Hello,
>>
>>
>> ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
>> instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
>> ARMv8.1 and for the new instructions, enabling the architecture with
>> --march=armv8.1-a. The new instructions are enabled when both ARMv8.1
>> and a suitable fpu options are set, for instance with -march=armv8.1-a
>> -mfpu=neon-fp-armv8 -mfloat-abi=hard.
>>
>> This patch set adds the command line options and internal feature
>> macros. Following patches
>> - enable multilib support for ARMv8.1,
>> - add patterns for the new instructions,
>> - add the ACLE feature macro for the ARMv8.1 extensions,
>> - extend target support in the testsuite to ARMv8.1,
>> - add the ACLE intrinsics for vqrmdl{as}h and
>> - add the ACLE intrinsics for vqrmdl{as}h_lane.
>>
>> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
>> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
>> bootstrap and make check.
>>
>> Is this ok for trunk?
>> Matthew
>>
>> gcc/
>> 2015-11-26  Matthew Wahab  
>>
>>  * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
>>  * config/arm/arm-protos.h (FL2_ARCH8_1): New.
>>  (FL2_FOR_ARCH8_1A): New.
>>  * config/arm/arm-tables.opt: Regenerate.
>>  * config/arm/arm.c (arm_arch8_1): New.
>>  (arm_option_override): Set arm_arch8_1.
>>  * config/arm/arm.h (TARGET_NEON_RDMA): New.
>>  (arm_arch8_1): Declare.
>>  * doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
>>  "armv8.1-a+crc".
>>  (ARM Options, -mfpu): Fix a typo.
>
>

OK.

Re: [PATCH 3/7][ARM] Add patterns for new instructions

On Mon, Dec 7, 2015 at 4:06 PM, Matthew Wahab
 wrote:
> Ping. Updated patch attached.
> Matthew
>
>
> On 26/11/15 16:00, Matthew Wahab wrote:
>>
>> Hello,
>>
>> This patch adds patterns for the instructions, vqrdmlah and vqrdmlsh,
>> introduced in the ARMv8.1 architecture. The instructions are made
>> available when -march=armv8.1-a is enabled with suitable fpu settings,
>> such as -mfpu=neon-fp-armv8 -mfloat-abi=hard.
>>
>> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
>> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
>> bootstrap and make check.
>>
>> Ok for trunk?
>> Matthew

OK.

Ramana

>>
>> gcc/
>> 2015-11-26  Matthew Wahab  
>>
>>  * config/arm/iterators.md (VQRDMLH_AS): New.
>>  (neon_rdma_as): New.
>>  * config/arm/neon.md
>>  (neon_vqrdmlh): New.
>>  (neon_vqrdmlh_lane): New.
>>  * config/arm/unspecs.md (UNSPEC_VQRDMLAH): New.
>>  (UNSPEC_VQRDMLSH): New.
>>
>

Re: [PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

On Tue, Dec 8, 2015 at 7:45 AM, Christian Bruel  wrote:
> Hi Matthew,
>
>
> On 12/07/2015 05:07 PM, Matthew Wahab wrote:
>>
>> Ping. Updated patch attached.
>> Matthew
>>
>>
>> On 26/11/15 16:01, Matthew Wahab wrote:
>>>
>>> Hello,
>>>
>>> This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
>>> presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
>>> defined when the instructions are available, as it is when
>>> -march=armv8.1-a is enabled with suitable fpu options.
>>>
>>> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
>>> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
>>> bootstrap and make check.
>>>
>>> Ok for trunk?
>>> Matthew
>>>
>>> gcc/
>>> 2015-11-26  Matthew Wahab  
>>>
>>>   * config/arm/arm-c.c (arm_cpu_builtins): Define
>>> __ARM_FEATURE_QRDMX.
>>>
>>
>
> +  if (TARGET_NEON_RDMA)
> +builtin_define ("__ARM_FEATURE_QRDMX");
> +
>
> Since it depends on TARGET_NEON, could you please use
>
>   def_or_undef_macro (pfile, "__ARM_FEATURE_QRDMX", TARGET_NEON_RDMA);
>
> instead ?

I think that's what it should be -

OK with that fixed.

Ramana


>
> thanks
>
> Christian

Re: ipa-cp heuristics fixes

Hi,

thanks for looking into this, I only have one question:

On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> Martin,
> while looking into the ipa-cp dumps for bzip and Firefox I noticed few issues.
> First of all, ipcp_cloning_candidate_p calls
>  optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))
> which can not be used at WPA time, becuase we have no DECL_STRUCT_FUNCTION
> around.  I replaced it by node->optimize_for_size_p ().
> 
> Second we perform incredible number of clones because we do obtain some sort 
> of
> polymorphic call context for them.  In wast majority of cases this is useless
> effort, because the functions in question do not contain virtual calls and do
> not pass the parameter further.  For firefox about 40k out of 50k clones
> created are created just because we found some context.
> 
> I changed the code to only clone if this immediately leads to 
> devirtualization.
> This do not cause any noticeable drop in number of devirtualized calls on
> Firefox. I suppose we will miss the case where cloning a caller may allow
> devirtualization in a clone of callee, but I do not think the heuristics for
> context independent values can handle this as implemented right now and it
> simply have way to many false positives.
> 
> What we can do is to devirtualize w/o cloning for local functions and
> speculatively devirtualize in case we would otherwise clone.
> 
> Third problem I noticed is that
> will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can
> ignore the function size when deciding about the code size impact.
> This function is doing some analysis for inliner where it, for example, 
> analyses
> if a comdat which is going to be inlined consistently in the whole program
> will be removed.
> 
> In the cloning case I do not see this to apply: we have no evidence that the
> other units will pass the same constants to the function.  I think you
> basically want to assume that the  function will be removed if it has no
> address taken and it is not externally visibible. This is what local flag
> is for.
> 
> I gathered some stats:
> 
> number of clones for all contexts: 49948->11102
> number of clones: 4376->4383
> 
> good_cloning_opportunity_p is called about 70k times, I wonder if the
> thresholds are not simply set too high.  For example, inliner does about 300k
> inlines at Firefox.
> 
> number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I 
> do not have data on unpatched tree for this)
> number of devirts: 956->933
> number of devirts happening at inline: 781->868
> number of indirect calls promoted: 512->512
> 
> Inliner stats from: Unit growth for small function inlining: 7965701->9130051 
> (14%)
> to: Unit growth for small function inlining: 7965010->9138577
> 
> So it seems that except for large drop in number of clones there is no 
> significant difference.
> 
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
> 
> Honza
> 
>   * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
>   (good_cloning_opportunity_p): Likewise.
>   (gather_context_independent_values): Do not return true when
>   polymorphic call context is known or when we have known aggregate
>   value of unused parameter.
>   (estimate_local_effects): Try to create clone for all context
>   when either some params are substituted or devirtualization is possible
>   or some params can be removed; use local flag instead of
>   node->will_be_removed_from_program_if_no_direct_calls_p.
>   (identify_dead_nodes): Likewise.
> Index: ipa-cp.c
> ===
> --- ipa-cp.c  (revision 231477)
> +++ ipa-cp.c  (working copy)
> @@ -613,7 +613,7 @@ ipcp_cloning_candidate_p (struct cgraph_
>return false;
>  }
>  
> -  if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +  if (node->optimize_for_size_p ())
>  {
>if (dump_file)
>  fprintf (dump_file, "Not considering %s for cloning; "
> @@ -2267,7 +2267,7 @@ good_cloning_opportunity_p (struct cgrap
>  {
>if (time_benefit == 0
>|| !opt_for_fn (node->decl, flag_ipa_cp_clone)
> -  || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +  || node->optimize_for_size_p ())
>  return false;
>  
>gcc_assert (size_cost > 0);
> @@ -2387,12 +2387,14 @@ gather_context_independent_values (struc
>   *removable_params_cost
> += ipa_get_param_move_cost (info, i);
>  
> +  if (!ipa_is_param_used (info, i))
> + continue;
> +

Is this really necessary, is it not enough to remove the assignment to
ret below?  If the parameter is not used, devirtualization time bonus,
which you then rely on estimate_local_effects, should be zero for it.

It is a very minor point, I suppose, but if the function gets cloned
for a different reason, it might still be beneficial to have as much
context-

Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

On Mon, Dec 7, 2015 at 4:10 PM, Matthew Wahab
 wrote:
> On 27/11/15 17:11, Matthew Wahab wrote:
>>
>> On 27/11/15 13:44, Christophe Lyon wrote:

 On 26/11/15 16:02, Matthew Wahab wrote:
>>
>>
> This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
> tests to specify targest and to set up command line options.
> It builds on the ARMv8.1 target support added for AArch64 tests, partly
> reworking that support to take into account the different
> configurations
> that tests may be run under.
>>
>>
>>> I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
>>> supported by aarch64-gcc. So it seems to me that
>>> check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
>>> for aarch64 after your patch.
>>
>>
>>> Or does it work because no option is needed and thus "" always
>>> matches and thus the loop always exits after the first iteration
>>> on aarch64?
>>
>>
>> Yes, the idea is that the empty string will make the function first try
>> '-march=armv8.1-a' without any other flag. That will work for AArch64
>> because it
>> doesn't need any other option.
>>
>>> Maybe a more accurate comment would help remembering that, in case
>>> -mfpu option becomes necessary for aarch64.
>>>
>>
>> Agreed, it's worth having a comment to explain what the 'foreach'
>> construct is doing.
>>
>> Matthew
>
>
> I've added a comment to the foreach construct, to make it clearer what
> it's doing.
>
> Matthew
>
> testsuite/
> 2015-12-07  Matthew Wahab  
>
>
> * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
> comment.  Use check_effetive_target_arm_v8_1a_neon_ok to select
> the command line options.
> (check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
> test to allow ARM targets.  Select and record a working set of
> command line options.
> (check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
> targets.
>


># Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
>-# otherwise.  The test is valid for AArch64.
>+# otherwise.  The test is valid for AArch64 and ARM.  Record the command
>+# line options that needed.

s/that//

Can you also make sure doc/sourcebuild.texi is updated for this helper
function ? If not documented,it would be good to add the documentation
for the same while you are here.

Ramana

Re: [PATCH 6/7][ARM] Add ACLE intrinsics vqrdmlah and vqrdmlsh

On Mon, Dec 7, 2015 at 4:12 PM, Matthew Wahab
 wrote:
> Ping. Updated patch attached.
> Matthew
>
>
> On 26/11/15 16:04, Matthew Wahab wrote:
>>
>> Hello,
>>
>> This patch adds the ACLE intrinsics for the instructions introduced in
>> ARMv8.1. It adds the vqrmdlah and vqrdmlsh forms of the instrinsics to
>> the arm_neon.h header, together with the ARM builtins used to implement
>> them. The intrinsics are available when -march=armv8.1-a is enabled
>> together with appropriate fpu options.
>>
>> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
>> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
>> bootstrap and make check.
>>
>> Ok for trunk?
>> Matthew
>>
>> gcc/
>> 2015-11-26  Matthew Wahab  
>>
>>  * config/arm/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
>>  (vqrdmlahq_s16, vqrdmlahq_s32): New.
>>  (vqrdmlsh_s16, vqrdmlsh_s32): New.
>>  (vqrdmlahq_s16, vqrdmlshq_s32): New.
>>  * config/arm/arm_neon_builtins.def: Add "vqrdmlah" and "vqrdmlsh".
>>
>


OK.

Ramana

Re: [PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

On Thu, Nov 26, 2015 at 4:04 PM, Matthew Wahab
 wrote:
> Hello,
>
> This patch adds the ACLE intrinsics for the instructions introduced in
> ARMv8.1. It adds the vqrmdlah_lane and vqrdmlsh_lane forms of the
> instrinsics to the arm_neon.h header, together with the ARM builtins
> used to implement them. The intrinsics are available when
> -march=armv8.1-a is enabled together with appropriate fpu options.
>
> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
> bootstrap and make check.
>
> Ok for trunk?
> Matthew
>
> gcc/
> 2015-11-26  Matthew Wahab  
>
> * config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
> (vqrdmlahq_lane_s32): New.
> (vqrdmlah_lane_s16): New.
> (vqrdmlah_lane_s32): New.
> (vqrdmlshq_lane_s16): New.
> (vqrdmlshq_lane_s32): New.
> (vqrdmlsh_lane_s16): New.
> (vqrdmlsh_lane_s32): New.
> * config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
> "vqrdmlsh_lane".
>

OK.

Ramana

Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

On Thu, Dec 10, 2015 at 10:43 AM, Ramana Radhakrishnan
 wrote:
> On Mon, Dec 7, 2015 at 4:04 PM, Matthew Wahab
>  wrote:
>> Ping. Updated patch attached.
>> Matthew
>>
>>
>> On 26/11/15 15:55, Matthew Wahab wrote:
>>>
>>> Hello,
>>>
>>>
>>> ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
>>> instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
>>> ARMv8.1 and for the new instructions, enabling the architecture with
>>> --march=armv8.1-a. The new instructions are enabled when both ARMv8.1
>>> and a suitable fpu options are set, for instance with -march=armv8.1-a
>>> -mfpu=neon-fp-armv8 -mfloat-abi=hard.
>>>
>>> This patch set adds the command line options and internal feature
>>> macros. Following patches
>>> - enable multilib support for ARMv8.1,
>>> - add patterns for the new instructions,
>>> - add the ACLE feature macro for the ARMv8.1 extensions,
>>> - extend target support in the testsuite to ARMv8.1,
>>> - add the ACLE intrinsics for vqrmdl{as}h and
>>> - add the ACLE intrinsics for vqrmdl{as}h_lane.
>>>
>>> Tested the series for arm-none-eabi with cross-compiled check-gcc on an
>>> ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
>>> bootstrap and make check.
>>>
>>> Is this ok for trunk?
>>> Matthew
>>>
>>> gcc/
>>> 2015-11-26  Matthew Wahab  
>>>
>>>  * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
>>>  * config/arm/arm-protos.h (FL2_ARCH8_1): New.
>>>  (FL2_FOR_ARCH8_1A): New.
>>>  * config/arm/arm-tables.opt: Regenerate.
>>>  * config/arm/arm.c (arm_arch8_1): New.
>>>  (arm_option_override): Set arm_arch8_1.
>>>  * config/arm/arm.h (TARGET_NEON_RDMA): New.
>>>  (arm_arch8_1): Declare.
>>>  * doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
>>>  "armv8.1-a+crc".
>>>  (ARM Options, -mfpu): Fix a typo.
>>
>>
>
> OK.

I couldn't find 0/7 but in addition here you need to update the output
for TAG_FP_SIMD_Arch to be 4.

regards
Ramana

Re: [PATCH] Avoid false vector mask conversion

On Fri, Dec 4, 2015 at 4:00 PM, Ilya Enkovich  wrote:
> On 02 Dec 16:27, Richard Biener wrote:
>> On Wed, Dec 2, 2015 at 4:24 PM, Ilya Enkovich  wrote:
>> >
>> > The problem is that conversion is supposed to be handled by
>> > vectorizable_conversion,
>> > but it fails to because it is not actually a conversion. I suppose it
>> > may be handled
>> > in vectorizable_assignment but I chose this pattern because it's meant
>> > to handle mask
>> > conversion issues.
>>
>> I think it's always better to avoid patterns if you can.
>>
>> Richard.
>>
>
> Here is a variant with vectorizable_assignment change.  Bootstrapped and 
> regtested on x86_64-unknown-linux-gnu.  Does it look better?

Yes.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-12-04  Ilya Enkovich  
>
> * tree-vect-stmts.c (vectorizable_assignment): Support
> useless boolean conversion.
>
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 3b078da..2cdbb04 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -4229,7 +4229,12 @@ vectorizable_assignment (gimple *stmt, 
> gimple_stmt_iterator *gsi,
>/* But a conversion that does not change the bit-pattern is ok.  */
>&& !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
> > TYPE_PRECISION (TREE_TYPE (op)))
> -  && TYPE_UNSIGNED (TREE_TYPE (op
> +  && TYPE_UNSIGNED (TREE_TYPE (op)))
> +  /* Conversion between boolean types of different sizes is
> +a simple assignment in case their vectypes are same
> +boolean vectors.  */
> +  && (!VECTOR_BOOLEAN_TYPE_P (vectype)
> + || !VECTOR_BOOLEAN_TYPE_P (vectype_in)))
>  {
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

On Wed, Dec 9, 2015 at 10:31 PM, Markus Trippelsdorf
 wrote:
> On 2015.12.09 at 10:53 -0800, H.J. Lu wrote:
>>
>> Empty C++ class is a corner case which isn't covered in psABI nor C++ ABI.
>> There is no mention of "empty record" in GCC documentation.  But there are
>> plenty of "empty class" in gcc/cp.  This change affects all targets.  C++ ABI
>> should specify how it should be passed.
>
> There is a C++ ABI mailinglist, where you could discuss this issue:
> http://sourcerytools.com/cgi-bin/mailman/listinfo/cxx-abi-dev

Yep.  As long as the ABI doesn't state how to pass those I'd rather _not_ change
GCCs way.

Richard.

> --
> Markus

Re: [PATCH, i386] Use scalar mask for 16-byte and 32-byte vectors when possible

2015-12-10 Thread Ilya Enkovich

Ping

2015-11-26 13:33 GMT+03:00 Ilya Enkovich :
> Hi,
>
> This patch allows usage of scalar masks for ymm and xmm registers when target 
> supports it.  Bootstrapped and regtested on x86_64-unknown-linux-gnu.  OK for 
> trunk?
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-26  Ilya Enkovich  
>
> * config/i386/i386.c (ix86_get_mask_mode): Use scalar
> modes for 32 and 16 byte vectors when possible.
>
> gcc/testsuite/
>
> 2015-11-26  Ilya Enkovich  
>
> * gcc.dg/vect/vect-32-chars.c: New test.
>
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 83749d5..d7c359f 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -53443,7 +53443,8 @@ ix86_get_mask_mode (unsigned nunits, unsigned 
> vector_size)
>unsigned elem_size = vector_size / nunits;
>
>/* Scalar mask case.  */
> -  if (TARGET_AVX512F && vector_size == 64)
> +  if ((TARGET_AVX512F && vector_size == 64)
> +  || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
>  {
>if (elem_size == 4 || elem_size == 8 || TARGET_AVX512BW)
> return smallest_mode_for_size (nunits, MODE_INT);
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-32-chars.c 
> b/gcc/testsuite/gcc.dg/vect/vect-32-chars.c
> new file mode 100644
> index 000..0af5d2d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-32-chars.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-mavx512bw -mavx512vl" { target { i?86-*-* 
> x86_64-*-* } } } */
> +
> +char a[32];
> +char b[32];
> +char c[32];
> +
> +void test()
> +{
> +  int i = 0;
> +  for (i = 0; i < 32; i++)
> +if (b[i] > 0)
> +  a[i] = c[i];
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
> i?86-*-* x86_64-*-* } } } } */

Re: Splitting up gcc/omp-low.c?


On 12/10/2015 09:08 AM, Jakub Jelinek wrote:

On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:

On 12/09/2015 05:24 PM, Thomas Schwinge wrote:


In addition to that, how about we split up gcc/omp-low.c into several
files?  Would it make sense (I have not yet looked in detail) to do so
along the borders of the several passes defined therein?  Or, can you
tell already that there would be too many cross-references between the
several files to make this infeasible?


It would be nice to get rid of all the code duplication in that file. That
alone could reduce the size by quite a bit, and hopefully make it easier to
read.


What exact code duplication do you mean?


Functions that are near-identical with slight differences, or which have 
large sections of identical code. scan_omp_task vs scan_omp_parallel, or 
the various expand_omp_for functions are examples.



Bernd

Re: Splitting up gcc/omp-low.c?

On Thu, Dec 10, 2015 at 12:26:10PM +0100, Bernd Schmidt wrote:
> On 12/10/2015 09:08 AM, Jakub Jelinek wrote:
> >On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> >>On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> >>>
> >>>In addition to that, how about we split up gcc/omp-low.c into several
> >>>files?  Would it make sense (I have not yet looked in detail) to do so
> >>>along the borders of the several passes defined therein?  Or, can you
> >>>tell already that there would be too many cross-references between the
> >>>several files to make this infeasible?
> >>
> >>It would be nice to get rid of all the code duplication in that file. That
> >>alone could reduce the size by quite a bit, and hopefully make it easier to
> >>read.
> >
> >What exact code duplication do you mean?
> 
> Functions that are near-identical with slight differences, or which have
> large sections of identical code. scan_omp_task vs scan_omp_parallel, or the

Even these two (quite short) have huge number of differences, so I'm not
100% sure it would be more readable if we had just one scan_omp_taskreg that
handled both.

> various expand_omp_for functions are examples.

I'm aware of some duplication in expand_omp_for_* functions, and some of the
obvious duplications were already moved to helper functions.  But in these
cases the number of differences is even significantly bigger too, so having
just one function that would handle all the different schedules would be far
less readable.  Perhaps we can add some small helpers to handle some little
pieces that repeat between the functions.

Jakub

Re: [PATCH v4] Fix shrink-wrapping bug (PR67778, PR68634)


On 12/09/2015 09:47 PM, Segher Boessenkool wrote:

After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.

Unfortunately a candidate block we select can be inside a loop, and we
will still allow it (because the loop always exits via our previously
chosen block).  We can do that just fine if we make a duplicate of the
block, but we do not want to here.

So we need to detect this situation.  We can place the prologue at a
previous block PRE only if PRE dominates every block reachable from
it, because then we will never need to duplicate that block (it will
always be executed with prologue).

v4: Fixed all the stupid mistakes you noticed.  Also, the previous
version stopped looking when the previous try didn't work out.  This
version doesn't: it is simpler, more in line with the rest of the
algorithm, potentially useful, and doesn't really cost more.

Tested on the two testcases from the PRs.  Also regression checked
on powerpc64-linux.

Is this okay for trunk?


Ok. This seems like a safe way to test things. For gcc-7 it might be 
worthwhile to try to have loop information available and use that for 
efficiency.



Bernd

Re: [PATCH] update -Wall and -Wextra documentation


On 12/10/2015 12:07 AM, Martin Sebor wrote:


* invoke.texi (Warning Options): Update -Wall options.  Clarify
when some -Wextra options are enabled.  Add -Wplacement-new example.


I tried to check this list against c.opt - I figure this should contain 
essentially the ones that are have an EnabledBy(Wall), plus whatever 
logic there is in c-opts.



+-Wduplicated-cond  @gol


I don't see this one as enabled by Wall, and

2015-10-02  Marek Polacek  

* c.opt (Wduplicated-cond): Don't enable by -Wall anymore.


+-Wplacement-new @r{(only for C++)} @gol


This one appears to be on by default?


+In C++, this warning is also enabled by @option{-Wall}.  In C, it is also
+enabled by @option{-Wextra}; to get the other warnings of @option{-Wextra}
+without this warning, use @option{-Wextra -Wno-sign-compare}.


Is the last part of the sentence really necessary? It kind of follows 
from the rest of the documentation and we don't spell this out for other 
-Wextra options.



Bernd

Re: [hsa 3/10] HSA libgomp plugin [part 1/2]

2015-12-10 Thread Martin Liška

On 12/09/2015 01:15 PM, Jakub Jelinek wrote:
> On Mon, Dec 07, 2015 at 12:20:49PM +0100, Martin Jambor wrote:
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include "libgomp-plugin.h"
>> +#include "gomp-constants.h"
>> +#include "hsa.h"
>> +#include "hsa_ext_finalize.h"
> 
> If these 2 headers are coming from outside of gcc, better use <> for them
> instead of "", and better place them above the libgomp specific ones.
> 
>> +#include "dlfcn.h"
> 
> Ditto.
> 
>> +/* Flag to decide whether print to stderr information about what is going 
>> on.
>> +   Set in init_debug depending on environment variables.  */
>> +
>> +static bool debug;
>> +
>> +/* Flag to decide if the runtime should suppress a possible fallback to host
>> +   execution.  */
>> +
>> +static bool suppress_host_fallback;
>> +
>> +/* Initialize debug and suppress_host_fallback according to the 
>> environment.  */
>> +
>> +static void
>> +init_enviroment_variables (void)
>> +{
>> +  if (getenv ("HSA_DEBUG"))
>> +debug = true;
>> +  else
>> +debug = false;
>> +
>> +  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
>> +suppress_host_fallback = true;
>> +  else
>> +suppress_host_fallback = false;
> 
> Wonder whether we want these custom env vars in suid programs, allowing
> users to change behavior might be undesirable.  So perhaps use secure_getenv
> instead of getenv if found by configure?
>> +
>> +  const char* hsa_error;
> 
> Space before * instead of after it (multiple spots).
> 
>> +  hsa_status_string (status, &hsa_error);
>> +
>> +  unsigned l = strlen (hsa_error);
>> +
>> +  char *err = GOMP_PLUGIN_malloc (sizeof (char) * l);
> 
> sizeof (char) is 1, please don't multiply by it, that is just confusing.
> It makes sense in macros where you don't know what the type really is, but
> not here.
> 
>> +  memcpy (err, hsa_error, l - 1);
>> +  err[l] = '\0';
>> +
>> +  fprintf (stderr, "HSA warning: %s (%s)\n", str, err);
> 
> Why do you allocate err at all, if you free it right away?  Can't you use
> hsa_error directly instead?
>> +
>> +  free (err);
> 
>> +static void
>> +hsa_fatal (const char *str, hsa_status_t status)
>> +{
>> +  const char* hsa_error;
>> +  hsa_status_string (status, &hsa_error);
>> +  GOMP_PLUGIN_fatal ("HSA fatal error: %s (%s)", str, hsa_error);
> 
> Like you do e.g. here?
> 
>> +/* Callback of dispatch queues to report errors.  */
>> +
>> +static void
>> +queue_callback(hsa_status_t status, hsa_queue_t* queue __attribute__ 
>> ((unused)),
> 
> Missing space before (.
>> +/* Callback of hsa_agent_iterate_regions.  Determine if a memory REGION can 
>> be
>> +   used for kernarg allocations and if so write it to the memory pointed to 
>> by
>> +   DATA and break the query.  */
>> +
>> +static hsa_status_t get_kernarg_memory_region (hsa_region_t region, void* 
>> data)
> 
> Missing newline between return type and function name.
> 
>> +  hsa_region_t* ret = (hsa_region_t*) data;
> 
> Two spots with wrong formatting above.
>> +{
>> +  if (agent->first_module)
>> +  agent->first_module->prev = module;
> 
> Wrong indentation.
> 
>> +  unsigned size = end - start;
>> +  char *buf = (char *) malloc (size);
>> +  memcpy (buf, start, size);
> 
> malloc could return NULL and you crash.  Should this use GOMP_PLUGIN_malloc
> instead?
> 
>> +  struct GOMP_hsa_kernel_dispatch *shadow = GOMP_PLUGIN_malloc_cleared
>> +(sizeof (struct GOMP_hsa_kernel_dispatch));
> 
> Formatting, = should go on the next line, and if even that doesn't help,
> maybe use sizeof (*shadow)?
>> +
>> +  shadow->queue = agent->command_q;
>> +  shadow->omp_data_memory = omp_data_size > 0
>> +? GOMP_PLUGIN_malloc (omp_data_size) : NULL;
> 
> = should go on the next line.
> 
>> +  unsigned dispatch_count = kernel->dependencies_count;
>> +  shadow->kernel_dispatch_count = dispatch_count;
>> +
>> +  shadow->children_dispatches = GOMP_PLUGIN_malloc
>> +(dispatch_count * sizeof (struct GOMP_hsa_kernel_dispatch *));
> 
> Likewise.
>> +indent_stream (FILE *f, unsigned indent)
>> +{
>> +  for (int i = 0; i < indent; i++)
>> +fputc (' ', f);
> 
> Perhaps fprintf (f, "%*s", indent, "");
> instead?
> 
>> +  struct GOMP_hsa_kernel_dispatch *shadow = create_single_kernel_dispatch
>> +(kernel, omp_data_size);
> 
> Formatting issues again.
> 
>> +  shadow->omp_num_threads = 64;
>> +  shadow->debug = 0;
>> +  shadow->omp_level = kernel->gridified_kernel_p ? 1 : 0;
>> +
>> +  /* Create kernel dispatch data structures.  We do not allow to have
>> + a kernel dispatch with depth bigger than one.  */
>> +  for (unsigned i = 0; i < kernel->dependencies_count; i++)
>> +{
>> +  struct kernel_info *dependency = get_kernel_for_agent
>> +(kernel->agent, kernel->dependencies[i]);
>> +  shadow->children_dispatches[i] = create_single_kernel_dispatch
>> +(dependency, omp_data_size);
>> +  shadow->children_dispatches[i]->queue =
> 
> Never put = at the end of line.
>> +kernel->agent->kernel_dispatch_command_q;
> 
>

Re: [hsa 3/10] HSA libgomp plugin [part 2/2]

2015-12-10 Thread Martin Liška

On 12/09/2015 01:15 PM, Jakub Jelinek wrote:
> On Mon, Dec 07, 2015 at 12:20:49PM +0100, Martin Jambor wrote:
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include "libgomp-plugin.h"
>> +#include "gomp-constants.h"
>> +#include "hsa.h"
>> +#include "hsa_ext_finalize.h"
> 
> If these 2 headers are coming from outside of gcc, better use <> for them
> instead of "", and better place them above the libgomp specific ones.
> 
>> +#include "dlfcn.h"
> 
> Ditto.
> 
>> +/* Flag to decide whether print to stderr information about what is going 
>> on.
>> +   Set in init_debug depending on environment variables.  */
>> +
>> +static bool debug;
>> +
>> +/* Flag to decide if the runtime should suppress a possible fallback to host
>> +   execution.  */
>> +
>> +static bool suppress_host_fallback;
>> +
>> +/* Initialize debug and suppress_host_fallback according to the 
>> environment.  */
>> +
>> +static void
>> +init_enviroment_variables (void)
>> +{
>> +  if (getenv ("HSA_DEBUG"))
>> +debug = true;
>> +  else
>> +debug = false;
>> +
>> +  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
>> +suppress_host_fallback = true;
>> +  else
>> +suppress_host_fallback = false;
> 
> Wonder whether we want these custom env vars in suid programs, allowing
> users to change behavior might be undesirable.  So perhaps use secure_getenv
> instead of getenv if found by configure?
>> +
>> +  const char* hsa_error;
> 
> Space before * instead of after it (multiple spots).
> 
>> +  hsa_status_string (status, &hsa_error);
>> +
>> +  unsigned l = strlen (hsa_error);
>> +
>> +  char *err = GOMP_PLUGIN_malloc (sizeof (char) * l);
> 
> sizeof (char) is 1, please don't multiply by it, that is just confusing.
> It makes sense in macros where you don't know what the type really is, but
> not here.
> 
>> +  memcpy (err, hsa_error, l - 1);
>> +  err[l] = '\0';
>> +
>> +  fprintf (stderr, "HSA warning: %s (%s)\n", str, err);
> 
> Why do you allocate err at all, if you free it right away?  Can't you use
> hsa_error directly instead?
>> +
>> +  free (err);
> 
>> +static void
>> +hsa_fatal (const char *str, hsa_status_t status)
>> +{
>> +  const char* hsa_error;
>> +  hsa_status_string (status, &hsa_error);
>> +  GOMP_PLUGIN_fatal ("HSA fatal error: %s (%s)", str, hsa_error);
> 
> Like you do e.g. here?
> 
>> +/* Callback of dispatch queues to report errors.  */
>> +
>> +static void
>> +queue_callback(hsa_status_t status, hsa_queue_t* queue __attribute__ 
>> ((unused)),
> 
> Missing space before (.
>> +/* Callback of hsa_agent_iterate_regions.  Determine if a memory REGION can 
>> be
>> +   used for kernarg allocations and if so write it to the memory pointed to 
>> by
>> +   DATA and break the query.  */
>> +
>> +static hsa_status_t get_kernarg_memory_region (hsa_region_t region, void* 
>> data)
> 
> Missing newline between return type and function name.
> 
>> +  hsa_region_t* ret = (hsa_region_t*) data;
> 
> Two spots with wrong formatting above.
>> +{
>> +  if (agent->first_module)
>> +  agent->first_module->prev = module;
> 
> Wrong indentation.
> 
>> +  unsigned size = end - start;
>> +  char *buf = (char *) malloc (size);
>> +  memcpy (buf, start, size);
> 
> malloc could return NULL and you crash.  Should this use GOMP_PLUGIN_malloc
> instead?
> 
>> +  struct GOMP_hsa_kernel_dispatch *shadow = GOMP_PLUGIN_malloc_cleared
>> +(sizeof (struct GOMP_hsa_kernel_dispatch));
> 
> Formatting, = should go on the next line, and if even that doesn't help,
> maybe use sizeof (*shadow)?
>> +
>> +  shadow->queue = agent->command_q;
>> +  shadow->omp_data_memory = omp_data_size > 0
>> +? GOMP_PLUGIN_malloc (omp_data_size) : NULL;
> 
> = should go on the next line.
> 
>> +  unsigned dispatch_count = kernel->dependencies_count;
>> +  shadow->kernel_dispatch_count = dispatch_count;
>> +
>> +  shadow->children_dispatches = GOMP_PLUGIN_malloc
>> +(dispatch_count * sizeof (struct GOMP_hsa_kernel_dispatch *));
> 
> Likewise.
>> +indent_stream (FILE *f, unsigned indent)
>> +{
>> +  for (int i = 0; i < indent; i++)
>> +fputc (' ', f);
> 
> Perhaps fprintf (f, "%*s", indent, "");
> instead?
> 
>> +  struct GOMP_hsa_kernel_dispatch *shadow = create_single_kernel_dispatch
>> +(kernel, omp_data_size);
> 
> Formatting issues again.
> 
>> +  shadow->omp_num_threads = 64;
>> +  shadow->debug = 0;
>> +  shadow->omp_level = kernel->gridified_kernel_p ? 1 : 0;
>> +
>> +  /* Create kernel dispatch data structures.  We do not allow to have
>> + a kernel dispatch with depth bigger than one.  */
>> +  for (unsigned i = 0; i < kernel->dependencies_count; i++)
>> +{
>> +  struct kernel_info *dependency = get_kernel_for_agent
>> +(kernel->agent, kernel->dependencies[i]);
>> +  shadow->children_dispatches[i] = create_single_kernel_dispatch
>> +(dependency, omp_data_size);
>> +  shadow->children_dispatches[i]->queue =
> 
> Never put = at the end of line.
>> +kernel->agent->kernel_dispatch_command_q;
> 
>

Re: [PATCH][ARM] PR 49526: Add support for smmul,smmla,smmls instructions



On 25/11/15 10:36, Kyrill Tkachov wrote:

Hi Yvan,

On 24/11/15 15:05, Yvan Roux wrote:

Hi Kyrill,

On 24 November 2015 at 14:31, Kyrill Tkachov  wrote:

Ping.

Thanks,
Kyrill



On 13/11/15 11:50, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00686.html

Thanks,
Kyrill

On 06/11/15 17:05, Kyrill Tkachov wrote:

Hi all,

This patch introduces support for the smmul, smmla and smmls instructions
that appear in armv6 architecture levels
and higher. To quote the SMMUL description from the ARMARM:
"Signed Most Significant Word Multiply multiplies two signed 32-bit
values, extracts the most significant 32 bits of
the result, and writes those bits to the destination register."

The smmla and smmls are the multiply-accumulate and multiply-subtract
extensions of that multiply.
There also exists an smmulr variant that rounds the result rather than
truncating it.
However, when I tried adding patterns for those forms I got an LRA ICE
that I was not able to figure out.
I'll try to find a testcase for that, but in the meantime there's no
reason to not have patterns for the
non-rounding variants.

I agree, but would be interested to give a look at the LRA issue when
you have a testcase.


I've filed PR 68536 for the ICE I've been seeing. Any insight is, of course, 
welcome.


Bootstrapped and tested on arm-none-linux-gnueabihf.
I've seen this trigger in quite a few places in SPEC2006 where it always
made the code better.

Ok for trunk?

Patch looks good to me (but can't approved it), I just wonder if
adding a subreg_highpart_p predicate in emit-rtl.c would be useful,
looking at other usage of subreg_highpart_offset it doesn't seem to be
the case, but maybe for consistency with the existing
subreg_lowpart_p.


Perhaps, we can put it there if there's much scope for its reuse in other
parts of the compiler. I didn't put it there in the first place so as not to
extend the review scope of the patch to the midend...



I have since discovered a bug in the proposed patterns for smmla and smmls.
The instructions are more involved than I thought, so it's not easy to write
a pattern for them.
I'll propose an updated patch adding just the smmul pattern in next stage1.

So, I'm withdrawing this patch.
thanks,
Kyrill


Thanks for looking at this,
Kyrill



Cheers,
Yvan


Thanks,
Kyrill

2015-11-06  Kyrylo Tkachov  

 PR target/49526
 * config/arm/arm.md (*mulsidi3si_v6): New pattern.
 (*mulsidi3siaddsi_v6): Likewise.
 (*mulsidi3sisubsi_v6): Likewise.
 * config/arm/predicates.md (subreg_highpart_operator):
 New predicate.

2015-11-06  Kyrylo Tkachov  

 PR target/49526
 * gcc.target/arm/pr49526_1.c: New test.
 * gcc.target/arm/pr49526_2.c: Likewise.
 * gcc.target/arm/pr49526_3.c: Likewise.

Re: [RFA] Transparent alias suport part 10: Fix base+offset alias analysis oracle WRT aliases

>   PR ipa/61886
>   PR middle-end/25140
>   * tree-ssa-alias.c (ptr_deref_may_alias_decl_p): Usecompare_base_decls
>   (nonoverlapping_component_refs_of_decl_p): Update sanity check.
>   (decl_refs_may_alias_p): Use compare_base_decls.
>   * alias.c: Include cgraph.h
>   (rtx_equal_for_memref_p): Use rtx_equal_for_memref_p.
>   (compare_base_decls): New function.
>   (base_alias_check): Likewise.
>   (memrefs_conflict_p): Likewise.
>   (nonoverlapping_memrefs_p): Likewise.
>   * alias.h (compare_base_decls): Declare.

The installed patch also contains the cut-off for the Ada testcases, so I have 
added the following line to the ChangeLong:

(get_alias_set): Add cut-off for recursion.

and installed the Ada testcases:

* gnat.dg/specs/access1.ads: New test.
* gnat.dg/specs/access2.ads: Likewise.

Thanks for quickly fixing the issue!

-- 
Eric Botcazou

Re: [PATCH] rtlanal: Fix bits/bytes confusion in set_noop_p (PR68814)

> My case comes from the gcc.dg/pr65492-2.c, the "test1int2" case.
> combine has made an insn
>   modifying insn i321: zero_extract(%3:DI,0x20,0)=%3:DI
> which is splatting the SImode parameter to both the high and low halves
> of the dest reg.  Then, it tries to combine it with the USE of that reg
> at the end of the function, giving
> 
> Failed to match this instruction:
> (parallel [
> (use (ior:DI (and:DI (reg/i:DI 3 3)
> (const_int 4294967295 [0x]))
> (ashift:DI (reg:DI 3 3 [ c ])
> (const_int 32 [0x20]
> (set (zero_extract:DI (reg/i:DI 3 3)
> (const_int 32 [0x20])
> (const_int 0 [0]))
> (reg:DI 3 3 [ c ]))
> ])
> 
> and if it has a parallel of two which doesn't match, it sees if it
> just needs one arm because the other is a noop set, and that ends
> up with
>   deleting noop move 21
> because of the wrong test, making the testcase fail.
> (powerpc64le has BITS_BIG_ENDIAN set, a bit unusual).

Thanks.  It seems a little odd for the condition to test the POS (operand #2) 
and not also the LEN (operand #1) of the ZERO_EXTRACT before returning true, 
but I'm not sure what the test would be given the above example.  Or maybe 
it's implicitly contained in the POS test because of the little-endian case.

The patch is OK for mainline.

-- 
Eric Botcazou

Re: [testsuite][ARM target attributes] Fix effective_target tests

Hi Christophe,

On 08/12/15 11:18, Christophe Lyon wrote:

On 8 December 2015 at 11:50, Kyrill Tkachov wrote:

Hi Christophe,

On 27/11/15 13:00, Christophe Lyon wrote:

Hi,

After the recent commits from Christian adding target attributes
support for ARM FPU settings, I've noticed that some of the tests
were failing because of incorrect assumptions wrt to the default
cpu/fpu/float-abi of the compiler.

This patch fixes the problems I've noticed in the following way:
- do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
when gcc is configured --with-float=hard

- change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
defined

- introduce arm_fp_ok, which is similar but does not enforce fpu setting

- add a new effective_target: arm_crypto_pragma_ok to check that
setting this fpu via a pragma is actually supported by the current
"multilib". This is different from checking the command-line option
because the pragma might conflict with the command-line options in
use.

The updates in the testcases are as follows:
- attr-crypto.c, we have to make sure that the defaut fpu does not
conflict with the one forced by pragma. That's why I use the arm_vfp
options/effective_target. This is needed if gcc has been configured
--with-fpu=neon-fp16, as the pragma fpu=crypto-neon-fp-armv8 would
conflict.

- attr-neon-builtin-fail.c: use arm_fp to force the appropriate
float-abi setting. Enforcing fpu is not needed here.

- attr-neon-fp16.c: similar, I also removed arm_neon_ok since it was
not necessary to make the test pass in my testing. On second thought,
I'm wondering whether I should leave it and make the test unsupported
in more cases (such as when forcing -march=armv5t, although it does
pass with this patch)

- attr-neon2.c: use arm_vfp to force the appropriate float-abi
setting. Enforcing mfpu=vfp is needed to avoid conflict with the
pragma target fpu=neon (for instance if the toolchain default is
neon-fp16)

- attr-neon3.c: similar

Tested on a variety of configurations, see:

http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/230929-target-attr/report-build-info.html

Note that the regressions reported fall into 3 categories:
- when forcing march=armv5t: tests are now unsupported because I
modified arm_crypto_ok to require arm_v8_neon_ok instead of arm32.

- the warning reported by attr-neon-builtin-fail.c moved from line 12
to 14 and is thus seen as a regression + one improvement

- finally, attr-neon-fp16.c causes an ICE on armeb compilers, for
which I need to post a bugzilla.

TBH, I'm a bit concerned by the complexity of all these multilib-like
conditions. I'm confident that I'm still missing some combinations :-)

And with new target attributes coming, new architectures etc... all
this logic is likely to become even more complex.

That being said, OK for trunk?

I'd like us to clean up and reorganise the gcc.target/arm testsuite
at some point to make it more robust with respect to the tons of multilib
options and configurations we can have for arm. It's becoming very
frustrating
to test feature-specific stuff :(

This is ok in the meantime.
Sorry for the delay.

Thanks for the approval, glad to see I'm not the only one willing to see
improvements in this area :)

Committed as r231403.

With this patch we're seeing legitimate PASSes go to NA.
For example:

gcc.target/arm/vfp-1.c
gcc.target/arm/vfp-ldmdbs.c
and other vfp tests in arm.exp.
This is, for example, for the
variant-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard

I suspect under your new predicates they'd need to be changed to check for
arm_fp_ok rather than arm_vfp_ok.

We want to avoid removing any PASSes.
Can you please test some more to make sure we don't remove any legitimate
passes with your patch?

Also, Ramana reminded me offline that any new predicates in target-supports.exp
need documenting in sourcebuild.txt.

In light of that, can you please revert your patch and address the issues above
so that we can be confident we don't lose existing legitimate test coverage?

Sorry for not catching these sooner.
Kyrill

Christophe.

Thanks for handling this!
Kyrill

Christophe

2015-11-27 Christophe Lyon

* lib/target-supports.exp
(check_effective_target_arm_vfp_ok_nocache): New.
(check_effective_target_arm_vfp_ok): Call the new
check_effective_target_arm_vfp_ok_nocache function.
(check_effective_target_arm_fp_ok_nocache): New.
(check_effective_target_arm_fp_ok): New.
(add_options_for_arm_fp): New.
(check_effective_target_arm_crypto_ok_nocache): Require
target_arm_v8_neon_ok instead of arm32.
(check_effective_target_arm_crypto_pragma_ok_nocache): New.
(check_effective_target_arm_crypto_pragma_ok): New.
(add_options_for_arm_vfp): New.
* gcc.target/arm/attr-crypto.c: Use arm_crypto_pragma_ok effective
target. Do not force -mfl

[v3] avoid alignment of static variables affecting stack's

2015-12-10 Thread Jan Beulich

Function (or more narrow) scope static variables (as well as others not
placed on the stack) should also not have any effect on the stack
alignment. I noticed the issue first with Linux'es dynamic_pr_debug()
construct using an 8-byte aligned sub-file-scope local variable.

According to my checking bad behavior started with 4.6.x (4.5.3 was
still okay), but generated code got quite a bit worse as of 4.9.0.

[v3: Re-base to current trunk.]
[v2: Drop inclusion of hard register variables, as requested by
 Jakub and Richard.]

gcc/
2015-12-10  Jan Beulich  

* cfgexpand.c (expand_one_var): Exclude static and external
variables when adjusting stack alignment related state.

gcc/testsuite/
2015-12-10  Jan Beulich  

* gcc.c-torture/execute/stkalign.c: New.

--- 2015-12-09/gcc/cfgexpand.c
+++ 2015-12-09/gcc/cfgexpand.c
@@ -1544,12 +1544,15 @@ static HOST_WIDE_INT
 expand_one_var (tree var, bool toplevel, bool really_expand)
 {
   unsigned int align = BITS_PER_UNIT;
+  bool stack = true;
   tree origvar = var;
 
   var = SSAVAR (var);
 
   if (TREE_TYPE (var) != error_mark_node && TREE_CODE (var) == VAR_DECL)
 {
+  stack = !TREE_STATIC (var) && !DECL_EXTERNAL (var);
+
   /* Because we don't know if VAR will be in register or on stack,
 we conservatively assume it will be on stack even if VAR is
 eventually put into register after RA pass.  For non-automatic
@@ -1578,7 +1581,8 @@ expand_one_var (tree var, bool toplevel,
align = POINTER_SIZE;
 }
 
-  record_alignment_for_reg_var (align);
+  if (stack)
+record_alignment_for_reg_var (align);
 
   if (TREE_CODE (origvar) == SSA_NAME)
 {
--- 2015-12-09/gcc/testsuite/gcc.c-torture/execute/stkalign.c
+++ 2015-12-09/gcc/testsuite/gcc.c-torture/execute/stkalign.c
@@ -0,0 +1,26 @@
+/* { dg-options "-fno-inline" } */
+
+#include 
+
+#define ALIGNMENT 64
+
+unsigned test(unsigned n, unsigned p)
+{
+  static struct { char __attribute__((__aligned__(ALIGNMENT))) c; } s;
+  unsigned x;
+
+  assert(__alignof__(s) == ALIGNMENT);
+  asm ("" : "=g" (x), "+m" (s) : "0" (&x));
+
+  return n ? test(n - 1, x) : (x ^ p);
+}
+
+int main (int argc, char *argv[] __attribute__((unused)))
+{
+  unsigned int x = test(argc, 0);
+
+  x |= test(argc + 1, 0);
+  x |= test(argc + 2, 0);
+
+  return !(x & (ALIGNMENT - 1));
+}



avoid alignment of static variables affecting stack's

Function (or more narrow) scope static variables (as well as others not
placed on the stack) should also not have any effect on the stack
alignment. I noticed the issue first with Linux'es dynamic_pr_debug()
construct using an 8-byte aligned sub-file-scope local variable.

According to my checking bad behavior started with 4.6.x (4.5.3 was
still okay), but generated code got quite a bit worse as of 4.9.0.

[v3: Re-base to current trunk.]
[v2: Drop inclusion of hard register variables, as requested by
 Jakub and Richard.]

gcc/
2015-12-10  Jan Beulich  

* cfgexpand.c (expand_one_var): Exclude static and external
variables when adjusting stack alignment related state.

gcc/testsuite/
2015-12-10  Jan Beulich  

* gcc.c-torture/execute/stkalign.c: New.

--- 2015-12-09/gcc/cfgexpand.c
+++ 2015-12-09/gcc/cfgexpand.c
@@ -1544,12 +1544,15 @@ static HOST_WIDE_INT
 expand_one_var (tree var, bool toplevel, bool really_expand)
 {
   unsigned int align = BITS_PER_UNIT;
+  bool stack = true;
   tree origvar = var;
 
   var = SSAVAR (var);
 
   if (TREE_TYPE (var) != error_mark_node && TREE_CODE (var) == VAR_DECL)
 {
+  stack = !TREE_STATIC (var) && !DECL_EXTERNAL (var);
+
   /* Because we don't know if VAR will be in register or on stack,
 we conservatively assume it will be on stack even if VAR is
 eventually put into register after RA pass.  For non-automatic
@@ -1578,7 +1581,8 @@ expand_one_var (tree var, bool toplevel,
align = POINTER_SIZE;
 }
 
-  record_alignment_for_reg_var (align);
+  if (stack)
+record_alignment_for_reg_var (align);
 
   if (TREE_CODE (origvar) == SSA_NAME)
 {
--- 2015-12-09/gcc/testsuite/gcc.c-torture/execute/stkalign.c
+++ 2015-12-09/gcc/testsuite/gcc.c-torture/execute/stkalign.c
@@ -0,0 +1,26 @@
+/* { dg-options "-fno-inline" } */
+
+#include 
+
+#define ALIGNMENT 64
+
+unsigned test(unsigned n, unsigned p)
+{
+  static struct { char __attribute__((__aligned__(ALIGNMENT))) c; } s;
+  unsigned x;
+
+  assert(__alignof__(s) == ALIGNMENT);
+  asm ("" : "=g" (x), "+m" (s) : "0" (&x));
+
+  return n ? test(n - 1, x) : (x ^ p);
+}
+
+int main (int argc, char *argv[] __attribute__((unused)))
+{
+  unsigned int x = test(argc, 0);
+
+  x |= test(argc + 1, 0);
+  x |= test(argc + 2, 0);
+
+  return !(x & (ALIGNMENT - 1));
+}

Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [0/3] v2

On Thu, Dec 10, 2015 at 12:23 AM, Jeff Law  wrote:
> Richi and I have been discussing revamping slightly how DOM handles
> conditionals which it detects are always true or always false.
>
> During gcc6 stage1 I added code to allow DOM to clean them up
> immediately, primarily to avoid the waste of having the threader handle
> those cases.  It was also believed that by cleaning things up during the
> DOM walk we could realize some secondary benefits (certain PHIs become
> more likely to collapse down to a const/copy which can then be propagated).
>
> That code causes an interesting problem as shown by 68619.  Essentially
> the CFG has 3 loops, one is a natural loop, the other two are irreducible.
>
> DOM finds conditionals which it can optimize to true/false.  It removes
> the unreachable edges and everything seems perfect.  Except that removal
> of those edges causes the irreducible loops become reducible.  This is a
> good thing, except
>
> Now we have two new natural loops, which triggers a checking failure
> because we haven't set up loop structures for the newly exposed natural
> loops.
>
> Richi's suggestion (before this problem was reported) was to have DOM
> leave the CFG alone, but otherwise optimize as-if the edges had been
> removed.  Final removal of the edges would be left to cfg_cleanup.  He
> also pointed me at SCCVN which does something similar.
>
> Further iteration with Richi has led us to extending the dom walker
> interface to directly support propagation of unexecutable properties for
> clients that want that improvement.
>
> To use the new capability, the client passes a new parameter to the walker
> constructor and in its before_dom_children callback the client returns
> either the taken edge when a COND, SWITCH or GOTO collapse to a single
> compile-time destination or NULL otherwise.
>
> The walker will take that information and determine which edges are no
> longer executable.  If that in turn causes blocks to become unreachable the
> walker can then mark further edges as non-executable.
>
> The walker clears EDGE_EXECUTABLE on any edges that are deemed not
> executable.  The client can use this to further refine analysis and
> optimizations.
>
> The walker will no longer call the before/after_dom_children callbacks for
> unreachable blocks.
>
> The change is broken into 4 parts for ease of review.  They must be
> committed together due to the API change in the walker.  They have been
> bootstrapped and regression tested as a unit on x86_64-linux-gnu.
>
>
> 1. Refactor the code from tree-ssa-sccvn.c into domwalk.c.  Essentially the
> constructor is no longer trivial as it may need to initialize
> EDGE_EXECUTABLE.  The return type of the before_dom_children callback
> changes from void to an edge.  Two additional private member functions are
> added to check a block for reachability and to propagate unexecutable
> properties.  Included are the changes to sccvn to use the new capabilities.
>
> 2. Use the new capabilities in tree-ssa-dom.c.
>
> 3. New tests.  One is the actual 68619 testcase.  Two ICEs for minor
> bugs found during development/testing, one case where we optimize better
> now than before, one for a missed optimization during development.
>
> 4. Mechanical changes to the other dom walkers. The return type of the
> before_dom_children callback changes from void to edge. For the callbacks
> changed in this patch, we always return NULL.
>
>
> How does this look now?

The series looks good to me.

Thanks,
Richard.

>
> Jeff

Re: [testsuite][ARM target attributes] Fix effective_target tests

2015-12-10 Thread Christophe Lyon

On 10 December 2015 at 13:30, Kyrill Tkachov  wrote:
> Hi Christophe,
>
>
> On 08/12/15 11:18, Christophe Lyon wrote:
>>
>> On 8 December 2015 at 11:50, Kyrill Tkachov 
>> wrote:
>>>
>>> Hi Christophe,
>>>
>>>
>>> On 27/11/15 13:00, Christophe Lyon wrote:

 Hi,

 After the recent commits from Christian adding target attributes
 support for ARM FPU settings,  I've noticed that some of the tests
 were failing because of incorrect assumptions wrt to the default
 cpu/fpu/float-abi of the compiler.

 This patch fixes the problems I've noticed in the following way:
 - do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
 when gcc is configured --with-float=hard

 - change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
 flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
 defined

 - introduce arm_fp_ok, which is similar but does not enforce fpu setting

 - add a new effective_target: arm_crypto_pragma_ok to check that
 setting this fpu via a pragma is actually supported by the current
 "multilib". This is different from checking the command-line option
 because the pragma might conflict with the command-line options in
 use.

 The updates in the testcases are as follows:
 - attr-crypto.c, we have to make sure that the defaut fpu does not
 conflict with the one forced by pragma. That's why I use the arm_vfp
 options/effective_target. This is needed if gcc has been configured
 --with-fpu=neon-fp16, as the pragma fpu=crypto-neon-fp-armv8 would
 conflict.

 - attr-neon-builtin-fail.c: use arm_fp to force the appropriate
 float-abi setting. Enforcing fpu is not needed here.

 - attr-neon-fp16.c: similar, I also removed arm_neon_ok since it was
 not necessary to make the test pass in my testing. On second thought,
 I'm wondering whether I should leave it and make the test unsupported
 in more cases (such as when forcing -march=armv5t, although it does
 pass with this patch)

 - attr-neon2.c: use arm_vfp to force the appropriate float-abi
 setting. Enforcing mfpu=vfp is needed to avoid conflict with the
 pragma target fpu=neon (for instance if the toolchain default is
 neon-fp16)

 - attr-neon3.c: similar

 Tested on a variety of configurations, see:


 http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/230929-target-attr/report-build-info.html

 Note that the regressions reported fall into 3 categories:
 - when forcing march=armv5t: tests are now unsupported because I
 modified arm_crypto_ok to require arm_v8_neon_ok instead of arm32.

 - the warning reported by attr-neon-builtin-fail.c moved from line 12
 to 14 and is thus seen as a regression + one improvement

 - finally, attr-neon-fp16.c causes an ICE on armeb compilers, for
 which I need to post a bugzilla.


 TBH, I'm a bit concerned by the complexity of all these multilib-like
 conditions. I'm confident that I'm still missing some combinations :-)

 And with new target attributes coming, new architectures etc... all
 this logic is likely to become even more complex.

 That being said, OK for trunk?
>>>
>>>
>>> I'd like us to clean up and reorganise the gcc.target/arm testsuite
>>> at some point to make it more robust with respect to the tons of multilib
>>> options and configurations we can have for arm. It's becoming very
>>> frustrating
>>> to test feature-specific stuff :(
>>>
>>> This is ok in the meantime.
>>> Sorry for the delay.
>>>
>> Thanks for the approval, glad to see I'm not the only one willing to see
>> improvements in this area :)
>>
>> Committed as r231403.
>
>
> With this patch we're seeing legitimate PASSes go to NA.
> For example:
>
> gcc.target/arm/vfp-1.c
> gcc.target/arm/vfp-ldmdbs.c
> and other vfp tests in arm.exp.
> This is, for example, for the
> variant-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard
>
Hmm I'm attempting to cover such a configuration in my matrix of validations:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231403/report-build-info.html

The difference is that I use similar flags at GCC configure time, while you
override them when running the testsuite:
--target arm-none-linux-gnueabihf --with-mode=thum --with-cpu=cortex-a57
--with-fpu=crypto-neon-fp-armv8

I this case, I do not see the regressions.

> I suspect under your new predicates they'd need to be changed to check for
> arm_fp_ok rather than arm_vfp_ok.
Probably, yes.

> We want to avoid removing any PASSes.
> Can you please test some more to make sure we don't remove any legitimate
> passes with your patch?
Is that the only combination in which you saw less PASSes?

> Also, Ramana reminded me offline that any new predicates in
> target-supports.exp
> need documenting in sourcebuild.txt.

Re: [testsuite][ARM target attributes] Fix effective_target tests

On 10/12/15 13:04, Christophe Lyon wrote:

On 10 December 2015 at 13:30, Kyrill Tkachov wrote:

Hi Christophe,

On 08/12/15 11:18, Christophe Lyon wrote:

On 8 December 2015 at 11:50, Kyrill Tkachov
wrote:

Hi Christophe,

On 27/11/15 13:00, Christophe Lyon wrote:

Hi,

This patch fixes the problems I've noticed in the following way:
- do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
when gcc is configured --with-float=hard

- change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
defined

- introduce arm_fp_ok, which is similar but does not enforce fpu setting

- attr-neon-builtin-fail.c: use arm_fp to force the appropriate
float-abi setting. Enforcing fpu is not needed here.

- attr-neon3.c: similar

Tested on a variety of configurations, see:

http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/230929-target-attr/report-build-info.html

Note that the regressions reported fall into 3 categories:
- when forcing march=armv5t: tests are now unsupported because I
modified arm_crypto_ok to require arm_v8_neon_ok instead of arm32.

- the warning reported by attr-neon-builtin-fail.c moved from line 12
to 14 and is thus seen as a regression + one improvement

- finally, attr-neon-fp16.c causes an ICE on armeb compilers, for
which I need to post a bugzilla.

TBH, I'm a bit concerned by the complexity of all these multilib-like
conditions. I'm confident that I'm still missing some combinations :-)

And with new target attributes coming, new architectures etc... all
this logic is likely to become even more complex.

That being said, OK for trunk?

This is ok in the meantime.
Sorry for the delay.

Thanks for the approval, glad to see I'm not the only one willing to see
improvements in this area :)

Committed as r231403.

With this patch we're seeing legitimate PASSes go to NA.
For example:

gcc.target/arm/vfp-1.c
gcc.target/arm/vfp-ldmdbs.c
and other vfp tests in arm.exp.
This is, for example, for the
variant-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard

Hmm I'm attempting to cover such a configuration in my matrix of validations:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231403/report-build-info.html

The difference is that I use similar flags at GCC configure time, while you
override them when running the testsuite:
--target arm-none-linux-gnueabihf --with-mode=thum --with-cpu=cortex-a57
--with-fpu=crypto-neon-fp-armv8

I this case, I do not see the regressions.

My gcc is arm-none-eabi configured with
--with-float=hard --with-cpu=cortex-a15 --with-fpu=neon-vfpv4 --with-float=hard

I suspect under your new predicates they'd need to be changed to check for
arm_fp_ok rather than arm_vfp_ok.

Probably, yes.

In the test log where it checks the effective target it fails due to:
arm_vfp_ok27268.c:6:4: error: #error __ARM_NEON_FP defined
it's compiling the check with
-mthumb -march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -o
arm_vfp_ok27268.o arm_vfp_ok27268.c

We want to avoid removing any PASSes.
Can you please test some more to make sure we don't remove any legitimate
passes with your patch?

Is that the only combination in which you saw less PASSes?

That's the one that was brought to my attention, but I think the issue here is
just the
tests that check for arm_vfp

Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-10 Thread Tom de Vries

[ copy-pasting-with-quote from 
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some 
reason I didn't get this email ]



On Thu, 3 Dec 2015, Tom de Vries wrote:

The flag is set here in expand_omp_target:
...
12682 /* Prevent IPA from removing child_fn as unreachable,
 since there are no
12683refs from the parent function to child_fn in offload
 LTO mode.  */
12684 if (ENABLE_OFFLOADING)
12685   cgraph_node::get (child_fn)->mark_force_output ();
...



How are there no refs from the "parent"?  Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?


That descriptor is the offload table, which is emitted in 
omp_finish_file. The function iterates over vectors offload_vars and 
offload_funcs.


[ I would guess there's a one-on-one correspondance between 
symtab_node::offloadable and membership of either offload_vars or 
offload_funcs. ]



I think the above needs sorting out in somw way, making the refs
explicit rather than implicit via force_output.


I've tried an approach where I add a test for node->offloadable next to 
each test for node->force_output, except for the test in the nonlocal_p 
def in ipa_pta_execute. But I didn't (yet) manage to make that work.



I guess setting forced_by_abi instead would also mean child_fn is not removed
as unreachable, while still allowing optimizations:
...
  /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
 to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
 symbols promoted to static and it does not inhibit
 optimization.  */
  unsigned forced_by_abi : 1;
...

But I suspect that other optimizations (than ipa-pta) might break things.


How so?


Probably it's more accurate to say that I do not understand the 
difference very well between force_output and force_by_abi, and what is 
the class of optimizations enabled by using forced_by_abi instead of 
force_output.'



Essentially we have two situations:
- in the host compiler, there is no need for the forced_output flag,
  and it inhibits optimization
- in the accelerator compiler, it (or some equivalent) is needed


Actually, things are slightly more complicated, I realize now. There's 
also the distinction between:

- symbols declared as offloadable in the source code, and
- symbols create by the compiler and marked offloadable


I wonder if setting the force_output flag only when streaming the bytecode for
offloading would work. That way, it wouldn't be set in the host compiler,
while being set in the accelerator compiler.


Yeah, that was my original thinking btw.


FTR, I've tried that approach, as attached. It fixed the 
goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also 
using an accelerator configuration) without any regressions.


Thanks,
- Tom

Set force_output in offload stream if offloadable

---
 gcc/lto-cgraph.c | 2 +-
 gcc/omp-low.c| 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 62e5454..c862b19 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -511,7 +511,7 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
   bp_pack_value (&bp, node->local.versionable, 1);
   bp_pack_value (&bp, node->local.can_change_signature, 1);
   bp_pack_value (&bp, node->local.redefined_extern_inline, 1);
-  bp_pack_value (&bp, node->force_output, 1);
+  bp_pack_value (&bp, node->force_output || node->offloadable, 1);
   bp_pack_value (&bp, node->forced_by_abi, 1);
   bp_pack_value (&bp, node->unique_name, 1);
   bp_pack_value (&bp, node->body_removed, 1);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5643480..569cfd7 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12770,10 +12770,12 @@ expand_omp_target (struct omp_region *region)
 	assign_assembler_name_if_neeeded (child_fn);
   cgraph_edge::rebuild_edges ();
 
+#if 0
   /* Prevent IPA from removing child_fn as unreachable, since there are no
 	 refs from the parent function to child_fn in offload LTO mode.  */
   if (ENABLE_OFFLOADING)
 	cgraph_node::get (child_fn)->mark_force_output ();
+#endif
 
   /* Some EH regions might become dead, see PR34608.  If
 	 pass_cleanup_cfg isn't the first pass to happen with the

Re: [PATCH] Fix up noce_try_abs again (PR rtl-optimization/68670)


On 12/09/2015 11:37 PM, Jakub Jelinek wrote:

Not sure what I've been thinking when writing the previous noce_try_abs fix.
I thought that the optimization can be applied for all the conditions,
and whether it can be applied depends on if it is cond ? ~x : x or
cond ? x : ~x.  But that is not the case, the optimization can be only
applied to a subset of conditions, and when it can be applied, it can be
applied to both the cond ? ~x : x and cond ? x : ~x cases (depending on
the condition one is one_cmpl_abs (x) and the other ~one_cmpl_abs (x)).


Odd, I thought you made a good argument last time :-( Sorry for not 
catching the problem.



switch (GET_CODE (cond))
  {
- case GT:
-   if (!negate)
- return FALSE;
-   break;
  case GE:
-   /* >= 0 is the same case as above > -1.  */
-   if (negate)
- return FALSE;
-   break;
  case LT:
-   if (negate)
- return FALSE;
-   break;
- case LE:
-   /* <= 0 is the same case as above < 1.  */
-   if (!negate)
- return FALSE;
break;
  default:
return FALSE;


This turns into a much simpler if statement, doesn't it? Ok with that 
change.



+/* PR rtl-optimization/68670 */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftracer" } */


Curious what effect -ftracer has here?


Bernd

[PATCH] Fix PR68817



Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-10  Richard Biener  

PR tree-optimization/68817
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Detect
gaps early.

* gfortran.dg/pr68817.f90: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 231499)
+++ gcc/tree-vect-slp.c (working copy)
@@ -1237,6 +1237,8 @@ vect_attempt_slp_rearrange_stmts (slp_in
   bitmap_clear (load_index);
   FOR_EACH_VEC_ELT (node->load_permutation, i, lidx)
 {
+  if (lidx >= group_size)
+   return false;
   if (bitmap_bit_p (load_index, lidx))
{
  sbitmap_free (load_index);
Index: gcc/testsuite/gfortran.dg/pr68817.f90
===
--- gcc/testsuite/gfortran.dg/pr68817.f90   (revision 0)
+++ gcc/testsuite/gfortran.dg/pr68817.f90   (working copy)
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math" }
+  SUBROUTINE TEST(A,B,C)
+  DIMENSION B(3),C(1000,10)
+  DO I = 1,3
+ I3=I*3
+ B(1) = B(1) + (C(K,I3+1)-A)
+ B(3) = B(3) + (C(K,I3+3)-A)
+  ENDDO
+  END
+

Re: [PATCH] Fix up noce_try_abs again (PR rtl-optimization/68670)

On Thu, Dec 10, 2015 at 02:20:04PM +0100, Bernd Schmidt wrote:
> This turns into a much simpler if statement, doesn't it? Ok with that
> change.

Ok, will do, thanks.

> >+/* PR rtl-optimization/68670 */
> >+/* { dg-do run } */
> >+/* { dg-options "-O2 -ftracer" } */
> 
> Curious what effect -ftracer has here?

In one case it was seeing all test_bb/then_bb/else_bb/join_bb,
where one of then_bb/else_bb (don't remember which one) contained
y = ~x and another one y = x, while the other case had no else_bb,
with already test_bb doing y = x and then_bb then doing y = ~y.
Correspondingly different condition, different negate value.

Jakub

RE: [PATCH] [ARC] Add support for atomic memory built-in.

2015-12-10 Thread Claudiu Zissulescu

Patch applied:  Committed r231509

Thanks,
Claudiu

> -Original Message-
> From: Joern Wolfgang Rennecke [mailto:g...@amylaar.uk]
> Sent: Wednesday, December 09, 2015 6:11 AM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: francois.bed...@synopsys.com; jeremy.benn...@embecosm.com
> Subject: Re: [PATCH] [ARC] Add support for atomic memory built-in.
> 
> 
> 
> On 07/12/15 13:25, Claudiu Zissulescu wrote:
> >
> > Tested with dg.exp (when passing -matomic to gcc compiler line, the
> atomic tests are also successfully executed).
> The comment before "*memory_barrier" could use some elaboration on
> what it does for TARGET_HS.
> Otherwise, this is OK.

Re: [PATCH PR68542]

On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev  wrote:
> Hi Richard.
>
> Thanks a lot for your review.
> Below are my answers.
>
> You asked why I inserted additional check to
> ++ b/gcc/tree-ssa-forwprop.c
> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
> tree_code code, tree type,
>
>gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>
> +  /* Do not perform combining it types are not compatible.  */
> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
> +  && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0
> +return NULL_TREE;
> +
>
> again, how does this happen?
>
> This is because without it I've got assert in fold_convert_loc
>   gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>
> since it tries to convert vector of bool to scalar bool.
> Here is essential part of call-stack:
>
> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
> at ../../gcc/diagnostic.c:1259
> #1  0x01743ada in fancy_abort (
> file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
> function=0x184b9d0  tree_node*)::__FUNCTION__> "fold_convert_loc") at
> ../../gcc/diagnostic.c:1332
> #2  0x009c8330 in fold_convert_loc (loc=0, type=0x718a9d20,
> arg=0x71a7f488) at ../../gcc/fold-const.c:2216
> #3  0x009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
> type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
> op2=0x718c2030) at ../../gcc/fold-const.c:11453
> #4  0x009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
> type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
> op2=0x718c2030) at ../../gcc/fold-const.c:12394
> #5  0x009d870c in fold_binary_op_with_conditional_arg (loc=0,
> code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
> op1=0x71a48780, cond=0x71a7f460, arg=0x71a48780,
> cond_first_p=1) at ../../gcc/fold-const.c:6465
> #6  0x009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
> type=0x718a9d20, op0=0x71a7f460, op1=0x71a48780)
> at ../../gcc/fold-const.c:9211
> #7  0x00ecb8fa in combine_cond_expr_cond (stmt=0x71a487d0,
> code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
> op1=0x71a48780, invariant_only=true)
> at ../../gcc/tree-ssa-forwprop.c:382

Ok, but that only shows that

  /* Convert A ? 1 : 0 to simply A.  */
  if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
 : (integer_onep (op1)
&& !VECTOR_TYPE_P (type)))
  && integer_zerop (op2)
  /* If we try to convert OP0 to our type, the
 call to fold will try to move the conversion inside
 a COND, which will recurse.  In that case, the COND_EXPR
 is probably the best choice, so leave it alone.  */
  && type == TREE_TYPE (arg0))
return pedantic_non_lvalue_loc (loc, arg0);

  /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
 over COND_EXPR in cases such as floating point comparisons.  */
  if (integer_zerop (op1)
  && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
: (integer_onep (op2)
   && !VECTOR_TYPE_P (type)))
  && truth_value_p (TREE_CODE (arg0)))
return pedantic_non_lvalue_loc (loc,
fold_convert_loc (loc, type,
  invert_truthvalue_loc (loc,
 arg0)));

are wrong?  I can't say for sure without a testcase.

That said, papering over this in tree-ssa-forwprop.c is not the
correct thing to do.

> Secondly, I did not catch your idea to implement GCC Vector Extension
> for vector comparison with bool result since
> such extension completely depends on comparison context, e.g. for your
> example, result type of comparison depends on using - for
> if-comparison it is scalar, but for c = (a==b) - result type is
> vector. I don't think that this is reasonable for current release.

The idea was to be able to write testcases exercising different EQ/NE vector
compares.  But yes, if that's non-trivial the it's not appropriate for stage3.

Can you add a testcase for the forwprop issue and try to fix the offending
bogus folders instead?

Thanks,
Richard.

> And finally about AMD performance. I checked that this transformation
> works for "-march=bdver4" option and regression for 481.wrf must
> disappear too.
>
> Thanks.
> Yuri.
>
> 2015-12-04 15:18 GMT+03:00 Richard Biener :
>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> Here is a patch for 481.wrf preformance regression for avx2 which is
>>> sligthly modified mask store optimization. This transformation allows
>>> perform unpredication for semi-hammock containing mask

[Ada] Comment fixes

Tested on x86-64/Linux, applied on the mainline.


2015-12-10  Eric Botcazou  

* gcc-interface/gigi.h (create_var_decl): Adjust comment.
(create_subprog_decl): Likewise.
* gcc-interface/utils.c (create_var_decl): Likewise.
(create_subprog_decl): Likewise.

-- 
Eric BotcazouIndex: gcc-interface/gigi.h
===
--- gcc-interface/gigi.h	(revision 231498)
+++ gcc-interface/gigi.h	(working copy)
@@ -678,11 +678,11 @@ extern tree create_type_decl (tree name,
CONST_FLAG is true if this variable is constant, in which case we might
return a CONST_DECL node unless CONST_DECL_ALLOWED_P is false.
 
-   PUBLIC_FLAG is true if this definition is to be made visible outside of
-   the current compilation unit. This flag should be set when processing the
-   variable definitions in a package specification.
+   PUBLIC_FLAG is true if this is for a reference to a public entity or for a
+   definition to be made visible outside of the current compilation unit, for
+   instance variable definitions in a package specification.
 
-   EXTERN_FLAG is nonzero when processing an external variable declaration (as
+   EXTERN_FLAG is true when processing an external variable declaration (as
opposed to a definition: no storage is to be allocated for the variable).
 
STATIC_FLAG is only relevant when not at top level and indicates whether
@@ -694,6 +694,8 @@ extern tree create_type_decl (tree name,
 
DEBUG_INFO_P is true if we need to write debug information for it.
 
+   ATTR_LIST is the list of attributes to be attached to the variable.
+
GNAT_NODE is used for the position of the decl.  */
 extern tree create_var_decl (tree name, tree asm_name, tree type, tree init,
 			 bool const_flag, bool public_flag,
@@ -728,13 +730,17 @@ extern tree create_label_decl (tree name
the list of its parameters (a list of PARM_DECL nodes chained through the
DECL_CHAIN field).
 
-   INLINE_STATUS, CONST_FLAG, PUBLIC_FLAG, EXTERN_FLAG, VOLATILE_FLAG as well
-   as ATTR_LIST are used to set the appropriate fields in the FUNCTION_DECL.
+   INLINE_STATUS describes the inline flags to be set on the FUNCTION_DECL.
+
+   CONST_FLAG, PUBLIC_FLAG, EXTERN_FLAG, VOLATILE_FLAG are used to set the
+   appropriate flags on the FUNCTION_DECL.
 
ARTIFICIAL_P is true if the subprogram was generated by the compiler.
 
DEBUG_INFO_P is true if we need to write debug information for it.
 
+   ATTR_LIST is the list of attributes to be attached to the subprogram.
+
GNAT_NODE is used for the position of the decl.  */
 extern tree create_subprog_decl (tree name, tree asm_name, tree type,
  tree param_decl_list,
Index: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 231498)
+++ gcc-interface/utils.c	(working copy)
@@ -2344,6 +2344,8 @@ create_type_decl (tree name, tree type,
 
DEBUG_INFO_P is true if we need to write debug information for it.
 
+   ATTR_LIST is the list of attributes to be attached to the variable.
+
GNAT_NODE is used for the position of the decl.  */
 
 tree
@@ -3048,13 +3050,17 @@ create_label_decl (tree name, Node_Id gn
the list of its parameters (a list of PARM_DECL nodes chained through the
DECL_CHAIN field).
 
-   INLINE_STATUS, CONST_FLAG, PUBLIC_FLAG, EXTERN_FLAG, VOLATILE_FLAG as well
-   as ATTR_LIST are used to set the appropriate fields in the FUNCTION_DECL.
+   INLINE_STATUS describes the inline flags to be set on the FUNCTION_DECL.
+
+   CONST_FLAG, PUBLIC_FLAG, EXTERN_FLAG, VOLATILE_FLAG are used to set the
+   appropriate flags on the FUNCTION_DECL.
 
ARTIFICIAL_P is true if the subprogram was generated by the compiler.
 
DEBUG_INFO_P is true if we need to write debug information for it.
 
+   ATTR_LIST is the list of attributes to be attached to the subprogram.
+
GNAT_NODE is used for the position of the decl.  */
 
 tree

Re: [PATCH] Fix detection of setrlimit in libstdc++ testsuite

2015-12-10 Thread Maxim Kuvyrkov

> On Nov 11, 2015, at 7:56 PM, Maxim Kuvyrkov  wrote:
> 
> Hi,
> 
> This patch fixes an obscure cross-testing problem that crashed (OOMed) our 
> boards at Linaro.  Several tests in libstdc++ (e.g., [1]) limit themselves to 
> some reasonable amount of RAM and then try to allocate 32 gigs.  
> Unfortunately, the configure test that checks presence of setrlimit is rather 
> strange: if target is native, then try compile file with call to setrlimit -- 
> if compilation succeeds, then use setrlimit, otherwise, ignore setrlimit.  
> The strange part is that the compilation check is done only for native 
> targets, as if cross-toolchains can't generate working executables.  [This is 
> rather odd, and I might be missing some underlaying caveat.]
> 
> Therefore, when testing a cross toolchain, the test [1] still tries to 
> allocate 32GB of RAM with no setrlimit restrictions.  On most targets that 
> people use for cross-testing this is not an issue because either
> - the target is 32-bit, so there is no 32GB user-space to speak of, or
> - the target board has small amount of RAM and no swap, so allocation 
> immediately fails, or
> - the target board has plenty of RAM, so allocating 32GB is not an issue.
> 
> However, if one is testing on a 64-bit board with 16GB or RAM and 16GB of 
> swap, then one gets into an obscure near-OOM swapping condition.  This is 
> exactly the case with cross-testing aarch64-linux-gnu toolchains on APM 
> Mustang.
> 
> The attached patch removes "native" restriction from configure test for 
> setrlimit.  This enables setrlimit restrictions on the testsuite, and the 
> test [1] expectedly fails to allocate 32GB due to setrlimit restriction.
> 
> I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, 
> and aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no 
> regressions [*].
> 
> OK to commit?
> 
> I didn't go as far as enabling setenv/locale tests when cross-testing 
> libstdc++ because I remember of issues with generating locales in cross-built 
> glibc.  In any case, locale tests are unlikely to OOM the test board the way 
> that absence of setrlimit does.
> 
> [1] 27_io/ios_base/storage/2.cc
> 
> [*] Cross-testing using user-mode QEMU made 27_io/fpos/14775.cc execution 
> test to FAIL.  This test uses setrlimit set max file size, and is misbehaving 
> only under QEMU.  I believe this a QEMU issue with not handling setrlimit 
> correctly.
> 

Ping.

--
Maxim Kuvyrkov
www.linaro.org

Re: [v3] avoid alignment of static variables affecting stack's


On 12/10/2015 01:38 PM, Jan Beulich wrote:


* cfgexpand.c (expand_one_var): Exclude static and external
variables when adjusting stack alignment related state.

gcc/testsuite/
2015-12-10  Jan Beulich  

* gcc.c-torture/execute/stkalign.c: New.

--- 2015-12-09/gcc/cfgexpand.c
+++ 2015-12-09/gcc/cfgexpand.c
@@ -1544,12 +1544,15 @@ static HOST_WIDE_INT
  expand_one_var (tree var, bool toplevel, bool really_expand)
  {
unsigned int align = BITS_PER_UNIT;
+  bool stack = true;
tree origvar = var;

var = SSAVAR (var);

if (TREE_TYPE (var) != error_mark_node && TREE_CODE (var) == VAR_DECL)
  {
+  stack = !TREE_STATIC (var) && !DECL_EXTERNAL (var);
+
/* Because we don't know if VAR will be in register or on stack,
 we conservatively assume it will be on stack even if VAR is
 eventually put into register after RA pass.  For non-automatic
@@ -1578,7 +1581,8 @@ expand_one_var (tree var, bool toplevel,
align = POINTER_SIZE;
  }

-  record_alignment_for_reg_var (align);
+  if (stack)
+record_alignment_for_reg_var (align);


A bit further down we have

  else if (DECL_EXTERNAL (var))
;
  else if (DECL_HAS_VALUE_EXPR_P (var))
;
  else if (TREE_STATIC (var))
;
[]
  return 0;

so I'm thinking the function doesn't do anything for DECL_EXTERNAL or 
TREE_STATIC vars. You're still computing alignment for them but not 
using it. I suggest just doing an early return for non-stack vars.



Bernd

[patch] Improve generated libstdc++ API docs

2015-12-10 Thread Jonathan Wakely


This adjusts some Doxygen comments and updates the Doxygen config file
to ensure all headers are processed (previously doxygen was ignoring
filenames without an extension, which is a lot of the library!)

There's a workaround in include/std/bitset for the bug 
https://bugzilla.gnome.org/show_bug.cgi?id=759242

and another in include/ext/pb_ds/detail/bin_search_tree_/traits.hpp
for https://bugzilla.gnome.org/show_bug.cgi?id=759241

Tested with 'make doc-man-doxygen doc-html-doxygen' and committed to
trunk.


commit a35e6c19ef0d933e0eb6f182a78dfa2624a25eb7
Author: Jonathan Wakely 
Date:   Wed Dec 9 17:20:26 2015 +

Improve generated libstdc++ API docs

	* doc/doxygen/user.cfg.in: Use EXTENSION_MAPPING tag. Add new headers
	to INPUT. Remove obsolete XML_SCHEMA and XML_DTD tags. Update
	PREDEFINED macros. Set BRIEF_MEMBER_DESC for man-pages.
	* include/backward/strstream: Correct @file comment.
	* include/bits/forward_list.h: Improve Doxygen comments.
	* include/bits/locale_facets_nonio.h: Likewise.
	* include/bits/mutex.h: Likewise.
	* include/bits/shared_ptr.h: Likewise.
	* include/bits/stl_deque.h: Likewise.
	* include/debug/vector (_Safe_vector): Add @brief section to comment.
	* include/experimental/bits/fs_dir.h: Correct @file comment.
	* include/experimental/bits/fs_fwd.h: Likewise.
	* include/experimental/bits/fs_ops.h: Likewise.
	* include/experimental/bits/fs_path.h: Likewise.
	* include/experimental/bits/string_view.tcc: Likewise.
	* include/experimental/optional: Document experimental status.
	* include/experimental/string_view: Correct @file comment.
	* include/ext/pb_ds/detail/bin_search_tree_/traits.hpp: Reduce
	whitespace to avoid Doxygen bug.
	* include/std/bitset: Remove redundant @class Doxygen command. Add
	parentheses to avoid Doxygen bug.
	* include/std/mutex: Improve Doxygen comments.
	* include/tr2/dynamic_bitset: Add missing @param documentation.
	* scripts/run_doxygen: Rename man pages for std::experimental types.

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in b/libstdc++-v3/doc/doxygen/user.cfg.in
index ff2db48..a62eeff 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -90,7 +90,7 @@ OUTPUT_LANGUAGE= English
 # documentation (similar to Javadoc). Set to NO to disable this.
 # The default value is: YES.
 
-BRIEF_MEMBER_DESC  = NO
+BRIEF_MEMBER_DESC  = @do_man@
 
 # If the REPEAT_BRIEF tag is set to YES doxygen will prepend the brief
 # description of a member or function before the detailed description
@@ -272,7 +272,7 @@ OPTIMIZE_OUTPUT_VHDL   = NO
 # Note that for custom extensions you also need to set FILE_PATTERNS otherwise
 # the files are not read by doxygen.
 
-EXTENSION_MAPPING  =
+EXTENSION_MAPPING  = no_extension=C++ .h=C++ .tcc=C++ .hpp=C++
 
 # If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments
 # according to the Markdown format, which allows for more readable
@@ -757,6 +757,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/bitset \
  include/chrono \
  include/complex \
+ include/codecvt \
  include/condition_variable \
  include/deque \
  include/forward_list \
@@ -812,6 +813,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/cmath \
  include/csetjmp \
  include/csignal \
+ include/cstdalign \
  include/cstdarg \
  include/cstdbool \
  include/cstddef \
@@ -822,6 +824,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/ctgmath \
  include/ctime \
  include/cwchar \
+ include/cuchar \
  include/cwctype \
  include/ \
  include/bits \
@@ -831,6 +834,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/backward/hash_set \
  include/backward/strstream \
  include/debug \
+ include/debug/array \
  include/debug/bitset \
  include/debug/deque \
  include/debug/forward_list \
@@ -853,6 +857,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/profile/unordered_set \
  include/profile/vector \
  include/ext/algorithm \
+ include/ext/cmath \
  include/ext/functional \

[Ada] Fix internal error on call taking empty array Out parameter

This is a regression present on the mainline only, for a pathological case 
where an empty array is passed as Out parameter to a procedure, but the code 
is actually never executed because it's inside an empty loop...

Tested on x86-64/Linux, applied on the mainline.


2015-12-10  Eric Botcazou  

* gcc-interface/trans.c (Call_to_gnu): Remove guard for NULL_EXPR.
* gcc-interface/utils2.c (gnat_rewrite_reference) : Return
the reference unmodified.
: New case.  Likewise.


2015-12-10  Eric Botcazou  

* gnat.dg/array25.adb: New test.
* gnat.dg/array25_pkg.ad[sb]: New helper.


-- 
Eric BotcazouIndex: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 231498)
+++ gcc-interface/trans.c	(working copy)
@@ -4407,9 +4407,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gn
   /* If it's possible we may need to use this expression twice, make sure
 	 that any side-effects are handled via SAVE_EXPRs; likewise if we need
 	 to force side-effects before the call.  */
-  if (Ekind (gnat_formal) != E_In_Parameter
-	  && !is_by_ref_formal_parm
-	  && TREE_CODE (gnu_name) != NULL_EXPR)
+  if (Ekind (gnat_formal) != E_In_Parameter && !is_by_ref_formal_parm)
 	{
 	  tree init = NULL_TREE;
 	  gnu_name = gnat_stabilize_reference (gnu_name, true, &init);
Index: gcc-interface/utils2.c
===
--- gcc-interface/utils2.c	(revision 231498)
+++ gcc-interface/utils2.c	(working copy)
@@ -2733,7 +2733,8 @@ gnat_rewrite_reference (tree ref, rewrit
   break;
 
 case ERROR_MARK:
-  return error_mark_node;
+case NULL_EXPR:
+  return ref;
 
 default:
   gcc_unreachable ();
-- { dg-do compile }

with Array25_Pkg;

procedure Array25 is

   package My_Pkg is new Array25_Pkg (0, 0);

begin
   null;
end;
package body Array25_Pkg is

   procedure Get_Inner (A : out Arr1) is
   begin
  null;
   end;

   procedure Get (A : out Arr2) is
   begin
  for I in Arr2'Range loop
 Get_Inner (A (I).Data);
  end loop;
   end;

end Array25_Pkg;
generic

   UB1 : Natural;

   UB2 : Natural;

package Array25_Pkg is

   type Arr1 is array (1 .. UB1) of Integer;

   type Rec is record
  Data : Arr1;
   end record;

   type Arr2  is array (1 .. UB2) of Rec;

   procedure Get (A : out Arr2);

end Array25_Pkg;

[PATCH][RX] v2 instructions support.

2015-12-10 Thread Yoshinori Sato

Add RX v2 enhancement instructions support.

gcc/ChangeLog
2015/12/10  Yoshinori Sato  

* config/rx/constraints.md: Add constraint "q".
* doc/md.texi: Likewise.
* config/rx/rx-opts.h(rx_cpu_types): Add type RXV2.
* config/rx/rx.c(rx_print_operand): Add a0, a1 and extb.
(rx_expand_prologue): Use v2 instraction.
(rx_expand_epilogue): Likewise.
(rx_builtin): Add v2 instruction.
(rx_expand_builtin): Likewise.
(rx_expand_builtin_mac): 3 operand support.
(rx_expand_int_builtin_1_arg): Likewise.
(rx_expand_int_builtin_0_arg): Delete.
(rx_expand_builtin_mac2): New function.
(rx_expand_builtin_fromacc): Likewise.
(rx_expand_builtin_fromacc2): Likewise.
(rx_expand_builtin_toacc): Likewise.
(rx_expand_builtin_toacc2): Likewise.
(rx_expand_builtin_rac): Likewise.
* config/rx/rx.h(TARGET_CPU_CPP_BUILTINS): Add v2 define.
(ALLOW_RXV2_INSNS): New.
* config/rx/rx.md(constants): Add new builtin enum.
(addsf3): 3 operands support.
(mulsf3): Likewise.
(subsf3): Likewise.
(fixuns_truncsfsi2): New.
(floatunssisf2): Likewise.
(sqrtsf2): Likewise.
(machi): 3 operands support.
(maclo): Likewise.
(mulhi): Likewise.
(mullo): Likewise.
(mvfachi): Likewise.
(mvfacmi): Likewise.
(mvtachi): Likewise.
(mvtaclo): Likewise.
(racw): Likewise.
(mvfacgu): New.
(mvfaclo): Likewise.
(racl): Likewise.
(rdacl): Likewise.
(rdacw): Likewise.
(emaca): Likewise.
(emsba): Likewise.
(maclh): Likewise.
(msbhi): Likewise.
(msblo): Likewise.
(msblh): Likewise.
* config/rx/sync.md: New file.
* config/rx/rx.opt: Add rxv2 for mcpu option.
* config/rx/t-rx: Add v2 for multilib

---
 gcc/config/rx/constraints.md |   7 +
 gcc/config/rx/rx-opts.h  |   3 +-
 gcc/config/rx/rx.c   | 425 +--
 gcc/config/rx/rx.h   |   7 +
 gcc/config/rx/rx.md  | 363 +---
 gcc/config/rx/rx.opt |   3 +
 gcc/config/rx/sync.md| 191 +++
 gcc/config/rx/t-rx   |   2 +
 gcc/doc/md.texi  |   2 +
 9 files changed, 915 insertions(+), 88 deletions(-)
 create mode 100644 gcc/config/rx/sync.md

diff --git a/gcc/config/rx/constraints.md b/gcc/config/rx/constraints.md
index b41c232..e388350 100644
--- a/gcc/config/rx/constraints.md
+++ b/gcc/config/rx/constraints.md
@@ -106,3 +106,10 @@
)
   )
 )
+
+(define_memory_constraint "q"
+  "A MEM which only uses REG addressing."
+  (and (match_code "mem")
+(match_code "reg" "0")
+  )
+)
diff --git a/gcc/config/rx/rx-opts.h b/gcc/config/rx/rx-opts.h
index fa83e91..52d4dce 100644
--- a/gcc/config/rx/rx-opts.h
+++ b/gcc/config/rx/rx-opts.h
@@ -25,7 +25,8 @@ enum rx_cpu_types
   RX600,
   RX610,
   RX200,
-  RX100
+  RX100,
+  RXV2
 };
 
 #endif
diff --git a/gcc/config/rx/rx.c b/gcc/config/rx/rx.c
index 781b6b1..7b35308 100644
--- a/gcc/config/rx/rx.c
+++ b/gcc/config/rx/rx.c
@@ -639,6 +639,19 @@ rx_print_operand (FILE * file, rtx op, int letter)
case 0xa: fprintf (file, "isp"); break;
case 0xb: fprintf (file, "fintv"); break;
case 0xc: fprintf (file, "intb"); break;
+   case 0xd:
+ if (ALLOW_RXV2_INSNS)
+   {
+ fprintf (file, "extb"); break;
+   }
+   goto invalid_register;
+   case 0x40:
+   case 0x41:
+ if (ALLOW_RXV2_INSNS)
+   {
+ fprintf (file, "a%ld", INTVAL(op) - 0x40); break;
+   } /* RXv1 fall through */
+   invalid_register:
default:
  warning (0, "unrecognized control register number: %d - using 'psw'",
   (int) INTVAL (op));
@@ -731,6 +744,11 @@ rx_print_operand (FILE * file, rtx op, int letter)
   fprintf (file, "%s", reg_names [rx_pid_base_regnum ()]);
   break;
 
+case 'V':
+  gcc_assert (CONST_INT_P (op));
+  fprintf (file, "a%ld", INTVAL(op));
+  break;
+
 case 'R':
   gcc_assert (GET_MODE_SIZE (GET_MODE (op)) <= 4);
   unsigned_load = true;
@@ -1772,12 +1790,32 @@ rx_expand_prologue (void)
  /* We have assumed that there are at least two registers pushed... */
  gcc_assert (acc_high != 0);
 
- /* Note - the bottom 16 bits of the accumulator are inaccessible.
-We just assume that they are zero.  */
- emit_insn (gen_mvfacmi (gen_rtx_REG (SImode, acc_low)));
- emit_insn (gen_mvfachi (gen_rtx_REG (SImode, acc_high)));
- emit_insn (gen_stack_push (gen_rtx_REG (SImode, acc_low)));
- emit_insn (gen_stack_push (gen_rtx_REG (SImode, acc_high)));
+ if (!ALLOW_RXV2_INSNS)
+   {
+ /* Note - the bottom 16 bits of the accumulator

[PATCH] Fix IPA PTA wrong-code bug with LTO


The following patch fixes a latent issue in IPA PTA with LTO where
it doesn't properly honor partition boundaries.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-10  Richard Biener  

* tree-ssa-structalias.c (struct variable_info): Add
is_ipa_escape_point flag.
(new_var_info): Initialize to false.
(find_func_aliases): Generate escape constraints for stores
properly in IPA mode.
(ipa_pta_execute): Compute is_ipa_escape_point for globals.

Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 231498)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -160,6 +160,8 @@
The is_global_var bit which marks escape points is overly conservative
in IPA mode.  Split it to is_escape_point and is_global_var - only
externally visible globals are escape points in IPA mode.
+   There is now is_ipa_escape_point but this is only used in a few
+   selected places.
 
The way we introduce DECL_PT_UID to avoid fixing up all points-to
sets in the translation unit when we copy a DECL during inlining
@@ -265,6 +267,9 @@ struct variable_info
   /* True if this represents a global variable.  */
   unsigned int is_global_var : 1;
 
+  /* True if this represents a module escape point for IPA analysis.  */
+  unsigned int is_ipa_escape_point : 1;
+
   /* True if this represents a IPA function info.  */
   unsigned int is_fn_info : 1;
 
@@ -374,6 +379,7 @@ new_var_info (tree t, const char *name,
   ret->is_restrict_var = false;
   ret->ruid = 0;
   ret->is_global_var = (t == NULL_TREE);
+  ret->is_ipa_escape_point = false;
   ret->is_fn_info = false;
   if (t && DECL_P (t))
 ret->is_global_var = (is_global_var (t)
@@ -4779,11 +4785,13 @@ find_func_aliases (struct function *fn,
}
   /* If there is a store to a global variable the rhs escapes.  */
   if ((lhsop = get_base_address (lhsop)) != NULL_TREE
- && DECL_P (lhsop)
- && is_global_var (lhsop)
- && (!in_ipa_mode
- || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop)))
-   make_escape_constraint (rhsop);
+ && DECL_P (lhsop))
+   {
+ varinfo_t vi = get_vi_for_tree (lhsop);
+ if ((! in_ipa_mode && vi->is_global_var)
+ || vi->is_ipa_escape_point)
+   make_escape_constraint (rhsop);
+   }
 }
   /* Handle escapes through return.  */
   else if (gimple_code (t) == GIMPLE_RETURN
@@ -4794,8 +4802,7 @@ find_func_aliases (struct function *fn,
   if (!in_ipa_mode
  || !(fi = get_vi_for_tree (fn->decl)))
make_escape_constraint (gimple_return_retval (return_stmt));
-  else if (in_ipa_mode
-  && fi != NULL)
+  else if (in_ipa_mode)
{
  struct constraint_expr lhs ;
  struct constraint_expr *rhsp;
@@ -7498,7 +7505,15 @@ ipa_pta_execute (void)
   if (var->alias && var->analyzed)
continue;
 
-  get_vi_for_tree (var->decl);
+  varinfo_t vi = get_vi_for_tree (var->decl);
+
+  /* For the purpose of IPA PTA unit-local globals are not
+ escape points.  */
+  bool nonlocal_p = (var->used_from_other_partition
+|| var->externally_visible
+|| var->force_output);
+  if (nonlocal_p)
+   vi->is_ipa_escape_point = true;
 }
 
   if (dump_file

Fix PR ada/48013

This reports an interesting case where the compiler emits an empty elaboration 
routine and doesn't report the violation of the associated No_Elaboration_Code 
restriction.  Probably harmless in practice, but this small hole in gigi can 
easily be plugged.

Tested on x86-64/Linux, applied on the mainline.


2015-12-10  Eric Botcazou  

PR ada/48013
* gcc-interface/trans.c (empty_stmt_list_p): New predicate.
(gigi): Invoke it to compute the No_Elaboration_Code property.
(insert_code_for): Do not insert the code if it's empty.


2015-12-10  Eric Botcazou  

* gnat.dg/specs/elab4.ads: New test.
* gnat.dg/specs/elab4_proc.ad[sb]: New helper.


-- 
Eric Botcazou-- { dg-do compile }

pragma Restrictions(No_Elaboration_Code);

with Elab4_Proc;

package Elab4 is

  procedure My_G is new Elab4_Proc;

end Elab4;

-- { dg-final { scan-assembler-not "elabs" } }
procedure Elab4_Proc is
begin
   null;
end;
-- { dg-excess-errors "no code generated" }

generic
procedure Elab4_Proc;
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 231513)
+++ gcc-interface/trans.c	(working copy)
@@ -218,6 +218,7 @@ static enum tree_code gnu_codes[Number_N
 
 static void init_code_table (void);
 static void Compilation_Unit_to_gnu (Node_Id);
+static bool empty_stmt_list_p (tree);
 static void record_code_position (Node_Id);
 static void insert_code_for (Node_Id);
 static void add_cleanup (tree, Node_Id);
@@ -675,15 +676,15 @@ gigi (Node_Id gnat_root,
   /* Finally see if we have any elaboration procedures to deal with.  */
   for (info = elab_info_list; info; info = info->next)
 {
-  tree gnu_body = DECL_SAVED_TREE (info->elab_proc), gnu_stmts;
+  tree gnu_body = DECL_SAVED_TREE (info->elab_proc);
 
   /* We should have a BIND_EXPR but it may not have any statements in it.
 	 If it doesn't have any, we have nothing to do except for setting the
 	 flag on the GNAT node.  Otherwise, process the function as others.  */
-  gnu_stmts = gnu_body;
+  tree gnu_stmts = gnu_body;
   if (TREE_CODE (gnu_stmts) == BIND_EXPR)
 	gnu_stmts = BIND_EXPR_BODY (gnu_stmts);
-  if (!gnu_stmts || !STATEMENT_LIST_HEAD (gnu_stmts))
+  if (!gnu_stmts || empty_stmt_list_p (gnu_stmts))
 	Set_Has_No_Elaboration_Code (info->gnat_node, 1);
   else
 	{
@@ -7881,6 +7882,25 @@ push_exception_label_stack (vec

[patch] libstdc++/68825 Fix dejagnu directives in shared_ptr test

2015-12-10 Thread Jonathan Wakely


This XFAIL test should have been used { dg-do compile } so it doesn't
get marked UNRESOLVED when it fails to compile. The test was also
broken, the assignment operator compiled OK, but because the test used
dg-excess-errors instead of dg-error the later failure for the copy
construction matched it and made it pass. Changing it to dg-error made
it FAIL, until I also constrained the assignment operators.

Tested powerpc64le-linux, committed to trunk.

commit e1eaa02699034c3f918c28ace1a193ab98392bdd
Author: Jonathan Wakely 
Date:   Thu Dec 10 14:23:40 2015 +

Fix dejagnu directives in shared_ptr test

	PR libstdc++/68825
	* include/experimental/bits/shared_ptr.h (__shared_ptr, __weak_ptr,
	experimental::shared_ptr, experimental::weak_ptr): Constrain
	assignment operators.
	* testsuite/experimental/memory/shared_ptr/cons/copy_ctor_neg.cc:
	Change to a compile-only test and change dg-excess-errors to dg-error.

diff --git a/libstdc++-v3/include/experimental/bits/shared_ptr.h b/libstdc++-v3/include/experimental/bits/shared_ptr.h
index 413652d..bd22f28 100644
--- a/libstdc++-v3/include/experimental/bits/shared_ptr.h
+++ b/libstdc++-v3/include/experimental/bits/shared_ptr.h
@@ -91,9 +91,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 class __shared_ptr<__libfund_v1<_Tp, false>, _Lp>
 : private __shared_ptr<_Tp, _Lp>
 {
-  template
+  template
 	using _Compatible
-	  = enable_if_t<__sp_compatible<_Tp1, _Tp>::value>;
+	  = enable_if_t<__sp_compatible<_Tp1, _Tp>::value, _Res>;
 
   using _Base_type = __shared_ptr<_Tp>;
 
@@ -201,7 +201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using _Base_type::operator->;
 
   template
-	__shared_ptr&
+	_Compatible<_Tp1, __shared_ptr&>
 	operator=(const __shared_ptr<__libfund_v1<_Tp1>, _Lp>& __r) noexcept
 	{
 	  _Base_type::operator=(__r._M_get_base());
@@ -209,7 +209,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   template
-	__shared_ptr&
+	_Compatible<_Tp1, __shared_ptr&>
 	operator=(__shared_ptr<__libfund_v1<_Tp1>, _Lp>&& __r) noexcept
 	{
 	  _Base_type::operator=(std::move(__r._M_get_base()));
@@ -217,7 +217,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   template
-	__shared_ptr&
+	_Compatible<_Tp1, __shared_ptr&>
 	operator=(std::unique_ptr<_Tp1>&& __r)
 	{
 	  _Base_type::operator=(std::move(__r));
@@ -226,7 +226,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if _GLIBCXX_USE_DEPRECATED
   template
-	__shared_ptr&
+	_Compatible<_Tp1, __shared_ptr&>
 	operator=(std::auto_ptr<_Tp1>&& __r)
 	{
 	  _Base_type::operator=(std::move(__r));
@@ -292,26 +292,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using element_type = remove_extent_t<_Tp>;
 
 private:
-  struct _Array_Deleter
+  struct _Array_deleter
   {
 	void
 	operator()(element_type const *__p) const
 	{ delete [] __p; }
   };
 
-  struct _Normal_Deleter
-  {
-	void
-	operator()(element_type const *__p) const
-	{ delete __p; }
-  };
-
-  template
+  template
 	using _Compatible
-	  = enable_if_t<__sp_compatible<_Tp1, _Tp>::value>;
-
-  using _Deleter_type
-	= conditional_t::value, _Array_Deleter, _Normal_Deleter>;
+	  = enable_if_t<__sp_compatible<_Tp1, _Tp>::value, _Res>;
 
   using _Base_type = __shared_ptr;
 
@@ -325,7 +315,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 	explicit __shared_ptr(_Tp1* __p)
-	: _Base_type(__p, _Deleter_type())
+	: _Base_type(__p, _Array_deleter())
 	{ }
 
   template
@@ -402,7 +392,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	reset(_Tp1* __p)
 	{
 	  _GLIBCXX_DEBUG_ASSERT(__p == 0 || __p != get());
-	  __shared_ptr(__p, _Deleter_type()).swap(*this);
+	  __shared_ptr(__p, _Array_deleter()).swap(*this);
 	}
 
   template
@@ -423,7 +413,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   template
-	__shared_ptr&
+	_Compatible<_Tp1, __shared_ptr&>
 	operator=(const __shared_ptr<__libfund_v1<_Tp1>, _Lp>& __r) noexcept
 	{
 	  _Base_type::operator=(__r._M_get_base());
@@ -431,7 +421,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   template
-	__shared_ptr&
+	_Compatible<_Tp1, __shared_ptr&>
 	operator=(__shared_ptr<__libfund_v1<_Tp1>, _Lp>&& __r) noexcept
 	{
 	  _Base_type::operator=(std::move(__r._M_get_base()));
@@ -439,7 +429,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   template
-	__shared_ptr&
+	_Compatible<_Tp1, __shared_ptr&>
 	operator=(std::unique_ptr<_Tp1>&& __r)
 	{
 	  _Base_type::operator=(std::move(__r));
@@ -448,7 +438,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if _GLIBCXX_USE_DEPRECATED
   template
-	__shared_ptr&
+	_Compatible<_Tp1, __shared_ptr&>
 	operator=(std::auto_ptr<_Tp1>&& __r)
 	{
 	  _Base_type::operator=(std::move(__r));
@@ -509,8 +499,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 class __weak_ptr<__libfund_v1<_Tp>, _Lp>
 : __weak_ptr, _Lp>
 {
-  template
-	using _Compatible = enable_if_t<__sp_compatible<_Tp1, _Tp>::value>;
+  template
+	using _Compatible
+	  = enable_if_t<__sp_compatible<_Tp1, _Tp>::value, _Res>;

[PATCH][combine] Don't create LSHIFTRT of zero bits in change_zero_ext


Hi all,

At the end of combine if it fails to recognise the patterns it produces it tries
again after transforming all zero_extends and zero_extracts into and-immediate 
plus
an LSHIFTRT if needed. However, it will construct an LSHIFTRT inside the AND 
even
if the shift distance is 0, which can hurt recognisability.

This patch fixes that by not creating the LSHIFRT of zero.
I hit this when I was experimenting with some new backend patterns and other 
unrelated
combine changes, so I don't have a testcase that shows this on a clean trunk, 
but
I believe this could hurt recognisability in the future.

Bootstrapped and tested on arm, aarch64, x86_64.
I didn't see any codegen differences with clean trunk on aarch64 SPEC2006.

I'm okay with delaying this for next stage 1 if people prefer, though I think 
it's
pretty low risk.

What do people think?

Kyrill

2015-12-10  Kyrylo Tkachov  

* combine.c (change_zero_ext): Do not create a shift of zero length.
diff --git a/gcc/combine.c b/gcc/combine.c
index 7d4ffbcc766113c0af1c903f3d0dadbe74dec7fa..2628e1a437c2cd63d439a3ad28d1799b8c679be3 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11037,7 +11037,9 @@ change_zero_ext (rtx *src)
 	  if (BITS_BIG_ENDIAN)
 	start = GET_MODE_PRECISION (mode) - size - start;
 
-	  x = gen_rtx_LSHIFTRT (mode, XEXP (x, 0), GEN_INT (start));
+	  x = XEXP (x, 0);
+	  if (start > 0)
+	x = gen_rtx_LSHIFTRT (mode, x, GEN_INT (start));
 	}
   else if (GET_CODE (x) == ZERO_EXTEND
 	   && GET_CODE (XEXP (x, 0)) == SUBREG

[PATCH] testsuite/lib/multline.exp: show test name and line numbers

The output from multiline.exp doesn't scale well when reviewing
and comparing .sum files.

Lines in e.g. gcc.sum from a dg-{begin|end}-multiline-output pair are
currently of the form:
 PASS: expected multiline pattern 0 was found: "\s*myvar = myvar\.x;.*\n
   ~\^~\n"

as compared to e.g. this result line for a dg-warning:
 PASS: gcc.dg/plugin/diagnostic-test-show-locus-bw.c 
-fplugin=./diagnostic_plugin_test_show_locus.so  (test for warnings, line 198)

In particular the line doesn't identify the filename/options of the test,
which means in a .sum file with a FAIL of one of these we can't easily
locate the testcase of interest.

The following patch updates multiline.exp to use the global
  $testname_with_flags
as a prefix in such results.

I also dropped the printing of the index in favor of printing the line
numbers enclosed within dg-{begin|end}-multiline-output.

After the patch, we get result lines like this:
 PASS: gcc.dg/plugin/diagnostic-test-show-locus-bw.c 
-fplugin=./diagnostic_plugin_test_show_locus.so  expected multiline pattern 
lines 17-18 was found: "\s*myvar = myvar\.x;.*\n   ~\^~\n"

(note the presence of the source file and options and the "lines 17-18")

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
I compared the results against a control build (of r231445), and
the results were unchanged, other than the expected changes from
the above, leading to
- 92 PASS results changing name within g++.sum
- 7 PASS results changing name within each of obj-c++.sum
  and objc.sum, and
- 125 PASS results changing name within gcc.sum.

OK for trunk for gcc 6?

gcc/testsuite/ChangeLog:
* lib/multiline.exp (_multiline_expected_outputs): Update comment.
(dg-end-multiline-output): Capture line numbers within
_multiline_expected_outputs.
(handle-multiline-outputs): Access global $testname_with_flags
and add it as a prefix to pass/fail results.  Extract line numbers
from $_multiline_expected_outputs and print them within pass/fail
results, replacing the printing of $index.  Consolidate the
string prefix shared between pass/fail into a new local: $title.
---
 gcc/testsuite/lib/multiline.exp | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/lib/multiline.exp b/gcc/testsuite/lib/multiline.exp
index c3d0506..5367437 100644
--- a/gcc/testsuite/lib/multiline.exp
+++ b/gcc/testsuite/lib/multiline.exp
@@ -54,7 +54,9 @@
 # The line number of the last dg-begin-multiline-output directive.
 set _multiline_last_beginning_line -1
 
-# A list of lists of strings.
+# A list of
+#   first-line-number, last-line-number, lines
+# where each "lines" is a list of strings.
 set _multiline_expected_outputs []
 
 
@@ -88,12 +90,15 @@ proc dg-end-multiline-output { args } {
 # Load it and split it into lines
 
 set lines [_get_lines $prog $_multiline_last_beginning_line $line]
-set _multiline_last_beginning_line -1
 
 verbose "lines: $lines" 3
+# Create an entry of the form:  first-line, last-line, lines
+set entry [list $_multiline_last_beginning_line $line $lines]
 global _multiline_expected_outputs
-lappend _multiline_expected_outputs $lines
+lappend _multiline_expected_outputs $entry
 verbose "within dg-end-multiline-output: _multiline_expected_outputs: 
$_multiline_expected_outputs" 3
+
+set _multiline_last_beginning_line -1
 }
 
 # Hook to be called by prune.exp's prune_gcc_output to
@@ -107,9 +112,14 @@ proc dg-end-multiline-output { args } {
 
 proc handle-multiline-outputs { text } {
 global _multiline_expected_outputs
+global testname_with_flags
 set index 0
-foreach multiline $_multiline_expected_outputs {
-   verbose "  multiline: $multiline" 4
+foreach entry $_multiline_expected_outputs {
+   verbose "  entry: $entry" 3
+   set start_line [lindex $entry 0]
+   set end_line   [lindex $entry 1]
+   set multiline  [lindex $entry 2]
+   verbose "  multiline: $multiline" 3
set rexp [_build_multiline_regex $multiline $index]
verbose "rexp: ${rexp}" 4
# Escape newlines in $rexp so that we can print them in
@@ -117,12 +127,14 @@ proc handle-multiline-outputs { text } {
set escaped_regex [string map {"\n" "\\n"} $rexp]
verbose "escaped_regex: ${escaped_regex}" 4
 
+   set title "$testname_with_flags expected multiline pattern lines 
$start_line-$end_line"
+
# Use "regsub" to attempt to prune the pattern from $text
if {[regsub -line $rexp $text "" text]} {
# Success; the multiline pattern was pruned.
-   pass "expected multiline pattern $index was found: 
\"$escaped_regex\""
+   pass "$title was found: \"$escaped_regex\""
} else {
-   fail "expected multiline pattern $index not found: 
\"$escaped_regex\""

[patch] Update -std=gnu++1y in experimental::shared_ptr tests

2015-12-10 Thread Jonathan Wakely


Mechanical change to update the spelling of the -std option.

Committed to trunk.
commit ec7bb2c77a41d6e34e4e29e24195845380ef071f
Author: Jonathan Wakely 
Date:   Thu Dec 10 14:34:52 2015 +

Change tests to use -std=gnu++14 not -std=gnu++1y

	* testsuite/experimental/memory/shared_ptr/assign/assign.cc: Replace
	-std=gnu++1y with -std=gnu++14.
	* testsuite/experimental/memory/shared_ptr/cast/cast.cc: Likewise.
	* testsuite/experimental/memory/shared_ptr/comparison/comparison.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/alias_ctor.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/alloc_ctor.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/copy_ctor.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/copy_ctor_neg.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/default_ctor.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/move_ctor.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/pointer_ctor.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/unique_ptr_ctor.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/cons/weak_ptr_ctor.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/dest/dest.cc: Likewise.
	* testsuite/experimental/memory/shared_ptr/modifiers/reset.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/modifiers/swap.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/observers/bool_conv.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/observers/operators.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/observers/owner_before.cc:
	Likewise.
	* testsuite/experimental/memory/shared_ptr/observers/use_count.cc:
	Likewise.

diff --git a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/assign/assign.cc b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/assign/assign.cc
index 7656c98..141be63 100644
--- a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/assign/assign.cc
+++ b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/assign/assign.cc
@@ -1,4 +1,4 @@
-// { dg-options "-std=gnu++1y" }
+// { dg-options "-std=gnu++14" }
 
 // Copyright (C) 2015 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cast/cast.cc b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cast/cast.cc
index 18c2ba4..06b5ea1 100644
--- a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cast/cast.cc
+++ b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cast/cast.cc
@@ -1,4 +1,4 @@
-// { dg-options "-std=gnu++1y" }
+// { dg-options "-std=gnu++14" }
 
 // Copyright (C) 2015 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/comparison/comparison.cc b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/comparison/comparison.cc
index 52fc193..61b9d0c 100644
--- a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/comparison/comparison.cc
+++ b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/comparison/comparison.cc
@@ -1,4 +1,4 @@
-// { dg-options "-std=gnu++1y" }
+// { dg-options "-std=gnu++14" }
 
 // Copyright (C) 2015 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/alias_ctor.cc b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/alias_ctor.cc
index 15d1de8..d37026a 100644
--- a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/alias_ctor.cc
+++ b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/alias_ctor.cc
@@ -1,4 +1,4 @@
-// { dg-options "-std=gnu++1y" }
+// { dg-options "-std=gnu++14" }
 
 // Copyright (C) 2015 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/alloc_ctor.cc b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/alloc_ctor.cc
index 12eb15c..7c99d4e 100644
--- a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/alloc_ctor.cc
+++ b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/alloc_ctor.cc
@@ -1,4 +1,4 @@
-// { dg-options "-std=gnu++1y" }
+// { dg-options "-std=gnu++14" }
 
 // Copyright (C) 2015 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/copy_ctor.cc b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/copy_ctor.cc
index 4c3680e..e754d56 100644
--- a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/copy_ctor.cc
+++ b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/copy_ctor.cc
@@ -1,4 +1,4 @@
-// { dg-options "-std=gnu++1y" }
+// { dg-options "-std=gnu++14" }
 
 // Copyright (C) 2015 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/copy_ctor_neg.cc b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/cons/copy_ctor_neg.cc
index b269

RE: [PATCH][ARC] Refurbish emitting DWARF2 for epilogue.

2015-12-10 Thread Claudiu Zissulescu


> 
> OTOH, the example you give also shows a much more nuanced approach to
> throttling optimization:
> the patch doesn't dead all epilogue scheduling, but specifically tests for the
> presence of a frame related insn at the point where it could cause trouble.

Actually, in my revised patch I do move the blockage instruction earlier on, 
before emitting any epilogue related instruction.

> Or have some target hook to make it not even bother filling delay slots
> speculatively; for targets that can fully unexpose the delay slot, like SH and
> ARC >= ARC700, this aspect of fill_eager_delay_slots only mucks up
> schedules and increases code size.
> 
GCC has such a target:

bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void) [Target Hook]
This predicate controls the use of the eager delay slot filler to disallow 
speculatively
executed instructions being placed in delay slots. Targets such as certain MIPS 
architectures
possess both branches with and without delay slots. As the eager delay slot
filler can decrease performance, disabling it is beneficial when ordinary 
branches are
available. Use of delay slot branches filled using the basic filler is often 
still desirable
as the delay slot can hide a pipeline bubble.

We can enable this hook for ARC and see what are the numbers. Initially, I 
thought that we may lose more by enabling this hook. What do u say?

//Claudiu

Free up bits in DECLs and TYPEs

2015-12-10 Thread Michael Matz

Hello,

the other day Richi wondered why we specify alignment in bits, instead of 
in log2, as if e.g. a 12 byte alignment would make much sense (sure, an 
alignment of 12 byte means the address is evenly dividable by 12; great!).

This patch changes the two places where we specify alignment (types and 
decls) to use the log2 of alignment and accordingly reduces the field-size 
to 6 bits from 32 bits.  This frees up many bits in both structures (26 in 
tree_type_common and 20 in tree_decl_common (plus in 32bit hole on 64bit 
archs)), but at this point there's not much possibility to move around 
some other members to really reduce the overall structure sizes (we could 
move address_space from tree_base to type_common now, but that doesn't 
reduce tree_base, except for giving _it_ eight more bits, but that's not 
necessary right now).

Unfortunately we need to support an alignment of up to 
MAX_OFILE_ALIGNMENT, which on ELF is 8<<28, even though I think this is 
fairly silly, so we need 6 bits, instead of 4 or 5 (which would support up 
to page or large-page alignment).  While being at it I reduce also the 
other alignment specifier (off_align for field_decls) to that number of 
bits.

The core of the patch is trivial: tree-core.h for the structure change and 
tree.h for accessor macros.  As TYPE_ALIGN and DECL_ALIGN can't be lvalues 
anymore we need a SET_{DECL,TYPE}_ALIGN accessor as well, and that's the 
cause of the patches size: we need to change all write accesses 
everywhere.

I've changed all places I could find via building and greping.  The patch 
was regstrapped on x86_64-linux (all languages) without regressions.

This isn't stage 3 material really, OTOH fairly low risk.  Anyway, okay 
for trunk now or once stage 1 opens?


Ciao,
Michael.
* tree.h (TYPE_ALIGN, DECL_ALIGN): Return shifted amount.
(SET_TYPE_ALIGN, SET_DECL_ALIGN): New.
* tree-core.h (tree_type_common.align): Use bit-field.
(tree_type_common.spare): New.
(tree_decl_common.off_align): Make smaller.
(tree_decl_common.align): Use bit-field.

* expr.c (expand_expr_addr_expr_1): Use SET_TYPE_ALIGN.
* omp-low.c (install_var_field): Use SET_DECL_ALIGN.
(scan_sharing_clauses): Ditto.
(finish_taskreg_scan): Use SET_DECL_ALIGN and SET_TYPE_ALIGN.
(omp_finish_file): Ditto.
* stor-layout.c (do_type_align): Use SET_DECL_ALIGN.
(layout_decl): Ditto.
(relayout_decl): Ditto.
(finalize_record_size): Use SET_TYPE_ALIGN.
(finalize_type_size): Ditto.
(finish_builtin_struct): Ditto.
(layout_type): Ditto.
(initialize_sizetypes): Ditto.
* targhooks.c (std_gimplify_va_arg_expr): Use SET_TYPE_ALIGN.
* tree-nested.c (insert_field_into_struct): Use SET_TYPE_ALIGN.
(lookup_field_for_decl): Use SET_DECL_ALIGN.
(get_chain_field): Ditto.
(get_trampoline_type): Ditto.
(get_nl_goto_field): Ditto.
* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use
SET_DECL_ALIGN.
(unpack_ts_type_common_value_fields): Use SET_TYPE_ALIGN.
* tree.c (make_node_stat): Use SET_DECL_ALIGN and SET_TYPE_ALIGN.
(build_qualified_type): Use SET_TYPE_ALIGN.
(build_aligned_type, build_range_type_1): Ditto.
(build_atomic_base): Ditto.
(build_common_tree_nodes): Ditto.
* cfgexpand.c (align_local_variable): Use SET_DECL_ALIGN.
(expand_one_stack_var_at): Ditto.
* coverage.c (build_var): Use SET_DECL_ALIGN.
* except.c (init_eh): Ditto.
* function.c (assign_parm_setup_block): Ditto.
* symtab.c (increase_alignment_1): Ditto.
* tree-ssa-ccp.c (fold_builtin_alloca_with_align): Ditto.
* tree-vect-stmts.c (ensure_base_align): Ditto.
* varasm.c (align_variable): Ditto.
(assemble_variable): Ditto.
(build_constant_desc): Ditto.
(output_constant_def_contents): Ditto.

* config/arm/arm.c (arm_relayout_function): Use SET_DECL_ALIGN.
* config/avr/avr.c (avr_adjust_type_node): Use SET_TYPE_ALIGN.
* config/mips/mips.c (mips_std_gimplify_va_arg_expr): Ditto.
* config/msp430/msp430.c (msp430_gimplify_va_arg_expr): Ditto.
* config/spu/spu.c (spu_build_builtin_va_list): Use SET_DECL_ALIGN.

ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): Use SET_TYPE_ALIGN.
(gnat_to_gnu_field): Ditto.
(components_to_record): Ditto.
(create_variant_part_from): Ditto.
(copy_and_substitute_in_size): Ditto.
(substitute_in_type): Ditto.
* gcc-interface/utils.c (make_aligning_type): Use SET_TYPE_ALIGN.
(make_packable_type): Ditto.
(maybe_pad_type): Ditto.
(finish_fat_pointer_type): Ditto.
(finish_record_type): Ditto and use SET_DECL_ALIGN.
(rest_of_record_type_compilation): Use SET_TYPE_ALIGN.
(create_field_decl): Use SET_DECL_ALIGN.

Re: [ARM] Use vector wide add for mixed-mode adds


Hi Michael,

A few comments while I look deeper into this patch...

On 30/11/15 01:18, Michael Collison wrote:


This is a modified version of my previous patch that supports vector wide add. 
I added support for vaddw on big endian when generating the parallel operand 
for the vector select.

There are four failing test cases on arm big endian with similar code. They are:

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


The failures occur without my patch and are related to a bug with vector loads 
using VUZP operations.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532

Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, 
and armeb-none-linux-gnueabihf.

2015-11-29  Michael Collison  

* config/arm/neon.md (widen_sum): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.
(aarch32_simd_check_vect_par_cnst_half): Likewise.
* config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype
for new function.
(aarch32_simd_check_vect_par_cnst_half): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
aarch32_simd_check_vect_par_cnst_half
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon support vector widen sum of HImode TO SImode.

Okay for trunk?



--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool initialize_p
  ATTRIBUTE_UNUSED);
 extern void arm_init_builtins (void);
 extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree 
*update);
-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+  bool high);


Please use arm instead of aarch32 in the name to be consistent with the rest of 
the
backend. Also, for functions that return a bool without side-effects it's 
preferable
to finish their name with '_p'. So for the second one I'd drop the 'check' and 
call
it something like "arm_vector_of_lane_nums_p ", is that a more descriptive name?

+/* Check OP for validity as a PARALLEL RTX vector with elements
+   numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+   from the perspective of the architecture.  See the diagram above
+   aarch64_simd_vect_par_cnst_half for more details.  */
+

aarch64?

--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,51 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum3"

+  [(set (match_operand: 0 "s_register_operand" "")
+   (plus: (sign_extend: (match_operand:VQI 1 
"s_register_operand" ""))
+  (match_operand: 2 "s_register_operand" 
"")))]
+  "TARGET_NEON"
+  {
+machine_mode mode = GET_MODE (operands[1]);
+rtx p1, p2;
+
+p1  = aarch32_simd_vect_par_cnst_half (mode, false);
+p2  = aarch32_simd_vect_par_cnst_half (mode, true);
+
+if (operands[0] != operands[2])
+  emit_move_insn (operands[0], operands[2]);
+
+emit_insn (gen_vec_sel_widen_ssum_lo3 (operands[0], 
operands[1], p1, operands[0]));
+emit_insn (gen_vec_sel_widen_ssum_hi3 (operands[0], 
operands[1], p2, operands[0]));
+DONE;
+  }

Please format these properly to avoid long lines.
Thanks,
Kyrill

Re: [PATCH] Add testcase for c++/68348

2015-12-10 Thread Jason Merrill


OK.

Jason

[committed 4/5] Fix -Wmisleading-indentation warning in ifcvt.c

../../../src/gcc/ifcvt.c: In function ‘bool 
noce_try_inverse_constants(noce_if_info*)’:
../../../src/gcc/ifcvt.c:1233:2: error: statement is indented as if it were 
guarded by... [-Werror=misleading-indentation]
  seq = end_ifcvt_sequence (if_info);
  ^~~

../../../src/gcc/ifcvt.c:1230:7: note: ...this ‘if’ clause, but it is not
   if (target != if_info->x)
   ^~

due to the lack of an outdent after the conditional at lines 1230-1:

  1220if (target)
  1221  {
  1222rtx_insn *seq = get_insns ();
  1223
  1224if (!seq)
  1225  {
  1226end_sequence ();
  1227return false;
  1228  }
  1229
>>1230if (target != if_info->x)
>>1231  noce_emit_move_insn (if_info->x, target);
  1232
>>1233  seq = end_ifcvt_sequence (if_info);
  1234
  1235  if (!seq)
  1236return false;
  1237
  1238  emit_insn_before_setloc (seq, if_info->jump,
  1239   INSN_LOCATION (if_info->insn_a));
  1240  return true;
  1241  }

[This warning would have been suppressed by the blank-lines heuristic
 from:
   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01011.html ]

Preapproved by Jeff in:
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03246.html

Bootstrapped®rtested on x86_64-pc-linux-gnu.

Committed to trunk as r231521.

gcc/ChangeLog:
* ifcvt.c (noce_try_inverse_constants): Fix indentation.
---
 gcc/ifcvt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index d474b3b..7fb1dab 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1230,14 +1230,14 @@ noce_try_inverse_constants (struct noce_if_info 
*if_info)
   if (target != if_info->x)
noce_emit_move_insn (if_info->x, target);
 
-   seq = end_ifcvt_sequence (if_info);
+  seq = end_ifcvt_sequence (if_info);
 
-   if (!seq)
- return false;
+  if (!seq)
+   return false;
 
-   emit_insn_before_setloc (seq, if_info->jump,
-INSN_LOCATION (if_info->insn_a));
-   return true;
+  emit_insn_before_setloc (seq, if_info->jump,
+  INSN_LOCATION (if_info->insn_a));
+  return true;
 }
 
   end_sequence ();
-- 
1.8.5.3

[committed 5/5] Fix -Wmisleading-indentation warning in graphite-optimize-isl.c

../../../src/gcc/graphite-optimize-isl.c: In function ‘isl_union_set* 
scop_get_domains(scop_p)’:
../../../src/gcc/graphite-optimize-isl.c:362:5: error: statement is indented as 
if it were guarded by... [-Werror=misleading-indentation]
 return res;
 ^~

In file included from ../../../src/gcc/hash-table.h:236:0,
 from ../../../src/gcc/coretypes.h:348,
 from ../../../src/gcc/graphite-optimize-isl.c:28:
../../../src/gcc/vec.h:1343:3: note: ...this ‘for’ clause, but it is not
   for (I = 0; (V).iterate ((I), &(P)); ++(I))
   ^

../../../src/gcc/graphite-optimize-isl.c:359:3: note: in expansion of macro 
‘FOR_EACH_VEC_ELT’
   FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
   ^~~~

   351  static isl_union_set *
   352  scop_get_domains (scop_p scop ATTRIBUTE_UNUSED)
   353  {
   354int i;
   355poly_bb_p pbb;
   356isl_space *space = isl_set_get_space (scop->param_context);
   357isl_union_set *res = isl_union_set_empty (space);
   358
   359FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
   360  res = isl_union_set_add_set (res, isl_set_copy (pbb->domain));
   361
   362  return res;
^ warning is here
   363  }

[This warning would have been suppressed by the blank-lines heuristic
 from:
   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01011.html ]

Preapproved by Jeff in:
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03246.html

Bootstrapped®rtested on x86_64-pc-linux-gnu.

Committed to trunk as r231522.

gcc/ChangeLog:
* graphite-optimize-isl.c (scop_get_domains): Fix indentation.
---
 gcc/graphite-optimize-isl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 8727e39..0c6a971 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -359,7 +359,7 @@ scop_get_domains (scop_p scop ATTRIBUTE_UNUSED)
   FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
 res = isl_union_set_add_set (res, isl_set_copy (pbb->domain));
 
-return res;
+  return res;
 }
 
 static const int CONSTANT_BOUND = 20;
-- 
1.8.5.3

[committed 2/5] Fix misleading indentation in gcc/fortran/io.c

Fix this warning:
../../../src/gcc/fortran/io.c: In function ‘match gfc_match_open()’:
../../../src/gcc/fortran/io.c:2003:4: error: statement is indented as if it 
were guarded by... [-Werror=misleading-indentation]
if (!compare_to_allowed_values ("DELIM", delim, NULL, NULL,
^
../../../src/gcc/fortran/io.c:2000:2: note: ...this ‘if’ clause, but it is not
  if (!is_char_type ("DELIM", open->delim))
  ^

[This warning would have been suppressed by the blank-lines heuristic
 from:
   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01011.html ]

Preapproved by Jeff in:
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03246.html

Bootstrapped®rtested on x86_64-pc-linux-gnu.

Committed to trunk as r231519.

gcc/fortran/ChangeLog:
* io.c (gfc_match_open): Fix indentation.
---
 gcc/fortran/io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
index 8cf952f..18816f2 100644
--- a/gcc/fortran/io.c
+++ b/gcc/fortran/io.c
@@ -2006,8 +2006,8 @@ gfc_match_open (void)
{
  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE", NULL };
 
-   if (!is_char_type ("DELIM", open->delim))
- goto cleanup;
+ if (!is_char_type ("DELIM", open->delim))
+   goto cleanup;
 
  if (!compare_to_allowed_values ("DELIM", delim, NULL, NULL,
  open->delim->value.character.string,
-- 
1.8.5.3

[committed 3/5] Fix -Wmisleading-indentation warning in gcc/regrename.c

../../../src/gcc/regrename.c: In function ‘void scan_rtx_address(rtx_insn*, 
rtx_def**, reg_class, scan_actions, machine_mode, addr_space_t)’:
../../../src/gcc/regrename.c:1314:7: error: statement is indented as if it were 
guarded by... [-Werror=misleading-indentation]
   break;
   ^
../../../src/gcc/regrename.c:1311:5: note: ...this ‘if’ clause, but it is not
 if (!AUTO_INC_DEC)
 ^

[This warning would have been suppressed by the blank-lines heuristic
 from:
   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01011.html ]

Preapproved by Jeff in:
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03246.html

Bootstrapped®rtested on x86_64-pc-linux-gnu.

Committed to trunk as r231520.

gcc/ChangeLog:
* regrename.c (scan_rtx_address): Fix indentation.
---
 gcc/regrename.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 701e078..60157dc 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1360,8 +1360,8 @@ scan_rtx_address (rtx_insn *insn, rtx *loc, enum 
reg_class cl,
 case PRE_MODIFY:
   /* If the target doesn't claim to handle autoinc, this must be
 something special, like a stack push.  Kill this chain.  */
-if (!AUTO_INC_DEC)
-  action = mark_all_read;
+  if (!AUTO_INC_DEC)
+   action = mark_all_read;
 
   break;
 
-- 
1.8.5.3

Re: [committed 1/5] Fix -Wmisleading-indentation warning in function.c

FWIW, I believe trunk is clean for -Wmisleading-indentation after these
5 patches (for x86_64-pc-linux-gnu, at least).

Gather hash-tab statistics only with GATHER_STATISTICS

2015-12-10 Thread Michael Matz

Hi,

while profiling cc1plus I noticed high hash-table activity for gathering 
statistics, even though I haven't configured with 
--enable-gather-detailed-mem-stats.  Turns out the hash table rewrite 
hard-coded the relevant settings to true.  This patch makes it initialized 
by GATHER_STATISTICS.

Regstrapping on x86_64-linux in progress, I'll commit once that went 
through, as it's IMHO obvious.


Ciao,
Michael.

* hash-map.h (hash_map::hash_map): Gather statistics only
when GATHER_STATISTICS is true.
* hash-set.h (hash_set::hash_set): Ditto.
* hash-table.h (hash_table::hash_table): Ditto.
(hash_table::create_ggc): Ditto.

Index: hash-map.h
===
--- hash-map.h  (revision 231115)
+++ hash-map.h  (working copy)
@@ -107,11 +107,13 @@ class GTY((user)) hash_map
 
 public:
   explicit hash_map (size_t n = 13, bool ggc = false,
-bool gather_mem_stats = true CXX_MEM_STAT_INFO)
+bool gather_mem_stats = GATHER_STATISTICS
+CXX_MEM_STAT_INFO)
 : m_table (n, ggc, gather_mem_stats, HASH_MAP_ORIGIN PASS_MEM_STAT) {}
 
   /* Create a hash_map in ggc memory.  */
-  static hash_map *create_ggc (size_t size, bool gather_mem_stats = true
+  static hash_map *create_ggc (size_t size,
+  bool gather_mem_stats = GATHER_STATISTICS
   CXX_MEM_STAT_INFO)
 {
   hash_map *map = ggc_alloc ();
Index: hash-set.h
===
--- hash-set.h  (revision 231115)
+++ hash-set.h  (working copy)
@@ -27,7 +27,7 @@ class hash_set
 public:
   typedef typename Traits::value_type Key;
   explicit hash_set (size_t n = 13, bool ggc = false CXX_MEM_STAT_INFO)
-: m_table (n, ggc, true, HASH_SET_ORIGIN PASS_MEM_STAT) {}
+: m_table (n, ggc, GATHER_STATISTICS, HASH_SET_ORIGIN PASS_MEM_STAT) {}
 
   /* Create a hash_set in gc memory with space for at least n elements.  */
 
Index: hash-table.h
===
--- hash-table.h(revision 231115)
+++ hash-table.h(working copy)
@@ -361,7 +361,8 @@ class hash_table
   typedef typename Descriptor::compare_type compare_type;
 
 public:
-  explicit hash_table (size_t, bool ggc = false, bool gather_mem_stats = true,
+  explicit hash_table (size_t, bool ggc = false,
+  bool gather_mem_stats = GATHER_STATISTICS,
   mem_alloc_origin origin = HASH_TABLE_ORIGIN
   CXX_MEM_STAT_INFO);
   ~hash_table ();
@@ -371,7 +372,8 @@ public:
   create_ggc (size_t n CXX_MEM_STAT_INFO)
   {
 hash_table *table = ggc_alloc ();
-new (table) hash_table (n, true, true, HASH_TABLE_ORIGIN PASS_MEM_STAT);
+new (table) hash_table (n, true, GATHER_STATISTICS,
+   HASH_TABLE_ORIGIN PASS_MEM_STAT);
 return table;
   }

[committed 1/5] Fix -Wmisleading-indentation warning in function.c

Fix this warning:
../../../src/gcc/function.c: In function ‘void 
locate_and_pad_parm(machine_mode, tree, int, int, int, tree, args_size*, 
locate_and_pad_arg_data*)’:
../../../src/gcc/function.c:4123:2: error: statement is indented as if it were 
guarded by... [-Werror=misleading-indentation]
  {
  ^
../../../src/gcc/function.c:4119:7: note: ...this ‘if’ clause, but it is not
   if (initial_offset_ptr->var)
   ^

[This warning would have been suppressed by the blank-lines heuristic
 from:
   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01011.html ]

Preapproved by Jeff in:
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03246.html

Bootstrapped®rtested on x86_64-pc-linux-gnu.

Committed to trunk as r231518.

gcc/ChangeLog:
* function.c (locate_and_pad_parm): Fix indentation.
---
 gcc/function.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index b513ead..035a49e 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4181,14 +4181,14 @@ locate_and_pad_parm (machine_mode passed_mode, tree 
type, int in_regs,
locate->slot_offset.var = size_binop (MINUS_EXPR, ssize_int (0),
  initial_offset_ptr->var);
 
-   {
- tree s2 = sizetree;
- if (where_pad != none
- && (!tree_fits_uhwi_p (sizetree)
- || (tree_to_uhwi (sizetree) * BITS_PER_UNIT) % 
round_boundary))
-   s2 = round_up (s2, round_boundary / BITS_PER_UNIT);
- SUB_PARM_SIZE (locate->slot_offset, s2);
-   }
+  {
+   tree s2 = sizetree;
+   if (where_pad != none
+   && (!tree_fits_uhwi_p (sizetree)
+   || (tree_to_uhwi (sizetree) * BITS_PER_UNIT) % round_boundary))
+ s2 = round_up (s2, round_boundary / BITS_PER_UNIT);
+   SUB_PARM_SIZE (locate->slot_offset, s2);
+  }
 
   locate->slot_offset.constant += part_size_in_regs;
 
-- 
1.8.5.3

[Fortran, Patch} Fix ICE for coarray Critical inside module procedure

2015-12-10 Thread Alessandro Fanfarillo

Dear all,

the compiler returns an ICE when a coarray critical section is used
inside a module procedure.
The symbols related with the lock variables were left uncommitted
inside resolve_critical(). A gfc_commit_symbol after each symbol or a
gfc_commit_symbols at the end of resolve_critical() fixed the issue.

The latter solution is proposed in the attached patch.

Built and regtested on x86_64-pc-linux-gnu

Cheers

Alessandro


PS: This patch should be also included in GCC 5.
2015-12-10  Alessandro Fanfarillo 

* resolve.c (resolve_critical): Committing symbols of
lock variables.

2015-12-10  Alessandro Fanfarillo 

* gfortran.dg/coarray_critical_1.f90: New.

commit ccc06accb4891ab95d33135b62c479d895b2270f
Author: Alessandro Fanfarillo 
Date:   Thu Dec 10 15:28:33 2015 +

Committing symbols of lock variables inside resolve_critical()

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 65a2b7f..3988b3c 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -8852,6 +8852,7 @@ resolve_critical (gfc_code *code)
   symtree->n.sym->as->cotype = AS_EXPLICIT;
   symtree->n.sym->as->lower[0] = gfc_get_int_expr (gfc_default_integer_kind,
   NULL, 1);
+  gfc_commit_symbols();
 }
 
 
diff --git a/gcc/testsuite/gfortran.dg/coarray_critical_1.f90 
b/gcc/testsuite/gfortran.dg/coarray_critical_1.f90
new file mode 100644
index 000..4d93bf6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray_critical_1.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-fcoarray=lib" }
+!
+
+module m
+ contains
+   subroutine f()
+ critical
+ end critical
+   end subroutine f
+ end module m
+end program

Re: [PATCH][combine] Don't create LSHIFTRT of zero bits in change_zero_ext


On 12/10/2015 03:36 PM, Kyrill Tkachov wrote:

I'm okay with delaying this for next stage 1 if people prefer, though I
think it's
pretty low risk.


I think this is something we should fix now.


+ x = XEXP (x, 0);
+ if (start > 0)
+   x = gen_rtx_LSHIFTRT (mode, x, GEN_INT (start));


I think this should just use simplify_shift_const. gen_rtx_FOO should be 
avoided.



Bernd

Re: [v3] avoid alignment of static variables affecting stack's

2015-12-10 Thread Jan Beulich

>>> On 10.12.15 at 14:53,  wrote:
> On 12/10/2015 01:38 PM, Jan Beulich wrote:
>> --- 2015-12-09/gcc/cfgexpand.c
>> +++ 2015-12-09/gcc/cfgexpand.c
>> @@ -1544,12 +1544,15 @@ static HOST_WIDE_INT
>>   expand_one_var (tree var, bool toplevel, bool really_expand)
>>   {
>> unsigned int align = BITS_PER_UNIT;
>> +  bool stack = true;
>> tree origvar = var;
>>
>> var = SSAVAR (var);
>>
>> if (TREE_TYPE (var) != error_mark_node && TREE_CODE (var) == VAR_DECL)
>>   {
>> +  stack = !TREE_STATIC (var) && !DECL_EXTERNAL (var);
>> +
>> /* Because we don't know if VAR will be in register or on stack,
>>   we conservatively assume it will be on stack even if VAR is
>>   eventually put into register after RA pass.  For non-automatic
>> @@ -1578,7 +1581,8 @@ expand_one_var (tree var, bool toplevel,
>>  align = POINTER_SIZE;
>>   }
>>
>> -  record_alignment_for_reg_var (align);
>> +  if (stack)
>> +record_alignment_for_reg_var (align);
> 
> A bit further down we have
> 
>else if (DECL_EXTERNAL (var))
>  ;
>else if (DECL_HAS_VALUE_EXPR_P (var))
>  ;
>else if (TREE_STATIC (var))
>  ;
> []
>return 0;
> 
> so I'm thinking the function doesn't do anything for DECL_EXTERNAL or 
> TREE_STATIC vars. You're still computing alignment for them but not 
> using it. I suggest just doing an early return for non-stack vars.

If not reaching

  if (TREE_CODE (origvar) == SSA_NAME)
{
  gcc_assert (TREE_CODE (var) != VAR_DECL
  || (!DECL_EXTERNAL (var)
  && !DECL_HAS_VALUE_EXPR_P (var)
  && !TREE_STATIC (var)
  && TREE_TYPE (var) != error_mark_node
  && !DECL_HARD_REGISTER (var)
  && really_expand));
}

in that case is not a problem, this would make for even simpler a
change. Please let me know.

Jan

Re: [PATCH] v2 Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-10 Thread Tobias Burnus

David Malcolm wrote:
> On Sat, 2015-12-05 at 20:53 +0100, Mikael Morin wrote:
> > to get things moving again, a few comments on top of David Malcolm's:
[...]
> > It seems you are considering some candidates more than once here.
[...]
> > You have to start the lookup with the current namespace's sym_root (not 
> > with fun), otherwise you'll miss some candidates.
> > You may also want to query parent namespaces for host-associated symbols.
[...]

I think the current patch doesn't not address those (as stated) and I think
that some suggestions should honour the attributes better (variable vs.
subroutine vs. function etc.). But I very much like the general patch.

Regarding Malcolm's update:
> I can't comment on Mikael's observations, but here's an updated version
> of Bernhard's patch which moves the duplicated code into a new
> "find_closest_string" function in gcc/spellcheck.c.

That change looks good to me.

BTW: I think you should write a quip for https://gcc.gnu.org/gcc-6/changes.html

Tobias

PS: Talking about the release notes, my feeling is that both the wiki and
the release notes miss some changes, but I have to admit that I am really
out of sync. It currently only lists Submodules at the Wiki,
   https://gcc.gnu.org/wiki/GFortran/News#GCC6
and https://gcc.gnu.org/gcc-6/changes.html has a few other items. (Both
should be synced crosswise.) As additional item, I know of coarray events
but there must be more items.

Re: [PATCH] PR c/68473: sanitize source range-printing within certain macro expansions

2015-12-10 Thread Martin Sebor


On 11/23/2015 10:52 AM, David Malcolm wrote:

This patch fixes PR c/68473 by bulletproofing the new
diagnostic_show_locus implementation against ranges that finish before
they start (which can happen when using the C preprocessor), falling
back to simply printing a caret.

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu; adds 7 new
PASS results to gcc.sum.


I just ran into the same ICE with a different (simpler) example
in bug 68839 and confirm that the patch fixes that one as well.

Martin

Re: [PATCH] update -Wall and -Wextra documentation

2015-12-10 Thread Martin Sebor


On 12/10/2015 04:57 AM, Bernd Schmidt wrote:

On 12/10/2015 12:07 AM, Martin Sebor wrote:


* invoke.texi (Warning Options): Update -Wall options.  Clarify
when some -Wextra options are enabled.  Add -Wplacement-new example.


I tried to check this list against c.opt - I figure this should contain
essentially the ones that are have an EnabledBy(Wall), plus whatever
logic there is in c-opts.


Thanks for double-checking me!  FWIW, I did something similar.
I also verified those I wasn't sure about them using simple
examples, but it's possible I overlooked something.




+-Wduplicated-cond  @gol


I don't see this one as enabled by Wall, and


Yes, this one should not be there.  I removed it in the updated
patch.  (Note the first patch didn't add it, just moved it up
the list).

Incidentally, testing this using the example from the manual
triggers an ICE.  I raised c/68839 for the record and closed
it as a duplicate of c/68473.




+-Wplacement-new @r{(only for C++)} @gol


This one appears to be on by default?


Hmm.  You're right, it is.  I'm not sure that's what we want
but let me confirm that with Jason and deal with it separately.
I've corrected the patch to reflect the status quo.




+In C++, this warning is also enabled by @option{-Wall}.  In C, it is
also
+enabled by @option{-Wextra}; to get the other warnings of
@option{-Wextra}
+without this warning, use @option{-Wextra -Wno-sign-compare}.


Is the last part of the sentence really necessary? It kind of follows
from the rest of the documentation and we don't spell this out for other
-Wextra options.


I don't think explaining how to turn the option off is necessary
or adds any value but the text was already there and I didn't want
to make gratuitous changes unrelated to the main purpose of the
patch.  But if you think it's appropriate I'm fine with removing
it and have done that in the updated patch.

The updated patch is attached.

Martin
2015-12-09  Martin Sebor  

	* invoke.texi (Warning Options): Update -Wall options.  Clarify
	when some -Wextra options are enabled.  Add -Wplacement-new example.

Index: invoke.texi
===
--- invoke.texi	(revision 231525)
+++ invoke.texi	(working copy)
@@ -3542,18 +3542,22 @@
 
 @gccoptlist{-Waddress   @gol
 -Warray-bounds=1 @r{(only with} @option{-O2}@r{)}  @gol
+-Wbool-compare  @gol
 -Wc++11-compat  -Wc++14-compat@gol
 -Wchar-subscripts  @gol
+-Wcomment  @gol
 -Wenum-compare @r{(in C/ObjC; this is on by default in C++)} @gol
+-Wformat   @gol
+-Wimplicit @r{(C and Objective-C only)} @gol
 -Wimplicit-int @r{(C and Objective-C only)} @gol
 -Wimplicit-function-declaration @r{(C and Objective-C only)} @gol
--Wbool-compare  @gol
--Wduplicated-cond  @gol
--Wcomment  @gol
--Wformat   @gol
+-Winit-self @r{(only for C++)} @gol
+-Wlogical-not-parentheses
 -Wmain @r{(only for C/ObjC and unless} @option{-ffreestanding}@r{)}  @gol
 -Wmaybe-uninitialized @gol
+-Wmemset-transposed-args @gol
 -Wmissing-braces @r{(only for C/ObjC)} @gol
+-Wnarrowing @r{(only for C++)}  @gol
 -Wnonnull  @gol
 -Wopenmp-simd @gol
 -Wparentheses  @gol
@@ -3562,6 +3566,7 @@
 -Wreturn-type  @gol
 -Wsequence-point  @gol
 -Wsign-compare @r{(only in C++)}  @gol
+-Wsizeof-pointer-memaccess @gol
 -Wstrict-aliasing  @gol
 -Wstrict-overflow=1  @gol
 -Wswitch  @gol
@@ -3599,10 +3604,10 @@
 -Wmissing-parameter-type @r{(C only)}  @gol
 -Wold-style-declaration @r{(C only)}  @gol
 -Woverride-init  @gol
--Wsign-compare  @gol
+-Wsign-compare @r{(C only)} @gol
 -Wtype-limits  @gol
 -Wuninitialized  @gol
--Wshift-negative-value  @gol
+-Wshift-negative-value @r{(in C++03 and in C99 and newer)}  @gol
 -Wunused-parameter @r{(only with} @option{-Wunused} @r{or} @option{-Wall}@r{)} @gol
 -Wunused-but-set-parameter @r{(only with} @option{-Wunused} @r{or} @option{-Wall}@r{)}  @gol
 }
@@ -4589,7 +4594,6 @@
 if (p->q != NULL) @{ @dots{} @}
 else if (p->q != NULL) @{ @dots{} @}
 @end smallexample
-This warning is enabled by @option{-Wall}.
 
 @item -Wframe-address
 @opindex Wno-frame-address
@@ -4896,8 +4900,15 @@
 @opindex Wno-placement-new
 Warn about placement new expressions with undefined behavior, such as
 constructing an object in a buffer that is smaller than the type of
-the object.
-
+the object.  For example, the placement new expression below is diagnosed
+because it attempts to construct an array of 64 integers in a buffer only
+64 bytes large.
+@smallexample
+char buf [64];
+new (buf) int[64];
+@end smallexample
+This warning is enabled by default.
+  
 @item -Wpointer-arith
 @opindex Wpointer-arith
 @opindex Wno-pointer-arith
@@ -5114,8 +5125,8 @@
 @cindex signed and unsigned values, comparison warning
 Warn when a comparison between signed and unsigned values could produce
 an incorrect result when the signed value is converted to unsigned.
-This warning is also enabled by @option{-Wextra}; to get the other warnings
-of @option{-Wextra} without this warning, use @option{-Wextra -Wno-sign-com

Re: ipa-cp heuristics fixes

2015-12-10 Thread Jan Hubicka

> Is this really necessary, is it not enough to remove the assignment to
> ret below?  If the parameter is not used, devirtualization time bonus,
> which you then rely on estimate_local_effects, should be zero for it.
> 
> It is a very minor point, I suppose, but if the function gets cloned
> for a different reason, it might still be beneficial to have as much
> context-independent information for it as possible too, because that
> can then be used in a callee (see the second call of
> gather_context_independent_values).
> 
> Other than that, all the changes seem like a clear improvement.

The cutoff is there mainly for the rest of the function:
  if (known_aggs)
{
  vec *agg_items;
  struct ipa_agg_jump_function *ajf;

  agg_items = context_independent_aggregate_values (plats);
  ajf = &(*known_aggs)[i];
  ajf->items = agg_items;
  ajf->by_ref = plats->aggs_by_ref;
  ret |= agg_items != NULL;
}
I did not want ret to become true if we manage to propagate into an unused
aggregate parameter.

Honza

Re: [PATCH] update -Wall and -Wextra documentation


On 12/10/2015 05:52 PM, Martin Sebor wrote:


The updated patch is attached.


Ok.


Bernd

[PATCH 2/2] [graphite] update required isl versions

2015-12-10 Thread Sebastian Pop

we now check the isl version, as there are no real differences in existing files
in between isl 0.14 and isl 0.15.
---
 config/isl.m4 |  41 +++--
 configure | 112 --
 2 files changed, 123 insertions(+), 30 deletions(-)

diff --git a/config/isl.m4 b/config/isl.m4
index 459fac1..886b0e4 100644
--- a/config/isl.m4
+++ b/config/isl.m4
@@ -19,23 +19,23 @@
 
 # ISL_INIT_FLAGS ()
 # -
-# Provide configure switches for ISL support.
+# Provide configure switches for isl support.
 # Initialize isllibs/islinc according to the user input.
 AC_DEFUN([ISL_INIT_FLAGS],
 [
   AC_ARG_WITH([isl-include],
 [AS_HELP_STRING(
   [--with-isl-include=PATH],
-  [Specify directory for installed ISL include files])])
+  [Specify directory for installed isl include files])])
   AC_ARG_WITH([isl-lib],
 [AS_HELP_STRING(
   [--with-isl-lib=PATH],
-  [Specify the directory for the installed ISL library])])
+  [Specify the directory for the installed isl library])])
 
   AC_ARG_ENABLE(isl-version-check,
 [AS_HELP_STRING(
   [--disable-isl-version-check],
-  [disable check for ISL version])],
+  [disable check for isl version])],
 ENABLE_ISL_CHECK=$enableval,
 ENABLE_ISL_CHECK=yes)
   
@@ -58,15 +58,15 @@ AC_DEFUN([ISL_INIT_FLAGS],
   if test "x${with_isl_lib}" != x; then
 isllibs="-L$with_isl_lib"
   fi
-  dnl If no --with-isl flag was specified and there is in-tree ISL
+  dnl If no --with-isl flag was specified and there is in-tree isl
   dnl source, set up flags to use that and skip any version tests
-  dnl as we cannot run them before building ISL.
+  dnl as we cannot run them before building isl.
   if test "x${islinc}" = x && test "x${isllibs}" = x \
  && test -d ${srcdir}/isl; then
 isllibs='-L$$r/$(HOST_SUBDIR)/isl/'"$lt_cv_objdir"' '
 islinc='-I$$r/$(HOST_SUBDIR)/isl/include -I$$s/isl/include'
 ENABLE_ISL_CHECK=no
-AC_MSG_WARN([using in-tree ISL, disabling version check])
+AC_MSG_WARN([using in-tree isl, disabling version check])
   fi
 
   isllibs="${isllibs} -lisl"
@@ -75,7 +75,7 @@ AC_DEFUN([ISL_INIT_FLAGS],
 
 # ISL_REQUESTED (ACTION-IF-REQUESTED, ACTION-IF-NOT)
 # 
-# Provide actions for failed ISL detection.
+# Provide actions for failed isl detection.
 AC_DEFUN([ISL_REQUESTED],
 [
   AC_REQUIRE([ISL_INIT_FLAGS])
@@ -106,12 +106,31 @@ AC_DEFUN([ISL_CHECK_VERSION],
 LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs}"
 LIBS="${_isl_saved_LIBS} -lisl"
 
-AC_MSG_CHECKING([for compatible ISL])
-AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include ]], [[;]])],
+AC_MSG_CHECKING([for recommended isl 0.15])
+AC_TRY_RUN([#include 
+#include 
+int main() {
+  if (strncmp (_GENERATED_STDINT_H, "isl 0.15", 8))
+return 1;
+  return 0;
+}],
[gcc_cv_isl=yes],
-   [gcc_cv_isl=no])
+   [gcc_cv_isl=no], [gcc_cv_isl=no])
 AC_MSG_RESULT([$gcc_cv_isl])
 
+if test "${gcc_cv_isl}" = no ; then
+   AC_MSG_CHECKING([for deprecated isl 0.14])
+   AC_TRY_RUN([#include 
+   #include 
+   int main() {
+ if (strncmp (_GENERATED_STDINT_H, "isl 0.14", 8))
+   return 1;
+ return 0;
+   }],
+   [gcc_cv_isl=yes],
+   [gcc_cv_isl=no], [gcc_cv_isl=no])
+AC_MSG_RESULT([$gcc_cv_isl, recommended isl version is 0.15, minimum 
required isl version 0.14 is deprecated])
+fi
 CFLAGS=$_isl_saved_CFLAGS
 LDFLAGS=$_isl_saved_LDFLAGS
 LIBS=$_isl_saved_LIBS
diff --git a/configure b/configure
index 090615f..4284ba7 100755
--- a/configure
+++ b/configure
@@ -1492,7 +1492,7 @@ Optional Features:
   build static libjava [default=no]
   --enable-bootstrap  enable bootstrapping [yes if native build]
   --disable-isl-version-check
-  disable check for ISL version
+  disable check for isl version
   --enable-ltoenable link time optimization support
   --enable-linker-plugin-configure-flags=FLAGS
   additional flags for configuring linker plugins
@@ -1553,8 +1553,8 @@ Optional Packages:
   package. Equivalent to
   --with-isl-include=PATH/include plus
   --with-isl-lib=PATH/lib
-  --with-isl-include=PATH Specify directory for installed ISL include files
-  --with-isl-lib=PATH Specify the directory for the installed ISL library
+  --with-isl-include=PATH Specify directory for installed isl include files
+  --with-isl-lib=PATH Specify the directory for the installed isl library
   --with-build-sysroot=SYSROOT
   use sysroot as the system root during the build
   --with-de

[PATCH 1/2] [graphite] document minimal required version for isl

2015-12-10 Thread Sebastian Pop

also update ISL to isl as requested by its author Sven Verdoolaege.
---
 gcc/doc/install.texi | 9 +
 gcc/doc/invoke.texi  | 4 ++--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 0b71bef..b43a3ec 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -383,14 +383,15 @@ installed but it is not in your default library search 
path, the
 @option{--with-mpc} configure option should be used.  See also
 @option{--with-mpc-lib} and @option{--with-mpc-include}.
 
-@item ISL Library version 0.15 or 0.14.
+@item isl Library version 0.15 or 0.14.
 
+Minimal isl version supported is 0.14 and it is highly recommended to use 0.15.
 Necessary to build GCC with the Graphite loop optimizations.
 It can be downloaded from @uref{ftp://gcc.gnu.org/pub/gcc/infrastructure/}.
-If an ISL source distribution is found
+If an isl source distribution is found
 in a subdirectory of your GCC sources named @file{isl}, it will be
 built together with GCC.  Alternatively, the @option{--with-isl} configure
-option should be used if ISL is not installed in your default library
+option should be used if isl is not installed in your default library
 search path.
 
 @end table
@@ -1850,7 +1851,7 @@ a cross compiler, they will not be used to configure 
target libraries.
 @item --with-isl=@var{pathname}
 @itemx --with-isl-include=@var{pathname}
 @itemx --with-isl-lib=@var{pathname}
-If you do not have the ISL library installed in a standard location and you
+If you do not have the isl library installed in a standard location and you
 want to build GCC, you can explicitly specify the directory where it is
 installed (@samp{--with-isl=@/@var{islinstalldir}}). The
 @option{--with-isl=@/@var{islinstalldir}} option is shorthand for
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5256031..7ae0849 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8939,12 +8939,12 @@ Enable the identity transformation for graphite.  For 
every SCoP we generate
 the polyhedral representation and transform it back to gimple.  Using
 @option{-fgraphite-identity} we can check the costs or benefits of the
 GIMPLE -> GRAPHITE -> GIMPLE transformation.  Some minimal optimizations
-are also performed by the code generator ISL, like index splitting and
+are also performed by the code generator isl, like index splitting and
 dead code elimination in loops.
 
 @item -floop-nest-optimize
 @opindex floop-nest-optimize
-Enable the ISL based loop nest optimizer.  This is a generic loop nest
+Enable the isl based loop nest optimizer.  This is a generic loop nest
 optimizer based on the Pluto optimization algorithms.  It calculates a loop
 structure optimized for data-locality and parallelism.  This option
 is experimental.
-- 
1.9.1

Re: [PATCH] testsuite/lib/multline.exp: show test name and line numbers


On 12/10/2015 03:56 PM, David Malcolm wrote:

The following patch updates multiline.exp to use the global
   $testname_with_flags
as a prefix in such results.

I also dropped the printing of the index in favor of printing the line
numbers enclosed within dg-{begin|end}-multiline-output.

After the patch, we get result lines like this:
  PASS: gcc.dg/plugin/diagnostic-test-show-locus-bw.c 
-fplugin=./diagnostic_plugin_test_show_locus.so  expected multiline pattern lines 17-18 
was found: "\s*myvar = myvar\.x;.*\n   ~\^~\n"


This seems to be an improvement, so OK. I'm not sure what the value of 
printing the regexp in the output is, I tend to see it as clutter and 
would favour another patch to remove it.



Bernd

Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [4/3] v2

2015-12-10 Thread Uros Bizjak

> Finally the mechanical changes necessary due to the API change in the walker.

You forgot to change the graphite part, as in the attached patch.

2015-12-10  Uros Bizjak  

PR tree-optimization/68619
* graphite-scop-detection.c (gather_bbs::before_dom_children):
Change return type to an edge.  Always return NULL.

OK for mainline after successful bootstrap and regtest?

Uros.
Index: graphite-scop-detection.c
===
--- graphite-scop-detection.c   (revision 231529)
+++ graphite-scop-detection.c   (working copy)
@@ -1828,7 +1828,7 @@ class gather_bbs : public dom_walker
 public:
   gather_bbs (cdi_direction, scop_p);
 
-  virtual void before_dom_children (basic_block);
+  virtual edge before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
 
 private:
@@ -1844,11 +1844,11 @@ gather_bbs::gather_bbs (cdi_direction direction, s
 /* Call-back for dom_walk executed before visiting the dominated
blocks.  */
 
-void
+edge
 gather_bbs::before_dom_children (basic_block bb)
 {
   if (!bb_in_sese_p (bb, scop->scop_info->region))
-return;
+return NULL;
 
   gcond *stmt = single_pred_cond_non_loop_exit (bb);
 
@@ -1868,7 +1868,7 @@ gather_bbs::before_dom_children (basic_block bb)
 
   gimple_poly_bb_p gbb = try_generate_gimple_bb (scop, bb);
   if (!gbb)
-return;
+return NULL;
 
   GBB_CONDITIONS (gbb) = conditions.copy ();
   GBB_CONDITION_CASES (gbb) = cases.copy ();
@@ -1880,6 +1880,7 @@ gather_bbs::before_dom_children (basic_block bb)
   data_reference_p dr;
   FOR_EACH_VEC_ELT (gbb->data_refs, i, dr)
 scop->drs.safe_push (dr_info (dr, pbb));
+  return NULL;
 }
 
 /* Call-back for dom_walk executed after visiting the dominated

Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [4/3] v2


On 12/10/2015 10:05 AM, Uros Bizjak wrote:

Finally the mechanical changes necessary due to the API change in the walker.


You forgot to change the graphite part, as in the attached patch.

2015-12-10  Uros Bizjak  

 PR tree-optimization/68619
 * graphite-scop-detection.c (gather_bbs::before_dom_children):
 Change return type to an edge.  Always return NULL.

OK for mainline after successful bootstrap and regtest?

Yes.

Sorry about that.

jeff

Re: Free up bits in DECLs and TYPEs


On 12/10/2015 04:04 PM, Michael Matz wrote:

This isn't stage 3 material really, OTOH fairly low risk.  Anyway, okay
for trunk now or once stage 1 opens?


This is cool and we want it, but not now. Ok for stage 1, with the 
formatting problems quoted below fixed.



Bernd


+#define TYPE_ALIGN(NODE) \
+(TYPE_CHECK (NODE)->type_common.align \
+   ? ((unsigned)1) << ((NODE)->type_common.align - 1) \
+   : 0)



+#define DECL_ALIGN(NODE) \
+(DECL_COMMON_CHECK (NODE)->decl_common.align \
+   ? ((unsigned)1) << ((NODE)->decl_common.align - 1) \
+   : 0)



  #ifdef BIGGEST_FIELD_ALIGNMENT
- DECL_ALIGN (decl)
-   = MIN (DECL_ALIGN (decl), (unsigned) BIGGEST_FIELD_ALIGNMENT);
+ SET_DECL_ALIGN (decl,
+ MIN (DECL_ALIGN (decl), (unsigned) BIGGEST_FIELD_ALIGNMENT));



+  SET_DECL_ALIGN (fndecl,
+FUNCTION_BOUNDARY_P (TREE_TARGET_OPTION (callee_tree)->x_target_flags));

Re: [v3] avoid alignment of static variables affecting stack's


On 12/10/2015 05:07 PM, Jan Beulich wrote:

If not reaching

   if (TREE_CODE (origvar) == SSA_NAME)
 {
   gcc_assert (TREE_CODE (var) != VAR_DECL
  || (!DECL_EXTERNAL (var)
  && !DECL_HAS_VALUE_EXPR_P (var)
  && !TREE_STATIC (var)
  && TREE_TYPE (var) != error_mark_node
  && !DECL_HARD_REGISTER (var)
  && really_expand));
 }

in that case is not a problem, this would make for even simpler a
change. Please let me know.


I think that's fine. BTW there's a is_global_var predicate that 
{c,sh}ould be used for your test.



Bernd

[PATCH] gcc: read -fdebug-prefix-map OLD from environment (improved reproducibility)

2015-12-10 Thread Daniel Kahn Gillmor

Work on the reproducible-builds project [0] has identified that build
paths are one cause of output variation between builds.  This
changeset allows users to avoid this variation when building C objects
with debug symbols, while leaving the default behavior unchanged.

Background
--

gcc includes the build path in any generated DWARF debugging symbols,
specifically in DW_AT_comp_dir, but allows the embedded path to be
changed via -fdebug-prefix-map.

When -fdebug-prefix-map is used with the current build path, it
removes the build path from DW_AT_comp_dir but places it instead in
DW_AT_producer, so the reproducibility problem isn't resolved.

When building software for binary redistribution, the actual build
path on the build machine is irrelevant, and doesn't need to be
exposed in the debug symbols.

Resolution
--

This patch extends the first argument to -fdebug-prefix-map ("old") to
be able to read from the environment, which allows a packager to avoid
embedded build paths in the debugging symbols with something like:

  export SOURCE_BUILD_DIR="$(pwd)"
  gcc -fdebug-prefix-map=\$SOURCE_BUILD_DIR=/usr/src

Details
---

Specifically, if the first character of the "old" argument is a
literal $, then gcc will treat it as an environment variable name, and
use the value of the env var for prefix mapping.

As a result, DW_AT_producer contains the literal envvar name,
DW_AT_comp_dir contains the transformed build path, and the actual
build path is not at all present in the generated object file.

This has been tested successfully on amd64 machines, and i see no
reason why it would be platform-specific.

More discussion of alternate approaches considered and discarded in
the development of this change can be found at [1] for those
interested.

Feedback welcome!

[0] https://reproducible-builds.org
[1] 
https://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20151130/004051.html
---
 gcc/doc/invoke.texi |  4 +++-
 gcc/final.c | 27 +--
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5256031..234432f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6440,7 +6440,9 @@ link processing time.  Merging is enabled by default.
 @item -fdebug-prefix-map=@var{old}=@var{new}
 @opindex fdebug-prefix-map
 When compiling files in directory @file{@var{old}}, record debugging
-information describing them as in @file{@var{new}} instead.
+information describing them as in @file{@var{new}} instead.  If
+@file{@var{old}} starts with a @samp{$}, the corresponding environment
+variable will be dereferenced, and its value will be used instead.
 
 @item -fno-dwarf2-cfi-asm
 @opindex fdwarf2-cfi-asm
diff --git a/gcc/final.c b/gcc/final.c
index 8cb5533..bc43b61 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -1525,6 +1525,9 @@ add_debug_prefix_map (const char *arg)
 {
   debug_prefix_map *map;
   const char *p;
+  char *env;
+  const char *old;
+  size_t oldlen;
 
   p = strchr (arg, '=');
   if (!p)
@@ -1532,9 +1535,29 @@ add_debug_prefix_map (const char *arg)
   error ("invalid argument %qs to -fdebug-prefix-map", arg);
   return;
 }
+  if (*arg == '$')
+{
+  env = xstrndup (arg+1, p - (arg+1));
+  old = getenv(env);
+  if (!old)
+   {
+ warning (0, "environment variable %qs not set in argument to "
+  "-fdebug-prefix-map", env);
+ free(env);
+ return;
+   }
+  oldlen = strlen(old);
+  free(env);
+}
+  else
+{
+  old = xstrndup (arg, p - arg);
+  oldlen = p - arg;
+}
+
   map = XNEW (debug_prefix_map);
-  map->old_prefix = xstrndup (arg, p - arg);
-  map->old_len = p - arg;
+  map->old_prefix = old;
+  map->old_len = oldlen;
   p++;
   map->new_prefix = xstrdup (p);
   map->new_len = strlen (p);
-- 
2.6.2

Re: [hsa 1/10] Configury changes and new options

Hi,

On Tue, Dec 08, 2015 at 10:43:15PM +, Richard Sandiford wrote:
> [Sorry for the low-quality review, was just reading out of interest...]
> 
> Martin Jambor  writes:
> > +If you configure GCC with HSA offloading but do not have the HSA
> > +run-time library installed in a standard location then you can
> > +explicitely specify the directory where they are installed.  The
> 
> typo: explicitly

oops.  For some reason, my spell-checker accepts this typo.  I will
fix it.

> 
> > diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> > index e4772d1..5609207 100644
> > --- a/gcc/lto-wrapper.c
> > +++ b/gcc/lto-wrapper.c
> > @@ -745,6 +745,11 @@ compile_images_for_offload_targets (unsigned in_argc, 
> > char *in_argv[],
> >offload_names = XCNEWVEC (char *, num_targets + 1);
> >for (unsigned i = 0; i < num_targets; i++)
> >  {
> > +  /* HSA does not use LTO-like streaming and a different compiler, skip
> > +it. */
> > +  if (strncmp(names[i], "hsa", 3) == 0)
> > +   continue;
> > +
> >offload_names[i]
> > = compile_offload_image (names[i], compiler_path, in_argc, in_argv,
> >  compiler_opts, compiler_opt_count,
> 
> Looks like this would cause the caller loop:
> 
>   if (offload_names)
>   {
> find_offloadbeginend ();
> for (i = 0; offload_names[i]; i++)
>   printf ("%s\n", offload_names[i]);
> free_array_of_ptrs ((void **) offload_names, i);
>   }
> 
> to terminate early if there was another target after hsa.
> 

Good catch.  I have modified this code so that it never leaves any
holes in offload_names[i].

> names[i] is null-terminated, so it looks like you're deliberately
> allowing anything that starts with "hsa" here, but:

Right, and that was probably a mistake, I have changed the check to
simple strcmp.

> 
> > diff --git a/gcc/opts.c b/gcc/opts.c
> > index 874c84f..5647f0c 100644
> > --- a/gcc/opts.c
> > +++ b/gcc/opts.c
> > @@ -1906,8 +1906,35 @@ common_handle_option (struct gcc_options *opts,
> >break;
> >  
> >  case OPT_foffload_:
> > -  /* Deferred.  */
> > -  break;
> > +  {
> > +   const char *p = arg;
> > +   opts->x_flag_disable_hsa = true;
> > +   while (*p != 0)
> > + {
> > +   const char *comma = strchr (p, ',');
> > +
> > +   if ((strncmp (p, "disable", 7) == 0)
> > +   && (p[7] == ',' || p[7] == '\0'))
> > + {
> > +   opts->x_flag_disable_hsa = true;
> > +   break;
> > + }
> > +
> > +   if ((strncmp (p, "hsa", 3) == 0)
> > +   && (p[3] == ',' || p[3] == '\0'))
> > + {
> > +#ifdef ENABLE_HSA
> > +   opts->x_flag_disable_hsa = false;
> > +#else
> > +   sorry ("HSA has not been enabled during configuration");
> > +#endif
> 
> ...here you only allow "hsa" itself.
> 
> (Not your fault, but: do we have any documentation for -foffload
> and -foffload-abi?  Couldn't see any in the texi files.)

Yes, that is actually PR 67300.  However, I do not understand the more
complex forms the parameter can take enough to attempt to fix it.

In order to address all for you concerns, I am going to install the
following on the branch.

Thanks for the feedback,

Martin


2015-12-09  Martin Jambor  

* lto-wrapper.c (compile_images_for_offload_targets): Do not leave
holes in offload_names.  Use strcmp instead strncmp.
* doc/install.texi (--with-hsa-runtime): Fix typo.
---
 gcc/doc/install.texi | 2 +-
 gcc/lto-wrapper.c| 8 +---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index afd891c..a85a063 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1993,7 +1993,7 @@ compiler will emit the accelerator code, no path should 
be specified.
 
 If you configure GCC with HSA offloading but do not have the HSA
 run-time library installed in a standard location then you can
-explicitely specify the directory where they are installed.  The
+explicitly specify the directory where they are installed.  The
 @option{--with-hsa-runtime=@/@var{hsainstalldir}} option is a
 shorthand for
 @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 5609207..5b58fd6 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -736,6 +736,7 @@ compile_images_for_offload_targets (unsigned in_argc, char 
*in_argv[],
 return;
   unsigned num_targets = parse_env_var (target_names, &names, NULL);
 
+  int next_name_entry = 0;
   const char *compiler_path = getenv ("COMPILER_PATH");
   if (!compiler_path)
 goto out;
@@ -747,16 +748,17 @@ compile_images_for_offload_targets (unsigned in_argc, 
char *in_argv[],
 {
   /* HSA does not use LTO-like streaming and a different compiler, skip
 it. */
-  if (strncmp(names[i], "hsa", 3) == 0)
+  if (strcmp (names[i], "hsa") == 0)
continue;
 
-  offload_names[i]
+  offload_names

Re: [hsa 1/10] Configury changes and new options

Hi,

On Mon, Dec 07, 2015 at 12:19:08PM +0100, Martin Jambor wrote:
> Hi,
> 
> this patch contains changes to the configuration mechanism and offload
> bits, so that users can build compilers with HSA support.

when writing up how to build an HSA-enabled GCC for the wiki page, and
checking the process actually works I realized that AMD no longer
ships libhsakmt as a part of the run time.  So we either have to tell
users to copy the library over to the same directory where the
run-time is (what I did on my machines and then forgot about it) or
provide one more configuration option, otherwise configure libgomp
fails.  The patch below does the second.  I have verified that the
configuration works as intended fo freshly downloaded/built HSA
run-time and libhsakmt.  I am going to commit it to the branch shortly
and of course would need it as part of hsa merge,

Sorry for realizing this so late,

Martin

2015-12-09  Martin Jambor  

libgomp/
* plugin/configfrag.ac (hsa-kmt-lib): New.

gcc/
* doc/install.texi (Configuration): Document --with-hsa-kmt-lib.
---
 gcc/doc/install.texi |  5 +
 libgomp/configure| 17 +++--
 libgomp/plugin/configfrag.ac |  8 
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 232586d..afd891c 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1999,6 +1999,11 @@ shorthand for
 @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
 @option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.

+@item --with-hsa-kmt-lib=@var{pathname}
+
+If you configure GCC with HSA offloading but do not have the HSA
+KMT library installed in a standard location then you can
+explicitly specify the directory where it resides.
 @end table

 @subheading Cross-Compiler-Specific Options
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index c50e5cb..fd77429 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -118,6 +118,14 @@ if test "x$HSA_RUNTIME_LIB" != x; then
   HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
 fi

+AC_ARG_WITH(hsa-kmt-lib,
+   [AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
+   [specify directory for installed HSA KMT library.])])
+if test "x$with_hsa_kmt_lib" != x; then
+  HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
+  HSA_RUNTIME_LIB=
+fi
+
 PLUGIN_HSA=0
 PLUGIN_HSA_CPPFLAGS=
 PLUGIN_HSA_LDFLAGS=
-- 
2.6.3

Re: [hsa 2/10] Modifications to libgomp proper

Hi,

thanks for the feedback.  I have incorporated most of it into the
branch (the diff is below) but  also have a few questions.

On Wed, Dec 09, 2015 at 12:35:36PM +0100, Jakub Jelinek wrote:
> On Mon, Dec 07, 2015 at 12:19:57PM +0100, Martin Jambor wrote:
> > +/* Flag set when the subsequent element in the device-specific argument
> > +   values.  */
> > +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM   (1 << 7)
> > +
> > +/* Bitmask to apply to a target argument to find out the value identifier. 
> >  */
> > +#define GOMP_TARGET_ARG_ID_MASK(((1 << 8) - 1) << 8)
> > +/* Target argument index of NUM_TEAMS.  */
> > +#define GOMP_TARGET_ARG_NUM_TEAMS  (1 << 8)
> > +/* Target argument index of THREAD_LIMIT.  */
> > +#define GOMP_TARGET_ARG_THREAD_LIMIT   (2 << 8)
> 
> I meant that these two would be just special, passed as the first two
> pointers in the array, without the markup.  Because, otherwise you either
> need to use GOMP_TARGET_ARG_SUBSEQUENT_PARAM for these always, or for 32-bit
> arches and for 64-bit ones shift often at runtime.  Having the markup even
> for them is perhaps cleaner, but less efficient, so if you really want to go
> that way, please make sure you handle it properly for 32-bit pointers
> architectures though.  num_teams or thread_limit could be > 32767 or >
> 65535.

I see, I prefer the clean approach, even if it is more work, this
interface looks like it is going to be extended in the future.  But I
am wondering whether embedding the value into the identifier element
is actually worth it.  The passed array is going to be a small local
variable and I wonder whether there is going to be any benefit in it
having two elements instead of four (or four instead of six for
gridified kernels), especially if it means introducing control flow on
the part of the caller.  But if you really want it that way, I will
implement that.

> 
> > -static void
> > -gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
> > -  void **hostaddrs, size_t *sizes,
> > -  unsigned short *kinds)
> > +static void *
> > +gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs,
> > + size_t *sizes, unsigned short *kinds)
> >  {
> >size_t i, tgt_align = 0, tgt_size = 0;
> >char *tgt = NULL;
> > @@ -1281,7 +1282,7 @@ gomp_target_fallback_firstprivate (void (*fn) (void 
> > *), size_t mapnum,
> >}
> >if (tgt_align)
> >  {
> > -  tgt = gomp_alloca (tgt_size + tgt_align - 1);
> > +  tgt = gomp_malloc (tgt_size + tgt_align - 1);
> 
> I don't like using gomp_malloc here, either copy/paste the function, or
> create separate inline functions for the two loops, one for the first loop
> which returns you tgt_align and tgt_size, and another for the stuff after
> the allocation.  Then you can use those two inline functions to implement
> both gomp_target_fallback_firstprivate which will use alloca, and
> gomp_target_unshare_firstprivate which will use gomp_malloc instead.

OK, I did that.

> 
> > @@ -1356,6 +1377,11 @@ GOMP_target (int device, void (*fn) (void *), const 
> > void *unused,
> > and several arguments have been added:
> > FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
> > DEPEND is array of dependencies, see GOMP_task for details.
> > +   ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and 
> > a
> > +   variable number of device-specific arguments, which always take two 
> > elements
> > +   where the first specifies the type and the second the actual value.  The
> > +   last element of the array is a single NULL.
> 
> Note, here you document NUM_TEAMS and THREAD_LIMIT as special values, not
> encoded.

I have changed the comment but will remember to do it again if
necessary after changing omp-low.c

> 
> > @@ -1473,6 +1508,7 @@ GOMP_target_data (int device, const void *unused, 
> > size_t mapnum,
> >struct gomp_device_descr *devicep = resolve_device (device);
> >  
> >if (devicep == NULL
> > +  || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> >|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> 
> Would be nice to have some consistency in the order of capabilities checks.
> Usually you check SHARED_MEM after OPENMP_400, so perhaps do it this way
> here too.

Sure.

> 
> > @@ -1741,23 +1784,38 @@ gomp_target_task_fn (void *data)
> >  
> >if (ttask->state == GOMP_TARGET_TASK_FINISHED)
> > {
> > - gomp_unmap_vars (ttask->tgt, true);
> > + if (ttask->tgt)
> > +   gomp_unmap_vars (ttask->tgt, true);
> >   return false;
> > }
> 
> This doesn't make sense.  For the GOMP_OFFLOAD_CAP_SHARED_MEM case, unless
> you want to run the free (ttask->firstprivate_copies); as a task, you
> shouldn't be queing the target task again for further execution, instead
> it should just be handled like a finished task at that point.  The reason
> w

Re: [hsa 0/10] Merge of HSA branch

Hi,

On Mon, Dec 07, 2015 at 12:46:45PM +0100, Jakub Jelinek wrote:
> On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote:
> > Because I have not been able to come up with any solution to failing
> > libgomp/testsuite/libgomp.c++/target-2.C, I have disabled use of
> > dynamic parallelism in this merge (I keep it on the branch) and
> > therefore entirely rely on the gridification process to run loops on
> > the accelerator, because gridified constructs do not have this issue
> > (passing private symbols by reference).
> 
> I'm fine with not doing it in this series, but I'd strongly prefer
> if dynamic parallelism is added for GCC 6.1.  Even for PTX we'll need
> some IPA analysis on what functions might run in the various OpenMP
> contexts (teams, parallel, simd) and what functions contains such
> directives, and let the backends (or HSA) do something based on that
> for sharing of the vars, or other properties of the function code
> generation.

6.1 seems a bit ambitious but I will try my best to bring it back as
soon as possible.

> 
> > HSA tests are still missing, I would need some guidance as to how to
> > best implement them (specially to test gridification which of course
> > does not happen for other accelerators).  There are no failing
> > testcases if HSA is not configured.  If it is, there are some, all of
> > which fall into one the following categories:
> > 
> >   1) HSA cannot compile a function for one reason or another (most
> >  common cause is inability of HSA to take an address of a function
> >  or make an indirect call) and gives a warning, which is regarded
> >  as an "excess error" by dejagnu.
> 
> It would be good if there is a -W* switch to turn such warnings off.
> Not just for the purposes of dejagnu libgomp testing, but say one
> might try to compile a program primarily say for XeonPhi or PTX offloading,
> but have HSA enabled to, but care primarily about the former two, etc.

All these warnings are in the -Whsa group and can be suppressed with
-Wno-hsa.

> 
> >   2) When HSA is not emitted for a function, libgomp runs a host
> >  fallback instead of it.  When the test queries
> >  omp_is_initial_device and asserts it returns false, the test
> >  fails.
> 
> Do you have examples of which tests fall into this category?

For example libgomp.c/examples-4/target-5.c.  HSA-runtime fails
"linking" the generated HSAIL because there is no function abort (we
really need to wait for HSA to implement traps to provide it) and so
the libgomp plugin decides to run the host fallback even when the
deja-gnu predicates would normally preclude it.  And the testcase
tests that omp_is_initial_device returns non-NULL.

> 
> In any case, it will be needed to also update the wiki page with details on
> how to build the HSA support in, what are the prerequisities etc.
> 

I have updated the gcc/README.hsa file on the branch with current
information.  The wiki page apparently requires a bit more editing
that I can accomplish today.  But I will add something similar there
there too.

Thanks,

Martin

Patch to fix PR68691

2015-12-10 Thread Vladimir Makarov


  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68691

  The patch was bootstrapped and tested on x86/x86-64.

  Committed as rev. 231538.


Index: ChangeLog
===
--- ChangeLog	(revision 231537)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2015-12-10  Vladimir Makarov  
+
+	PR rtl-optimization/68691
+	* lra-spills.c (lra_final_code_change): Check pseudo occurrence
+	number in non-debug insns and remove debug insns if necessary.
+
 2015-12-10  Martin Sebor  
 
 	* invoke.texi (Warning Options): Update -Wall options.  Clarify
Index: lra-spills.c
===
--- lra-spills.c	(revision 231423)
+++ lra-spills.c	(working copy)
@@ -708,6 +708,30 @@ lra_final_code_change (void)
 	}
 	
 	  lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
+	  struct lra_insn_reg *reg;
+
+	  for (reg = id->regs; reg != NULL; reg = reg->next)
+	if (reg->regno >= FIRST_PSEUDO_REGISTER
+		&& lra_reg_info [reg->regno].nrefs == 0)
+	  break;
+	  
+	  if (reg != NULL)
+	{
+	  /* Pseudos still can be in debug insns in some very rare
+		 and complicated cases, e.g. the pseudo was removed by
+		 inheritance and the debug insn is not EBBs where the
+		 inheritance happened.  It is difficult and time
+		 consuming to find what hard register corresponds the
+		 pseudo -- so just remove the debug insn.  Another
+		 solution could be assigning hard reg/memory but it
+		 would be a misleading info.  It is better not to have
+		 info than have it wrong.  */
+	  lra_assert (DEBUG_INSN_P (insn));
+	  lra_invalidate_insn_data (insn);
+	  delete_insn (insn);
+	  continue;
+	}
+	  
 	  struct lra_static_insn_data *static_id = id->insn_static_data;
 	  bool insn_change_p = false;
 	  
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 231537)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2015-12-10  Vladimir Makarov  
+
+	PR rtl-optimization/68691
+	* gcc.target/i386/pr68691.c: New.
+
 2015-12-10  David Malcolm  
 
 	* lib/multiline.exp (_multiline_expected_outputs): Update comment.
Index: testsuite/gcc.target/i386/pr68691.c
===
--- testsuite/gcc.target/i386/pr68691.c	(revision 0)
+++ testsuite/gcc.target/i386/pr68691.c	(working copy)
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -g" } */
+
+char a, b, i, j;
+int c, d, e, f, g, h, n;
+
+char
+fn1 ()
+{
+  char k, l, m;
+  int p;
+  e = g > f;
+  for (b = 0; b < 2; b++)
+{
+  for (p = 0; p < 3; p++)
+	{
+	  for (; h < 1; h++)
+	{
+	  for (; m;)
+		goto lbl;
+	  e = g;
+	}
+	  l = a < 0 || a < d;
+	}
+  d++;
+  for (;;)
+	{
+	  k = g;
+	  n = -k;
+	  j = n;
+	  c = j;
+	  e = 2;
+	  if (l)
+	break;
+	  return 2;
+	}
+}
+  for (;;)
+;
+ lbl:
+  return i;
+}

[google][gcc-4_9] update hardreg costs only when conflict_costs[] < 0

2015-12-10 Thread Wei Mi

Hi,

The patch is for google branch only.

In r216697, when a hardreg is assigned to an allocno, a positive cost
will be added to those conflict allocnos to reflect the disfavor of
the hardreg.

However, the fact that conflict allocno disfavors a hard_regno doesn't
necessarily mean current allocno should prefer the hard_regno, so it
is incorrect to update the costs of an allocno directly according to
its conflict allocnos. The patch changes the code to update costs[i]
of an allocno only when conflict_costs[i] < 0, .i.e, when conflict
allocno prefer hardreg i.

Another issue is the costs of an allocno is updated only when the
conflict allocno is not marked as may_be_spilled_p. However, even if a
conflict allocno is marked as may_be_spilled_p right now, it still has
high probablity to get colored later. It is not right to ignore the
preferences from those conflict allocnos marked as may_be_spilled_p.
The patch changes it.

Test:
Gcc unit tests ok.
Minor improvement for google internal benchmarks.

Thanks,
Wei.
gcc/ChangeLog:

2015-12-10  Wei Mi  

* ira-color.c (restore_costs_from_conflicts): Don't record the
cost change.
(update_conflict_hard_regno_costs): Update costs[i] only when
conflict_costs[i] < 0.
(assign_hard_reg): Ditto.


Index: gcc/ira-color.c
===
--- gcc/ira-color.c (revision 231143)
+++ gcc/ira-color.c (working copy)
@@ -1588,9 +1588,11 @@ restore_costs_from_conflicts (ira_allocn
prev = curr;
}
  /* Propagate the disfavor of hardreg from conflict_a to the
-allocnos connecting with conflict_a via copies.  */
+allocnos connecting with conflict_a via copies.
+Note: once the hardreg is assigned to a, it will not be
+changed, so we don't need to record this change. */
  update_costs_from_allocno (conflict_a, hardreg,
-1, false, true, true);
+1, false, false, true);
}
 }
 }
@@ -1601,7 +1603,7 @@ restore_costs_from_conflicts (ira_allocn
update increases chances to remove some copies.  */
 static void
 update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
- bool decr_p)
+ bool decr_p, bool conflict)
 {
   int i, cost, class_size, freq, mult, div, divisor;
   int index, hard_regno;
@@ -1682,7 +1684,16 @@ update_conflict_hard_regno_costs (int *c
cont_p = true;
if (decr_p)
  cost = -cost;
-   costs[index] += cost;
+   /* conflict being true indicates this is updating costs[]
+  according to preferences of allocnos connected by copies
+  to the conflict allocnos.
+  The fact conflict allocno disfavors hard_regno doesn't
+  necessarily mean current allocno should prefer hard_regno
+  (actually only a little), so we update costs[] only
+  when conflict allocno prefers hard_regno, .i.e, when
+  conflict_costs[i] < 0. */
+   if (conflict && conflict_costs [i] < 0)
+ costs[index] += cost;
  }
  }
/* Probably 5 hops will be enough.  */
@@ -1934,7 +1945,6 @@ assign_hard_reg (ira_allocno_t a, bool r
}
}
  else if (! retry_p
-  && ! ALLOCNO_COLOR_DATA (conflict_a)->may_be_spilled_p
   /* Don't process the conflict allocno twice.  */
   && (ALLOCNO_COLOR_DATA (conflict_a)->last_process
   != curr_allocno_process))
@@ -1967,7 +1977,13 @@ assign_hard_reg (ira_allocno_t a, bool r
  ->new_conflict_hard_regs,
  hard_regno))
  continue;
-   full_costs[j] -= conflict_costs[k];
+   /* The fact conflict_a disfavors hard_regno doesn't
+  necessarily mean current allocno should prefer
+  hard_regno so much (only a little), so we only
+  update full_costs[] when conflict_a prefers
+  hard_regno, .i.e, when conflict_costs[k] < 0. */
+   if (conflict_costs[k] < 0)
+ full_costs[j] -= conflict_costs[k];
  }
  queue_update_cost (conflict_a, NULL, COST_HOP_DIVISOR);
 
@@ -1977,7 +1993,7 @@ assign_hard_reg (ira_allocno_t a, bool r
   if (! retry_p)
 /* Take into account preferences of allocnos connected by copies to
the conflict allocnos.  */
-update_conflict_hard_regno_costs (full_costs, aclass, true);
+update_conflict_hard_regno_costs (full_costs, aclass, true, true);
 
   /* Take preferences of allocnos connected by copies into

Re: [PATCH 4/4] Add -Wmisleading-indentation to -Wall

On Thu, 2015-10-29 at 11:38 -0600, Jeff Law wrote:
> On 10/29/2015 10:49 AM, David Malcolm wrote:
> > Our documentation describes -Wall as enabling "all the warnings about
> > constructions that some users consider questionable, and that are easy to 
> > avoid
> > (or modify to prevent the warning), even in conjunction with macros."
> >
> > I believe that -Wmisleading-indentation meets these criteria, and is
> > likely to be of benefit to users who may not read release notes; it
> > warns for indentation that's misleading, but not for indentation
> > that's merely bad: the former are places where a user will likely
> > want to fix the code.
> >
> > The fix is usually easy and obvious: fix the misleadingly-indented
> > code.  If that isn't an option for some reason, pragmas can be used to
> > turn off the warning for a particular fragment of code:
> >
> >#pragma GCC diagnostic push
> >#pragma GCC diagnostic ignored "-Wmisleading-indentation"
> >  if (flag)
> >x = 3;
> >y = 2;
> >#pragma GCC diagnostic pop
> >
> > -Wmisleading-indentation has been tested with a variety of indentation
> > styles (see gcc/testsuite/c-c++-common/Wmisleading-indentation.c)
> > and on a variety of real-world projects.  For example, in:
> >https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg119790.html
> > Patrick reports:
> > "Tested by building the linux, git, vim, sqlite and gdb-binutils sources
> >   with -Wmisleading-indentation."
> >
> > With the tweak earlier in this kit I believe we now have a good
> > enough signal:noise ratio for this warning to be widely used; hence this
> > patch adds the warning to -Wall.
> >
> > Bootstrapped®rtested with x86_64-pc-linux-gnu.
> >
> > OK for trunk?
> >
> > gcc/c-family/ChangeLog:
> > * c.opt (Wmisleading-indentation): Add to -Wall for C and C++.
> >
> > gcc/ChangeLog:
> > * doc/invoke.texi (-Wall): Add -Wmisleading-indentation to the
> > list.
> > (-Wmisleading-indentation): Update documentation to reflect
> > being enabled by -Wall in C/C++.
> I'm sure we'll get some grief for this :-)
> 
> Approved once we're clean in GCC.  I'm going to explicitly say that 
> we'll have to watch for fallout, particularly as we start getting 
> feedback from Debian & Fedora mass-rebuilds as we approach release time. 
>   If the fallout is too bad, we'll have to reconsider.

I believe we're now clean; I've committed this to trunk (*without* the
blank-lines heuristic/levels idea) as r231539, having
bootstrapped®rtested on x86_64-pc-linux-gnu (I've also successfully
performed an all-configs build with it, although that was a while back
now).

> I'll pre-approve patches which fix anything caught by this option in GCC 
> as long as the fix just adjusts whitespace :-)
> 
> jeff
> 
>

Request permission to delete gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c

2015-12-10 Thread Bill Schmidt

Hi,

The subject test case has been failing as follows:

FAIL: gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c 
scan-tree-dump-times vect "vectorization not profitable" 1

The test has been failing since r223528, which is:

2015-05-22  Richard Biener  

PR tree-optimization/65701
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Move peeling cost models into one place.  Peel for alignment
for single loads only if an aligned load is cheaper than
an unaligned load.

Thus with that modification, gcc now vectorizes the loop that was
previously deemed unprofitable to vectorize.  As a result, the test case
no longer has any reason to exist, and I would like to delete it.

Ok for trunk?

Thanks,
Bill


[gcc/testsuite]

2015-12-10  Bill Schmidt  

* gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c:
Delete.

Re: Request permission to delete gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c

2015-12-10 Thread David Edelsohn

On Thu, Dec 10, 2015 at 2:23 PM, Bill Schmidt
 wrote:
> Hi,
>
> The subject test case has been failing as follows:
>
> FAIL: gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c 
> scan-tree-dump-times vect "vectorization not profitable" 1
>
> The test has been failing since r223528, which is:
>
> 2015-05-22  Richard Biener  
>
> PR tree-optimization/65701
> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
> Move peeling cost models into one place.  Peel for alignment
> for single loads only if an aligned load is cheaper than
> an unaligned load.
>
> Thus with that modification, gcc now vectorizes the loop that was
> previously deemed unprofitable to vectorize.  As a result, the test case
> no longer has any reason to exist, and I would like to delete it.
>
> Ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc/testsuite]
>
> 2015-12-10  Bill Schmidt  
>
> * gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c:
> Delete.

Okay with me.

Thanks, David

Re: [testsuite][ARM target attributes] Fix effective_target tests

2015-12-10 Thread Christophe Lyon

On 10 December 2015 at 14:14, Kyrill Tkachov  wrote:
>
> On 10/12/15 13:04, Christophe Lyon wrote:
>>
>> On 10 December 2015 at 13:30, Kyrill Tkachov 
>> wrote:
>>>
>>> Hi Christophe,
>>>
>>>
>>> On 08/12/15 11:18, Christophe Lyon wrote:

 On 8 December 2015 at 11:50, Kyrill Tkachov 
 wrote:
>
> Hi Christophe,
>
>
> On 27/11/15 13:00, Christophe Lyon wrote:
>>
>> Hi,
>>
>> After the recent commits from Christian adding target attributes
>> support for ARM FPU settings,  I've noticed that some of the tests
>> were failing because of incorrect assumptions wrt to the default
>> cpu/fpu/float-abi of the compiler.
>>
>> This patch fixes the problems I've noticed in the following way:
>> - do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
>> when gcc is configured --with-float=hard
>>
>> - change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
>> flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
>> defined
>>
>> - introduce arm_fp_ok, which is similar but does not enforce fpu
>> setting
>>
>> - add a new effective_target: arm_crypto_pragma_ok to check that
>> setting this fpu via a pragma is actually supported by the current
>> "multilib". This is different from checking the command-line option
>> because the pragma might conflict with the command-line options in
>> use.
>>
>> The updates in the testcases are as follows:
>> - attr-crypto.c, we have to make sure that the defaut fpu does not
>> conflict with the one forced by pragma. That's why I use the arm_vfp
>> options/effective_target. This is needed if gcc has been configured
>> --with-fpu=neon-fp16, as the pragma fpu=crypto-neon-fp-armv8 would
>> conflict.
>>
>> - attr-neon-builtin-fail.c: use arm_fp to force the appropriate
>> float-abi setting. Enforcing fpu is not needed here.
>>
>> - attr-neon-fp16.c: similar, I also removed arm_neon_ok since it was
>> not necessary to make the test pass in my testing. On second thought,
>> I'm wondering whether I should leave it and make the test unsupported
>> in more cases (such as when forcing -march=armv5t, although it does
>> pass with this patch)
>>
>> - attr-neon2.c: use arm_vfp to force the appropriate float-abi
>> setting. Enforcing mfpu=vfp is needed to avoid conflict with the
>> pragma target fpu=neon (for instance if the toolchain default is
>> neon-fp16)
>>
>> - attr-neon3.c: similar
>>
>> Tested on a variety of configurations, see:
>>
>>
>>
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/230929-target-attr/report-build-info.html
>>
>> Note that the regressions reported fall into 3 categories:
>> - when forcing march=armv5t: tests are now unsupported because I
>> modified arm_crypto_ok to require arm_v8_neon_ok instead of arm32.
>>
>> - the warning reported by attr-neon-builtin-fail.c moved from line 12
>> to 14 and is thus seen as a regression + one improvement
>>
>> - finally, attr-neon-fp16.c causes an ICE on armeb compilers, for
>> which I need to post a bugzilla.
>>
>>
>> TBH, I'm a bit concerned by the complexity of all these multilib-like
>> conditions. I'm confident that I'm still missing some combinations :-)
>>
>> And with new target attributes coming, new architectures etc... all
>> this logic is likely to become even more complex.
>>
>> That being said, OK for trunk?
>
>
> I'd like us to clean up and reorganise the gcc.target/arm testsuite
> at some point to make it more robust with respect to the tons of
> multilib
> options and configurations we can have for arm. It's becoming very
> frustrating
> to test feature-specific stuff :(
>
> This is ok in the meantime.
> Sorry for the delay.
>
 Thanks for the approval, glad to see I'm not the only one willing to see
 improvements in this area :)

 Committed as r231403.
>>>
>>>
>>> With this patch we're seeing legitimate PASSes go to NA.
>>> For example:
>>>
>>> gcc.target/arm/vfp-1.c
>>> gcc.target/arm/vfp-ldmdbs.c
>>> and other vfp tests in arm.exp.
>>> This is, for example, for the
>>> variant-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard
>>>
>> Hmm I'm attempting to cover such a configuration in my matrix of
>> validations:
>>
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231403/report-build-info.html
>>
>> The difference is that I use similar flags at GCC configure time, while
>> you
>> override them when running the testsuite:
>> --target arm-none-linux-gnueabihf --with-mode=thum --with-cpu=cortex-a57
>> --with-fpu=crypto-neon-fp-armv8
>>
>> I this case, I do not see the regressions.
>
>
> My gcc is arm-none-eabi configured with
> --with-float=hard --with

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation


On 12/03/2015 07:38 AM, Richard Biener wrote:

This pass is now enabled by default with -Os but has no limits on the amount of
stmts it copies.
The more statements it copies, the more likely it is that the path 
spitting will turn out to be useful!  It's counter-intuitive.


The primary benefit AFAICT with path splitting is that it exposes 
additional CSE, DCE, etc opportunities.


IIRC  Ajit posited that it could help with live/conflict analysis, I 
never saw that, and with the changes to push splitting deeper into the 
pipeline I'd further life/conflict analysis since that work also 
involved preserving the single latch property.




 It also will make all loops with this shape have at least two

exits (if the resulting loop will be disambiguated the inner loop will
have two exits).
Having more than one exit will disable almost all loop optimizations after it.
Hmmm, the updated code keeps the single latch property, but I'm pretty 
sure it won't keep a single exit policy.


To keep a single exit policy would require keeping an additional block 
around.  Each of the split paths would unconditionally transfer to this 
new block.  The new block would then either transfer to the latch block 
or out of the loop.





The pass itself documents the transform it does but does zero to motivate it.

What's the benefit of this pass (apart from disrupting further optimizations)?
It's essentially building superblocks in a special case to enable 
additional CSE, DCE and the like.


Unfortunately what is is missing is heuristics and de-duplication.  The 
former to drive cases when it's not useful and the latter to reduce 
codesize for any statements that did not participate in optimizations 
when they were duplicated.


The de-duplication is the "sink-statements-through-phi" problems, cross 
jumping, tail merging and the like class of problems.


It was only after I approved this code after twiddling it for Ajit that 
I came across Honza's tracer implementation, which may in fact be 
retargettable to these loops and do a better job.  I haven't 
experimented with that.






I can see a _single_ case where duplicating the latch will allow threading
one of the paths through the loop header to eliminate the original exit.  Then
disambiguation may create a nice nested loop out of this.  Of course that
is only profitable again if you know the remaining single exit of the inner
loop (exiting to the outer one) is executed infrequently (thus the inner loop
actually loops).

It wasn't ever about threading.



But no checks other than on the CFG shape exist (oh, it checks it will
at _least_ copy two stmts!).
Again, the more statements it copies the more likely it is to be 
profitable.  Think superblocks to expose CSE, DCE and the like.




Given the profitability constraints above (well, correct me if I am
wrong on these)
it looks like the whole transform should be done within the FSM threading
code which might be able to compute whether there will be an inner loop
with a single exit only.
While it shares some concepts with jump threading, I don't think the 
transformation belongs in jump threading.




I'm inclined to request the pass to be removed again or at least disabled by
default.
I wouldn't lose any sleep if we disabled by default or removed, 
particularly if we can repurpose Honza's code.  In fact, I might 
strongly support the former until we hear back from Ajit on performance 
data.


I also keep coming back to Click's paper on code motion -- in that 
context, copying statements would be a way to break dependencies and 
give the global code motion algorithm more freedom.  The advantage of 
doing it in a framework like Click's is it's got a built-in sinking step.





What closed source benchmark was this transform invented for?
I think it was EEMBC or Coremark.  Ajit should know for sure.  I was 
actually still hoping to see benchmark results from Ajit to confirm the 
new placement in the pipeline didn't negate all the benefits he saw.


jeff

[PATCH] Fix ICE with -fno-if-conversion (PR rtl-optimization/68730)

Hi!

It seems some passes in between the combiner and ira aren't prepared to
update dominance info.  It usually is not a problem, because already before
the combiner we call free_dominance_info.  But we now have a new i?86
stv pass that is injected after the combiner that computes dominators but
does not free them.

So, to fix ICE on the following testcase, we can either do what the patch
does, or could conditionalize both the calculate_dominance_info and
free_dominance_info in the convert_scalars_to_vector function (stv pass)
on the dominance info not being computed (like other places in gcc do),
or we could stick free_dominance_info into all passes that break the
dominators just in case it would be computed (out_of_cfglayout is one
example).

Bootstrapped/regtested on x86_64-linux and i686-linux, is this ok for trunk
(or some other variant is preferrable)?

2015-12-10  Jakub Jelinek  

PR rtl-optimization/68730
* config/i386/i386.c (convert_scalars_to_vector): Call
free_dominance_info at the end.

* gcc.dg/pr68730.c: New test.

--- gcc/config/i386/i386.c.jj   2015-12-09 14:39:02.0 +0100
+++ gcc/config/i386/i386.c  2015-12-10 12:15:59.517609392 +0100
@@ -3577,6 +3577,7 @@ convert_scalars_to_vector ()
   BITMAP_FREE (candidates);
   bitmap_obstack_release (NULL);
   df_process_deferred_rescans ();
+  free_dominance_info (CDI_DOMINATORS);
 
   /* Conversion means we may have 128bit register spills/fills
  which require aligned stack.  */
--- gcc/testsuite/gcc.dg/pr68730.c.jj   2015-12-10 12:22:07.330365019 +0100
+++ gcc/testsuite/gcc.dg/pr68730.c  2015-12-10 12:24:03.908702426 +0100
@@ -0,0 +1,51 @@
+/* PR rtl-optimization/68730 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-if-conversion" } */
+/* { dg-additional-options "-march=x86-64" { target { i?86-*-* x86_64-*-* } } 
} */
+
+int b, d, e;
+unsigned long long c = 4100543410106915;
+
+void
+foo (void)
+{
+  short f, g = 4 % c;
+  int h = c;
+  if (h)
+{
+  int i = ~c;
+  if (~c)
+   i = 25662;
+  f = g = i;
+  h = c - g + ~-f;
+  c = ~(c * h - f);
+}
+  f = g;
+  unsigned long long k = g || c;
+  short l = c ^ g ^ k;
+  if (g > 25662 || c == 74074520320 || !(g < 2))
+{
+  k = c;
+  l = g;
+  c = ~((k && c) + ~l);
+  f = ~(f * (c ^ k) | l);
+  if (c > k)
+   __builtin_printf ("%d\n", f);
+}
+  short m = -f;
+  unsigned long long n = c;
+  c = m * f | n % c;
+  if (n)
+__builtin_printf ("%d\n", f);
+  while (f < -31807)
+;
+  c = ~(n | c) | f;
+  if (n < c)
+__builtin_printf ("%lld\n", (long long) f);
+  for (; d;)
+for (; e;)
+  for (;;)
+   ;
+  c = h;
+  c = l % c;
+}

Jakub

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation


On 12/03/2015 07:45 AM, Richard Biener wrote:



Ah, some EEMBC one.

Btw, the testcase that was added shows

if (xc < xm)
  {
xk = (unsigned char) (xc < xy ? xc : xy);
  }
else
 {
   xk = (unsigned char) (xm < xy ? xm : xy);
 }

which might be better handled by phiopt transforming it into
I don't think the included testcase is a particularly good one for this 
transformation -- I didn't see that the transformation made any 
significant difference on x86_64.  That why I asked for Ajit for more 
data on the benchmarking.





xk = MIN (xc, MIN (xm, xy))

phiopt1 sees (hooray to GENERIC folding)

   xc_26 = ~xr_21;
   xm_27 = ~xg_23;
   xy_28 = ~xb_25;
   if (xr_21 > xg_23)
 goto ;
   else
 goto ;

   :
   xk_29 = MIN_EXPR ;
   goto ;

   :
   xk_30 = MIN_EXPR ;

   :
   # xk_4 = PHI 

btw, see PR67438 for a similar testcase and the above pattern.
That may be elsewhere in BZ database as well.  I've seen stuff that 
looks awful close to that when going through the bug lists in prior 
releases.


jeff

[PTX] reorder function calling hooks

2015-12-10 Thread Nathan Sidwell

This patch reorders the function calling hooks so they come before 
write_one_arg.  There's no change to their functionality, but it will allow them 
to be called from write_one_arg and write_result, rather than have those 
functions continue to independently perform essentially the same calculation. 
Also, I split write_one_arg into a write_one_arg helper, (which does write one 
arg), and a write_arg  main entry point (which writes one or more PTX args).


nathan
2015-12-10  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_function_arg,
	nvptx_function_incoming_arg, nvptx_function_arg_advance,
	nvptx_strict_argument_naming, nvptx_function_arg_boundary,
	nvptx_libcall_value, nvptx_function_value,
	nvptx_function_value_regno_p, nvptx_pass_by_reference,
	nvptx_return_in_memory, nvptx_promote_function_mode,
	nvptx_static_chain): Move earlier.
	(write_one_arg): Break out as helper fn for ...
	(write_arg): ... this new function.  Adjust all callers.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231542)
+++ config/nvptx/nvptx.c	(working copy)
@@ -389,69 +389,270 @@ arg_promotion (machine_mode mode)
   return mode;
 }
 
-/* Process function parameter TYPE, either emitting in a prototype
-   argument, or as a copy a in a function prologue.  ARGNO is the
-   index of this argument in the PTX function.  FOR_REG is negative,
-   if we're emitting the PTX prototype.  It is zero if we're copying
-   to an argument register and it is greater than zero if we're
-   copying to a specific hard register.  PROTOTYPED is true, if this
-   is a prototyped function, rather than an old-style C declaration.
+/* Implement TARGET_FUNCTION_ARG.  */
 
-   The behaviour here must match the regular GCC function parameter
-   marshalling machinery.  */
+static rtx
+nvptx_function_arg (cumulative_args_t, machine_mode mode,
+		const_tree, bool named)
+{
+  if (mode == VOIDmode)
+return NULL_RTX;
 
-static int
-write_one_arg (std::stringstream &s, int for_reg, int argno,
-	   tree type, bool prototyped)
+  if (named)
+return gen_reg_rtx (mode);
+  return NULL_RTX;
+}
+
+/* Implement TARGET_FUNCTION_INCOMING_ARG.  */
+
+static rtx
+nvptx_function_incoming_arg (cumulative_args_t cum_v, machine_mode mode,
+			 const_tree, bool named)
 {
-  machine_mode mode = TYPE_MODE (type);
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  if (mode == VOIDmode)
+return NULL_RTX;
 
-  if (!PASS_IN_REG_P (mode, type))
-mode = Pmode;
+  if (!named)
+return NULL_RTX;
 
-  machine_mode split = maybe_split_mode (mode);
-  if (split != VOIDmode)
+  /* No need to deal with split modes here, the only case that can
+ happen is complex modes and those are dealt with by
+ TARGET_SPLIT_COMPLEX_ARG.  */
+  return gen_rtx_UNSPEC (mode,
+			 gen_rtvec (1, GEN_INT (cum->count)),
+			 UNSPEC_ARG_REG);
+}
+
+/* Implement TARGET_FUNCTION_ARG_ADVANCE.  */
+
+static void
+nvptx_function_arg_advance (cumulative_args_t cum_v,
+			machine_mode ARG_UNUSED (mode),
+			const_tree ARG_UNUSED (type),
+			bool ARG_UNUSED (named))
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  cum->count++;
+}
+
+/* Handle the TARGET_STRICT_ARGUMENT_NAMING target hook.
+
+   For nvptx, we know how to handle functions declared as stdarg: by
+   passing an extra pointer to the unnamed arguments.  However, the
+   Fortran frontend can produce a different situation, where a
+   function pointer is declared with no arguments, but the actual
+   function and calls to it take more arguments.  In that case, we
+   want to ensure the call matches the definition of the function.  */
+
+static bool
+nvptx_strict_argument_naming (cumulative_args_t cum_v)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  return cum->fntype == NULL_TREE || stdarg_p (cum->fntype);
+}
+
+/* Implement TARGET_FUNCTION_ARG_BOUNDARY.  */
+
+static unsigned int
+nvptx_function_arg_boundary (machine_mode mode, const_tree type)
+{
+  unsigned int boundary = type ? TYPE_ALIGN (type) : GET_MODE_BITSIZE (mode);
+
+  if (boundary > BITS_PER_WORD)
+return 2 * BITS_PER_WORD;
+
+  if (mode == BLKmode)
 {
-  mode = split;
-  argno = write_one_arg (s, for_reg, argno,
-			 TREE_TYPE (type), prototyped);
+  HOST_WIDE_INT size = int_size_in_bytes (type);
+  if (size > 4)
+return 2 * BITS_PER_WORD;
+  if (boundary < BITS_PER_WORD)
+{
+  if (size >= 3)
+return BITS_PER_WORD;
+  if (size >= 2)
+return 2 * BITS_PER_UNIT;
+}
 }
+  return boundary;
+}
 
-  if (!prototyped && !AGGREGATE_TYPE_P (type))
+/* Implement TARGET_LIBCALL_VALUE.  */
+
+static rtx
+nvptx_libcall_value (machine_mode mode, const_rtx)
+{
+  if (cfun->machine->start_call == NULL_RTX)
+/* Pretend to return in a hard reg for early uses before pseudos can be
+   generated.  */
+return gen_rtx_REG (mode, NVPTX_RETURN_R

Re: [PATCH] Fix ICE with -fno-if-conversion (PR rtl-optimization/68730)


On 12/10/2015 01:11 PM, Jakub Jelinek wrote:

Hi!

It seems some passes in between the combiner and ira aren't prepared to
update dominance info.  It usually is not a problem, because already before
the combiner we call free_dominance_info.  But we now have a new i?86
stv pass that is injected after the combiner that computes dominators but
does not free them.

So, to fix ICE on the following testcase, we can either do what the patch
does, or could conditionalize both the calculate_dominance_info and
free_dominance_info in the convert_scalars_to_vector function (stv pass)
on the dominance info not being computed (like other places in gcc do),
or we could stick free_dominance_info into all passes that break the
dominators just in case it would be computed (out_of_cfglayout is one
example).

Bootstrapped/regtested on x86_64-linux and i686-linux, is this ok for trunk
(or some other variant is preferrable)?

2015-12-10  Jakub Jelinek  

PR rtl-optimization/68730
* config/i386/i386.c (convert_scalars_to_vector): Call
free_dominance_info at the end.

* gcc.dg/pr68730.c: New test.
Any pass that mucks up the dominator tree ought to be wiping it clean. 
It's obviously better if wiping the dominator tree is conditional on the 
pass actually making transformations to the CFG that invalidate the 
stored information.


Similarly, any pass that needs the dominator tree ought to make sure 
it's around.  Note this is cheap if the prior pass hasn't wiped the 
dominator tree.


At least that's always been my understanding.

So ISTM this patch is the right thing to do in and of itself, though it 
may not be complete as there may be passes that aren't following the 
rules noted above.



jeff

[PATCH] Fix PR c++/21802 (two-stage name lookup fails for operators)

2015-12-10 Thread Patrick Palka

This patch fixes name-lookup of operators in template definitions whose
operands are non-dependent expressions, i.e. PR c++/21802 (and
incidentally 53223).

The approach that this patch takes is to detect when build_new_op()
returns a call to an overloaded function and to store a call to this
overload intothe template AST instead of storing the raw operator
(an operator would be erroneously subject to overload resolution during
instantiation).

The new function build_min_non_dep_op_overload is the workhorse of the
patch.  It reconstructs the CALL_EXPR that would have been built had an
explicit operator+, operator* etc call been used, i.e. had the overload
gone through finish_call_expr() / build_new_method_call() instead of
through build_new_op().  The parameter OVERLOAD of this new function is
probably not strictly necessary -- one can probably just look at the
CALL_EXPR_FN of the parameter NON_DEP to figure out the overload to use
-- but since the requisite plumbing from build_new_op() already existed
to conveniently get at the overload information I thought I might as
well use it.

I have also created a test case that hopefully exercises all the changes
that were made and to verify that these operator calls are being built
correctly.

Does this approach seem adequate?  Bootstrap and regtesting in progress
on x86_64, OK to commit if testing succeeds?

gcc/cp/ChangeLog:

PR c++/21802
PR c++/53223
* cp-tree.h (build_min_non_dep_op_overload): Declare.
* tree.c (build_min_non_dep_op_overload): Define.
* typeck.c (build_x_indirect_ref): Use
build_min_non_dep_op_overload when the given expression
has been resolved to an operator overload.
(build_x_binary_op): Likewise.
(build_x_array_ref): Likewise.
(build_x_unary_op): Likewise.
(build_x_compound_expr): Likewise.
(build_x_modify_expr): Likewise.

gcc/testsuite/ChangeLog:

PR c++/21802
PR c++/53223
* g++.dg/cpp0x/pr53223.C: New test.
* g++.dg/lookup/pr21802.C: New test.
* g++.dg/lookup/two-stage4.C: Remove XFAIL.
---
 gcc/cp/cp-tree.h |   1 +
 gcc/cp/tree.c|  64 
 gcc/cp/typeck.c  | 100 +---
 gcc/testsuite/g++.dg/cpp0x/pr53223.C |  35 
 gcc/testsuite/g++.dg/lookup/pr21802.C| 271 +++
 gcc/testsuite/g++.dg/lookup/two-stage4.C |   2 +-
 6 files changed, 453 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr53223.C
 create mode 100644 gcc/testsuite/g++.dg/lookup/pr21802.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6190f4e..3487d77 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6513,6 +6513,7 @@ extern tree build_min (enum 
tree_code, tree, ...);
 extern tree build_min_nt_loc   (location_t, enum tree_code,
 ...);
 extern tree build_min_non_dep  (enum tree_code, tree, ...);
+extern tree build_min_non_dep_op_overload  (enum tree_code, tree, tree, 
...);
 extern tree build_min_non_dep_call_vec (tree, tree, vec 
*);
 extern tree build_cplus_new(tree, tree, tsubst_flags_t);
 extern tree build_aggr_init_expr   (tree, tree);
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 5dad0a7..2635736 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -2744,6 +2744,70 @@ build_min_non_dep_call_vec (tree non_dep, tree fn, 
vec *argvec)
   return convert_from_reference (t);
 }
 
+/* Similar to build_min_non_dep, but for expressions that have been resolved to
+   a call to an operator overload.  OP is the operator that has been
+   overloaded.  NON_DEP is the non-dependent expression that's been built,
+   which should be a CALL_EXPR or an INDIRECT_REF to a CALL_EXPR.  OVERLOAD is
+   the overload that NON_DEP is calling.  */
+
+tree
+build_min_non_dep_op_overload (enum tree_code op,
+  tree non_dep,
+  tree overload, ...)
+{
+  va_list p;
+  int nargs;
+  tree fn, call;
+  vec *args;
+
+  if (REFERENCE_REF_P (non_dep))
+non_dep = TREE_OPERAND (non_dep, 0);
+
+  nargs = call_expr_nargs (non_dep);
+
+  if (op == PREINCREMENT_EXPR
+  || op == PREDECREMENT_EXPR)
+gcc_assert (nargs == 1);
+  else if (op == MODOP_EXPR)
+gcc_assert (nargs == 2);
+  else
+gcc_assert (nargs == TREE_CODE_LENGTH (op));
+
+  args = make_tree_vector ();
+  va_start (p, overload);
+
+  if (TREE_CODE (TREE_TYPE (overload)) == FUNCTION_TYPE)
+{
+  fn = overload;
+  for (int i = 0; i < nargs; i++)
+   {
+ tree arg = va_arg (p, tree);
+ vec_safe_push (args, arg);
+   }
+}
+  else if (TREE_CODE (TREE_TYPE (overload)) == METHOD_TYPE)
+{
+  tree object = va_arg (p, tree);
+  tree binfo = TYPE_BINFO (TREE_TYPE (object));
+  tree meth

Re: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets