date:20150622

Fix ao initialization in ipa-polymorphic-call

2015-06-22 Thread Jan Hubicka

Hi,
this patch fixes a thinko when initializing the alias oracle (ao) in
ipa_polymorphic_call_context::get_dynamic_type.  It took the
get_deref_alias_set of the vptr type instead of get_alias_set, which now makes
a difference because pointer types are different.

Bootstrapped/regtested x86_64-linux, committed.

Honza
PR ipa/66351
* ipa-polymorphic-call.c
(ipa_polymorphic_call_context::get_dynamic_type): Fix thinko when
initializing alias oracle; fix formatting; set base_alias_set if it
is known.
Index: ipa-polymorphic-call.c
===
--- ipa-polymorphic-call.c  (revision 224713)
+++ ipa-polymorphic-call.c  (working copy)
@@ -1574,13 +1574,15 @@ ipa_polymorphic_call_context::get_dynami
  tree base_ref = get_ref_base_and_extent
   (ref_exp, &offset2, &size, &max_size);
 
- /* Finally verify that what we found looks like read from OTR_OBJECT
-or from INSTANCE with offset OFFSET.  */
+ /* Finally verify that what we found looks like read from
+OTR_OBJECT or from INSTANCE with offset OFFSET.  */
  if (base_ref
  && ((TREE_CODE (base_ref) == MEM_REF
   && ((offset2 == instance_offset
&& TREE_OPERAND (base_ref, 0) == instance)
-  || (!offset2 && TREE_OPERAND (base_ref, 0) == otr_object)))
+  || (!offset2
+  && TREE_OPERAND (base_ref, 0)
+ == otr_object)))
  || (DECL_P (instance) && base_ref == instance
  && offset2 == instance_offset)))
{
@@ -1608,9 +1610,17 @@ ipa_polymorphic_call_context::get_dynami
   /* We look for vtbl pointer read.  */
   ao.size = POINTER_SIZE;
   ao.max_size = ao.size;
+  /* We are looking for stores to vptr pointer within the instance of
+ outer type.
+ TODO: The vptr pointer type is globally known, we probably should
+ keep it and do that even when otr_type is unknown.  */
   if (otr_type)
-ao.ref_alias_set
-  = get_deref_alias_set (TREE_TYPE (BINFO_VTABLE (TYPE_BINFO (otr_type))));
+{
+  ao.base_alias_set
+   = get_alias_set (outer_type ? outer_type : otr_type);
+  ao.ref_alias_set
+= get_alias_set (TREE_TYPE (BINFO_VTABLE (TYPE_BINFO (otr_type))));
+}
 
   if (dump_file)
 {

Re: Fix more of C/fortran canonical type issues

2015-06-22 Thread Jan Hubicka

> > On Mon, 8 Jun 2015, Jan Hubicka wrote:
> > 
> > > > 
> > > > I think we should instead work towards eliminating the get_alias_set
> > > > langhook first.  The LTO langhook variant contains the same handling,
> > > > btw, so just inline that into get_alias_set and see what remains?
> > > 
> > > I see, I completely missed the existence of gimple_get_alias_set.  It
> > > makes more sense now.
> > > 
> > > Is moving everything to alias.c really a desirable thing?  If non-C
> > > languages do not have this rule, why do we want to reduce the code
> > > quality when compiling those?
> > 
> > Well, for consistency and for getting rid of one langhook ;)
> :)
> In a way this particular langhook makes sense to me - TBAA rules are
> language specific.
> We may also go with explicit streaming of the TBAA DAG, like LLVM does.
> 
> Anyway, this is the updated patch fixing Fortran's interoperability with
> size_t and signed char.  I will send a separate patch for the extra lto-symtab
> warnings shortly.
> 
> I will be happy to look into making TYPE_CANONICAL (int) different from
> TYPE_CANONICAL (unsigned int) if that seems desirable.  There are two things
> that need to be solved - hash_canonical_type/gimple_canonical_types_compatible_p
> can't use TYPE_CANONICAL of subtypes in all cases (that is easy) and we will
> need some way to recognize the conflict in lto-symtab other than just
> comparing TYPE_CANONICAL, so as not to warn when a variable is declared signed
> in a Fortran unit and unsigned in C.
> 
> Bootstrapped/regtested ppc64le-linux.
> 
>   * lto/lto.c (hash_canonical_type): Do not hash TYPE_UNSIGNED
>   of INTEGER_TYPE.
>   * tree.c (gimple_canonical_types_compatible_p): Do not compare
>   TYPE_UNSIGNED of INTEGER_TYPE.
>   * gimple-expr.c (useless_type_conversion_p): Move INTEGER type handling
>   ahead of the canonical type lookup.
> 
>   * gfortran.dg/lto/bind_c-2_0.f90: New testcase
>   * gfortran.dg/lto/bind_c-2_1.c: New testcase
>   * gfortran.dg/lto/bind_c-3_0.f90: New testcase
>   * gfortran.dg/lto/bind_c-3_1.c: New testcase
>   * gfortran.dg/lto/bind_c-4_0.f90: New testcase
>   * gfortran.dg/lto/bind_c-4_1.c: New testcase

Hi,
I would like to ping this.  There are still a few things to fix to make our
merging compliant with at least the C/C++/Fortran rules (the array bounds for
Fortran and union ordering for C, I believe) and I would like to make progress
on this.

Honza

Re: [PATCH] Combine related fail of gcc.target/powerpc/ti_math1.c

2015-06-22 Thread Eric Botcazou

> This patch fixes
> FAIL: gcc.target/powerpc/ti_math1.c scan-assembler-times adde 1
> a failure caused by combine simplifying this i2src
> 
> (plus:DI (plus:DI (reg:DI 165 [ val+8 ])
> (reg:DI 169 [+8 ]))
> (reg:DI 76 ca))
> 
> to this
> 
> (plus:DI (plus:DI (reg:DI 76 ca)
> (reg:DI 165 [ val+8 ]))
> (reg:DI 169 [+8 ]))
> 
> which no longer matches rs6000.md adddi3_carry_in_internal.  See
> https://gcc.gnu.org/ml/gcc/2015-05/msg00206.html for related
> discussion.  Bootstrapped and regression tested powerpc64le-linux,
> powerpc64-linux and x86_64-linux.  OK to apply mainline?
> 
>   * rtlanal.c (commutative_operand_precedence): Correct comments.
>   * simplify-rtx.c (simplify_plus_minus_op_data_cmp): Delete forward
>   declaration.  Return an int.  Distinguish REG,REG return from
>   others.
>   (struct simplify_plus_minus_op_data): Make local to function.
>   (simplify_plus_minus): Rename canonicalized to not_canonical.
>   Don't set not_canonical if merely sorting registers.  Avoid
>   packing ops if nothing changes.  White space fixes.

OK in principle, but...

> Some notes: Renaming canonicalized to not_canonical better reflects
> its usage.  At the time the var is set, the expression hasn't been
> canonicalized.

I'm quite skeptical, in particular given:

+ /* Just swapping registers doesn't count as canonicalization.  */
+ if (cmp != 1)
+   not_canonical = 1;

and

+  /* If nothing changed, fail.  */
+  if (!not_canonical)
+return NULL_RTX;

Both are rather confusing now, so the renaming isn't really progress IMO.

-- 
Eric Botcazou

Re: [C++/58583] ICE instantiating NSDMIs

2015-06-22 Thread Andreas Schwab

Nathan Sidwell  writes:

> On 06/20/15 02:09, Andreas Schwab wrote:
>> This also fails on powerpc.
>
> what  is the build compiler?

It is a bootstrapped build, so the build compiler should not matter.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

[PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Tom de Vries


Hi,

during development of a patch I ran into a case where 
compute_dominance_frontiers was called with incorrect dominance info.


The result was a segmentation violation somewhere in the bitmap code 
while executing this bitmap_set_bit in compute_dominance_frontiers_1:

...
  if (!bitmap_set_bit (&frontiers[runner->index],
   b->index))
break;
...

The segmentation violation happens because runner->index is 0, and 
frontiers[0] is uninitialized.


[ The initialization in update_ssa looks like this:
...
 dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
  FOR_EACH_BB_FN (bb, cfun)
bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack);
  compute_dominance_frontiers (dfs);
...

FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0] 
(frontiers[0] in compute_dominance_frontiers_1) is not initialized.


We could add initialization by making the entry/exit-block bitmap_heads 
empty and setting the obstack to a reserved obstack bitmap_no_obstack 
for which allocation results in an assert. ]


AFAIU, the immediate problem is not that frontiers[0] is uninitialized, 
but that the loop reaches the state of runner->index == 0, due to the 
incorrect dominance info.


The patch adds an assert to the loop in compute_dominance_frontiers_1, 
to make the failure mode cleaner and easier to understand.
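
As an illustration of the failure mode, here is a minimal, self-contained
sketch of the runner walk in plain C.  It is not the GCC code: the CFG, the
preds[], good_idom[] and bad_idom[] arrays and the function name are made up
for the example, and it reports inconsistent dominance info by returning false
where the patch uses gcc_assert.

```c
#include <assert.h>
#include <stdbool.h>

#define N_BLOCKS 6   /* hypothetical CFG size */
#define ENTRY 0      /* block 0 plays the role of the entry block */

/* preds[b] lists the predecessors of block b, terminated by -1.  */
static const int preds[N_BLOCKS][3] = {
  { -1 },        /* 0: entry */
  { 0, -1 },     /* 1 */
  { 1, -1 },     /* 2 */
  { 1, -1 },     /* 3 */
  { 2, 3, -1 },  /* 4: join point */
  { 4, -1 },     /* 5 */
};

/* Immediate dominators: a consistent set, and one where idom[4] is wrong.  */
static const int good_idom[N_BLOCKS] = { -1, 0, 1, 1, 1, 4 };
static const int bad_idom[N_BLOCKS] = { -1, 0, 1, 1, 5, 4 };

/* frontiers[r][b] is true when block b is in the dominance frontier of r.  */
static bool frontiers[N_BLOCKS][N_BLOCKS];

/* Walk from each predecessor of a join point up the dominator tree until the
   join point's immediate dominator is reached.  Return false if the walk
   arrives at the entry block first, which can only happen when idom[] is
   inconsistent with the CFG.  */
static bool
compute_frontiers (const int *idom)
{
  assert (idom[ENTRY] == -1);   /* the entry block has no dominator */

  for (int r = 0; r < N_BLOCKS; r++)
    for (int b = 0; b < N_BLOCKS; b++)
      frontiers[r][b] = false;

  for (int b = 1; b < N_BLOCKS; b++)
    {
      /* Only blocks with two or more predecessors contribute.  */
      if (preds[b][0] < 0 || preds[b][1] < 0)
        continue;

      for (int i = 0; preds[b][i] >= 0; i++)
        {
          int runner = preds[b][i];
          while (runner != idom[b])
            {
              if (runner == ENTRY)
                return false;          /* walked past the entry block */
              if (frontiers[runner][b])
                break;                 /* already recorded, stop early */
              frontiers[runner][b] = true;
              runner = idom[runner];
            }
        }
    }
  return true;
}
```

With good_idom the call returns true and block 4 ends up in the frontiers of
blocks 2 and 3.  With bad_idom the walk from block 2 reaches block 1, then the
entry block, and the function returns false before frontiers[ENTRY] is touched.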


I think we wouldn't catch all errors in dominance info with this assert. 
So the patch also contains an ENABLE_CHECKING-enabled verify_dominators 
call at the start of compute_dominance_frontiers. I'm not sure if:

- adding the verify_dominators call is too costly at run time.
- the verify_dominators call should be inside or outside the
  TV_DOM_FRONTIERS measurement.
- there is a level of ENABLE_CHECKING that is more appropriate for the
  verify_dominators call.

Is this ok for trunk if bootstrap and reg-test on x86_64 succeed?

Thanks,
- Tom
Check dominator info in compute_dominance_frontiers

2015-06-22  Tom de Vries  

	* cfganal.c (compute_dominance_frontiers_1): Add assert.
	(compute_dominance_frontiers): Verify dominators if ENABLE_CHECKING.
---
 gcc/cfganal.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index b8d67bc..0e0e2bb 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -1261,6 +1261,11 @@ compute_dominance_frontiers_1 (bitmap_head *frontiers)
 	  domsb = get_immediate_dominator (CDI_DOMINATORS, b);
 	  while (runner != domsb)
 		{
+		  /* If you're running into this assert, the dominator info is
+		 incorrect.  Try enabling the verify_dominators call at the
+		 start of compute_dominance_frontiers.  */
+		  gcc_assert (runner != ENTRY_BLOCK_PTR_FOR_FN (cfun));
+
 		  if (!bitmap_set_bit (&frontiers[runner->index],
    b->index))
 		break;
@@ -1276,6 +1281,10 @@ compute_dominance_frontiers_1 (bitmap_head *frontiers)
 void
 compute_dominance_frontiers (bitmap_head *frontiers)
 {
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+
   timevar_push (TV_DOM_FRONTIERS);
 
   compute_dominance_frontiers_1 (frontiers);
-- 
1.9.1

[gomp4] dominance info after predicate_omp_regions

2015-06-22 Thread Tom de Vries


On 21/05/15 13:42, ber...@gcc.gnu.org wrote:

Author: bernds
Date: Thu May 21 11:42:14 2015
New Revision: 223478

URL: https://gcc.gnu.org/viewcvs?rev=223478&root=gcc&view=rev
Log:
* omp-low.c (struct omp_region): Add a gwv_this field.
(bb_region_map): New variable.
(find_omp_for_region_data, find_omp_target_region_data): New static
functions.
(build_omp_regions_1): Call them.  Build the bb_region_map.
(enclosing_target_region, requires_vector_predicate,
generate_vector_broadcast, predicate_bb, find_predicatable_bbs,
predicate_omp_regions): New static functions.
(execute_expand_omp): Allocate and free bb_region_map.

Modified:
 branches/gomp-4_0-branch/gcc/ChangeLog.gomp
 branches/gomp-4_0-branch/gcc/omp-low.c



Hi Bernd,

I ran into trouble with invalid dominance info, AFAIU because 
predicate_omp_regions invalidates the dominance info. For now I'm using 
this workaround:

...
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f7e13d3..5601cff 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11268,6 +11268,8 @@ execute_expand_omp (void)
}

   predicate_omp_regions (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  free_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_DOMINATORS);

   remove_exit_barriers (root_omp_region);

...

Thanks,
- Tom

[PATCH, i386]: Ignore the cost of embedded comparison (PR 65871)

2015-06-22 Thread Uros Bizjak

Hello!

As shown in the PR [1], RTX costs can reject the combination of an
operation and its embedded comparison.  The attached patch fixes this by
ignoring the cost of the embedded comparison.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65871#c10

2015-06-22  Uros Bizjak  

PR target/65871
* config/i386/i386.c (ix86_rtx_costs) : Ignore the
cost of embedded comparison.

Bootstrapped on x86_64-linux-gnu, regtest in progress.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 224718)
+++ config/i386/i386.c  (working copy)
@@ -42531,6 +42531,12 @@ ix86_rtx_costs (rtx x, int code_i, int outer_code_
+ rtx_cost (const1_rtx, outer_code, opno, speed));
  return true;
}
+
+  /* The embedded comparison operand is completely free.  */
+  if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
+ && XEXP (x, 1) == const0_rtx)
+   *total = 0;
+
   return false;
 
 case FLOAT_EXTEND:

[PATCH 1/3][ARM][PR target/65697] Strengthen memory barriers for __sync builtins

2015-06-22 Thread Matthew Wahab


This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

The problem is that the barriers generated for the __sync builtins for ARMv8
targets are too weak. This affects the full and the acquire barriers in the
__sync fetch-and-op, compare-and-swap functions and __sync_lock_test_and_set.

This patch series changes the code to strengthen the barriers by replacing
initial load-acquires with a simple load and adding a final memory barrier to
prevent code hoisting.

- Full barriers:  __sync_fetch_and_op, __sync_op_and_fetch
  __sync_*_compare_and_swap

  [load-acquire; code; store-release]
  becomes
  [load; code; store-release; barrier].

- Acquire barriers:  __sync_lock_test_and_set

  [load-acquire; code; store]
  becomes
  [load; code; store; barrier]

This patch changes the code generated for the __sync_fetch_and_op and
__sync_op_and_fetch builtins.
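
For readers unfamiliar with the builtins involved, the following is a small
sketch in C of the three families named above.  The variable and function
names are made up for the example, and it only demonstrates the value
semantics of the builtins on a single thread, not the barriers that the patch
series changes.

```c
#include <assert.h>

static int counter;     /* hypothetical shared counter */
static int lock_word;   /* hypothetical lock: 0 = free, 1 = held */

/* Fetch-and-op and op-and-fetch: both are full barriers.  */
static int
bump_twice (void)
{
  int old = __sync_fetch_and_add (&counter, 1);  /* value before the add */
  int now = __sync_add_and_fetch (&counter, 1);  /* value after the add */
  return now - old;   /* 2 when nothing else modifies counter */
}

/* Compare-and-swap: also a full barrier.  Returns nonzero on success.  */
static int
try_set (int expected, int desired)
{
  return __sync_bool_compare_and_swap (&counter, expected, desired);
}

/* Test-and-set is an acquire barrier; it returns the previous contents.  */
static int
take_lock (void)
{
  return __sync_lock_test_and_set (&lock_word, 1);
}

/* Lock release is a release barrier and writes 0 to the lock word.  */
static void
drop_lock (void)
{
  assert (lock_word != 0);   /* releasing a free lock is a caller bug */
  __sync_lock_release (&lock_word);
}
```

A caller would spin on take_lock () until it returns 0, do its work, and then
call drop_lock ().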

Tested as part of a series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

PR target/65697
* config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an
initial acquire barrier with a final full barrier.
From 3e9f71c04dba20ba66b5c9bae284fcac5fdd91ec Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 22 May 2015 13:31:58 +0100
Subject: [PATCH 1/3] [ARM] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I18f5af5ba4b2e74b5866009d3a090e251eff4a45
---
 gcc/config/arm/arm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e79a369..94118f4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27668,6 +27668,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   rtx_code_label *label;
   rtx x;
 
+  bool is_armv8_sync = arm_arch8 && is_mm_sync (model);
+
   bool use_acquire = TARGET_HAVE_LDACQ
  && !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_release (model));
@@ -27676,6 +27678,11 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
  && !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_acquire (model));
 
+  /* For ARMv8, a load-acquire is too weak for __sync memory orders.  Instead,
+ a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
 arm_pre_atomic_barrier (model);
@@ -27746,7 +27753,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label));
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+  || !(use_acquire || use_release))
 arm_post_atomic_barrier (model);
 }
 
-- 
1.9.1

[PATCH 2/3][ARM][PR target/65697] Strengthen barriers for compare-and-swap builtin.

2015-06-22 Thread Matthew Wahab


This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

This patch changes the code generated for __sync_type_compare_and_swap to remove
the acquire-barrier from the load and end the operation with a fence. This also
strengthens the acquire barrier generated for __sync_lock_test_and_set which,
like compare-and-swap, is implemented as a form of atomic exchange.

Tested as part of a series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

PR target/65697
* config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an
initial acquire barrier with a final full barrier.

From ddb9a45acda7bb64d91c446bc40afe4b78fcc1e1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 22 May 2015 13:36:39 +0100
Subject: [PATCH 2/3] [ARM] Strengthen barriers for compare-and-swap builtin.

Change-Id: I43381b2ea88492f807d85a73d233369334c99881
---
 gcc/config/arm/arm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 94118f4..4610ff6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27603,6 +27603,8 @@ arm_split_compare_and_swap (rtx operands[])
   scratch = operands[7];
   mode = GET_MODE (mem);
 
+  bool is_armv8_sync = arm_arch8 && is_mm_sync (mod_s);
+
   bool use_acquire = TARGET_HAVE_LDACQ
  && !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s)
 			  || is_mm_release (mod_s));
@@ -27611,6 +27613,11 @@ arm_split_compare_and_swap (rtx operands[])
  && !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s)
 			  || is_mm_acquire (mod_s));
 
+  /* For ARMv8, the load-acquire is too weak for __sync memory orders.  Instead,
+ a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
 arm_pre_atomic_barrier (mod_s);
@@ -27651,7 +27658,8 @@ arm_split_compare_and_swap (rtx operands[])
 emit_label (label2);
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+  || !(use_acquire || use_release))
 arm_post_atomic_barrier (mod_s);
 
   if (is_mm_relaxed (mod_f))
-- 
1.9.1

[PATCH 3/3][ARM][PR target/65697] Add tests for __sync builtins.

2015-06-22 Thread Matthew Wahab


This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

This patch adds tests for the code generated by the ARM backend for the __sync
builtins.

Tested the series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/testsuite
2015-06-22  Matthew Wahab  

PR target/65697
* gcc.target/arm/armv8-sync-comp-swap.c: New.
* gcc.target/arm/armv8-sync-op-acquire.c: New.
* gcc.target/arm/armv8-sync-op-full.c: New.
* gcc.target/arm/armv8-sync-op-release.c: New.

From 8157c7480a9d6d559013d02e24519d1b7ba1ed5b Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 3 Jun 2015 16:27:55 +0100
Subject: [PATCH 3/3] [ARM] Add test cases.

Change-Id: I0f2257ce5b5e7f9d0f75e57e6be22fd9733ed3ca
---
 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c  | 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c |  8 
 4 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
new file mode 100644
index 000..f96c81a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 2 } } */
+/* { dg-final { scan-assembler-times "stlex" 2 } } */
+/* { dg-final { scan-assembler-times "dmb" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
new file mode 100644
index 000..8d6659b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 1 } } */
+/* { dg-final { scan-assembler-times "stlex" 1 } } */
+/* { dg-final { scan-assembler-times "dmb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
new file mode 100644
index 000..a5ad3bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 12 } } */
+/* { dg-final { scan-assembler-times "stlex" 12 } } */
+/* { dg-final { scan-assembler-times "dmb" 12 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
new file mode 100644
index 000..0d3be7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-release.x"
+
+/* { dg-final { scan-assembler-times "stl" 1 } } */
-- 
1.9.1

[gomp4, committed] Handle reduction in oacc kernels region

2015-06-22 Thread Tom de Vries


Hi,

the attached patch handles reductions in an oacc kernels region.

The approach uses the normal parloops reduction handling code, with 
these modifications:


1.

For each reduction, we look for this pattern in the oacc-lowered code, 
and store 'addr' in the corresponding struct reduction_info:

...
 
   .omp_data_i = &.omp_data_arr;
   addr = .omp_data_i->sum;
   sum_a = *addr;

 :
   sum_b = PHI 
...

2.

We replace the non-atomic store to 'addr' at the end of the kernels 
region with an atomic one.
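
The effect of the second modification can be sketched in plain C as follows.
This is only an illustration under assumptions: the function names, the sample
data, the slicing scheme and the relaxed memory order are made up, and the
"workers" run one after another here instead of in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input data for the example.  */
static const unsigned int data[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* One worker: reduce a private slice, then publish the partial result with
   a single atomic update of *addr instead of a plain store.  */
static void
reduce_slice (const unsigned int *a, size_t begin, size_t end,
              unsigned int *addr)
{
  assert (begin <= end);

  unsigned int partial = 0;
  for (size_t i = begin; i < end; i++)
    partial += a[i];

  __atomic_fetch_add (addr, partial, __ATOMIC_RELAXED);
}

/* Split [0, n) into n_workers slices and accumulate into one variable.  */
static unsigned int
reduce_all (const unsigned int *a, size_t n, size_t n_workers)
{
  assert (n_workers > 0);

  unsigned int sum = 1;   /* initial value of the reduction variable */
  for (size_t w = 0; w < n_workers; w++)
    reduce_slice (a, w * n / n_workers, (w + 1) * n / n_workers, &sum);
  return sum;
}
```

Because every worker only adds its partial sum atomically, the final value
does not depend on how the iterations are split or on the order in which the
workers finish.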



Bootstrapped and reg-tested on x86_64 on top of gomp-4_0-branch.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Handle reduction in oacc kernels region

2015-06-18  Tom de Vries  

	* tree-parloops.c (struct reduction_info): Add reduc_addr field.
	(create_call_for_reduction_1): Handle case that reduc_addr is non-NULL.
	(gen_parallel_loop): Init clsn_data for oacc_kernels_p case.
	(try_create_reduction_list): Add and handle oacc_kernels_p parameter.
	(parallelize_loops): Add argument to call to try_create_reduction_list.

	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c: New test.

	* c-c++-common/goacc/kernels-reduction.c: New test.
---
 .../c-c++-common/goacc/kernels-reduction.c | 38 +
 gcc/tree-parloops.c| 92 --
 .../libgomp.oacc-c-c++-common/kernels-reduction.c  | 37 +
 3 files changed, 162 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
new file mode 100644
index 000..bfbcdbd
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -0,0 +1,38 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include 
+
+#define n 1
+
+unsigned int a[n];
+
+void  __attribute__((noinline,noclone))
+foo (void)
+{
+  int i;
+  unsigned int sum = 1;
+
+#pragma acc kernels copyin (a[0:n]) copy (sum)
+  {
+for (i = 0; i < n; ++i)
+  sum += a[i];
+  }
+
+  if (sum != 5001)
+abort ();
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 1 "parloops_oacc_kernels" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 0661b78..c5f4d9a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -218,6 +218,8 @@ struct reduction_info
    of the reduction variable when existing the loop. */
   tree initial_value;		/* The initial value of the reduction var before entering the loop.  */
   tree field;			/*  the name of the field in the parloop data structure intended for reduction.  */
+  tree reduc_addr;		/* The address of the reduction variable for
+   openacc reductions.  */
   tree init;			/* reduction initialization value.  */
   gphi *new_phi;		/* (helper field) Newly created phi node whose result
    will be passed to the atomic operation.  Represents
@@ -1107,10 +1109,30 @@ create_call_for_reduction_1 (reduction_info **slot, struct clsn_data *clsn_data)
   tree tmp_load, name;
   gimple load;
 
-  load_struct = build_simple_mem_ref (clsn_data->load);
-  t = build3 (COMPONENT_REF, type, load_struct, reduc->field, NULL_TREE);
+  if (reduc->reduc_addr == NULL_TREE)
+{
+  load_struct = build_simple_mem_ref (clsn_data->load);
+  t = build3 (COMPONENT_REF, type, load_struct, reduc->field, NULL_TREE);
+
+  addr = build_addr (t, current_function_decl);
+}
+  else
+{
+  /* Set the address for the atomic store.  */
+  addr = reduc->reduc_addr;
+
+  /* Remove the non-atomic store '*addr = sum'.  */
+  tree res = PHI_RESULT (reduc->keep_res);
+  use_operand_p use_p;
+  gimple stmt;
+  bool single_use_p = single_imm_use (res, &use_p, &stmt);
+  gcc_assert (single_use_p);
+  replace_uses_by (gimple_vdef (stmt),
+		   gimple_vuse (stmt));
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gsi_remove (&gsi, true);
+}
 
-  addr = build_addr (t, current_function_decl);
 
   /* Create phi node.  */
   bb = clsn_data->load_bb;
@@ -2441,6 +2463,10 @@ gen_parallel_loop (struct loop *loop,
 {

Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Richard Biener

On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries  wrote:
> Hi,
>
> during development of a patch I ran into a case where
> compute_dominance_frontiers was called with incorrect dominance info.
>
> The result was a segmentation violation somewhere in the bitmap code while
> executing this bitmap_set_bit in compute_dominance_frontiers_1:
> ...
>   if (!bitmap_set_bit (&frontiers[runner->index],
>b->index))
> break;
> ...
>
> The segmentation violation happens because runner->index is 0, and
> frontiers[0] is uninitialized.
>
> [ The initialization in update_ssa looks like this:
> ...
>  dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
>   FOR_EACH_BB_FN (bb, cfun)
> bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack);
>   compute_dominance_frontiers (dfs);
> ...
>
> FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0]
> (frontiers[0] in compute_dominance_frontiers_1) is not initialized.
>
> We could add initialization by making the entry/exit-block bitmap_heads
> empty and setting the obstack to a reserved obstack bitmap_no_obstack for
> which allocation results in an assert. ]
>
> AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but
> that the loop reaches the state of runner->index == 0, due to the incorrect
> dominance info.
>
> The patch adds an assert to the loop in compute_dominance_frontiers_1, to
> make the failure mode cleaner and easier to understand.
>
> I think we wouldn't catch all errors in dominance info with this assert. So
> the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at
> the start of compute_dominance_frontiers. I'm not sure if:
> - adding the verify_dominators call is too costly in runtime.
> - the verify_dominators call should be inside or outside the
>   TV_DOM_FRONTIERS measurement.
> - there is a level of ENABLE_CHECKING that is more appropriate for the
>   verify_dominators call.
>
> Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?

I don't think these kinds of asserts are good.  A segfault is good by itself
(so you can just add the comment if you like).

Likewise the verify_dominators call is too expensive and misplaced.

If anything, the call belongs in the dom_computed[] == DOM_OK early-out
in calculate_dominance_info (eventually also for the case where we
end up only computing the fast-query stuff).

Richard.

> Thanks,
> - Tom

[PATCH] Avoid computing scalar iteration cost multiple times

2015-06-22 Thread Richard Biener


This avoids doing $subject in the vectorizer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-06-22  Richard Biener  

* tree-vectorizer.h (_loop_vec_info): Add scalar_cost_vec
and single_scalar_iteration_cost members.
(LOOP_VINFO_SCALAR_ITERATION_COST): New.
(LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST): Likewise.
(vect_get_single_scalar_iteration_cost): Remove.
* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
Use LOOP_VINFO_SCALAR_ITERATION_COST.
* tree-vect-loop.c (destroy_loop_vec_info): Free
scalar_cost_vec.
(vect_get_single_scalar_iteration_cost): Compute result into
LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST and
LOOP_VINFO_SCALAR_ITERATION_COST.  Make static.
(vect_analyze_loop_2): Call vect_get_single_scalar_iteration_cost.
(vect_estimate_min_profitable_iters): Use them.
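
The shape of the change, computing the cost once during analysis and letting
later consumers read the stored value, can be sketched in C as follows.  All
names below are hypothetical stand-ins, not the vectorizer's data structures,
and the cost model (one unit per statement) is made up for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-loop info that caches the cost.  */
struct loop_info
{
  int n_stmts;                /* statements in the loop body */
  int single_iteration_cost;  /* cached result, filled in once */
};

static int cost_computations;   /* counts how often the walk runs */

/* Analysis phase: walk the loop body once and store the result.  */
static void
compute_single_iteration_cost (struct loop_info *li)
{
  cost_computations++;
  li->single_iteration_cost = li->n_stmts;   /* one unit per statement */
}

/* First consumer: cost of peeling NPEEL scalar iterations.  */
static int
peeling_cost (const struct loop_info *li, int npeel)
{
  return npeel * li->single_iteration_cost;
}

/* Second consumer: smallest iteration count whose scalar cost exceeds
   VECTOR_COST.  */
static int
min_profitable_iters (const struct loop_info *li, int vector_cost)
{
  assert (li->single_iteration_cost > 0);   /* must be computed first */
  return vector_cost / li->single_iteration_cost + 1;
}
```

Both consumers read the cached field, so the body walk runs once per loop no
matter how many candidates they evaluate.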

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 224603)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -1165,11 +1165,10 @@ vect_peeling_hash_get_lowest_cost (_vect
   SET_DR_MISALIGNMENT (dr, save_misalignment);
 }
 
-  auto_vec scalar_cost_vec;
-  vect_get_single_scalar_iteration_cost (loop_vinfo, &scalar_cost_vec);
   outside_cost += vect_get_known_peeling_cost
 (loop_vinfo, elem->npeel, &dummy,
- &scalar_cost_vec, &prologue_cost_vec, &epilogue_cost_vec);
+ &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+ &prologue_cost_vec, &epilogue_cost_vec);
 
   /* Prologue and epilogue costs are added to the target model later.
  These costs depend only on the scalar iteration cost, the
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 224603)
+++ gcc/tree-vect-loop.c(working copy)
@@ -1095,12 +1095,82 @@ destroy_loop_vec_info (loop_vec_info loo
   LOOP_VINFO_PEELING_HTAB (loop_vinfo) = NULL;
 
   destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
+  loop_vinfo->scalar_cost_vec.release ();
 
   free (loop_vinfo);
   loop->aux = NULL;
 }
 
 
+/* Calculate the cost of one scalar iteration of the loop.  */
+static void
+vect_get_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
+{
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+  int nbbs = loop->num_nodes, factor, scalar_single_iter_cost = 0;
+  int innerloop_iters, i;
+
+  /* Count statements in scalar loop.  Using this as scalar cost for a single
+ iteration for now.
+
+ TODO: Add outer loop support.
+
+ TODO: Consider assigning different costs to different scalar
+ statements.  */
+
+  /* FORNOW.  */
+  innerloop_iters = 1;
+  if (loop->inner)
+innerloop_iters = 50; /* FIXME */
+
+  for (i = 0; i < nbbs; i++)
+{
+  gimple_stmt_iterator si;
+  basic_block bb = bbs[i];
+
+  if (bb->loop_father == loop->inner)
+factor = innerloop_iters;
+  else
+factor = 1;
+
+  for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+{
+  gimple stmt = gsi_stmt (si);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+
+  if (!is_gimple_assign (stmt) && !is_gimple_call (stmt))
+continue;
+
+  /* Skip stmts that are not vectorized inside the loop.  */
+  if (stmt_info
+  && !STMT_VINFO_RELEVANT_P (stmt_info)
+  && (!STMT_VINFO_LIVE_P (stmt_info)
+  || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)))
+ && !STMT_VINFO_IN_PATTERN_P (stmt_info))
+continue;
+
+ vect_cost_for_stmt kind;
+  if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)))
+{
+  if (DR_IS_READ (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))))
+   kind = scalar_load;
+ else
+   kind = scalar_store;
+}
+  else
+kind = scalar_stmt;
+
+ scalar_single_iter_cost
+   += record_stmt_cost (&LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+factor, kind, NULL, 0, vect_prologue);
+}
+}
+  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo)
+= scalar_single_iter_cost;
+}
+
+
 /* Function vect_analyze_loop_1.
 
Apply a set of analyses on LOOP, and create a loop_vec_info struct
@@ -1834,6 +1904,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
   return false;
 }
 
+  /* Compute the scalar iteration cost.  */
+  vect_get_single_scalar_iteration_cost (loop_vinfo);
+
   /* This pass will decide on using loop versioning and/or loop peeling in
  order to enhance the alignment of data references in the loop.  */
 
@@ -2706,74 +2779,6 @@ vect_force_simple_reduction (loop_vec_in
 double_reduc, true);
 }
 
-/* Calculate the

Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Tom de Vries


On 22/06/15 12:14, Richard Biener wrote:

On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries  wrote:

Hi,

during development of a patch I ran into a case where
compute_dominance_frontiers was called with incorrect dominance info.

The result was a segmentation violation somewhere in the bitmap code while
executing this bitmap_set_bit in compute_dominance_frontiers_1:
...
   if (!bitmap_set_bit (&frontiers[runner->index],
b->index))
 break;
...

The segmentation violation happens because runner->index is 0, and
frontiers[0] is uninitialized.

[ The initialization in update_ssa looks like this:
...
  dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
   FOR_EACH_BB_FN (bb, cfun)
 bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack);
   compute_dominance_frontiers (dfs);
...

FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0]
(frontiers[0] in compute_dominance_frontiers_1) is not initialized.

We could add initialization by making the entry/exit-block bitmap_heads
empty and setting the obstack to a reserved obstack bitmap_no_obstack for
which allocation results in an assert. ]

AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but
that the loop reaches the state of runner->index == 0, due to the incorrect
dominance info.

The patch adds an assert to the loop in compute_dominance_frontiers_1, to
make the failure mode cleaner and easier to understand.
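
[ To make the failure mode concrete, here is a minimal standalone sketch of
the frontier walk with such a guard. It is not the patch: toy_cfg,
toy_dominance_frontiers and the array layout are hypothetical stand-ins for
GCC's basic blocks and bitmaps, and the guard returns false where the patch
would assert. ]

```c
#include <assert.h>
#include <stdbool.h>

enum { ENTRY = 0, MAX_BLOCKS = 8 };

/* Toy CFG (hypothetical, unrelated to GCC's real data structures):
   preds[b] lists the predecessors of block b, and idom[b] is the
   immediate dominator of b, with idom[ENTRY] == ENTRY.  */
struct toy_cfg
{
  int n_blocks;
  int n_preds[MAX_BLOCKS];
  int preds[MAX_BLOCKS][MAX_BLOCKS];
  int idom[MAX_BLOCKS];
};

/* Set frontier[r][b] when b is in the dominance frontier of r.
   Returns false if a walk up the dominator tree reaches ENTRY before
   reaching idom[b]; with correct dominance info that cannot happen.  */
static bool
toy_dominance_frontiers (const struct toy_cfg *cfg,
                         bool frontier[MAX_BLOCKS][MAX_BLOCKS])
{
  for (int r = 0; r < cfg->n_blocks; r++)
    for (int b = 0; b < cfg->n_blocks; b++)
      frontier[r][b] = false;

  for (int b = 0; b < cfg->n_blocks; b++)
    {
      if (cfg->n_preds[b] < 2)
        continue;               /* Only join points contribute.  */

      for (int i = 0; i < cfg->n_preds[b]; i++)
        {
          int runner = cfg->preds[b][i];
          if (runner == ENTRY)
            continue;           /* Edges from the entry block are skipped.  */

          while (runner != cfg->idom[b])
            {
              if (runner == ENTRY)
                return false;   /* Inconsistent dominance info.  */
              frontier[runner][b] = true;
              runner = cfg->idom[runner];
            }
        }
    }
  return true;
}
```

[ On a diamond-shaped CFG with correct idom[] the walk stops at the immediate
dominator of the join block; with a wrong idom[] entry for the join block it
runs up to ENTRY and the guard reports the inconsistency instead of touching
the slot for block 0. ]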

I think we wouldn't catch all errors in dominance info with this assert. So
the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at
the start of compute_dominance_frontiers. I'm not sure if:
- adding the verify_dominators call is too costly in runtime.
- the verify_dominators call should be inside or outside the
   TV_DOM_FRONTIERS measurement.
- there is a level of ENABLE_CHECKING that is more appropriate for the
   verify_dominators call.

Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?


I don't think these kinds of asserts are good.  A segfault is good by itself
(so you can just add the comment if you like).



The segfault is not guaranteed to trigger, because the code works on
uninitialized data. Instead, we may end up modifying valid memory and
silently generating wrong code, or causing sigsegvs that are difficult
to trace back to this error. So I don't think doing nothing is an
option here. If we're not going to add this assert, we should
initialize the uninitialized data in such a way that we are guaranteed
to detect the error. The scheme I proposed above would take care of
that. Should I implement that instead?



Likewise the verify_dominators call is too expensive and misplaced.

If anything, the call belongs in the dom_computed[] == DOM_OK early-out
in calculate_dominance_info


OK, like this:
...
diff --git a/gcc/dominance.c b/gcc/dominance.c
index a9e042e..1827eda9 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir)
   bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false;

   if (dom_computed[dir_index] == DOM_OK)
-return;
+{
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+  return;
+}

   timevar_push (TV_DOMINANCE);
   if (!dom_info_available_p (dir))
...

I didn't fully understand your comment, do you want me to test this?

Thanks,
- Tom


(eventually also for the case where we
end up only computing the fast-query stuff).

Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Richard Biener

On Mon, Jun 22, 2015 at 1:33 PM, Tom de Vries  wrote:
> On 22/06/15 12:14, Richard Biener wrote:
>>
>> On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries 
>> wrote:
>>>
>>> Hi,
>>>
>>> during development of a patch I ran into a case where
>>> compute_dominance_frontiers was called with incorrect dominance info.
>>>
>>> The result was a segmentation violation somewhere in the bitmap code
>>> while
>>> executing this bitmap_set_bit in compute_dominance_frontiers_1:
>>> ...
>>>if (!bitmap_set_bit (&frontiers[runner->index],
>>> b->index))
>>>  break;
>>> ...
>>>
>>> The segmentation violation happens because runner->index is 0, and
>>> frontiers[0] is uninitialized.
>>>
>>> [ The initialization in update_ssa looks like this:
>>> ...
>>>   dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
>>>FOR_EACH_BB_FN (bb, cfun)
>>>  bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack);
>>>compute_dominance_frontiers (dfs);
>>> ...
>>>
>>> FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0]
>>> (frontiers[0] in compute_dominance_frontiers_1) is not initialized.
>>>
>>> We could add initialization by making the entry/exit-block bitmap_heads
>>> empty and setting the obstack to a reserved obstack bitmap_no_obstack for
>>> which allocation results in an assert. ]
>>>
>>> AFAIU, the immediate problem is not that frontiers[0] is uninitialized,
>>> but
>>> that the loop reaches the state of runner->index == 0, due to the
>>> incorrect
>>> dominance info.
>>>
>>> The patch adds an assert to the loop in compute_dominance_frontiers_1, to
>>> make the failure mode cleaner and easier to understand.
>>>
>>> I think we wouldn't catch all errors in dominance info with this assert.
>>> So
>>> the patch also contains an ENABLE_CHECKING-enabled verify_dominators call
>>> at
>>> the start of compute_dominance_frontiers. I'm not sure if:
>>> - adding the verify_dominators call is too costly in runtime.
>>> - the verify_dominators call should be inside or outside the
>>>TV_DOM_FRONTIERS measurement.
>>> - there is a level of ENABLE_CHECKING that is more appropriate for the
>>>verify_dominators call.
>>>
>>> Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?
>>
>>
>> I don't think these kinds of asserts are good.  A segfault is good by
>> itself
>> (so you can just add the comment if you like).
>>
>
> The segfault is not guaranteed to trigger, because the code works on
> uninitialized data. Instead, we may end up modifying valid memory and
> silently generating wrong code, or causing sigsegvs that are difficult to
> trace back to this error. So I don't think doing nothing is an option here.
> If we're not going to add this assert, we should initialize the
> uninitialized data in such a way that we are guaranteed to detect the error.
> The scheme I proposed above would take care of that. Should I implement
> that instead?

No, instead the check below should catch the error much earlier.

>> Likewise the verify_dominators call is too expensive and misplaced.
>>
>> If anything, the call belongs in the dom_computed[] == DOM_OK early-out
>> in calculate_dominance_info
>
>
> OK, like this:
> ...
> diff --git a/gcc/dominance.c b/gcc/dominance.c
> index a9e042e..1827eda9 100644
> --- a/gcc/dominance.c
> +++ b/gcc/dominance.c
> @@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir)
>bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false;
>
>if (dom_computed[dir_index] == DOM_OK)
> -return;
> +{
> +#if ENABLE_CHECKING
> +  verify_dominators (CDI_DOMINATORS);
> +#endif
> +  return;
> +}
>
>timevar_push (TV_DOMINANCE);
>if (!dom_info_available_p (dir))
> ...

Yes.

> I didn't fully understand your comment, do you want me to test this?

Sure, it should catch the error.
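
The idea of verifying in the early-out can be sketched in isolation. This is
a toy under stated assumptions, not GCC code: TOY_CHECKING stands in for
ENABLE_CHECKING, toy_info for the cached dominance info, and a counter takes
the place of the abort a real verifier would raise.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef TOY_CHECKING
#define TOY_CHECKING 1          /* Stand-in for a checking-enabled build.  */
#endif

enum { TOY_N = 4 };

struct toy_info
{
  bool computed;                /* Plays the role of "already DOM_OK".  */
  int input[TOY_N];             /* Data the cache is derived from.  */
  int derived[TOY_N];           /* Cached result (running sums here).  */
  int stale_hits;               /* Times the early-out saw stale data.  */
};

/* Derive running sums of INPUT into OUT.  */
static void
toy_recompute (const int *input, int *out)
{
  int sum = 0;
  for (int i = 0; i < TOY_N; i++)
    {
      sum += input[i];
      out[i] = sum;
    }
}

/* Return true if the cached result still matches a fresh computation.  */
static bool
toy_verify (const struct toy_info *info)
{
  int fresh[TOY_N];
  toy_recompute (info->input, fresh);
  for (int i = 0; i < TOY_N; i++)
    if (fresh[i] != info->derived[i])
      return false;
  return true;
}

/* Compute the cached result unless it is already marked valid.  In a
   checking build the early-out re-verifies the cache, so a caller that
   changed the input without invalidating the cache is caught here.  */
static void
toy_calculate (struct toy_info *info)
{
  if (info->computed)
    {
#if TOY_CHECKING
      if (!toy_verify (info))
        info->stale_hits++;     /* A real checker would abort here.  */
#endif
      return;
    }
  toy_recompute (info->input, info->derived);
  info->computed = true;
}
```

The point of placing the check there is that every consumer which asks for
the info and gets the early-out also gets the verification, rather than one
particular consumer paying for it.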

Richard.

> Thanks,
> - Tom
>
>
>> (eventually also for the case where we
>> end up only computing the fast-query stuff).
>
>

[Patch, fortran] PR52846 - [F2008] Support submodules

2015-06-22 Thread Paul Richard Thomas

Dear All,

This patch enables submodule support in gfortran. Submodules are a
feature of F2008 but are fully described in ISO/IEC TR 19767:2004(E).

The patch has one significant non-conformance (that I know about,
anyway!); whilst private derived type components are correctly dealt
with, symbols whose access is private within the parent module are
not. They should effectively be host associated in descendant
submodules. At present gfortran handles private access at the module
write stage. This means that when a submodule reads the module file,
there is no information present about symbols whose access was
private. Since this modification might cause significant fall-out to
existing code, I propose to submit a separate patch later on to sort
out the non-conformance. However, as required, private and public
statements are not allowed in submodules.

The patch makes the maximum possible use of existing code to handle
modules. Once the submodule is matched, the ancestor module and
submodules are first "used"; then all the symbols are set host
associated and private derived type components are set public.

Most of the work involved matching module procedures, with both the
traditional form of declaration and the abbreviated one. I have chosen
to treat MODULE as a prefix like PURE or ELEMENTAL. This is logical
both because of the form of the declaration and because the
identification of module procedures is most easily done with an
attribute bit. With traditional procedure declarations, the procedure,
result and dummy characteristics are compared with those of the
interface declaration. The comparison of the dummy characteristics is
a bit cobbled together and might be better done by copying the
formal_namespace and its contents to the new symbol and retaining the
old for the interface symbol. This patch leaves the old dummy symbols
in the formal namespace and the new ones in the formal arglist. I have
checked that cleanup occurs for all objects.

Note the comment in submodule_1.f90 about the possibility of
undetected recursion between procedures in different submodules. I am
not at all sure that I know how to deal with this and am open to
suggestions.

In addition, it should be noted that collisions between the names of
entities and procedures, other than module procedures, are detected by
the linker at present.

Apart from this, all is very straightforward and follows the ChangeLogs.

Thanks to Damian Rouson, Salvatore Filippone and Tobias Burnus for
testing an early version of the patch.

Bootstrapped and regtested on FC21/x86_64 - OK for trunk?

Cheers

Paul

2015-06-22  Paul Thomas  

PR fortran/52846
* decl.c (get_proc_name): Make a partially populated interface
symbol to carry the characteristics of a module procedure and
its result.
(match_attr_spec): Submodule variables have implicit save
attribute for F2008 onwards.
(gfc_match_prefix): Add 'module' as a prefix and set the
module_procedure attribute.
(gfc_match_formal_arglist): For a module procedure keep the
interface formal_arglist from the interface, match the new
formal arguments and then compare the number and names of each.
(gfc_match_procedure): Add case COMP_SUBMODULE.
(gfc_match_function_decl, gfc_match_subroutine_decl): Set the
module_procedure attribute.
(gfc_match_entry, gfc_match_end):  Add case COMP_SUBMODULE.
(gfc_match_submod_proc): New function to match the abbreviated
style of submodule declaration.
* gfortran.h : Add ST_SUBMODULE and ST_END_SUBMODULE. Add the
attribute bits 'used_in_submodule' and 'module_procedure'. Add
prototypes for the functions 'gfc_check_dummy_characteristics'
and 'gfc_check_result_characteristics'.
* interface.c : Add the prefix 'gfc_' to the names of functions
'check_dummy(result)_characteristics' and all their references.
* match.h : Add prototype for 'gfc_match_submod_proc' and
'gfc_match_submodule'.
* module.c (gfc_match_submodule): New function. Add handling
for the 'module_procedure' attribute bit.
* parse.c (decode_statement): Handle a match occurring in
'gfc_match_submod_proc' and a match for 'submodule'.
(gfc_enclosing_unit): Include the state COMP_SUBMODULE.
(gfc_ascii_statement): Add END SUBMODULE.
(accept_statement): Add ST_SUBMODULE.
(parse_spec): Disallow statement functions in a submodule
specification part.
(parse_contained): Add ST_END_SUBMODULE and COMP_SUBMODULE
twice each.
(set_syms_host_assoc): Make symbols from the ancestor module
and submodules use associated, as required by the standard and
set all private components public. Module procedures 'external'
attribute bit is reset and the 'used_in_submodule' bit is set.
(parse_module): If this is a submodule, use the ancestor module
and submodules. Traverse the namespace, calling
'set_syms_host_assoc'. Add ST_END_SUBMODULE and COMP_SUBMODULE.
* parse.h : Add CO

[i386, PATCH] Support new psABI for IA MCU.

2015-06-22 Thread Kirill Yukhin

Hello,
I am starting a (hopefully small) series of patches to support
the new ABI dedicated to Intel's MicroController Units [1].

Support for the new arch was introduced into Binutils in a few threads, e.g. [2].

This patchset includes:
 - Support in GCC: new switch (-miamcu), macro etc.
 - Changes to libraries.
 - Testsuite.
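
As a rough illustration of the macro side of this, user code could test for
the new ABI at compile time. The macro names __iamcu__ and __iamcu are the
ones defined in the i386-c.c hunk of the patch; the helper function
built_for_iamcu is hypothetical and only there for the example.

```c
#include <assert.h>

/* Return 1 when this translation unit was compiled with the IA MCU
   psABI selected (i.e. the compiler predefined __iamcu__ or __iamcu),
   and 0 otherwise.  built_for_iamcu is a made-up name.  */
static int
built_for_iamcu (void)
{
#if defined (__iamcu__) || defined (__iamcu)
  return 1;
#else
  return 0;
#endif
}
```

On an ordinary host build neither macro is defined, so the function returns 0.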

The whole patch is at the bottom.

[1] - https://groups.google.com/forum/#!topic/ia32-abi/cn7TM6J_TIg
[2] - https://sourceware.org/ml/binutils/2015-05/msg00063.html

--
Thanks, K

diff --git a/config/dfp.m4 b/config/dfp.m4
index 48683f0..5b29089 100644
--- a/config/dfp.m4
+++ b/config/dfp.m4
@@ -21,7 +21,7 @@ Valid choices are 'yes', 'bid', 'dpd', and 'no'.]) ;;
 [
   case $1 in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes
diff --git a/configure b/configure
index bced9de..82e45f3 100755
--- a/configure
+++ b/configure
@@ -6914,7 +6914,7 @@ case "${enable_target_optspace}:${target}" in
   :d30v-*)
 ospace_frag="config/mt-d30v"
 ;;
-  :m32r-* | :d10v-* | :fr30-*)
+  :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu)
 ospace_frag="config/mt-ospace"
 ;;
   no:* | :*)
diff --git a/configure.ac b/configure.ac
index 7c06e6b..dc77a1b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2560,7 +2560,7 @@ case "${enable_target_optspace}:${target}" in
   :d30v-*)
 ospace_frag="config/mt-d30v"
 ;;
-  :m32r-* | :d10v-* | :fr30-*)
+  :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu)
 ospace_frag="config/mt-ospace"
 ;;
   no:* | :*)
diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index 0f8c3e1..79b2472 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -223,7 +223,7 @@ along with GCC; see the file COPYING3.  If not see
 
 bool
 ix86_handle_option (struct gcc_options *opts,
-   struct gcc_options *opts_set ATTRIBUTE_UNUSED,
+   struct gcc_options *opts_set,
const struct cl_decoded_option *decoded,
location_t loc)
 {
@@ -232,6 +232,20 @@ ix86_handle_option (struct gcc_options *opts,
 
   switch (code)
 {
+case OPT_miamcu:
+  if (value)
+   {
+ /* Turn off x87/MMX/SSE/AVX codegen for -miamcu.  */
+ opts->x_target_flags &= ~MASK_80387;
+ opts_set->x_target_flags |= MASK_80387;
+ opts->x_ix86_isa_flags &= ~(OPTION_MASK_ISA_MMX_UNSET
+ | OPTION_MASK_ISA_SSE_UNSET);
+ opts->x_ix86_isa_flags_explicit |= (OPTION_MASK_ISA_MMX_UNSET
+ | OPTION_MASK_ISA_SSE_UNSET);
+
+   }
+  return true;
+
 case OPT_mmmx:
   if (value)
{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 805638d..2b3af82 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1389,6 +1389,9 @@ x86_64-*-darwin*)
tmake_file="${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc"
tm_file="${tm_file} ${cpu_type}/darwin64.h"
;;
+i[34567]86-*-elfiamcu)
+   tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h 
newlib-stdint.h i386/iamcu.h"
+   ;;
 i[34567]86-*-elf*)
tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h 
newlib-stdint.h i386/i386elf.h"
;;
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 0228f4b..66f7e37 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -426,6 +426,11 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, "__CLWB__");
   if (isa_flag & OPTION_MASK_ISA_MWAITX)
 def_or_undef (parse_in, "__MWAITX__");
+  if (TARGET_IAMCU)
+{
+  def_or_undef (parse_in, "__iamcu");
+  def_or_undef (parse_in, "__iamcu__");
+}
 }
 
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 24fccfc..26ffa67 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3433,6 +3433,10 @@ ix86_option_override_internal (bool main_args_p,
  || TARGET_16BIT_P (opts->x_ix86_isa_flags))
opts->x_ix86_isa_flags &= ~OPTION_MASK_ABI_X32;
 #endif
+  if (TARGET_64BIT_P (opts->x_ix86_isa_flags)
+ && TARGET_IAMCU_P (opts->x_target_flags))
+   sorry ("Intel MCU psABI isn%'t supported in %s mode",
+  TARGET_X32_P (opts->x_ix86_isa_flags) ? "x32" : "64-bit");
 }
 #endif
 
@@ -3817,6 +3821,20 @@ ix86_option_override_internal (bool main_args_p,
   if (TARGET_X32 && (ix86_isa_flags & OPTION_MASK_ISA_MPX))
 error ("Intel MPX does not support x32");
 
+  if (TARGET_IAMCU_P (opts->x_target_flags))
+{
+  /* Verify that x87/MMX/SSE/AVX is off for -miamcu.  */
+  if (TARGET_80387_P (opts->x_target_flags))
+   sorry ("X87 FPU isn%'t supported in Intel MCU psABI");
+  else if ((opts->x_ix86_isa_flags & (

[C++ Patch] Remove pointless code in grokdeclarator

2015-06-22 Thread Paolo Carlini


Hi,

I think this qualifies as obvious: we reset type_quals to
TYPE_UNQUALIFIED and then we only use it in an 'if (type_quals !=
TYPE_UNQUALIFIED)' test before returning.
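
Reduced to a standalone toy, the pattern looks roughly like this. The
functions and the TOY_* constants are hypothetical, and the diagnostic is
modelled as a counter; only the shape of the control flow mirrors the hunk.

```c
#include <assert.h>

enum { TOY_UNQUALIFIED = 0, TOY_CONST = 1 };

/* Before: the qualifiers are reset first, so the later test on them can
   never be true and the diagnostic can never be issued.  Returns the
   number of diagnostics that would have been emitted.  */
static int
toy_before (int type_quals, int friendp)
{
  int errors = 0;

  if (type_quals != TOY_UNQUALIFIED)
    type_quals = TOY_UNQUALIFIED;

  if (friendp)
    {
      if (type_quals != TOY_UNQUALIFIED)    /* Never true.  */
        {
          errors++;
          type_quals = TOY_UNQUALIFIED;
        }
    }
  return errors;
}

/* After: with the reset and the dead test removed, the observable
   result is the same for every input.  */
static int
toy_after (int type_quals, int friendp)
{
  (void) type_quals;
  (void) friendp;
  return 0;
}
```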


Thanks,
Paolo.

///
2015-06-22  Paolo Carlini  

* decl.c (grokdeclarator): Remove pointless code.
Index: cp/decl.c
===
--- cp/decl.c   (revision 224724)
+++ cp/decl.c   (working copy)
@@ -10476,19 +10477,9 @@ grokdeclarator (const cp_declarator *declarator,
 
   if (decl_context == TYPENAME)
 {
-  /* Note that the grammar rejects storage classes
-in typenames, fields or parameters.  */
-  if (type_quals != TYPE_UNQUALIFIED)
-   type_quals = TYPE_UNQUALIFIED;
-
   /* Special case: "friend class foo" looks like a TYPENAME context.  */
   if (friendp)
{
- if (type_quals != TYPE_UNQUALIFIED)
-   {
- error ("type qualifiers specified for friend class declaration");
- type_quals = TYPE_UNQUALIFIED;
-   }
  if (inlinep)
{
  error ("%<inline%> specified for friend class declaration");

[C++ Patch] Use declspecs->locations[ds_virtual]

2015-06-22 Thread Paolo Carlini


Hi,

I think this also qualifies as obvious given the past work / discussion: 
use in one more place declspecs->locations to improve the location of 
the error message.


Thanks,
Paolo.


/cp
2015-06-22  Paolo Carlini  

* decl.c (grokdeclarator): Use declspecs->locations[ds_virtual].

/testsuite
2015-06-22  Paolo Carlini  

* g++.dg/inherit/pure1.C: Test location too.
Index: cp/decl.c
===
--- cp/decl.c   (revision 224724)
+++ cp/decl.c   (working copy)
@@ -9529,7 +9529,8 @@ grokdeclarator (const cp_declarator *declarator,
   if (virtualp
   && (current_class_name == NULL_TREE || decl_context != FIELD))
 {
-  error ("%<virtual%> outside class declaration");
+  error_at (declspecs->locations[ds_virtual],
+   "%<virtual%> outside class declaration");
   virtualp = 0;
 }
 
Index: testsuite/g++.dg/inherit/pure1.C
===
--- testsuite/g++.dg/inherit/pure1.C(revision 224724)
+++ testsuite/g++.dg/inherit/pure1.C(working copy)
@@ -3,8 +3,8 @@
 // { dg-do compile }
 
 void foo0() = 0;   // { dg-error "like a variable" }
-virtual void foo1() = 0;   // { dg-error "outside class|variable" }
-
+virtual void foo1() = 0;   // { dg-error "1:'virtual' outside class" }
+// { dg-error "like a variable" "" { target *-*-* } 6 }
 struct A
 {
   void foo2() = 0; // { dg-error "non-virtual" }

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Julian Brown

On Fri, 19 Jun 2015 14:25:57 +0200
Jakub Jelinek  wrote:

> On Fri, Jun 19, 2015 at 11:53:14AM +0200, Bernd Schmidt wrote:
> > On 05/28/2015 05:08 PM, Jakub Jelinek wrote:
> > 
> > >I understand it is more work, I'd just like to ask that when
> > >designing stuff for the OpenACC offloading you (plural) try to
> > >take the other offloading devices and host fallback into account.
> > 
> > The problem is that many of the transformations we need to do are
> > really GPU specific, and with the current structure of
> > omplow/ompexp they are being done in the host compiler. The
> > offloading scheme we decided on does not give us the means to write
> > out multiple versions of an offloaded function where each target
> > gets a different one. For that reason I think we should postpone
> > these lowering decisions until we're in the accel compiler, where
> > they could be controlled by target hooks, and over the last two
> > weeks I've been doing some experiments to see how that could be
> > achieved.

> I wonder why struct loop flags and other info together with function
> attributes and/or cgraph flags and other info aren't sufficient for
> the OpenACC needs.
> Have you or Thomas looked what we're doing for OpenMP simd / Cilk+
> simd?
> 
> Why can't the execution model (normal, vector-single and
> worker-single) be simply attributes on functions or cgraph node flags
> and the kind of #acc loop simply be flags on struct loop, like
> already OpenMP simd / Cilk+ simd is?

One problem is that (at least on the GPU hardware we've considered so
far) we're somewhat constrained in how much control we have over how the
underlying hardware executes code: it's possible to draw up a scheme
where OpenACC source-level control-flow semantics are reflected directly
in the PTX assembly output (e.g. to say "all threads in a CTA/warp will
be coherent after such-and-such a loop"), and lowering OpenACC
directives quite early seems to make that relatively tractable. (Even
if the resulting code is relatively un-optimisable due to the abnormal
edges inserted to make sure that the CFG doesn't become "ill-formed".)

If arbitrary optimisations are done between OMP-lowering time and
somewhere around vectorisation (say), it's less clear if that
correspondence can be maintained. Say if the code executed by half the
threads in a warp becomes physically separated from the code executed
by the other half of the threads in a warp due to some loop
optimisation, we can no longer easily determine where that warp will
reconverge, and certain other operations (relying on coherent warps --
e.g. CTA synchronisation) become impossible. A similar issue exists for
warps within a CTA.

So, essentially -- I don't know how "late" loop lowering would interact
with:

(a) Maintaining a CFG that will work with PTX.

(b) Predication for worker-single and/or vector-single modes
(actually all currently-proposed schemes have problems with proper
representation of data-dependencies for variables and
compiler-generated temporaries between predicated regions.)

Julian

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Bernd Schmidt


On 06/19/2015 03:45 PM, Jakub Jelinek wrote:

I actually believe having some optimization passes in between the ompexp
and the lowering of the IR into the form PTX wants is highly desirable,
the form with the worker-single or vector-single mode lowered will contain
too complex CFG for many optimizations to be really effective, especially
if it uses abnormal edges.  E.g. inlining supposedly would have harder job
etc.  What exact unpredictable effects do you fear?


Mostly the ones I can't predict. But let's take one example, LICM: let's 
say you pull some assignment out of a loop, then you find yourself in 
one of two possible situations: either it's become not actually 
available inside the loop (because the data and control flow is not 
described correctly and the compiler doesn't know what's going on), or, 
to avoid that, you introduce additional broadcasting operations when 
entering the loop, which might be quite expensive.
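
For readers less familiar with LICM, the transformation itself looks like
this on ordinary sequential C. The function names are made up for the
example. On a single thread both versions are equivalent; the concern above
is about what happens to the hoisted value when the code before the loop
runs in a single thread and the loop body does not.

```c
#include <assert.h>

/* Before LICM: scale * bias is recomputed on every iteration even
   though it does not depend on the loop.  */
static long
sum_scaled_naive (const int *v, int n, int scale, int bias)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    {
      int k = scale * bias;     /* Loop invariant.  */
      sum += (long) v[i] * k;
    }
  return sum;
}

/* After LICM: the invariant is computed once, ahead of the loop.  The
   value of k now has to be available to whatever executes the loop
   body, which is where a broadcast would come in.  */
static long
sum_scaled_hoisted (const int *v, int n, int scale, int bias)
{
  int k = scale * bias;
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += (long) v[i] * k;
  return sum;
}
```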



Bernd

Re: [Ping, Patch, fortran, 64674, v3] [OOP] ICE in ASSOCIATE with class array

2015-06-22 Thread Paul Richard Thomas

Hi Andre,

Some questions: The first and second chunks look a bit awkward in
parse.c. Do they have to be there in order that primary.c does the
right thing? Could the whole lot be transferred to resolve.c or would
that make it horribly messy? I couldn't apply the patch right now -
does it work with variable expressions for the target array indices?

If the answers are (i) yes (ii) no (iii) yes, then OK for trunk.

If the answer to (iii) is yes, please extend or modify the testcase to
check for variable indices.

Thanks for the patch

Paul

On 19 June 2015 at 12:58, Andre Vehreschild  wrote:
> Hi all,
>
> a ping on this patch. Rebased to current trunk.
>
> Bootstraps and regtests fine on x86_64-linux-gnu/f21.
>
> Ok for trunk?
>
> - Andre
>
>> On Mon, 4 May 2015 16:53:15 +0200
>> Andre Vehreschild  wrote:
>>
>> > Hi all,
>> >
>> > I would like to present here a first patch for using class arrays in
>> > associate. Up to now gfortran crashed when a class array-section/element
>> > was selected in an associate. This patch fixes this for class array
>> > sections as well as for single elements.
>> >
>> > The story of the patch is told quite shortly:
>> >
>> > - parse.c::parse_associate() needs to gather more information about what
>> >   the target is like. Previously the target's rank and array_spec were not
>> >   computed, which disallowed the use of further array refs in the associate
>> >   body:
>> >     associate (vec => class_matrix(2:3, 2))
>> >     vec(1) = ... ! <- Unclassifiable statement, because no array_spec was
>> >                  !    attached to vec.
>> >   This is fixed by the second hunk of the patch.
>> >
>> > - The third hunk in primary.c prevents setting the dimension attribute on a
>> >   class object's symbol.
>> >
>> > - The hunks in resolve.c take care of adding dummy full array_refs and, in
>> >   resolve_assoc_var, correct the class type when the target expression's
>> >   rank is 0. Previously the symbol would have an array valued type when the
>> >   target's base type was array valued. But for a scalar target this needed
>> >   some polishing.
>> >
>> > - Additionally a test was added.
>> >
>> > Bootstraps and regtests ok on x86_64-linux-gnu/f21.
>> >
>> > Ok for trunk?
>> >
>> > Note, this patch was diffed from a trunk with my older patches for
>> >
>> > PR65548, v3 https://gcc.gnu.org/ml/fortran/2015-04/msg00123.html and
>> > PR44672, v5 https://gcc.gnu.org/ml/fortran/2015-04/msg00124.html
>> >
>> > applied.
>> >
>> > Regards,
>> > Andre
>>
>>
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx

Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-06-22 Thread Joseph Myers

On Sun, 21 Jun 2015, Martin Sebor wrote:

> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index 636e0bb..637a292 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cilk.h"
>  #include "gomp-constants.h"
> 
> +#include 

Included from system.h, don't include it explicitly in source files.

> +  if (DECL_IS_BUILTIN (exp.value))
> + {
> +   error_at (loc, "converting builtin function to a pointer");

Say "built-in" (see codingconventions.html).
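
For context, code that needs a pointer to the functionality of a built-in
would normally go through an ordinary wrapper function, roughly as sketched
here. This assumes GCC's documented __builtin_abs; abs_wrapper and abs_ptr
are made-up names, and which built-ins the new diagnostic rejects is decided
by the patch itself, not by this example.

```c
#include <assert.h>

/* An ordinary function with its own address that forwards to the
   built-in.  */
static int
abs_wrapper (int x)
{
  return __builtin_abs (x);
}

/* Taking the address of the wrapper involves no conversion of a
   built-in function to a pointer.  */
static int (*const abs_ptr) (int) = abs_wrapper;
```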

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Jakub Jelinek

On Mon, Jun 22, 2015 at 03:59:57PM +0200, Bernd Schmidt wrote:
> On 06/19/2015 03:45 PM, Jakub Jelinek wrote:
> >I actually believe having some optimization passes in between the ompexp
> >and the lowering of the IR into the form PTX wants is highly desirable,
> >the form with the worker-single or vector-single mode lowered will contain
> >too complex CFG for many optimizations to be really effective, especially
> >if it uses abnormal edges.  E.g. inlining supposedly would have harder job
> >etc.  What exact unpredictable effects do you fear?
> 
> Mostly the ones I can't predict. But let's take one example, LICM: let's say
> you pull some assignment out of a loop, then you find yourself in one of two
> possible situations: either it's become not actually available inside the
> loop (because the data and control flow is not described correctly and the
> compiler doesn't know what's going on), or, to avoid that, you introduce

Why do you think that would happen?  E.g. for non-addressable gimple types you'd
most likely just have a PHI for it on the loop.

> additional broadcasting operations when entering the loop, which might be
> quite expensive.

If the PHI has cheap initialization, there is no problem emitting it as an
initialization in the loop instead of a broadcast (kind of like RA
rematerialization).  And by actually adding such an optimization, you help
even code that has a computation in vector-single code and uses it in a vector
acc loop.

Jakub

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Jakub Jelinek

On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
> One problem is that (at least on the GPU hardware we've considered so
> far) we're somewhat constrained in how much control we have over how the
> underlying hardware executes code: it's possible to draw up a scheme
> where OpenACC source-level control-flow semantics are reflected directly
> in the PTX assembly output (e.g. to say "all threads in a CTA/warp will
> be coherent after such-and-such a loop"), and lowering OpenACC
> directives quite early seems to make that relatively tractable. (Even
> if the resulting code is relatively un-optimisable due to the abnormal
> edges inserted to make sure that the CFG doesn't become "ill-formed".)
> 
> If arbitrary optimisations are done between OMP-lowering time and
> somewhere around vectorisation (say), it's less clear if that
> correspondence can be maintained. Say if the code executed by half the
> threads in a warp becomes physically separated from the code executed
> by the other half of the threads in a warp due to some loop
> optimisation, we can no longer easily determine where that warp will
> reconverge, and certain other operations (relying on coherent warps --
> e.g. CTA synchronisation) become impossible. A similar issue exists for
> warps within a CTA.
> 
> So, essentially -- I don't know how "late" loop lowering would interact
> with:
> 
> (a) Maintaining a CFG that will work with PTX.
> 
> (b) Predication for worker-single and/or vector-single modes
> (actually all currently-proposed schemes have problems with proper
> representation of data-dependencies for variables and
> compiler-generated temporaries between predicated regions.)

I don't understand why lowering the way you suggest helps here at all.
In the proposed scheme, you essentially have whole function
in e.g. worker-single or vector-single mode, which you need to be able to
handle properly in any case, because users can write such routines
themselves.  And then you can have a loop in such a function that
has some special attribute, a hint that it is desirable to vectorize it
(for PTX the PTX way) or use vector-single mode for it in a worker-single
function.  So, the special pass then of course needs to handle all the
needed broadcasting and reduction required to change the mode from e.g.
worker-single to vector-single, but the convergence points still would be
either on the boundary of such loops to be vectorized or parallelized, or
wherever else they appear in normal vector-single or worker-single functions
(around the calls to certainly calls?).

Jakub

Re: [i386, PATCH, 1/3] IA MCU psABI support: GCC changes.

2015-06-22 Thread Kirill Yukhin

Hello,
This patch introduces basic IA MCU psABI support into GCC.

Bootstrapped and regtested.

/
* configure.ac (ospace_frag): Enable for i?86*-*-elfiamcu
target.
* configure: Regenerate.
gcc/
* config.gcc: Support i[34567]86-*-elfiamcu target.
* config/i386/iamcu.h: New.
* config/i386/i386.opt: Add -miamcu.
* doc/invoke.texi: Document -miamcu.
* common/config/i386/i386-common.c  (ix86_handle_option): Turn
off x87/MMX/SSE/AVX codegen for -miamcu.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__iamcu/__iamcu__ for -miamcu.
* config/i386/i386.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Set
to MIN_STACK_BOUNDARY if TARGET_IAMCU is true.
(BIGGEST_ALIGNMENT): Set to 32 if TARGET_IAMCU is true.
* config/i386/i386.c (ix86_option_override_internal):
  - Ignore and warn -mregparm for Intel MCU. Turn
on -mregparm=3 for Intel MCU by default.
  - Default long double to 64-bit for Intel MCU.
  - Turn on -freg-struct-return for Intel MCU.
  - Issue an error when -miamcu is used in 64-bit or x32 mode,
or if x87, MMX, SSE or AVX is turned on.
(function_arg_advance_32): Pass value whose
size is no larger than 8 bytes in registers for Intel MCU.
(function_arg_32): Likewise.
(ix86_return_in_memory): Return value whose size is no larger
than 8 bytes in registers for Intel MCU.
(iamcu_alignment): New function.
(ix86_data_alignment): Call iamcu_alignment if TARGET_IAMCU is
true.
(ix86_local_alignment): Don't increase
alignment for Intel MCU.
(x86_field_alignment): Return iamcu_alignment if TARGET_IAMCU is
true.

Is it OK for trunk?

--
Thanks, K

diff --git a/configure b/configure
index bced9de..82e45f3 100755
--- a/configure
+++ b/configure
@@ -6914,7 +6914,7 @@ case "${enable_target_optspace}:${target}" in
   :d30v-*)
 ospace_frag="config/mt-d30v"
 ;;
-  :m32r-* | :d10v-* | :fr30-*)
+  :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu)
 ospace_frag="config/mt-ospace"
 ;;
   no:* | :*)
diff --git a/configure.ac b/configure.ac
index 7c06e6b..dc77a1b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2560,7 +2560,7 @@ case "${enable_target_optspace}:${target}" in
   :d30v-*)
 ospace_frag="config/mt-d30v"
 ;;
-  :m32r-* | :d10v-* | :fr30-*)
+  :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu)
 ospace_frag="config/mt-ospace"
 ;;
   no:* | :*)
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 0f8c3e1..79b2472 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -223,7 +223,7 @@ along with GCC; see the file COPYING3.  If not see

 bool
 ix86_handle_option (struct gcc_options *opts,
-   struct gcc_options *opts_set ATTRIBUTE_UNUSED,
+   struct gcc_options *opts_set,
const struct cl_decoded_option *decoded,
location_t loc)
 {
@@ -232,6 +232,20 @@ ix86_handle_option (struct gcc_options *opts,

   switch (code)
 {
+case OPT_miamcu:
+  if (value)
+   {
+ /* Turn off x87/MMX/SSE/AVX codegen for -miamcu.  */
+ opts->x_target_flags &= ~MASK_80387;
+ opts_set->x_target_flags |= MASK_80387;
+ opts->x_ix86_isa_flags &= ~(OPTION_MASK_ISA_MMX_UNSET
+ | OPTION_MASK_ISA_SSE_UNSET);
+ opts->x_ix86_isa_flags_explicit |= (OPTION_MASK_ISA_MMX_UNSET
+ | OPTION_MASK_ISA_SSE_UNSET);
+
+   }
+  return true;
+
 case OPT_mmmx:
   if (value)
{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 805638d..2b3af82 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1389,6 +1389,9 @@ x86_64-*-darwin*)
tmake_file="${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc"
tm_file="${tm_file} ${cpu_type}/darwin64.h"
;;
+i[34567]86-*-elfiamcu)
+   tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h 
newlib-stdint.h i386/iamcu.h"
+   ;;
 i[34567]86-*-elf*)
tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h 
newlib-stdint.h i386/i386elf.h"
;;
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 0228f4b..66f7e37 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -426,6 +426,11 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, "__CLWB__");
   if (isa_flag & OPTION_MASK_ISA_MWAITX)
 def_or_undef (parse_in, "__MWAITX__");
+  if (TARGET_IAMCU)
+{
+  def_or_undef (parse_in, "__iamcu");
+  def_or_undef (parse_in, "__iamcu__");
+}
 }

 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 24fccfc..26ffa67 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3433,6 +3433,10 @@ ix86_option_overri

Re: [gomp4][PATCH] Handle casts in bound in try_transform_to_exit_first_loop_alt

2015-06-22 Thread Richard Biener

On Thu, 18 Jun 2015, Tom de Vries wrote:

> On 13/06/15 16:24, Tom de Vries wrote:
> > Hi,
> > 
> > this patch allows try_transform_to_exit_first_loop_alt to succeed when
> > handling cases where the expression representing the number of
> > iterations contains a cast.
> > 
> > Currently, transform_to_exit_first_loop_alt testcase
> > gfortran/parloops-exit-first-loop-alt.f95 will fail.
> > 
> > The nit is _19, which is defined as follows:
> > ...
> > _20 = _6 + -1;
> > _19 = (unsigned int) _20;
> > ...
> > And transform_to_exit_first_loop_alt currently only handles nits with
> > defining stmt 'nit = x - 1', for which it finds alt_bound 'x'.
> > 
> > The patch:
> > - uses try_get_loop_niter to get nit as a nested tree expression
> >'(unsigned int) (_6 + -1)'
> > - strips the outer nops (assuming no change in value)
> > - uses '(unsigned int)_6' as the alt_bound, and
> > - gimplifies the expression.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> 
> Cleaned up whitespace in testcases.
> 
> Committed to gomp-4_0-branch as attached.
> 
> > OK for trunk?
> > 

I assume the above also handles the reverse, (int) (_6 + -1)?
In this case what happens if _6 == INT_MAX + 1?  nit is INT_MAX
but (int) _6 is INT_MIN.

Likewise what happens if _6 + -1 under-/overflows?
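
To make the first case concrete, here is a minimal sketch (not GCC code). It assumes a 32-bit int and a modular unsigned-to-signed conversion, which GCC provides and C++20 guarantees. The helper names nit_as_int and alt_bound_as_int are hypothetical and only mirror the two expressions above.

```cpp
#include <cstdint>

// Hypothetical helper: the iteration count as written, (int) (_6 + -1),
// with _6 modelled as a 32-bit unsigned value.
std::int32_t nit_as_int(std::uint32_t x)
{
  return static_cast<std::int32_t>(x - 1u);
}

// Hypothetical helper: the candidate alt_bound, (int) _6.
std::int32_t alt_bound_as_int(std::uint32_t x)
{
  return static_cast<std::int32_t>(x);
}
```

With x == 0x80000000 (that is, INT_MAX + 1) the first helper yields INT_MAX while the second yields INT_MIN, so the alt_bound is no longer nit + 1 in the signed type.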

Richard.

> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)

Re: match.pd: Three new patterns (and some more)

2015-06-22 Thread Richard Biener

On Thu, 18 Jun 2015, Marek Polacek wrote:

> On Tue, Jun 16, 2015 at 03:35:15PM +0200, Richard Biener wrote:
> > We already have
> > 
> > /* (x & y) ^ (x | y) -> x ^ y */
> > (simplify
> >  (bit_xor:c (bit_and @0 @1) (bit_ior @0 @1))
> >  (bit_xor @0 @1))
> > 
> > but of course with minus it doesn't commute so it's hard to
> > merge.
> 
> Yeah :(.
>  
> > > > +/* (x & y) + (x | y) -> x + y */
> > > 
> > > Again for symmetry, it seems like this comes with
> > > x + y - (x | y) -> x & y
> > > x + y - (x & y) -> x | y
> > > which seem fine when overflow is undefined or wraps, but not if for 
> > > instance
> > > it saturates.
> > 
> > Can you adjust according to Marc's comment and re-submit?  If you like
> > you can do it as followup as well and thus the original patch is ok
> > as well.
> 
> Sure.  This is a new version with some more patterns.  Thanks.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Ok.

Thanks,
Richard.
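
The seven identities listed in the ChangeLog below can be sanity-checked by brute force. This is only a sketch under the assumption of 8-bit wrapping unsigned arithmetic; the function name identities_hold is hypothetical and it says nothing about how match.pd itself is tested.

```cpp
#include <cstdint>

// Exhaustively check the seven identities over all pairs of 8-bit values,
// truncating every result to 8 bits so that + and - wrap.
bool identities_hold()
{
  auto u8 = [](unsigned v) { return static_cast<std::uint8_t>(v); };
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      {
        unsigned x = a, y = b;
        if (u8((x ^ y) ^ (x | y)) != u8(x & y)) return false;
        if (u8((x & y) + (x ^ y)) != u8(x | y)) return false;
        if (u8((x & y) | (x ^ y)) != u8(x | y)) return false;
        if (u8((x & y) ^ (x ^ y)) != u8(x | y)) return false;
        if (u8((x & y) + (x | y)) != u8(x + y)) return false;
        if (u8((x | y) - (x ^ y)) != u8(x & y)) return false;
        if (u8((x | y) - (x & y)) != u8(x ^ y)) return false;
      }
  return true;
}
```

The check relies on x & y and x ^ y having no set bits in common, which is why adding, or-ing and xor-ing them all give x | y.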

> 2015-06-18  Marek Polacek  
> 
>   * match.pd ((x ^ y) ^ (x | y) -> x & y,
>   (x & y) + (x ^ y) -> x | y, (x & y) | (x ^ y) -> x | y,
>   (x & y) ^ (x ^ y) -> x | y, (x & y) + (x | y) -> x + y,
>   (x | y) - (x ^ y) -> x & y, (x | y) - (x & y) -> x ^ y): New patterns.
> 
>   * gcc.dg/fold-ior-1.c: New test.
>   * gcc.dg/fold-minus-2.c: New test.
>   * gcc.dg/fold-minus-3.c: New test.
>   * gcc.dg/fold-plus-1.c: New test.
>   * gcc.dg/fold-plus-2.c: New test.
>   * gcc.dg/fold-xor-4.c: New test.
>   * gcc.dg/fold-xor-5.c: New test.
> 
> diff --git gcc/match.pd gcc/match.pd
> index 1ab2b1c..badb80a 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -325,6 +325,34 @@ along with GCC; see the file COPYING3.  If not see
>   (bit_xor:c (bit_and @0 @1) (bit_ior @0 @1))
>   (bit_xor @0 @1))
>  
> +/* (x ^ y) ^ (x | y) -> x & y */
> +(simplify
> + (bit_xor:c (bit_xor @0 @1) (bit_ior @0 @1))
> + (bit_and @0 @1))
> +
> +/* (x & y) + (x ^ y) -> x | y */
> +/* (x & y) | (x ^ y) -> x | y */
> +/* (x & y) ^ (x ^ y) -> x | y */
> +(for op (plus bit_ior bit_xor)
> + (simplify
> +  (op:c (bit_and @0 @1) (bit_xor @0 @1))
> +  (bit_ior @0 @1)))
> +
> +/* (x & y) + (x | y) -> x + y */
> +(simplify
> + (plus:c (bit_and @0 @1) (bit_ior @0 @1))
> + (plus @0 @1))
> +
> +/* (x | y) - (x ^ y) -> x & y */
> +(simplify
> + (minus (bit_ior @0 @1) (bit_xor @0 @1))
> + (bit_and @0 @1))
> +
> +/* (x | y) - (x & y) -> x ^ y */
> +(simplify
> + (minus (bit_ior @0 @1) (bit_and @0 @1))
> + (bit_xor @0 @1))
> +
>  (simplify
>   (abs (negate @0))
>   (abs @0))
> diff --git gcc/testsuite/gcc.dg/fold-ior-1.c gcc/testsuite/gcc.dg/fold-ior-1.c
> index e69de29..0358eb5 100644
> --- gcc/testsuite/gcc.dg/fold-ior-1.c
> +++ gcc/testsuite/gcc.dg/fold-ior-1.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-cddce1" } */
> +
> +int
> +fn1 (int a, int b)
> +{
> +  int tem1 = a & b;
> +  int tem2 = a ^ b;
> +  return tem1 | tem2;
> +}
> +
> +int
> +fn2 (int a, int b)
> +{
> +  int tem1 = b & a;
> +  int tem2 = a ^ b;
> +  return tem1 | tem2;
> +}
> +
> +int
> +fn3 (int a, int b)
> +{
> +  int tem1 = a & b;
> +  int tem2 = b ^ a;
> +  return tem1 | tem2;
> +}
> +
> +int
> +fn4 (int a, int b)
> +{
> +  int tem1 = b & a;
> +  int tem2 = b ^ a;
> +  return tem1 | tem2;
> +}
> +
> +int
> +fn5 (int a, int b)
> +{
> +  int tem1 = a ^ b;
> +  int tem2 = a & b;
> +  return tem1 | tem2;
> +}
> +
> +int
> +fn6 (int a, int b)
> +{
> +  int tem1 = b ^ a;
> +  int tem2 = a & b;
> +  return tem1 | tem2;
> +}
> +
> +int
> +fn7 (int a, int b)
> +{
> +  int tem1 = a ^ b;
> +  int tem2 = b & a;
> +  return tem1 | tem2;
> +}
> +
> +int
> +fn8 (int a, int b)
> +{
> +  int tem1 = b ^ a;
> +  int tem2 = b & a;
> +  return tem1 | tem2;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " & " "cddce1" } } */
> +/* { dg-final { scan-tree-dump-not " \\^ " "cddce1" } } */
> diff --git gcc/testsuite/gcc.dg/fold-minus-2.c 
> gcc/testsuite/gcc.dg/fold-minus-2.c
> index e69de29..6501f2f 100644
> --- gcc/testsuite/gcc.dg/fold-minus-2.c
> +++ gcc/testsuite/gcc.dg/fold-minus-2.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-cddce1" } */
> +
> +int
> +fn1 (int a, int b)
> +{
> +  int tem1 = a | b;
> +  int tem2 = a ^ b;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn2 (int a, int b)
> +{
> +  int tem1 = b | a;
> +  int tem2 = a ^ b;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn3 (int a, int b)
> +{
> +  int tem1 = a | b;
> +  int tem2 = b ^ a;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn4 (int a, int b)
> +{
> +  int tem1 = b | a;
> +  int tem2 = b ^ a;
> +  return tem1 - tem2;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " \\^ " "cddce1" } } */
> +/* { dg-final { scan-tree-dump-not " \\| " "cddce1" } } */
> diff --git gcc/testsuite/gcc.dg/fold-minus-3.c 
> gcc/testsuite/gcc.dg/fold-minus-3.c
> index e69de29..e7adce6 100644
> --- gcc/testsuite/gcc.dg/fold-minus-3.c
> +++ gcc/testsuite/gcc.dg/fold-minus-3.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
>

Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-06-22 Thread Marek Polacek

On Sun, Jun 21, 2015 at 05:05:14PM -0600, Martin Sebor wrote:
> Attached is a patch to reject C and C++ constructs that result
> in obtaining a pointer (or a reference in C++) to a builtin
> function.  These constructs are currently silently accepted by
> GCC and, in most cases(*), result in a linker error.  The patch
> brings GCC on par with Clang by rejecting all such constructs.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
 
It seems like this patch regresses the pr59630.c testcase; I don't see
the testcase being addressed in this patch.

> 2015-06-21  Martin Sebor  
> 
>   PR c/66516
>   * c/c-typeck.c (default_function_array_conversion): Reject
>   converting a builtin function to a pointer.
>   (parser_build_unary_op): Reject taking the address of a builtin
>   function.
>   * cp/call.c (convert_like_real): Reject converting a builtin function
>   to a pointer.
>   (initialize_reference): Reject initializing a reference with a builtin
>   function.
>   * cp/typeck.c (cp_build_addr_expr_strict): Reject taking the address
>   of a builtin function.
>   (build_reinterpret_cast_1): Reject casting a builtin function to
>   a pointer.
>   (convert_for_initialization): Reject initializing a pointer with
>   a builtin function.
 
Please no c/ and cp/ prefixes.

> +#include 

As Joseph already pointed out, this is redundant.

> @@ -3384,7 +3392,14 @@ parser_build_unary_op (location_t loc, enum tree_code 
> code, struct c_expr arg)
>result.original_code = code;
>result.original_type = NULL;
> 
> -  if (TREE_OVERFLOW_P (result.value) && !TREE_OVERFLOW_P (arg.value))
> +  if (code == ADDR_EXPR
> +  && TREE_CODE (TREE_TYPE (arg.value)) == FUNCTION_TYPE
> +  && DECL_IS_BUILTIN (arg.value))
> +{
> +  error_at (loc, "taking address of a builtin function");
> +  result.value = error_mark_node;
> +}
> +  else if (TREE_OVERFLOW_P (result.value) && !TREE_OVERFLOW_P (arg.value))
>  overflow_warning (loc, result.value);

It seems like you can move the new hunk a bit above so that we don't call
build_unary_op in a case when taking the address of a built-in function.

Unfortunately, it doesn't seem possible to do this error in build_unary_op
or in function_to_pointer_conversion :(.

Marek

Re: match.pd: Three new patterns

2015-06-22 Thread Richard Biener

On Fri, 19 Jun 2015, Marek Polacek wrote:

> On Thu, Jun 18, 2015 at 05:41:18PM +0200, Marek Polacek wrote:
> > > Again for symmetry, it seems like this comes with
> > > x + y - (x | y) -> x & y
> > > x + y - (x & y) -> x | y
> > > which seem fine when overflow is undefined or wraps, but not if for 
> > > instance
> > > it saturates.
> > 
> > I'll leave this as a follow-up.
> 
> ...here.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-06-19  Marek Polacek  
> 
>   * match.pd (x + y - (x | y) -> x & y,

) missing

>   (x + y) - (x & y) -> x | y): New patterns.
> 
>   * gcc.dg/fold-minus-4.c: New test.
>   * gcc.dg/fold-minus-5.c: New test.
> 
> diff --git gcc/match.pd gcc/match.pd
> index badb80a..61ff710 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -343,6 +343,18 @@ along with GCC; see the file COPYING3.  If not see
>   (plus:c (bit_and @0 @1) (bit_ior @0 @1))
>   (plus @0 @1))
>  
> +/* x + y - (x | y) -> x & y */

Please wrap x + y in () here as well.

Ok with those changes.

Thanks,
Richard.
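
The saturation caveat quoted above can be illustrated with a small sketch. The helpers wrap_add and sat_add are hypothetical 8-bit models, not GCC's representation of saturating types.

```cpp
#include <cstdint>

// Hypothetical 8-bit wrapping addition.
std::uint8_t wrap_add(std::uint8_t x, std::uint8_t y)
{
  return static_cast<std::uint8_t>(x + y);
}

// Hypothetical 8-bit saturating addition: clamps at 255 instead of wrapping.
std::uint8_t sat_add(std::uint8_t x, std::uint8_t y)
{
  unsigned sum = static_cast<unsigned>(x) + y;
  return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// (x + y) - (x | y), with the subtraction truncated to 8 bits.
std::uint8_t minus_ior(std::uint8_t sum, std::uint8_t x, std::uint8_t y)
{
  return static_cast<std::uint8_t>(sum - (x | y));
}
```

For x = 200 and y = 100, x & y is 64. The wrapping sum gives minus_ior == 64, but the saturated sum (255) gives 19, so the rewrite to x & y would be wrong for saturating types.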

> +(simplify
> + (minus (plus @0 @1) (bit_ior @0 @1))
> + (if (!TYPE_OVERFLOW_SANITIZED (type) && !TYPE_SATURATING (type))
> +  (bit_and @0 @1)))
> +
> +/* (x + y) - (x & y) -> x | y */
> +(simplify
> + (minus (plus @0 @1) (bit_and @0 @1))
> + (if (!TYPE_OVERFLOW_SANITIZED (type) && !TYPE_SATURATING (type))
> +  (bit_ior @0 @1)))
> +
>  /* (x | y) - (x ^ y) -> x & y */
>  (simplify
>   (minus (bit_ior @0 @1) (bit_xor @0 @1))
> diff --git gcc/testsuite/gcc.dg/fold-minus-4.c 
> gcc/testsuite/gcc.dg/fold-minus-4.c
> index e69de29..2d76b4f 100644
> --- gcc/testsuite/gcc.dg/fold-minus-4.c
> +++ gcc/testsuite/gcc.dg/fold-minus-4.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-cddce1" } */
> +
> +int
> +fn1 (int a, int b)
> +{
> +  int tem1 = a + b;
> +  int tem2 = a & b;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn2 (int a, int b)
> +{
> +  int tem1 = b + a;
> +  int tem2 = a & b;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn3 (int a, int b)
> +{
> +  int tem1 = a + b;
> +  int tem2 = b & a;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn4 (int a, int b)
> +{
> +  int tem1 = b + a;
> +  int tem2 = b & a;
> +  return tem1 - tem2;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " & " "cddce1" } } */
> +/* { dg-final { scan-tree-dump-not " \\+ " "cddce1" } } */
> diff --git gcc/testsuite/gcc.dg/fold-minus-5.c 
> gcc/testsuite/gcc.dg/fold-minus-5.c
> index e69de29..a31e1cc 100644
> --- gcc/testsuite/gcc.dg/fold-minus-5.c
> +++ gcc/testsuite/gcc.dg/fold-minus-5.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-cddce1" } */
> +
> +int
> +fn1 (int a, int b)
> +{
> +  int tem1 = a + b;
> +  int tem2 = a | b;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn2 (int a, int b)
> +{
> +  int tem1 = b + a;
> +  int tem2 = a | b;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn3 (int a, int b)
> +{
> +  int tem1 = a + b;
> +  int tem2 = b | a;
> +  return tem1 - tem2;
> +}
> +
> +int
> +fn4 (int a, int b)
> +{
> +  int tem1 = b + a;
> +  int tem2 = b | a;
> +  return tem1 - tem2;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " \\+ " "cddce1" } } */
> +/* { dg-final { scan-tree-dump-not " \\| " "cddce1" } } */
> 
>   Marek
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)

Re: Fix more of C/fortran canonical type issues

2015-06-22 Thread Richard Biener

On Mon, 22 Jun 2015, Jan Hubicka wrote:

> > > On Mon, 8 Jun 2015, Jan Hubicka wrote:
> > > 
> > > > > 
> > > > > I think we should instead work towards eliminating the get_alias_set
> > > > > langhook first.  The LTO langhook variant contains the same handling, 
> > > > > btw,
> > > > > so just inline that into get_alias_set and see what remains?
> > > > 
> > > > I see, I completely missed the existence of gimple_get_alias_set. It
> > > > makes more sense now.
> > > > 
> > > > Is moving everything to alias.c really a desirable thing? If non-C
> > > > languages do not have this rule, why would we want to reduce the code
> > > > quality when compiling those?
> > > 
> > > Well, for consistency and for getting rid of one langhook ;)
> > :)
> > In a way this particular langhook makes sense to me - TBAA rules are 
> > language specific.
> > We also may with explicit streaming of the TBAA dag, like LLVM does.
> > 
> > Anyway, this is the updated patch fixing the Fortran's interoperability with
> > size_t and signed char.  I will send separate patch for the extra lto-symtab
> > warnings shortly.
> > 
> > I will be happy to look into making TYPE_CANONICAL (int) different from
> > TYPE_CANONICAL (unsigned int) if that seems desirable. There are two things
> > that need to be solved:
> > hash_canonical_type/gimple_canonical_types_compatible_p can't
> > use TYPE_CANONICAL of subtypes in all cases (that is easy), and we will need
> > some way to recognize the conflict in lto-symtab other than just comparing
> > TYPE_CANONICAL, so as not to warn when a variable is declared signed in a
> > Fortran unit and unsigned in C.
> > 
> > Bootstrapped/regtested ppc64le-linux.
> > 
> > * lto/lto.c (hash_canonical_type): Do not hash TYPE_UNSIGNED
> > of INTEGER_TYPE.
> > * tree.c (gimple_canonical_types_compatible_p): Do not compare 
> > TYPE_UNSIGNED
> > of INTEGER_TYPE.
> > * gimple-expr.c (useless_type_conversion_p): Move INTEGER type handling
> > ahead the canonical type lookup.
> > 
> > * gfortran.dg/lto/bind_c-2_0.f90: New testcase
> > * gfortran.dg/lto/bind_c-2_1.c: New testcase
> > * gfortran.dg/lto/bind_c-3_0.f90: New testcase
> > * gfortran.dg/lto/bind_c-3_1.c: New testcase
> > * gfortran.dg/lto/bind_c-4_0.f90: New testcase
> > * gfortran.dg/lto/bind_c-4_1.c: New testcase
> 
> Hi,
> I would like to ping this.  There are still a few things to fix to make our
> merging compliant at least for C/C++/Fortran rules (the array bounds for
> Fortran and union ordering for C I believe) and I would like to progress
> on this.

I don't like the changes to useless_type_conversion_p much.  Why
do you preserve qualifiers for the integer kind compares?

All the testcases have the integral types in aggregates as members.
I already said that I'm happy globbing them together in aggregates.

I'm still not convinced that we need a 1:1 correspondence between
canonical types and alias sets.  In particular canonical types are
used for type compatibility in lhs = rhs assignments
(useless_type_conversion_p) which is a transitive relation.
Mixing both too much will cause serious confusion.  We have
alias-sets for a reason.

Richard.

Re: [patch] libstdc++/64657 support iterators with overloaded comma operator

2015-06-22 Thread Jonathan Wakely


On 29/04/15 16:22 +0100, Jonathan Wakely wrote:

I think this covers all the places in the library where we do:

++i, ++j


I missed one. Tested powerpc64-linux, committed to trunk.
commit 28542877cbeae9e1da3bdc8e3a5a3053b2f0ee23
Author: Jonathan Wakely 
Date:   Mon Jun 22 13:44:06 2015 +0100

	PR libstdc++/64657
	* include/bits/stl_uninitialized.h
	(__uninitialized_copy::__uninit_copy): Cast expression to void.

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h b/libstdc++-v3/include/bits/stl_uninitialized.h
index 715cb58..045bdd7 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  _ForwardIterator __cur = __result;
 	  __try
 	{
-	  for (; __first != __last; ++__first, ++__cur)
+	  for (; __first != __last; ++__first, (void)++__cur)
 		std::_Construct(std::__addressof(*__cur), *__first);
 	  return __cur;
 	}
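
For readers unfamiliar with the problem, the following sketch shows why the cast matters. The Counter type and its operator, overload are hypothetical stand-ins for a user-defined iterator, not code from libstdc++.

```cpp
// Hypothetical iterator-like type with pre-increment.
struct Counter
{
  int value = 0;
  Counter& operator++() { ++value; return *this; }
};

static int comma_calls = 0;

// Hypothetical user-provided overload that hijacks "a, b" for Counter operands.
void operator,(const Counter&, const Counter&) { ++comma_calls; }

// "++a, ++b" selects the overload above.
int plain_comma()
{
  comma_calls = 0;
  Counter a, b;
  ++a, ++b;
  return comma_calls;
}

// Casting one operand to void forces the built-in comma operator,
// because no overload can take a void argument.
int void_cast_comma()
{
  comma_calls = 0;
  Counter a, b;
  ++a, (void)++b;
  return comma_calls;
}
```

In this sketch plain_comma() reports one call to the overload and void_cast_comma() reports none, which is the effect the (void) cast in the patch relies on.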

Re: [VRP] Improve value ranges for unsigned division

2015-06-22 Thread Richard Biener

On Sat, Jun 20, 2015 at 9:12 AM, Kugan
 wrote:
> As discussed in PR64130, this patch improves the VRP value ranges for
> unsigned division.
>
> Bootstrapped and regression tested on x86_64-linux-gnu and regression
> tested on arm-none-linux-gnu with no new regression.
>
> Is this OK for trunk?

Hum, the patch is at least incomplete, as it does not cover the
cmp == -1 case in the max value computation, no?

Also, if we have two VR_RANGEs as you require, I wonder whether the
code using extract_range_from_multiplicative_op_1 isn't better suited,
and whether it already handles the case properly?
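
As a reminder of the arithmetic being discussed, here is a minimal sketch of a range for unsigned division when both operands have known ranges. URange and udiv_range are hypothetical names; this is not the tree-vrp.c code and it ignores anti-ranges and symbolic bounds.

```cpp
// Hypothetical closed interval [lo, hi] of unsigned values.
struct URange
{
  unsigned lo;
  unsigned hi;
};

// Compute the range of n / d for unsigned operands.
// Returns false when the divisor range may contain zero.
bool udiv_range(URange n, URange d, URange& out)
{
  if (d.lo == 0)
    return false;
  // The quotient is smallest for the smallest dividend and largest divisor,
  // and largest for the largest dividend and smallest divisor.
  out = {n.lo / d.hi, n.hi / d.lo};
  return true;
}
```

For example, a dividend in [10, 20] and a divisor in [2, 5] give a quotient in [2, 10].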

Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2015-06-20  Kugan Vivekanandarajah  
>
> PR middle-end/64130
> * tree-vrp.c (extract_range_from_binary_expr_1): For unsigned
> division, compute minimum when value ranges for dividend and
> divisor are available.
>
>
> gcc/testsuite/ChangeLog:
>
> 2015-06-20  Kugan Vivekanandarajah  
>
> PR middle-end/64130
> * gcc.dg/tree-ssa/pr64130.c: New test.

Re: Do not take address of empty string front

2015-06-22 Thread Jonathan Wakely


On 20/06/15 12:59 +0100, Jonathan Wakely wrote:

On 20/06/15 12:03 +0200, François Dumont wrote:

Hi

  Two experimental tests are failing in debug mode because
__do_str_codecvt sometimes takes the address of string front() and
back() even when the string is empty. The address wasn't used, so it is
not a big issue, but it still seems better to avoid. I propose using
string begin() instead to get the buffer address.


But dereferencing begin() is still undefined for an empty string.
Shouldn't that fail for debug mode too? Why change one form of
undefined behaviour that we diagnose to another form that we don't
diagnose?

It would be better if that function didn't do any work when the input
range is empty:

--- a/libstdc++-v3/include/bits/locale_conv.h
+++ b/libstdc++-v3/include/bits/locale_conv.h
@@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _OutStr& __outstr, const _Codecvt& __cvt, _State& __state,
   size_t& __count, _Fn __fn)
   {
+  if (__first == __last)
+   {
+ __outstr.clear();
+ return true;
+   }
+
 size_t __outchars = 0;
 auto __next = __first;
 const auto __maxlen = __cvt.max_length() + 1;
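
The same early-return idea, shown outside libstdc++ as a simplified sketch. The function widen_range is hypothetical and does a plain byte-to-wchar_t copy rather than a real codecvt conversion; it only illustrates that front() is never touched for an empty range.

```cpp
#include <cstddef>
#include <string>

// Hypothetical converter: copies [first, last) into out, one wchar_t per byte.
bool widen_range(const char* first, const char* last, std::wstring& out)
{
  if (first == last)
    {
      // Nothing to convert: do not resize and do not take &out.front().
      out.clear();
      return true;
    }

  out.resize(static_cast<std::size_t>(last - first));
  wchar_t* dst = &out.front();  // Safe: out is non-empty here.
  for (; first != last; ++first, ++dst)
    *dst = static_cast<unsigned char>(*first);
  return true;
}
```

Calling it with an empty range clears any stale contents of out and returns true without dereferencing anything.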


This makes that change, and also moves wstring_convert into the
ABI-tagged __cxx11 namespace, and fixes a copy&paste error in the
exception thrown from wbuffer_convert.

Tested powerpc64le-linux, committed to trunk.

François, your changes to add extra checks in std::string are still
useful separately.

commit 4ab3f0a76f7e18074c91c4644cbfdf23084e93ba
Author: Jonathan Wakely 
Date:   Mon Jun 22 13:47:24 2015 +0100

	* include/bits/locale_conv.h (__do_str_codecvt): Handle empty range.
	(wstring_convert): Move into __cxx11 namespace.
	(wbuffer_convert(streambuf*, _Codecvt*, state_type)): Fix exception
	message.

diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h
index 61b535c..fd99499 100644
--- a/libstdc++-v3/include/bits/locale_conv.h
+++ b/libstdc++-v3/include/bits/locale_conv.h
@@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		 _OutStr& __outstr, const _Codecvt& __cvt, _State& __state,
 		 size_t& __count, _Fn __fn)
 {
+  if (__first == __last)
+	{
+	  __outstr.clear();
+	  return true;
+	}
+
   size_t __outchars = 0;
   auto __next = __first;
   const auto __maxlen = __cvt.max_length() + 1;
@@ -150,6 +156,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __str_codecvt_out(__first, __last, __outstr, __cvt, __state, __n);
 }
 
+_GLIBCXX_BEGIN_NAMESPACE_CXX11
+
   /// String conversions
   template,
@@ -301,6 +309,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool			_M_with_strings = false;
 };
 
+_GLIBCXX_END_NAMESPACE_CXX11
+
   /// Buffer conversions
   template>
@@ -325,7 +335,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   : _M_buf(__bytebuf), _M_cvt(__pcvt), _M_state(__state)
   {
 	if (!_M_cvt)
-	  __throw_logic_error("wstring_convert");
+	  __throw_logic_error("wbuffer_convert");
 
 	_M_always_noconv = _M_cvt->always_noconv();

[patch] Fix std::polar() test FAIL

2015-06-22 Thread Jonathan Wakely


I recently added a debug mode assertion that std::polar is not called
with a negative rho argument, which this test does.
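
A minimal sketch of the adjusted call, assuming only the standard std::polar and std::abs; the wrapper name polar_from_imag is hypothetical and is not part of the testsuite.

```cpp
#include <cmath>
#include <complex>

// Build a complex number from the magnitude of c's imaginary part,
// so that the rho argument passed to std::polar is never negative.
std::complex<double> polar_from_imag(const std::complex<double>& c)
{
  return std::polar(std::abs(c.imag()), 0.0);
}
```

With c = (1.0, -2.0), the wrapper passes rho = 2.0 rather than -2.0, so the debug-mode assertion is not triggered.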

Tested powerpc64le-linux, committed to trunk.

commit 3592c4a31ba7f3af4eb8111565888651652ad7b1
Author: Jonathan Wakely 
Date:   Mon Jun 22 15:08:55 2015 +0100

	* testsuite/26_numerics/complex/value_operations/1.cc: Use
	non-negative rho argument.

diff --git a/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc b/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc
index 1caf9f1..a1e0a6b 100644
--- a/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc
+++ b/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc
@@ -53,7 +53,7 @@ void test01()
 
  complex_type e __attribute__((unused)) = conj(c);
  
- complex_type f = polar(c.imag(), 0.0);
+ complex_type f = polar(std::abs(c.imag()), 0.0);
  VERIFY( f.real() != 0 );
 }

Re: [patch] libstdc++/55409 C++11 allocator support for std::list

2015-06-22 Thread Jonathan Wakely


On 17/06/15 21:36 +0100, Jonathan Wakely wrote:

I didn't get time to finish this for 5.1, but this adds missing C++11
allocator support to std::list.


François pointed out that this change means we can update
__gnu_debug::list to derive from an allocator-aware _Safe_sequence.

We can do the same for __cxx11::basic_string by evaluating
bool(_GLIBCXX_USE_CXX11_ABI).

Tested powerpc64-linux, committed to trunk.
commit e5fb5e8dc5d1b1ffeaa48cd1d05f76ee93bc377d
Author: Jonathan Wakely 
Date:   Thu Jun 18 10:02:15 2015 +0100

	* include/debug/list (__gnu_debug::list): Use allocator-aware
	_Safe_container base.
	* include/debug/string (__gnu_debug::basic_string): Use
	allocator-aware _Safe_container base for cxx11 ABI.

diff --git a/libstdc++-v3/include/debug/list b/libstdc++-v3/include/debug/list
index 1562946..12ac53c 100644
--- a/libstdc++-v3/include/debug/list
+++ b/libstdc++-v3/include/debug/list
@@ -43,12 +43,12 @@ namespace __debug
 class list
 : public __gnu_debug::_Safe_container<
 	list<_Tp, _Allocator>, _Allocator,
-	__gnu_debug::_Safe_node_sequence, false>,
+	__gnu_debug::_Safe_node_sequence>,
   public _GLIBCXX_STD_C::list<_Tp, _Allocator>
 {
-  typedef _GLIBCXX_STD_C::list<_Tp, _Allocator>			_Base;
+  typedef _GLIBCXX_STD_C::list<_Tp, _Allocator>		_Base;
   typedef __gnu_debug::_Safe_container<
-	list, _Allocator, __gnu_debug::_Safe_node_sequence, false>	_Safe;
+	list, _Allocator, __gnu_debug::_Safe_node_sequence>	_Safe;
 
   typedef typename _Base::iterator		_Base_iterator;
   typedef typename _Base::const_iterator	_Base_const_iterator;
diff --git a/libstdc++-v3/include/debug/string b/libstdc++-v3/include/debug/string
index 3793a35..f068ef0 100644
--- a/libstdc++-v3/include/debug/string
+++ b/libstdc++-v3/include/debug/string
@@ -42,12 +42,13 @@ namespace __gnu_debug
 class basic_string
 : public __gnu_debug::_Safe_container<
 	basic_string<_CharT, _Traits, _Allocator>,
-	_Allocator, _Safe_sequence, false>,
+	_Allocator, _Safe_sequence, bool(_GLIBCXX_USE_CXX11_ABI)>,
   public std::basic_string<_CharT, _Traits, _Allocator>
 {
   typedef std::basic_string<_CharT, _Traits, _Allocator>	_Base;
   typedef __gnu_debug::_Safe_container<
-	basic_string, _Allocator, _Safe_sequence, false>	_Safe;
+	basic_string, _Allocator, _Safe_sequence, bool(_GLIBCXX_USE_CXX11_ABI)>
+	_Safe;
 
   public:
 // types:

Re: [Ping, Patch, fortran, 64674, v3] [OOP] ICE in ASSOCIATE with class array

2015-06-22 Thread Andre Vehreschild

Hi Paul,

On Mon, 22 Jun 2015 16:04:09 +0200
Paul Richard Thomas  wrote:

> Hi Andre,
> 
> Some questions: The first and second chunks look a bit awkward in
> parse.c. Do they have to be there in order that primary.c does the
> right thing?

I tried at first to do this rank resolution in primary.c, but that was too
late. parse.c needs to propagate the rank correctly. If I remember correctly,
doing so later prevents parse.c from correctly recognizing the vector (from
the example in the initial description) as such, i.e., the indexing in
vector(2) was not allowed. gfortran assumed vector to be scalar. So, IMHO yes.

> Could the whole lot be transferred to resolve.c or would that make it
> horribly messy?

Again, IMO it is not easily transferable to primary.c or even resolve.c.
Therefore no.

> I couldn't apply the patch right now -
> does it work with variable expressions for the target array indices?

I am not quite sure what you mean. Something like this:

  associate(pam => im(2:3, 2:3))
pam = 9
pam(1,2) = 10
do c = 1, 2
pam(2, c) = 0
end do
  end associate

? I have added that to the testcase and it works. Or do you want the variable
expressions in the target, like this:

  integer :: expect(20)= 23
  integer :: im(4,5) = 23
  integer :: c

  expect(2:3) = 9
  do c = 1, 5
im = 23
associate(pam => im(:, c))
  pam(2:3) = 9
end associate
if (any (reshape(im, [20]) /= expect)) call abort()
! Shift expect
expect = [expect(17:), expect(:16)]
  end do

Will this do, or did you have something more elaborate in mind? This is also
working and in the testcase now.

Thanks for the review so far.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de

[PATCH 1/2][ARM] Record FPU features as a bit-set

2015-06-22 Thread Matthew Wahab


Hello,

The ARM backend records FPU features as booleans, one for each feature. This
means that adding support for a new feature involves updating every entry in the
list of FPU descriptions in arm-fpus.def. This patch series changes the
representation of FPU features to use a simple bit-set and flags, as is done
elsewhere.

This patch adds the new FPU feature representation, with feature sets
represented as unsigned longs.
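
The idea can be sketched as follows. The names fpu_fset, FL_* and fset_has are hypothetical C++ stand-ins chosen for illustration; the patch itself uses a typedef and preprocessor macros in arm.h.

```cpp
// Hypothetical feature-set type: one bit per FPU feature.
typedef unsigned long fpu_fset;

constexpr fpu_fset FL_NONE   = 0;
constexpr fpu_fset FL_NEON   = 1ul << 0;  // NEON instructions.
constexpr fpu_fset FL_FP16   = 1ul << 1;  // Half-precision.
constexpr fpu_fset FL_CRYPTO = 1ul << 2;  // Crypto extensions.

// True if every bit in features is also set in set.
constexpr bool fset_has(fpu_fset set, fpu_fset features)
{
  return (set & features) == features;
}

// Example description: an FPU with NEON and half-precision support.
constexpr fpu_fset example_fpu = FL_NEON | FL_FP16;
```

Adding a new feature then means adding one flag and setting it only in the descriptions that need it, instead of touching every entry. A function also sidesteps operator-precedence surprises when the tested feature is itself an expression such as FL_NEON | FL_FP16.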

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm.h (arm_fpu_fset): New.
(ARM_FPU_FSET_HAS): New.
(FPU_FL_NONE): New.
(FPU_FL_NEON): New.
(FPU_FL_FP16): New.
(FPU_FL_CRYPTO): New.
From 0ae697751afd9420ece15432e4892a60574b1d56 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 10 Jun 2015 09:57:55 +0100
Subject: [PATCH 1/2] Add fpu feature set definitions.

Change-Id: I9614d12b19f068ae2e0cebc1a6c3903972c73d6a
---
 gcc/config/arm/arm.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 373dc85..eadbcec 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -318,6 +318,19 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
   {"mode", "%{!marm:%{!mthumb:-m%(VALUE)}}"}, \
   {"tls", "%{!mtls-dialect=*:-mtls-dialect=%(VALUE)}"},
 
+/* FPU feature sets.  */
+
+typedef unsigned long arm_fpu_fset;
+
+/* Test for an FPU feature.  */
+#define ARM_FPU_FSET_HAS(S,F) (((S) & (F)) == F)
+
+/* FPU Features.  */
+#define FPU_FL_NONE	(0)
+#define FPU_FL_NEON	(1 << 0)	/* NEON instructions.  */
+#define FPU_FL_FP16	(1 << 1)	/* Half-precision.  */
+#define FPU_FL_CRYPTO	(1 << 2)	/* Crypto extensions.  */
+
 /* Which floating point model to use.  */
 enum arm_fp_model
 {
-- 
1.9.1

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Julian Brown

On Mon, 22 Jun 2015 16:24:56 +0200
Jakub Jelinek  wrote:

> On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
> > One problem is that (at least on the GPU hardware we've considered
> > so far) we're somewhat constrained in how much control we have over
> > how the underlying hardware executes code: it's possible to draw up
> > a scheme where OpenACC source-level control-flow semantics are
> > reflected directly in the PTX assembly output (e.g. to say "all
> > threads in a CTA/warp will be coherent after such-and-such a
> > loop"), and lowering OpenACC directives quite early seems to make
> > that relatively tractable. (Even if the resulting code is
> > relatively un-optimisable due to the abnormal edges inserted to
> > make sure that the CFG doesn't become "ill-formed".)
> > 
> > If arbitrary optimisations are done between OMP-lowering time and
> > somewhere around vectorisation (say), it's less clear if that
> > correspondence can be maintained. Say if the code executed by half
> > the threads in a warp becomes physically separated from the code
> > executed by the other half of the threads in a warp due to some loop
> > optimisation, we can no longer easily determine where that warp will
> > reconverge, and certain other operations (relying on coherent warps
> > -- e.g. CTA synchronisation) become impossible. A similar issue
> > exists for warps within a CTA.
> > 
> > So, essentially -- I don't know how "late" loop lowering would
> > interact with:
> > 
> > (a) Maintaining a CFG that will work with PTX.
> > 
> > (b) Predication for worker-single and/or vector-single modes
> > (actually all currently-proposed schemes have problems with proper
> > representation of data-dependencies for variables and
> > compiler-generated temporaries between predicated regions.)
> 
> I don't understand why lowering the way you suggest helps here at all.
> In the proposed scheme, you essentially have whole function
> in e.g. worker-single or vector-single mode, which you need to be
> able to handle properly in any case, because users can write such
> routines themselves.  And then you can have a loop in such a function
> that has some special attribute, a hint that it is desirable to
> vectorize it (for PTX the PTX way) or use vector-single mode for it
> in a worker-single function.  So, the special pass then of course
> needs to handle all the needed broadcasting and reduction required to
> change the mode from e.g. worker-single to vector-single, but the
> convergence points still would be either on the boundary of such
> loops to be vectorized or parallelized, or wherever else they appear
> in normal vector-single or worker-single functions (around the calls
> to certain calls?).

I think most of my concerns are centred around loops (with the markings
you suggest) that might be split into parts: if that cannot happen for
loops that are annotated as you describe, maybe things will work out OK.

(Apologies for my ignorance here, this isn't a part of the compiler
that I know anything about.)

Julian

[PATCH 2/2][ARM] Use new FPU features representation

2015-06-22 Thread Matthew Wahab


Hello,

This patch series changes the representation of FPU features to use a simple
bit-set and flags, as is done elsewhere.

This patch uses the new representation of FPU feature sets.
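
As a rough sketch of how a table entry with a single FEATURES argument turns
into a descriptor (this is not the code in arm.c): the list macro FPU_LIST,
the struct fpu_desc and the lookup helper find_fpu are hypothetical stand-ins,
and the real arm-fpus.def is pulled in with #include rather than through a
list macro.

```c
#include <string.h>

#define FPU_FL_NONE (0UL)
#define FPU_FL_NEON (1UL << 0)
#define FPU_FL_FP16 (1UL << 1)

struct fpu_desc
{
  const char *name;
  int rev;
  unsigned long features;
};

/* Stand-in for arm-fpus.def: a hypothetical list macro instead of a
   separate file pulled in with #include.  */
#define FPU_LIST \
  ARM_FPU ("vfpv3", 3, FPU_FL_NONE) \
  ARM_FPU ("neon", 3, FPU_FL_NEON) \
  ARM_FPU ("neon-fp16", 3, FPU_FL_NEON | FPU_FL_FP16)

/* Each ARM_FPU entry expands to one initializer of the table.  */
static const struct fpu_desc all_fpus[] =
{
#define ARM_FPU(NAME, REV, FEATURES) { NAME, REV, FEATURES },
  FPU_LIST
#undef ARM_FPU
};

/* Return the entry called NAME, or a null pointer if there is none.  */
static const struct fpu_desc *
find_fpu (const char *name)
{
  size_t i;
  for (i = 0; i < sizeof all_fpus / sizeof all_fpus[0]; i++)
    if (strcmp (all_fpus[i].name, name) == 0)
      return &all_fpus[i];
  return NULL;
}
```

Adding a new feature flag in this scheme only touches the entries that
actually have the feature, which is the point made in the cover text.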

Tested the series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-fpus.def: Replace neon, fp16 and crypto boolean
fields with feature flags.  Update comment.
* config/arm/arm.c (ARM_FPU): Update macro.
* config/arm/arm.h (TARGET_NEON_FP16): Update feature test.
(TARGET_FP16): Likewise.
(TARGET_CRYPTO): Likewise.
(TARGET_NEON): Likewise.
(struct arm_fpu_desc): Remove fields neon, fp16 and crypto.  Add
field features.

From 6f9cd1b41d7597d95bd80aa21344f8e6e011e168 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 10 Jun 2015 10:11:56 +0100
Subject: [PATCH 2/2] Use new FPU feature definitions.

Change-Id: I0c45e52b08b31433ec2b30fcb666584cabcb826b
---
 gcc/config/arm/arm-fpus.def | 40 
 gcc/config/arm/arm.c|  4 ++--
 gcc/config/arm/arm.h| 22 +-
 3 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index 2dfefd6..efd5896 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,30 +19,30 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO)
+  ARM_FPU(NAME, MODEL, REV, VFP_REGS, FEATURES)
 
The arguments are the fields of struct arm_fpu_desc.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_FPU("vfp",		ARM_FP_MODEL_VFP, 2, VFP_REG_D16, false, false, false)
-ARM_FPU("vfpv3",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
-ARM_FPU("vfpv3-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, true, false)
-ARM_FPU("vfpv3-d16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, false, false)
-ARM_FPU("vfpv3-d16-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, true, false)
-ARM_FPU("vfpv3xd",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, false, false)
-ARM_FPU("vfpv3xd-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("neon",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , false, false)
-ARM_FPU("neon-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true, true, false)
-ARM_FPU("vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, false, true, false)
-ARM_FPU("vfpv4-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_D16, false, true, false)
-ARM_FPU("fpv4-sp-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("fpv5-sp-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("fpv5-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_D16, false, true, false)
-ARM_FPU("neon-vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, true, true, false)
-ARM_FPU("fp-armv8",	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, false, true, false)
-ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, false)
+ARM_FPU("vfp",		ARM_FP_MODEL_VFP, 2, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU("vfpv3",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
+ARM_FPU("vfpv3-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("vfpv3-d16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU("vfpv3-d16-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("vfpv3xd",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_NONE)
+ARM_FPU("vfpv3xd-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("neon",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON)
+ARM_FPU("neon-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("vfpv4-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("fpv4-sp-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("fpv5-sp-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("fpv5-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("neon-vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("fp-armv8",	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
 ARM_FPU("crypto-neon-fp-armv8",
-			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, true)
+			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
 /* Compatibility aliases.  */
-ARM_FPU("vfp3",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
+ARM_FPU("vfp3",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e79a369..e104d2f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2231,8 +2231,8 @@ char arm_arch_name[] = "__ARM_ARCH_0UNK__";
 
 static const struct arm_fpu_desc all_fpus[] =
 {
-#define ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO) \
-  { NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO },
+#define ARM_FPU(NAME, MODE

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Bernd Schmidt


On 06/22/2015 04:24 PM, Jakub Jelinek wrote:

> I don't understand why lowering the way you suggest helps here at all.
> In the proposed scheme, you essentially have whole function
> in e.g. worker-single or vector-single mode, which you need to be able to
> handle properly in any case, because users can write such routines
> themselves.  And then you can have a loop in such a function that
> has some special attribute, a hint that it is desirable to vectorize it
> (for PTX the PTX way) or use vector-single mode for it in a worker-single
> function.


You can have a hint that it is desirable, but not a hint that it is 
correct (because passes in between may invalidate that). The OpenACC 
directives guarantee to the compiler that the program can be transformed 
into a parallel form. If we lose them early we must then rely on our 
analysis which may not be strong enough to prove that the loop can be 
parallelized. If we make these transformations early enough, while we 
still have the OpenACC directives, we can guarantee that we do exactly 
what the programmer specified.



Bernd

[ping] Couple of patches for -fdump-ada-spec

2015-06-22 Thread Eric Botcazou

Add query for template-dependent arguments to -fdump-ada-spec:
  http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00403.html

Get rid of assembly file with -fdump-ada-spec:
  http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00420.html

Thanks in advance.

-- 
Eric Botcazou

Re: [Ping, Patch, fortran, 64674, v3] [OOP] ICE in ASSOCIATE with class array

2015-06-22 Thread Paul Richard Thomas

Dear Andre,

It was indeed the associate(pam => im(:, c)) that I had in mind. If
you have that working and in the testcase, that's good enough for me.

Cheers

Paul

On 22 June 2015 at 17:15, Andre Vehreschild  wrote:
> Hi Paul,
>
> On Mon, 22 Jun 2015 16:04:09 +0200
> Paul Richard Thomas  wrote:
>
>> Hi Andre,
>>
>> Some questions: The first and second chunks look a bit awkward in
>> parse.c. Do they have to be there in order that primary.c does the
>> right thing?
>
> I tried at first to do this rank resolution in primary.c, but that was too
> late. parse.c needs to propagate the rank correctly. If I remember correctly,
> doing so later prevents parse.c from correctly recognizing the vector (from
> the example in the initial description) as such, i.e., the indexing in
> vector(2) was not allowed. gfortran assumed vector to be scalar. So, IMHO yes.
>
>> Could the whole lot be transferred to resolve.c or would that make it
>> horribly messy?
>
> Again, IMO it is not easily transferable to primary.c or even resolve.c.
> Therefore no.
>
>> I couldn't apply the patch right now -
>> does it work with variable expressions for the target array indices?
>
> I am not quite sure what you mean. Something like this:
>
>   associate(pam => im(2:3, 2:3))
>     pam = 9
>     pam(1,2) = 10
>     do c = 1, 2
>       pam(2, c) = 0
>     end do
>   end associate
>
> ? I have added that to the testcase and it works. Or do you want the variable
> expressions in the target, like this:
>
>   integer :: expect(20)= 23
>   integer :: im(4,5) = 23
>   integer :: c
>
>   expect(2:3) = 9
>   do c = 1, 5
>     im = 23
>     associate(pam => im(:, c))
>       pam(2:3) = 9
>     end associate
>     if (any (reshape(im, [20]) /= expect)) call abort()
>     ! Shift expect
>     expect = [expect(17:), expect(:16)]
>   end do
>
> Will this do, or did you have something more elaborate in mind? This is also
> working and in the testcase now.
>
> Thanks for the review so far.
>
> Regards,
> Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx

RFA: Fix isl-ast-gen-if-1.c test

2015-06-22 Thread Nick Clifton

Hi Guys,

  The test file gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c
  was generating an unexpected failure for the RX.  When I investigated
  I found that a return address on the stack was being corrupted, and I
  tracked it down to the foo() function:

foo (int a[], int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (i < 25)
        a[i] = i;
      a[n - i] = 1;
    }
}

  The problem is that when i is 0, the line a[n - i] writes to a[50]
  which is beyond the end of the a array.  (In the RX case it writes
  over the return address on the stack).
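
  To make the index range concrete, here is a small sketch (not part of
  the test) that computes the highest index the loop body would write.
  The helper name highest_index_written and its guarded parameter are
  made up for this illustration; n = 50 is taken from the a[50] remark
  above.

```c
/* Highest array index the loop body of foo() would write for a given N.
   If GUARDED is nonzero, the a[n - i] store is skipped when i == 0, as
   in the proposed fix.  Returns -1 if nothing is written.  */
static int
highest_index_written (int n, int guarded)
{
  int i, highest = -1;
  for (i = 0; i < n; i++)
    {
      if (i < 25 && i > highest)
        highest = i;                 /* a[i] = i;  */
      if ((!guarded || i > 0) && n - i > highest)
        highest = n - i;             /* a[n - i] = 1;  */
    }
  return highest;
}
```

  For n = 50 the unguarded loop reaches index 50, one past the end of a
  50-element array, while the guarded loop stops at index 49.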

  The patch below fixes the problem, although it could also be solved by
  increasing the size of the a array when it is declared in main().

  OK to apply ?

Cheers
  Nick

gcc/testsuite/ChangeLog
2015-06-22  Nick Clifton  

* gcc.dg/graphite/isl-ast-gen-if-1.c (foo): Prevent writing after
the end of the array.

Index: gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c
===
--- gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(revision 224722)
+++ gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(working copy)
@@ -10,7 +10,8 @@
 {
   if (i < 25)
 a[i] = i;
-  a[n - i] = 1;
+  if (i > 0)
+   a[n - i] = 1;
 }
 }

Re: [i386, PATCH, 2/3] IA MCU psABI support: changes to libraries.

2015-06-22 Thread Kirill Yukhin

Hello,
The patch at the bottom adds support for the IA MCU psABI to libgcc
(enabling soft-fp) and to libdecnumber (enabling it for IA MCU).

Bootstrapped and regtested on top of [1/3] patch.

config/
* dfp.m4 (enable_decimal_float): Also set to yes for
 i?86*-*-elfiamcu target.
gcc/
* configure: Regenerated.
libdecnumber/
* configure: Regenerated.
libgcc/
* config.host: Support i[34567]86-*-elfiamcu target.
* config/i386/32/t-iamcu: New file.
* configure: Regenerated.

Is it OK for trunk?

--
Thanks, K

diff --git a/config/dfp.m4 b/config/dfp.m4
index 48683f0..5b29089 100644
--- a/config/dfp.m4
+++ b/config/dfp.m4
@@ -21,7 +21,7 @@ Valid choices are 'yes', 'bid', 'dpd', and 'no'.]) ;;
 [
   case $1 in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes
diff --git a/gcc/configure b/gcc/configure
index b26a86f..64eeac6 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7317,7 +7317,7 @@ else

   case $target in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes
diff --git a/libdecnumber/configure b/libdecnumber/configure
index 2720f46..964837d 100755
--- a/libdecnumber/configure
+++ b/libdecnumber/configure
@@ -4614,7 +4614,7 @@ else

   case $target in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes
diff --git a/libgcc/config.host b/libgcc/config.host
index 4df..dd8e356 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -562,6 +562,9 @@ x86_64-*-darwin*)
tm_file="$tm_file i386/darwin-lib.h"
extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
;;
+i[34567]86-*-elfiamcu)
+   tmake_file="$tmake_file i386/t-crtstuff t-softfp-sfdf i386/32/t-softfp i386/32/t-iamcu i386/t-softfp t-softfp t-dfprules"
+   ;;
 i[34567]86-*-elf*)
tmake_file="$tmake_file i386/t-crtstuff t-crtstuff-pic t-libgcc-pic"
;;
diff --git a/libgcc/config/i386/32/t-iamcu b/libgcc/config/i386/32/t-iamcu
new file mode 100644
index 000..0752bff
--- /dev/null
+++ b/libgcc/config/i386/32/t-iamcu
@@ -0,0 +1,6 @@
+softfp_float_modes += tf
+softfp_extensions += sftf dftf xftf
+softfp_truncations += tfsf tfdf tfxf
+softfp_exclude_libgcc2 := n
+
+HOST_LIBGCC2_CFLAGS += -mlong-double-80
diff --git a/libgcc/configure b/libgcc/configure
index ce66d1d..e22cbcb 100644
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -4436,7 +4436,7 @@ else

   case $host in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes

Re: RFA: Fix isl-ast-gen-if-1.c test

2015-06-22 Thread Jeff Law


On 06/22/2015 09:38 AM, Nick Clifton wrote:

> Hi Guys,
>
>    The test file gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c
>    was generating an unexpected failure for the RX.  When I investigated
>    I found that a return address on the stack was being corrupted, and I
>    tracked it down to the foo() function:
>
>  foo (int a[], int n)
>  {
>    int i;
>    for (i = 0; i < n; i++)
>      {
>        if (i < 25)
>          a[i] = i;
>        a[n - i] = 1;
>      }
>  }
>
>    The problem is that when i is 0, the line a[n - i] writes to a[50]
>    which is beyond the end of the a array.  (In the RX case it writes
>    over the return address on the stack).
>
>    The patch below fixes the problem, although it could also be solved by
>    increasing the size of the a array when it is declared in main().
>
>    OK to apply ?
>
> Cheers
>    Nick
>
> gcc/testsuite/ChangeLog
> 2015-06-22  Nick Clifton  
>
> * gcc.dg/graphite/isl-ast-gen-if-1.c (foo): Prevent writing after
> the end of the array.
I'd tend to prefer to change the size of the array -- adding another 
conditional in the loop may have unintended consequences that possibly 
scramble things just enough to compromise the test.
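
A sketch of what that alternative could look like, keeping the loop body
untouched. The declaration in the real test's main() is not shown in this
thread, so the array name data, the constant N and the driver run_foo are
assumptions made for illustration.

```c
#define N 50

/* One element larger than N so that data[N] is a valid element.  The
   declaration in the real test's main() is not shown in this thread;
   this layout is an assumption.  */
static int data[N + 1];

/* Loop body exactly as in the original test.  */
static void
foo (int a[], int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (i < 25)
        a[i] = i;
      a[n - i] = 1;
    }
}

/* Hypothetical driver: run foo and report the value stored in data[N].  */
static int
run_foo (void)
{
  foo (data, N);
  return data[N];
}
```

With the extra element, the store made when i is 0 lands inside the array
and the control flow that graphite sees stays the same as before.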


jeff

[PATCH 1/4][ARM] Make room for more CPU feature flags.

2015-06-22 Thread Matthew Wahab


Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. To be able to support new architecture features, the
current representation will need to be replaced so that more flags can be
recorded.

This series of patches replaces the single unsigned long with a representation
based on an array of unsigned longs. Constructors and operations are explicitly
defined for the new representation and the backend is updated to use the new
operations.

The individual patches:
- Make architecture flags explicit in arm-cores.def, to prepare for the changes.
- Add definitions for the new representation as type arm_feature_set and macros
  with prefix ARM_FSET.
- Replace uses of the old representation with the arm_feature_set type and
  operations.
- Rework arm-cores.def and arm-arches.def to make the feature set constructions
  explicit.
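
To give a feel for the shape of such a representation, here is a minimal
sketch. It is not the code from the series: the type name feature_set and the
functions fset_union and fset_subset_p are hypothetical, and functions are
used here where the patches define macros.

```c
/* Sketch of a feature set spread over two words.  */
typedef struct
{
  unsigned long cpu[2];
} feature_set;

/* Union of two sets, word by word.  */
static feature_set
fset_union (feature_set x, feature_set y)
{
  feature_set r;
  r.cpu[0] = x.cpu[0] | y.cpu[0];
  r.cpu[1] = x.cpu[1] | y.cpu[1];
  return r;
}

/* Nonzero if every feature in X is also in Y.  */
static int
fset_subset_p (feature_set x, feature_set y)
{
  return (x.cpu[0] & y.cpu[0]) == x.cpu[0]
         && (x.cpu[1] & y.cpu[1]) == x.cpu[1];
}
```

Every operation has to visit both words, which is why the series defines
explicit constructors and operations instead of relying on plain | and &.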

The series was tested for arm-none-linux-gnueabihf with check-gcc.

This patch moves the derived FL_FOR_ARCH##ARCH flags from the expansion of macro
arm.c/ARM_CORE and makes them explicit in the entries in arm-cores.def.
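
A reduced illustration of the difference, assuming placeholder values for the
FL_FOR_ARCH flags (their real definitions are not shown in this message). The
macro names CORE_FLAGS_DERIVED and CORE_FLAGS_EXPLICIT are made up for this
sketch and do not appear in arm.c.

```c
#define FL_CO_PROC   (1UL << 0)
#define FL_MODE26    (1UL << 2)
/* Placeholder values: the real FL_FOR_ARCH* definitions are not shown
   in this message.  */
#define FL_FOR_ARCH2 (1UL << 8)
#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | (1UL << 9))

/* Before: the architecture flags are derived inside the macro by
   pasting the ARCH argument onto FL_FOR_ARCH.  */
#define CORE_FLAGS_DERIVED(ARCH, FLAGS) ((FLAGS) | FL_FOR_ARCH##ARCH)

/* After: the entry itself spells out every flag.  */
#define CORE_FLAGS_EXPLICIT(FLAGS) (FLAGS)

static const unsigned long arm2_derived
  = CORE_FLAGS_DERIVED (2, FL_CO_PROC | FL_MODE26);
static const unsigned long arm2_explicit
  = CORE_FLAGS_EXPLICIT (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2);
```

Both forms yield the same flags; the explicit form simply no longer depends
on token pasting, which would be awkward once the flags are no longer a plain
integer expression.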

This patch was tested for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

2015-06-22  Matthew Wahab  

* gcc/config/arm/arm-cores.def: Add FL_FOR_ARCH flag for each
ARM_CORE entry.  Fix some white-space.
* gcc/config/arm/arm.c: Remove FL_FOR_ARCH derivation from
ARM_CORE definition.
From b8d4b4ef938d64996d0d20aaa9974757057aaad2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 5 Jun 2015 12:33:34 +0100
Subject: [PATCH 1/4] [ARM] Make ARCH flags explicit in arm-cores.def

Change-Id: I13a79c89bebaf82aa921f0502b721ff5d9b92dbe
---
 gcc/config/arm/arm-cores.def | 200 +--
 gcc/config/arm/arm.c |   2 +-
 2 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..f362c27 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -43,134 +43,134 @@
Some tools assume no whitespace up to the first "," in each entry.  */
 
 /* V2/V2A Architecture Processors */
-ARM_CORE("arm2", 	arm2, arm2,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm250", 	arm250, arm250,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm2",	arm2, arm2,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE("arm250",	arm250, arm250,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
 
 /* V3 Architecture Processors */
-ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7di",	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm70",	arm70, arm70,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm700",	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm700i",	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710",	arm710, arm710,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm720",	arm720, arm720,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710c",	arm710c, arm710c,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7100",	arm7100, arm7100,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7500",	arm7500, arm7500,	3, FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7di",	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm70",	arm70, arm70,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm700",	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm700i",	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm710",	arm710, arm710,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm720",	arm720, arm720,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slow

Re: genmatch: guess the type of a?b:c as b instead of a

2015-06-22 Thread Jeff Law


On 06/06/2015 05:34 AM, Marc Glisse wrote:

> Hello,
>
> as discussed around
> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00041.html
> we are currently guessing the type of a?b:c incorrectly. This does not
> affect current simplifications, because the only 'cond' in output
> patterns are at the outermost level, so their type is forced to 'type'
> and never guessed. Indeed, the patch does not change the generated
> *-match.c. It would allow removing an explicit cond:itype in a patch
> posted by Jeff.
>
> I tested it on a dummy .pd file containing:
> (simplify
>   (plus @0 (plus @1 @2))
>   (negate (cond @0 @1 @2)))
>
> and the generated files differ by:
>
> -  res = fold_build3_loc (loc, COND_EXPR, TREE_TYPE (ops1[0]), ops1[0],
> ops1[1], ops1[2]);
> +  res = fold_build3_loc (loc, COND_EXPR, TREE_TYPE (ops1[1]), ops1[0],
> ops1[1], ops1[2]);
>
> (and something similar for gimple)
>
> I wondered about using something like
> VOID_TYPE_P (TREE_TYPE (ops1[1])) ? TREE_TYPE (ops1[2]) : TREE_TYPE
> (ops1[1])
> but I don't think that will be necessary.
>
> Bootstrap is currently broken on many platforms with comparison
> failures, but since it went that far and generated the same *-match.c
> files, that seems sufficient testing.
>
> 2015-06-08  Marc Glisse  
>
>  * genmatch.c (expr::gen_transform): For conditions, guess the type
>  from the second operand.
Thanks for taking care of this.  I'd gone back and verified the type was 
needed, but didn't have time to reduce a testcase for it prior to going 
on PTO.
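
As a loose source-level analogy (this is plain C, not genmatch output): in C
the type of a ?: expression likewise comes from its second and third operands
and not from the condition. The function name cond_result_size is made up for
this sketch.

```c
#include <stddef.h>

/* Size of the result of a conditional expression whose condition is a
   char and whose arms are both double.  sizeof does not evaluate its
   operand, so only the type matters.  */
static size_t
cond_result_size (char c)
{
  return sizeof (c ? 1.0 : 2.0);
}
```

The result has the size of a double whatever the type of the condition is,
which matches the intent of taking the type from the second operand.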


Jeff

[PATCH 2/4][ARM] Add feature set definitions.

2015-06-22 Thread Matthew Wahab


Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch adds, but doesn't use, type arm_feature_set and macros prefixed
with ARM_FSET to represent and operate on feature sets.

Tested by building with no errors. Also tested as part of the series, for
arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-protos.h (FL_NONE): New.
(FL_ANY): New.
(arm_feature_set): New.
(ARM_FSET_MAKE): New.
(ARM_FSET_MAKE_CPU1): New.
(ARM_FSET_MAKE_CPU2): New.
(ARM_FSET_CPU1): New.
(ARM_FSET_CPU2): New.
(ARM_FSET_EMPTY): New.
(ARM_FSET_ANY): New.
(ARM_FSET_HAS_CPU1): New.
(ARM_FSET_HAS_CPU2): New.
(ARM_FSET_ADD_CPU1): New.
(ARM_FSET_ADD_CPU2): New.
(ARM_FSET_DEL_CPU1): New.
(ARM_FSET_DEL_CPU2): New.
(ARM_FSET_UNION): New.
(ARM_FSET_INTER): New.
(ARM_FSET_XOR): New.
(ARM_FSET_EXCLUDE): New.
(ARM_FSET_IS_EMPTY): New.
(ARM_FSET_CPU_SUBSET): New.

From 1a98a80b64427f7bb97212ae9ecff515e980ddb7 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 4 Jun 2015 15:35:25 +0100
Subject: [PATCH 2/4] Add feature set definitions.

Change-Id: I5f89b46ea57e35f477ec4751fea3cb6ee8fce251
---
 gcc/config/arm/arm-protos.h | 101 
 1 file changed, 101 insertions(+)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 62f91ef..a19d54d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -346,6 +346,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 /* Flags used to identify the presence of processor capabilities.  */
 
 /* Bit values used to identify processor capabilities.  */
+#define FL_NONE	  (0)	  /* No flags.  */
+#define FL_ANY	  (0x)/* All flags.  */
 #define FL_CO_PROC(1 << 0)/* Has external co-processor bus */
 #define FL_ARCH3M (1 << 1)/* Extended multiply */
 #define FL_MODE26 (1 << 2)/* 26-bit mode support */
@@ -412,6 +414,105 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
 
+/* There are too many feature bits to fit in a single word so the set of cpu and
+   fpu capabilities is a structure.  A feature set is created and manipulated
+   with the ARM_FSET macros.  */
+
+typedef struct
+{
+  unsigned long cpu[2];
+} arm_feature_set;
+
+
+/* Initialize a feature set.  */
+
+#define ARM_FSET_MAKE(CPU1,CPU2) { { (CPU1), (CPU2) } }
+
+#define ARM_FSET_MAKE_CPU1(CPU1) ARM_FSET_MAKE ((CPU1), (FL_NONE))
+#define ARM_FSET_MAKE_CPU2(CPU2) ARM_FSET_MAKE ((FL_NONE), (CPU2))
+
+/* Accessors.  */
+
+#define ARM_FSET_CPU1(S) ((S).cpu[0])
+#define ARM_FSET_CPU2(S) ((S).cpu[1])
+
+/* Useful combinations.  */
+
+#define ARM_FSET_EMPTY ARM_FSET_MAKE (FL_NONE, FL_NONE)
+#define ARM_FSET_ANY ARM_FSET_MAKE (FL_ANY, FL_ANY)
+
+/* Tests for a specific CPU feature.  */
+
+#define ARM_FSET_HAS_CPU1(A, F)  (((A).cpu[0] & (F)) == F)
+#define ARM_FSET_HAS_CPU2(A, F)  (((A).cpu[1] & (F)) == F)
+
+/* Add a feature to a feature set.  */
+
+#define ARM_FSET_ADD_CPU1(DST, F)		\
+  do {		\
+(DST).cpu[0] |= (F);			\
+  } while (0)
+
+#define ARM_FSET_ADD_CPU2(DST, F)		\
+  do {		\
+(DST).cpu[1] |= (F);			\
+  } while (0)
+
+/* Remove a feature from a feature set.  */
+
+#define ARM_FSET_DEL_CPU1(DST, F)		\
+  do {		\
+(DST).cpu[0] &= ~(F);			\
+  } while (0)
+
+#define ARM_FSET_DEL_CPU2(DST, F)		\
+  do {		\
+(DST).cpu[1] &= ~(F);			\
+  } while (0)
+
+/* Union of feature sets.  */
+
+#define ARM_FSET_UNION(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] | (F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] | (F2).cpu[1];	\
+  } while (0)
+
+/* Intersection of feature sets.  */
+
+#define ARM_FSET_INTER(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] & (F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] & (F2).cpu[1];	\
+  } while (0)
+
+/* Exclusive disjunction.  */
+
+#define ARM_FSET_XOR(DST,F1,F2)\
+  do {			\
+(DST).cpu[0] = (F1).cpu[0] ^ (F2).cpu[0];		\
+(DST).cpu[1] = (F1).cpu[1] ^ (F2).cpu[1];		\
+  } while (0)
+
+/* Difference of feature sets: F1 excluding the elements of F2.  */
+
+#define ARM_FSET_EXCLUDE(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] & ~(F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] & ~(F2).cpu[1];	\
+  } while (0)
+
+/* Test for an empty feature set.  */
+
+#define ARM_FSET_IS_EMPTY(A)		\
+  (!((A).cpu[0]) && !((A).cpu[1]))
+
+/* Tests whether the cpu features of A are a subset of B.  */
+
+#define ARM_FSET_CPU_SUBSET(A,B)	\
+  ((((A).cpu[0] & (B).cpu[0]) == (A).cpu[0])\
+   && (((A).cpu[1] &

Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-06-22 Thread Jiong Wang



On 04/05/15 17:37, Alexander Monakov wrote:

> This patch introduces option -fno-plt that allows to expand calls that would
> go via PLT to load the address of the function immediately at call site (which
> introduces a GOT load).  Cover letter explains the motivation for this patch.
>
> New option documentation for invoke.texi is missing from the patch; if this is
> accepted I'll be happy to send a v2 with documentation added.
>
> * calls.c (prepare_call_address): Transform PLT call to GOT lookup and
> indirect call by forcing address into a pseudo with -fno-plt.
> * common.opt (flag_plt): New option.


I have done a quick experiment: -fno-plt doesn't work on AArch64.

Although this patch forces the function address into a register, the combine
pass, which runs later, combines it back into a direct call because AArch64
defines such an insn pattern.
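
For readers less familiar with the two call forms, here is a source-level
analogy in C. It is only an analogy: the real transformation happens on RTL in
prepare_call_address, and the function names and the volatile pointer below
are devices of this sketch, not anything from the patch.

```c
/* A function standing in for one that would live in a shared library.  */
static int
callee (int x)
{
  return x + 1;
}

/* Direct call by symbol: in PIC code calling into a shared library,
   this is the form that would normally go through the PLT.  */
static int
call_direct (int x)
{
  return callee (x);
}

/* Address loaded first, then an indirect call.  The volatile pointer
   is only a source-level device of this sketch to keep the compiler
   from folding the call back into a direct one.  */
static int
call_via_register (int x)
{
  int (*volatile fn) (int) = callee;
  return fn (x);
}
```

Folding the second form back into the first is, roughly speaking, what
combine does on AArch64 when it merges the address load into the call.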

On x86 it is not combined back. Judging from the RTL dump, that is because the
RTL PRE pass has moved the address load instruction into another basic block,
and the combine pass doesn't combine across basic blocks. The x86 backend also
checks flag_plt in the newly added ix86_nopic_noplt_attribute_p, which could
help generate the correct insns.


The fix I can think of for AArch64 is to allow the call-symbol pattern only
under "flag_plt == true", so that a call via register can't be combined into a
direct call to the symbol.

Or would it be better to prohibit the combine pass from doing such combining?
A generic fix in combine may also fix other broken targets.

Thoughts?

Regards,
Jiong

[PATCH 3/4][ARM] Use new feature set representation.

2015-06-22 Thread Matthew Wahab


Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch replaces the existing representation of CPU feature sets with the
type arm_feature_set and ARM_FSET macros added in an earlier patch in this
series.
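
One detail that may be worth keeping in mind when reading the def_mbuiltin
change in the diff below, shown here as a stand-alone sketch: a test of the
form "(MASK) & flags" succeeds if any bit of the mask is present, whereas a
test of the form "(flags & F) == F" requires all of them. The feature bits
FEAT_A and FEAT_B and the two helper functions are hypothetical; for a
single-bit flag the two tests agree.

```c
#define FEAT_A (1UL << 0)	/* Hypothetical feature bits.  */
#define FEAT_B (1UL << 1)

/* Old form, as in "(MASK) & insn_flags": nonzero if any bit of MASK
   is present in FLAGS.  */
static int
any_bit_set (unsigned long flags, unsigned long mask)
{
  return (mask & flags) != 0;
}

/* New form, in the spirit of ARM_FSET_HAS_CPU1: nonzero only if every
   bit of F is present in FLAGS.  */
static int
all_bits_set (unsigned long flags, unsigned long f)
{
  return (flags & f) == f;
}
```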

Tested for arm-none-linux-gnueabihf with check-gcc. Also tested as part of the
series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-builtins.c (def_mbuiltin): Use ARM_FSET macro.
* config/arm/arm-protos.h (insn_flags): Declare as type
arm_feature_set.
(tune_flags): Likewise.
* config/arm/arm.c (feature_count): New.
(insn_flags): Define as type arm_feature_set.
(tune_flags): Likewise.
(struct processors): Define field flags as type arm_feature_set.
(all_cores): Update for change to struct processors.
(all_architectures): Likewise.
(arm_option_check_internal): Use arm_feature_set and ARM_FSET macros.
(arm_option_override_internal): Likewise.
(arm_option_override): Likewise.

From 8b5e132868da066eb8a8673286b796656b9ed127 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 8 Jun 2015 14:11:13 +0100
Subject: [PATCH 3/4] Use feature sets.

Change-Id: I5a1b162102dd19b6376637218dc548502112cf4b
---
 gcc/config/arm/arm-builtins.c |   4 +-
 gcc/config/arm/arm-protos.h   |   4 +-
 gcc/config/arm/arm.c  | 131 --
 3 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index f960e0a..31203d4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1074,10 +1074,10 @@ arm_init_neon_builtins (void)
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
-#define def_mbuiltin(MASK, NAME, TYPE, CODE)\
+#define def_mbuiltin(FLAG, NAME, TYPE, CODE)\
   do	\
 {	\
-  if ((MASK) & insn_flags)		\
+  if (ARM_FSET_HAS_CPU1 (insn_flags, (FLAG)))			\
 	{\
 	  tree bdecl;			\
 	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index a19d54d..859b5d2 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -515,11 +515,11 @@ typedef struct
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-extern unsigned long insn_flags;
+extern arm_feature_set insn_flags;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-extern unsigned long tune_flags;
+extern arm_feature_set tune_flags;
 
 /* Nonzero if this chip supports the ARM Architecture 3M extensions.  */
 extern int arm_arch3m;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b21f433..dd892a7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -105,6 +105,7 @@ static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			 HOST_WIDE_INT, rtx, rtx, int, int);
 static unsigned bit_count (unsigned long);
+static unsigned feature_count (const arm_feature_set*);
 static int arm_address_register_rtx_p (rtx, int);
 static int arm_legitimate_index_p (machine_mode, rtx, RTX_CODE, int);
 static bool is_called_in_ARM_mode (tree);
@@ -771,11 +772,11 @@ static int thumb_call_reg_needed;
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-unsigned long insn_flags = 0;
+arm_feature_set insn_flags = ARM_FSET_EMPTY;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-unsigned long tune_flags = 0;
+arm_feature_set tune_flags = ARM_FSET_EMPTY;
 
 /* The highest ARM architecture version supported by the
target.  */
@@ -928,7 +929,7 @@ struct processors
   enum processor_type core;
   const char *arch;
   enum base_architecture base_arch;
-  const unsigned long flags;
+  const arm_feature_set flags;
   const struct tune_params *const tune;
 };
 
@@ -2197,10 +2198,10 @@ static const struct processors all_cores[] =
   /* ARM Cores */
 #define ARM_CORE(NAME, X, IDENT, ARCH, FLAGS, COSTS) \
   {NAME, IDENT, #ARCH, BASE_ARCH_##ARCH,	  \
-FLAGS, &arm_##COSTS##_tune},
+   ARM_FSET_MAKE_CPU1 (FLAGS), &arm_##COSTS##_tune},
 #include "arm-cores.def"
 #undef ARM_CORE
-  {NULL, arm_none, NULL, BASE_ARCH_0, 0, NULL}
+  {NULL, arm_none, NULL, BASE_ARCH_0, ARM_FSET_EMPTY, NULL}
 };
 
 static const struct processors all_architectures[] =
@@ -2210,10 +2211,10 @@ static const struct processors all_architectures[] =
  from the core.  */
 
 #define ARM_ARCH(NAME, CORE, ARCH, FLAGS) \
-  {NAME, CORE, #ARCH, BASE_ARCH_##ARCH, FLAGS, NULL},
+  {NAME, CORE, #ARCH, BASE_ARCH_##ARCH, ARM_FSET_MAKE_CPU1 (FLAGS), NULL},
 #include "arm-arches.def"

[PATCH 4/4][ARM] Move initializer into arm-cores.def and arm-arches.def

2015-06-22 Thread Matthew Wahab


Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch updates the entries in the arm-cores.def and arm-arches.def files
for the new arm_feature_set representation, moving the initializers from a macro
expansion and making them explicit in the file entries.

Tested for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-arches.def: Replace single value flags with
initializer built from ARM_FSET_MAKE_CPU1.
* config/arm/arm-cores.def: Likewise.
* config/arm/arm.c (all_cores): Remove ARM_FSET_MAKE_CPU1
derivation from the ARM_CORE macro definition, use the given value
instead.
(all_architectures): Remove ARM_FSET_MAKE_CPU1 derivation from the
ARM_ARCH macro definition, use the given value instead.

From 389cfb0e1046b1d84dd3d8920aa5bed50dc19164 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 8 Jun 2015 16:15:52 +0100
Subject: [PATCH 4/4] Move feature sets into core and arch def files.

Change-Id: Ica484c7d9f46413c196b26a630ff49413b10289b
---
 gcc/config/arm/arm-arches.def |  56 ++--
 gcc/config/arm/arm-cores.def  | 200 +-
 gcc/config/arm/arm.c  |   4 +-
 3 files changed, 130 insertions(+), 130 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..6d0374a 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -28,33 +28,33 @@
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_ARCH("armv2",   arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH("armv2a",  arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH("armv3",   arm6,   3,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3)
-ARM_ARCH("armv3m",  arm7m,  3M,  FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M)
-ARM_ARCH("armv4",   arm7tdmi,   4,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4)
+ARM_ARCH("armv2",   arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv2a",  arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv3",   arm6,   3,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3))
+ARM_ARCH("armv3m",  arm7m,  3M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M))
+ARM_ARCH("armv4",   arm7tdmi,   4,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4))
 /* Strictly, FL_MODE26 is a permitted option for v4t, but there are no
implementations that support it, so we will leave it out for now.  */
-ARM_ARCH("armv4t",  arm7tdmi,   4T,  FL_CO_PROC | FL_FOR_ARCH4T)
-ARM_ARCH("armv5",   arm10tdmi,  5,   FL_CO_PROC | FL_FOR_ARCH5)
-ARM_ARCH("armv5t",  arm10tdmi,  5T,  FL_CO_PROC | FL_FOR_ARCH5T)
-ARM_ARCH("armv5e",  arm1026ejs, 5E,  FL_CO_PROC | FL_FOR_ARCH5E)
-ARM_ARCH("armv5te", arm1026ejs, 5TE, FL_CO_PROC | FL_FOR_ARCH5TE)
-ARM_ARCH("armv6",   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
-ARM_ARCH("armv6j",  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
-ARM_ARCH("armv6k",  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
-ARM_ARCH("armv6z",  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
-ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
-ARM_ARCH("armv6-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH("armv6s-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH("armv7",   cortexa8,	7,   FL_CO_PROC |	  FL_FOR_ARCH7)
-ARM_ARCH("armv7-a", cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7A)
-ARM_ARCH("armv7ve", cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7VE)
-ARM_ARCH("armv7-r", cortexr4,	7R,  FL_CO_PROC |	  FL_FOR_ARCH7R)
-ARM_ARCH("armv7-m", cortexm3,	7M,  FL_CO_PROC |	  FL_FOR_ARCH7M)
-ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |	  FL_FOR_ARCH7EM)
-ARM_ARCH("armv8-a", cortexa53,  8A,  FL_CO_PROC | FL_FOR_ARCH8A)
-ARM_ARCH("armv8-a+crc",cortexa53, 8A,FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A)
-ARM_ARCH("iwmmxt",  iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
-ARM_ARCH("iwmmxt2", iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
+ARM_ARCH("armv4t",  arm7tdmi,   4T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH4T))
+ARM_ARCH("armv5",   arm10tdmi,  5,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5))
+ARM_ARCH("armv5t",  arm10tdmi,  5T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5T))
+ARM_ARCH("armv5e",  arm1026ejs, 5E,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5E))
+ARM_ARCH("armv5te", arm1026ejs, 5TE,	ARM_FSET_MAKE_CPU

Re: RFA: Fix isl-ast-gen-if-1.c test

2015-06-22 Thread Nicholas Clifton


Hi Jeff,


I'd tend to prefer to change the size of the array -- adding another
conditional in the loop may have unintended consequences that possibly
scramble things just enough to compromise the test.


Okey dokey, here is a revised version.  Is this one OK ?

Cheers
  Nick

gcc/ChangeLog

2015-06-22  Nick Clifton  

* gcc.dg/graphite/isl-ast-gen-if-1.c (main): Increase size of a
array to allow a[50] to be a valid location.

Index: gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c
===
--- gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(revision 224722)
+++ gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(working copy)
@@ -28,7 +28,7 @@
 int
 main (void)
 {
-  int a[50];
+  int a[51]; /* NB This size allows foo's first iteration to write to a[50].  */

   foo (a, 50);
   int res = array_sum (a);
   if (res != 49)

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Nathan Sidwell


On 06/22/15 11:18, Bernd Schmidt wrote:


You can have a hint that it is desirable, but not a hint that it is correct
(because passes in between may invalidate that). The OpenACC directives
guarantee to the compiler that the program can be transformed into a parallel
form. If we lose them early we must then rely on our analysis which may not be
strong enough to prove that the loop can be parallelized. If we make these
transformations early enough, while we still have the OpenACC directives, we can
guarantee that we do exactly what the programmer specified.


How does this differ from OpenMP's need to preserve parallelism on a parallel
loop?  Is it more than the reconvergence issue?


nathan

--
Nathan Sidwell

Re: [C++/58583] ICE instantiating NSDMIs

2015-06-22 Thread Nathan Sidwell


On 06/22/15 03:37, Andreas Schwab wrote:

Nathan Sidwell  writes:


On 06/20/15 02:09, Andreas Schwab wrote:

This also fails on powerpc.


what  is the build compiler?


It is a bootstrapped build, so the build compiler should not matter.


ok, thanks.

I've just built a powerpc-linux-targeting compiler on an x86_64-linux host.
That shows the expected diagnostic, so I'm still unable to reproduce the
failure.


nsidwell@build6-lucid-cs:19>install/bin/powerpc-linux-gnu-g++ -std=c++11 -c nsdmi-template14.C
nsdmi-template14.C:6:20: error: constructor required before non-static data member for 'A<0>::i' has been parsed
   int i = (A<0>(), 0); // { dg-error "has been parsed" }
                    ^
nsdmi-template14.C: In constructor 'constexpr A<0>::A()':
nsdmi-template14.C:4:22: error: constructor required before non-static data member for 'A<0>::i' has been parsed
 template<int> struct A // { dg-error "has been parsed" }
                      ^
nsdmi-template14.C: At global scope:
nsdmi-template14.C:6:20: note: synthesized method 'constexpr A<0>::A()' first required here
   int i = (A<0>(), 0); // { dg-error "has been parsed" }
                    ^
nsdmi-template14.C:14:6: error: recursive instantiation of non-static data member initializer for 'B<1>::p'
 B<1> x; // { dg-error "recursive instantiation of non-static data" }
      ^
nathan

Re: [PATCH] top-level for libvtv: use normal (not raw_cxx) target exports

2015-06-22 Thread Michael Haubenwallner

On 06/19/2015 11:36 AM, Paolo Bonzini wrote:
> On 09/06/2015 16:22, Michael Haubenwallner wrote:
>> Hi build machinery maintainers,
>>
>> since we always build the C++ compiler now, I fail to see the need to still
>> use RAW_CXX_TARGET_EXPORTS for libvtv.
>>
>> The situation to expose the problem is:
>> * Use a multilib-enabled x86_64-linux box.
>> * Use a 64-bit (multilib-disabled) bootstrap compiler (binary image).
>> $ configure --enable-multilib --with-system-zlib
>> $ make bootstrap
>>
>> When it comes to build the 32-bit libvtv, it breaks because of using
>> "CC=/build/prev-gcc/xgcc -m32" "CXX=g++ -m32", while it should use
>> "CC=/build/prev-gcc/xgcc -m32" "CXX=/build/prev-gcc/xg++ -m32" instead.

Unfortunately, I've been unable to reproduce this problem for a while now,
and it turns out that I also used --enable-maintainer-mode. And it happens
only when some generated autotool-file is updated while it is in use - not
really sure which one though, probably some configure script.

But still:

>> However, I'm not sure about the general question behind:
>> Should it work to bootstrap the multilib-compiler using a non-multilib one?
>>
>> This also needs above configure flags to work around two more but minor issues,
>> which I'm unsure about whether I can/should fix at all:
>> * --enable-multilib: Without this, the "user friendly check" is breaking,
>> since https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=205975
> 
> Why is it breaking?

The OS I'm running on is a multilib-enabled x86_64 (Gentoo) Linux,
and "32-bit development libraries (libc and headers)" are available.

The compiler I want to bootstrap the multilib compiler with is an
x86_64-only binary image (c,c++,ada), built _without_ multilib.

Now this "user friendly check" tries to create a 32-bit executable using
that 64-bit-only bootstrap gcc, which lacks 32-bit libgcc* - while it
can create 32-bit object files though.

>> * --with-system-zlib: Without this, --enable-multilib tries to build a
>> 32-bit zlib with "CC=/build/32/./prev-gcc/xgcc"
> 
> Ouch, that's a separate bug...  Arguably --with-system-zlib should be
> the default these days (and should have been for 10 years or so).

This one I'll leave untouched.

> The patch is ok.

Even if the problem arises only because maintainer-mode isn't multilib-safe?

Thanks!
/haubi/

Re: Fix more of C/fortran canonical type issues

2015-06-22 Thread Jan Hubicka

> > Hi,
> > I would like to ping this.  There are still few things to fix to make our
> > merging compliant at least for C/C++/Fortran rules (the array bounds for
> > Fortran and union ordering for C I believe) and I would like to progress
> > on this.
> 
> I don't like the changes to useless_type_conversion_p much.  Why
> do you preserve qualifiers for the integer kind compares?
> 
> All the testcases have the integral types in aggregates as members.
> I already said that I'm happy globbing them together in aggregates.

I originally made the testcase this way because I wanted to test the way
aggregates are built and because it needs less Fortran code to realize it.
It is possible to construct the same testcase with scalar variables.  See the
other patch fixing the spurious warning.  You also need the variant "size_t a"
to be compatible with the Fortran equivalent of "signed size_t a", so it is not
only about variables.

> 
> I'm still not convinced that we need a 1:1 correspondence between
> canonical types and alias sets.  In particular canonical types are
> used for type compatibility in lhs = rhs assignments
> (useless_type_conversion_p) which is a transitive relation.
> Mixing both too much will cause serious confusion.  We have
> alias-sets for a reason.

OK, I am not sure if canonical types needs to actually mean the type
compatibility in the middle-end sense.  It is a language specific thing:

   The "canonical" type for this type node, which is used by frontends to
   compare the type for equality with another type.  If two types are
   equal (based on the semantics of the language), then they will have
   equivalent TYPE_CANONICAL entries.

In a way TYPE_CANONICAL seems a bit schizophrenic about whether it means
language-level compatibility, representation compatibility or middle-end
semantic compatibility.  It seems a bit odd to define something like
useless_type_conversion_p in a language-specific manner despite the fact that
its definition is now sound in a language-independent way, as we now have all
semantics represented in the IL (and flags, well).

but I would be happy to update the patch to assign different canonical types to
signed/unsigned integers and avoid recursion on those for aggregates/arrays and
all other derived types. (After all I plan to do that for pointers
incrementally)

Here we are "lucky" that alias.c already contains the globbing for
signed/unsigned.  Do we want to do the same scheme in other cases? For example
next on my list is the fact that array with bounds 3...5 is interoperable with
array[3] in C.  Here again we can not consider these useless_type_conversion_p
because index operation is different, but they are representation compatible.
This will need a special case in get_alias_set. I would not like to make
get_alias_set, or (with less loss of code quality on non-C languages) in lto's
get_alias_set langhook.
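
For readers less familiar with why signed/unsigned globbing is needed at all, the following small C sketch may help. ISO C allows an object to be accessed through the signed or unsigned variant of its type, so an alias oracle has to treat the two accesses below as possibly overlapping. The function name bump_through_unsigned is made up for this illustration and does not appear in GCC.

```c
#include <assert.h>

/* Store 1 into *P, then increment the same object through an unsigned
   lvalue.  Because C permits accessing an int through unsigned int, the
   compiler may not assume the two accesses are independent, and the
   final read of *P must observe the increment.  */
int
bump_through_unsigned (int *p)
{
  unsigned *u = (unsigned *) p;

  assert (p != 0);
  *p = 1;
  *u += 1u;
  return *p;
}
```

If int and unsigned int were given distinct alias sets, the final read of *p could legally be folded to 1, which would miscompile conforming code like this.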

Honza
> 
> Richard.

[Aarch64] Expand +rdma documentation, small changes to march and mcpu text.

2015-06-22 Thread Matthew Wahab


Hello,

The documentation for the ARMv8.1 +rdma option doesn't mention that enabling it
also implies enabling Adv.SIMD. This patch fixes that.

The documentation for the -march and -mcpu options is also a little messy; this
patch tries to make the text clearer and adds a (texinfo) link to the subsection
documenting the feature modifiers.

Tested by checking the html output.

Ok for trunk?
Matthew

2015-06-22  Matthew Wahab  

* doc/invoke.texi (Aarch64 Options, -march): Split out arch and
feature description, split out the native option, add a link to
the feature documentation, rearrange and slightly rewrite text.
(Aarch64 options, -mcpu): Likewise.
(Aarch64 options, Feature Modifiers): Add an anchor.  Mention +rdma
implies Adv. SIMD.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b99ab1c..599dbf0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12426,24 +12426,26 @@ corresponding flag to the linker.
 @opindex march
 Specify the name of the target architecture, optionally suffixed by one or
 more feature modifiers.  This option has the form
-@option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
-permissible values for @var{arch} are @samp{armv8-a} or @samp{armv8.1-a}.
-The permissible values for @var{feature} are documented in the sub-section
-below.  Additionally on native AArch64 GNU/Linux systems the value
+@option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
+
+The permissible values for @var{arch} are @samp{armv8-a} or
+@samp{armv8.1-a}.
+
+For the permissible values for @var{feature}, see the sub-section on
+@ref{aarch64-feature-modifiers,,@option{-march} and @option{-mcpu}
+Feature Modifiers}.  Where conflicting feature modifiers are
+specified, the right-most feature is used.
+
+Additionally on native AArch64 GNU/Linux systems the value
 @samp{native} is available.  This option causes the compiler to pick the
 architecture of the host system.  If the compiler is unable to recognize the
 architecture of the host system this option has no effect.
 
-Where conflicting feature modifiers are specified, the right-most feature is
-used.
-
-GCC uses this name to determine what kind of instructions it can emit when
-generating assembly code.
-
-Where @option{-march} is specified without either of @option{-mtune}
-or @option{-mcpu} also being specified, the code is tuned to perform
-well across a range of target processors implementing the target
-architecture.
+GCC uses @var{name} to determine what kind of instructions it can emit
+when generating assembly code.  If @option{-march} is specified
+without either of @option{-mtune} or @option{-mcpu} also being
+specified, the code is tuned to perform well across a range of target
+processors implementing the target architecture.
 
 @item -mtune=@var{name}
 @opindex mtune
@@ -12456,12 +12458,11 @@ Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible values for this
 option are: @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}.
 
-Additionally on native AArch64 GNU/Linux systems the value @samp{native}
-is available.
-This option causes the compiler to pick the architecture of and tune the
-performance of the code for the processor of the host system.
-If the compiler is unable to recognize the processor of the host system
-this option has no effect.
+Additionally on native AArch64 GNU/Linux systems the value
+@samp{native} is available.  This option causes the compiler to pick
+the architecture of and tune the performance of the code for the
+processor of the host system.  If the compiler is unable to recognize
+the processor of the host system this option has no effect.
 
 Where none of @option{-mtune=}, @option{-mcpu=} or @option{-march=}
 are specified, the code is tuned to perform well across a range
@@ -12471,23 +12472,23 @@ This option cannot be suffixed by feature modifiers.
 
 @item -mcpu=@var{name}
 @opindex mcpu
-Specify the name of the target processor, optionally suffixed by one or more
-feature modifiers.  This option has the form
-@option{-mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
-permissible values for @var{cpu} are the same as those available for
-@option{-mtune}.  Additionally on native AArch64 GNU/Linux systems the
-value @samp{native} is available.
-This option causes the compiler to tune the performance of the code for the
-processor of the host system.  If the compiler is unable to recognize the
-processor of the host system this option has no effect.
-
-The permissible values for @var{feature} are documented in the sub-section
-below.
-
-Where conflicting feature modifiers are specified, the right-most feature is
-used.
+Specify the name of the target processor, optionally suffixed by one
+or more feature modifiers.  This option has the form
+@option{-mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where
+the per

Re: RFA: Fix isl-ast-gen-if-1.c test

2015-06-22 Thread Jeff Law


On 06/22/2015 10:07 AM, Nicholas Clifton wrote:

Hi Jeff,


I'd tend to prefer to change the size of the array -- adding another
conditional in the loop may have unintended consequences that possibly
scramble things just enough to compromise the test.


Okey dokey, here is a revised version.  Is this one OK ?

Cheers
   Nick

gcc/ChangeLog

2015-06-22  Nick Clifton  

 * gcc.dg/graphite/isl-ast-gen-if-1.c (main): Increase size of a
 array to allow a[50] to be a valid location.

OK.
jeff

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Jakub Jelinek

On Mon, Jun 22, 2015 at 12:08:36PM -0400, Nathan Sidwell wrote:
> On 06/22/15 11:18, Bernd Schmidt wrote:
> 
> >You can have a hint that it is desirable, but not a hint that it is correct
> >(because passes in between may invalidate that). The OpenACC directives
> >guarantee to the compiler that the program can be transformed into a parallel
> >form. If we lose them early we must then rely on our analysis which may not 
> >be
> >strong enough to prove that the loop can be parallelized. If we make these
> >transformations early enough, while we still have the OpenACC directives, we 
> >can
> >guarantee that we do exactly what the programmer specified.
> 
> How does this differ from openmp's needs to preserve parallelism on a
> parallel loop?  Is it more than the reconvergence issue?

OpenMP has a significantly different execution model: a parallel block in
OpenMP is run by a certain number of threads (the initial thread (the one
encountering that region) and then, depending on clauses and library
decisions, perhaps others), with a barrier at the end of the region, and
afterwards only the initial thread continues again.
So, an OpenMP parallel is implemented as a library call, taking outlined
function from the parallel's body as one of its arguments and the body
is executed by the initial thread and perhaps others.
OpenMP worksharing loop is just coordination between the threads in the
team, which thread takes which subset of the loop's iterations, and
optionally followed by a barrier.  OpenMP simd loop is a loop that has
certain properties guaranteed by the user and can be vectorized.
In contrast to this, OpenACC spawns all the threads/CTAs upfront, and then
idles on some of them until there is work for them.
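
As an illustration of the "which thread takes which subset of the loop's iterations" coordination described above, here is a minimal sketch in C of an even, contiguous split. It is only a sketch of one possible static division, not libgomp's actual code; the function name static_chunk and its parameters are hypothetical.

```c
#include <assert.h>

/* Compute the half-open range [*START, *END) of iterations that thread
   TID (0-based) out of NTHREADS takes from a loop of N iterations,
   splitting the iterations as evenly as possible.  */
void
static_chunk (long n, int nthreads, int tid, long *start, long *end)
{
  long q, r;

  assert (n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
  q = n / nthreads;	/* Iterations every thread gets.  */
  r = n % nthreads;	/* Leftover iterations.  */
  if (tid < r)
    {
      /* The first R threads take one extra iteration each.  */
      *start = tid * (q + 1);
      *end = *start + q + 1;
    }
  else
    {
      *start = tid * q + r;
      *end = *start + q;
    }
}
```

Each thread in the team would evaluate this with its own TID and run only its own range; the optional barrier mentioned above would follow the loop.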

Jakub

Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-22 Thread Alan Lawrence


Abe Skolnik wrote:

Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

  * loads were always executed, even when they should have not been.
Some source code could be rendered invalid due to null pointers
that were OK in the original program because they were never
dereferenced.

  * writes were if-converted via load/maybe-modify/store, which
renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

  * loads are done through an indirection, reading either the correct
data from the correct source [if the condition is true] or reading
from the scratchpad and later ignoring this read result [if the
condition is false].

  * writes are also done through an indirection, writing either to the
correct destination [if the condition is true] or to the
scratchpad [if the condition is false].
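
The scheme in the two bullets above can be sketched by hand in C as follows. This is an illustration of the idea only, not the code the patch generates: the function names are hypothetical, and a local variable stands in for the compiler-allocated scratchpad.

```c
#include <assert.h>

/* Original form: the load from b[i] and the store to a[i] happen only
   when c[i] is nonzero.  */
void
cond_copy_branchy (int *a, const int *b, const int *c, int n)
{
  int i;

  assert (n >= 0);
  for (i = 0; i < n; i++)
    if (c[i])
      a[i] = b[i] + 1;
}

/* Hand-written equivalent of the scratchpad scheme: the loop body is
   branch-free, and both the load and the store go through a pointer
   that selects either the real location or the scratchpad.  */
void
cond_copy_scratchpad (int *a, const int *b, const int *c, int n)
{
  int scratch = 0;	/* Stand-in for the compiler's scratchpad.  */
  int i;

  assert (n >= 0);
  for (i = 0; i < n; i++)
    {
      const int *src = c[i] ? &b[i] : &scratch;
      int *dst = c[i] ? &a[i] : &scratch;

      /* When c[i] is zero this only reads and writes the scratchpad,
	 so a[i] and b[i] are never touched.  */
      *dst = *src + 1;
    }
}
```

Neither a[i] nor b[i] is accessed when the condition is false, so a null pointer that was never dereferenced in the original stays undereferenced, and no other thread can observe a spurious store to a[i].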

Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?

Regards,

Abe



Thanks for getting back to this!

My main thought concerns the direction we are travelling here. A major reason
why we do if-conversion is to enable vectorization. Is this targeted at
gathering/scattering loads? Following vectorization, different elements of the
vector being loaded/stored may have to go to/from the scratchpad or to/from main
memory.

Or are we aiming at the case where the predicate or address is invariant? That
seems unlikely - loop unswitching would be better for the predicate; loading
from an address, we'd just peel and hoist; storing, this'd result in the address
holding the last value written, at exit from the loop, a curious idiom. Where
the predicate/address is invariant across the vector? (!)

Or are we aiming at non-vectorized code?

Beyond that question...

Does the description for -ftree-loop-if-convert-stores in doc/invoke.texi 
describe what the flag now does? (It doesn't mention loads; the code doesn't 
look like we use scratchpads at all without -ftree-loop-if-convert-stores, or am 
I missing something?)


In tree-if-conv.c:
@@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,

   if (flag_tree_loop_if_convert_stores)
 {
-  if (ifcvt_could_trap_p (stmt, refs))
+  if (ifcvt_could_trap_p (stmt))
{
  if (ifcvt_can_use_mask_load_store (stmt))
{

and

+
+  if (has_non_addressable_refs (stmt))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "has non-addressable memory references\n");
+ return false;
+   }
+

if it doesn't trap, but has_non_addressable_refs, can't we use 
ifcvt_can_use_mask_load_store there too?


And/or, I think I may be misunderstanding here, but if an access could trap, but 
is addressable, can't we use the scratchpad technique to get round the trapping 
problem?


(Look at it another way - this patch makes strictly more things return true from 
ifcvt_could_trap_p, which always exits immediately from 
if_convertible_gimple_assign_stmt_p...?)



Re. creation of scratchpads:
   (1) Should the '64' byte size be the result of scanning the function, for
the largest data size to which we store? (ideally, conditionally store!)
   (2) Allocating only once per function: if we had one scratchpad per loop, it
could/would live inside the test of "gimple_build_call_internal
(IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more loops in the
function, but then fail to vectorize them, we'll leave the scratchpad around for
later phases to clean up. Is that OK?



Also some style nits:

@@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop,
   /* Check the if-convertibility of statements in predicated BBs.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
- if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+ if (!if_convertible_stmt_p (gsi_stmt (itr),
  any_mask_load_store))
return false;
 }

bet that fits on one line now.

+ * Returns a memory reference to the pointer defined by the
+conditional expression: pointer = cond ? &A[i] : scratch_pad; and
+   inserts this code at GSI.  */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+  gimple_stmt_iterator *gsi, bool swap)

in comment, should A[i] just be AI, as I see nothing in 
create_indirect_cond_expr that requires ai to be an array dereference?


@@ -2063,12 +1998,14 @@ mask_exists (int size, vec vec)
| end_bb_1
|
| bb_2
+   |   cond

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Nathan Sidwell


On 06/22/15 12:20, Jakub Jelinek wrote:


OpenMP worksharing loop is just coordination between the threads in the
team, which thread takes which subset of the loop's iterations, and
optionally followed by a barrier.  OpenMP simd loop is a loop that has
certain properties guaranteed by the user and can be vectorized.
In contrast to this, OpenACC spawns all the threads/CTAs upfront, and then
idles on some of them until there is work for them.


Correct.  I expressed my question poorly.  What I mean is that in OpenMP, a
loop that is parallelizable (by user decree, I guess[*]) should not be
transformed such that it is no longer parallelizable.

This seems to me to be a common requirement of both languages.  How one gets
parallel threads of execution to the body of the loop is a different question.

nathan

[*] For ones where the compiler needs to detect parallelizability, it's
preferable that it doesn't do something earlier to force serializability.


--
Nathan Sidwell

Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation

2015-06-22 Thread Jeff Law


On 06/18/2015 10:39 AM, David Malcolm wrote:

On Thu, 2015-06-18 at 11:41 -0400, Patrick Palka wrote:

On Tue, Jun 9, 2015 at 1:31 PM, Patrick Palka  wrote:

This patch refactors the entry point of -Wmisleading-indentation from:

   void
   warn_for_misleading_indentation (location_t guard_loc,
location_t body_loc,
location_t next_stmt_loc,
enum cpp_ttype next_tok_type,
const char *guard_kind);

to

   struct token_indent_info
   {
 location_t location;
 cpp_ttype type;
 rid keyword;
   };

   void
   warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
const token_indent_info &body_tinfo,
const token_indent_info &next_tinfo);

The purpose of this refactoring is to expose more information to the
-Wmisleading-indentation implementation to allow for more advanced
heuristics and for better coverage.

(I decided to keep the usage of const references because nobody
seems to mind.  Also I added a new header file, c-indentation.h.)

gcc/c-family/ChangeLog:

 * c-indentation.h (struct token_indent_info): Define.
 (get_token_indent_info): Define.
 (warn_for_misleading_indentation): Declare.
 * c-common.h (warn_for_misleading_indentation): Remove.
 * c-indentation.c (warn_for_misleading_indentation):
 Change declaration to take three token_indent_infos.  Adjust
 accordingly.
 * c-indentation.c (should_warn_for_misleading_indentation):
 Likewise.  Bail out early if the body is a compound statement.
 (guard_tinfo_to_string): Define.

gcc/c/ChangeLog:

 * c-parser.c (c_parser_if_body): Take token_indent_info
 argument. Call warn_for_misleading_indentation even when the
 body is a semicolon.  Extract token_indent_infos corresponding
 to the guard, body and next tokens.  Adjust call to
 warn_for_misleading_indentation accordingly.
 (c_parser_else_body): Likewise.
 (c_parser_if_statement): Likewise.
 (c_parser_while_statement): Likewise.
 (c_parser_for_statement): Likewise.

gcc/cp/ChangeLog:

 * parser.c (cp_parser_selection_statement): Move handling of
 semicolon body to ...
 (cp_parser_implicitly_scoped_statement): .. here.  Call
 warn_for_misleading_indentation even when the body is a
 semicolon.  Extract token_indent_infos corresponding to the
 guard, body and next tokens.  Adjust call to
 warn_for_misleading_indentation accordingly.  Take
 token_indent_info argument.
 (cp_parser_already_scoped_statement): Likewise.
 (cp_parser_selection_statement, cp_parser_iteration_statement):
 Extract a token_indent_info corresponding to the guard token.


Pinging this series.


FWIW, they look reasonable to me; I'm not a reviewer.
But as the implementer of the warning, your comments/thoughts are 
definitely helpful in the review process.


We've never worked too hard to find a way to formalize this into a set 
of policies and procedures, which is probably a mistake.


jeff

Re: [PATCH 3/3] Improve -Wmissing-indentation heuristics

2015-06-22 Thread Jeff Law


On 06/18/2015 10:57 AM, Patrick Palka wrote:
[ big snip ]



These bogus warnings are pre-existing, however (i.e. not caused by this
patch).


(nods)   Fixing the false positives from libpng/bdwgc sounds like a
separate issue and thus a separate patch then.

Agreed.

jeff

Re: [C++ Patch] Use declspecs->locations[ds_virtual]

2015-06-22 Thread Jason Merrill


On 06/22/2015 09:54 AM, Paolo Carlini wrote:

I think this also qualifies as obvious given the past work / discussion:
use in one more place declspecs->locations to improve the location of
the error message.


Agreed, thanks.

Jason

Re: Re: [PATCH] [PATCH][ARM] Fix split-live-ranges-for-shrink-wrap.c testcase.

2015-06-22 Thread Alex Velenko


On 20/05/15 21:14, Joseph Myers wrote:

Again, the condition you propose to add doesn't make sense.  arm_arch_X_ok
is only appropriate for tests using an explicit -march=X.  Testing with
-march=armv7* should automatically skip this test anyway because it would
cause arm_thumb1_ok to fail.



Hi,

I adjusted the patch to skip execution of split-live-ranges-for-shrink-wrap.c
when -march=armv4t is explicitly specified, and to provide the -march=armv5t
flag for arm_arch_v5t_ok targets.

Is patch ok?

Alex

gcc/testsuite

2015-06-22  Alex Velenko  

 * gcc.target/arm/split-live-ranges-for-shrink-wrap.c (dg-skip-if):
Skip -march=armv4t.
(dg-additional-options): Set armv5t flag.

diff --git
a/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c
b/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c
index e36000b..3cb93dc 100644
--- a/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c
+++ b/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c
@@ -1,6 +1,8 @@
  /* { dg-do assemble } */
  /* { dg-options "-mthumb -Os -fdump-rtl-ira " }  */
  /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-skip-if "do not test on armv4t" { *-*-* } { "-march=armv4t" } } */
+/* { dg-additional-options "-march=armv5t" {target arm_arch_v5t_ok} } */

  int foo (char *, char *, int);
  int test (int d, char * out, char *in, int len)

--1.8.1.2--

Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Tom de Vries


On 22/06/15 13:47, Richard Biener wrote:

On Mon, Jun 22, 2015 at 1:33 PM, Tom de Vries  wrote:

On 22/06/15 12:14, Richard Biener wrote:


On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries 
wrote:


Hi,

during development of a patch I ran into a case where
compute_dominance_frontiers was called with incorrect dominance info.

The result was a segmentation violation somewhere in the bitmap code
while
executing this bitmap_set_bit in compute_dominance_frontiers_1:
...
if (!bitmap_set_bit (&frontiers[runner->index],
 b->index))
  break;
...

The segmentation violation happens because runner->index is 0, and
frontiers[0] is uninitialized.

[ The initialization in update_ssa looks like this:
...
   dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
FOR_EACH_BB_FN (bb, cfun)
  bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack);
compute_dominance_frontiers (dfs);
...

FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0]
(frontiers[0] in compute_dominance_frontiers_1) is not initialized.

We could add initialization by making the entry/exit-block bitmap_heads
empty and setting the obstack to a reserved obstack bitmap_no_obstack for
which allocation results in an assert. ]

AFAIU, the immediate problem is not that frontiers[0] is uninitialized,
but
that the loop reaches the state of runner->index == 0, due to the
incorrect
dominance info.

The patch adds an assert to the loop in compute_dominance_frontiers_1, to
make the failure mode cleaner and easier to understand.
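
For readers following along, the failure mode can be sketched outside of GCC.
The code below is a standalone illustration, not the code in cfganal.c: the
names compute_frontiers, frontier, idom and ENTRY are made up for this sketch,
and plain unsigned masks stand in for bitmaps.  It walks up the dominator tree
from each predecessor of a join block, and the assert plays the role of the
check proposed above: with correct dominance info the walk always reaches
idom[b] before it reaches the entry block.

```c
#include <assert.h>

#define N_BLOCKS 5
#define ENTRY 0   /* Hypothetical index of the entry block.  */

/* frontier[b] is a bit mask of the blocks in b's dominance frontier.  */
static unsigned frontier[N_BLOCKS];

/* For every join block B, walk up the dominator tree from each
   predecessor until the immediate dominator of B is reached, adding B
   to the frontier of every block visited on the way.  */
static void
compute_frontiers (const int *idom, int preds[][2], const int *n_preds)
{
  for (int b = 0; b < N_BLOCKS; b++)
    frontier[b] = 0;

  for (int b = 0; b < N_BLOCKS; b++)
    {
      if (n_preds[b] < 2)
        continue;
      for (int i = 0; i < n_preds[b]; i++)
        {
          int runner = preds[b][i];
          while (runner != idom[b])
            {
              /* With correct dominance info the walk stops at idom[b]
                 and never steps onto the entry block inside the loop.  */
              assert (runner != ENTRY);
              frontier[runner] |= 1u << b;
              runner = idom[runner];
            }
        }
    }
}
```

On a diamond-shaped CFG (0 -> 1, 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4) this puts
block 4 in the frontiers of blocks 2 and 3 only.  If idom[] is wrong, the walk
can run past the intended stopping point, and the assert fires instead of the
code touching an entry that was never set up.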

I think we wouldn't catch all errors in dominance info with this assert.
So
the patch also contains an ENABLE_CHECKING-enabled verify_dominators call
at
the start of compute_dominance_frontiers. I'm not sure if:
- adding the verify_dominators call is too costly in runtime.
- the verify_dominators call should be inside or outside the
TV_DOM_FRONTIERS measurement.
- there is a level of ENABLE_CHECKING that is more appropriate for the
verify_dominators call.

Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?



I don't think these kind of asserts are good.  A segfault is good by
itself
(so you can just add the comment if you like).



The segfault is not guaranteed to trigger, because it works on uninitialized
data. Instead, we may end up modifying valid memory and silently generating
wrong code or causing sigsegvs (which will be difficult to track back this
error). So I don't think doing nothing is an option here. If we're not going
to add this assert, we should initialize the uninitialized data in such a
way that we are guaranteed to detect the error. The scheme I proposed above
would take care of that. Should I implement that instead?


No, instead the check below should catch the error much earlier.


Likewise the verify_dominators call is too expensive and misplaced.

If then the call belongs in the dom_computed[] == DOM_OK early-out
in calculate_dominance_info



OK, like this:
...
diff --git a/gcc/dominance.c b/gcc/dominance.c
index a9e042e..1827eda9 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir)
bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false;

if (dom_computed[dir_index] == DOM_OK)
-return;
+{
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+  return;
+}

timevar_push (TV_DOMINANCE);
if (!dom_info_available_p (dir))
...


Yes.


I didn't fully understand your comment, do you want me to test this?


Sure, it should catch the error.



Bootstrapped and reg-tested on x86_64. Committed as attached.

Thanks,
- Tom
Verify dominators in early-out calculate_dominance_info

2015-06-22  Tom de Vries  

	* dominance.c (calculate_dominance_info): Verify dominators if
	early-out.
---
 gcc/dominance.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/dominance.c b/gcc/dominance.c
index a9e042e..9c66ca2 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir)
   bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false;
 
   if (dom_computed[dir_index] == DOM_OK)
-return;
+{
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+  return;
+}
 
   timevar_push (TV_DOMINANCE);
   if (!dom_info_available_p (dir))
-- 
1.9.1

Re: [C++ Patch] Remove pointless code in grokdeclarator

2015-06-22 Thread Jason Merrill

I think we should keep a comment to clarify why we don't care about 
type_quals here.


Jason

Re: [Patch, C++, PR65882] Check tf_warning flag in build_new_op_1

2015-06-22 Thread Jason Merrill


On 06/19/2015 08:23 PM, Mikhail Maltsev wrote:

I see that version 5.2 is set as target milestone for this bug. Should I
backport the patch?


Please.

Jason

Re: Re: [PATCH] [PATCH][ARM] Fix split-live-ranges-for-shrink-wrap.c testcase.

2015-06-22 Thread Joseph Myers

I have no more comments on this patch.

-- 
Joseph S. Myers
jos...@codesourcery.com

[gomp4] Remove some ptxness from middle end

2015-06-22 Thread Nathan Sidwell

I've committed this patch to the gomp4 branch, after testing.  It does a number 
of cleanups


1) Removes the ptx-specific TID, NTID, CTAID & NCTAID builtins, replacing them 
with openacc-specific GOACC_id and GOACC_nid builtins, using gang/worker & 
vector level enumeration.  These are mapped by the PTX backend to PTX-specific 
instructions.


2) Creates an oacc_loop_levels enumeration, and generates the loop nest masks 
from that.


3) Removed a bunch of duplicate calculations in omp-low related to determining 
number of threads and thread index. With #2 it becomes easier to use a loop.
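
As a rough picture of what points 2 and 3 buy (this is a standalone sketch
under assumed names, not the omp-low.c code: enum oacc_level, LEVEL_MASK,
num_threads and thread_num are invented for illustration), once the levels are
an enumeration the per-level masks fall out of the enumerator values and the
per-level computations collapse into loops:

```c
#include <assert.h>

/* Hypothetical stand-in for the loop level enumeration.  */
enum oacc_level
{
  LEVEL_GANG,
  LEVEL_WORKER,
  LEVEL_VECTOR,
  LEVEL_HWM   /* One past the last level.  */
};

/* Bit mask for one level, derived from the enumeration.  */
#define LEVEL_MASK(L) (1u << (L))

/* Multiply together the sizes of every level selected in MASK.  */
static unsigned
num_threads (unsigned mask, const unsigned *sizes)
{
  unsigned n = 1;

  assert (mask < LEVEL_MASK (LEVEL_HWM));
  for (int l = LEVEL_GANG; l < LEVEL_HWM; l++)
    if (mask & LEVEL_MASK (l))
      n *= sizes[l];
  return n;
}

/* Linearize the per-level indices selected in MASK, outermost first.  */
static unsigned
thread_num (unsigned mask, const unsigned *sizes, const unsigned *ids)
{
  unsigned res = 0;

  assert (mask < LEVEL_MASK (LEVEL_HWM));
  for (int l = LEVEL_GANG; l < LEVEL_HWM; l++)
    if (mask & LEVEL_MASK (l))
      res = res * sizes[l] + ids[l];
  return res;
}
```

With sizes of 10 gangs, 4 workers and 32 vector lanes, selecting gang and
vector gives 320 threads, and selecting all three levels for ids (2, 1, 5)
gives thread number (2 * 4 + 1) * 32 + 5.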


nathan
--
Nathan Sidwell
2015-06-20  Nathan Sidwell  

	gcc/
	* omp-builtins.def (BUILT_IN_GOACC_NTID, BUILTIN_NCTAID): Replace
	with ...
	(BUILT_IN_GOACC_NID): ... this.
	(BUILT_IN_GOACC_TID, BUILTIN_CTAID): Replace with ...
	(BUILT_IN_GOACC_ID): ... this.
	* builtins.c: Include omp-low.h.
	(expand_oacc_builtin): Replace with ...
	(expand_oacc_id): ... this.
	(expand_builtin, is_simple_builtin): Adjust.
	* omp-low.h (enum oacc_loop_levels): New.
	* omp-low.c (MASK_GANG, MASK_WORKER, MASK_VECTOR): Replace with ...
	(OACC_LOOP_MASK): ... this.
	(scan_omp_for, scan_omp_target): Adjust.
	(expand_oacc_get_num_threads): Adjust and use a loop.
	(expand_oacc_get_thread_num): Likewise.
	(oacc_loop_needs_thread_barrier_p, find_omp_for_region_gwv,
	find_omp_target_region_data, required_predication_mask,
	generate_vector_broadcast, generate_oacc_broadcast): Adjust.
	(make_predication_test): Adjust and use a loop.
	(predicate_bb, oacc_broadcast, oacc_init_count_vars): Adjust.
	* config/nvptx/nvptx.md (UNSPEC_NTID, UNSPEC_TID, UNSPEC_NCTAID,
	UNSPEC_CTAID): Replace with ...
	(UNSPEC_NID, UNSPEC_ID): ... these.
	(*oacc_ntid_insn, oacc_ntid, *oacc_tid_insn, oacc_tid,
	*oacc_nctaid_insn, oacc_nctaid, *oacc_ctaid_insn,
	oacc_ctaid): Replace with ...
	(oacc_nid, oacc_id): ... these.
	* config/nvptx/nvptx.c (nvptx_print_operand [CASE 'd']): Remove.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Replace
	GOACC_ctaid builtin with GOACC_id.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c	(revision 224671)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c	(working copy)
@@ -35,38 +35,38 @@ main ()
 
 #pragma acc parallel loop gang (static:*) num_gangs (10)
   for (i = 0; i < 100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_nonstatic (a, 10);
 
 #pragma acc parallel loop gang (static:1) num_gangs (10)
   for (i = 0; i < 100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_static (a, 10, 1);
 
 #pragma acc parallel loop gang (static:2) num_gangs (10)
   for (i = 0; i < 100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_static (a, 10, 2);
 
 #pragma acc parallel loop gang (static:5) num_gangs (10)
   for (i = 0; i < 100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_static (a, 10, 5);
 
 #pragma acc parallel loop gang (static:20) num_gangs (10)
   for (i = 0; i < 100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_static (a, 10, 20);
 
   /* Non-static gang.  */
 #pragma acc parallel loop gang num_gangs (10)
   for (i = 0; i < 100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_nonstatic (a, 10);
 
Index: gcc/omp-builtins.def
===
--- gcc/omp-builtins.def	(revision 224671)
+++ gcc/omp-builtins.def	(working copy)
@@ -61,13 +61,9 @@ DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
 		   BT_FN_VOID_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NTID, "GOACC_ntid",
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ID, "GOACC_id",
 		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_TID, "GOACC_tid",
-		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NCTAID, "GOACC_nctaid",
-		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_CTAID, "GOACC_ctaid",
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NID, "GOACC_nid",
 		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_GANGLOCAL_PTR, "GOACC_get_ganglocal_ptr",
 		   BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
Index: gcc/config/nvptx/nvptx.md
===
--- gcc/config/nvptx/nvptx.md	(revision 224671)
+++ gcc/config/nvptx/nvptx.md	(working copy)
@@ -49,10 +49,8 @@
 
UNSPEC_ALLOCA
 
-   UNSPEC_NTID
-   UNSPEC_TID
-   UNSPEC_NCTAID
-   UNSPEC_CTAID
+   UNSPEC_NID
+   UNSPEC_ID
 
UNSPEC_SHARED_DATA
 ])
@@ -1263,65 +1261,32 @@
   DONE;
 })
 
-(define_insn "*oacc_ntid

[AArch64][TLSGD Desc][3/3] Implement TLS Global Dynamic Descriptor for tiny model

2015-06-22 Thread Jiong Wang


As the first patch generalized GD Descriptor support for all memory
models, support for the tiny model is quite straightforward.  We just
need to output different instruction sequences according to the memory
model.
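
The two "length" values used in the pattern (12 and 16) follow from counting
4-byte instructions in each sequence; .tlsdesccall is an assembler directive
and takes no space.  A small standalone sketch of that arithmetic, with
made-up names (enum code_model, tlsdesc_sequence, sequence_length) and the
sequences written out as in the description in patch 1/3 rather than as the
real output templates:

```c
#include <stddef.h>
#include <string.h>

enum code_model { MODEL_TINY, MODEL_SMALL, MODEL_LARGE };

/* Return the descriptor call sequence for MODEL, one statement per
   line, or NULL when the model is not handled.  */
static const char *
tlsdesc_sequence (enum code_model model)
{
  switch (model)
    {
    case MODEL_TINY:
      return "ldr\txr, :tlsdesc:var\n"
             "adr\tx0, :tlsdesc:var\n"
             ".tlsdesccall var\n"
             "blr\txr\n";
    case MODEL_SMALL:
      return "adrp\tx0, :tlsdesc:var\n"
             "ldr\txr, [x0, #:tlsdesc_lo12:var]\n"
             "add\tx0, x0, #:tlsdesc_lo12:var\n"
             ".tlsdesccall var\n"
             "blr\txr\n";
    default:
      return NULL;
    }
}

/* Count the bytes SEQ occupies: every line that does not start with
   '.' is taken to be one 4-byte instruction, directives count as 0.  */
static unsigned
sequence_length (const char *seq)
{
  unsigned bytes = 0;
  const char *line = seq;

  while (line != NULL && *line != '\0')
    {
      if (*line != '.')
        bytes += 4;
      const char *nl = strchr (line, '\n');
      line = nl ? nl + 1 : NULL;
    }
  return bytes;
}
```

Tiny has three instructions (ldr, adr, blr) and small has four (adrp, ldr,
add, blr), hence 12 versus 16.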

OK for trunk?

2015-06-22  Jiong Wang  

gcc/
  * config/aarch64/aarch64.md (tlsdesc_): Support tiny model
  constraint.
  
gcc/testsuite/
  * gcc.target/aarch64/tlsdesc_small.c: New.
  * gcc.target/aarch64/tlsdesc_tiny.c: Ditto.
  
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 827ae8e..1b4e387 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4394,9 +4394,19 @@
(clobber (reg:CC CC_REGNUM))
(clobber (match_scratch:DI 1 "=r"))]
   "TARGET_TLS_DESC"
-  "adrp\\tx0, %A0\;ldr\\t%1, [x0, #%L0]\;add\\t0, 0, %L0\;.tlsdesccall\\t%0\;blr\\t%1"
+  {
+if (aarch64_cmodel_var == AARCH64_CMODEL_TINY)
+  return "ldr\t%1, #%A0;adr\t0, %A0;.tlsdesccall\t%0;blr\t%1";
+else if (aarch64_cmodel_var == AARCH64_CMODEL_SMALL)
+  return "adrp\tx0, %A0;ldr\t%1, [x0, #%L0];add\t0, 0, %L0;.tlsdesccall\t%0;blr\t%1";
+else
+  /* TBD: Large model to be supported.  */
+  gcc_unreachable ();
+  }
   [(set_attr "type" "call")
-   (set_attr "length" "16")])
+   (set (attr "length")
+	(if_then_else (match_test "aarch64_cmodel_var == AARCH64_CMODEL_TINY")
+	(const_int 12) (const_int 16)))])
 
 (define_insn "stack_tie"
   [(set (mem:BLK (scratch))
diff --git a/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c b/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c
new file mode 100644
index 000..f1429b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2 -ftls-model=global-dynamic -fPIC --save-temps" } */
+
+#include "tls.c"
+
+/* { dg-final { scan-assembler-times "adrp\tx0, :tlsdesc:" 2 } } */
+/* { dg-final { scan-assembler-times "tlsdesccall" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c b/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c
new file mode 100644
index 000..a107650
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2 -ftls-model=global-dynamic -fPIC -mcmodel=tiny --save-temps" } */
+
+#include "tls.c"
+
+/* { dg-final { scan-assembler-times "adr\tx0, :tlsdesc:" 2 } } */
+/* { dg-final { scan-assembler-times "tlsdesccall" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */

[AArch64][TLSGD Desc][2/3] Sort case label alphabetically

2015-06-22 Thread Jiong Wang


Obvious coding style fix.

2015-06-22  Jiong Wang  
gcc/
  * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Sort case
  labels alphabetically.
  
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 16c8dba..dddf401 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1548,12 +1548,12 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  emit_insn (gen_rtx_SET (dest, mem));
 	  return;
 
-case SYMBOL_TLSGD:
-case SYMBOL_SMALL_GOTTPREL:
+	case SYMBOL_SMALL_GOTTPREL:
 	case SYMBOL_SMALL_GOT:
 	case SYMBOL_TINY_GOT:
-case SYMBOL_TINY_TLSIE:
+	case SYMBOL_TINY_TLSIE:
 	case SYMBOL_TLSDESC:
+	case SYMBOL_TLSGD:
 	  if (offset != const0_rtx)
 	{
 	  gcc_assert(can_create_pseudo_p ());

[AArch64][TLSGD Desc][1/3] Generalize TLS Descriptor for Global Dynamic

2015-06-22 Thread Jiong Wang


Currently, TLS Global Dynamic (Descriptor) is only supported for the
small model on AArch64, although TLS Global Dynamic (Descriptor) is
actually the same for all memory models.

We always generate the code sequence below:

R0 = GOT entry address of tls descriptor for var.
Rx = speialize_func
.tlsdesccall var
blr Rx
  
The instruction sequences for the different memory models differ only in
how they address the GOT descriptor of the TLS variable, and they should
always be kept together for later linker relaxation.

Tiny:

  ldr   xr, :tlsdesc:var
  adr   x0, :tlsdesc:var
  .tlsdesccall var
  blr   xr

Small:

  adrp  x0, :tlsdesc:var
  ldr   xr, [x0, #:tlsdesc_lo12:var]
  add   x0, x0, #:tlsdesc_lo12:var
  .tlsdesccall var
  blr   xr

Large:

  movz  x0, #:tlsdesc_off_g1:var
  movk  x0, #:tlsdesc_off_g0_nc:var
  .tlsdescldr var
  ldr   xr, [gp, x0]
  .tlsdescadd var
  add   x0, gp, x0
  .tlsdesccall var
  blr   xr

This patch generalizes the TLS Global Dynamic Descriptor code for all memory
models. A separate patch will add descriptor support for the tiny model.
  
OK for trunk?

2015-06-22  Jiong Wang  

gcc/
  * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Rename
  SYMBOL_SMALL_TLSDESC to SYMBOL_TLSDESC.
  (aarch64_symbol_context): Ditto.
  * config/aarch64/aarch64.md (tlsdesc_small_): Renamed into 
"tlsdesc_".
  * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Rename
  SYMBOL_SMALL_TLSDESC to SYMBOL_TLSDESC. Rename gen_tlsdesc_small_* to 
gen_tlsdesc_*.
  (aarch64_expand_mov_immediate): Ditto.
  (aarch64_print_operand): Ditto.
  (aarch64_classify_tls_symbol): Ditto.

-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 7fad48b..576acc0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -61,9 +61,9 @@ enum aarch64_symbol_context
 
This corresponds to the small PIC model of the compiler.
 
-   SYMBOL_SMALL_TLSDESC
SYMBOL_SMALL_GOTTPREL
SYMBOL_TINY_TLSIE
+   SYMBOL_TLSDESC
SYMBOL_TLSGD
SYMBOL_TLSLE
Each of of these represents a thread-local symbol, and corresponds to the
@@ -96,11 +96,11 @@ enum aarch64_symbol_type
 {
   SYMBOL_SMALL_ABSOLUTE,
   SYMBOL_SMALL_GOT,
-  SYMBOL_SMALL_TLSDESC,
   SYMBOL_SMALL_GOTTPREL,
   SYMBOL_TINY_ABSOLUTE,
   SYMBOL_TINY_GOT,
   SYMBOL_TINY_TLSIE,
+  SYMBOL_TLSDESC,
   SYMBOL_TLSGD,
   SYMBOL_TLSLE,
   SYMBOL_FORCE_TO_MEM
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e724bd4..16c8dba 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -921,7 +921,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	return;
   }
 
-case SYMBOL_SMALL_TLSDESC:
+case SYMBOL_TLSDESC:
   {
 	machine_mode mode = GET_MODE (dest);
 	rtx x0 = gen_rtx_REG (mode, R0_REGNUM);
@@ -932,9 +932,9 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	/* In ILP32, the got entry is always of SImode size.  Unlike
 	   small GOT, the dest is fixed at reg 0.  */
 	if (TARGET_ILP32)
-	  emit_insn (gen_tlsdesc_small_si (imm));
+	  emit_insn (gen_tlsdesc_si (imm));
 	else
-	  emit_insn (gen_tlsdesc_small_di (imm));
+	  emit_insn (gen_tlsdesc_di (imm));
 	tp = aarch64_load_tp (NULL);
 
 	if (mode != Pmode)
@@ -1549,11 +1549,11 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  return;
 
 case SYMBOL_TLSGD:
-case SYMBOL_SMALL_TLSDESC:
 case SYMBOL_SMALL_GOTTPREL:
 	case SYMBOL_SMALL_GOT:
 	case SYMBOL_TINY_GOT:
 case SYMBOL_TINY_TLSIE:
+	case SYMBOL_TLSDESC:
 	  if (offset != const0_rtx)
 	{
 	  gcc_assert(can_create_pseudo_p ());
@@ -4435,7 +4435,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  asm_fprintf (asm_out_file, ":tlsgd:");
 	  break;
 
-	case SYMBOL_SMALL_TLSDESC:
+	case SYMBOL_TLSDESC:
 	  asm_fprintf (asm_out_file, ":tlsdesc:");
 	  break;
 
@@ -4468,7 +4468,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  asm_fprintf (asm_out_file, ":tlsgd_lo12:");
 	  break;
 
-	case SYMBOL_SMALL_TLSDESC:
+	case SYMBOL_TLSDESC:
 	  asm_fprintf (asm_out_file, ":tlsdesc_lo12:");
 	  break;
 
@@ -7273,7 +7273,7 @@ aarch64_classify_tls_symbol (rtx x)
 {
 case TLS_MODEL_GLOBAL_DYNAMIC:
 case TLS_MODEL_LOCAL_DYNAMIC:
-  return TARGET_TLS_DESC ? SYMBOL_SMALL_TLSDESC : SYMBOL_TLSGD;
+  return TARGET_TLS_DESC ? SYMBOL_TLSDESC : SYMBOL_TLSGD;
 
 case TLS_MODEL_INITIAL_EXEC:
   switch (aarch64_cmodel)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9f1b26e..f3d9082 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4378,7 +4378,7 @@
(set_attr "length" "8, 12")]
 )
 
-(define_insn "tlsdesc_small_"
+(define_insn "tlsdesc_"
   [(set (reg:PTR R0_REGNUM)
 (unspec:PTR [(match_operand 0 "aarch64_valid_symref" "S")]
 		   UNSPEC_TLSDESC))

Re: [gomp4] Remove some ptxness from middle end

2015-06-22 Thread Marek Polacek

On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote:
> +  if (GET_CODE (arg) != CONST_INT
> +  || (unsigned HOST_WIDE_INT)INTVAL (arg) >= OACC_HWM)

Don't we have UINTVAL for this?  So UINTVAL (arg).
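
For context, the reason an unsigned comparison is enough here: converting a
negative value to an unsigned type yields a very large number, so a single
"less than the bound" test rejects negatives as well.  A standalone sketch
(LEVEL_HWM and both function names are made up for illustration, they are not
the GCC macros):

```c
#include <stdbool.h>

#define LEVEL_HWM 3   /* Hypothetical upper bound, one past the last level.  */

/* Two-sided range check on a signed value.  */
static bool
in_range_signed (long long v)
{
  return v >= 0 && v < LEVEL_HWM;
}

/* The same check with one comparison: a negative V becomes a huge
   unsigned value and therefore fails the upper bound test.  */
static bool
in_range_unsigned (long long v)
{
  return (unsigned long long) v < LEVEL_HWM;
}
```

Both functions agree for every input; the second is the shape that a cast of
INTVAL, or UINTVAL directly, gives you.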

Marek

Re: [gomp4] Remove some ptxness from middle end

2015-06-22 Thread Nathan Sidwell


On 06/22/15 13:04, Marek Polacek wrote:

On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote:

+  if (GET_CODE (arg) != CONST_INT
+  || (unsigned HOST_WIDE_INT)INTVAL (arg) >= OACC_HWM)


Don't we have UINTVAL for this?  So UINTVAL (arg).


Oh, thanks! will fix

nathan

--
Nathan Sidwell

Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Tom de Vries


On 22/06/15 13:47, Richard Biener wrote:

(eventually also for the case where we
end up only computing the fast-query stuff).



Like this?
...
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c66ca2..58fc6fd 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -679,6 +679,12 @@ calculate_dominance_info (enum cdi_direction dir)
   free_dom_info (&di);
   dom_computed[dir_index] = DOM_NO_FAST_QUERY;
 }
+  else
+{
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+}

   compute_dom_fast_query (dir);

...

Thanks,
- Tom

[committed] Test for flag_parallelize_loops > 1

2015-06-22 Thread Tom de Vries


On 19/06/15 11:26, Tom de Vries wrote:

Hi,

DEF_GOMP_BUILTIN tests for 'flag_parallelize_loops'. But if
flag_parallelize_loops is one (which is also the default), then
pass_parloops doesn't do anything, and won't generate any OMP constructs.

This patch makes DEF_GOMP_BUILTIN test 'flag_parallelize_loops > 1',
just like all the other tests of flag_parallelize_loops in the compiler.

Built on x86_64 and reg-tested libgomp's c.exp.



During bootstrap and reg-test, I found regressions for -fcilkplus. 
There's a dependency of fcilkplus on the gomp builtins, which is exposed 
by this patch.


This updated patch also enables the gomp builtins for fcilkplus.
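
Spelled out as a plain predicate, the resulting condition looks roughly like
the sketch below.  This is only an illustration: struct options and its field
names are invented here and merely mirror the flags tested in the macro.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the relevant command-line state.  */
struct options
{
  bool openmp;             /* -fopenmp */
  int parallelize_loops;   /* -ftree-parallelize-loops=N, default 1 */
  bool cilkplus;           /* -fcilkplus */
  bool offload_abi_set;    /* offload ABI differs from "unset" */
};

/* Return true when the gomp builtins should be available.  A value of 1
   for parallelize_loops means "no parallelization", so it must not
   enable them.  */
static bool
gomp_builtins_enabled (const struct options *o)
{
  return (o->openmp
          || o->parallelize_loops > 1
          || o->cilkplus
          || o->offload_abi_set);
}
```

With the old test (any non-zero value), the default of 1 would have enabled
the builtins even though parloops generates nothing in that configuration.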

Bootstrapped and reg-tested on x86_64 on top of trunk.

Committed to trunk as obvious.

Thanks,
- Tom

Test for flag_parallelize_loops > 1

2015-06-19  Tom de Vries  

	* builtins.def (DEF_GOMP_BUILTIN): Test
	'flag_tree_parallelize_loops > 1' instead of
	'flag_tree_parallelize_loops'.  Test flag_cilkplus.
---
 gcc/builtins.def | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 55ce9f6..80e4a9c 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -182,7 +182,9 @@ along with GCC; see the file COPYING3.  If not see
 #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
false, true, true, ATTRS, false, \
-	   (flag_openmp || flag_tree_parallelize_loops \
+	   (flag_openmp \
+		|| flag_tree_parallelize_loops > 1 \
+		|| flag_cilkplus \
 		|| flag_offload_abi != OFFLOAD_ABI_UNSET))
 
 /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
-- 
1.9.1

Re: C++ PATCH for c++/66515 (ICE with initializer_list)

2015-06-22 Thread Jason Merrill


On 06/17/2015 04:44 PM, Jason Merrill wrote:

Now that reshape_init can return a non-CONSTRUCTOR, we need to call it
earlier in implicit_conversion.


I haven't noticed any problems with the original patch, but just to be 
safe this patch limits the new reshape to the same conditions as the old 
one: only classes.


Tested x86_64-pc-linux-gnu,

commit 98a362ded24db54963524761c3b0613ff844de51
Author: Jason Merrill 
Date:   Fri Jun 19 15:29:58 2015 -0400

	PR c++/66515
	* call.c (implicit_conversion): Only reshape for classes.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index ba5da4c..a6c313a 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -1759,8 +1759,9 @@ implicit_conversion (tree to, tree from, tree expr, bool c_cast_p,
 
   /* Call reshape_init early to remove redundant braces.  */
   if (expr && BRACE_ENCLOSED_INITIALIZER_P (expr)
+  && CLASS_TYPE_P (to)
   && COMPLETE_TYPE_P (complete_type (to))
-  && CP_AGGREGATE_TYPE_P (to))
+  && !CLASSTYPE_NON_AGGREGATE (to))
 {
   expr = reshape_init (to, expr, complain);
   if (expr == error_mark_node)

Re: [C++ Patch] Remove pointless code in grokdeclarator

2015-06-22 Thread Paolo Carlini


Hi,

On 06/22/2015 06:56 PM, Jason Merrill wrote:
I think we should keep a comment to clarify why we don't care about 
type_quals here.

Ok, I will commit with a comment added.

Thanks,
Paolo.

Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation

2015-06-22 Thread Jeff Law


On 06/09/2015 11:31 AM, Patrick Palka wrote:

This patch refactors the entry point of -Wmisleading-indentation from:

   void
   warn_for_misleading_indentation (location_t guard_loc,
location_t body_loc,
location_t next_stmt_loc,
enum cpp_ttype next_tok_type,
const char *guard_kind);

to

   struct token_indent_info
   {
 location_t location;
 cpp_ttype type;
 rid keyword;
   };

   void
   warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
const token_indent_info &body_tinfo,
const token_indent_info &next_tinfo);

The purpose of this refactoring is to expose more information to the
-Wmisleading-indentation implementation to allow for more advanced
heuristics and for better coverage.

(I decided to keep the usage of const references because nobody
seems to mind.  Also I added a new header file, c-indentation.h.)

gcc/c-family/ChangeLog:

* c-indentation.h (struct token_indent_info): Define.
(get_token_indent_info): Define.
(warn_for_misleading_indentation): Declare.
* c-common.h (warn_for_misleading_indentation): Remove.
* c-indentation.c (warn_for_misleading_indentation):
Change declaration to take three token_indent_infos.  Adjust
accordingly.
* c-indentation.c (should_warn_for_misleading_indentation):
Likewise.  Bail out early if the body is a compound statement.
(guard_tinfo_to_string): Define.

gcc/c/ChangeLog:

* c-parser.c (c_parser_if_body): Take token_indent_info
argument. Call warn_for_misleading_indentation even when the
body is a semicolon.  Extract token_indent_infos corresponding
to the guard, body and next tokens.  Adjust call to
warn_for_misleading_indentation accordingly.
(c_parser_else_body): Likewise.
(c_parser_if_statement): Likewise.
(c_parser_while_statement): Likewise.
(c_parser_for_statement): Likewise.

gcc/cp/ChangeLog:

* parser.c (cp_parser_selection_statement): Move handling of
semicolon body to ...
(cp_parser_implicitly_scoped_statement): .. here.  Call
warn_for_misleading_indentation even when the body is a
semicolon.  Extract token_indent_infos corresponding to the
guard, body and next tokens.  Adjust call to
warn_for_misleading_indentation accordingly.  Take
token_indent_info argument.
(cp_parser_already_scoped_statement): Likewise.
(cp_parser_selection_statement, cp_parser_iteration_statement):
Extract a token_indent_info corresponding to the guard token.
The only question in my mind is bootstrap & regression testing.  From 
reading the thread for the earlier version of this patch I got the 
impression you had bootstrapped and regression tested earlier versions.


If you could confirm that you've bootstrapped and regression tested this 
version it'd be appreciated.  You can do it on the individual patches or 
the set as a whole.


Jeff

[committed] Add missing update_stmt in transform_to_exit_first_loop_alt

2015-06-22 Thread Tom de Vries


Hi,

I realized that transform_to_exit_first_loop_alt is missing an 
update_stmt for the gimple_cond_set_rhs (transform_to_exit_first_loop 
has an update_stmt after a similar gimple_cond_set_lhs).


Bootstrapped and reg-tested on x86_64.

Committed as trivial.

Thanks,
- Tom
Add missing update_stmt in transform_to_exit_first_loop_alt

2015-06-22  Tom de Vries  

	* tree-parloops.c (transform_to_exit_first_loop_alt): Add update_stmt
	for cond_stmt.
---
 gcc/tree-parloops.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 28112b2..7123c27 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1679,6 +1679,7 @@ transform_to_exit_first_loop_alt (struct loop *loop,
 
   /* Set the new loop bound.  */
   gimple_cond_set_rhs (cond_stmt, bound);
+  update_stmt (cond_stmt);
 
   /* Repair the ssa.  */
   vec *v = redirect_edge_var_map_vector (post_inc_edge);
-- 
1.9.1

Re: [PATCH 2/3] Remove is_first_nonwhitespace_on_line(), instead improve get_visual_column()

2015-06-22 Thread Jeff Law


On 06/09/2015 11:31 AM, Patrick Palka wrote:

This patch removes the function is_first_nonwhitespace_on_line() in
favor of augmenting the function get_visual_column() to optionally
return the visual column corresponding to the first non-whitespace character
on the line.  Existing usage of is_first_nonwhitespace_on_line() can
be trivially replaced by calling get_visual_column() and comparing *out
with *first_nws.

The rationale for this change is that in many cases it is better to use
the visual column of the first non-whitespace character rather than the
visual column of the token.  Consider:

   if (p) {
 foo (1);
   } else   // GUARD
 if (q) // BODY
   foo (2);
 foo (3);   // NEXT

Here, with current heuristics, we do not emit a warning because we
notice that the visual columns of each token line up ("suggesting"
autogenerated code).  Yet it is obvious that we should warn here because
it misleadingly looks like the foo (3); statement is guarded by the
else.

If we instead consider the visual column of the first non-whitespace
character on the guard line, the columns will not line up thus we will
emit the warning.  This will be done in the next patch.
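
To make the distinction concrete, here is a standalone sketch of computing
both columns in one pass.  It is not the c-indentation.c implementation: the
function name visual_columns, its signature, and the fixed tab width of 8 are
assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define TAB_WIDTH 8   /* Assumed tab stop distance.  */

/* Store in *OUT the 0-based visual column of the character at byte
   OFFSET in LINE, and in *FIRST_NWS the visual column of the first
   non-whitespace character on the line, expanding tabs on the way.
   Return false if either column could not be determined.  */
static bool
visual_columns (const char *line, size_t offset,
                unsigned *out, unsigned *first_nws)
{
  unsigned vis = 0;
  bool have_out = false, have_nws = false;

  for (size_t i = 0; line[i] != '\0' && !(have_out && have_nws); i++)
    {
      if (!have_nws && line[i] != ' ' && line[i] != '\t')
        {
          *first_nws = vis;
          have_nws = true;
        }
      if (i == offset)
        {
          *out = vis;
          have_out = true;
        }
      if (line[i] == '\t')
        vis += TAB_WIDTH - vis % TAB_WIDTH;   /* Advance to next tab stop.  */
      else
        vis++;
    }
  return have_out && have_nws;
}
```

For a guard line such as "  } else", the "else" token sits at column 4 while
the first non-whitespace character is at column 2.  A token is the first thing
on its line exactly when the two columns are equal, which is the comparison
that replaces is_first_nonwhitespace_on_line().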

gcc/c-family/ChangeLog:

* c-indentation.c (get_visual_column): Add parameter first_nws,
use it.  Update comment documenting the function.
(is_first_nonwhitespace_on_line): Remove.
(should_warn_for_misleading_indentation): Replace usage of
is_first_nonwhitespace_on_line with get_visual_column.

Same comment/question WRT testing as the prior patch.

OK once you've confirmed bootstrap & regression testing was completed 
successfully.


jeff

Re: [PATCH 3/3] Improve -Wmissing-indentation heuristics

2015-06-22 Thread Jeff Law


On 06/09/2015 11:31 AM, Patrick Palka wrote:

This patch improves the heuristics of the warning in a number of ways.
The improvements are hopefully adequately documented in the code
comments.

The additions to the test case also highlight the improvements.

I tested an earlier version of this patch on more than a dozen C code
bases.  I found only one class of bogus warnings still emitted, in the
libpng and bdwgc projects.  These projects have a coding style which
indents code inside #ifdefs as if this code was guarded by an if(), e.g.

  if (foo != 0)
    x = 10;
  else       // GUARD
    y = 100; // BODY

  #ifdef BAR
    blah ();  // NEXT
  #endif

These bogus warnings are pre-existing, however (i.e. not caused by this
patch).

gcc/c-family/ChangeLog:

* c-indentation.c (should_warn_for_misleading_indentation):
Improve heuristics.

gcc/testsuite/ChangeLog:

* c-c++-common/Wmisleading-indentation.c: Add more tests.

OK after confirming a successful bootstrap & regression test.

jeff

[patch] Delete temporary response file

2015-06-22 Thread Eric Botcazou

Hi,

when you pass a response file at link time and you use the GNU linker, then 
collect2 creates another, temporary response file and passes it to the linker.
But it fails to delete the file after it is done.  This can easily be seen 
with the following manipulation:

eric@polaris:~/build/gcc/native> cat t.c
int main (void) { return 0; }
eric@polaris:~/build/gcc/native> cat t.resp
-L/usr/lib64
eric@polaris:~/build/gcc/native> gcc -c t.c
eric@polaris:~/build/gcc/native> export TMPDIR=$PWD
eric@polaris:~/build/gcc/native> gcc -o t t.o @t.resp
eric@polaris:~/build/gcc/native> ls cc*
ccVSQ6W5

The problem is that do_wait is not invoked by tlink_execute, only collect_wait 
is, so the cleanup code present therein is never invoked.
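
The shape of the fix can be sketched as follows: the removal of the temporary file moves into the low-level wait routine that every caller reaches, so a caller that bypasses the wrapper still cleans up.  This is a simplified stand-alone model, not the collect-utils.c code; the *_sketch functions, write_response_file and the child_status parameter (standing in for the status obtained from the child process) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative stand-ins for the collect2 variables of the same names.  */
const char *response_file;   /* path of the temporary file, or NULL */
bool save_temps;             /* keep temporaries when true */

/* Hypothetical helper: create the temporary response file at PATH with
   contents ARGS and remember it in response_file.  */
bool
write_response_file (const char *path, const char *args)
{
  FILE *f = fopen (path, "w");
  if (!f)
    return false;
  bool ok = fputs (args, f) >= 0;
  ok = (fclose (f) == 0) && ok;
  if (ok)
    response_file = path;
  return ok;
}

/* Low-level wait, reached by every caller.  Removing the response file
   here covers callers that never go through the wrapper below.  */
int
collect_wait_sketch (int child_status)
{
  if (response_file && !save_temps)
    {
      unlink (response_file);
      response_file = NULL;
    }
  return child_status;
}

/* Wrapper that additionally treats a nonzero status as fatal; the
   assert stands in for the fatal_error call.  */
void
do_wait_sketch (int child_status)
{
  int ret = collect_wait_sketch (child_status);
  assert (ret == 0);
}
```

Resetting response_file to NULL after the unlink keeps a later cleanup path from trying to remove the same file twice.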

Tested on x86_64-suse-linux, OK for the mainline?


2015-06-22  Tristan Gingold  

* collect-utils.c (collect_wait): Unlink the response file here instead of...
(do_wait): ...here.
(utils_cleanup): ...and here.


-- 
Eric Botcazou

Index: collect-utils.c
===
--- collect-utils.c	(revision 224708)
+++ collect-utils.c	(working copy)
@@ -68,6 +68,12 @@ collect_wait (const char *prog, struct p
 fatal_error (input_location, "can't get program status: %m");
   pex_free (pex);
 
+  if (response_file && !save_temps)
+{
+  unlink (response_file);
+  response_file = NULL;
+}
+
   if (status)
 {
   if (WIFSIGNALED (status))
@@ -90,12 +96,6 @@ do_wait (const char *prog, struct pex_ob
   int ret = collect_wait (prog, pex);
   if (ret != 0)
 fatal_error (input_location, "%s returned %d exit status", prog, ret);
-
-  if (response_file && !save_temps)
-{
-  unlink (response_file);
-  response_file = NULL;
-}
 }
 
 
@@ -224,7 +224,5 @@ utils_cleanup (bool from_signal)
  calls to maybe_unlink fails. */
   cleanup_done = true;
 
-  if (response_file)
-maybe_unlink (response_file);
   tool_cleanup (from_signal);
 }

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Julian Brown

On Mon, 22 Jun 2015 16:24:56 +0200
Jakub Jelinek  wrote:

> On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
> > One problem is that (at least on the GPU hardware we've considered
> > so far) we're somewhat constrained in how much control we have over
> > how the underlying hardware executes code: it's possible to draw up
> > a scheme where OpenACC source-level control-flow semantics are
> > reflected directly in the PTX assembly output (e.g. to say "all
> > threads in a CTA/warp will be coherent after such-and-such a
> > loop"), and lowering OpenACC directives quite early seems to make
> > that relatively tractable. (Even if the resulting code is
> > relatively un-optimisable due to the abnormal edges inserted to
> > make sure that the CFG doesn't become "ill-formed".)
> > 
> > If arbitrary optimisations are done between OMP-lowering time and
> > somewhere around vectorisation (say), it's less clear if that
> > correspondence can be maintained. Say if the code executed by half
> > the threads in a warp becomes physically separated from the code
> > executed by the other half of the threads in a warp due to some loop
> > optimisation, we can no longer easily determine where that warp will
> > reconverge, and certain other operations (relying on coherent warps
> > -- e.g. CTA synchronisation) become impossible. A similar issue
> > exists for warps within a CTA.
> > 
> > So, essentially -- I don't know how "late" loop lowering would
> > interact with:
> > 
> > (a) Maintaining a CFG that will work with PTX.
> > 
> > (b) Predication for worker-single and/or vector-single modes
> > (actually all currently-proposed schemes have problems with proper
> > representation of data-dependencies for variables and
> > compiler-generated temporaries between predicated regions.)
> 
> I don't understand why lowering the way you suggest helps here at all.
> In the proposed scheme, you essentially have whole function
> in e.g. worker-single or vector-single mode, which you need to be
> able to handle properly in any case, because users can write such
> routines themselves.

In vector-single or worker-single mode, divergence of threads within a
warp or a CTA is controlled by broadcasting the controlling expression
of conditional branches to the set of "inactive" threads, so each of
those follows along with the active thread. So you only get
potentially-problematic thread divergence when workers or vectors are
operating in partitioned mode.

So, for instance, a made-up example:

#pragma acc parallel
{
  #pragma acc loop gang
  for (i = 0; i < N; i++)
  {
    #pragma acc loop worker
    for (j = 0; j < M; j++)
    {
      if (j < M / 2)
        /* stmt 1 */
      else
        /* stmt 2 */
    }

    /* reconvergence point: thread barrier */

    [...]
  }
}

Here "stmt 1" and "stmt 2" execute in worker-partitioned, vector-single
mode. With "early lowering", the reconvergence point can be
inserted at the end of the loop, and abnormal edges (etc.) can be used
to ensure that the CFG does not get changed in such a way that there is
no longer a unique point at which the loop threads reconverge.

With "late lowering", it's no longer obvious to me if that can still be
done.
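
As a toy model of the broadcasting described at the start of this message (plain C, no PTX, and not how the compiler implements it): in single mode every lane follows lane 0's value of the controlling expression, so the lanes cannot disagree; in partitioned mode each lane uses its own value and the warp may diverge.  WARP_SIZE, branch_decisions and warp_diverges are made-up names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define WARP_SIZE 8   /* arbitrary size for the simulation */

/* COND[i] is lane i's own value of the controlling expression.  In
   single mode lane 0 is the active lane and its value is broadcast to
   the others; in partitioned mode each lane decides for itself.  */
void
branch_decisions (const bool cond[WARP_SIZE], bool partitioned,
                  bool taken[WARP_SIZE])
{
  assert (cond && taken);
  for (int lane = 0; lane < WARP_SIZE; lane++)
    taken[lane] = partitioned ? cond[lane] : cond[0];
}

/* Return true if the lanes disagree about the branch, i.e. the
   simulated warp diverges.  */
bool
warp_diverges (const bool taken[WARP_SIZE])
{
  for (int lane = 1; lane < WARP_SIZE; lane++)
    if (taken[lane] != taken[0])
      return true;
  return false;
}
```

With alternating per-lane conditions, the single-mode decisions never diverge while the partitioned-mode decisions do, which is why only the partitioned case needs a well-defined reconvergence point.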

Julian

Re: [ping] Couple of patches for -fdump-ada-spec

2015-06-22 Thread Jeff Law


On 06/22/2015 09:33 AM, Eric Botcazou wrote:

Add query for template-dependent arguments to -fdump-ada-spec:
   http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00403.html

Get rid of assembly file with -fdump-ada-spec:
   http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00420.html

OK for both.

jeff

[PATCH] parloops exit phi fixes

2015-06-22 Thread Tom de Vries


Hi,

the gomp-4_0-branch contains the kernels oacc pass group. I've run into 
trouble before with this pass group due to the fact that it uses passes 
in an unusual location or order (pass_lim before pass_stdarg, 
https://gcc.gnu.org/ml/gcc/2015-01/msg00282.html ).


In an attempt to find this sort of issue pro-actively, I've modified the 
pass list in the following way (similar to the oacc kernels pass group, 
but always functional, not just for functions with kernel regions or 
loops in kernels regions), and bootstrapped and reg-tested on x86_64 on 
top of gomp-4_0-branch:

...
   NEXT_PASS (pass_build_ealias);
   NEXT_PASS (pass_fre);
+  NEXT_PASS (pass_ch);
+  NEXT_PASS (pass_tree_loop_init);
+  NEXT_PASS (pass_lim);
+  NEXT_PASS (pass_tree_loop_done);
+  NEXT_PASS (pass_fre);
+  NEXT_PASS (pass_tree_loop_init);
+  NEXT_PASS (pass_scev_cprop);
+  NEXT_PASS (pass_parallelize_loops);
+  NEXT_PASS (pass_expand_omp_ssa);
+  NEXT_PASS (pass_tree_loop_done);
   NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_dse);
...

Apart from running into PR66616, I found two issues with the parloops pass:
1. handling of loop header phi, when there's no corresponding loop exit
   phi (unused reduction result)
2. handling of loop exit phi, when there's no corresponding loop header
   phi (value not modified in loop)

The two attached patches fix these problems.

Bootstrapped and reg-tested on x86_64 on top of gomp-4_0-branch in 
combination with the patch series that triggered the problem.


Bootstrapped and reg-tested on x86_64 on top of trunk.

OK for trunk?

Thanks,
- Tom
Handle unused reduction in create_loads_for_reductions

2015-06-22  Tom de Vries  

	* tree-parloops.c (create_loads_for_reductions): Handle case that
	reduction is unused.
---
 gcc/tree-parloops.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 48c143d..28112b2 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1162,6 +1162,10 @@ create_loads_for_reductions (reduction_info **slot, struct clsn_data *clsn_data)
   tree name;
   tree x;
 
+  /* If there's no exit phi, the result of the reduction is unused.  */
+  if (red->keep_res == NULL)
+return 1;
+
   gsi = gsi_after_labels (clsn_data->load_bb);
   load_struct = build_simple_mem_ref (clsn_data->load);
   load_struct = build3 (COMPONENT_REF, type, load_struct, red->field,
-- 
1.9.1

Handle exit phi without header phi in create_parallel_loop

2015-06-22  Tom de Vries  

	* tree-parloops.c (create_parallel_loop): Handle case that exit phi does
	not have a corresponding loop header phi.
---
 gcc/tree-parloops.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 7123c27..0693b9e 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2061,13 +2061,17 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
!gsi_end_p (gpi); gsi_next (&gpi))
 {
   source_location locus;
-  tree def;
   gphi *phi = gpi.phi ();
-  gphi *stmt;
+  tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+  gimple def_stmt = SSA_NAME_DEF_STMT (def);
 
-  stmt = as_a <gphi *> (
-	   SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE (phi, exit)));
+  /* If the exit phi is not connected to a header phi in the same loop, this
+	 value is not modified in the loop, and we're done with this phi.  */
+  if (!(gimple_code (def_stmt) == GIMPLE_PHI
+	&& gimple_bb (def_stmt) == loop->header))
+	continue;
 
+  gphi *stmt = as_a <gphi *> (def_stmt);
   def = PHI_ARG_DEF_FROM_EDGE (stmt, loop_preheader_edge (loop));
   locus = gimple_phi_arg_location_from_edge (stmt,
 		 loop_preheader_edge (loop));
-- 
1.9.1

Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-06-22 Thread Alexander Monakov

On Mon, 22 Jun 2015, Jiong Wang wrote:
> I did a quick experiment: -fno-plt doesn't work on AArch64.
> 
> That's because, although this patch forces the function address into a
> register, the combine pass that runs later combines it back, as AArch64
> defines such an insn pattern.
> 
> On x86 it's not combined back.  From the RTL dump, that's because the RTL pre
> pass has moved the address load instruction into another basic block and the
> combine pass doesn't combine across basic blocks.  Also, the x86 backend does
> some checks on flag_plt in the newly added ix86_nopic_noplt_attribute_p, which
> could help generate correct insns.
> 
> The fix I can think of on AArch64 is to allow the call-symbol pattern only
> under "flag_plt == true", so that a call via register can't be combined
> into a direct call to a symbol.
> 
> Or would it be better to prohibit such combining in the combine pass?  A
> generic fix in combine may fix other broken targets.

My colleagues at ISP RAS (CC'ed) have been looking at arm (and aarch64) no-plt
codegen.  We also saw the problem with the combine pass you describe.  I think
your description of why it's not observed on x86 is incorrect; the newly added
ix86_nopic_noplt_attribute_p should not have anything to do with that.  It's
just that the GOT load insn has a REG_EQUAL note, and the combine pass can use
it to replace the register in the indirect branch, producing a direct branch
to a symbol (i.e. a PLT jump).

Actually we are not hitting the same problem on x86 by pure luck.  Early RTL
passes manage to lose the REG_EQUAL note, so by the time combine runs, the
register annotation is lost.  It's possible to reproduce the arm/aarch64
problem on x86 with -fno-gcse and the following hack:

diff --git a/gcc/cse.c b/gcc/cse.c
index 2a33827..88cff96 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -6634,6 +6634,9 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs)
   int *rc_order = XNEWVEC (int, last_basic_block_for_fn (cfun));
   int i, n_blocks;

+  if (!flag_gcse)
+return 0;
+
   df_set_flags (DF_LR_RUN_DCE);
   df_note_add_problem ();
   df_analyze ();

Regarding fixing the issue, I also think that combine pass might be a better
place (than the backends).  I'd appreciate comments from maintainers.

If you try disabling the REG_EQUAL note generation [*], you'll probably find a
performance regression on arm32 (and probably on aarch64 as well? we only
tried arm32 so far).  The main reason for that is that GCC emits pretty bad
code for a GOT load.  Instead of using two add instructions and one ldr for
the GOT slot access, like the PLT stubs do, it uses three(!) ldr instructions
and one add.  The first ldr is for loading the GOT address, and the second is
for the offset of the GOT slot.  As I understand, to fix that, GCC has to
learn using the GOT_PREL relocation type.

[*] To do that, we hacked arm legitimize_pic_address not to emit REG_EQUAL
note under !flag_plt.

Alexander

patch to fix PR63740

2015-06-22 Thread Vladimir Makarov


The following patch fixes PR63740, which is described in detail at

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63740

Committed as rev. 224752.

I'll commit the same patch to the trunk later.

2015-06-22  Vladimir Makarov  

PR bootstrap/63740
* lra-lives.c (process_bb_lives): Check insn copying the same
reload pseudo and don't create a copy for it.

Index: lra-lives.c
===
--- lra-lives.c (revision 224739)
+++ lra-lives.c (working copy)
@@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int &c
  dst_regno = REGNO (SET_DEST (set));
  if (dst_regno >= lra_constraint_new_regno_start
  && src_regno >= lra_constraint_new_regno_start)
-   lra_create_copy (dst_regno, src_regno, freq);
+   {
+ /* It might be still an original (non-reload) insn with
+one unused output and a constraint requiring to use
+the same reg for input/output operands. In this case
+dst_regno and src_regno have the same value, we don't
+need a misleading copy for this case.  */
+ if (dst_regno != src_regno)
+   lra_create_copy (dst_regno, src_regno, freq);
+   }
  else if (dst_regno >= lra_constraint_new_regno_start)
{
  if ((hard_regno = src_regno) >= FIRST_PSEUDO_REGISTER)

Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-22 Thread Jakub Jelinek

On Mon, Jun 22, 2015 at 06:48:10PM +0100, Julian Brown wrote:
> In vector-single or worker-single mode, divergence of threads within a
> warp or a CTA is controlled by broadcasting the controlling expression
> of conditional branches to the set of "inactive" threads, so each of
> those follows along with the active thread. So you only get
> potentially-problematic thread divergence when workers or vectors are
> operating in partitioned mode.
> 
> So, for instance, a made-up example:
> 
> #pragma acc parallel
> {
>   #pragma acc loop gang
>   for (i = 0; i < N; i++)
>   {
>     #pragma acc loop worker
>     for (j = 0; j < M; j++)
>     {
>       if (j < M / 2)
>         /* stmt 1 */
>       else
>         /* stmt 2 */
>     }
> 
>     /* reconvergence point: thread barrier */
> 
>     [...]
>   }
> }
> 
> Here "stmt 1" and "stmt 2" execute in worker-partitioned, vector-single
> mode. With "early lowering", the reconvergence point can be
> inserted at the end of the loop, and abnormal edges (etc.) can be used
> to ensure that the CFG does not get changed in such a way that there is
> no longer a unique point at which the loop threads reconverge.
> 
> With "late lowering", it's no longer obvious to me if that can still be
> done.

Why?  The loop still has an exit edge (if there is no break/return/throw out of
the loop which I bet is not allowed), so you just insert the reconvergence
point at the exit edge from the loop.
For the "late lowering", I said it is up for benchmarking/investigation
where it would be best placed, it doesn't have to be after the loop passes,
there are plenty of optimization passes even before those.  But once you turn
many of the SSA_NAMEs in a function into (ab) ssa vars, many optimizations
just give up.
And, if you really want to avoid certain loop optimizations, you have always
the possibility to e.g. wrap certain statement in the loop in internal
function (e.g. the loop condition) or something similar to make the passes
more careful about those loops and make it easier to lower it later.

Jakub

patch to fix PR63740 on trunk

2015-06-22 Thread Vladimir Makarov


I've committed patch for PR63740 to the trunk as rev. 224753.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63740

The patch was bootstrapped on x86-64.

2015-06-22  Vladimir Makarov 

PR bootstrap/63740
* lra-lives.c (process_bb_lives): Check insn copying the same
reload pseudo and don't create a copy for it.

Index: lra-lives.c
===
--- lra-lives.c (revision 224739)
+++ lra-lives.c (working copy)
@@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int &c
  dst_regno = REGNO (SET_DEST (set));
  if (dst_regno >= lra_constraint_new_regno_start
  && src_regno >= lra_constraint_new_regno_start)
-   lra_create_copy (dst_regno, src_regno, freq);
+   {
+ /* It might be still an original (non-reload) insn with
+one unused output and a constraint requiring to use
+the same reg for input/output operands. In this case
+dst_regno and src_regno have the same value, we don't
+need a misleading copy for this case.  */
+ if (dst_regno != src_regno)
+   lra_create_copy (dst_regno, src_regno, freq);
+   }
  else if (dst_regno >= lra_constraint_new_regno_start)
{
  if ((hard_regno = src_regno) >= FIRST_PSEUDO_REGISTER)

Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation

2015-06-22 Thread Patrick Palka

On Mon, Jun 22, 2015 at 1:29 PM, Jeff Law  wrote:
> On 06/09/2015 11:31 AM, Patrick Palka wrote:
>>
>> This patch refactors the entry point of -Wmisleading-indentation from:
>>
>>void
>>warn_for_misleading_indentation (location_t guard_loc,
>> location_t body_loc,
>> location_t next_stmt_loc,
>> enum cpp_ttype next_tok_type,
>> const char *guard_kind);
>>
>> to
>>
>>struct token_indent_info
>>{
>>  location_t location;
>>  cpp_ttype type;
>>  rid keyword;
>>};
>>
>>void
>>warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
>> const token_indent_info &body_tinfo,
>> const token_indent_info &next_tinfo);
>>
>> The purpose of this refactoring is to expose more information to the
>> -Wmisleading-indentation implementation to allow for more advanced
>> heuristics and for better coverage.
>>
>> (I decided to keep the usage of const references because nobody
>> seems to mind.  Also I added a new header file, c-indentation.h.)
>>
>> gcc/c-family/ChangeLog:
>>
>> * c-indentation.h (struct token_indent_info): Define.
>> (get_token_indent_info): Define.
>> (warn_for_misleading_indentation): Declare.
>> * c-common.h (warn_for_misleading_indentation): Remove.
>> * c-indentation.c (warn_for_misleading_indentation):
>> Change declaration to take three token_indent_infos.  Adjust
>> accordingly.
>> * c-indentation.c (should_warn_for_misleading_indentation):
>> Likewise.  Bail out early if the body is a compound statement.
>> (guard_tinfo_to_string): Define.
>>
>> gcc/c/ChangeLog:
>>
>> * c-parser.c (c_parser_if_body): Take token_indent_info
>> argument. Call warn_for_misleading_indentation even when the
>> body is a semicolon.  Extract token_indent_infos corresponding
>> to the guard, body and next tokens.  Adjust call to
>> warn_for_misleading_indentation accordingly.
>> (c_parser_else_body): Likewise.
>> (c_parser_if_statement): Likewise.
>> (c_parser_while_statement): Likewise.
>> (c_parser_for_statement): Likewise.
>>
>> gcc/cp/ChangeLog:
>>
>> * parser.c (cp_parser_selection_statement): Move handling of
>> semicolon body to ...
>> (cp_parser_implicitly_scoped_statement): .. here.  Call
>> warn_for_misleading_indentation even when the body is a
>> semicolon.  Extract token_indent_infos corresponding to the
>> guard, body and next tokens.  Adjust call to
>> warn_for_misleading_indentation accordingly.  Take
>> token_indent_info argument.
>> (cp_parser_already_scoped_statement): Likewise.
>> (cp_parser_selection_statement, cp_parser_iteration_statement):
>> Extract a token_indent_info corresponding to the guard token.
>
> The only question in my mind is bootstrap & regression testing.  From
> reading the thread for the earlier version of this patch I got the
> impression you had bootstrapped and regression tested earlier versions.
>
> If you could confirm that you've bootstrapped and regression tested this
> version it'd be appreciated.  You can do it on the individual patches or the
> set as a whole.

I think I successfully bootstrapped + regtested this exact version but
I'm not sure.  I was going to do so again before committing anyway.
I will fire off a build tonight and confirm the results tomorrow.

>
> Jeff
>
>

Re: match.pd: Three new patterns

2015-06-22 Thread Marek Polacek

On Fri, Jun 19, 2015 at 05:51:53PM +0200, Marc Glisse wrote:
> On Fri, 19 Jun 2015, Marek Polacek wrote:
> 
> >+/* x + y - (x | y) -> x & y */
> >+(simplify
> >+ (minus (plus @0 @1) (bit_ior @0 @1))
> >+ (if (!TYPE_OVERFLOW_SANITIZED (type) && !TYPE_SATURATING (type))
> >+  (bit_and @0 @1)))
> >+
> >+/* (x + y) - (x & y) -> x | y */
> >+(simplify
> >+ (minus (plus @0 @1) (bit_and @0 @1))
> >+ (if (!TYPE_OVERFLOW_SANITIZED (type) && !TYPE_SATURATING (type))
> >+  (bit_ior @0 @1)))
> 
> It could be macroized so they are handled by the same piece of code, but
> that's not important for a couple lines.
 
Yeah, that could be done, but I didn't see much value in doing that.

> As far as I can tell, TYPE_SATURATING is for fixed point numbers only, are
> we allowed to use bit_ior/bit_and on those? I never know what kind of
> integers are supposed to be supported, so I would have checked
> TYPE_OVERFLOW_UNDEFINED (type) || TYPE_OVERFLOW_WRAPS (type) since those are
> the 2 cases where we know it is safe (for TYPE_OVERFLOW_TRAPS it is never
> clear if we are supposed to preserve traps or just avoid introducing new
> ones). Well, the reviewer will know, I'll shut up :-)
 
I think you're right about TYPE_SATURATING so I've dropped that and instead
replaced it with TYPE_OVERFLOW_TRAPS.  That should do the right thing
together with TYPE_OVERFLOW_SANITIZED.

> (I still believe that the necessity for TYPE_OVERFLOW_SANITIZED here points
> to a design issue in ubsan, but it is way too late to discuss that)

I think delayed folding would help here a bit.  Also, we've been talking
about doing the signed overflow sanitization earlier, but so far I didn't
implement that.  And -ftrapv should be merged into the ubsan infrastructure
some day.

> It is probably not worth the trouble adding the variant:
> x+(y-(x&y)) -> x|y
> since it decomposes as
> y-(x&y) -> y&~x
> x+(y&~x) -> x|y
> x+(y-(x|y)) -> x-(x&~y) -> x&y is less likely to happen because the first
> transform y-(x|y) -> -(x&~y) increases the number of insns. Bah, we can't
> handle everything...

That sounds about right ;).  Thanks!
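
For reference, a small stand-alone check of the identities behind the two new patterns and of the two-step decomposition quoted above.  It uses unsigned arithmetic, which wraps, so it only shows that the identities hold modulo 2^N; it says nothing about the trapping or sanitized signed cases that the patterns' guards exclude.  The function names are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Check the identities for one pair of operands.  */
bool
identities_hold (unsigned x, unsigned y)
{
  return ((x + y) - (x | y) == (x & y)      /* first new pattern */
          && (x + y) - (x & y) == (x | y)   /* second new pattern */
          && y - (x & y) == (y & ~x)        /* first decomposition step */
          && x + (y & ~x) == (x | y));      /* second decomposition step */
}

/* Check every pair of values below LIMIT and return the number of
   pairs for which some identity fails.  */
unsigned
count_failures (unsigned limit)
{
  unsigned failures = 0;
  for (unsigned x = 0; x < limit; x++)
    for (unsigned y = 0; y < limit; y++)
      if (!identities_hold (x, y))
        failures++;
  return failures;
}

/* A few wrap-around cases, including the operands used in the new
   ubsan test.  */
void
check_edge_cases (void)
{
  assert (identities_hold (0u, 0u));
  assert (identities_hold (0x7fffffffu, 1u));
  assert (identities_hold (~0u, ~0u));
}
```

The identities follow from x + y == (x | y) + (x & y): the OR collects each bit position set in either operand once, and the AND adds back the positions set in both.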

So, Richi, is this variant ok as well?  I also added one ubsan test.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-22  Marek Polacek  

* match.pd ((x + y) - (x | y) -> x & y,
(x + y) - (x & y) -> x | y): New patterns.

* gcc.dg/fold-minus-4.c: New test.
* gcc.dg/fold-minus-5.c: New test.
* c-c++-common/ubsan/overflow-add-5.c: New test.

diff --git gcc/match.pd gcc/match.pd
index badb80a..6d520ef 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -343,6 +343,18 @@ along with GCC; see the file COPYING3.  If not see
  (plus:c (bit_and @0 @1) (bit_ior @0 @1))
  (plus @0 @1))
 
+/* (x + y) - (x | y) -> x & y */
+(simplify
+ (minus (plus @0 @1) (bit_ior @0 @1))
+ (if (!TYPE_OVERFLOW_SANITIZED (type) && !TYPE_OVERFLOW_TRAPS (type))
+  (bit_and @0 @1)))
+
+/* (x + y) - (x & y) -> x | y */
+(simplify
+ (minus (plus @0 @1) (bit_and @0 @1))
+ (if (!TYPE_OVERFLOW_SANITIZED (type) && !TYPE_OVERFLOW_TRAPS (type))
+  (bit_ior @0 @1)))
+
 /* (x | y) - (x ^ y) -> x & y */
 (simplify
  (minus (bit_ior @0 @1) (bit_xor @0 @1))
diff --git gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c
index e69de29..905a60a 100644
--- gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c
+++ gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=signed-integer-overflow" } */
+
+int __attribute__ ((noinline))
+foo (int i, int j)
+{
+  return (i + j) - (i | j);
+}
+
+/* { dg-output "signed integer overflow: 2147483647 \\+ 1 cannot be represented in type 'int'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: -2147483648 - 2147483647 cannot be represented in type 'int'\[^\n\r]*(\n|\r\n|\r)" } */
+
+int __attribute__ ((noinline))
+bar (int i, int j)
+{
+  return (i + j) - (i & j);
+}
+
+/* { dg-output "\[^\n\r]*signed integer overflow: 2147483647 \\+ 1 cannot be represented in type 'int'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*signed integer overflow: -2147483648 - 1 cannot be represented in type 'int'" } */
+
+int
+main ()
+{
+  int r = foo (__INT_MAX__, 1);
+  asm volatile ("" : "+g" (r));
+  r = bar (__INT_MAX__, 1);
+  asm volatile ("" : "+g" (r));
+  return 0;
+}
diff --git gcc/testsuite/gcc.dg/fold-minus-4.c gcc/testsuite/gcc.dg/fold-minus-4.c
index e69de29..2d76b4f 100644
--- gcc/testsuite/gcc.dg/fold-minus-4.c
+++ gcc/testsuite/gcc.dg/fold-minus-4.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-cddce1" } */
+
+int
+fn1 (int a, int b)
+{
+  int tem1 = a + b;
+  int tem2 = a & b;
+  return tem1 - tem2;
+}
+
+int
+fn2 (int a, int b)
+{
+  int tem1 = b + a;
+  int tem2 = a & b;
+  return tem1 - tem2;
+}
+
+int
+fn3 (int a, int b)
+{
+  int tem1 = a + b;
+  int tem2 = b & a;
+  return tem1 - tem2;
+}
+
+int
+fn4 (int a, int b)
+{
+  int tem1 = b +

Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-06-22 Thread Marek Polacek

On Sun, Jun 21, 2015 at 05:05:14PM -0600, Martin Sebor wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/addr_builtin-pr66516.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */

One more nit: I think I'd prefer naming the test addr-builtin-1.c
and then putting /* PR c/66516 */ on the first line of the test.

Marek

Re: [gomp4] Remove some ptxness from middle end

2015-06-22 Thread Nathan Sidwell


On 06/22/15 13:04, Marek Polacek wrote:

On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote:

+  if (GET_CODE (arg) != CONST_INT
+  || (unsigned HOST_WIDE_INT)INTVAL (arg) >= OACC_HWM)


Don't we have UINTVAL for this?  So UINTVAL (arg).


Applied the attached, after testing.  Also realized I'd missed some places I 
should have used the new loop level enumeration.


nathan

--
Nathan Sidwell
2015-06-22  Nathan Sidwell  

	* omp-low.c (expand_oacc_get_num_threads): Use OACC enum.
	(expand_oacc_get_thread_num, make_predication_test): Likewise.
	* builtins.c (expand_oacc_id): Use UINTVAL.

Index: omp-low.c
===
--- omp-low.c	(revision 224747)
+++ omp-low.c	(working copy)
@@ -4994,8 +4994,8 @@ expand_oacc_get_num_threads (gimple_seq
   tree  decl = builtin_decl_explicit (BUILT_IN_GOACC_NID);
   unsigned ix;
 
-  for (ix = 0; (1 << ix) <= gwv_bits; ix++)
-if ((1 << ix) & gwv_bits)
+  for (ix = OACC_gang; ix != OACC_HWM; ix++)
+if (OACC_LOOP_MASK(ix) & gwv_bits)
   {
 	tree arg = build_int_cst (unsigned_type_node, ix);
 	tree count = create_tmp_var (unsigned_type_node);
@@ -5022,8 +5022,8 @@ expand_oacc_get_thread_num (gimple_seq *
   unsigned ix;
 
   /* Start at gang level, and examine relevant dimension indices.  */
-  for (ix = 0; (1 << ix) <= gwv_bits; ix++)
-if ((1 << ix) & gwv_bits)
+  for (ix = OACC_gang; ix != OACC_HWM; ix++)
+if (OACC_LOOP_MASK (ix) & gwv_bits)
   {
 	tree arg = build_int_cst (unsigned_type_node, ix);
 
@@ -10671,7 +10671,7 @@ make_predication_test (edge true_edge, b
   unsigned ix;
 
   for (ix = OACC_worker; ix <= OACC_vector; ix++)
-if (mask & (1 << ix))
+if (OACC_LOOP_MASK (ix) & mask)
   {
 	gimple call = gimple_build_call
 	  (decl, 1, build_int_cst (unsigned_type_node, ix));
Index: builtins.c
===
--- builtins.c	(revision 224747)
+++ builtins.c	(working copy)
@@ -5971,8 +5971,7 @@ expand_oacc_id (enum built_in_function f
   rtx arg;
 
   arg = expand_normal (arg0);
-  if (GET_CODE (arg) != CONST_INT
-  || (unsigned HOST_WIDE_INT)INTVAL (arg) >= OACC_HWM)
+  if (GET_CODE (arg) != CONST_INT || UINTVAL (arg) >= OACC_HWM)
 {
   error ("argument to %D must be constant in range 0 to %d",
 	 get_callee_fndecl (exp), OACC_HWM - 1);

patch to fix PR63740 on gcc5 branch

2015-06-22 Thread Vladimir Makarov


I've committed the following patch to gcc 5 branch as rev.224761.

The patch was bootstrapped on x86-64.


2015-06-22  Vladimir Makarov 

PR bootstrap/63740
* lra-lives.c (process_bb_lives): Check insn copying the same
reload pseudo and don't create a copy for it.

Index: lra-lives.c
===
--- lra-lives.c (revision 224739)
+++ lra-lives.c (working copy)
@@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int &c
  dst_regno = REGNO (SET_DEST (set));
  if (dst_regno >= lra_constraint_new_regno_start
  && src_regno >= lra_constraint_new_regno_start)
-   lra_create_copy (dst_regno, src_regno, freq);
+   {
+ /* It might be still an original (non-reload) insn with
+one unused output and a constraint requiring to use
+the same reg for input/output operands. In this case
+dst_regno and src_regno have the same value, we don't
+need a misleading copy for this case.  */
+ if (dst_regno != src_regno)
+   lra_create_copy (dst_regno, src_regno, freq);
+   }
  else if (dst_regno >= lra_constraint_new_regno_start)
{
  if ((hard_regno = src_regno) >= FIRST_PSEUDO_REGISTER)

Re: [PATCH] Combine related fail of gcc.target/powerpc/ti_math1.c

2015-06-22 Thread Alan Modra

On Mon, Jun 22, 2015 at 09:24:07AM +0200, Eric Botcazou wrote:
> > * rtlanal.c (commutative_operand_precedence): Correct comments.
> > * simplify-rtx.c (simplify_plus_minus_op_data_cmp): Delete forward
> > declaration.  Return an int.  Distinguish REG,REG return from
> > others.
> > (struct simplify_plus_minus_op_data): Make local to function.
> > (simplify_plus_minus): Rename canonicalized to not_canonical.
> > Don't set not_canonical if merely sorting registers.  Avoid
> > packing ops if nothing changes.  White space fixes.
> 
> OK in principle, but...

Thanks for reviewing!

> > Some notes: Renaming canonicalized to not_canonical better reflects
> > its usage.  At the time the var is set, the expression hasn't been
> > canonicalized.
> 
> I'm quite skeptical, in particular given:

I'm a little surprised, but committed without the renaming.

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-06-22 Thread Martin Sebor


It seems like this patch regresses the pr59630.c testcase; I don't see
the testcase being addressed in this patch.


Thanks for the review and for pointing out this regression!
I missed it among all the C test suite failures (I see 157
of them in 24 distinct tests on x86_64.)

pr59630 is marked ice-on-valid-code even though the call via
the converted pointer is clearly invalid (UB). What's more
relevant, though, is that the test case is one of those that
(while they both compile and link with the unpatched GCC) are
not intended to compile with the patch (and don't compile with
Clang).

In this simple case, the call to __builtin_abs(0) is folded
into the constant 0, but in more involved cases GCC emits
a call to abs. It's not clear to me from the manual or from
the builtin tests I've seen whether this is by design or
an accident of the implementation.

Is it intended that programs be able to take the address of
the builtins that correspond to libc functions and make calls
to the underlying libc functions via such pointers? (If so,
the patch will need some tweaking.)
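
To make the question concrete, here is a minimal sketch (not taken from the patch or from pr59630.c) of the two ways of reaching abs that are being contrasted. The function names abs_via_pointer, abs_via_builtin and check_abs_paths are hypothetical, and the form the patch is meant to reject appears only in a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* Pointer to the libc function, obtained through the declaration in
   <stdlib.h> rather than through the __builtin_ name.  */
static int (*abs_ptr) (int) = abs;

/* Indirect call: goes through the pointer to the libc abs.  */
int
abs_via_pointer (int x)
{
  return abs_ptr (x);
}

/* Direct call to the builtin with a non-constant argument; GCC may
   expand it inline or emit a call to abs.  */
int
abs_via_builtin (int x)
{
  return __builtin_abs (x);
}

/* The form under discussion, left as a comment because the patch is
   meant to reject it:
     int (*p) (int) = __builtin_abs;  */

/* Both paths should agree on the result.  */
void
check_abs_paths (void)
{
  assert (abs_via_pointer (-5) == 5);
  assert (abs_via_builtin (-5) == 5);
}
```

Whether the commented-out form should keep working is exactly the open question above; the sketch only shows that the libc declaration already gives a usable address.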



Please no c/ and cp/ prefixes.


Sure, let me fix that in the next patch once the question
above has been settled.




+#include 


As Joseph already pointed out, this is redundant.


Yes, that was an accidental vestige of some debugging code I had
added. I'll take it out.




@@ -3384,7 +3392,14 @@ parser_build_unary_op (location_t loc, enum tree_code code, struct c_expr arg)
result.original_code = code;
result.original_type = NULL;

-  if (TREE_OVERFLOW_P (result.value) && !TREE_OVERFLOW_P (arg.value))
+  if (code == ADDR_EXPR
+  && TREE_CODE (TREE_TYPE (arg.value)) == FUNCTION_TYPE
+  && DECL_IS_BUILTIN (arg.value))
+{
+  error_at (loc, "taking address of a builtin function");
+  result.value = error_mark_node;
+}
+  else if (TREE_OVERFLOW_P (result.value) && !TREE_OVERFLOW_P (arg.value))
  overflow_warning (loc, result.value);


It seems like you can move the new hunk a bit above so that we don't call
build_unary_op in a case when taking the address of a built-in function.


Yes, that should work.



Unfortunately, it doesn't seem possible to do this error in build_unary_op
or in function_to_pointer_conversion :(.


Right. I couldn't find a way to do it because it gets called
for function calls too.


One more nit: I think I'd prefer naming the test addr-builtin-1.c
and then putting /* PR c/66516 */ on the first line of the test.


Will do.

Martin

Re: [PATCH] Fix PR c++/30044

2015-06-22 Thread Jason Merrill


On 06/15/2015 02:32 PM, Patrick Palka wrote:

On Mon, Jun 15, 2015 at 2:05 PM, Jason Merrill wrote:

Any reason not to use grow_tree_vec?


Doing so causes a lot of ICEs in the testsuite.  I think it's because
grow_tree_vec invalidates the older parameter_vec which some trees may
still be holding a reference to in their DECL_TEMPLATE_PARMS field.


Hmm, that's unfortunate, as doing it this way means we get a bunch of 
garbage TREE_VECs in the process.  But I guess the patch is OK as is.
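
The trade-off can be sketched outside of GCC as follows. This is only an illustration, assuming that growing in place behaves like realloc; struct int_vec, copy_and_extend and grow_in_place are hypothetical stand-ins, not the TREE_VEC API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a TREE_VEC: a length plus a heap array.  */
struct int_vec
{
  size_t len;
  int *elts;
};

/* Copying strategy: allocate a new array and leave OLD untouched, so
   anything still pointing at OLD->elts stays valid.  The old array
   becomes garbage once nothing refers to it.  */
struct int_vec
copy_and_extend (const struct int_vec *old, int extra)
{
  struct int_vec v;
  v.len = old->len + 1;
  v.elts = malloc (v.len * sizeof (int));
  assert (v.elts != NULL);
  memcpy (v.elts, old->elts, old->len * sizeof (int));
  v.elts[old->len] = extra;
  return v;
}

/* In-place strategy: realloc may move the storage, so any saved copy
   of the previous V->elts pointer must be treated as invalid.  */
void
grow_in_place (struct int_vec *v, int extra)
{
  int *p = realloc (v->elts, (v->len + 1) * sizeof (int));
  assert (p != NULL);
  v->elts = p;
  v->elts[v->len++] = extra;
}
```

With the copying strategy an older holder of the vector keeps seeing its original contents, at the cost of one discarded array per extension.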


Jason
