[committed] gen-pass-instances.awk: Add emacs indent setting

2015-11-12 Thread Tom de Vries

Hi,

this patch adds emacs indentation settings to gen-pass-instances.awk. 
The default indentation width in emacs awk mode seems to be 4, and this 
setting overrides it to 8, which is the style used in this file.


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Add emacs indent setting

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk: Add emacs indent setting.

---
 gcc/gen-pass-instances.awk | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index f36f510..a0be6a1 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -64,3 +64,8 @@ function handle_line()
 }
 
 { handle_line() }
+
+# Local Variables:
+# mode:awk
+# c-basic-offset:8
+# End:


gen-pass-instances.awk: Remove unused var in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch removes an unused variable from handle_line in 
gen-pass-instances.awk.


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Remove unused var in handle_line

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Remove unused var line_length.

---
 gcc/gen-pass-instances.awk | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index a0be6a1..7f33e8c 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -47,7 +47,6 @@ function handle_line()
 		len_of_start = length("NEXT_PASS (")
 		len_of_end = length(")")
 		len_of_pass_name = RLENGTH - (len_of_start + len_of_end)
-		line_length = length(line)
 		pass_starts_at = where + len_of_start
 		pass_name = substr(line, pass_starts_at, len_of_pass_name)
 		if (pass_name in pass_counts)


[committed] gen-pass-instances.awk: Unify semicolon use in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch unifies semicolon use in handle_line in gen-pass-instances.awk.

Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Unify semicolon use in handle_line

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Unify semicolon use.

---
 gcc/gen-pass-instances.awk | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 7f33e8c..9eaac65 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -41,14 +41,14 @@ BEGIN {
 function handle_line()
 {
 	line = $0;
-	where = match(line, /NEXT_PASS \((.+)\)/)
+	where = match(line, /NEXT_PASS \((.+)\)/);
 	if (where != 0)
 	{
-		len_of_start = length("NEXT_PASS (")
-		len_of_end = length(")")
-		len_of_pass_name = RLENGTH - (len_of_start + len_of_end)
-		pass_starts_at = where + len_of_start
-		pass_name = substr(line, pass_starts_at, len_of_pass_name)
+		len_of_start = length("NEXT_PASS (");
+		len_of_end = length(")");
+		len_of_pass_name = RLENGTH - (len_of_start + len_of_end);
+		pass_starts_at = where + len_of_start;
+		pass_name = substr(line, pass_starts_at, len_of_pass_name);
 		if (pass_name in pass_counts)
 			pass_counts[pass_name]++;
 		else


[committed] gen-pass-instances.awk: Use early-out in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch restructures handle_line in gen-pass-instances.awk to use an 
early-out.


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Use early-out in handle_line

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Restructure using early-out.

---
 gcc/gen-pass-instances.awk | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 9eaac65..27e7a98 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -41,25 +41,27 @@ BEGIN {
 function handle_line()
 {
 	line = $0;
+
 	where = match(line, /NEXT_PASS \((.+)\)/);
-	if (where != 0)
+	if (where == 0)
 	{
-		len_of_start = length("NEXT_PASS (");
-		len_of_end = length(")");
-		len_of_pass_name = RLENGTH - (len_of_start + len_of_end);
-		pass_starts_at = where + len_of_start;
-		pass_name = substr(line, pass_starts_at, len_of_pass_name);
-		if (pass_name in pass_counts)
-			pass_counts[pass_name]++;
-		else
-			pass_counts[pass_name] = 1;
-		printf "%s, %s%s\n",
-			substr(line, 1, pass_starts_at + len_of_pass_name - 1),
-			pass_counts[pass_name],
-			substr(line, pass_starts_at + len_of_pass_name);
-	} else {
 		print line;
+		return;
 	}
+
+	len_of_start = length("NEXT_PASS (");
+	len_of_end = length(")");
+	len_of_pass_name = RLENGTH - (len_of_start + len_of_end);
+	pass_starts_at = where + len_of_start;
+	pass_name = substr(line, pass_starts_at, len_of_pass_name);
+	if (pass_name in pass_counts)
+		pass_counts[pass_name]++;
+	else
+		pass_counts[pass_name] = 1;
+	printf "%s, %s%s\n",
+		substr(line, 1, pass_starts_at + len_of_pass_name - 1),
+		pass_counts[pass_name],
+		substr(line, pass_starts_at + len_of_pass_name);
 }
 
 { handle_line() }


[PATCH, i386]: Use ssememalign attribute value to reject insns with misaligned operands

2015-11-12 Thread Uros Bizjak
Hello!

Attached patch uses ssememalign attribute to reject insn combinations
where memory operands would be misaligned.

2015-11-12  Uros Bizjak  

* config/i386/i386.c (ix86_legitimate_combined_insn): Reject
combined insn if the alignment of vector mode memory operand
is less than ssememalign.

testsuite/ChangeLog:

2015-11-12  Uros Bizjak  

* gcc.target/i386/sse-1.c (swizzle): Assume that a is
aligned to 64 bits.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32}, committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 230213)
+++ config/i386/i386.c  (working copy)
@@ -7236,11 +7236,12 @@ ix86_legitimate_combined_insn (rtx_insn *insn)
  /* For pre-AVX disallow unaligned loads/stores where the
 instructions don't support it.  */
  if (!TARGET_AVX
- && VECTOR_MODE_P (GET_MODE (op))
- && misaligned_operand (op, GET_MODE (op)))
+ && VECTOR_MODE_P (mode)
+ && misaligned_operand (op, mode))
{
- int min_align = get_attr_ssememalign (insn);
- if (min_align == 0)
+ unsigned int min_align = get_attr_ssememalign (insn);
+ if (min_align == 0
+ || MEM_ALIGN (op) < min_align)
return false;
}
 
Index: testsuite/gcc.target/i386/sse-1.c
===
--- testsuite/gcc.target/i386/sse-1.c   (revision 230213)
+++ testsuite/gcc.target/i386/sse-1.c   (working copy)
@@ -14,8 +14,10 @@ typedef union
 void
 swizzle (const void *a, vector4_t * b, vector4_t * c)
 {
-  b->v = _mm_loadl_pi (b->v, (__m64 *) a);
-  c->v = _mm_loadl_pi (c->v, ((__m64 *) a) + 1);
+  __m64 *t = __builtin_assume_aligned (a, 64);
+
+  b->v = _mm_loadl_pi (b->v, t);
+  c->v = _mm_loadl_pi (c->v, t + 1);
 }
 
 /* While one legal rendering of each statement would be movaps;movlps;movaps,
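
The rejection rule in the i386.c hunk above can be sketched in isolation. This is only an illustration: the function name and its two parameters are hypothetical stand-ins for MEM_ALIGN (op) and get_attr_ssememalign (insn), and both values are assumed to be in bits.

```c
#include <stdbool.h>

/* Return true if a combined insn whose vector memory operand is known to
   be misaligned should be rejected.
   MEM_ALIGN_BITS stands in for MEM_ALIGN (op); MIN_ALIGN_BITS stands in
   for get_attr_ssememalign (insn).  A minimum of 0 is taken to mean that
   the insn tolerates no misaligned operand at all.  */
bool
reject_misaligned_operand (unsigned int mem_align_bits,
			   unsigned int min_align_bits)
{
  return min_align_bits == 0 || mem_align_bits < min_align_bits;
}
```

With a 64-bit minimum, an operand known to be only 32-bit aligned is rejected, while one known to be 64-bit or 128-bit aligned is accepted.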


[committed] gen-pass-instances.awk: Add len_of_call var in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch adds a variable len_of_call in handle_line in 
gen-pass-instances.awk.  It moves the use of the RLENGTH variable just 
after the related match call.


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Add len_of_call var in handle_line

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Add len_of_call variable.

---
 gcc/gen-pass-instances.awk | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 27e7a98..70b00b7 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -42,6 +42,7 @@ function handle_line()
 {
 	line = $0;
 
+	# Find call expression.
 	where = match(line, /NEXT_PASS \((.+)\)/);
 	if (where == 0)
 	{
@@ -49,9 +50,12 @@ function handle_line()
 		return;
 	}
 
+	# Length of the call expression.
+	len_of_call = RLENGTH;
+
 	len_of_start = length("NEXT_PASS (");
 	len_of_end = length(")");
-	len_of_pass_name = RLENGTH - (len_of_start + len_of_end);
+	len_of_pass_name = len_of_call - (len_of_start + len_of_end);
 	pass_starts_at = where + len_of_start;
 	pass_name = substr(line, pass_starts_at, len_of_pass_name);
 	if (pass_name in pass_counts)


[committed] gen-pass-instances.awk: Rename len_of_end to len_of_close in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch renames variable len_of_end to len_of_close in handle_line in 
gen-pass-instances.awk.


Committed to trunk as obvious.

Thanks,
- Tom
gen-pass-instances.awk: Rename len_of_end to len_of_close in handle_line

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Rename len_of_end to
	len_of_close.

---
 gcc/gen-pass-instances.awk | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 70b00b7..7624959 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -54,8 +54,9 @@ function handle_line()
 	len_of_call = RLENGTH;
 
 	len_of_start = length("NEXT_PASS (");
-	len_of_end = length(")");
-	len_of_pass_name = len_of_call - (len_of_start + len_of_end);
+	len_of_close = length(")");
+
+	len_of_pass_name = len_of_call - (len_of_start + len_of_close);
 	pass_starts_at = where + len_of_start;
 	pass_name = substr(line, pass_starts_at, len_of_pass_name);
 	if (pass_name in pass_counts)


[committed] gen-pass-instances.awk: Add comments in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch adds some comments in handle_line in gen-pass-instances.awk.

Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Add comments in handle_line

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Add comments.

---
 gcc/gen-pass-instances.awk | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 7624959..3d5e8b6 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -56,13 +56,18 @@ function handle_line()
 	len_of_start = length("NEXT_PASS (");
 	len_of_close = length(")");
 
+	# Find pass_name argument
 	len_of_pass_name = len_of_call - (len_of_start + len_of_close);
 	pass_starts_at = where + len_of_start;
 	pass_name = substr(line, pass_starts_at, len_of_pass_name);
+
+	# Set pass_counts
 	if (pass_name in pass_counts)
 		pass_counts[pass_name]++;
 	else
 		pass_counts[pass_name] = 1;
+
+	# Print call expression with extra pass_num argument
 	printf "%s, %s%s\n",
 		substr(line, 1, pass_starts_at + len_of_pass_name - 1),
 		pass_counts[pass_name],


[committed] gen-pass-instances.awk: Add pass_num, prefix and postfix vars in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch adds new variables pass_num, prefix and postfix in 
handle_line in gen-pass-instances.awk.


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Add pass_num, prefix and postfix vars in handle_line

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Add pass_num, prefix and postfix
	vars.

---
 gcc/gen-pass-instances.awk | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 3d5e8b6..1aced74 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -61,17 +61,22 @@ function handle_line()
 	pass_starts_at = where + len_of_start;
 	pass_name = substr(line, pass_starts_at, len_of_pass_name);
 
+	# Find prefix (until and including pass_name)
+	prefix = substr(line, 1, pass_starts_at + len_of_pass_name - 1)
+
+	# Find postfix (after pass_name)
+	postfix = substr(line, pass_starts_at + len_of_pass_name)
+
 	# Set pass_counts
 	if (pass_name in pass_counts)
 		pass_counts[pass_name]++;
 	else
 		pass_counts[pass_name] = 1;
 
+	pass_num = pass_counts[pass_name];
+
 	# Print call expression with extra pass_num argument
-	printf "%s, %s%s\n",
-		substr(line, 1, pass_starts_at + len_of_pass_name - 1),
-		pass_counts[pass_name],
-		substr(line, pass_starts_at + len_of_pass_name);
+	printf "%s, %s%s\n", prefix, pass_num, postfix;
 }
 
 { handle_line() }


[committed] gen-pass-instances.awk: Make print command clearer in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch modifies the prefix and postfix expressions in handle_line in 
gen-pass-instances.awk, such that the printf command now lists all the 
NEXT_PASS call arguments, and surrounds them with parentheses.


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Make print command clearer in handle_line

2015-11-11  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Print parentheses and pass_name
	explicitly.

---
 gcc/gen-pass-instances.awk | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 1aced74..b10c26a 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -54,6 +54,7 @@ function handle_line()
 	len_of_call = RLENGTH;
 
 	len_of_start = length("NEXT_PASS (");
+	len_of_open = length("(");
 	len_of_close = length(")");
 
 	# Find pass_name argument
@@ -61,11 +62,13 @@ function handle_line()
 	pass_starts_at = where + len_of_start;
 	pass_name = substr(line, pass_starts_at, len_of_pass_name);
 
-	# Find prefix (until and including pass_name)
-	prefix = substr(line, 1, pass_starts_at + len_of_pass_name - 1)
+	# Find call expression prefix (until and including called function)
+	prefix_len = pass_starts_at - 1 - len_of_open;
+	prefix = substr(line, 1, prefix_len);
 
-	# Find postfix (after pass_name)
-	postfix = substr(line, pass_starts_at + len_of_pass_name)
+	# Find call expression postfix
+	postfix_starts_at = pass_starts_at + len_of_pass_name + len_of_close;
+	postfix = substr(line, postfix_starts_at);
 
 	# Set pass_counts
 	if (pass_name in pass_counts)
@@ -76,7 +79,7 @@ function handle_line()
 	pass_num = pass_counts[pass_name];
 
 	# Print call expression with extra pass_num argument
-	printf "%s, %s%s\n", prefix, pass_num, postfix;
+	printf "%s(%s, %s)%s\n", prefix, pass_name, pass_num, postfix;
 }
 
 { handle_line() }
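
For readers following the handle_line cleanups, the end result can be sketched in C. This is an illustration only, not part of the patch series: next_pass_num, the fixed-size tables and the C signature of handle_line are assumptions made for the sketch, whereas the awk script keeps its counts in the pass_counts array. strrchr mirrors the greedy (.+) in the awk regex by taking the last ')' on the line.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PASSES 64
#define MAX_NAME 64

/* Hypothetical counter table standing in for awk's pass_counts array.  */
static char pass_names[MAX_PASSES][MAX_NAME];
static int pass_counts[MAX_PASSES];
static int n_passes;

/* Return the 1-based instance number for NAME, or 0 if the table is full.  */
int
next_pass_num (const char *name)
{
  for (int i = 0; i < n_passes; i++)
    if (strcmp (pass_names[i], name) == 0)
      return ++pass_counts[i];
  if (n_passes == MAX_PASSES)
    return 0;
  snprintf (pass_names[n_passes], MAX_NAME, "%s", name);
  pass_counts[n_passes] = 1;
  return pass_counts[n_passes++];
}

/* Copy LINE into OUT (of size OUT_SIZE), rewriting "NEXT_PASS (name)" to
   "NEXT_PASS (name, N)".  Lines without a call are copied unchanged.  */
void
handle_line (const char *line, char *out, size_t out_size)
{
  static const char start[] = "NEXT_PASS (";
  const size_t len_of_start = sizeof start - 1;
  const char *call = strstr (line, start);
  const char *close_paren = call ? strrchr (call, ')') : NULL;

  /* Early-out: no call expression with a non-empty argument.  */
  if (call == NULL || close_paren == NULL
      || close_paren == call + len_of_start)
    {
      snprintf (out, out_size, "%s", line);
      return;
    }

  /* Find pass_name argument.  */
  const char *name = call + len_of_start;
  int name_len = (int) (close_paren - name);
  char pass_name[MAX_NAME];
  snprintf (pass_name, sizeof pass_name, "%.*s", name_len, name);

  /* Prefix: everything up to, but not including, the open parenthesis.  */
  int prefix_len = (int) (name - line) - 1;

  /* Print call expression with extra pass_num argument.  */
  snprintf (out, out_size, "%.*s(%s, %d)%s", prefix_len, line,
	    pass_name, next_pass_num (pass_name), close_paren + 1);
}
```

Calling handle_line twice on the same "NEXT_PASS (pass_foo);" line numbers the instances 1 and 2, which is the behaviour the awk script provides for passes.def.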


Re: open acc default data attribute

2015-11-12 Thread Jakub Jelinek
On Wed, Nov 11, 2015 at 12:19:55PM -0500, Nathan Sidwell wrote:
> this patch implements default data attribute determination.  The current
> behaviour defaults to 'copy' and ignores 'default(none)'. The  patch
> corrects that.
> 
> 1) We emit a diagnostic when 'default(none)' is in effect.  The fortran FE
> emits some artificial decls that it doesn't otherwise annotate, which is why
> we check DECL_ARTIFICIAL.  IIUC Cesar had a patch to address that but it
> needed some reworking?

I don't think treating DECL_ARTIFICIAL specially is a bug of any kind,
there are tons of different artificial decls even for C/C++ (VLAs etc.), and
the user has no way to put them into any clauses explicitly, so what we do
with them is a GCC internal thing.

> 2015-11-11  Nathan Sidwell  
> 
>   gcc/
>   * gimplify.c (oacc_default_clause): New.
>   (omp_notice_variable): Call it.
> 
>   gcc/testsuite/
>   * c-c++-common/goacc/data-default-1.c: New.
> 
>   libgomp/
>   * testsuite/libgomp.oacc-c-c++-common/default-1.c: New.

+  error ("%qE not specified in enclosing OpenACC %s construct",
+	    DECL_NAME (lang_hooks.decls.omp_report_decl (decl)), rkind);
+  error_at (ctx->location, "enclosing OpenACC %s construct", rkind);

I'd use %qs instead of %s.

Otherwise ok.

Jakub


Re: [PATCH] PR ada/66205 gnatbind generates invalid code when finalization is enabled in restricted runtime

2015-11-12 Thread Simon Wright
On 11 Nov 2015, at 19:43, Simon Wright  wrote:

> This situation arises, for example, with an embedded RTS that incorporates the
> Ada 2012 generalized container iterators.

I should add, this PR is the “other half” of PR ada/66242, which is fixed in
GCC 6; so please can it be reviewed?

I didn’t make it plain that the comment I’ve put in the first hunk,

 --  For restricted run-time libraries (ZFP and Ravenscar) tasks
 --  are non-terminating, so we do not want finalization.

is lifted from the unpatched code at line 480, where it relates to the use of 
Configurable_Run_Time_On_Target for this purpose.

Re: OpenACC Firstprivate

2015-11-12 Thread Thomas Schwinge
Hi Nathan!

Merging back your trunk r230169 into gomp-4_0-branch, for the new
libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c test, I'm
seeing the compiler diagnose as follows (compile with "-Wall -O2"):

source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: In function 'main._omp_fn.1':
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c:20:17: warning: 'val' is used uninitialized in this function [-Wuninitialized]
   ok  = val == 7;
 ^
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c:9:7: note: 'val' was declared here
   int val = 2;
   ^

..., and execution fails ("return 1" from main), so I XFAILed the
execution in the merge commit r230214 on gomp-4_0-branch.  (..., and I
still think that it's a good idea to change the libgomp testsuite to run
with -Wall enabled...)

Do you have an idea what's going on?  Given your preparatory "[gomp4]
Rework gimplifyier region flags",

(thanks!), the merge commit r230214 on gomp-4_0-branch didn't contain any
changes to gcc/gimplify.c, so that can't be it.  It also can't be the
possibly inconsistent usage of gcc/omp-low.c:is_reference vs. "TREE_CODE
(TREE_TYPE ([...])) == REFERENCE_TYPE" in gcc/omp-low.c, because that
doesn't matter for C code anyway (no artificial REFERENCE_TYPEs
generated), right?  So it must be some other change installed on
gomp-4_0-branch but not on trunk.


Grüße
 Thomas




Re: [OpenACC] declare directive

2015-11-12 Thread Jakub Jelinek
On Wed, Nov 11, 2015 at 07:07:58PM -0600, James Norris wrote:
> +   oacc_declare_returns->remove (t);
> +
> +   if (oacc_declare_returns->elements () == 0)
> + {
> +   delete oacc_declare_returns;
> +   oacc_declare_returns = NULL;
> + }

Something for incremental patch:
1) might be nice to have some assertion that at the end of gimplify_body
   or so oacc_declare_returns is NULL
2) what happens if you refer to automatic variables of other functions
   (C or Fortran nested functions, maybe C++ lambdas); shall those be
   unmapped at the end of the (nested) function's body?

> @@ -5858,6 +5910,10 @@ omp_default_clause (struct gimplify_omp_ctx *ctx, tree 
> decl,
>flags |= GOVD_FIRSTPRIVATE;
>break;
>  case OMP_CLAUSE_DEFAULT_UNSPECIFIED:
> +  if (is_global_var (decl)
> +   && ctx->region_type & (ORT_ACC_PARALLEL | ORT_ACC_KERNELS)

Please put this condition as cheapest first.  I'd also surround
it into (), just to make it clear that the bitwise & is intentional.
Perhaps () != 0.

> +   && device_resident_p (decl))
> + flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;

> +   case GOMP_MAP_FROM:
> + kinds[i] = GOMP_MAP_FORCE_FROM;
> + GOACC_enter_exit_data (device, 1, &hostaddrs[i], &sizes[i],
> +&kinds[i], 0, 0);

Wrong indentation.

Ok with those two changes and please think about the incremental stuff.

Jakub
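
The two review points about the condition (cheapest test first, explicit parentheses around the bitwise &) can be sketched as follows. Everything here is hypothetical: the enum values, the _sketch names and the call counter exist only to make the short-circuit order observable, and is_global_var_sketch merely stands in for a costlier predicate.

```c
#include <stdbool.h>

/* Hypothetical flag values; only the bit-test pattern matters here.  */
enum region_type_sketch
{
  ORT_ACC_PARALLEL_SKETCH = 1 << 0,
  ORT_ACC_KERNELS_SKETCH = 1 << 1,
  ORT_OTHER_SKETCH = 1 << 2
};

/* Counts calls to the "expensive" predicate so the ordering is visible.  */
int expensive_calls;

/* Stand-in for a costlier check such as is_global_var (decl).  */
bool
is_global_var_sketch (int decl)
{
  expensive_calls++;
  return decl != 0;
}

/* Cheap bit test first, wrapped in () != 0 to show the & is intentional;
   the function call runs only when the bit test succeeds.  */
bool
wants_map_to_only (unsigned int region_type, int decl)
{
  return ((region_type & (ORT_ACC_PARALLEL_SKETCH | ORT_ACC_KERNELS_SKETCH))
	  != 0
	  && is_global_var_sketch (decl));
}
```

For a region type outside the two OpenACC kinds, the predicate is never called, which is the point of putting the mask test first.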


Pointless configure checks for macros

2015-11-12 Thread Jonathan Wakely

PR68307 points out that config/os/mingw32-w64/error_constants.h fails
to define a number of errc constants which correspond to EXXX macros
that are supported on mingw-w64.

Does anyone know why we test explicitly for these macros but not
others?

m4_foreach([syserr], [EOWNERDEAD, ENOTRECOVERABLE, ENOLINK, EPROTO, ENODATA,
 ENOSR, ENOSTR, ETIME, EBADMSG, ECANCELED,
 EOVERFLOW, ENOTSUP, EIDRM, ETXTBSY,
 ECHILD, ENOSPC, EPERM,
 ETIMEDOUT, EWOULDBLOCK],

Why do we even test for these in configure, instead of just checking
whether they exist directly using #ifdef in error_constants.h?
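
As a rough sketch of the #ifdef alternative, written in plain C rather than as the libstdc++ enum class, and with enumerator names invented for the illustration: each constant is guarded by the errno macro itself, so no configure-generated _GLIBCXX_HAVE_* macro is needed. Only EDOM and ERANGE are left unguarded, since ISO C requires them.

```c
#include <errno.h>

/* Hypothetical C rendering of a few errc constants.  Each optional value
   is guarded by the macro it maps to instead of a configure result.  */
enum errc_sketch
{
  errc_argument_out_of_domain = EDOM,
  errc_result_out_of_range = ERANGE,
#ifdef ECANCELED
  errc_operation_canceled = ECANCELED,
#endif
#ifdef ENOTSUP
  errc_not_supported = ENOTSUP,
#endif
#ifdef EOWNERDEAD
  errc_owner_dead = EOWNERDEAD,
#endif
  errc_sketch_last
};
```

On a target whose errno.h lacks one of the guarded macros, the matching enumerator simply disappears, which is the same effect the configure checks produce.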

(This was discussed in
https://gcc.gnu.org/ml/libstdc++/2011-08/msg00125.html where Paolo
questioned the value of these checks, but indicated a preference for
consistency).

A bit of (incomplete) archaeology suggests that at one time we defined
all of these in , but later split them out into
OS-specific error_constants.h files. If we have a file specific to
mingw-w64 can we just uncomment the constants known to be supported by
that target?

This patch uncomments all the constants with a corresponding macro in
mingw-w64-headers/crt/errno.h in the mingw-w64 sources.

2015-11-12  Jonathan Wakely  

PR libstdc++/68307
* config/os/mingw32-w64/error_constants.h: Uncomment all error codes
supported by mingw-w64.

Is there any problem doing this?

diff --git a/libstdc++-v3/config/os/mingw32-w64/error_constants.h b/libstdc++-v3/config/os/mingw32-w64/error_constants.h
index 0168b5f..e0211bf 100644
--- a/libstdc++-v3/config/os/mingw32-w64/error_constants.h
+++ b/libstdc++-v3/config/os/mingw32-w64/error_constants.h
@@ -41,22 +41,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // replaced by Winsock WSA-prefixed equivalents.
   enum class errc
 {
-//address_family_not_supported = 		EAFNOSUPPORT,
-//address_in_use = EADDRINUSE,
-//address_not_available = 			EADDRNOTAVAIL,
-//already_connected = 			EISCONN,
+  address_family_not_supported = 		EAFNOSUPPORT,
+  address_in_use = EADDRINUSE,
+  address_not_available = 			EADDRNOTAVAIL,
+  already_connected = 			EISCONN,
   argument_list_too_long = 			E2BIG,
   argument_out_of_domain = 			EDOM,
   bad_address = EFAULT,
   bad_file_descriptor = 			EBADF,
 //bad_message = EBADMSG,
   broken_pipe = EPIPE,
-//connection_aborted = 			ECONNABORTED,
-//connection_already_in_progress = 		EALREADY,
-//connection_refused = 			ECONNREFUSED,
-//connection_reset = 			ECONNRESET,
-//cross_device_link = 			EXDEV,
-//destination_address_required = 		EDESTADDRREQ,
+  connection_aborted = 			ECONNABORTED,
+  connection_already_in_progress = 		EALREADY,
+  connection_refused = 			ECONNREFUSED,
+  connection_reset = 			ECONNRESET,
+  cross_device_link = 			EXDEV,
+  destination_address_required = 		EDESTADDRREQ,
   device_or_resource_busy = 		EBUSY,
   directory_not_empty = 			ENOTEMPTY,
   executable_format_error = 		ENOEXEC,
@@ -64,7 +64,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   file_too_large = EFBIG,
   filename_too_long = 			ENAMETOOLONG,
   function_not_supported = 			ENOSYS,
-//host_unreachable = 			EHOSTUNREACH,
+  host_unreachable = 			EHOSTUNREACH,
 //identifier_removed = 			EIDRM,
   illegal_byte_sequence = 			EILSEQ,
   inappropriate_io_control_operation = 	ENOTTY,
@@ -73,11 +73,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   invalid_seek = ESPIPE,
   io_error = EIO,
   is_a_directory = EISDIR,
-//message_size = EMSGSIZE,
-//network_down = ENETDOWN,
-//network_reset = ENETRESET,
-//network_unreachable = 			ENETUNREACH,
-//no_buffer_space = 			ENOBUFS,
+  message_size = EMSGSIZE,
+  network_down = ENETDOWN,
+  network_reset = ENETRESET,
+  network_unreachable = 			ENETUNREACH,
+  no_buffer_space = 			ENOBUFS,
 #ifdef _GLIBCXX_HAVE_ECHILD
   no_child_process = 			ECHILD,
 #endif
@@ -85,7 +85,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   no_lock_available = 			ENOLCK,
 //no_message_available = 			ENODATA,
 //no_message = ENOMSG,
-//no_protocol_option = 			ENOPROTOOPT,
+  no_protocol_option = 			ENOPROTOOPT,
 #ifdef _GLIBCXX_HAVE_ENOSPC
   no_space_on_device = 			ENOSPC,
 #endif
@@ -95,26 +95,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   no_such_file_or_directory = 		ENOENT,
   no_such_process = 			ESRCH,
   not_a_directory = 			ENOTDIR,
-//not_a_socket = ENOTSOCK,
+  not_a_socket = ENOTSOCK,
 //not_a_stream = ENOSTR,
-//not_connected = ENOTCONN,
+  not_connected = ENOTCONN,
   not_enough_memory = 			ENOMEM,
 #ifdef _GLIBCXX_HAVE_ENOTSUP
   not_supported = ENOTSUP,
 #endif
-//operation_canceled = 			ECANCELED,
-//operation_in_progress = 			EINPROGRESS,
+  operation_canceled = 			ECANCELED,
+  operation_in_pro

[PATCH][ARM]Fix addsi3_compare_op2 pattern.

2015-11-12 Thread Renlin Li

Hi all,

This is a simple patch to adjust the assembly output for the 
addsi3_compare_op2 rtx pattern in the ARM backend.


According to the constraints, it's the second alternative which allows 
the second operand to be a constant.
The original pattern triggers an ICE when the third alternative is 
chosen: the output template then tries to print a constant while the 
second operand is a register.


This is triggered by my experimental backend changes. The 5 and 4.9 
branches both have this problem.


arm-none-linux-gnueabihf bootstrap Okay, arm-none-eabi regression test Okay.

Okay to commit into trunk and backport to branch 5 and 4.9?

Regards,
Renlin Li

gcc/ChangeLog:

2015-11-12  Renlin Li  

* config/arm/arm.md (addsi3_compare_op2): Make the order of
assembly pattern consistent with constraint order.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8ebb1bf..73c3088 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -747,8 +747,8 @@
   "TARGET_32BIT"
   "@
adds%?\\t%0, %1, %2
-   adds%?\\t%0, %1, %2
-   subs%?\\t%0, %1, #%n2"
+   subs%?\\t%0, %1, #%n2
+   adds%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
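
The mismatch fixed above can be shown with a small, self-contained check. This is a sketch under assumptions: the templates are simplified (no %? predication), the array and function names are invented, and the check relies on the ARM constraint "L" meaning a constant whose negation is a valid immediate, which is what the %n2 output modifier prints.

```c
#include <string.h>

/* Operand 2 constraint alternatives, in the order given in the pattern.  */
const char *const operand2_constraints[] = { "I", "L", "r" };

/* Simplified output templates before and after the fix.  */
const char *const templates_before[] = {
  "adds\t%0, %1, %2",
  "adds\t%0, %1, %2",
  "subs\t%0, %1, #%n2"
};
const char *const templates_after[] = {
  "adds\t%0, %1, %2",
  "subs\t%0, %1, #%n2",
  "adds\t%0, %1, %2"
};

/* Return 1 if a template prints a negated immediate (#%n2) exactly for
   those alternatives whose constraint is "L", and 0 otherwise.  */
int
templates_match_constraints (const char *const *templates,
			     const char *const *constraints, int n)
{
  for (int i = 0; i < n; i++)
    {
      int negates = strstr (templates[i], "#%n2") != NULL;
      int wants_negated = strcmp (constraints[i], "L") == 0;
      if (negates != wants_negated)
	return 0;
    }
  return 1;
}
```

The old ordering fails the check because the register alternative is paired with the template that expects a constant; the new ordering passes.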


RE: [PATCH][ARC] Fix ARC backend ICE on pr29921-2

2015-11-12 Thread Claudiu Zissulescu
Patch applied.

Thanks Joern,
Claudiu

> -----Original Message-----
> From: Joern Wolfgang Rennecke [mailto:g...@amylaar.uk]
> Sent: Wednesday, November 11, 2015 7:15 PM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: Francois Bedard
> Subject: Re: [PATCH][ARC] Fix ARC backend ICE on pr29921-2
> 
> 
> 
> On 11/11/15 15:22, Claudiu Zissulescu wrote:
> > Please find attached a patch that fixes the ARC backend ICE on pr29921-2
> test from gcc.dg (dg.exp).
> >
> > The patch will allow generating conditional move also outside expand
> scope. The error was triggered during if-conversion.
> >
> > Ok to apply?
> 
> OK.



Re: [PATCH][ARM]Fix addsi3_compare_op2 pattern.

2015-11-12 Thread Kyrill Tkachov

Hi Renlin,

On 12/11/15 09:29, Renlin Li wrote:

> Hi all,
>
> This is a simple patch to adjust the assembly output for addsi3_compare_op2 rtx
> pattern in ARM backend.
>
> According to the constraints, it's the second alternative which allows the
> second operand to be a constant.
> The original pattern will trigger an ICE when the third alternative is chosen,
> and trying to output a constant while the second operand is a register.
>
> This is triggered by my experimental backend changes. branch 5, 4.9 all have
> this problem.
>
> arm-none-linux-gnueabihf bootstrap Okay, arm-none-eabi regression test Okay.
>
> Okay to commit into trunk and backport to branch 5 and 4.9?
>
> Regards,
> Renlin Li
>
> gcc/ChangeLog:
>
> 2015-11-12  Renlin Li
>
> * config/arm/arm.md (addsi3_compare_op2): Make the order of
> assembly pattern consistent with constraint order.


Yes, this is ok. I think the order of the alternatives is obviously wrong.

For context, this is the whole pattern:
(define_insn "*addsi3_compare_op2"
  [(set (reg:CC_C CC_REGNUM)
(compare:CC_C
 (plus:SI (match_operand:SI 1 "s_register_operand" "r,r,r")
  (match_operand:SI 2 "arm_add_operand" "I,L,r"))
 (match_dup 2)))
   (set (match_operand:SI 0 "s_register_operand" "=r,r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
  "TARGET_32BIT"
  "@
   add%.\\t%0, %1, %2
   add%.\\t%0, %1, %2
   sub%.\\t%0, %1, #%n2"
  [(set_attr "conds" "set")
   (set_attr "type" "alus_imm,alus_imm,alus_sreg")]
)

Thanks,
Kyrill



Re: Recent patch craters vector tests on powerpc64le-linux-gnu

2015-11-12 Thread James Greenhalgh
On Wed, Nov 11, 2015 at 05:12:29PM -0600, Bill Schmidt wrote:
> Hi Ilya,
> 
> The patch committed as r230098 has caused a number of ICEs on
> powerpc64le-linux-gnu.

And arm-none-linux-gnueabihf, and aarch64-none-linux-gnu.

> Could you please either revert the patch or fix these issues?
 
Thanks,
James



Re: [PATCH] Fix PR ipa/68035 (v2)

2015-11-12 Thread Martin Liška
Hello.

I'm sending a reworked version of the patch, in which I renamed
'sem_item::hash' to 'm_hash' and wrapped all usages with 'get_hash'.  Apart
from that, a new member function 'set_hash' is used for changing the hash
value.  I hope it's easier to understand.

The patch survives regression tests and bootstraps on x86_64-linux-pc.

Ready for trunk?
Thanks,
Martin
From 29be4ad798d73245715f53fe971a17664b69eeb8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Nov 2015 18:31:31 +0100
Subject: [PATCH] Fix PR ipa/68035

gcc/ChangeLog:

2015-11-12  Martin Liska  

	PR ipa/68035
	* ipa-icf.c (void sem_item::set_hash): New function.
	(sem_function::get_hash): Use renamed m_hash member variable.
	(sem_item::update_hash_by_addr_refs): Utilize get_hash.
	(sem_item::update_hash_by_local_refs): Likewise.
	(sem_variable::get_hash): Use renamed m_hash member variable.
	(sem_item_optimizer::update_hash_by_addr_refs): Utilize get_hash.
	(sem_item_optimizer::build_hash_based_classes): Utilize set_hash.
	(sem_item_optimizer::build_graph): As the hash value of an item
	is lazy initialized, force the calculation.
	* ipa-icf.h (set_hash): Declare new function and rename hash member
	variable to m_hash.

gcc/testsuite/ChangeLog:

2015-11-12  Martin Liska  

	* gcc.dg/ipa/pr68035.c: New test.
---
 gcc/ipa-icf.c  |  46 +---
 gcc/ipa-icf.h  |   9 ++--
 gcc/testsuite/gcc.dg/ipa/pr68035.c | 108 +
 3 files changed, 141 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr68035.c

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 7bb3af5..b6a97c3 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -140,7 +140,7 @@ sem_usage_pair::sem_usage_pair (sem_item *_item, unsigned int _index):
for bitmap memory allocation.  */
 
 sem_item::sem_item (sem_item_type _type,
-		bitmap_obstack *stack): type(_type), hash(0)
+		bitmap_obstack *stack): type (_type), m_hash (0)
 {
   setup (stack);
 }
@@ -151,7 +151,7 @@ sem_item::sem_item (sem_item_type _type,
 
 sem_item::sem_item (sem_item_type _type, symtab_node *_node,
 		hashval_t _hash, bitmap_obstack *stack): type(_type),
-  node (_node), hash (_hash)
+  node (_node), m_hash (_hash)
 {
   decl = node->decl;
   setup (stack);
@@ -227,6 +227,11 @@ sem_item::target_supports_symbol_aliases_p (void)
 #endif
 }
 
+void sem_item::set_hash (hashval_t hash)
+{
+  m_hash = hash;
+}
+
 /* Semantic function constructor that uses STACK as bitmap memory stack.  */
 
 sem_function::sem_function (bitmap_obstack *stack): sem_item (FUNC, stack),
@@ -274,7 +279,7 @@ sem_function::get_bb_hash (const sem_bb *basic_block)
 hashval_t
 sem_function::get_hash (void)
 {
-  if(!hash)
+  if (!m_hash)
 {
   inchash::hash hstate;
   hstate.add_int (177454); /* Random number for function type.  */
@@ -289,7 +294,6 @@ sem_function::get_hash (void)
   for (unsigned i = 0; i < bb_sizes.length (); i++)
 	hstate.add_int (bb_sizes[i]);
 
-
   /* Add common features of declaration itself.  */
   if (DECL_FUNCTION_SPECIFIC_TARGET (decl))
 hstate.add_wide_int
@@ -301,10 +305,10 @@ sem_function::get_hash (void)
   hstate.add_flag (DECL_CXX_CONSTRUCTOR_P (decl));
   hstate.add_flag (DECL_CXX_DESTRUCTOR_P (decl));
 
-  hash = hstate.end ();
+  set_hash (hstate.end ());
 }
 
-  return hash;
+  return m_hash;
 }
 
 /* Return ture if A1 and A2 represent equivalent function attribute lists.
@@ -800,7 +804,7 @@ sem_item::update_hash_by_addr_refs (hash_map  &m_symtab_node_map)
 {
   ipa_ref* ref;
-  inchash::hash hstate (hash);
+  inchash::hash hstate (get_hash ());
 
   for (unsigned i = 0; node->iterate_reference (i, ref); i++)
 {
@@ -823,7 +827,7 @@ sem_item::update_hash_by_addr_refs (hash_map  &m_symtab_node_map)
 {
   ipa_ref* ref;
-  inchash::hash state (hash);
+  inchash::hash state (get_hash ());
 
   for (unsigned j = 0; node->iterate_reference (j, ref); j++)
 {
   sem_item **result = m_symtab_node_map.get (ref->referring);
   if (result)
-	state.merge_hash ((*result)->hash);
+	state.merge_hash ((*result)->get_hash ());
 }
 
   if (type == FUNC)
@@ -851,7 +855,7 @@ sem_item::update_hash_by_local_refs (hash_map caller);
 	  if (result)
-	state.merge_hash ((*result)->hash);
+	state.merge_hash ((*result)->get_hash ());
 	}
 }
 
@@ -2099,8 +2103,8 @@ sem_variable::parse (varpool_node *node, bitmap_obstack *stack)
 hashval_t
 sem_variable::get_hash (void)
 {
-  if (hash)
-return hash;
+  if (m_hash)
+return m_hash;
 
   /* All WPA streamed in symbols should have their hashes computed at compile
  time.  At this point, the constructor may not be in memory at all.
@@ -2113,9 +2117,9 @@ sem_variable::get_hash (void)
   if (DECL_SIZE (decl) && tree_fits_shwi_p (DECL_SIZE (decl)))
 hstate.add_wide_int (tree_to_shwi (DECL_SIZE (decl)));
   add_expr (ctor, hstate);
-  hash = hstate.end ();
+  set_hash (hstate.end ());
 

Re: [PATCH] Fix PR ipa/68035

2015-11-12 Thread Martin Liška
On 11/06/2015 05:43 PM, Jan Hubicka wrote:
>> Hello.
>>
>> Following patch triggers hash calculation of items (functions and variables)
>> in situations where LTO mode is not utilized.
>>
>> Patch survives regression tests and bootstraps on x86_64-linux-pc.
>>
>> Ready for trunk?
>> Thanks,
>> Martin
> 
>> >From 62266e21a89777c6dbd680f7c87f15abe603c024 Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Thu, 5 Nov 2015 18:31:31 +0100
>> Subject: [PATCH] Fix PR ipa/68035
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-11-05  Martin Liska  
>>
>>  * gcc.dg/ipa/pr68035.c: New test.
>>
>> gcc/ChangeLog:
>>
>> 2015-11-05  Martin Liska  
>>
>>  PR ipa/68035
>>  * ipa-icf.c (sem_item_optimizer::build_graph): Force building
>>  of a hash value for an item if we are not running in LTO mode.
>> ---
>>  gcc/ipa-icf.c  |   4 ++
>>  gcc/testsuite/gcc.dg/ipa/pr68035.c | 108 
>> +
>>  2 files changed, 112 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr68035.c
>>
>> diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
>> index 7bb3af5..09c42a1 100644
>> --- a/gcc/ipa-icf.c
>> +++ b/gcc/ipa-icf.c
>> @@ -2744,6 +2744,10 @@ sem_item_optimizer::build_graph (void)
>>  {
>>sem_item *item = m_items[i];
>>m_symtab_node_map.put (item->node, item);
>> +
>> +  /* Initialize hash values if we are not in LTO mode.  */
>> +  if (!in_lto_p)
>> +item->get_hash ();
>>  }
> 
> Hmm, what is the difference to the LTO mode here?  I would have expected that
> all the items were analyzed in both paths?

The difference is that in LTO mode the hash value is read from the streamed
LTO file.  In classic compilation mode, on the other hand, we have to force
the calculation, because the hash value is computed lazily.
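
To make the lazy-versus-streamed distinction concrete, here is a minimal
standalone sketch.  It is not GCC code: the class `item`, the accessor
`raw_hash` and the FNV-1a hash are all assumptions chosen for illustration;
only the names `get_hash`, `set_hash` and `m_hash` echo the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a sem_item-like object; this is not GCC code.
class item
{
public:
  explicit item (std::string body) : m_body (std::move (body)), m_hash (0) {}

  // Models the LTO path: the hash is supplied from outside (streamed in).
  void set_hash (std::uint32_t hash) { m_hash = hash; }

  // Lazy path: compute on first use, then reuse the cached value.
  std::uint32_t get_hash ()
  {
    if (!m_hash)
      {
        std::uint32_t h = 2166136261u;   // FNV-1a offset basis
        for (unsigned char c : m_body)
          h = (h ^ c) * 16777619u;       // FNV-1a prime
        set_hash (h ? h : 1);            // 0 stays reserved for "not computed"
      }
    return m_hash;
  }

  // What a reader of the raw field sees, bypassing get_hash.
  std::uint32_t raw_hash () const { return m_hash; }

private:
  std::string m_body;
  std::uint32_t m_hash;
};
```

In this sketch, code that reads the raw field before anyone has called
`get_hash ()` sees 0 unless the value was set from outside, which is the
situation the forced call is meant to rule out.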

Please take a look at the suggested patch I just sent.

Thanks,
Martin

> 
> Honza
> 



Re: Recent patch craters vector tests on powerpc64le-linux-gnu

2015-11-12 Thread Andreas Schwab
Bill Schmidt  writes:

> The patch committed as r230098 has caused a number of ICEs on
> powerpc64le-linux-gnu.

This is PR68296.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH, PR68286] Fix vector comparison expand

2015-11-12 Thread Ilya Enkovich
Hi,

My vector comparison patches broke the expansion of vector comparisons on
targets which don't have the new comparison patterns but do support
VEC_COND_EXPR.  This happens because do_store_flag does not check whether a
vector comparison can actually be expanded as a comparison.  This patch adds
that check.  Bootstrapped and regtested on powerpc64le-unknown-linux-gnu.
OK for trunk?
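
The decision the fix introduces can be sketched in standalone C++.  The
function `choose_expansion` and its two boolean parameters are hypothetical
stand-ins for the VECTOR_BOOLEAN_TYPE_P and expand_vec_cmp_expr_p tests in
the diff; this is only a model of the control flow, not the real expander.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the choice made in do_store_flag after the fix.
// result_is_bool_vector: the comparison produces a boolean vector.
// target_has_vec_cmp:    the target can expand a vector comparison directly.
std::string
choose_expansion (bool result_is_bool_vector, bool target_has_vec_cmp)
{
  // Only take the direct path when the target really supports it.
  if (result_is_bool_vector && target_has_vec_cmp)
    return "vec_cmp";
  // Otherwise fall back to expanding through a VEC_COND_EXPR.
  return "vec_cond";
}
```

Before the fix, the model would have returned "vec_cmp" for every boolean
vector result, including on targets without such a pattern.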

Thanks,
Ilya
--
gcc/

2015-11-12  Ilya Enkovich  

* expr.c (do_store_flag): Expand vector comparison as
VEC_COND_EXPR if vector comparison is not supported
by target.

gcc/testsuite/

2015-11-12  Ilya Enkovich  

* gcc.dg/pr68286.c: New test.


diff --git a/gcc/expr.c b/gcc/expr.c
index 03936ee..bd43dc4 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11128,7 +11128,8 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
 {
   tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
-  if (VECTOR_BOOLEAN_TYPE_P (ops->type))
+  if (VECTOR_BOOLEAN_TYPE_P (ops->type)
+ && expand_vec_cmp_expr_p (TREE_TYPE (arg0), ops->type))
return expand_vec_cmp_expr (ops->type, ifexp, target);
   else
{
diff --git a/gcc/testsuite/gcc.dg/pr68286.c b/gcc/testsuite/gcc.dg/pr68286.c
new file mode 100644
index 000..d0392e8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68286.c
@@ -0,0 +1,17 @@
+/* PR target/68286 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, b, c;
+int fn1 ()
+{
+  int d[] = {0};
+  for (; c; c++)
+{
+  float e = c;
+  if (e)
+d[0]++;
+}
+  b = d[0];
+  return a;
+}


Re: Recent patch craters vector tests on powerpc64le-linux-gnu

2015-11-12 Thread Ilya Enkovich
2015-11-12 12:48 GMT+03:00 James Greenhalgh :
> On Wed, Nov 11, 2015 at 05:12:29PM -0600, Bill Schmidt wrote:
>> Hi Ilya,
>>
>> The patch committed as r230098 has caused a number of ICEs on
>> powerpc64le-linux-gnu.
>
> And arm-none-linux-gnueabihf, and aarch64-none-linux-gnu.
>
>> Could you please either revert the patch or fix these issues?
>
> Thanks,
> James
>

Sorry for the breakage. I sent a patch to fix it.

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01467.html

Thanks,
Ilya


Re: [PATCH] PR ada/66205 gnatbind generates invalid code when finalization is enabled in restricted runtime

2015-11-12 Thread Arnaud Charlet
> > This situation arises, for example, with an embedded RTS that
> > incorporates the
> > Ada 2012 generalized container iterators.
> 
> I should add, this PR is the "other half" of PR ada/66242, which is fixed
> in GCC 6; so please can it be reviewed?

The proper patch for PR ada/66242 hasn't been committed yet (it's pending),
so I'd rather review the situation once PR ada/66242 is dealt with.

I'm not convinced at all that your patch is the way to go, so I'd rather
consider it only after PR ada/66242 is solved properly.

Arno


Re: [mask-vec_cond, patch 1/2] Support vectorization of VEC_COND_EXPR with no embedded comparison

2015-11-12 Thread Ramana Radhakrishnan
On Thu, Oct 8, 2015 at 4:50 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch allows COND_EXPR with no embedded comparison to be vectorized.
> It's applied on top of vectorized comparison support series.  New optab
> vcond_mask_optab is introduced for such statements.  Bool patterns now avoid
> comparison in COND_EXPR in case vector comparison is supported by target.

New standard pattern names are documented in the internals manual.
This patch does not do so, and I don't see any other patch that does.


regards
Ramana


>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-10-08  Ilya Enkovich  
>
> * optabs-query.h (get_vcond_mask_icode): New.
> * optabs-tree.c (expand_vec_cond_expr_p): Use
> get_vcond_mask_icode for VEC_COND_EXPR with mask.
> * optabs.c (expand_vec_cond_mask_expr): New.
> (expand_vec_cond_expr): Use get_vcond_mask_icode
> when possible.
> * optabs.def (vcond_mask_optab): New.
> * tree-vect-patterns.c (vect_recog_bool_pattern): Don't
> generate redundant comparison for COND_EXPR.
> * tree-vect-stmts.c (vect_is_simple_cond): Allow SSA_NAME
> as a condition.
> (vectorizable_condition): Likewise.
>
>
> diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> index 162d2e9..48bcf7c 100644
> --- a/gcc/optabs-query.h
> +++ b/gcc/optabs-query.h
> @@ -98,6 +98,15 @@ get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns)
>return icode;
>  }
>
> +/* Return insn code for a conditional operator with a mask mode
> +   MMODE resulting in a value of mode VMODE.  */
> +
> +static inline enum insn_code
> +get_vcond_mask_icode (machine_mode vmode, machine_mode mmode)
> +{
> +  return convert_optab_handler (vcond_mask_optab, vmode, mmode);
> +}
> +
>  /* Enumerates the possible extraction_insn operations.  */
>  enum extraction_pattern { EP_insv, EP_extv, EP_extzv };
>
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index aa863cf..d887619 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -342,6 +342,9 @@ expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
>  {
>machine_mode value_mode = TYPE_MODE (value_type);
>machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
> +  if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type))
> +return get_vcond_mask_icode (TYPE_MODE (value_type),
> +TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing;
>if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
>|| GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode)
>|| get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index ca1a6e7..d26b8f8 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -5346,6 +5346,38 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
>return tmp;
>  }
>
> +/* Generate insns for a VEC_COND_EXPR with mask, given its TYPE and its
> +   three operands.  */
> +
> +rtx
> +expand_vec_cond_mask_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
> +  rtx target)
> +{
> +  struct expand_operand ops[4];
> +  machine_mode mode = TYPE_MODE (vec_cond_type);
> +  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
> +  enum insn_code icode = get_vcond_mask_icode (mode, mask_mode);
> +  rtx mask, rtx_op1, rtx_op2;
> +
> +  if (icode == CODE_FOR_nothing)
> +return 0;
> +
> +  mask = expand_normal (op0);
> +  rtx_op1 = expand_normal (op1);
> +  rtx_op2 = expand_normal (op2);
> +
> +  mask = force_reg (GET_MODE (mask), mask);
> +  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
> +
> +  create_output_operand (&ops[0], target, mode);
> +  create_input_operand (&ops[1], rtx_op1, mode);
> +  create_input_operand (&ops[2], rtx_op2, mode);
> +  create_input_operand (&ops[3], mask, mask_mode);
> +  expand_insn (icode, 4, ops);
> +
> +  return ops[0].value;
> +}
> +
>  /* Generate insns for a VEC_COND_EXPR, given its TYPE and its
> three operands.  */
>
> @@ -5371,12 +5403,21 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>  }
>else
>  {
> -  /* Fake op0 < 0.  */
>gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
> -  op0a = op0;
> -  op0b = build_zero_cst (TREE_TYPE (op0));
> -  tcode = LT_EXPR;
> -  unsignedp = false;
> +  if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
> + != CODE_FOR_nothing)
> +   return expand_vec_cond_mask_expr (vec_cond_type, op0, op1,
> + op2, target);
> +  /* Fake op0 < 0.  */
> +  else
> +   {
> + gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
> + == MODE_VECTOR_INT);
> + op0a = op0;
> + op0b = build_zero_cst (TREE_TYPE (op0));
> + tcode = LT_EXPR;
> + unsignedp = false;
> +   }
>  }
>cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
>
> diff --git a/gcc/optabs.

[PATCH 04/N] Fix big memory leak in ix86_valid_target_attribute_p

2015-11-12 Thread Martin Liška
Hello.

The following patch was discussed with Jakub and can save a huge amount of
memory in cases where target attributes are heavily utilized.
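
The initialize-once guard used by the patch can be sketched on its own.
Everything here is an assumption made for illustration: `arena` stands in
for the obstack, and `init_count` exists only so the effect is observable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an obstack; counts how often it was set up.
struct arena
{
  int init_count = 0;
  std::vector<char> storage;
};

static arena opts_arena;
static bool opts_arena_initialized = false;

// Set the arena up on the first call only; later calls are no-ops, so
// storage handed out earlier is never abandoned by a second setup.
void
init_opts_arena ()
{
  if (!opts_arena_initialized)
    {
      opts_arena_initialized = true;
      ++opts_arena.init_count;
      opts_arena.storage.clear ();
    }
}
```

Callers can then invoke `init_opts_arena ()` from every entry point without
coordinating who runs first.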

The patch bootstraps and survives regression tests on x86_64-linux-pc.

Ready for trunk?
Thanks,
Martin
From ebb7bd3cf513dc437622868eddbed6c8f725a67c Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 11 Nov 2015 12:52:11 +0100
Subject: [PATCH] Fix big memory leak in ix86_valid_target_attribute_p

---
 gcc/config/i386/i386.c |  2 ++
 gcc/gcc.c  |  2 +-
 gcc/lto-wrapper.c  |  2 +-
 gcc/opts-common.c  |  1 +
 gcc/opts.c | 16 +++-
 gcc/opts.h |  1 +
 6 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b84a11d..1325cf0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6237,6 +6237,8 @@ ix86_valid_target_attribute_p (tree fndecl,
 	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl) = new_optimize;
 }
 
+  finalize_options_struct (&func_options);
+
   return ret;
 }
 
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 8bbf5be..87d1979 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -9915,7 +9915,7 @@ driver_get_configure_time_options (void (*cb) (const char *option,
   size_t i;
 
   obstack_init (&obstack);
-  gcc_obstack_init (&opts_obstack);
+  init_opts_obstack ();
   n_switches = 0;
 
   for (i = 0; i < ARRAY_SIZE (option_default_specs); i++)
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 20e67ed..b9ac535 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1355,7 +1355,7 @@ main (int argc, char *argv[])
 {
   const char *p;
 
-  gcc_obstack_init (&opts_obstack);
+  init_opts_obstack ();
 
   p = argv[0] + strlen (argv[0]);
   while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))
diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index d9bf4d4..06e88b5 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -706,6 +706,7 @@ decode_cmdline_option (const char **argv, unsigned int lang_mask,
 /* Obstack for option strings.  */
 
 struct obstack opts_obstack;
+bool opts_obstack_initialized = false;
 
 /* Like libiberty concat, but allocate using opts_obstack.  */
 
diff --git a/gcc/opts.c b/gcc/opts.c
index 9a3fbb3..527e678 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -266,6 +266,20 @@ add_comma_separated_to_vector (void **pvec, const char *arg)
   *pvec = v;
 }
 
+static bool opts_obstack_initialized = false;
+
+/* Initialize opts_obstack if not initialized.  */
+
+void
+init_opts_obstack (void)
+{
+  if (!opts_obstack_initialized)
+{
+  opts_obstack_initialized = true;
+  gcc_obstack_init (&opts_obstack);
+}
+}
+
 /* Initialize OPTS and OPTS_SET before using them in parsing options.  */
 
 void
@@ -273,7 +287,7 @@ init_options_struct (struct gcc_options *opts, struct gcc_options *opts_set)
 {
   size_t num_params = get_num_compiler_params ();
 
-  gcc_obstack_init (&opts_obstack);
+  init_opts_obstack ();
 
   *opts = global_options_init;
 
diff --git a/gcc/opts.h b/gcc/opts.h
index 38b3837..2eb2d97 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -323,6 +323,7 @@ extern void decode_cmdline_options_to_array (unsigned int argc,
 extern void init_options_once (void);
 extern void init_options_struct (struct gcc_options *opts,
  struct gcc_options *opts_set);
+extern void init_opts_obstack (void);
 extern void finalize_options_struct (struct gcc_options *opts);
 extern void decode_cmdline_options_to_array_default_mask (unsigned int argc,
 			  const char **argv, 
-- 
2.6.2



[committed] gen-pass-instances.awk: Rename var where to call_starts_at in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch renames the rather generic variable 'where' to the more 
specific 'call_starts_at' in handle_line in gen-pass-instances.awk.


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Rename var where to call_starts_at in handle_line

2015-11-12  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Rename var where to
	call_starts_at.

---
 gcc/gen-pass-instances.awk | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index b10c26a..311273e 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -43,8 +43,8 @@ function handle_line()
 	line = $0;
 
 	# Find call expression.
-	where = match(line, /NEXT_PASS \((.+)\)/);
-	if (where == 0)
+	call_starts_at = match(line, /NEXT_PASS \((.+)\)/);
+	if (call_starts_at == 0)
 	{
 		print line;
 		return;
@@ -59,7 +59,7 @@ function handle_line()
 
 	# Find pass_name argument
 	len_of_pass_name = len_of_call - (len_of_start + len_of_close);
-	pass_starts_at = where + len_of_start;
+	pass_starts_at = call_starts_at + len_of_start;
 	pass_name = substr(line, pass_starts_at, len_of_pass_name);
 
 	# Find call expression prefix (until and including called function)


[committed] gen-pass-instances.awk: Simplify init of postfix_starts_at in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch simplifies the initialization of postfix_starts_at in 
handle_line in gen-pass-instances.awk.
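
The simplification rests on a small identity: since len_of_pass_name is
len_of_call minus the start and close lengths, the old and new expressions
give the same position.  A standalone C++ check of that arithmetic follows;
the struct `spans` and the function `postfix_positions` are hypothetical,
while the variable names mirror the awk script.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct spans
{
  std::size_t old_postfix;   // pass_starts_at + len_of_pass_name + len_of_close
  std::size_t new_postfix;   // call_starts_at + len_of_call
};

// Recompute both forms from a 1-based call position and the call length.
spans
postfix_positions (std::size_t call_starts_at, std::size_t len_of_call)
{
  const std::size_t len_of_start = std::string ("NEXT_PASS (").size ();
  const std::size_t len_of_close = std::string (")").size ();
  const std::size_t len_of_pass_name
    = len_of_call - (len_of_start + len_of_close);
  const std::size_t pass_starts_at = call_starts_at + len_of_start;
  return { pass_starts_at + len_of_pass_name + len_of_close,
           call_starts_at + len_of_call };
}
```

For a line such as "  NEXT_PASS (pass_ccp);" the call starts at column 3
and is 20 characters long, so both forms point at column 23, the semicolon.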


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Simplify init of postfix_starts_at in handle_line

2015-11-12  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Simplify init of
	postfix_starts_at.

---
 gcc/gen-pass-instances.awk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 311273e..08d4a37 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -67,7 +67,7 @@ function handle_line()
 	prefix = substr(line, 1, prefix_len);
 
 	# Find call expression postfix
-	postfix_starts_at = pass_starts_at + len_of_pass_name + len_of_close;
+	postfix_starts_at = call_starts_at + len_of_call;
 	postfix = substr(line, postfix_starts_at);
 
 	# Set pass_counts


Re: [Patch ARM] Switch ARM to unified asm.

2015-11-12 Thread Ramana Radhakrishnan
On Thu, Nov 12, 2015 at 9:21 AM, Christian Bruel  wrote:
> Hi Ramana,
>
> On 11/10/2015 12:48 PM, Ramana Radhakrishnan wrote:
>>
>> [Resending as I managed to muck this up with my mail client]
>>
>> Hi,
>>
>> I held off committing a previous version of this patch that I posted in
>> July to be nice to folks backporting fixes and to watch for any objections
>> to move the ARM backend completely over into the unified assembler.
>>
>> The patch does the following.
>>
>> * The compiler now generates code in all ISA modes in unified asm.
>> * We've had unified asm only for the last 10 years, ever since the first
>> Thumb2 support was put in, the disassembler generates output in unified
>> assembler, while the compiler output is always in divided syntax for ARM
>> state.
>> * This means patterns get simpler not having to worry about the position
>> of the condition in a conditional instruction. For example we now
>> consistently use
>> a. ldrbeq rather than ldreqb
>> b. movseq rather than moveqs
>> c. Or indeed the appropriate push / pop instructions wherever
>> appropriate.
>>
>>
>> The compiler behaviour has not changed in terms of what it does with
>> inline assembler, that still remains in divided syntax and over time we need
>> to move all of this over to unified syntax if we can do so as all the
>> official documentation is now in terms of unified asm. I've been carrying
>> this in my tree for quite a while and am reasonably happy that it is stable.
>> I will watch out for any fallout in the coming weeks with this but it is
>> better to take this now rather than later given we are hitting the end of
>> stage1.
>>
>> Tested on arm-none-eabi - applied to trunk.
>>
>>
>
> I see a failure with an outdated check for the unified assembly. OK to fix ?
>

OK thanks.

Ramana
>
>


[committed] gen-pass-instances.awk: Simplify match regexp in handle_line

2015-11-12 Thread Tom de Vries

Hi,

this patch simplifies the match regexp in handle_line in 
gen-pass-instances.awk.
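
Dropping the parentheses around `.+` removes a grouping that the script
does not use; the matched span stays the same.  The sketch below checks
this with C++ `std::regex` rather than awk, on the assumption that the
default ECMAScript grammar behaves like awk's regexps for this particular
pattern.  The helper `match_span` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Return the 1-based start and the length of the first match, or {0, 0}
// when nothing matches (loosely modelled on awk's RSTART and RLENGTH).
std::pair<long, long>
match_span (const std::string &line, const std::string &pattern)
{
  std::smatch m;
  if (!std::regex_search (line, m, std::regex (pattern)))
    return {0, 0};
  return {static_cast<long> (m.position (0)) + 1,
          static_cast<long> (m.length (0))};
}
```

Calling it with the old and the new pattern on the same line should give
identical start and length values.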


Committed to trunk as trivial.

Thanks,
- Tom
gen-pass-instances.awk: Simplify match regexp in handle_line

2015-11-12  Tom de Vries  

	* gen-pass-instances.awk (handle_line): Simplify match regexp.

---
 gcc/gen-pass-instances.awk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 08d4a37..cbfaa86 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -43,7 +43,7 @@ function handle_line()
 	line = $0;
 
 	# Find call expression.
-	call_starts_at = match(line, /NEXT_PASS \((.+)\)/);
+	call_starts_at = match(line, /NEXT_PASS \(.+\)/);
 	if (call_starts_at == 0)
 	{
 		print line;


[patch] Fix doxygen @file comment in libstdc++ header

2015-11-12 Thread Jonathan Wakely

A trivial patch, I didn't edit the @file when I moved this file to the
new bits sub-directory.

Committed as obvious.


commit 1229ad46adf4d9a74b3da4e354120ebaa1be8eb1
Author: Jonathan Wakely 
Date:   Thu Nov 12 10:07:08 2015 +

	* include/experimental/bits/string_view.tcc: Fix doxygen @file.

diff --git a/libstdc++-v3/include/experimental/bits/string_view.tcc b/libstdc++-v3/include/experimental/bits/string_view.tcc
index 75a34f9..0eb4f70 100644
--- a/libstdc++-v3/include/experimental/bits/string_view.tcc
+++ b/libstdc++-v3/include/experimental/bits/string_view.tcc
@@ -22,7 +22,7 @@
 // see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 // .
 
-/** @file experimental/string_view.tcc
+/** @file experimental/bits/string_view.tcc
  *  This is an internal header file, included by other library headers.
  *  Do not attempt to use it directly. @headername{string_view}
  */


Re: [hsa 2/12] Modifications to libgomp proper

2015-11-12 Thread Jakub Jelinek
On Thu, Nov 05, 2015 at 10:54:42PM +0100, Martin Jambor wrote:
> The patch below contains all changes to libgomp files.  First, it adds
> a new constant identifying HSA devices and a structure that is shared
> between libgomp and the compiler when kernels from kernels are invoked
> via dynamic parallelism.
> 
> Second it modifies the GOMP_target_41 function so that it also can take
> kernel attributes (essentially the grid dimension) as a parameter and
> pass it on the HSA libgomp plugin.  Because we do want HSAIL
> generation to gracefully fail and use host fallback in that case, the
> same function calls the host implementation if it cannot map the
> requested function to an accelerated one or if a new callback
> can_run_func indicates there is a problem.
> 
> We need a new hook because we use it to check for linking errors which
> we cannot do when incrementally loading registered images.  And we
> want to handle linking errors, so that when we cannot emit HSAIL for a
> function called from a kernel (possibly in a different compilation
> unit), we also resort to host fallback.
> 
> Last but not least, the patch removes data remapping when the selected
> device is capable of sharing memory with the host.

The patch clearly is not against current trunk, there is no GOMP_target_41
function, the GOMP_target_ext function has extra arguments, etc.

> diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
> index 9c8b1fb..0ad42d2 100644
> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -876,7 +876,8 @@ struct gomp_device_descr
>void *(*dev2host_func) (int, void *, const void *, size_t);
>void *(*host2dev_func) (int, void *, const void *, size_t);
>void *(*dev2dev_func) (int, void *, const void *, size_t);
> -  void (*run_func) (int, void *, void *);
> +  void (*run_func) (int, void *, void *, const void *);

Adding arguments to existing plugin methods is a plugin ABI incompatible
change.  We now have:
  DLSYM (version);
  if (device->version_func () != GOMP_VERSION)
{
  err = "plugin version mismatch";
  goto fail;
}
so there is a way to deal with it, but you need to adjust all plugins.
See below anyway.
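
As a rough illustration of the version gate quoted above, here is a
standalone sketch.  The constant `host_plugin_version`, the struct `plugin`
and the function `check_plugin` are hypothetical and do not reproduce
libgomp's real definitions; only the error string is taken from the snippet.

```cpp
#include <cassert>
#include <string>

// Hypothetical version constant; bumped whenever a plugin entry point
// changes its signature.
constexpr unsigned host_plugin_version = 1;

// Hypothetical plugin descriptor holding only the version hook.
struct plugin
{
  unsigned (*version_func) ();
};

// Return an empty string if the plugin may be used, else an error message.
std::string
check_plugin (const plugin &p)
{
  if (p.version_func == nullptr)
    return "missing version function";
  if (p.version_func () != host_plugin_version)
    return "plugin version mismatch";
  return "";
}

// Two sample hooks: one built against the current ABI, one against an old one.
unsigned current_version () { return host_plugin_version; }
unsigned stale_version () { return host_plugin_version - 1; }
```

A plugin built against the old signature is then rejected at load time
instead of being called with the wrong number of arguments.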

> --- a/libgomp/oacc-host.c
> +++ b/libgomp/oacc-host.c
> @@ -123,7 +123,8 @@ host_host2dev (int n __attribute__ ((unused)),
>  }
>  
>  static void
> -host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars)
> +host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars,
> +   const void* kern_launch __attribute__ ((unused)))

This is C, space before * not after it.
>  {
>void (*fn)(void *) = (void (*)(void *)) fn_ptr;

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1248,7 +1248,12 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
>splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
>gomp_mutex_unlock (&devicep->lock);
>if (tgt_fn == NULL)
> - gomp_fatal ("Target function wasn't mapped");
> + {
> +   if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> + return NULL;
> +   else
> + gomp_fatal ("Target function wasn't mapped");
> + }
>  
>return (void *) tgt_fn->tgt_offset;
>  }
> @@ -1276,6 +1281,7 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
>  return gomp_target_fallback (fn, hostaddrs);
>  
>void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> +  assert (fn_addr);

I must say I really don't like putting asserts into libgomp; in production
it is, after all, not built with -DNDEBUG.  But this shows a worse problem:
if you have GCC 5 compiled OpenMP code, of course there won't be an HSA
offloaded copy, but if you try to run it on a box with HSA offloading
enabled, you can run into this assertion failure.
Supposedly the old APIs (GOMP_target, GOMP_target_update, GOMP_target_data)
should treat GOMP_OFFLOAD_CAP_SHARED_MEM capable devices as unconditional
device fallback?

> @@ -1297,7 +1304,7 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
>  void
>  GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
>   void **hostaddrs, size_t *sizes, unsigned short *kinds,
> - unsigned int flags, void **depend)
> + unsigned int flags, void **depend, const void *kernel_launch)

GOMP_target_ext has different arguments, you get the num_teams and
thread_limit clauses values in there already (if known at compile time or
before entering target region; 0 stands for implementation defined choice,
-1 for unknown before GOMP_target_ext).
Plus I must say I really don't like the addition of HSA specific argument
to the API, it is unclean and really doesn't scale, when somebody adds
support for another offloading target, would we add again another argument?
Can't use the same one, because one could have configured both HSA and that
other kind offloading at the same time and which one is picked would be only
a runtime decision, based on env vars of omp_set_defa

Re: [Patch] Optimize condition reductions where the result is an integer induction variable

2015-11-12 Thread Richard Biener
On Wed, Nov 11, 2015 at 7:54 PM, Alan Hayward  wrote:
>
>
> On 11/11/2015 13:25, "Richard Biener"  wrote:
>
>>On Wed, Nov 11, 2015 at 1:22 PM, Alan Hayward 
>>wrote:
>>> Hi,
>>> I hoped to post this in time for Monday’s cut-off date, but circumstances
>>> delayed me until today.  I hope this patch will still be able to go in,
>>> if possible.
>>>
>>>
>>> This patch builds upon the change for PR65947, and reduces the amount of
>>> code produced in a vectorized condition reduction where operand 2 of the
>>> COND_EXPR is an assignment of an increasing integer induction variable
>>> that won't wrap.
>>>
>>>
>>> For example (assuming all types are ints), this is a match:
>>>
>>> last = 5;
>>> for (i = 0; i < N; i++)
>>>   if (a[i] < min_v)
>>> last = i;
>>>
>>> Whereas, this is not because the result is based off a memory access:
>>> last = 5;
>>> for (i = 0; i < N; i++)
>>>   if (a[i] < min_v)
>>> last = a[i];
>>>
>>> In the integer induction variable case we can just use a MAX reduction
>>> and skip all the code I added in my vectorized condition reduction patch:
>>> the additional induction variables in vectorizable_reduction () and the
>>> additional checks in vect_create_epilog_for_reduction ().  From the patch
>>> diff alone, it's not immediately obvious that those parts will be skipped,
>>> as there are no code changes in those areas.
>>>
>>> The initial value of the induction variable is forced to zero, as any
>>> other value could affect the result of the induction.  At the end of the
>>> loop, if the result is zero, then we restore the original initial value.
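
The MAX-reduction idea can be written out in scalar form.  This is a sketch
under stated assumptions, not the vectorizer's code: the function names are
hypothetical, and the loops start at index 1 because a zero result is read
as "no match", which would be indistinguishable from a match at index 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference form: remember the last index whose element is below min_v.
int
last_match_scalar (const std::vector<int> &a, int min_v, int initial)
{
  int last = initial;
  for (std::size_t i = 1; i < a.size (); i++)
    if (a[i] < min_v)
      last = static_cast<int> (i);
  return last;
}

// MAX-reduction form: start from zero, take the maximum of the matching
// indices, and restore the original initial value if nothing matched.
int
last_match_max (const std::vector<int> &a, int min_v, int initial)
{
  int best = 0;
  for (std::size_t i = 1; i < a.size (); i++)
    best = std::max (best, a[i] < min_v ? static_cast<int> (i) : 0);
  return best == 0 ? initial : best;
}
```

Because the index only increases, the last matching index is also the
largest one, which is why a plain maximum can replace the conditional store.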
>>
>>+static bool
>>+is_integer_induction (gimple *stmt, struct loop *loop)
>>
>>is_nonwrapping_integer_induction?
>>
>>+  tree lhs_max = TYPE_MAX_VALUE (TREE_TYPE (gimple_phi_result (stmt)));
>>
>>don't use TYPE_MAX_VALUE.
>>
>>+  /* Check that the induction increments.  */
>>+  if (tree_int_cst_compare (step, size_zero_node) <= 0)
>>+return false;
>>
>>tree_int_cst_sgn (step) == -1
>>
>>+  /* Check that the max size of the loop will not wrap.  */
>>+
>>+  if (! max_loop_iterations (loop, &ni))
>>+return false;
>>+  /* Convert backedges to iterations.  */
>>+  ni += 1;
>>
>>just use max_stmt_executions (loop, &ni) which properly checks for
>>overflow of the +1.
>>
>>+  max_loop_value = wi::add (wi::to_widest (base),
>>+   wi::mul (wi::to_widest (step), ni));
>>+
>>+  if (wi::gtu_p (max_loop_value, wi::to_widest (lhs_max)))
>>+return false;
>>
>>you miss a check for the wi::add / wi::mul to overflow.  You can use
>>extra args to determine this.
>>
>>Instead of TYPE_MAX_VALUE use wi::max_value (precision, sign).
>>
>>I wonder if you want to skip all the overflow checks for
>>TYPE_OVERFLOW_UNDEFINED IV types?
>>
>
> Ok with all the above.
>
> Tried using max_value () but this gave me a wide_int instead of a
> widest_int.  Instead I've replaced it with min_precision and
> GET_MODE_BITSIZE.
>
> Added an extra check for when the type is TYPE_OVERFLOW_UNDEFINED.
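
The wrap check under discussion, base + step * iterations must still fit in
the induction variable's type, can be sketched with fixed-width integers.
This version sidesteps overflow by widening to 64 bits instead of using
wide-int overflow flags, so it is a different technique from the one in the
review; the function name and the 32-bit types are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Return true if a 32-bit induction variable starting at BASE and advancing
// by STEP for NITERS iterations never exceeds its maximum value.
bool
induction_fits (std::int32_t base, std::int32_t step, std::uint32_t niters)
{
  // Only increasing inductions qualify in this sketch.
  if (step <= 0)
    return false;
  // step < 2^31 and niters < 2^32, so the product is below 2^63 and the
  // sum with base still fits in a signed 64-bit value.
  const std::int64_t max_value
    = std::int64_t (base) + std::int64_t (step) * std::int64_t (niters);
  return max_value <= std::numeric_limits<std::int32_t>::max ();
}
```

The comment on the 64-bit bound is what makes the widening safe; with wider
inputs the multiplication and addition would each need their own check.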

+ /* Set the loop-entry arg of the reduction-phi.  */
+
+ if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+   == INTEGER_INDUC_COND_REDUCTION)

extra vertical space

+ tree zero = build_int_cst ( TREE_TYPE (vec_init_def_type), 0);
+ tree zero_vec = build_vector_from_val (vec_init_def_type, zero);
+

build_zero_cst (vec_init_def_type);

+ else
+   {
+ add_phi_arg (as_a  (phi), vec_init_def,
   loop_preheader_edge (loop), UNKNOWN_LOCATION);
+   }

no {}s around single stmts

+ tree comparez = build2 (EQ_EXPR, boolean_type_node, new_temp, zero);

please no l33t speech

+ tmp = build3 (COND_EXPR, scalar_type, comparez, initial_def,
+   new_temp);
+
+ epilog_stmt = gimple_build_assign (new_scalar_dest, tmp);
+ new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
+ gimple_assign_set_lhs (epilog_stmt, new_temp);

epilog_stmt = gimple_build_assign (make_ssa_name (new_scalar_dest),
COND_EXPR,
compare, initial_def, new_temp);


+  /* Check that the max size of the loop will not wrap.  */
+
+  if (TYPE_OVERFLOW_UNDEFINED (lhs_type))
+{
+  return (GET_MODE_BITSIZE (TYPE_MODE (lhs_type))
+ >= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (base))));

this mode check will always be true as lhs_type and base are from the
same PHI node.

+  return (wi::min_precision (max_loop_value, TYPE_SIGN (lhs_type))
+ <= GET_MODE_BITSIZE (TYPE_MODE (lhs_type)));

please use TYPE_PRECISION (lhs_type) instead.

Ok with those changes.

Thanks,
Richard.

>
>
> Alan.
>


Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-11-12 Thread Richard Biener
On Wed, Nov 11, 2015 at 9:38 PM, Jeff Law  wrote:
> On 09/04/2015 11:36 AM, Ajit Kumar Agarwal wrote:
>
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index 6b66f8f..20ddf3d 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
>>>   NEXT_PASS (pass_ccp);
>>>   /* After CCP we rewrite no longer addressed locals into SSA
>>>  form if possible.  */
>>> +  NEXT_PASS (pass_path_split);
>>>   NEXT_PASS (pass_forwprop);
>>>   NEXT_PASS (pass_sra_early);
>>
>> I can't recall if we've discussed the location of the pass at all.  I'm
>> not objecting to this location, but would like to hear why you chose
>> this particular location in the optimization pipeline.
>
> So returning to the question of where this would live in the optimization
> pipeline and how it interacts with if-conversion and vectorization.

Note that adding passes to the early pipeline that do code duplication
is a no-no.  The early pipeline should be exclusively for things making
functions more suitable for inlining.

> The concern with moving it to late in the pipeline was that we'd miss
> VRP/DCE/CSE opportunities.  I'm not sure if you're aware, but we actually
> run those passes more than once.  So it would be possible to run path
> splitting after if-conversion & vectorization, but before the second pass
> of VRP & DOM.  But trying that seems to result in something scrambling the
> loop enough that the path splitting opportunity is missed.  That might be
> worth deeper investigation if we can't come up with some kind of heuristics
> to fire or suppress path splitting.

As I still think it is a transform similar to tracer, just put it next to that.

But IIRC you mentioned it should enable vectorization or so?  In this case
that's obviously too late.

Richard.

> Other random notes as I look over the code:
>
> Call the pass "path-split", not "path_split".  I don't think we have any
> passes with underscores in their names, dump files, etc.
>
> You factored out the code for transform_duplicate.  When you create new
> functions, they should all have a block comment indicating what they do,
> return values, etc.
>
> I asked you to trim down the #includes in tree-ssa-path-split.c  Most were
> ultimately unnecessary.  The trimmed list is just 11 headers.
>
> Various functions in tree-ssa-path-split.c were missing their block
> comments.  There were several places in tree-ssa-path-split that I felt
> deserved a comment.  While you are familiar with the code, it's likely
> someone else will have to look at and modify this code at some point in the
> future.  The comments help make that easier.
>
> In find_trace_loop_latch_same_as_join_blk, we find the immediate dominator
> of the latch and verify it ends in a conditional.  That's fine.  Then we
> look at the predecessors of the latch to see if one is succeeded only by the
> latch and falls through to the latch.  That is the block we'll end up
> redirecting to a copy of the latch.  Also fine.
>
> Note how there is no testing for the relationship between the immediate
> dominator of the latch and the predecessors of the latch.  ISTM that we can
> have a fairly arbitrary region in the THEN/ELSE arms of the conditional.
> Was this intentional?  Would it be advisable to verify that the THEN/ELSE
> arms are single blocks?  Do we want to verify that neither the THEN/ELSE
> arms transfer control other than to the latch?  Do we want to verify the
> predecessors of the latch are immediate successors of the latch's immediate
> dominator?
>
> The is_feasible_trace routine was still checking if the block had a
> conversion and rejecting it.  I removed that check.  It does seem to me that
> we need an upper limit on the number of statements.  I wonder if we should
> factor out the maximum statements to copy code from jump threading and use
> it for both jump threading and path splitting.
>
> Instead of creating loop with multiple latches, what ever happened to the
> idea of duplicating the latch block twice -- once into each path. Remove the
> control statement in each duplicate.  Then remove everything but the control
> statement in the original latch.
>
>
> I added some direct dump support.  Essentially anytime we split the path, we
> output something like this:
>
> Split path in loop: latch block 9, predecessor 7.
>
> That allows tests in the testsuite to look for the "Split path in loop"
> string rather than inferring the information from the SSA graph update's
> replacement table.  It also allows us to do things like count how many paths
> get split if we have more complex tests.
>
> On the topic of tests.  Is the one you provided something where path
> splitting results in a significant improvement?  From looking at the x86_64
> output, I can see the path splitting transformation occur, but not any
> improvement in the final code.
>
> While the existing test is useful, testing on 

[Ada] Warn when a non-imported constant overlays a constant

2015-11-12 Thread Arnaud Charlet
The compiler warns when a variable overlays a constant because of an address
clause on the former.  This change makes the compiler issue the same warning
when a non-imported constant overlays a constant.

The patch also removes an old pessimization whereby overlaid objects would
be treated as volatile by the compiler in some circumstances, for example
preventing them from being put into read-only memory if they are constant.

The compiler must issue the warning:

consovl3.adb:4:03: warning: constant "C" may be modified via address clause at
line 5

on the following code:

with Q; use Q;

procedure Consovl3 is
  A : constant Natural := 0;
  for A'Address use C'Address;
begin
  null;
end;
package Q is

  C : constant Natural := 1;

end Q;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Eric Botcazou  

* einfo.ads (Overlays_Constant): Document usage for E_Constant.
* freeze.adb (Warn_Overlay): Small reformatting.
(Check_Address_Clause): Deal specifically with deferred
constants.  For a variable or a non-imported constant
overlaying a constant object and with initialization value,
either remove the initialization or issue a warning.  Fix a
couple of typos.
* sem_util.adb (Note_Possible_Modification): Overhaul the condition for
the warning on modified constants and use Find_Overlaid_Entity instead
of doing it manually.
* sem_ch13.adb (Analyze_Attribute_Definition_Clause): Compute and
set Overlays_Constant once on entry.  Do not treat the overlaid
entity as volatile.  Do not issue the warning on modified
constants here.
* gcc-interface/decl.c (gnat_to_gnu_entity) : Remove
over-restrictive condition for the special treatment of deferred
constants.
: Remove obsolete associated code.

Index: einfo.ads
===
--- einfo.ads   (revision 230223)
+++ einfo.ads   (working copy)
@@ -3638,8 +3638,9 @@
 -- Points to the component in the base type.
 
 --Overlays_Constant (Flag243)
---   Defined in all entities. Set only for a variable for which there is
---   an address clause which causes the variable to overlay a constant.
+--   Defined in all entities. Set only for E_Constant or E_Variable for
+--   which there is an address clause which causes the entity to overlay
+--   a constant object.
 
 --Overridden_Operation (Node26)
 --   Defined in subprograms. For overriding operations, points to the
Index: freeze.adb
===
--- freeze.adb  (revision 230223)
+++ freeze.adb  (working copy)
@@ -207,10 +207,7 @@
--  this to have a Freeze_Node, so ensure it doesn't. Do the same for any
--  Full_View or Corresponding_Record_Type.
 
-   procedure Warn_Overlay
- (Expr : Node_Id;
-  Typ  : Entity_Id;
-  Nam  : Node_Id);
+   procedure Warn_Overlay (Expr : Node_Id; Typ : Entity_Id; Nam : Node_Id);
--  Expr is the expression for an address clause for entity Nam whose type
--  is Typ. If Typ has a default initialization, and there is no explicit
--  initialization in the source declaration, check whether the address
@@ -598,16 +595,25 @@
--
 
procedure Check_Address_Clause (E : Entity_Id) is
-  Addr   : constant Node_Id:= Address_Clause (E);
+  Addr   : constant Node_Id   := Address_Clause (E);
+  Typ: constant Entity_Id := Etype (E);
+  Decl   : Node_Id;
   Expr   : Node_Id;
-  Decl   : constant Node_Id:= Declaration_Node (E);
-  Loc: constant Source_Ptr := Sloc (Decl);
-  Typ: constant Entity_Id  := Etype (E);
+  Init   : Node_Id;
   Lhs: Node_Id;
   Tag_Assign : Node_Id;
 
begin
   if Present (Addr) then
+
+ --  For a deferred constant, the initialization value is on full view
+
+ if Ekind (E) = E_Constant and then Present (Full_View (E)) then
+Decl := Declaration_Node (Full_View (E));
+ else
+Decl := Declaration_Node (E);
+ end if;
+
  Expr := Expression (Addr);
 
  if Needs_Constant_Address (Decl, Typ) then
@@ -656,29 +662,72 @@
 Warn_Overlay (Expr, Typ, Name (Addr));
  end if;
 
- if Present (Expression (Decl)) then
+ Init := Expression (Decl);
 
+ --  If a variable, or a non-imported constant, overlays a constant
+ --  object and has an initialization value, then the initialization
+ --  may end up writing into read-only memory. Detect the cases of
+ --  statically identical values and remove the initialization. In
+ --  the other cases, give a warning. We will give other warnings
+ --  later for the variable if it is assigned.
+
+ if (Ekind (E) = E_Variable
+   or el

[Ada] More efficient code generated for object overlays

2015-11-12 Thread Arnaud Charlet
This change refines the use of the "volatile hammer" to implement the advice
given in RM 13.3(19) by disabling it for object overlays altogether, relying
instead on the ref-all aliasing property of reference types to achieve the
desired effect.

This will generate better code for object overlays, for example the following
function should now make no memory accesses at all on 64-bit platforms when
compiled at -O2 or above:

package Vec is

  type U64 is mod 2**64;

  function Prod (A, B : U64) return U64;

end Vec;
package body Vec is

  function Prod (A, B : U64) return U64 is
type U16 is mod 2**16;
type V16 is array (1..4) of U16;
VA : V16;
for VA'Address use A'Address;
VB : V16;
for VB'Address use B'Address;
R : U64 := 0;
  begin
for I in V16'Range loop
  R := R + U64(VA (I)) * U64(VB (I));
end loop;
return R;
  end;

end Vec;
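
As a rough C analogue of what Prod computes (a sketch only, not part of the
patch): each 64-bit operand is viewed as four 16-bit lanes and matching
lanes are multiplied and accumulated. The lanes are extracted with shifts
here instead of an address overlay, which is an assumption made to keep the
sketch portable; the sum does not depend on lane order, so byte order does
not change the result.

```c
#include <stdint.h>

/* Multiply-accumulate the four 16-bit lanes of A and B.  Lane I of A is
   always paired with lane I of B, so the result is the same whichever
   end of the word lane 0 is taken from.  */
uint64_t
prod (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      uint64_t lane_a = (a >> (16 * i)) & 0xffffu;
      uint64_t lane_b = (b >> (16 * i)) & 0xffffu;
      r += lane_a * lane_b;
    }
  return r;
}
```

A form like this can be evaluated entirely in registers, which matches the
stated goal of making no memory accesses at -O2.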

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Eric Botcazou  

* sem_ch13.adb (Analyze_Attribute_Definition_Clause): For a
variable, if this is not an overlay, set Treat_As_Volatile on it.
* gcc-interface/decl.c (E_Variable): Do not force the type to volatile
for address clauses. Tweak and adjust various RM references.

Index: sem_ch13.adb
===
--- sem_ch13.adb(revision 230229)
+++ sem_ch13.adb(working copy)
@@ -4724,10 +4724,24 @@
 
   Find_Overlaid_Entity (N, O_Ent, Off);
 
-  --  If the object overlays a constant view, mark it so
+  if Present (O_Ent) then
+ --  If the object overlays a constant object, mark it so
 
-  if Present (O_Ent) and then Is_Constant_Object (O_Ent) then
- Set_Overlays_Constant (U_Ent);
+ if Is_Constant_Object (O_Ent) then
+Set_Overlays_Constant (U_Ent);
+ end if;
+  else
+ --  If this is not an overlay, mark a variable as being
+ --  volatile to prevent unwanted optimizations. It's a
+ --  conservative interpretation of RM 13.3(19) for the
+ --  cases where the compiler cannot detect potential
+ --  aliasing issues easily and it also covers the case
+ --  of an absolute address where the volatile aspect is
+ --  kind of implicit.
+
+ if Ekind (U_Ent) = E_Variable then
+Set_Treat_As_Volatile (U_Ent);
+ end if;
   end if;
 
   --  Overlaying controlled objects is erroneous.
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c(revision 230229)
+++ gcc-interface/decl.c(working copy)
@@ -1068,14 +1068,12 @@
  }
 
/* Make a volatile version of this object's type if we are to make
-  the object volatile.  We also interpret 13.3(19) conservatively
-  and disallow any optimizations for such a non-constant object.  */
+  the object volatile.  We also implement RM 13.3(19) for exported
+  and imported (non-constant) objects by making them volatile.  */
if ((Treat_As_Volatile (gnat_entity)
 || (!const_flag
 && gnu_type != except_type_node
-&& (Is_Exported (gnat_entity)
-|| imported_p
-|| Present (Address_Clause (gnat_entity)
+&& (Is_Exported (gnat_entity) || imported_p)))
&& !TYPE_VOLATILE (gnu_type))
  {
const int quals
@@ -1118,7 +1116,8 @@
  gnu_expr = convert (gnu_type, gnu_expr);
 
/* If this is a pointer that doesn't have an initializing expression,
-  initialize it to NULL, unless the object is imported.  */
+  initialize it to NULL, unless the object is declared imported as
+  per RM B.1(24).  */
if (definition
&& (POINTER_TYPE_P (gnu_type) || TYPE_IS_FAT_POINTER_P (gnu_type))
&& !gnu_expr
@@ -1141,7 +1140,7 @@
save_gnu_tree (gnat_entity, NULL_TREE, false);
 
/* Convert the type of the object to a reference type that can
-  alias everything as per 13.3(19).  */
+  alias everything as per RM 13.3(19).  */
gnu_type
  = build_reference_type_for_mode (gnu_type, ptr_mode, true);
gnu_address = convert (gnu_type, gnu_address);
@@ -1206,11 +1205,10 @@
   as an indirect object.  Likewise for Stdcall objects that are
   imported.  */
if ((!definition && Present (Address_Clause (gnat_entity)))
-   || (Is_Imported (gnat_entity)
-   && Has_Stdcall_Convention (gnat_entity)))
+   || (imported_p && Has_Stdcall_Conv

[Ada] Contract_Cases on entries

2015-11-12 Thread Arnaud Charlet
This patch implements aspect/pragma Contract_Cases on entries.


-- Source --


--  tracker.ads

package Tracker is
   type Check_Kind is
 (Pre,
  Refined_Post,
  Post,
  Conseq_1,
  Conseq_2);

   type Tested_Array is array (Check_Kind) of Boolean;
   --  A value of "True" indicates that a check has been tested

   function Greater_Than
 (Kind : Check_Kind;
  Val  : Natural;
  Exp  : Natural) return Boolean;
   --  Determine whether value Val is greater than expected value Exp. The
   --  routine also updates the history for check of kind Kind. Duplicate
   --  attempts to modify the history are flagged as errors.

   procedure Reset;
   --  Reset the history

   procedure Verify (Exp : Tested_Array);
   --  Verify whether expected tests Exp were indeed checked. Emit an error if
   --  this is not the case.
end Tracker;

--  tracker.adb

with Ada.Text_IO; use Ada.Text_IO;

package body Tracker is
   History : array (Check_Kind) of Boolean := (others => False);
   --  The history of performed checks. A value of "True" indicates that a
   --  check was performed.

   ------------------
   -- Greater_Than --
   ------------------

   function Greater_Than
 (Kind : Check_Kind;
  Val  : Natural;
  Exp  : Natural) return Boolean
   is
   begin
  if History (Kind) then
 Put_Line ("  ERROR: " & Kind'Img & " tested multiple times");
  else
 History (Kind) := True;
  end if;

  return Val > Exp;
   end Greater_Than;

   -----------
   -- Reset --
   -----------

   procedure Reset is
   begin
  History := (others => False);
   end Reset;

   ------------
   -- Verify --
   ------------

   procedure Verify (Exp : Tested_Array) is
   begin
  for Index in Check_Kind'Range loop
 if Exp (Index) and not History (Index) then
Put_Line ("  ERROR: " & Index'Img & " was not tested");
 elsif not Exp (Index) and History (Index) then
Put_Line ("  ERROR: " & Index'Img & " was tested");
 end if;
  end loop;
   end Verify;
end Tracker;

--  sync_contracts.ads

with Tracker; use Tracker;

package Sync_Contracts
  with SPARK_Mode,
   Abstract_State => State
is
   protected type Prot_Typ is
  entry Prot_Entry (Input : Natural; Output : out Natural)
with Global  => (Input => State),
 Depends => ((Prot_Typ, Output) => (State, Prot_Typ, Input)),
 Pre  => Greater_Than (Pre,  Input,  1),
 Post => Greater_Than (Post, Output, 4),
 Contract_Cases =>
   (Input < 5 => True,
Input = 5 => Greater_Than (Conseq_1, Output, 6),
Input = 6 => Greater_Than (Conseq_2, Output, 7),
Input > 6 => False);

  procedure Prot_Proc (Input : Natural; Output : out Natural)
with Pre  => Greater_Than (Pre , Input,  1),
 Post => Greater_Than (Post, Output, 4),
 Contract_Cases =>
   (Input < 5 => True,
Input = 5 => Greater_Than (Conseq_1, Output, 6),
Input = 6 => Greater_Than (Conseq_2, Output, 7),
Input > 6 => False);

  function Prot_Func (Input : Natural) return Natural
with Pre  => Greater_Than (Pre , Input, 1),
 Post => Greater_Than (Post, Prot_Func'Result, 4),
 Contract_Cases =>
   (Input < 5 => True,
Input = 5 => Greater_Than (Conseq_1, Prot_Func'Result, 6),
Input = 6 => Greater_Than (Conseq_2, Prot_Func'Result, 7),
Input > 6 => False);
   end Prot_Typ;

   task type Tsk_Typ is
  entry Tsk_Entry (Input : Natural; Output : out Natural)
with Pre  => Greater_Than (Pre , Input,  1),
 Post => Greater_Than (Post, Output, 4),
 Contract_Cases =>
   (Input < 5 => True,
Input = 5 => Greater_Than (Conseq_1, Output, 6),
Input = 6 => Greater_Than (Conseq_2, Output, 7),
Input > 6 => False);
   end Tsk_Typ;
end Sync_Contracts;

--  sync_contracts.adb

package body Sync_Contracts
  with SPARK_Mode,
   Refined_State => (State => Var)
is
   Var : Integer := 1;

   protected body Prot_Typ is
  entry Prot_Entry (Input : Natural; Output : out Natural)
with Refined_Global  => (Input => Var),
 Refined_Depends => ((Prot_Typ, Output) => (Var, Prot_Typ, Input)),
 Refined_Post => Greater_Than (Refined_Post, Output, 3)
when True
  is
  begin
 Output := Input + 1;
  end Prot_Entry;

  procedure Prot_Proc (Input : Natural; Output : out Natural)
with Refined_Post => Greater_Than (Refined_Post, Output, 3)
  is
  begin
 Output := Input + 1;
  end Prot_Proc;

  function Prot_Func (Input : Natural) return Natural
with Refined_Post => Greater_Than (Refined_Post, Prot_Func'Result, 3)
  is
  begin
 return Input + 

Re: [hsa 4/12] OpenMP lowering/expansion changes (gridification)

2015-11-12 Thread Jakub Jelinek
On Thu, Nov 05, 2015 at 10:57:33PM +0100, Martin Jambor wrote:
> the patch in this email contains the changes to make our OpenMP
> lowering and expansion machinery produce GPU kernels for a certain
> limited class of loops.  The plan is to make that class quite a bit
> bigger, but only the following is ready for submission now.
> 
> Basically, whenever the compiler configured for HSAIL generation
> encounters the following pattern:
> 
>   #pragma omp target
>   #pragma omp teams thread_limit(workgroup_size) // thread_limit is optional
>   #pragma omp distribute parallel for firstprivate(n) private(i) 
> other_sharing_clauses()
> for (i = 0; i < n; i++)
>   some_loop_body

Do you support only lb 0 or any constant?  Only step 1?  Can the
b be constant, or just a variable?  If you need the number of iterations
computed before GOMP_target_ext, supposedly you also need to check that
n can't change in between target and the distribute (e.g. if it is
addressable or global var) and there are some statements in between.
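
To make the question concrete, here is a minimal sketch of the iteration
count that would have to be known before the launch, for a loop of the form
for (i = lb; i < ub; i += step).  The function name is hypothetical, a
positive step is assumed, and overflow of ub - lb + step is assumed not to
happen.

```c
#include <assert.h>

/* Number of iterations of: for (i = lb; i < ub; i += step).
   Assumes step > 0 and that ub - lb + step does not overflow.  */
long
trip_count (long lb, long ub, long step)
{
  assert (step > 0);
  if (ub <= lb)
    return 0;
  /* Round up so that a partial final stride still counts.  */
  return (ub - lb + step - 1) / step;
}
```

The count is only valid if lb, ub and step cannot change between the point
where it is computed and the point where the loop would have started.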

What about schedule or dist_schedule clauses?  Only schedule(auto) or
missing schedule guarantees you can distribute the work among the
threads any way the compiler wants.
dist_schedule is always static, but could have different chunk_size.

The current int num_threads, int thread_limit GOMP_target_ext arguments
perhaps could be changed to something like int num_args, long *args,
where args[0] would be the current num_threads and args[1] current
thread_limit, and if any offloading target that might benefit from knowing
the number of iterations of distribute parallel for that is the only
important statement inside, you could perhaps pass it as args[2] and pass
3 instead of 2 to num_args.  That could be something kind of generic
rather than HSA specific, and extensible.  But, looking at your
kernel_launch structure, you want something like multiple dimensions and
compute each dimension separately rather than combine (collapse) all
dimensions together, which is what OpenMP expansion does right now.
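
A sketch of how the receiving side of such an extensible argument array
could look.  Everything here is hypothetical (the struct, the function name
and the slot layout beyond what is described above); it only illustrates
that older callers passing 2 entries and newer callers passing 3 can share
one interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded view of the args array.  */
struct launch_params
{
  long num_threads;     /* args[0] */
  long thread_limit;    /* args[1] */
  long iterations;      /* args[2], valid only if has_iterations is set */
  int has_iterations;
};

/* Decode NUM_ARGS entries of ARGS.  The first two slots are mandatory;
   the iteration count is optional and only read when present.  */
struct launch_params
decode_target_args (int num_args, const long *args)
{
  struct launch_params p = { 0, 0, 0, 0 };
  assert (num_args >= 2 && args != NULL);
  p.num_threads = args[0];
  p.thread_limit = args[1];
  if (num_args >= 3)
    {
      p.iterations = args[2];
      p.has_iterations = 1;
    }
  return p;
}
```

A plugin that does not care about the iteration count can ignore slot 2,
and further slots could be appended later without changing the signature.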

> While we have also been experimenting quite a bit with dynamic
> parallelism, we have only been able to achieve any good performance
> via this process of gridification.  The user can be notified whether a
> particular target construct was gridified or not via our process of
> dumping notes, which however only appear in the detailed dump.  I am
> seriously considering emitting some kind of warning, when HSA-enabled
> compiler is about to produce a non-gridified target code.

But then it would warn pretty much on all of libgomp testsuite with target
constructs in them...

> @@ -547,13 +548,13 @@ DEF_FUNCTION_TYPE_7 
> (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_UINT_PTR,

> --- a/gcc/fortran/types.def
> +++ b/gcc/fortran/types.def
> @@ -145,6 +145,7 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, 
> BT_VOLATILE_PTR, BT_I2, BT
>  DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, 
> BT_I4, BT_INT)
>  DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, 
> BT_I8, BT_INT)
>  DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I16_INT, BT_VOID, BT_VOLATILE_PTR, 
> BT_I16, BT_INT)
> +DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_PTR, BT_VOID, BT_PTR, BT_INT, BT_PTR)
>  
>  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_OMPFN_PTR_UINT_UINT,
>   BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT)
> @@ -215,9 +216,9 @@ DEF_FUNCTION_TYPE_7 
> (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_UINT_PTR,
>  DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT,
>BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT,
>BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_UINT)
> -DEF_FUNCTION_TYPE_8 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR,
> +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
>BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> -  BT_PTR, BT_PTR, BT_UINT, BT_PTR)
> +  BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)

You'd need to move it if you add arguments (but as I said on the other
patch, this won't really apply on top of the trunk anyway).

> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -153,6 +153,7 @@ enum gf_mask {
>  GF_OMP_FOR_KIND_TASKLOOP = 2,
>  GF_OMP_FOR_KIND_CILKFOR = 3,
>  GF_OMP_FOR_KIND_OACC_LOOP= 4,
> +GF_OMP_FOR_KIND_KERNEL_BODY = 5,
>  /* Flag for SIMD variants of OMP_FOR kinds.  */
>  GF_OMP_FOR_SIMD  = 1 << 3,
>  GF_OMP_FOR_KIND_SIMD = GF_OMP_FOR_SIMD | 0,
> @@ -621,8 +622,24 @@ struct GTY((tag("GSS_OMP_FOR")))
>/* [ WORD 11 ]
>   Pre-body evaluated before the loop body begins.  */
>gimple_seq pre_body;
> +
> +  /* [ WORD 12 ]
> + If set, this statement is part of a gridified kernel, its clauses need 
> to
> + be scanned and lowered but the statement should be discarded after
> + lowering.  */
> +  bool kernel_phony;

A bool flag is better put as a GF_OMP_* flag, there are s

Re: [OpenACC 0/7] host_data construct

2015-11-12 Thread Julian Brown
On Mon, 2 Nov 2015 18:33:39 +
Julian Brown  wrote:

> On Mon, 26 Oct 2015 19:34:22 +0100
> Jakub Jelinek  wrote:
> 
> > Your use_device sounds very similar to use_device_ptr clause in
> > OpenMP, which is allowed on #pragma omp target data construct and is
> > implemented quite a bit differently from this; it is unclear if the
> > OpenACC standard requires this kind of implementation, or you just
> > chose to implement it this way.  In particular, the GOMP_target_data
> > call puts the variables mentioned in the use_device_ptr clauses into
> > the mapping structures (similarly how map clause appears) and the
> > corresponding vars are privatized within the target data region
> > (which is a host region, basically a fancy { } braces), where the
> > private variables contain the offloading device's pointers.  
> 
> As the author of the original patch, I have to say using the mapping
> structures seems like a far better approach, but I've hit some trouble
> with the details of adapting OpenACC to use that method.

Here's a version of the patch which (hopefully) brings OpenACC on par
with OpenMP with respect to use_device/use_device_ptr variables. The
implementation is essentially the same now for OpenACC as for OpenMP
(i.e. using mapping structures): so for now, only array or pointer
variables can be used as use_device variables. The included tests have
been adjusted accordingly.

One awkward part of the implementation concerns nesting offloaded
regions within host_data regions:

#define N 1024

int main (int argc, char* argv[])
{
  int x[N];

#pragma acc data copyin (x[0:N])
  {
int *xp;
#pragma acc host_data use_device (x)
{
  [...]
#pragma acc parallel present (x) copyout (xp)
  {
xp = x;
  }
}

assert (xp == acc_deviceptr (x));
  }

  return 0;
}

I think the meaning of 'x' as seen within the clauses of the parallel
directive should be the *host* version of x, not the mapped target
address (I've asked on the OpenACC technical mailing list to clarify
this point, but no reply as yet). The changes to
{maybe_,}lookup_decl_in_outer_ctx "skip over" host_data contexts when
called from lower_omp_target. There's probably an analogous case for
OpenMP, but I've not tried to handle that.
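
The "skip over" behaviour can be pictured with the simplified sketch below.
The types and names are invented for illustration and do not correspond to
the actual omp-low.c data structures; each context here remaps at most one
variable, which is enough to show the effect of the extra argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum ctx_kind { CTX_DATA, CTX_HOST_DATA, CTX_PARALLEL };

/* Hypothetical, much simplified lowering context.  */
struct ctx
{
  enum ctx_kind kind;
  const struct ctx *outer;  /* enclosing context, or NULL */
  const char *name;         /* variable remapped here, or NULL */
  int mapped_value;         /* stand-in for the remapped decl */
};

/* Look NAME up in the contexts enclosing C.  When SKIP_HOST_DATA is
   nonzero, host_data contexts are ignored, so the caller sees the
   mapping from the next context further out instead.  */
const int *
lookup_in_outer_ctx (const struct ctx *c, const char *name,
                     int skip_host_data)
{
  assert (name != NULL);
  for (const struct ctx *up = c ? c->outer : NULL; up; up = up->outer)
    {
      if (skip_host_data && up->kind == CTX_HOST_DATA)
        continue;
      if (up->name && strcmp (up->name, name) == 0)
        return &up->mapped_value;
    }
  return NULL;
}
```

With a parallel region nested in host_data nested in data, a normal lookup
of x finds the host_data remapping, while a lookup with the skip flag set
finds the one from the data region.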

No regressions for libgomp tests, and the new tests pass. OK for trunk?

Thanks,

Julian

ChangeLog

Julian Brown  
Cesar Philippidis  
James Norris  

gcc/
* c-family/c-pragma.c (oacc_pragmas): Add PRAGMA_OACC_HOST_DATA.
* c-family/c-pragma.h (pragma_kind): Add PRAGMA_OACC_HOST_DATA.
(pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_USE_DEVICE.
* c/c-parser.c (c_parser_omp_clause_name): Add use_device support.
(c_parser_oacc_clause_use_device): New function.
(c_parser_oacc_all_clauses): Add use_device support.
(OACC_HOST_DATA_CLAUSE_MASK): New macro.
(c_parser_oacc_host_data): New function.
(c_parser_omp_construct): Add host_data support.
* c/c-tree.h (c_finish_oacc_host_data): Add prototype.
* c/c-typeck.c (c_finish_oacc_host_data): New function.
(c_finish_omp_clauses): Add use_device support.
* cp/cp-tree.h (finish_oacc_host_data): Add prototype.
* cp/parser.c (cp_parser_omp_clause_name): Add use_device support.
(cp_parser_oacc_all_clauses): Add use_device support.
(OACC_HOST_DATA_CLAUSE_MASK): New macro.
(cp_parser_oacc_host_data): New function.
(cp_parser_omp_construct): Add host_data support.
(cp_parser_pragma): Add host_data support.
* cp/semantics.c (finish_omp_clauses): Add use_device support.
(finish_oacc_host_data): New function.
* gimple-pretty-print.c (dump_gimple_omp_target): Add host_data
support.
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_HOST_DATA.
(is_gimple_omp_oacc): Add support for above.
* gimplify.c (gimplify_scan_omp_clauses): Add host_data, use_device
support.
(gimplify_omp_workshare): Add host_data support.
(gimplify_expr): Likewise.
* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): New.
* omp-low.c (lookup_decl_in_outer_ctx)
(maybe_lookup_decl_in_outer_ctx): Add optional argument to skip
host_data regions.
(scan_sharing_clauses): Support use_device.
(check_omp_nesting_restrictions): Support host_data.
(expand_omp_target): Support host_data.
(lower_omp_target): Skip over outer host_data regions when looking
up decls. Support use_device.
(make_gimple_omp_edges): Support host_data.
* tree-nested.c (convert_nonlocal_omp_clauses): Add use_device
clause.

libgomp/
* oacc-parallel.c (GOACC_host_data): New function.
* libgomp.map (GOACC_host_data): Add to GOACC_2.0.1.
* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/host_data-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/host_data-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/host_data-4.c: New test.
* testsuite/libgomp.oacc-c-c++-common/host_data-5.c: N

Re: [hsa 5/12] New HSA-related GCC options

2015-11-12 Thread Jakub Jelinek
On Mon, Nov 09, 2015 at 05:58:56PM +0100, Martin Jambor wrote:
> > But I don't see any way to disable it on the command line?  (no switch?)
> 
> No, the switch is -foffload, which has missing documentation (PR
> 67300) and is only described at https://gcc.gnu.org/wiki/Offloading
> Nevertheless, the option allows the user to specify compiler option
> -foffload=disable and no offloading should happen, not even HSA.  The
> user can also enumerate just the offload targets they want (and pass
> them special command line stuff).
> 
> It seems I have misplaced a hunk in the patch series.  Nevertheless,
> in the first patch (with configuration stuff), there is a change to
> opts.c which scans the -foffload= contents and sets the flag variable
> if hsa is not present.
> 
> Whenever the compiler has to decide whether HSA is enabled for the
> given compilation or not, it has to look at this variable (if
> configured for HSA).

But what is the difference between
-foffload=disable
or
-foffload={list not including hsa}
and the new param?  If you don't gridify, you don't emit any kernels...

Jakub


Re: [PATCH] Fix PR ipa/68035 (v2)

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 10:48 AM, Martin Liška  wrote:
> Hello.
>
> I'm sending reworked version of the patch, where I renamed 'sem_item::hash' 
> to 'm_hash'
> and wrapped all usages with 'get_hash'. Apart from that, a new member 
> function 'set_hash'
> is utilized for changing the hash value. Hope it's easier for understanding.
>
> Patch can survive regression tests and bootstraps on x86_64-linux-pc.
>
> Ready for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Martin


Re: [PATCH, PR68286] Fix vector comparison expand

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 10:57 AM, Ilya Enkovich  wrote:
> Hi,
>
> My vector comparison patches broke expand of vector comparison on targets
> which don't have new comparison patterns but support VEC_COND_EXPR.  This
> happens because it is not checked whether a vector comparison may be
> expanded as a
> powerpc64le-unknown-linux-gnu.  OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-12  Ilya Enkovich  
>
> * expr.c (do_store_flag): Expand vector comparison as
> VEC_COND_EXPR if vector comparison is not supported
> by target.
>
> gcc/testsuite/
>
> 2015-11-12  Ilya Enkovich  
>
> * gcc.dg/pr68286.c: New test.
>
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 03936ee..bd43dc4 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11128,7 +11128,8 @@ do_store_flag (sepops ops, rtx target, machine_mode 
> mode)
>if (TREE_CODE (ops->type) == VECTOR_TYPE)
>  {
>tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
> -  if (VECTOR_BOOLEAN_TYPE_P (ops->type))
> +  if (VECTOR_BOOLEAN_TYPE_P (ops->type)
> + && expand_vec_cmp_expr_p (TREE_TYPE (arg0), ops->type))
> return expand_vec_cmp_expr (ops->type, ifexp, target);
>else
> {
> diff --git a/gcc/testsuite/gcc.dg/pr68286.c b/gcc/testsuite/gcc.dg/pr68286.c
> new file mode 100644
> index 000..d0392e8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68286.c
> @@ -0,0 +1,17 @@
> +/* PR target/68286 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a, b, c;
> +int fn1 ()
> +{
> +  int d[] = {0};
> +  for (; c; c++)
> +{
> +  float e = c;
> +  if (e)
> +d[0]++;
> +}
> +  b = d[0];
> +  return a;
> +}


Re: [PATCH 04/N] Fix big memory leak in ix86_valid_target_attribute_p

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 11:03 AM, Martin Liška  wrote:
> Hello.
>
> Following patch was a bit negotiated with Jakub and can save a huge amount of 
> memory in cases
> where target attributes are heavily utilized.
>
> Can bootstrap and survives regression tests on x86_64-linux-pc.
>
> Ready for trunk?

+static bool opts_obstack_initialized = false;
+
+/* Initialize opts_obstack if not initialized.  */
+
+void
+init_opts_obstack (void)
+{
+  if (!opts_obstack_initialized)
+{
+  opts_obstack_initialized = true;
+  gcc_obstack_init (&opts_obstack);

you can move the static global to function scope.

Ok with that change.
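
For reference, the suggested shape with the flag at function scope would be
roughly as follows.  This is a standalone sketch: the function name and the
init_calls counter are stand-ins, since the real code would call
gcc_obstack_init (&opts_obstack) at the marked spot.

```c
#include <stdbool.h>

/* Hypothetical counter standing in for the real obstack setup, so the
   sketch can be compiled on its own.  */
int init_calls;

/* Run the one-time setup on the first call only.  The flag lives at
   function scope, so nothing outside this function can touch it.  */
void
init_opts_obstack_sketch (void)
{
  static bool initialized = false;

  if (!initialized)
    {
      initialized = true;
      init_calls++;  /* Real code: gcc_obstack_init (&opts_obstack);  */
    }
}
```

Calling the function any number of times performs the setup exactly once.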

Btw, don't other targets need a similar adjustment to their hook?
Grepping shows arm and nios2.

Thanks,
Richard.


> Thanks,
> Martin


[PATCH][AArch64] Documentation fix for -fpic

2015-11-12 Thread Szabolcs Nagy

The documentation for -fpic and -fPIC explicitly mentions some targets
where the difference matters, but not AArch64.  Specifying the GOT size
limit is not entirely correct as it can depend on the -mcmodel setting,
but probably better than leaving the impression that -fpic vs -fPIC does
not matter on AArch64.

ChangeLog:

2015-11-12  Szabolcs Nagy  

* doc/invoke.texi (-fpic): Add the AArch64 limit.
(-fPIC): Add AArch64.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0121832..f925fe0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -23951,7 +23951,7 @@ loader is not part of GCC; it is part of the operating system).  If
 the GOT size for the linked executable exceeds a machine-specific
 maximum size, you get an error message from the linker indicating that
 @option{-fpic} does not work; in that case, recompile with @option{-fPIC}
-instead.  (These maximums are 8k on the SPARC and 32k
+instead.  (These maximums are 8k on the SPARC, 28k on AArch64 and 32k
 on the m68k and RS/6000.  The x86 has no such limit.)
 
 Position-independent code requires special support, and therefore works
@@ -23966,7 +23966,7 @@ are defined to 1.
 @opindex fPIC
 If supported for the target machine, emit position-independent code,
 suitable for dynamic linking and avoiding any limit on the size of the
-global offset table.  This option makes a difference on the m68k,
+global offset table.  This option makes a difference on the AArch64, m68k,
 PowerPC and SPARC@.
 
 Position-independent code requires special support, and therefore works


Re: [PATCH 04/N] Fix big memory leak in ix86_valid_target_attribute_p

2015-11-12 Thread Bernd Schmidt

On 11/12/2015 12:29 PM, Richard Biener wrote:

+static bool opts_obstack_initialized = false;
+
+/* Initialize opts_obstack if not initialized.  */
+
+void
+init_opts_obstack (void)
+{
+  if (!opts_obstack_initialized)
+{
+  opts_obstack_initialized = true;
+  gcc_obstack_init (&opts_obstack);

you can move the static global to function scope.


Also, why bother with it? Why not simply arrange to call the function 
just once at startup?


It's not clear from the submission why this is done and how it relates 
to the i386.c hunk.



Bernd


[Ada] Missing detection of elaboration dependency

2015-11-12 Thread Arnaud Charlet
This patch modifies the elaboration circuitry to detect an issue in SPARK
where an object in package P of a private type in package T subject to
pragma Default_Initial_Condition is default initialized and package P
lacks Elaborate_All (T).


-- Source --


--  pack.ads

package Pack with SPARK_Mode is
   type Elab_Typ is private
 with Default_Initial_Condition => Get_Val (Elab_Typ) = Expect_Val;

   type False_Typ is private
 with Default_Initial_Condition => False;

   type True_Typ is private
 with Default_Initial_Condition => True;

   function Expect_Val return Integer;
   function Get_Val (Obj : Elab_Typ) return Integer;

private
   type Elab_Typ is record
  Comp : Integer;
   end record;

   type False_Typ is null record;
   type True_Typ is null record;
end Pack;

--  pack.adb

package body Pack with SPARK_Mode is
   function Expect_Val return Integer is
   begin
  return 1234;
   end Expect_Val;

   function Get_Val (Obj : Elab_Typ) return Integer is
   begin
  return Obj.Comp;
   end Get_Val;
end Pack;

--  main_pack.ads

with Pack; use Pack;

package Main_Pack with SPARK_Mode is
   Obj_1 : Elab_Typ;
   Obj_2 : False_Typ;
   Obj_3 : True_Typ;
end Main_Pack;


-- Compilation and output --


$ gcc -c -gnata main_pack.ads
main_pack.ads:4:04: call to Default_Initial_Condition during elaboration in
  SPARK
main_pack.ads:4:04: Elaborate_All pragma required for "Pack"

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Hristian Kirtchev  

* sem_elab.adb (Check_A_Call): Add new variable
Is_DIC_Proc. Report elaboration issue in SPARK concerning calls
to source subprograms or nontrivial Default_Initial_Condition
procedures. Add specialized error message to avoid outputting
the internal name of the Default_Initial_Condition procedure.
* sem_util.ads, sem_util.adb
(Is_Non_Trivial_Default_Init_Cond_Procedure): New routine.

Index: sem_util.adb
===
--- sem_util.adb(revision 230235)
+++ sem_util.adb(working copy)
@@ -12362,12 +12362,50 @@
   end if;
end Is_Local_Variable_Reference;
 
+   
+   -- Is_Non_Trivial_Default_Init_Cond_Procedure --
+   
+
+   function Is_Non_Trivial_Default_Init_Cond_Procedure
+ (Id : Entity_Id) return Boolean
+   is
+  Body_Decl : Node_Id;
+  Stmt : Node_Id;
+
+   begin
+  if Ekind (Id) = E_Procedure
+and then Is_Default_Init_Cond_Procedure (Id)
+  then
+ Body_Decl :=
+   Unit_Declaration_Node
+ (Corresponding_Body (Unit_Declaration_Node (Id)));
+
+ --  The body of the Default_Initial_Condition procedure must contain
+ --  at least one statement, otherwise the generation of the subprogram
+ --  body failed.
+
+ pragma Assert (Present (Handled_Statement_Sequence (Body_Decl)));
+
+ --  To qualify as non-trivial, the first statement of the procedure
+ --  must be a check in the form of an if statement. If the original
+ --  Default_Initial_Condition expression was folded, then the first
+ --  statement is not a check.
+
+ Stmt := First (Statements (Handled_Statement_Sequence (Body_Decl)));
+
+ return
+   Nkind (Stmt) = N_If_Statement
+ and then Nkind (Original_Node (Stmt)) = N_Pragma;
+  end if;
+
+  return False;
+   end Is_Non_Trivial_Default_Init_Cond_Procedure;
+
-
-- Is_Object_Reference --
-
 
function Is_Object_Reference (N : Node_Id) return Boolean is
-
   function Is_Internally_Generated_Renaming (N : Node_Id) return Boolean;
   --  Determine whether N is the name of an internally-generated renaming
 
Index: sem_util.ads
===
--- sem_util.ads(revision 230223)
+++ sem_util.ads(working copy)
@@ -1433,6 +1433,12 @@
--  parameter of the current enclosing subprogram.
--  Why are OUT parameters not considered here ???
 
+   function Is_Non_Trivial_Default_Init_Cond_Procedure
+ (Id : Entity_Id) return Boolean;
+   --  Determine whether entity Id denotes the procedure which verifies the
+   --  assertion expression of pragma Default_Initial_Condition and if it does,
+   --  the encapsulated expression is non-trivial.
+
function Is_Object_Reference (N : Node_Id) return Boolean;
--  Determines if the tree referenced by N represents an object. Both
--  variable and constant objects return True (compare Is_Variable).
Index: sem_elab.adb
===
--- sem_elab.adb(revision 230223)
+++ sem_elab.adb(working copy)
@@ -597,6 +597,11 @@
  

[Ada] Legality checks on calls to a Generic_Dispatching_Constructor.

2015-11-12 Thread Arnaud Charlet
This patch adds several legality checks on calls to an instance of the
predefined Generic_Dispatching_Constructor. The following three tests are
performed:

a) The tag argument is defined, i.e. is not No_Tag.

b) The tag is not that of an abstract type.

c) The accessibility level of the type denoted by the tag is no greater than
that of the specified constructor function.
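
As a rough illustration of the three checks above, here is a small Python
model. It is only a sketch: the Tag class, its fields, NO_TAG and
check_constructor_tag are hypothetical names made up for this example, and
the real checks are run-time code generated by the expander in exp_intr.adb.

```python
class TagError(Exception):
    """Stands in for Ada's Tag_Error in this model."""


NO_TAG = None  # stands in for Ada.Tags.No_Tag


class Tag:
    """Hypothetical, simplified view of a type tag."""

    def __init__(self, name, is_abstract, access_level):
        self.name = name
        self.is_abstract = is_abstract
        self.access_level = access_level


def check_constructor_tag(tag, constructor_level):
    """Apply checks (a), (b) and (c) in order; raise TagError on failure."""
    # (a) The tag must be defined.
    if tag is NO_TAG:
        raise TagError("tag is No_Tag")
    # (b) The tag must not denote an abstract type.
    if tag.is_abstract:
        raise TagError("tag of abstract type " + tag.name)
    # (c) The type's accessibility level must be no greater than
    #     that of the constructor function.
    if tag.access_level > constructor_level:
        raise TagError("type " + tag.name + " is too deeply nested")
```

In this model all three failures raise the same exception, which matches the
patch: each generated check raises Tag_Error.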

Tested in ACATS 4.0H C390012.

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Ed Schonberg  

* exp_intr.adb: Add legality checks on calls to a
Generic_Dispatching_Constructor: the given tag must be defined,
it cannot be the tag of an abstract type, and its accessibility
level must not be greater than that of the constructor.

Index: exp_intr.adb
===
--- exp_intr.adb(revision 230223)
+++ exp_intr.adb(working copy)
@@ -311,6 +311,31 @@
 
   Remove_Side_Effects (Tag_Arg);
 
+  --  Check that we have a proper tag
+
+  Insert_Action (N,
+Make_Implicit_If_Statement (N,
+  Condition   => Make_Op_Eq (Loc,
+Left_Opnd  => New_Copy_Tree (Tag_Arg),
+Right_Opnd => New_Occurrence_Of (RTE (RE_No_Tag), Loc)),
+
+  Then_Statements => New_List (
+Make_Raise_Statement (Loc,
+  New_Occurrence_Of (RTE (RE_Tag_Error), Loc);
+
+  --  Check that it is not the tag of an abstract type
+
+  Insert_Action (N,
+Make_Implicit_If_Statement (N,
+  Condition   => Make_Function_Call (Loc,
+ Name   =>
+   New_Occurrence_Of (RTE (RE_Type_Is_Abstract), Loc),
+ Parameter_Associations => New_List (New_Copy_Tree (Tag_Arg))),
+
+  Then_Statements => New_List (
+Make_Raise_Statement (Loc,
+  New_Occurrence_Of (RTE (RE_Tag_Error), Loc);
+
   --  The subprogram is the third actual in the instantiation, and is
   --  retrieved from the corresponding renaming declaration. However,
   --  freeze nodes may appear before, so we retrieve the declaration
@@ -324,6 +349,22 @@
   Act_Constr := Entity (Name (Act_Rename));
   Result_Typ := Class_Wide_Type (Etype (Act_Constr));
 
+  --  Check that the accessibility level of the tag is no deeper than that
+  --  of the constructor function.
+
+  Insert_Action (N,
+Make_Implicit_If_Statement (N,
+  Condition   =>
+Make_Op_Gt (Loc,
+  Left_Opnd  =>
+Build_Get_Access_Level (Loc, New_Copy_Tree (Tag_Arg)),
+  Right_Opnd =>
+Make_Integer_Literal (Loc, Scope_Depth (Act_Constr))),
+
+  Then_Statements => New_List (
+Make_Raise_Statement (Loc,
+  New_Occurrence_Of (RTE (RE_Tag_Error), Loc);
+
   if Is_Interface (Etype (Act_Constr)) then
 
  --  If the result type is not known to be a parent of Tag_Arg then we
@@ -390,7 +431,6 @@
   --  conversion of the call to the actual constructor.
 
   Rewrite (N, Convert_To (Result_Typ, Cnstr_Call));
-  Analyze_And_Resolve (N, Etype (Act_Constr));
 
   --  Do not generate a run-time check on the built object if tag
   --  checks are suppressed for the result type or tagged type expansion
@@ -458,6 +498,8 @@
  Make_Raise_Statement (Loc,
Name => New_Occurrence_Of (RTE (RE_Tag_Error), Loc);
   end if;
+
+  Analyze_And_Resolve (N, Etype (Act_Constr));
end Expand_Dispatching_Constructor_Call;
 
---
Index: rtsfind.ads
===
--- rtsfind.ads (revision 230223)
+++ rtsfind.ads (working copy)
@@ -640,6 +640,7 @@
  RE_Max_Predef_Prims,-- Ada.Tags
  RE_Needs_Finalization,  -- Ada.Tags
  RE_No_Dispatch_Table_Wrapper,   -- Ada.Tags
+ RE_No_Tag,  -- Ada.Tags
  RE_NDT_Prims_Ptr,   -- Ada.Tags
  RE_NDT_TSD, -- Ada.Tags
  RE_Num_Prims,   -- Ada.Tags
@@ -1871,6 +1872,7 @@
  RE_Max_Predef_Prims => Ada_Tags,
  RE_Needs_Finalization   => Ada_Tags,
  RE_No_Dispatch_Table_Wrapper=> Ada_Tags,
+ RE_No_Tag   => Ada_Tags,
  RE_NDT_Prims_Ptr=> Ada_Tags,
  RE_NDT_TSD  => Ada_Tags,
  RE_Num_Prims=> Ada_Tags,


[RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread Tom de Vries

Hi,

[ See also related discussion at 
https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]


this patch removes the usage of first_pass_instance from pass_vrp.

the patch:
- limits itself to pass_vrp, but my intention is to remove all
  usage of first_pass_instance
- lacks an update to gdbhooks.py

Modifying the pass behaviour depending on the instance number, as
first_pass_instance does, breaks compositionality of the pass list. In
other words, adding a pass instance to a pass list may change the
behaviour of another instance of that pass in the same list, which
makes the pass list harder to understand and change. [ I've
filed this issue as PR68247 - Remove pass_first_instance ]


The solution is to make the difference in behaviour explicit in the pass 
list, and no longer change behaviour depending on instance number.


One obvious possible fix is to create a duplicate pass with a different 
name, say 'pass_vrp_warn_array_bounds':

...
  NEXT_PASS (pass_vrp_warn_array_bounds);
  ...
  NEXT_PASS (pass_vrp);
...

But, AFAIU, that requires us to choose a different dump-file name for
each pass, and choosing vrp1 and vrp2 as the new dump-file names still
means that -fdump-tree-vrp no longer works (which was mentioned as a
drawback here: https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).


This patch instead makes pass creation parameterizable. So in the pass 
list, we use:

...
  NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
  ...
  NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
...

This approach gives us clarity in the pass list, similar to using a 
duplicate pass 'pass_vrp_warn_array_bounds'.


But it also means -fdump-tree-vrp still works as before.
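
To illustrate the numbering step, here is a small Python sketch of what the
pass-list rewriting amounts to. It is not the awk script itself: the function
name number_pass_instances and the regular expression are assumptions made
for this example, and whitespace handling is simplified. The point it shows
is that both calls keep the name pass_vrp, so they are still counted as
instances 1 and 2 of the same pass.

```python
import re

# Matches either macro followed by its parenthesised argument list.
CALL_RE = re.compile(r"(NEXT_PASS(?:_WITH_ARG)?) \((.+)\)")


def number_pass_instances(lines):
    """Insert a per-pass instance number after the pass name in each call."""
    pass_counts = {}
    out = []
    for line in lines:
        m = CALL_RE.search(line)
        if m is None:
            # Not a pass call: copy the line through unchanged.
            out.append(line)
            continue
        # First argument is the pass name; anything after the first
        # comma is the optional extra argument.
        args = [a.strip() for a in m.group(2).split(",", 1)]
        pass_name = args[0]
        pass_counts[pass_name] = pass_counts.get(pass_name, 0) + 1
        new_args = [pass_name, str(pass_counts[pass_name])] + args[1:]
        call = "%s (%s)" % (m.group(1), ", ".join(new_args))
        out.append(line[:m.start()] + call + line[m.end():])
    return out
```

With the two NEXT_PASS_WITH_ARG lines from the pass list above as input, the
sketch numbers them 1 and 2 and carries the true/false argument along as a
third macro argument.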

Good idea? Other comments?

Thanks,
- Tom
Remove first_pass_instance from pass_vrp

---
 gcc/gen-pass-instances.awk | 32 ++--
 gcc/pass_manager.h |  2 ++
 gcc/passes.c   | 20 
 gcc/passes.def |  4 ++--
 gcc/tree-pass.h|  3 ++-
 gcc/tree-vrp.c | 22 --
 6 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index cbfaa86..c77bd64 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -43,7 +43,7 @@ function handle_line()
 	line = $0;
 
 	# Find call expression.
-	call_starts_at = match(line, /NEXT_PASS \(.+\)/);
+	call_starts_at = match(line, /NEXT_PASS(_WITH_ARG)? \(.+\)/);
 	if (call_starts_at == 0)
 	{
 		print line;
@@ -53,23 +53,28 @@ function handle_line()
 	# Length of the call expression.
 	len_of_call = RLENGTH;
 
-	len_of_start = length("NEXT_PASS (");
 	len_of_open = length("(");
 	len_of_close = length(")");
 
-	# Find pass_name argument
-	len_of_pass_name = len_of_call - (len_of_start + len_of_close);
-	pass_starts_at = call_starts_at + len_of_start;
-	pass_name = substr(line, pass_starts_at, len_of_pass_name);
-
 	# Find call expression prefix (until and including called function)
-	prefix_len = pass_starts_at - 1 - len_of_open;
-	prefix = substr(line, 1, prefix_len);
+	match(line, /NEXT_PASS(_WITH_ARG)? /)
+	len_of_call_name = RLENGTH
+	prefix_len = call_starts_at + len_of_call_name - 1
+	prefix = substr(line, 1, prefix_len)
 
 	# Find call expression postfix
 	postfix_starts_at = call_starts_at + len_of_call;
 	postfix = substr(line, postfix_starts_at);
 
+	args_starts_at = prefix_len + 1 + len_of_open;
+	len_of_args = postfix_starts_at - args_starts_at - len_of_close;
+	args_str = substr(line, args_starts_at, len_of_args);
+	split(args_str, args, ",");
+
+	# Find pass_name argument, an optional with_arg argument
+	pass_name = args[1];
+	with_arg = args[2];
+
 	# Set pass_counts
 	if (pass_name in pass_counts)
 		pass_counts[pass_name]++;
@@ -79,7 +84,14 @@ function handle_line()
 	pass_num = pass_counts[pass_name];
 
 	# Print call expression with extra pass_num argument
-	printf "%s(%s, %s)%s\n", prefix, pass_name, pass_num, postfix;
+	printf "%s(", prefix;
+	printf "%s", pass_name;
+	printf ", %s", pass_num;
+	if (with_arg)
+	{
+		printf ", %s", with_arg;
+	}
+	printf ")%s\n", postfix;
 }
 
 { handle_line() }
diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index 7d539e4..a8199e2 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -120,6 +120,7 @@ private:
 #define PUSH_INSERT_PASSES_WITHIN(PASS)
 #define POP_INSERT_PASSES()
 #define NEXT_PASS(PASS, NUM) opt_pass *PASS ## _ ## NUM
+#define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
 #define TERMINATE_PASS_LIST()
 
 #include "pass-instances.def"
@@ -128,6 +129,7 @@ private:
 #undef PUSH_INSERT_PASSES_WITHIN
 #undef POP_INSERT_PASSES
 #undef NEXT_PASS
+#undef NEXT_PASS_WITH_ARG
 #undef TERMINATE_PASS_LIST
 
 }; // class pass_manager
diff --git a/gcc/passes.c b/gcc/passes.c
index dd8d00a..0fd365e 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -81,6 +81,12 @@ opt_pass::clone ()
   internal_error ("pa

[Ada] Obscure messages due to freezing of contracts

2015-11-12 Thread Arnaud Charlet
This patch classifies a misplaced constituent as a critical error and stops the
compilation. This ensures that the missing link between a constituent and state
will not cause obscure cascaded errors.


-- Source --


--  pack.ads

package Pack
   with Spark_Mode => On,
Abstract_State => Top_State,
Initializes=> Top_State
is
   procedure Do_Something (Value   : in out Natural;
   Success :out Boolean)
   with Global  => (In_Out => Top_State),
Depends => (Value =>+ Top_State,
Success   => (Value, Top_State),
Top_State =>+ Value);
end Pack;

--  pack.adb

package body Pack
   with SPARK_Mode=> On,
Refined_State => (Top_State => (Count, A_Pack.State))
is
   package A_Pack
  with Abstract_State => State,
   Initializes=> State
   is
  procedure A_Proc (Test : in out Natural)
 with Global   => (In_Out =>  State),
  Depends  => (Test   =>+ State,
   State  =>+ Test);
   end A_Pack;

   package body A_Pack
  with Refined_State => (State => Total)
   is
  Total : Natural := 0;

  procedure A_Proc (Test : in out Natural)
 with Refined_Global  => (In_Out => Total),
  Refined_Depends => ((Test  =>+ Total,
   Total =>+ Test)) is
  begin
 if Total > Natural'Last - Test   then
Total := abs (Total - Test);
 else
Total := Total + Test;
 end if;
 Test := Total;
  end A_Proc;
   end A_Pack;

   Count : Natural := 0;

   procedure Do_Something (Value   : in out Natural;
   Success :out Boolean)
  with Refined_Global  => (In_Out  =>  (Count, A_Pack.State)),
   Refined_Depends => (Value=>+ (Count, A_Pack.State),
   Success  =>  (Value, Count, A_Pack.State),
   Count=>+ null,
   A_Pack.State =>+ (Count, Value)) is
   begin
  Count := Count rem 128;
  if Count <= Value then
 Value := Count + (Value - Count) / 2;
  else
 Value := Value + (Count - Value) / 2;
  end if;
  A_Pack.A_Proc (Value);
  Success := Value /= 0;
   end Do_Something;
end Pack;


-- Compilation and output --


$ gcc -c pack.adb
pack.adb:3:09: body "A_Pack" declared at line 15 freezes the contract of "Pack"
pack.adb:3:09: all constituents must be declared before body at line 15
pack.adb:3:41: "Count" is undefined
compilation abandoned due to previous error

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Hristian Kirtchev  

* sem_prag.adb (Analyze_Constituent): Stop the
analysis after detecting a misplaced constituent as this is a
critical error.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 230236)
+++ sem_prag.adb(working copy)
@@ -25408,6 +25408,14 @@
 SPARK_Msg_N
   ("\all constituents must be declared before body #",
N);
+
+--  A misplaced constituent is a critical error because
+--  pragma Refined_Depends or Refined_Global depends on
+--  the proper link between a state and a constituent.
+--  Stop the compilation, as this leads to a multitude
+--  of misleading cascaded errors.
+
+raise Program_Error;
  end if;
 
   --  The constituent is a valid state or object


[Ada] Crash on inconsistent IF-expression

2015-11-12 Thread Arnaud Charlet
This change makes sure the compiler produces a proper error (rather
than crashing) when compiling an (illegal) IF-expression where the
THEN-expression is overloaded and none of its interpretations is
compatible with the ELSE-expression.
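
The rule can be modelled in a few lines of Python. This is a deliberately
simplified sketch: types are plain strings, "compatible" is reduced to name
equality, and resolve_if_expression and IncompatibleTypes are hypothetical
names, not anything from the front end.

```python
class IncompatibleTypes(Exception):
    """Raised when no THEN interpretation fits the ELSE expression."""


def resolve_if_expression(then_interps, else_type):
    """Return the type of the IF-expression, or raise if there is none.

    then_interps: candidate types of the overloaded THEN-expression.
    else_type:    type of the ELSE-expression.
    """
    result = None  # plays the role of Any_Type: nothing found yet
    for interp in then_interps:
        if interp == else_type:  # simplified compatibility test
            result = interp
    if result is None:
        # No interpretation matched: report an error instead of
        # continuing with an unresolved type.
        raise IncompatibleTypes(
            'type incompatible with that of "then" expression')
    return result
```

In the test case below, the literal Unknown can be of type K or Tristate,
while the ELSE-expression is Boolean, so the model takes the error path.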

The following compilation must display:

$ gcc -c badelse.adb
badelse.adb:4:50: type incompatible with that of "then" expression

package Badelse is
   type K is (Unknown, Blue, Red);
   type Tristate is (False, True, Unknown);
   Boo : Boolean;
   procedure P (X : K);
end Badelse;
package body Badelse is
   procedure P (X : K) is
   begin
  Boo := (if X = Unknown then Unknown else X = Blue);
   end P;
end Badelse;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Thomas Quinot  

* sem_ch4.adb (Analyze_If_Expression): Reject IF-expression where
THEN-expression is overloaded and none of its interpretations is
compatible with the ELSE-expression.

Index: sem_ch4.adb
===
--- sem_ch4.adb (revision 230239)
+++ sem_ch4.adb (working copy)
@@ -2191,6 +2191,17 @@
 
Get_Next_Interp (I, It);
 end loop;
+
+--  If no valid interpretation has been found, then the type of
+--  the ELSE expression does not match any interpretation of
+--  the THEN expression.
+
+if Etype (N) = Any_Type then
+   Error_Msg_N
+ ("type incompatible with that of `THEN` expression",
+  Else_Expr);
+   return;
+end if;
  end;
   end if;
end Analyze_If_Expression;


Re: [PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-12 Thread James Greenhalgh
On Wed, Nov 11, 2015 at 10:34:53AM -0800, Jim Wilson wrote:
> This adds an option for the Qualcomm server parts, qdf24xx, just
> optimizing like a cortex-a57 for now, same as how the initial Samsung
> exynos-m1 support worked.
> 
> This was tested with armv8 and aarch64 bootstraps and make check.
> 
> I had to disable the cortex-a57 fma steering pass in the aarch64 port
> while testing the patch.  A bootstrap for aarch64 configured
> --with-cpu=cortex-a57 gives multiple ICEs while building the stage1
> libstdc++.  The ICEs are in scan_rtx_reg at regrename.c:1074.  This
> looks vaguely similar to PR 66785.
> 
> I am also seeing extra make check failures due to ICEs with armv8
> bootstrap builds configured --with-cpu=cortex-a57,  I see ICEs in
> scan_rtx_reg in regrename, and ICEs in decompose_normal_address in
> rtlanal.c.  The arm port doesn't have the fma steering support, which
> seems odd, and is maybe a bug, so it isn't clear what is causing this
> problem.
> 
> I plan to look at these aarch64 and armv8 failures next, including PR
> 66785.  None of these have anything to do with my patch, as they
> trigger for cortex-a57 which is already supported.

The bootstrap bugs should be fixed on trunk as of:

  http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=230149

The AArch64 parts are OK, but the ARM parts look to be missing a hunk to
gcc/config/arm/t-aprofile (and I can't approve those anyway).

Thanks,
James


> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog (revision 230118)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,13 @@
> +2015-11-10  Jim Wilson  
> +
> + * config/aarch64/aarch64-cores.def (qdf24xx): New.
> + * config/aarch64/aarch64-tune.md: Regenerated.
> + * config/arm/arm-cores.def (qdf24xx): New.
> + * config/arm/arm-tables.opt, config/arm/arm-tune.md: Regenerated.
> + * config/arm/bpabi.h (BE8_LINK_SPEC): Add qdf24xx support.
> + * doc/invoke.texi (AArch64 Options/-mtune): Add "qdf24xx".
> + (ARM Options/-mtune): Likewise.



Re: [mask-vec_cond, patch 1/2] Support vectorization of VEC_COND_EXPR with no embedded comparison

2015-11-12 Thread Ilya Enkovich
2015-11-12 13:03 GMT+03:00 Ramana Radhakrishnan :
> On Thu, Oct 8, 2015 at 4:50 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch allows COND_EXPR with no embedded comparison to be vectorized.
>>  It's applied on top of vectorized comparison support series.  New optab 
>> vcond_mask_optab
>> is introduced for such statements.  Bool patterns now avoid comparison in 
>> COND_EXPR in case vector comparison is supported by target.
>
> New standard pattern names are documented in the internals manual.
> This patch does not do so neither do I see any patches to do so.
>
>
> regards
> Ramana

Thanks for pointing that out.  I see we are also missing descriptions for
some other patterns (e.g. maskload).  I will add them.

Ilya


[Ada] Crash on illegal selected component in synchronized body.

2015-11-12 Thread Arnaud Charlet
The prefix of a selected component in a synchronized body cannot denote
a component of the synchronized type unless the prefix is an entity name.
This was not properly rejected before.

Compiling bakery.adb must yield:

bakery.adb:44:35: invalid reference to internal operation of some object
   of type "Bakery_Instance_Task"

---
procedure Bakery is

   N: Natural := 10; -- Number of Processes [Customers]

   type Integer_Array is array (1 .. N) of Integer;

   type Ticket_And_Queue_Number is record
  R : Natural; -- Ticket Number [Lamport 'Number']
  A : Natural; -- Queue Number  [Lamport 'Choosing']
   end record;

   task type Bakery_Instance_Task is
  entry Initialize(ID : Natural);
   end Bakery_Instance_Task;

   Bakery_Array : array (1 .. N) of Bakery_Instance_Task;

   task body Bakery_Instance_Task is

  R   : Natural; -- This task's current ticket number [Lamport 'Number']
  A   : Integer_Array := (1 .. N => 0);
  ID0 : Natural;

  TQN : Ticket_And_Queue_Number;

  function Read_TQN(J : in Natural) return Ticket_And_Queue_Number is
 TQN : Ticket_And_Queue_Number;
  begin
 TQN := (R => R,
 A => A(J));
 return TQN;
  end Read_TQN;
   begin
  accept Initialize(ID : Natural) do
 R := 0;
 A := (1 .. N => 0);
 ID0 := ID;
  end Initialize;
  -- Start
  R := 1;
  A(ID0) := 1;
  for J in 1 .. N loop
 if J /= ID0 then
TQN := Bakery_Array(J).Read_TQN(J => J);
 end if;
  end loop;
   end Bakery_Instance_Task;

begin
   for I in 1 .. N loop
  Bakery_Array(I).Initialize(ID => I);
   end loop;
end Bakery;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Ed Schonberg  

* sem_ch4.adb (Analyze_Selected_Component): Diagnose an attempt
to reference an internal entity from a synchronized type from
within the body of that type, when the prefix of the selected
component is not the current instance.

Index: sem_ch4.adb
===
--- sem_ch4.adb (revision 230241)
+++ sem_ch4.adb (working copy)
@@ -4655,6 +4655,23 @@
Comp = First_Private_Entity (Base_Type (Prefix_Type));
  end loop;
 
+ --  If the scope is a current instance, the prefix cannot be an
+ --  expression of the same type (that would represent an attempt
+ --  to reach an internal operation of another synchronized object).
+ --  This is legal if prefix is an access to such type and there is
+ --  a dereference.
+
+ if In_Scope
+   and then not Is_Entity_Name (Name)
+   and then Nkind (Name) /= N_Explicit_Dereference
+ then
+Error_Msg_NE ("invalid reference to internal operation "
+   & "of some object of type&", N, Type_To_Use);
+Set_Entity (Sel, Any_Id);
+Set_Etype (Sel, Any_Type);
+return;
+ end if;
+
  --  If there is no visible entity with the given name or none of the
  --  visible entities are plausible interpretations, check whether
  --  there is some other primitive operation with that name.


Re: [patch] Fix PR target/67265

2015-11-12 Thread Eric Botcazou
> Ok if it passes testing.

Thanks, it did so I installed the fix yesterday but further testing then 
revealed an oversight: the following assertion in ix86_adjust_stack_and_probe

  gcc_assert (cfun->machine->fs.cfa_reg != stack_pointer_rtx);

will now evidently trigger (simple testcase attached).

I can shed some light on it here since I wrote the code: the initial version 
of ix86_adjust_stack_and_probe didn't bother generating CFI because it only 
manipulates the stack pointer and the CFA register was guaranteed to be the 
frame pointer until yesterday, so I put the assertion to check this guarantee.
Then Richard H. enhanced the CFI machinery to always track stack adjustments 
(IIRC this was a prerequisite for your implementation of shrink-wrapping) so I 
added code to generate CFI:

  /* Even if the stack pointer isn't the CFA register, we need to correctly
 describe the adjustments made to it, in particular differentiate the
 frame-related ones from the frame-unrelated ones.  */
  if (size > 0)

To sum up, I think that the assertion is obsolete and can be removed without 
further ado; once done, the compiler generates correct CFI for the testcase.
So I installed the following one-liner as obvious after testing on x86-64.


2015-11-12  Eric Botcazou  

PR target/67265
* config/i386/i386.c (ix86_adjust_stack_and_probe): Remove obsolete
assertion on the CFA register.


2015-11-12  Eric Botcazou  

* gcc.target/i386/pr67265-2.c: New test.

-- 
Eric Botcazou
Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 230204)
+++ config/i386/i386.c	(working copy)
@@ -12245,8 +12245,6 @@ ix86_adjust_stack_and_probe (const HOST_
   release_scratch_register_on_entry (&sr);
 }
 
-  gcc_assert (cfun->machine->fs.cfa_reg != stack_pointer_rtx);
-
   /* Even if the stack pointer isn't the CFA register, we need to correctly
  describe the adjustments made to it, in particular differentiate the
  frame-related ones from the frame-unrelated ones.  */
/* { dg-do compile } */
/* { dg-options "-O -fstack-check" } */

void foo (int n)
{
  volatile char arr[64 * 1024];

  arr[n] = 1;
}


[PATCH, PR tree-optimization/PR68305] Support masked COND_EXPR in SLP

2015-11-12 Thread Ilya Enkovich
Hi,

This patch fixes the way an operand is chosen by its number for COND_EXPR.  
Bootstrapped and regtested on x86_64-unknown-linux-gnu.  OK for trunk?
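
For readers who want the selection rule spelled out, here is a small Python
model of it. This is only a sketch: SsaName, Comparison and CondAssign are
hypothetical stand-ins for the GIMPLE objects, and cond_expr_operand is a
made-up name for the COND_EXPR case in vect_get_constant_vectors.

```python
from collections import namedtuple

# Hypothetical stand-ins for GIMPLE objects, for illustration only.
SsaName = namedtuple("SsaName", "name")
Comparison = namedtuple("Comparison", "lhs rhs")  # embedded condition
CondAssign = namedtuple("CondAssign", "lhs cond then_val else_val")


def cond_expr_operand(stmt, op_num):
    """Pick operand number op_num of a COND_EXPR assignment."""
    # Statement operands in order: lhs, rhs1 (condition), rhs2, rhs3.
    ops = (stmt.lhs, stmt.cond, stmt.then_val, stmt.else_val)
    if isinstance(stmt.cond, SsaName):
        # Masked form: the condition is a single operand, so operand
        # op_num is simply statement operand op_num + 1.
        return ops[op_num + 1]
    if op_num in (0, 1):
        # Embedded comparison: operands 0 and 1 are its two sides.
        return stmt.cond[op_num]
    # Operands 2 and 3 are the THEN and ELSE values.
    return stmt.then_val if op_num == 2 else stmt.else_val
```

So a COND_EXPR with an embedded comparison has four operands (0 to 3), while
one whose condition is an SSA_NAME has three (0 to 2).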

Thanks,
Ilya
--
gcc/

2015-11-12  Ilya Enkovich  

PR tree-optimization/68305
* tree-vect-slp.c (vect_get_constant_vectors): Support
COND_EXPR with SSA_NAME as a condition.

gcc/testsuite/

2015-11-12  Ilya Enkovich  

PR tree-optimization/68305
* gcc.dg/vect/pr68305.c: New test.


diff --git a/gcc/testsuite/gcc.dg/vect/pr68305.c b/gcc/testsuite/gcc.dg/vect/pr68305.c
new file mode 100644
index 000..fde3db7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr68305.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mavx2" { target avx_runtime } } */
+
+int a, b;
+
+void
+fn1 ()
+{
+  int c, d;
+  for (; b; b++)
+a = a ^ !c ^ !d;
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 9d97140..9402474 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2738,18 +2738,20 @@ vect_get_constant_vectors (tree op, slp_tree slp_node,
  switch (code)
{
  case COND_EXPR:
-   if (op_num == 0 || op_num == 1)
- {
-   tree cond = gimple_assign_rhs1 (stmt);
+   {
+ tree cond = gimple_assign_rhs1 (stmt);
+ if (TREE_CODE (cond) == SSA_NAME)
+   op = gimple_op (stmt, op_num + 1);
+ else if (op_num == 0 || op_num == 1)
op = TREE_OPERAND (cond, op_num);
- }
-   else
- {
-   if (op_num == 2)
- op = gimple_assign_rhs2 (stmt);
-   else
- op = gimple_assign_rhs3 (stmt);
- }
+ else
+   {
+ if (op_num == 2)
+   op = gimple_assign_rhs2 (stmt);
+ else
+   op = gimple_assign_rhs3 (stmt);
+   }
+   }
break;
 
  case CALL_EXPR:


Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset

2015-11-12 Thread James Greenhalgh
On Tue, Nov 03, 2015 at 03:43:24PM +, Kyrill Tkachov wrote:
> Hi all,
> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?

Comments in-line.

> 
> Thanks,
> Kyrill
> 
> 
> 2015-11-03  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.md (*condjump): Rename to...
> (condjump): ... This.
> (*compare_condjump): New define_insn_and_split.
> (*compare_cstore_insn): Likewise.
> (*cstore_insn): Rename to...
> (aarch64_cstore): ... This.
> * config/aarch64/iterators.md (CMP): Handle ne code.
> * config/aarch64/predicates.md (aarch64_imm24): New predicate.
> 
> 2015-11-03  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/cmpimm_branch_1.c: New test.
> * gcc.target/aarch64/cmpimm_cset_1.c: Likewise.

> commit 7df013a391532f39932b80c902e3b4bbd841710f
> Author: Kyrylo Tkachov 
> Date:   Mon Sep 21 10:56:47 2015 +0100
> 
> [AArch64] Improve comparison with complex immediates
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 126c9c2..1bfc870 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -369,7 +369,7 @@ (define_expand "mod3"
>}
>  )
>  
> -(define_insn "*condjump"
> +(define_insn "condjump"
>[(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>   [(match_operand 1 "cc_register" "") (const_int 0)])
>  (label_ref (match_operand 2 "" ""))
> @@ -394,6 +394,40 @@ (define_insn "*condjump"
> (const_int 1)))]
>  )
>  
> +;; For a 24-bit immediate CST we can optimize the compare for equality
> +;; and branch sequence from:
> +;; mov   x0, #imm1
> +;; movk  x0, #imm2, lsl 16 /* x0 contains CST.  */
> +;; cmp   x1, x0
> +;; b .Label

This would be easier on the eyes if you were to indent the code sequence.

+;; and branch sequence from:
+;;     mov     x0, #imm1
+;;     movk    x0, #imm2, lsl 16 /* x0 contains CST.  */
+;;     cmp     x1, x0
+;;     b       .Label
+;; into the shorter:
+;;     sub     x0, #(CST & 0xfff000)

> +;; into the shorter:
> +;; sub   x0, #(CST & 0xfff000)
> +;; subs  x0, #(CST & 0x000fff)

These instructions are not valid (2 operand sub/subs?) can you write them
out fully for this comment so I can see the data flow?
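
For reference, the arithmetic behind the shorter sequence can be modelled
outside the compiler. The Python sketch below is illustrative only and not
part of the patch: it assumes the three-operand forms "sub tmp, x, #hi"
followed by "subs tmp, tmp, #lo", models 64-bit registers, and uses made-up
helper names.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register


def split_imm24(cst):
    """Split a 24-bit constant into its upper and lower 12-bit parts."""
    assert 0 <= cst < (1 << 24)
    return cst & 0xFFF000, cst & 0x000FFF


def equal_via_two_subs(x, cst):
    """Model the two subtractions and return the resulting Z flag."""
    hi, lo = split_imm24(cst)
    tmp = (x - hi) & MASK64    # sub: flags untouched
    tmp = (tmp - lo) & MASK64  # subs: Z is set when the result is zero
    return tmp == 0
```

Since hi + lo == cst, the final result is zero exactly when x == cst, which is
why an equality test needs no separate cmp. Each half fits in 12 bits (the
upper one after shifting right by 12).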

> +;; b .Label
> +(define_insn_and_split "*compare_condjump"
> +  [(set (pc) (if_then_else (EQL
> +   (match_operand:GPI 0 "register_operand" "r")
> +   (match_operand:GPI 1 "aarch64_imm24" "n"))
> +(label_ref:P (match_operand 2 "" ""))
> +(pc)))]
> +  "!aarch64_move_imm (INTVAL (operands[1]), mode)
> +   && !aarch64_plus_operand (operands[1], mode)"
> +  "#"
> +  "&& true"
> +  [(const_int 0)]
> +  {
> +HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
> +HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
> +rtx tmp = gen_reg_rtx (mode);

Can you guarantee we can always create this pseudo? What if we're a
post-register-allocation split?

> +emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm)));
> +emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
> +rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
> +rtx cmp_rtx = gen_rtx_fmt_ee (, mode, cc_reg, const0_rtx);
> +emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
> +DONE;
> +  }
> +)
> +
>  (define_expand "casesi"
>[(match_operand:SI 0 "register_operand" ""); Index
> (match_operand:SI 1 "const_int_operand" "")   ; Lower bound
> @@ -2898,7 +2932,7 @@ (define_expand "cstore4"
>"
>  )
>  
> -(define_insn "*cstore_insn"
> +(define_insn "aarch64_cstore"
>[(set (match_operand:ALLI 0 "register_operand" "=r")
>   (match_operator:ALLI 1 "aarch64_comparison_operator"
>[(match_operand 2 "cc_register" "") (const_int 0)]))]
> @@ -2907,6 +2941,39 @@ (define_insn "*cstore_insn"
>[(set_attr "type" "csel")]
>  )
>  
> +;; For a 24-bit immediate CST we can optimize the compare for equality
> +;; and branch sequence from:
> +;; mov   x0, #imm1
> +;; movk  x0, #imm2, lsl 16 /* x0 contains CST.  */
> +;; cmp   x1, x0
> +;; cset  x2, 
> +;; into the shorter:
> +;; sub   x0, #(CST & 0xfff000)
> +;; subs  x0, #(CST & 0x000fff)
> +;; cset x1, .

Same comments as above regarding formatting and making this a valid set
of instructions.

> +(define_insn_and_split "*compare_cstore_insn"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +  (EQL:GPI (match_operand:GPI 1 "register_operand" "r")
> +   (match_operand:GPI 2 "aarch64_imm24" "n")))]
> +  "!aarch64_move_imm (INTVAL (operands[2]), mode)
> +   && !aarch64_plus_operand (operands[2], mode)"
> +  "#"
> +  "&& true"
> +  [(const_int 0)]
> +  {
> +HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
> +HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
> +rtx tmp = gen_reg_rtx (mode);
> +emit_insn (gen_add3 (

[PATCH, alpha]: Hookize some more macros

2015-11-12 Thread Uros Bizjak
2015-11-12  Uros Bizjak  

* config/alpha/alpha.h (FUNCTION_VALUE, LIBCALL_VALUE,
FUNCTION_VALUE_REGNO_P): Remove.
* config/alpha/alpha-protos.h (function_value): Remove.
* config/alpha/alpha.c (function_value): Rename to...
(alpha_function_value_1): ... this.  Make static.
(alpha_function_value, alpha_libcall_value,
alpha_function_value_regno_p): New functions.
(TARGET_FUNCTION_VALUE, TARGET_LIBCALL_VALUE,
TARGET_FUNCTION_VALUE_REGNO_P): Define.

2015-11-12  Uros Bizjak  

* config/alpha/alpha.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
* config/alpha/alpha.c (alpha_memory_latency): Make static.
(alpha_register_move_cost, alpha_memory_move_cost): New functions.
(TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define.

Bootstrapped and regression tested on alphaev68-linux-gnu, committed
to mainline SVN.

Uros.
Index: config/alpha/alpha-protos.h
===
--- config/alpha/alpha-protos.h (revision 230213)
+++ config/alpha/alpha-protos.h (working copy)
@@ -68,7 +68,6 @@
 extern void alpha_initialize_trampoline (rtx, rtx, rtx, int, int, int);
 
 extern rtx alpha_va_arg (tree, tree);
-extern rtx function_value (const_tree, const_tree, machine_mode);
 
 extern void alpha_start_function (FILE *, const char *, tree);
 extern void alpha_end_function (FILE *, const char *, tree);
Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c(revision 230213)
+++ config/alpha/alpha.c(working copy)
@@ -95,7 +95,7 @@
 
 /* The number of cycles of latency we should assume on memory reads.  */
 
-int alpha_memory_latency = 3;
+static int alpha_memory_latency = 3;
 
 /* Whether the function needs the GP.  */
 
@@ -1339,6 +1339,36 @@
   return NULL_RTX;
 }
 
+/* Return the cost of moving between registers of various classes.  Moving
+   between FLOAT_REGS and anything else except float regs is expensive.
+   In fact, we make it quite expensive because we really don't want to
+   do these moves unless it is clearly worth it.  Optimizations may
+   reduce the impact of not being able to allocate a pseudo to a
+   hard register.  */
+
+static int
+alpha_register_move_cost (machine_mode /*mode*/,
+ reg_class_t from, reg_class_t to)
+{
+  if ((from == FLOAT_REGS) == (to == FLOAT_REGS))
+return 2;
+
+  if (TARGET_FIX)
+return (from == FLOAT_REGS) ? 6 : 8;
+
+  return 4 + 2 * alpha_memory_latency;
+}
+
+/* Return the cost of moving data of MODE from a register to
+   or from memory.  On the Alpha, bump this up a bit.  */
+
+static int
+alpha_memory_move_cost (machine_mode /*mode*/, reg_class_t /*regclass*/,
+   bool /*in*/)
+{
+  return 2 * alpha_memory_latency;
+}
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
scanned.  In either case, *TOTAL contains the cost result.  */
@@ -5736,9 +5766,9 @@
On Alpha the value is found in $0 for integer functions and
$f0 for floating-point functions.  */
 
-rtx
-function_value (const_tree valtype, const_tree func ATTRIBUTE_UNUSED,
-   machine_mode mode)
+static rtx
+alpha_function_value_1 (const_tree valtype, const_tree func ATTRIBUTE_UNUSED,
+   machine_mode mode)
 {
   unsigned int regnum, dummy ATTRIBUTE_UNUSED;
   enum mode_class mclass;
@@ -5793,6 +5823,33 @@
   return gen_rtx_REG (mode, regnum);
 }
 
+/* Implement TARGET_FUNCTION_VALUE.  */
+
+static rtx
+alpha_function_value (const_tree valtype, const_tree fn_decl_or_type,
+ bool /*outgoing*/)
+{
+  return alpha_function_value_1 (valtype, fn_decl_or_type, VOIDmode);
+}
+
+/* Implement TARGET_LIBCALL_VALUE.  */
+
+static rtx
+alpha_libcall_value (machine_mode mode, const_rtx /*fun*/)
+{
+  return alpha_function_value_1 (NULL_TREE, NULL_TREE, mode);
+}
+
+/* Implement TARGET_FUNCTION_VALUE_REGNO_P.
+
+   On the Alpha, $0 $1 and $f0 $f1 are the only register thus used.  */
+
+static bool
+alpha_function_value_regno_p (const unsigned int regno)
+{
+  return (regno == 0 || regno == 1 || regno == 32 || regno == 33);
+}
+
 /* TCmode complex values are passed by invisible reference.  We
should not split these values.  */
 
@@ -9908,6 +9965,10 @@
 #undef TARGET_USE_BLOCKS_FOR_CONSTANT_P
 #define TARGET_USE_BLOCKS_FOR_CONSTANT_P hook_bool_mode_const_rtx_true
 
+#undef TARGET_REGISTER_MOVE_COST
+#define TARGET_REGISTER_MOVE_COST alpha_register_move_cost
+#undef TARGET_MEMORY_MOVE_COST
+#define TARGET_MEMORY_MOVE_COST alpha_memory_move_cost
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS alpha_rtx_costs
 #undef TARGET_ADDRESS_COST
@@ -9920,6 +9981,13 @@
 #define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promote
 #undef TARGET_PROMOTE_PROTOTYPES
 #define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_false
+
+#undef TARGET_

[visium] Remove obsolete prototypes

2015-11-12 Thread Eric Botcazou
Tested on visium-elf, applied on the mainline.


2015-11-12  Eric Botcazou  

* config/visium/visium-protos.h (notice_update_cc): Delete.
(print_operand): Likewise.
(print_operand_address): Likewise.

-- 
Eric Botcazou
Index: config/visium/visium-protos.h
===
--- config/visium/visium-protos.h	(revision 230204)
+++ config/visium/visium-protos.h	(working copy)
@@ -49,9 +49,6 @@ extern void visium_split_cbranch (enum r
 extern const char *output_ubranch (rtx, rtx_insn *);
 extern const char *output_cbranch (rtx, enum rtx_code, enum machine_mode, int,
    rtx_insn *);
-extern void notice_update_cc (rtx, rtx);
-extern void print_operand (FILE *, rtx, int);
-extern void print_operand_address (FILE *, rtx);
 extern void split_double_move (rtx *, enum machine_mode);
 extern void visium_expand_copysign (rtx *, enum machine_mode);
 extern void visium_expand_int_cstore (rtx *, enum machine_mode);


Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries  wrote:
> Hi,
>
> [ See also related discussion at
> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]
>
> this patch removes the usage of first_pass_instance from pass_vrp.
>
> the patch:
> - limits itself to pass_vrp, but my intention is to remove all
>   usage of first_pass_instance
> - lacks an update to gdbhooks.py
>
> Modifying the pass behaviour depending on the instance number, as
> first_pass_instance does, breaks compositionality of the pass list. In other
> words, adding a pass instance in a pass list may change the behaviour of
> another instance of that pass in the pass list. Which obviously makes it
> harder to understand and change the pass list. [ I've filed this issue as
> PR68247 - Remove pass_first_instance ]
>
> The solution is to make the difference in behaviour explicit in the pass
> list, and no longer change behaviour depending on instance number.
>
> One obvious possible fix is to create a duplicate pass with a different
> name, say 'pass_vrp_warn_array_bounds':
> ...
>   NEXT_PASS (pass_vrp_warn_array_bounds);
>   ...
>   NEXT_PASS (pass_vrp);
> ...
>
> But, AFAIU that requires us to choose a different dump-file name for each
> pass. And choosing vrp1 and vrp2 as new dump-file names still means that
> -fdump-tree-vrp no longer works (which was mentioned as drawback here:
> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).
>
> This patch instead makes pass creation parameterizable. So in the pass list,
> we use:
> ...
>   NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
>   ...
>   NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
> ...
>
> This approach gives us clarity in the pass list, similar to using a
> duplicate pass 'pass_vrp_warn_array_bounds'.
>
> But it also means -fdump-tree-vrp still works as before.
>
> Good idea? Other comments?

It's good to get rid of the first_pass_instance hack.

I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
we can just use NEXT_PASS with the extra argument being optional...

I don't see the need for giving clone_with_args a new name, just use an overload
of clone ()?  [ideally C++ would allow us to say that only one overload may be
implemented]

Thanks,
Richard.

> Thanks,
> - Tom
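
To make the clone () overload suggestion concrete, here is a rough sketch.
Everything in it is hypothetical (demo_pass_vrp, demo_pass_list and
demo_make_pass_list are invented names); GCC's real opt_pass class and the
NEXT_PASS machinery look different.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a pass object; GCC's real opt_pass differs.
class demo_pass_vrp
{
public:
    explicit demo_pass_vrp(bool warn_array_bounds_p = false)
        : warn_array_bounds_p_(warn_array_bounds_p) {}

    // Plain clone: keeps the current parameter.
    std::unique_ptr<demo_pass_vrp> clone() const
    {
        return std::unique_ptr<demo_pass_vrp>(
            new demo_pass_vrp(warn_array_bounds_p_));
    }

    // Overload taking the argument, instead of a separate clone_with_args.
    std::unique_ptr<demo_pass_vrp> clone(bool warn_array_bounds_p) const
    {
        return std::unique_ptr<demo_pass_vrp>(
            new demo_pass_vrp(warn_array_bounds_p));
    }

    // Every instance reports the same name, so one dump flag covers all.
    std::string name() const { return "vrp"; }
    bool warn_array_bounds_p() const { return warn_array_bounds_p_; }

private:
    bool warn_array_bounds_p_;
};

// Models the two parameterized entries in the pass list.
struct demo_pass_list
{
    std::unique_ptr<demo_pass_vrp> first;
    std::unique_ptr<demo_pass_vrp> second;
};

demo_pass_list demo_make_pass_list()
{
    demo_pass_vrp proto;
    demo_pass_list list{proto.clone(true), proto.clone(false)};
    assert(list.first->name() == list.second->name());  // shared dump name
    return list;
}
```

The behavioural difference is then visible at the point where the list is
built, and does not depend on how many instances precede a given entry.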


Re: State of support for the ISO C++ Transactional Memory TS and remanining work

2015-11-12 Thread Torvald Riegel
On Wed, 2015-11-11 at 15:04 +, Szabolcs Nagy wrote:
> On 10/11/15 18:29, Torvald Riegel wrote:
> > On Tue, 2015-11-10 at 17:26 +, Szabolcs Nagy wrote:
> >> On 09/11/15 00:19, Torvald Riegel wrote:
> >>> I've not yet created tests for the full list of functions specified as
> >>> transaction-safe in the TS, but my understanding is that this list was
> >>> created after someone from the ISO C++ TM study group looked at
> >>> libstdc++'s implementation and investigated which functions might be
> >>> feasible to be declared transaction-safe in it.
> >>>
> >>
> >> is that list available somewhere?
> >
> > See the TM TS, N4514.
> >
> 
> i was looking at an older version,
> things make more sense now.
> 
> i think system() should not be transaction safe..
> 
> i wonder what's the plan for getting libc functions
> instrumented (i assume that is needed unless hw
> support is used).

No specific plans so far.  We'll wait and see, I guess.  TM is still in
a chicken-and-egg situation.

> >> xmalloc
> >> the runtime exits on memory allocation failure,
> >> so it is not possible to use it safely.
> >> (i think it should be possible to roll back the
> >> transaction in case of internal allocation failure
> >> and retry with a strategy that does not need dynamic
> >> allocation).
> >
> > Not sure what you mean by "safely".  Hardening against out-of-memory
> > situations hasn't been considered to be of high priority so far, but I'd
> > accept patches for that that don't increase complexity significantly and
> > don't hamper performance.
> >
> 
> i consider this a library safety issue.
> 
> (a library or runtime is not safe to use if it may terminate
> the process in case of internal failures.)

If it is truly a purely internal failure, then aborting might be the
best thing one can do if there is no sensible way to try to recover from
the error (ie, take a fail-fast approach).
Out-of-memory errors are not purely internal failures.  I agree that it
would be nice to have a fallback, but for some features there simply is
none (eg, the program can't require rollback to be possible and yet not
provide enough memory for this to be achievable).  Given that these
transactions have to be used from C programs too, there's not much
libitm can do except perhaps call user-supplied handlers.
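
As a rough illustration of what "call user-supplied handlers" could mean,
here is a sketch.  All names in it (demo_oom_handler, demo_set_oom_handler,
demo_xmalloc) are invented for this example; libitm does not expose such an
interface today.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical names throughout; libitm does not expose this interface.
using demo_oom_handler = bool (*)(std::size_t requested);

static demo_oom_handler demo_handler = nullptr;

void demo_set_oom_handler(demo_oom_handler h) { demo_handler = h; }

// Allocate, giving a user-supplied handler one chance to free memory
// before falling back to fail-fast termination.
void *demo_xmalloc(std::size_t size)
{
    assert(size > 0);
    void *p = std::malloc(size);
    if (p == nullptr && demo_handler != nullptr && demo_handler(size))
        p = std::malloc(size);   // retry once after the handler ran
    if (p == nullptr)
        std::abort();            // nothing sensible left to try: fail fast
    return p;
}
```

The handler is a plain function pointer so that it stays usable from C
programs as well.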

> >> uint64_t GTM::gtm_spin_count_var = 1000;
> >> i guess this was supposed to be tunable.
> >> it seems libitm needs some knobs (strategy, retries,
> >> spin counts), but there is no easy way to adapt these
> >> for a target/runtime environment.
> >
> > Sure, more performance tuning knobs would be nice.
> >
> 
> my problem was with getting the knobs right at runtime.
> 
> (i think this will need a solution to make tm practically
> useful, there are settings that seem to be sensitive to
> the properties of the underlying hw.. this also seems
> to be a problem for glibc lock elision retry policies.)

Yes, that applies to many tuning settings in lots of places.  And
certainly to TM implementations too :)

> >> sys_futex0
> >> i'm not sure why this has arch specific implementations
> >> for some targets but not others. (syscall is not in the
> >> implementation reserved namespace).
> >
> > Are there archs that support libitm but don't have a definition of this
> > one?
> >
> 
> i thought all targets were supported on linux
> (the global lock based strategies should work)
> i can prepare a sys_futex0 for arm and aarch64.

arm and aarch64 should be supported according to configure.tgt.  Also
see the comment in config/linux/futex_bits.h if you want to change
something there.  I haven't tried arm at all so far.



Re: [PATCH, PR tree-optimization/PR68305] Support masked COND_EXPR in SLP

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 1:03 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch fixes the way an operand is chosen by its number for COND_EXPR.
> Bootstrapped and regtested on x86_64-unknown-linux-gnu.  OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-12  Ilya Enkovich  
>
> PR tree-optimization/68305
> * tree-vect-slp.c (vect_get_constant_vectors): Support
> COND_EXPR with SSA_NAME as a condition.
>
> gcc/testsuite/
>
> 2015-11-12  Ilya Enkovich  
>
> PR tree-optimization/68305
> * gcc.dg/vect/pr68305.c: New test.
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr68305.c b/gcc/testsuite/gcc.dg/vect/pr68305.c
> new file mode 100644
> index 000..fde3db7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr68305.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +/* { dg-additional-options "-mavx2" { target avx_runtime } } */
> +
> +int a, b;
> +
> +void
> +fn1 ()
> +{
> +  int c, d;
> +  for (; b; b++)
> +a = a ^ !c ^ !d;
> +}
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 9d97140..9402474 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -2738,18 +2738,20 @@ vect_get_constant_vectors (tree op, slp_tree slp_node,
>   switch (code)
> {
>   case COND_EXPR:
> -   if (op_num == 0 || op_num == 1)
> - {
> -   tree cond = gimple_assign_rhs1 (stmt);
> +   {
> + tree cond = gimple_assign_rhs1 (stmt);
> + if (TREE_CODE (cond) == SSA_NAME)
> +   op = gimple_op (stmt, op_num + 1);
> + else if (op_num == 0 || op_num == 1)
> op = TREE_OPERAND (cond, op_num);
> - }
> -   else
> - {
> -   if (op_num == 2)
> - op = gimple_assign_rhs2 (stmt);
> -   else
> - op = gimple_assign_rhs3 (stmt);
> - }
> + else
> +   {
> + if (op_num == 2)
> +   op = gimple_assign_rhs2 (stmt);
> + else
> +   op = gimple_assign_rhs3 (stmt);
> +   }
> +   }
> break;
>
>   case CALL_EXPR:
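
The operand selection in the patched COND_EXPR case can be modelled outside
the vectorizer.  This is only a sketch with invented types (demo_cond_stmt,
demo_pick_operand); it assumes the layout visible in the hunk above, where
gimple_op (stmt, 0) is the lhs and the condition is either an SSA name or a
comparison with two operands of its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a COND_EXPR assignment: ops[0] is the lhs,
// ops[1] the condition, ops[2] and ops[3] the two value operands.
struct demo_cond_stmt
{
    std::vector<std::string> ops;        // lhs, cond, then, else
    std::vector<std::string> cond_ops;   // empty when cond is an SSA name
};

// Mirrors the operand selection in the patched COND_EXPR case.
std::string demo_pick_operand(const demo_cond_stmt &s, unsigned op_num)
{
    assert(s.ops.size() == 4);
    if (s.cond_ops.empty())              // condition is a plain SSA name
        return s.ops.at(op_num + 1);
    if (op_num == 0 || op_num == 1)      // operands of the comparison
        return s.cond_ops.at(op_num);
    return op_num == 2 ? s.ops[2] : s.ops[3];
}
```

With an SSA name condition the statement has three operands to choose from
(op_num 0..2); with an embedded comparison it has four (op_num 0..3).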


[PATCH] Fix PR68306

2015-11-12 Thread Richard Biener

The following fixes PR68306, an ordering issue with my last BB
vectorization patch.  Fixed by removing that ordering requirement.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2015-11-12  Richard Biener  

PR tree-optimization/68306
* tree-vect-data-refs.c (verify_data_ref_alignment): Remove
relevant and vectorizable checks here.
(vect_verify_datarefs_alignment): Add relevant check here.

* gcc.dg/pr68306.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 230216)
--- gcc/tree-vect-data-refs.c   (working copy)
*** verify_data_ref_alignment (data_referenc
*** 909,922 
gimple *stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
  
!   if (!STMT_VINFO_RELEVANT_P (stmt_info))
! return true;
! 
!   /* For interleaving, only the alignment of the first access matters. 
!  Skip statements marked as not vectorizable.  */
!   if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
!&& GROUP_FIRST_ELEMENT (stmt_info) != stmt)
!   || !STMT_VINFO_VECTORIZABLE (stmt_info))
  return true;
  
/* Strided accesses perform only component accesses, alignment is
--- 889,897 
gimple *stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
  
!   /* For interleaving, only the alignment of the first access matters.   */
!   if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
!   && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
  return true;
  
/* Strided accesses perform only component accesses, alignment is
*** vect_verify_datarefs_alignment (loop_vec
*** 965,972 
unsigned int i;
  
FOR_EACH_VEC_ELT (datarefs, i, dr)
! if (! verify_data_ref_alignment (dr))
!   return false;
  
return true;
  }
--- 940,954 
unsigned int i;
  
FOR_EACH_VEC_ELT (datarefs, i, dr)
! {
!   gimple *stmt = DR_STMT (dr);
!   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
! 
!   if (!STMT_VINFO_RELEVANT_P (stmt_info))
!   continue;
!   if (! verify_data_ref_alignment (dr))
!   return false;
! }
  
return true;
  }
Index: gcc/testsuite/gcc.dg/pr68306.c
===
*** gcc/testsuite/gcc.dg/pr68306.c  (revision 0)
--- gcc/testsuite/gcc.dg/pr68306.c  (working copy)
***
*** 0 
--- 1,10 
+ /* { dg-do compile } */
+ /* { dg-options "-O3" } */
+ 
+ enum powerpc_pmc_type { PPC_PMC_IBM };
+ struct {
+ unsigned num_pmcs;
+ enum powerpc_pmc_type pmc_type;
+ } a;
+ enum powerpc_pmc_type b;
+ void fn1() { a.num_pmcs = a.pmc_type = b; }


RE: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-12 Thread Robert Suchanek
Hi Christophe,

> >
> Hi,
> I confirm that this fixes the build errors I was seeing.
> Thanks.
> 

Thanks for checking this.

I'm still seeing a number of ICEs on the gcc-testresults mailing list
across various ports but these are likely to be caused by another patch.
They are already reported as PR68293 and PR68296.

Regards,
Robert


Re: [PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins

2015-11-12 Thread Charles Baylis
On 9 November 2015 at 09:03, Ramana Radhakrishnan
 wrote:
>
> Missing comment and please prefix this with NEON_ or SIMD_ .
>
>>
>> +#define ENDIAN_LANE_N(mode, n)  \
>> +  (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
>> +
>
> Otherwise OK -

With those changes, the attached patch was applied as r230142
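
For readers unfamiliar with the lane remapping, here is a function-style
sketch of what the macro computes.  The names demo_endian_lane and
demo_checked_lane are invented, and the number of lanes is passed in
directly instead of going through GET_MODE_NUNITS.

```cpp
#include <cassert>

// Hypothetical function form of the lane remapping: on big-endian the
// lane order is reversed, on little-endian it is left alone.
constexpr unsigned demo_endian_lane(unsigned nunits, unsigned n,
                                    bool big_endian)
{
    return big_endian ? nunits - 1 - n : n;
}

// Bounds-checked wrapper, loosely analogous to checking the lane
// number before remapping it.
unsigned demo_checked_lane(unsigned nunits, unsigned n, bool big_endian)
{
    assert(n < nunits);
    return demo_endian_lane(nunits, n, big_endian);
}
```

A function form also sidesteps the usual macro caveat that an argument such
as "a + b" would need to be parenthesized at the point of use.
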
From 4a05b67a1757e88e1989ce7515cd10de0a6def1c Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Wed, 17 Jun 2015 17:08:30 +0100
Subject: [PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins

gcc/ChangeLog:

  Charles Baylis  

	PR target/63870
	* config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerator
	qualifier_struct_load_store_lane_index.
	(builtin_arg): New enumerator NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
	(arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON
	argument qualifiers.
	(arm_expand_neon_builtin): Handle new NEON argument qualifier.
	* config/arm/arm.h (NEON_ENDIAN_LANE_N): New macro.

Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647
---
 gcc/config/arm/arm-builtins.c | 48 +++
 gcc/config/arm/arm.c  |  1 +
 gcc/config/arm/arm.h  |  6 ++
 3 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index bad3dc3..d0bd777 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -67,7 +67,9 @@ enum arm_type_qualifiers
   /* Polynomial types.  */
   qualifier_poly = 0x100,
   /* Lane indices - must be within range of previous argument = a vector.  */
-  qualifier_lane_index = 0x200
+  qualifier_lane_index = 0x200,
+  /* Lane indices for single lane structure loads and stores.  */
+  qualifier_struct_load_store_lane_index = 0x400
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -1963,6 +1965,7 @@ typedef enum {
   NEON_ARG_COPY_TO_REG,
   NEON_ARG_CONSTANT,
   NEON_ARG_LANE_INDEX,
+  NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX,
   NEON_ARG_MEMORY,
   NEON_ARG_STOP
 } builtin_arg;
@@ -2020,9 +2023,9 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
 /* Expand a Neon builtin.  */
 static rtx
 arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
-		  int icode, int have_retval, tree exp, ...)
+		  int icode, int have_retval, tree exp,
+		  builtin_arg *args)
 {
-  va_list ap;
   rtx pat;
   tree arg[SIMD_MAX_BUILTIN_ARGS];
   rtx op[SIMD_MAX_BUILTIN_ARGS];
@@ -2037,13 +2040,11 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 	  || !(*insn_data[icode].operand[0].predicate) (target, tmode)))
 target = gen_reg_rtx (tmode);
 
-  va_start (ap, exp);
-
   formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode]));
 
   for (;;)
 {
-  builtin_arg thisarg = (builtin_arg) va_arg (ap, int);
+  builtin_arg thisarg = args[argc];
 
   if (thisarg == NEON_ARG_STOP)
 	break;
@@ -2079,6 +2080,18 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 		op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
 	  break;
 
+	case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX:
+	  gcc_assert (argc > 1);
+	  if (CONST_INT_P (op[argc]))
+		{
+		  neon_lane_bounds (op[argc], 0,
+GET_MODE_NUNITS (map_mode), exp);
+		  /* Keep to GCC-vector-extension lane indices in the RTL.  */
+		  op[argc] =
+		GEN_INT (NEON_ENDIAN_LANE_N (map_mode, INTVAL (op[argc])));
+		}
+	  goto constant_arg;
+
 	case NEON_ARG_LANE_INDEX:
 	  /* Previous argument must be a vector, which this indexes.  */
 	  gcc_assert (argc > 0);
@@ -2089,19 +2102,22 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 		}
 	  /* Fall through - if the lane index isn't a constant then
 		 the next case will error.  */
+
 	case NEON_ARG_CONSTANT:
+constant_arg:
 	  if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
-		error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, "
-		   "expected %", argc + 1);
+		{
+		  error ("%Kargument %d must be a constant immediate",
+			 exp, argc + 1);
+		  return const0_rtx;
+		}
 	  break;
+
 case NEON_ARG_MEMORY:
 	  /* Check if expand failed.  */
 	  if (op[argc] == const0_rtx)
-	  {
-		va_end (ap);
 		return 0;
-	  }
 	  gcc_assert (MEM_P (op[argc]));
 	  PUT_MODE (op[argc], mode[argc]);
 	  /* ??? arm_neon.h uses the same built-in functions for signed
@@ -2122,8 +2138,6 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 	}
 }
 
-  va_end (ap);
-
   if (have_retval)
 switch (argc)
   {
@@ -2235,6 +2249,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 
   if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
 	args[k] = NEON_ARG_LANE_INDEX;
+  else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
+	args[k] = NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX;
   else if (d->qu

Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-12 Thread Charles Baylis
On 9 November 2015 at 13:35, Ramana Radhakrishnan
 wrote:
>
>
> On 08/11/15 00:26, charles.bay...@linaro.org wrote:
>> From: Charles Baylis 
>>
>>   Charles Baylis  
>>
>>   * config/arm/neon.md (neon_vld1_lane): Remove error for invalid
>>   lane number.
>>   (neon_vst1_lane): Likewise.
>>   (neon_vld2_lane): Likewise.
>>   (neon_vst2_lane): Likewise.
>>   (neon_vld3_lane): Likewise.
>>   (neon_vst3_lane): Likewise.
>>   (neon_vld4_lane): Likewise.
>>   (neon_vst4_lane): Likewise.
>>
>
> The only way we can get here is through the intrinsics - we do a check for 
> lane numbers earlier.
>
> If things go horribly wrong - the assembler will complain, so it's ok to 
> elide this internal_error here, thus OK.

Applied as r230144


Re: [hsa 2/12] Modifications to libgomp proper

2015-11-12 Thread Thomas Schwinge
Hi!

On Thu, 12 Nov 2015 11:11:33 +0100, Jakub Jelinek  wrote:
> On Thu, Nov 05, 2015 at 10:54:42PM +0100, Martin Jambor wrote:
> > --- a/libgomp/libgomp.h
> > +++ b/libgomp/libgomp.h
> > @@ -876,7 +876,8 @@ struct gomp_device_descr
> >void *(*dev2host_func) (int, void *, const void *, size_t);
> >void *(*host2dev_func) (int, void *, const void *, size_t);
> >void *(*dev2dev_func) (int, void *, const void *, size_t);
> > -  void (*run_func) (int, void *, void *);
> > +  void (*run_func) (int, void *, void *, const void *);
> 
> Adding arguments to existing plugin methods is a plugin ABI incompatible
> change.  We now have:
>   DLSYM (version);
>   if (device->version_func () != GOMP_VERSION)
> {
>   err = "plugin version mismatch";
>   goto fail;
> }
> so there is a way to deal with it, but you need to adjust all plugins.

I'm confused -- didn't we agree that we don't need to maintain backwards
compatibility in the libgomp <-> plugins interface?  (Nathan?)  As far as
I remember, the argument was that libgomp and all its plugins will always
be built from the same source tree, so will be compatible with each
other, "by definition"?

(We do need, and have, versioning between GCC proper and libgomp
interfaces.)
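
For context, the version check Jakub quotes boils down to something like the
following sketch.  The names here (DEMO_GOMP_VERSION, demo_plugin,
demo_check_plugin) are invented; the real code uses DLSYM, GOMP_VERSION and
struct gomp_device_descr.

```cpp
#include <cassert>
#include <string>

// Hypothetical constants and types; not libgomp's real definitions.
constexpr unsigned DEMO_GOMP_VERSION = 1;

struct demo_plugin
{
    unsigned (*version_func)();
};

// Two sample plugins: one built from the same tree, one stale.
unsigned demo_current_version() { return DEMO_GOMP_VERSION; }
unsigned demo_stale_version() { return DEMO_GOMP_VERSION - 1; }

// Returns an empty string on success, or an error message on mismatch,
// mirroring the "plugin version mismatch" check quoted above.
std::string demo_check_plugin(const demo_plugin &p)
{
    assert(p.version_func != nullptr);
    if (p.version_func() != DEMO_GOMP_VERSION)
        return "plugin version mismatch";
    return "";
}
```

If libgomp and its plugins are always built from the same tree, the mismatch
branch should never be taken in practice.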


> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -1248,7 +1248,12 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
> >splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
> >gomp_mutex_unlock (&devicep->lock);
> >if (tgt_fn == NULL)
> > -   gomp_fatal ("Target function wasn't mapped");
> > +   {
> > + if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> > +   return NULL;
> > + else
> > +   gomp_fatal ("Target function wasn't mapped");
> > +   }
> >  
> >return (void *) tgt_fn->tgt_offset;
> >  }
> > @@ -1276,6 +1281,7 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> >  return gomp_target_fallback (fn, hostaddrs);
> >  
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  assert (fn_addr);
> 
> I must say I really don't like putting asserts into libgomp, in production
> it is after all not built with -D_NDEBUG.

I like them, because they help during development, and for getting
higher-quality bug reports from users, and they serve as source code
documentation.  Of course, I understand your -- I suppose -- performance
worries.  Does such a NULL-checking assert -- hopefully marked as
"unlikely" -- cause any noticeable overhead, though?


> But this shows a worse problem,
> if you have GCC 5 compiled OpenMP code, of course there won't be HSA
> offloaded copy, but if you try to run it on a box with HSA offloading
> enabled, you can run into this assertion failure.

That's one of the issues that I'm working on resolving with my
"Forwarding -foffload=[...] from the driver (compile-time) to libgomp
(run-time)" patch,
.
In such a case (no GOMP_offload_register_ver call for HSA), HSA
offloading would not be considered (not "enabled") in libgomp.  (It'll be
two more weeks before I can make progress with that patch; will be
attending SuperComputing 2015 next week -- anyone else will be there,
too?)

> Supposedly the old APIs (GOMP_target, GOMP_target_update, GOMP_target_data)
> should treat GOMP_OFFLOAD_CAP_SHARED_MEM capable devices as unconditional
> device fallback?


> > @@ -1297,7 +1304,7 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> >  void
> >  GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
> > void **hostaddrs, size_t *sizes, unsigned short *kinds,
> > -   unsigned int flags, void **depend)
> > +   unsigned int flags, void **depend, const void *kernel_launch)
> 
> GOMP_target_ext has different arguments, you get the num_teams and
> thread_limit clauses values in there already (if known at compile time or
> before entering target region; 0 stands for implementation defined choice,
> -1 for unknown before GOMP_target_ext).
> Plus I must say I really don't like the addition of HSA specific argument
> to the API, it is unclean and really doesn't scale, when somebody adds
> support for another offloading target, would we add again another argument?
> Can't use the same one, because one could have configured both HSA and that
> other kind offloading at the same time and which one is picked would be only
> a runtime decision, based on env vars or omp_set_default_device etc.
> num_teams/thread_limit, as runtime arguments, you already get on the trunk.
> For compile time decided values, those should go into some data section
> and be somehow attached to what fn is translated into in the AVL tree (which
> you really don't need to use for variables on GOMP_OFFLOAD_CAP_SHARED_MEM
> obviously, but can still use for the kernels, and populate during
> registration of the offloading regi

[Ada] Spurious visibility error with derivation and incomplete declaration

2015-11-12 Thread Arnaud Charlet
This patch fixes a spurious visibility error on an operator of a derived type,
when the parent type is declared in another unit, and has an incomplete type
declaration. The primitive operations of the derived types are to be found in
the scope of its base type, and not in that of its ancestor.

The following must compile quietly:

   gnatmake -q operator_use

---
with CALC_PACKAGE;
with STORE_PACKAGE;
with ADA.TEXT_IO;
use type CALC_PACKAGE.RECORD_TYPE;
procedure OPERATOR_USE is

  B : CALC_PACKAGE.RECORD_TYPE;

begin
  B := CALC_PACKAGE.GET_VALUE;

  if STORE_PACKAGE.STORE(4).MY_ACCESS.all.MY_VALUE > B
  then
ADA.TEXT_IO.PUT_LINE("TRUE");
  end if;

  if CALC_PACKAGE.">"(STORE_PACKAGE.STORE(4).MY_ACCESS.all.MY_VALUE, B)
  then
ADA.TEXT_IO.PUT_LINE("TRUE again");
  end if;

end OPERATOR_USE;
---
with TYPE_PACKAGE;
package CALC_PACKAGE is

  type RECORD_TYPE is new TYPE_PACKAGE.BASE_TYPE;

  function ">"
(A : in RECORD_TYPE;
 B : in RECORD_TYPE)
return BOOLEAN;

  function GET_VALUE
return RECORD_TYPE;

end CALC_PACKAGE;
---
package body CALC_PACKAGE is
  C_VAL : INTEGER := 0;

  function GET_VALUE
return RECORD_TYPE is
  begin
C_VAL := C_VAL + 1;
return (X => C_VAL,
Y => 0);
  end GET_VALUE;

  function ">"
(A : in RECORD_TYPE;
 B : in RECORD_TYPE)
 return BOOLEAN is
  begin
return A.X > B.Y;
  end ">";
end CALC_PACKAGE;
---
with CALC_PACKAGE;
package STORE_PACKAGE is

  type INDEX_TYPE is range 1 .. 10;

  type RECORD_TYPE is
record
  MY_VALUE : CALC_PACKAGE.RECORD_TYPE;
end record;

  type RECORD_ACCESS_TYPE is access all RECORD_TYPE;

  type STORE_TYPE is
record
  MY_ACCESS : RECORD_ACCESS_TYPE;
end record;

  type ARRAY_TYPE is
array (INDEX_TYPE)
   of STORE_TYPE;

  STORE : ARRAY_TYPE := (others => (my_access => new Record_Type));
end STORE_PACKAGE;
---
package TYPE_PACKAGE is
   type BASE_TYPE;

  type BASE_TYPE is
record
  X : INTEGER := 1;
  Y : INTEGER := 0;
end record;
end TYPE_PACKAGE;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Ed Schonberg  

* sem_util.adb (Collect_Primitive_Operations): If the type is
derived from a type declared elsewhere that has an incomplete
type declaration, the primitives are found in the scope of the
type, not that of its ancestor.

Index: sem_util.adb
===
--- sem_util.adb(revision 230239)
+++ sem_util.adb(working copy)
@@ -4223,6 +4223,14 @@
  then
 Id := Defining_Entity (Incomplete_View (Parent (B_Type)));
 
+--  If T is a derived from a type with an incomplete view declared
+--  elsewhere, that incomplete view is irrelevant, we want the
+--  operations in the scope of T.
+
+if Scope (Id) /= Scope (B_Type) then
+   Id := Next_Entity (B_Type);
+end if;
+
  else
 Id := Next_Entity (B_Type);
  end if;


[Ada] Library-level error on aspects

2015-11-12 Thread Arnaud Charlet
This patch fixes a bug where GNAT fails to detect an error on an aspect that
must be applied to a library-level entity.

The following test must give an error:
tls.adb:2:26: entity for aspect "Thread_Local_Storage" must be library level
entity

procedure Tls is
   V : Natural := 0 with Thread_Local_Storage;
begin
   null;
end Tls;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Bob Duff  

* sem_prag.adb (Check_Arg_Is_Library_Level_Local_Name): A
pragma that comes from an aspect does not "come from source",
so we need to test whether it comes from an aspect.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 230242)
+++ sem_prag.adb(working copy)
@@ -4328,8 +4328,12 @@
   begin
  Check_Arg_Is_Local_Name (Arg);
 
+ --  If it came from an aspect, we want to give the error just as if it
+ --  came from source.
+
  if not Is_Library_Level_Entity (Entity (Get_Pragma_Arg (Arg)))
-   and then Comes_From_Source (N)
+   and then (Comes_From_Source (N)
+   or else Present (Corresponding_Aspect (Parent (Arg))))
  then
 Error_Pragma_Arg
   ("argument for pragma% must be library level entity", Arg);


Re: [OpenACC] declare directive

2015-11-12 Thread James Norris

Jakub

On 11/12/2015 03:09 AM, Jakub Jelinek wrote:

On Wed, Nov 11, 2015 at 07:07:58PM -0600, James Norris wrote:

+ oacc_declare_returns->remove (t);
+
+ if (oacc_declare_returns->elements () == 0)
+   {
+ delete oacc_declare_returns;
+ oacc_declare_returns = NULL;
+   }


Something for incremental patch:
1) might be nice to have some assertion that at the end of gimplify_body
or so oacc_declare_returns is NULL
2) what happens if you refer to automatic variables of other functions
(C or Fortran nested functions, maybe C++ lambdas); shall those be
unmapped at the end of the (nested) function's body?



Ok. Thanks! Will put on my TODO list.


@@ -5858,6 +5910,10 @@ omp_default_clause (struct gimplify_omp_ctx *ctx, tree decl,
flags |= GOVD_FIRSTPRIVATE;
break;
  case OMP_CLAUSE_DEFAULT_UNSPECIFIED:
+  if (is_global_var (decl)
+ && ctx->region_type & (ORT_ACC_PARALLEL | ORT_ACC_KERNELS)


Please put this condition as cheapest first.  I'd also surround
it into (), just to make it clear that the bitwise & is intentional.
Perhaps () != 0.
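
A small sketch of the requested shape, for illustration only.  The enum
values and function names are invented, and it assumes the bitmask test is
the cheap one; the real condition uses is_global_var, device_resident_p and
GCC's own ORT_* values.

```cpp
#include <cassert>

// Hypothetical flag values; GCC's real ORT_* enumerators differ.
enum demo_region_type : unsigned
{
    DEMO_ORT_ACC_PARALLEL = 1u << 0,
    DEMO_ORT_ACC_KERNELS  = 1u << 1,
    DEMO_ORT_OTHER        = 1u << 2
};

// Counts calls so the short-circuit behaviour is observable.
static int demo_expensive_calls = 0;
bool demo_expensive_check() { ++demo_expensive_calls; return true; }

bool demo_maps_to_only(unsigned region_type)
{
    // Only the three known flag bits may be set.
    assert((region_type & ~(DEMO_ORT_ACC_PARALLEL | DEMO_ORT_ACC_KERNELS
                            | DEMO_ORT_OTHER)) == 0);
    // Cheapest test first, with the bitwise & parenthesized and compared
    // against 0 so the intent is explicit.
    return (region_type & (DEMO_ORT_ACC_PARALLEL | DEMO_ORT_ACC_KERNELS)) != 0
           && demo_expensive_check();
}
```

Because && short-circuits, the second check is never reached for regions
that fail the mask test.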


+ && device_resident_p (decl))
+   flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;
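For illustration only, here is a minimal standalone sketch of the two requested changes: the bitmask test written as "(x & mask) != 0", and the conditions ordered with the presumed cheapest test first. All names below (ORT_DEMO_*, demo_decl, wants_map_to_only) are hypothetical stand-ins and not the gimplify.c definitions, and which test is actually cheapest in the real code is an assumption here.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the region-type bits; the real values live in
// gimplify.c and are not reproduced here.
enum demo_region_type : std::uint32_t
{
  ORT_DEMO_WORKSHARE = 1u << 0,
  ORT_DEMO_ACC_PARALLEL = 1u << 1,
  ORT_DEMO_ACC_KERNELS = 1u << 2
};

// Hypothetical miniature of a declaration.
struct demo_decl
{
  bool is_global;        // stands in for is_global_var (decl)
  bool device_resident;  // stands in for device_resident_p (decl)
};

// Assumed ordering: the bitmask test comes first because it is a single AND
// on a value already at hand.  The parentheses and "!= 0" make it obvious
// that the bitwise & is intentional and not a mistyped &&.
constexpr bool
wants_map_to_only (std::uint32_t region, const demo_decl &decl)
{
  return (region & (ORT_DEMO_ACC_PARALLEL | ORT_DEMO_ACC_KERNELS)) != 0
	 && decl.is_global
	 && decl.device_resident;
}
```

With these stand-ins, a global device-resident decl in a kernels region qualifies, while the same decl in a non-OpenACC region does not.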



+ case GOMP_MAP_FROM:
+   kinds[i] = GOMP_MAP_FORCE_FROM;
+   GOACC_enter_exit_data (device, 1, &hostaddrs[i], &sizes[i],
+  &kinds[i], 0, 0);


Wrong indentation.



Fixed.


Ok with those two changes and please think about the incremental stuff.


Again, thanks for taking the time for the review.

Jim



Re: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-11-12 Thread Jonathan Wakely

On 12/11/15 11:40 +, Jonathan Wakely wrote:

On 18/09/15 12:01 -0400, Jennifer Yao wrote:

Forgot to include the patch.

On Fri, Sep 18, 2015 at 11:17 AM, Jennifer Yao
 wrote:

A number of functions in libstdc++ are guarded by the _GLIBCXX_USE_C99
preprocessor macro, which is only defined on systems that pass all of
the checks for a large set of C99 functions. Consequently, on systems
which lack any of the required C99 facilities (e.g. Cygwin, which
lacks some C99 complex math functions), the numeric conversion
functions (std::stoi(), std::stol(), std::to_string(), etc.) are not
defined—a rather silly outcome, as none of the numeric conversion
functions are implemented using C99 math functions.

This patch enables numeric conversion functions on the aforementioned
systems by splitting the checks for C99 support and defining several
new macros (_GLIBCXX_USE_C99_STDIO, _GLIBCXX_USE_C99_STDLIB, and
_GLIBCXX_USE_C99_WCHAR), which replace the use of _GLIBCXX_USE_C99 in
#if conditionals where appropriate.


(Coming back to this now that Jennifer's copyright assignment is
complete...)

Splitting the _GLIBCXX_USE_C99 macro into more fine-grained macros for
separate features is definitely the right direction.

However your patch also changes the configure tests to use -std=c++0x
(which should be -std=c++11, but that's a minor point). On an OS that
only makes the C99 library available conditionally that will mean that
configure determines that C99 library features are supported, but we
will get errors if we try to use those features in C++03 parts of the
library.

I think a more complete solution is to have two sets of configure
tests and two sets of macros, so that we define _GLIBCXX_USE_C99_STDIO
when C99 stdio is available unconditionally, and define
_GLIBCXX11_USE_C99_STDIO when it's available with -std=c++11.

Then in the library code we can check _GLIBCXX_USE_C99_STDIO if we
want to use C99 features in C++03 code, and check
_GLIBCXX11_USE_C99_STDIO if we want to use the features in C++11 code.

That should still solve the problem for the numeric conversion
functions, because they are defined in C++11 and so would check
_GLIBCXX11_USE_C99_STDIO, which will be defined for newlib.

Other pieces of the library, such as locales, will use
_GLIBCXX_USE_C99_STDIO and that might still be false for newlib (and
for other strict C libraries like the Solaris and FreeBSD libc).

I will make the changes to acinclude.m4 to duplicate the tests, so we
test once with -std=c++98 and once with -std=c++11, and then change
the library to check either _GLIBCXX_xxx or _GLIBCXX11_xxx as
appropriate.


Here's a patch implementing my suggestion.

The major changes since Jennifer's original patch are in acinclude.m4,
to do the autoconf tests once with -std=c++98 and again with
-std=c++11, and in include/bits/c++config to define the
_GLIBCXX_USE_C99_XXX macros according to either _GLIBCXX98_USE_C99_XXX
or _GLIBCXX11_USE_C99_XXX, depending on the standard mode in effect
when the file is included.
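
As a rough sketch of that selection (not the actual contents of bits/c++config), the pattern could look like the following. The DEMO* macro names and their values are hypothetical; in the library the per-standard results would come from the configure tests.

```cpp
// Hypothetical results that configure might have recorded; hand-written
// here only so that the sketch is self-contained.
#define DEMO98_USE_C99_STDIO 0   /* assumed: C99 stdio hidden in C++98 mode */
#define DEMO11_USE_C99_STDIO 1   /* assumed: C99 stdio visible in C++11 mode */

// The generic macro is always defined, to 0 or 1, so code that uses it has
// to test the value with #if rather than test for existence with #ifdef.
#if __cplusplus >= 201103L
# define DEMO_USE_C99_STDIO DEMO11_USE_C99_STDIO
#else
# define DEMO_USE_C99_STDIO DEMO98_USE_C99_STDIO
#endif

#if DEMO_USE_C99_STDIO
static const bool demo_have_c99_stdio = true;
#else
static const bool demo_have_c99_stdio = false;
#endif
```

Because the value depends on the mode in which the header is included, the same inline function can see different values in C++98 and C++11 translation units.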

Because those new definitions in bits/c++config are unconditional I
had to adjust a few #ifdef tests to use #if instead.

I also removed the changes to GLIBCXX_CHECK_C99_TR1, so that there are
no changes to the macros used for the TR1 library. As a follow-up
change I will add a test for  to GLIBCXX_ENABLE_C99 and
change several C++ headers to stop using the TR1 macros.

This passes all tests on powerpc64le-linux, I'll also try to test on
DragonFly and FreeBSD.

Does this look good to everyone?

One downside of this change is that we introduce some (hopefully safe)
ODR violations, where inline functions and templates that depend on
_GLIBCXX_USE_C99_FOO might now be defined differently in C++98 and
C++11 code. Previously they had the same definition, even though in
C++11 mode the value of the _GLIBCXX_USE_C99_FOO macro might have been
sub-optimal (i.e. the C99 features were usable, but the macro said
they weren't). Those ODR violations could be avoided if needed, by
always using the _GLIBCXX98_USE_C99_FOO macro in code that can be
included from either C++98 or C++11. We could still use the
_GLIBCXX11_USE_C99_FOO macro in pure C++11 code (such as the numeric
conversion functions) and get most of the benefit of this change.

commit 1459e1d0a0033a8f2605d33f52e2bf789fb6ab33
Author: Jonathan Wakely 
Date:   Thu Nov 12 13:19:38 2015 +

More fine-grained autoconf checks for C99 library

2015-09-18  Jennifer Yao  
	Jonathan Wakely  

	PR libstdc++/58393
	PR libstdc++/61580
	* acinclude.m4 (GLIBCXX_ENABLE_C99): Perform tests twice, with
	-std=c++11 as well as -std=c++98, and define separate macros for each.
	Cache the results of checking for complex math and wide character
	functions. Reformat for readability.
	* config.h.in: Regenerate.
	* include/bits/c++config: Define _GLIBCXX_USE_C99_XXX macros to
	either _GLIBCXX98_USE_C99_XXX or _GLIBCXX11_USE_C99_XXX according to
	language standa

Re: [v3 PATCH] Implement D0013R2, logical type traits.

2015-11-12 Thread Jonathan Wakely

On 12/11/15 00:46 +0200, Ville Voutilainen wrote:

On 12 November 2015 at 00:18, Jonathan Wakely  wrote:

So I think we want to define them again, independently, in
, even though it might lead to ambiguities


Here. Tested again on Linux-PPC64.

2015-11-11  Ville Voutilainen  

   Implement D0013R2, logical type traits.

   /libstdc++-v3
   * include/experimental/type_traits (conjunction, disjunction,
   negation, conjunction_v, disjunction_v, negation_v): New.
   * include/std/type_traits (conjunction, disjunction, negation):
   Likewise.
   * testsuite/20_util/declval/requirements/1_neg.cc: Adjust.
   * testsuite/20_util/make_signed/requirements/typedefs_neg.cc: Likewise.
   * testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
   Likewise.
   * testsuite/experimental/type_traits/value.cc: Likewise.
   * testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc:
New.
   * testsuite/20_util/logical_traits/requirements/typedefs.cc: Likewise.
   * testsuite/20_util/logical_traits/value.cc: Likewise.

   /testsuite
   * g++.dg/cpp0x/Wattributes1.C: Adjust.


OK for trunk, thanks.




Re: open acc default data attribute

2015-11-12 Thread Nathan Sidwell

On 11/12/15 03:53, Jakub Jelinek wrote:


+  error ("%qE not specified in enclosing OpenACC %s construct",
+DECL_NAME (lang_hooks.decls.omp_report_decl (decl)), rkind);
+  error_at (ctx->location, "enclosing OpenACC %s construct", rkind);

I'd use %qs instead of %s.


thanks,

nathan



Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread Tom de Vries

On 12/11/15 13:26, Richard Biener wrote:

On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries  wrote:

Hi,

[ See also related discussion at
https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]

this patch removes the usage of first_pass_instance from pass_vrp.

the patch:
- limits itself to pass_vrp, but my intention is to remove all
   usage of first_pass_instance
- lacks an update to gdbhooks.py

Modifying the pass behaviour depending on the instance number, as
first_pass_instance does, breaks compositionality of the pass list. In other
words, adding a pass instance in a pass list may change the behaviour of
another instance of that pass in the pass list. Which obviously makes it
harder to understand and change the pass list. [ I've filed this issue as
PR68247 - Remove pass_first_instance ]

The solution is to make the difference in behaviour explicit in the pass
list, and no longer change behaviour depending on instance number.

One obvious possible fix is to create a duplicate pass with a different
name, say 'pass_vrp_warn_array_bounds':
...
   NEXT_PASS (pass_vrp_warn_array_bounds);
   ...
   NEXT_PASS (pass_vrp);
...

But, AFAIU that requires us to choose a different dump-file name for each
pass. And choosing vrp1 and vrp2 as new dump-file names still means that
-fdump-tree-vrp no longer works (which was mentioned as drawback here:
https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).

This patch instead makes pass creation parameterizable. So in the pass list,
we use:
...
   NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
   ...
   NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
...

This approach gives us clarity in the pass list, similar to using a
duplicate pass 'pass_vrp_warn_array_bounds'.

But it also means -fdump-tree-vrp still works as before.
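
To make the idea concrete, here is a toy model of a parameterized pass list. It is only a sketch under assumptions: demo_pass, DEMO_PASS_LIST and demo_build_pass_list are hypothetical, and the macros here are far simpler than what passes.def and gen-pass-instances.awk actually do.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of a pass object; not GCC's opt_pass.
struct demo_pass
{
  std::string name;
  bool warn_array_bounds_p;
};

// The pass list states the behaviour of each instance explicitly, so adding
// or removing one vrp instance cannot change what another instance does.
#define DEMO_PASS_LIST \
  NEXT_PASS_WITH_ARG (vrp, true /* warn_array_bounds_p */) \
  NEXT_PASS (dce) \
  NEXT_PASS_WITH_ARG (vrp, false /* warn_array_bounds_p */)

inline std::vector<demo_pass>
demo_build_pass_list ()
{
  std::vector<demo_pass> passes;
  // Both instances keep the same name, which is why a name-based dump
  // option would still match both of them in this model.
#define NEXT_PASS(NAME) passes.push_back (demo_pass{#NAME, false});
#define NEXT_PASS_WITH_ARG(NAME, ARG) passes.push_back (demo_pass{#NAME, ARG});
  DEMO_PASS_LIST
#undef NEXT_PASS_WITH_ARG
#undef NEXT_PASS
  return passes;
}
```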

Good idea? Other comments?


It's good to get rid of the first_pass_instance hack.

I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
we can just use NEXT_PASS with the extra argument being optional...


I suppose I could use NEXT_PASS in the pass list, and expand into 
NEXT_PASS_WITH_ARG in pass-instances.def.


An alternative would be to change the NEXT_PASS macro definitions into 
vararg variants. But the last time I submitted something with a vararg 
macro ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I 
got a question about it ( 
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html ), so I tend to 
avoid using vararg macros.



I don't see the need for giving clone_with_args a new name, just use an overload
of clone ()?


That's what I tried initially, but I ran into:
...
src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* 
opt_pass::clone()’ was hidden [-Woverloaded-virtual]

   virtual opt_pass *clone ();
 ^
src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass* 
{anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
   opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp 
(m_ctxt, warn_array_bounds_p); }

...

Googling the error message gives this discussion: ( 
http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions 
), and indeed adding

  "using gimple_opt_pass::clone;"
in class pass_vrp makes the warning disappear.
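
A reduced example of the name hiding and the using-declaration, for reference. The demo_* classes are hypothetical miniatures, not the real opt_pass / pass_vrp, and the base clone () here returns a null pointer instead of aborting.

```cpp
// Hypothetical miniature of the base class.
struct demo_opt_pass
{
  virtual ~demo_opt_pass () {}
  // Stand-in default: returns null where the real default would abort.
  virtual demo_opt_pass *clone () { return 0; }
};

struct demo_pass_vrp : demo_opt_pass
{
  explicit demo_pass_vrp (bool warn) : warn_array_bounds_p (warn) {}

  // Without this using-declaration, clone (bool) below hides the inherited
  // clone (), which is what -Woverloaded-virtual reports.
  using demo_opt_pass::clone;

  demo_opt_pass *clone (bool warn) { return new demo_pass_vrp (warn); }

  bool warn_array_bounds_p;
};
```

With the using-declaration in place, both p.clone () and p.clone (false) resolve on a demo_pass_vrp object; without it, only the bool overload is found by name lookup.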

I'll submit an updated version.

Thanks,
- Tom

> [ideally C++ would allow us to say that only one overload may be
> implemented]


[PATCH] nvptx: implement automatic storage in custom stacks

2015-11-12 Thread Alexander Monakov
Hello,

I'm proposing the following patch as a step towards resolving the issue with
inaccessibility of stack storage (.local memory) in PTX to other threads than
the one using that stack.  The idea is to have preallocated stacks, and have
__nvptx_stacks[] array in shared memory hold current stack pointers.  Each
thread is maintaining __nvptx_stacks[tid.y] as its stack pointer, thus for
OpenMP the intent is to preallocate on a per-warp basis (not per-thread).
For OpenMP SIMD regions we'll have to ensure that conflicting accesses are not
introduced.
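
The stack-pointer bookkeeping can be modelled on the host roughly as follows. This is a sketch under assumptions, not the emitted PTX: demo_soft_stacks, demo_enter_frame and demo_leave_frame are hypothetical names, the table has four slots, and the minimum alignment is assumed to be 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of the stack-pointer table; on the device
// this role is played by the __nvptx_stacks[] array in shared memory.
struct demo_soft_stacks
{
  std::uint64_t sp[4];
};

// Model of a prologue: reserve SIZE bytes below the current stack pointer
// of slot TID_Y and align the frame down to ALIGN (a power of two).  A
// non-leaf function publishes the new stack pointer so that its callees
// allocate below this frame.
inline std::uint64_t
demo_enter_frame (demo_soft_stacks &stacks, unsigned tid_y,
		  std::uint64_t size, std::uint64_t align, bool is_leaf)
{
  assert (tid_y < 4);
  assert (align != 0 && (align & (align - 1)) == 0);

  const std::uint64_t keep_align = 8;   // assumed minimum stack alignment
  size = (size + keep_align - 1) & ~(keep_align - 1);

  std::uint64_t frame = stacks.sp[tid_y] - size;
  if (align > keep_align)
    frame &= ~(align - 1);
  if (!is_leaf)
    stacks.sp[tid_y] = frame;
  return frame;
}

// Model of an epilogue: restore the stack pointer the caller had.
inline void
demo_leave_frame (demo_soft_stacks &stacks, unsigned tid_y,
		  std::uint64_t saved_sp)
{
  assert (tid_y < 4);
  stacks.sp[tid_y] = saved_sp;
}
```

In this model a leaf function never writes the table, so only functions that make calls pay for the store and the later restore.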

I've exposed a new command-line option -msoft-stack to ease testing, but for
OpenMP we'll have to automatically flip it based on function attributes.
Right now it's not easy because OpenMP and OpenACC both use "omp declare
target".  Jakub, I seem to recall a discussion about OpenACC changing to use a
separate attribute, but I cannot find it now.  Any advice here?

This approach also makes it possible to implement alloca.  However, to drop
alloca-avoiding changes in libgomp we'd have to selectively enable
-msoft-stack there, only for functions that OpenACC wouldn't use.

I've run it through make -k check-c regtesting.  These are new fails, all
mysterious:

+FAIL: gcc.c-torture/execute/20090113-2.c   -O[123s]  execution test
Execution failure with invalid memory access.

+FAIL: gcc.c-torture/execute/20090113-3.c   -O[123s]  execution test
Times out (looping infinitely).

The above two I had difficulties investigating due to cuda-gdb 7.0 not showing
disassembly for the misbehaving function.

+FAIL: gcc.c-torture/execute/loop-15.c   -O2  execution test
Rather surprising and unclear failure due to branch stack overflow.

There are also tests that now pass:
+PASS: gcc.c-torture/execute/20020529-1.c   -O0  execution test
Used to fail with invalid memory access.

+PASS: gcc.dg/sibcall-9.c execution test
(not meaningful on NVPTX)

+PASS: gcc.dg/torture/pr54261-1.c   -O[0123s]  execution test
Atomic modification to stack variables now works.

gcc/
* config/nvptx/nvptx.c (need_softstack_decl): Declare.
(nvptx_declare_function_name): Handle TARGET_SOFT_STACK.
(nvptx_output_return): Restore stack pointer if needed.
(nvptx_file_end): Emit declaration of __nvptx_stacks.
* config/nvptx/nvptx.opt (msoft-stack): New option.
* doc/invoke.texi (-msoft-stack): Document.

libgcc/
* config/nvptx/crt0.s (__nvptx_stacks): Define.
(%__softstack): Define 128 KiB stack for -msoft-stack.
(__main): Setup __nvptx_stacks.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 0204ad3..df915b9 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -114,6 +114,9 @@ static unsigned worker_red_align;
 #define worker_red_name "__worker_red"
 static GTY(()) rtx worker_red_sym;
 
+/* True if any function references __nvptx_stacks.  */
+static bool need_softstack_decl;
+
 /* Allocate a new, cleared machine_function structure.  */
 
 static struct machine_function *
@@ -689,15 +692,46 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
 
   /* Declare a local variable for the frame.  */
   sz = get_frame_size ();
-  if (sz > 0 || cfun->machine->has_call_with_sc)
+  if (sz == 0 && cfun->machine->has_call_with_sc)
+sz = 1;
+  if (sz > 0)
 {
   int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT;
 
-  fprintf (file, "\t.reg.u%d %%frame;\n"
-  "\t.local.align %d .b8 %%farray[" HOST_WIDE_INT_PRINT_DEC"];\n",
-  BITS_PER_WORD, alignment, sz == 0 ? 1 : sz);
-  fprintf (file, "\tcvta.local.u%d %%frame, %%farray;\n",
-  BITS_PER_WORD);
+  fprintf (file, "\t.reg.u%d %%frame;\n", BITS_PER_WORD);
+  if (TARGET_SOFT_STACK)
+   {
+ /* Maintain 64-bit stack alignment.  */
+ int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+ sz = (sz + keep_align - 1) & ~(keep_align - 1);
+ int bits = BITS_PER_WORD;
+ fprintf (file, "\t.reg.u32 %%fstmp0;\n");
+ fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits);
+ fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits);
+ fprintf (file, "\tmov.u32 %%fstmp0, %%tid.y;\n");
+ fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
+  bits == 64 ? ".wide" : "", bits);
+ fprintf (file, "\tmov.u%d %%fstmp2, __nvptx_stacks;\n", bits);
+ /* fstmp2 = &__nvptx_stacks[tid.y];  */
+ fprintf (file, "\tadd.u%d %%fstmp2, %%fstmp2, %%fstmp1;\n", bits);
+ fprintf (file, "\tld.shared.u%d %%fstmp1, [%%fstmp2];\n", bits);
+ fprintf (file, "\tsub.u%d %%frame, %%fstmp1, "
+  HOST_WIDE_INT_PRINT_DEC ";\n", bits, sz);
+ if (alignment > keep_align)
+   fprintf (file, "\tand.b%d %%frame, %%frame, %d;\n",
+bits, -alignment);
+ if (!crtl->is_leaf)
+   fprintf (file, "\tst.shared.u%d [%%fstmp2], %%frame;\n", bits);
+ need_so

Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-12 Thread Richard Biener
On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> What should we do to cope with this problem (the structure size increasing)?
> Should we return to the vector comparison version?

Ok, given this constraint I think the cleanest approach is to allow
integer(!) vector equality(!) compares with scalar result.  This should then
expand via cmp_optab and not via vec_cmp_optab.

On gimple you can then have

 if (mask_vec_1 != {0, 0,  })
...

Note that a fallback expansion (for optabs.c to try) would be
the suggested view-conversion (aka, subreg) variant using
a same-sized integer mode.
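
Outside the compiler, the two variants can be pictured with plain C++ as below. This is only an analogy under assumptions: demo_mask4 is a hypothetical four-lane integer mask, and nothing here is GIMPLE or optab code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 128-bit mask made of four 32-bit integer lanes.
struct demo_mask4
{
  std::uint32_t lane[4];
};

// Lane-wise reading of "mask != all-zeros" producing one scalar result.
inline bool
demo_any_lane_set (const demo_mask4 &m)
{
  for (int i = 0; i < 4; ++i)
    if (m.lane[i] != 0)
      return true;
  return false;
}

// View-conversion variant: look at the same bytes as wider integers and
// compare those against zero.
inline bool
demo_any_lane_set_viewconv (const demo_mask4 &m)
{
  std::uint64_t halves[2];
  static_assert (sizeof halves == sizeof m, "same-sized view required");
  std::memcpy (halves, &m, sizeof halves);
  return (halves[0] | halves[1]) != 0;
}
```

For integer lanes the two functions always agree, since a lane is zero exactly when all of its bits are zero.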

Target maintainers can then choose what is a better fit for
their target (and instruction set as register set constraints may apply).

The patch you posted seems to do this but not restrict the compares
to integer ones (please do that).

   if (TREE_CODE (op0_type) == VECTOR_TYPE
  || TREE_CODE (op1_type) == VECTOR_TYPE)
 {
-  error ("vector comparison returning a boolean");
-  debug_generic_expr (op0_type);
-  debug_generic_expr (op1_type);
-  return true;
+ /* Allow vector comparison returning boolean if operand types
+are equal and CODE is EQ/NE.  */
+ if ((code != EQ_EXPR && code != NE_EXPR)
+ || TREE_CODE (op0_type) != TREE_CODE (op1_type)
+ || TYPE_VECTOR_SUBPARTS (op0_type)
+!= TYPE_VECTOR_SUBPARTS (op1_type)
+ || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
+!= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type

These are all checked with the useless_type_conversion_p checks done earlier.

As said I'd like to see a

|| ! VECTOR_INTEGER_TYPE_P (op0_type)

check added so we and targets do not need to worry about using EQ/NE vs. CMP
and worry about signed zeros and friends.

+   {
+ error ("type mismatch for vector comparison returning a boolean");
+ debug_generic_expr (op0_type);
+ debug_generic_expr (op1_type);
+ return true;
+   }



--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt,
  enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
  bool invariant_only_p = !single_use0_p;

+ /* Can't combine vector comparison with scalar boolean type of
+the result and VEC_COND_EXPR having vector type of comparison.  */
+ if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+ && INTEGRAL_TYPE_P (type)
+ && (TREE_CODE (type) == BOOLEAN_TYPE
+ || TYPE_PRECISION (type) == 1)
+ && def_code == VEC_COND_EXPR)
+   return NULL_TREE;

this hints at larger fallout you paper over here.  So this effectively
means we're trying to combine (vec1 != vec2) != 0 for example
and that fails miserably?  If so then the solution is to fix whatever
does not expect this (valid) GENERIC tree.

+  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
+return;

not sure if I like a param more than a target hook ... :/

+  /* Create vector comparison with boolean result.  */
+  vectype = TREE_TYPE (mask);
+  zero = build_zero_cst (TREE_TYPE (vectype));
+  zero = build_vector_from_val (vectype, zero);

build_zero_cst (vectype);

+  stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);

you can omit the NULL_TREE operands.

+  gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);

please omit the assert.

+  gimple_set_vdef (last, new_vdef);

do this before you create the PHI.

+ /* Put definition statement of stored value in STORE_BB
+if possible.  */
+ arg3 = gimple_call_arg (last, 3);
+ if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+   {
...

is this really necessary?  It looks incomplete to me anyway.  I'd rather have
a late sink pass if this proves necessary.  Btw,...

+it is legal.  */
+ if (gimple_bb (def_stmt) == bb
+ && is_valid_sink (def_stmt, last_store))

with the implementation of is_valid_sink this is effectively

   && (!gimple_vuse (def_stmt)
  || gimple_vuse (def_stmt) == gimple_vdef (last_store))


I still think this "pass" is quite a hack, esp. as it appears as generic
code in a GIMPLE pass.  And esp. as this hack seems to be needed
for Haswell only, not Broadwell or Skylake.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-11-11 12:18 GMT+03:00 Richard Biener :
>> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich  
>> wrote:
>>> 2015-11-10 17:46 GMT+03:00 Richard Biener :
>>>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich  
>>>> wrote:
> 2015-11-10 15:33 GMT+03:00 Richard Biener :
>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev  
>> wrote:
>>> Richard,
>>>
>>> I tried it but 256-bit precision integer type is not yet supported.
>>
>> What's the sympt

[gomp4] remove c++ reference restriction

2015-11-12 Thread Nathan Sidwell
I've applied this to the gomp4 branch.  It removes the machinery concerning C++
references.  The OpenACC standard makes no mention of such a type, so originally we
were not permitting the type.  But,

(a) OpenMP supports them, which suggests openacc wishes to
(b) Fortran already has reference types that need supporting
(c) it's more work to not support them, by modifying the mappable_type hook.

nathan
2015-11-12  Nathan Sidwell  

	* langhooks-def.h (omp_mappable_type): Remove oacc arg.
	* langhooks.h (lhd_omp_mappable_type): Likewise.
	* langhooks.c (lhd_omp_mappable_type): Likewise.
	* gimplify.c (omp_notice_variable): Adjust omp_mappable_type call.

	c/
	* c-typeck.c (c_finish_omp_clauses): Adjust omp_mappable_type calls.
	* c-decl.c (c_decl_attributes): Likewise.

	testsuite/
	* g++.dg/goacc/reference.C: Adjust.

	cp/
	* semantics.c (finish_omp_clauses): Adjust omp_mappable_type calls.
	* decl2.c (cp_omp_mappable_type): Remove oacc arg and processing.
	* cp-tree.h (cp_omp_mappable_type): Remove oacc arg.

Index: gimplify.c
===
--- gimplify.c	(revision 230177)
+++ gimplify.c	(working copy)
@@ -6106,9 +6106,7 @@ omp_notice_variable (struct gimplify_omp
 		&& lang_hooks.decls.omp_privatize_by_reference (decl))
 	  type = TREE_TYPE (type);
 	if (nflags == flags
-		&& !lang_hooks.types.omp_mappable_type (type,
-			(ctx->region_type
-			 & ORT_ACC) != 0))
+		&& !lang_hooks.types.omp_mappable_type (type))
 	  {
 		error ("%qD referenced in target region does not have "
 		   "a mappable type", decl);
Index: langhooks.c
===
--- langhooks.c	(revision 230177)
+++ langhooks.c	(working copy)
@@ -525,7 +525,7 @@ lhd_omp_firstprivatize_type_sizes (struc
 /* Return true if TYPE is an OpenMP mappable type.  */
 
 bool
-lhd_omp_mappable_type (tree type, bool oacc ATTRIBUTE_UNUSED)
+lhd_omp_mappable_type (tree type)
 {
   /* Mappable type has to be complete.  */
   if (type == error_mark_node || !COMPLETE_TYPE_P (type))
Index: langhooks.h
===
--- langhooks.h	(revision 230177)
+++ langhooks.h	(working copy)
@@ -112,7 +112,7 @@ struct lang_hooks_for_types
   void (*omp_firstprivatize_type_sizes) (struct gimplify_omp_ctx *, tree);
 
   /* Return true if TYPE is a mappable type.  */
-  bool (*omp_mappable_type) (tree type, bool oacc);
+  bool (*omp_mappable_type) (tree type);
 
   /* Return TRUE if TYPE1 and TYPE2 are identical for type hashing purposes.
  Called only after doing all language independent checks.
Index: testsuite/g++.dg/goacc/reference.C
===
--- testsuite/g++.dg/goacc/reference.C	(revision 230177)
+++ testsuite/g++.dg/goacc/reference.C	(working copy)
@@ -4,7 +4,7 @@
 int
 test1 (int &ref)
 {
-#pragma acc kernels copy (ref) // { dg-error "reference types are not supported in OpenACC" }
+#pragma acc kernels copy (ref)
   {
 ref = 10;
   }
@@ -16,12 +16,12 @@ test2 (int &ref)
   int b;
 #pragma acc kernels copyout (b)
   {
-b = ref + 10; // { dg-error "referenced in target region does not have a mappable type" }
+b = ref + 10;
   }
 
 #pragma acc parallel copyout (b)
   {
-b = ref + 10; // { dg-error "referenced in target region does not have a mappable type" }
+b = ref + 10;
   }
 
   ref = b;
@@ -33,7 +33,7 @@ main()
   int a = 0;
   int &ref_a = a;
 
-  #pragma acc parallel copy (a, ref_a) // { dg-error "reference types are not supported in OpenACC" }
+  #pragma acc parallel copy (a, ref_a)
   {
 ref_a = 5;
   }
Index: c/c-typeck.c
===
--- c/c-typeck.c	(revision 230177)
+++ c/c-typeck.c	(working copy)
@@ -12863,8 +12863,7 @@ c_finish_omp_clauses (tree clauses, bool
 	  else
 		{
 		  t = OMP_CLAUSE_DECL (c);
-		  if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t),
-			   is_oacc))
+		  if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t)))
 		{
 		  error_at (OMP_CLAUSE_LOCATION (c),
 "array section does not have mappable type "
@@ -12916,8 +12915,7 @@ c_finish_omp_clauses (tree clauses, bool
 			t, omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 		  remove = true;
 		}
-	  else if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t),
-			is_oacc))
+	  else if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t)))
 		{
 		  error_at (OMP_CLAUSE_LOCATION (c),
 			"%qE does not have a mappable type in %qs clause",
@@ -12967,8 +12965,7 @@ c_finish_omp_clauses (tree clauses, bool
 			 || (OMP_CLAUSE_MAP_KIND (c)
 			 == GOMP_MAP_FORCE_DEVICEPTR)))
 		   && t == OMP_CLAUSE_DECL (c)
-		   && !lang_hooks.types.omp_mappable_type (TREE_TYPE (t),
-			   is_oacc))
+		   && !lang_hooks.types.omp_mappable_type (TREE_TYPE (t)))
 	{
 	  error_at (OMP_CLAUSE_LOCATION (c),
 			"%qD does not have a mappab

Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 2:49 PM, Tom de Vries  wrote:
> On 12/11/15 13:26, Richard Biener wrote:
>>
>> On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries 
>> wrote:
>>>
>>> Hi,
>>>
>>> [ See also related discussion at
>>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]
>>>
>>> this patch removes the usage of first_pass_instance from pass_vrp.
>>>
>>> the patch:
>>> - limits itself to pass_vrp, but my intention is to remove all
>>>usage of first_pass_instance
>>> - lacks an update to gdbhooks.py
>>>
>>> Modifying the pass behaviour depending on the instance number, as
>>> first_pass_instance does, breaks compositionality of the pass list. In
>>> other
>>> words, adding a pass instance in a pass list may change the behaviour of
>>> another instance of that pass in the pass list. Which obviously makes it
>>> harder to understand and change the pass list. [ I've filed this issue as
>>> PR68247 - Remove pass_first_instance ]
>>>
>>> The solution is to make the difference in behaviour explicit in the pass
>>> list, and no longer change behaviour depending on instance number.
>>>
>>> One obvious possible fix is to create a duplicate pass with a different
>>> name, say 'pass_vrp_warn_array_bounds':
>>> ...
>>>NEXT_PASS (pass_vrp_warn_array_bounds);
>>>...
>>>NEXT_PASS (pass_vrp);
>>> ...
>>>
>>> But, AFAIU that requires us to choose a different dump-file name for each
>>> pass. And choosing vrp1 and vrp2 as new dump-file names still means that
>>> -fdump-tree-vrp no longer works (which was mentioned as drawback here:
>>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).
>>>
>>> This patch instead makes pass creation parameterizable. So in the pass
>>> list,
>>> we use:
>>> ...
>>>NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
>>>...
>>>NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
>>> ...
>>>
>>> This approach gives us clarity in the pass list, similar to using a
>>> duplicate pass 'pass_vrp_warn_array_bounds'.
>>>
>>> But it also means -fdump-tree-vrp still works as before.
>>>
>>> Good idea? Other comments?
>>
>>
>> It's good to get rid of the first_pass_instance hack.
>>
>> I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
>> we can just use NEXT_PASS with the extra argument being optional...
>
>
> I suppose I could use NEXT_PASS in the pass list, and expand into
> NEXT_PASS_WITH_ARG in pass-instances.def.
>
> An alternative would be to change the NEXT_PASS macro definitions into
> vararg variants. But the last time I submitted something with a vararg macro
> ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I got a
> question about it ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html
> ), so I tend to avoid using vararg macros.
>
>> I don't see the need for giving clone_with_args a new name, just use an
>> overload
>> of clone ()?
>
>
> That's what I tried initially, but I ran into:
> ...
> src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* opt_pass::clone()’
> was hidden [-Woverloaded-virtual]
>virtual opt_pass *clone ();
>  ^
> src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass*
> {anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
>opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp
> (m_ctxt, warn_array_bounds_p); }
> ...
>
> Googling the error message gives this discussion: (
> http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions
> ), and indeed adding
>   "using gimple_opt_pass::clone;"
> in class pass_vrp makes the warning disappear.
>
> I'll submit an updated version.

Hmm, but actually the above means the pass does not expose the
non-argument clone
which is good!

Or did you forget to add the virtual-with-arg variant to opt_pass?
That is, why's it
a virtual function in the first place?  (clone_with_arg)

> Thanks,
> - Tom
>
>
>> [ideally C++ would allow us to say that only one overload may be
>> implemented]


Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 3:04 PM, Richard Biener
 wrote:
> On Thu, Nov 12, 2015 at 2:49 PM, Tom de Vries  wrote:
>> On 12/11/15 13:26, Richard Biener wrote:
>>>
>>> On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries 
>>> wrote:

>>>> Hi,
>>>>
>>>> [ See also related discussion at
>>>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]
>>>>
>>>> this patch removes the usage of first_pass_instance from pass_vrp.
>>>>
>>>> the patch:
>>>> - limits itself to pass_vrp, but my intention is to remove all
>>>>    usage of first_pass_instance
>>>> - lacks an update to gdbhooks.py
>>>>
>>>> Modifying the pass behaviour depending on the instance number, as
>>>> first_pass_instance does, breaks compositionality of the pass list. In
>>>> other
>>>> words, adding a pass instance in a pass list may change the behaviour of
>>>> another instance of that pass in the pass list. Which obviously makes it
>>>> harder to understand and change the pass list. [ I've filed this issue as
>>>> PR68247 - Remove pass_first_instance ]
>>>>
>>>> The solution is to make the difference in behaviour explicit in the pass
>>>> list, and no longer change behaviour depending on instance number.
>>>>
>>>> One obvious possible fix is to create a duplicate pass with a different
>>>> name, say 'pass_vrp_warn_array_bounds':
>>>> ...
>>>>    NEXT_PASS (pass_vrp_warn_array_bounds);
>>>>    ...
>>>>    NEXT_PASS (pass_vrp);
>>>> ...
>>>>
>>>> But, AFAIU that requires us to choose a different dump-file name for each
>>>> pass. And choosing vrp1 and vrp2 as new dump-file names still means that
>>>> -fdump-tree-vrp no longer works (which was mentioned as drawback here:
>>>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).
>>>>
>>>> This patch instead makes pass creation parameterizable. So in the pass
>>>> list,
>>>> we use:
>>>> ...
>>>>    NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
>>>>    ...
>>>>    NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
>>>> ...
>>>>
>>>> This approach gives us clarity in the pass list, similar to using a
>>>> duplicate pass 'pass_vrp_warn_array_bounds'.
>>>>
>>>> But it also means -fdump-tree-vrp still works as before.
>>>>
>>>> Good idea? Other comments?
>>>
>>>
>>> It's good to get rid of the first_pass_instance hack.
>>>
>>> I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
>>> we can just use NEXT_PASS with the extra argument being optional...
>>
>>
>> I suppose I could use NEXT_PASS in the pass list, and expand into
>> NEXT_PASS_WITH_ARG in pass-instances.def.
>>
>> An alternative would be to change the NEXT_PASS macro definitions into
>> vararg variants. But the last time I submitted something with a vararg macro
>> ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I got a
>> question about it ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html
>> ), so I tend to avoid using vararg macros.
>>
>>> I don't see the need for giving clone_with_args a new name, just use an
>>> overload
>>> of clone ()?
>>
>>
>> That's what I tried initially, but I ran into:
>> ...
>> src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* opt_pass::clone()’
>> was hidden [-Woverloaded-virtual]
>>virtual opt_pass *clone ();
>>  ^
>> src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass*
>> {anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
>>opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp
>> (m_ctxt, warn_array_bounds_p); }
>> ...
>>
>> Googling the error message gives this discussion: (
>> http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions
>> ), and indeed adding
>>   "using gimple_opt_pass::clone;"
>> in class pass_vrp makes the warning disappear.
>>
>> I'll submit an updated version.
>
> Hmm, but actually the above means the pass does not expose the
> non-argument clone
> which is good!
>
> Or did you forget to add the virtual-with-arg variant to opt_pass?
> That is, why's it
> a virtual function in the first place?  (clone_with_arg)

That said,

--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -83,6 +83,7 @@ public:

  The default implementation prints an error message and aborts.  */
   virtual opt_pass *clone ();
+  virtual opt_pass *clone_with_arg (bool);


means the arg type is fixed at 'bool' (yeah, mimicking first_pass_instance).
That looks a bit limiting to me, but anyway.

Richard.

>> Thanks,
>> - Tom
>>
>>
>>> [ideally C++ would allow us to say that only one overload may be
>>> implemented]
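
The -Woverloaded-virtual warning quoted above comes from name hiding: a member named clone in the derived class hides every clone inherited from the base. A minimal sketch of the situation and of the using-declaration workaround follows. The class names base_pass and arg_pass are hypothetical stand-ins, not the GCC classes, and whether hiding the no-argument overload is actually desirable is the open question raised in the reply.

```cpp
// Hypothetical stand-ins for a pass base class and a parameterized pass.
struct base_pass
{
  virtual ~base_pass () {}
  // No-argument clone, declared virtual on the base class.
  virtual base_pass *clone () { return new base_pass; }
  virtual bool warns () const { return false; }
};

struct arg_pass : base_pass
{
  explicit arg_pass (bool warn) : m_warn (warn) {}

  // Without this using-declaration, clone (bool) below hides
  // base_pass::clone (), which is what -Woverloaded-virtual reports.
  using base_pass::clone;

  // Overload taking the per-instance argument.
  arg_pass *clone (bool warn) { return new arg_pass (warn); }

  bool warns () const override { return m_warn; }

private:
  bool m_warn;
};
```

With the using-declaration in place both p.clone () and p.clone (true) resolve on an arg_pass object; removing it makes the first call ill-formed, which is the hiding behaviour the reply describes as a possible benefit.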


Re: [hsa 2/12] Modifications to libgomp proper

2015-11-12 Thread Nathan Sidwell

On 11/12/15 08:21, Thomas Schwinge wrote:

Hi!




so there is a way to deal with it, but you need to adjust all plugins.


I'm confused -- didn't we agree that we don't need to maintain backwards
compatibility in the libgomp <-> plugins interface?  (Nathan?)


Indeed, no need to deal with version skew between libgomp and its plugins.

On 07/24/15 12:30, Jakub Jelinek wrote:

And I'd say that we don't really need to maintain support for mixing libgomp
from one GCC version and libgomp plugins from another version, worst case
there should be some GOMP_OFFLOAD_get_version function that libgomp could
use to verify it is talking to the right version of the plugin and
completely ignore it if it gives wrong version.




(We do need, and have, versioning between GCC proper and libgomp
interfaces.)


Yes. (For avoidance of doubt)

nathan


Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-12 Thread Gerald Pfeifer

On Wed, 11 Nov 2015, Jonathan Wakely wrote:

Fixed by this patch.


Thanks, Jonathan!  Unfortunately bootstrap is still broken
(on i386-unknown-freebsd11.0 at least):

In file included from 
/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/thread.cc:27:0:

/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread: In function ‘void std::this_thread::sleep_for(const 
std::chrono::duration<_Rep1, _Period1>&)’:
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:44: error: ‘errno’ was not declared in this scope
 while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
   ^
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:53: error: ‘EINTR’ was not declared in this scope
 while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)

Gerald

commit 97c2da9d4cc11bd5dae077ccb5fda4e72f7c34d5
Author: Jonathan Wakely 
Date:   Wed Nov 11 17:27:23 2015 +

	* libsupc++/new_handler.cc: Fix for explicit constructor change.

diff --git a/libstdc++-v3/libsupc++/new_handler.cc b/libstdc++-v3/libsupc++/new_handler.cc
index a09012c..4da48b3 100644
--- a/libstdc++-v3/libsupc++/new_handler.cc
+++ b/libstdc++-v3/libsupc++/new_handler.cc
@@ -34,7 +34,7 @@ namespace
 }
 #endif
 
-const std::nothrow_t std::nothrow = { };
+const std::nothrow_t std::nothrow = std::nothrow_t{ };
 
 using std::new_handler;
 namespace
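
The one-line change above follows from making a tag type's default constructor explicit: an empty braced list can then no longer be used for copy-initialization, so the type has to be named. A small sketch with a hypothetical tag_t (not the library's std::nothrow_t definition) shows both forms.

```cpp
// Hypothetical tag type with an explicit default constructor.
struct tag_t
{
  explicit tag_t () {}
};

// const tag_t bad = { };     // rejected: copy-list-initialization would
//                            // have to select the explicit constructor
const tag_t good = tag_t{ };  // accepted: the type is named explicitly

// A function taking the tag still accepts the named object.
inline int takes_tag (const tag_t &) { return 1; }
```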


Re: [Ada] More efficient code generated for object overlays

2015-11-12 Thread Duncan Sands

Hi Arnaud,

On 12/11/15 12:06, Arnaud Charlet wrote:

This change refines the use of the "volatile hammer" to implement the advice
given in RM 13.3(19) by disabling it for object overlays altogether, relying
instead on the ref-all aliasing property of reference types to achieve the
desired effect.

This will generate better code for object overlays, for example the following
function should now make no memory accesses at all on 64-bit platforms when
compiled at -O2 or above:


this is great!  When doing tricks to improve performance I've several times 
resorted to address overlays, forgetting about the "volatile hammer", only to 
rediscover it for the N'th time due to the poor performance and the horrible 
code generated.


Best wishes, Duncan.


Re: [PATCH, 11/16] Update testcases after adding kernels pass group

2015-11-12 Thread Tom de Vries

On 11/11/15 12:03, Richard Biener wrote:

On Mon, 9 Nov 2015, Tom de Vries wrote:


On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

   1  Insert new exit block only when needed in
      transform_to_exit_first_loop_alt
   2  Make create_parallel_loop return void
   3  Ignore reduction clause on kernels directive
   4  Implement -foffload-alias
   5  Add in_oacc_kernels_region in struct loop
   6  Add pass_oacc_kernels
   7  Add pass_dominator_oacc_kernels
   8  Add pass_ch_oacc_kernels
   9  Add pass_parallelize_loops_oacc_kernels
  10  Add pass_oacc_kernels pass group in passes.def
  11  Update testcases after adding kernels pass group
  12  Handle acc loop directive
  13  Add c-c++-common/goacc/kernels-*.c
  14  Add gfortran.dg/goacc/kernels-*.f95
  15  Add libgomp.oacc-c-c++-common/kernels-*.c
  16  Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


This patch updates existing testcases with new pass numbers, given the passes
that were added in the pass list in patch 10.


I think it would be nice to be able to specify the number in the .def
file instead so we can avoid this kind of churn every time we do this.


How about something along the lines of:
...
  /* pass_build_ealias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_ealias);
  /* Pass group that runs when there are oacc kernels in the
  function.  */
  NEXT_PASS (pass_oacc_kernels);
  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
  PUSH_ID ("oacc_kernels")
...
  POP_ID ()
  POP_INSERT_PASSES ()
  NEXT_PASS (pass_fre);
...

where the PUSH_ID/POP_ID pair has the functionality that all the 
contained passes:

- have the id prefixed to the dump file, so the dump file of pass_ch
  which normally is "ch" becomes "oacc_kernels_ch", and
- the pass name in pass_instances.def becomes pass_oacc_kernels_ch, such
  that it doesn't count as numbered instance of pass_ch
?

Thanks,
- Tom
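
The naming effect proposed for PUSH_ID/POP_ID can be modelled as a stack of prefixes applied to each contained pass. This is only an illustration of the renaming rule ("ch" becomes "oacc_kernels_ch"); the class dump_name_scope and its members are hypothetical and do not exist in GCC or in gen-pass-instances.awk.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the proposed PUSH_ID / POP_ID pair: a stack of
// prefixes applied to the dump name of every pass registered inside.
class dump_name_scope
{
public:
  void push_id (const std::string &id) { m_ids.push_back (id); }
  void pop_id () { if (!m_ids.empty ()) m_ids.pop_back (); }

  // Return the dump name for a pass, prefixed by all active ids.
  std::string dump_name (const std::string &pass) const
  {
    std::string name;
    for (const std::string &id : m_ids)
      name += id + "_";
    return name + pass;
  }

private:
  std::vector<std::string> m_ids;
};
```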


Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-12 Thread Ville Voutilainen
On 12 November 2015 at 16:23, Gerald Pfeifer  wrote:
> On Wed, 11 Nov 2015, Jonathan Wakely wrote:
>>
>> Fixed by this patch.
>
>
> Thanks, Jonathan!  Unfortunately bootstrap is still broken
> (on i386-unknown-freebsd11.0 at least):
>
> In file included from
> /scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/thread.cc:27:0:
> /scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
> thread: In function ‘void std::this_thread::sleep_for(const
> std::chrono::duration<_Rep1, _Period1>&)’:
> /scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
> thread:300:44: error: ‘errno’ was not declared in this scope
>  while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
>^
> /scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
> thread:300:53: error: ‘EINTR’ was not declared in this scope
>  while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)


Note that that's a separate problem that has nothing to do with the
tag-type-explicit-default-ctor
patch.


Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-12 Thread Jonathan Wakely

On 12/11/15 15:23 +0100, Gerald Pfeifer wrote:

On Wed, 11 Nov 2015, Jonathan Wakely wrote:

Fixed by this patch.


Thanks, Jonathan!  Unfortunately bootstrap is still broken
(on i386-unknown-freebsd11.0 at least):


Different issue.

In file included from 
/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/thread.cc:27:0:

/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread: In function ‘void std::this_thread::sleep_for(const 
std::chrono::duration<_Rep1, _Period1>&)’:
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:44: error: ‘errno’ was not declared in this scope
while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
  ^
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:53: error: ‘EINTR’ was not declared in this scope
while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)


Does adding #include  to libstdc++-v3/include/std/thread
solve it?
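
For reference, the failing loop needs declarations of both errno and EINTR. The header name is missing from the archived message; the sketch below assumes <cerrno>, the standard header that declares them, and uses a hypothetical helper name, sleep_uninterrupted, that is not part of libstdc++.

```cpp
#include <cerrno>   // errno, EINTR
#include <time.h>   // ::nanosleep, timespec (POSIX)

// Sleep for the requested interval, resuming after signal interruptions.
// Returns 0 on success, -1 on a failure other than EINTR.
inline int sleep_uninterrupted (long seconds, long nanoseconds)
{
  timespec ts;
  ts.tv_sec = seconds;
  ts.tv_nsec = nanoseconds;
  // On EINTR, nanosleep stores the remaining time back into ts.
  while (::nanosleep (&ts, &ts) == -1)
    if (errno != EINTR)
      return -1;
  return 0;
}
```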



Re: [PATCH] nvptx: implement automatic storage in custom stacks

2015-11-12 Thread Bernd Schmidt

I'm proposing the following patch as a step towards resolving the issue with
inaccessibility of stack storage (.local memory) in PTX to other threads than
the one using that stack.  The idea is to have preallocated stacks, and have
__nvptx_stacks[] array in shared memory hold current stack pointers.  Each
thread is maintaining __nvptx_stacks[tid.y] as its stack pointer, thus for
OpenMP the intent is to preallocate on a per-warp basis (not per-thread).
For OpenMP SIMD regions we'll have to ensure that conflicting accesses are not
introduced.


This is of course really ugly; I'd propose we keep it on an nvptx-OpenMP 
specific branch for now until we know that this is really going somewhere.



I've run it through make -k check-c regtesting.  These are new fails, all
mysterious:


These would have to be investigated first.


+ sz = (sz + keep_align - 1) & ~(keep_align - 1);


Use the ROUND_UP macro.
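
As a reminder of what the open-coded expression computes, here is a standalone sketch. The function name round_up is illustrative and is not GCC's macro; the computation is only valid when the alignment is a power of two.

```cpp
#include <cstddef>

// Round sz up to the next multiple of align; align must be a power of two.
// Same computation as the open-coded expression in the quoted patch line.
constexpr std::size_t round_up (std::size_t sz, std::size_t align)
{
  return (sz + align - 1) & ~(align - 1);
}

static_assert (round_up (0, 8) == 0, "zero stays zero");
static_assert (round_up (1, 8) == 8, "rounds up to the alignment");
static_assert (round_up (16, 8) == 16, "multiples are unchanged");
```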


+ fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
+  bits == 64 ? ".wide" : "", bits);


Use a shift.


+
+  if (need_softstack_decl)
+{
+  fprintf (asm_out_file, ".extern .shared .u64 __nvptx_stacks[];\n;");
+}


Lose excess braces.


+.global .u64 %__softstack[16384];


Maybe declare it as .u8 so you don't have two different constants for the 
stack size?



+.reg .u64 %stackptr;
+mov.u64 %stackptr, %__softstack;
+cvta.global.u64 %stackptr, %stackptr;
+add.u64 %stackptr, %stackptr, 131072;
+st.shared.u64 [__nvptx_stacks], %stackptr;
+


I'm guessing you have other missing pieces for setting this up for 
multiple threads.



Bernd



Re: [PATCH, 11/16] Update testcases after adding kernels pass group

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 3:31 PM, Tom de Vries  wrote:
> On 11/11/15 12:03, Richard Biener wrote:
>>
>> On Mon, 9 Nov 2015, Tom de Vries wrote:
>>
>>> On 09/11/15 16:35, Tom de Vries wrote:

 Hi,

 this patch series for stage1 trunk adds support to:
 - parallelize oacc kernels regions using parloops, and
 - map the loops onto the oacc gang dimension.

 The patch series contains these patches:

    1  Insert new exit block only when needed in
       transform_to_exit_first_loop_alt
    2  Make create_parallel_loop return void
    3  Ignore reduction clause on kernels directive
    4  Implement -foffload-alias
    5  Add in_oacc_kernels_region in struct loop
    6  Add pass_oacc_kernels
    7  Add pass_dominator_oacc_kernels
    8  Add pass_ch_oacc_kernels
    9  Add pass_parallelize_loops_oacc_kernels
   10  Add pass_oacc_kernels pass group in passes.def
   11  Update testcases after adding kernels pass group
   12  Handle acc loop directive
   13  Add c-c++-common/goacc/kernels-*.c
   14  Add gfortran.dg/goacc/kernels-*.f95
   15  Add libgomp.oacc-c-c++-common/kernels-*.c
   16  Add libgomp.oacc-fortran/kernels-*.f95

 The first 9 patches are more or less independent, but patches 10-16 are
 intended to be committed at the same time.

 Bootstrapped and reg-tested on x86_64.

 Build and reg-tested with nvidia accelerator, in combination with a
 patch that enables accelerator testing (which is submitted at
 https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

 I'll post the individual patches in reply to this message.
>>>
>>>
>>> This patch updates existing testcases with new pass numbers, given the
>>> passes
>>> that were added in the pass list in patch 10.
>>
>>
>> I think it would be nice to be able to specify the number in the .def
>> file instead so we can avoid this kind of churn every time we do this.
>
>
> How about something along the lines of:
> ...
>   /* pass_build_ealias is a dummy pass that ensures that we
>  execute TODO_rebuild_alias at this point.  */
>   NEXT_PASS (pass_build_ealias);
>   /* Pass group that runs when there are oacc kernels in the
>   function.  */
>   NEXT_PASS (pass_oacc_kernels);
>   PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>   PUSH_ID ("oacc_kernels")
> ...
>   POP_ID ()
>   POP_INSERT_PASSES ()
>   NEXT_PASS (pass_fre);
> ...
>
> where the PUSH_ID/POP_ID pair has the functionality that all the contained
> passes:
> - have the id prefixed to the dump file, so the dump file of pass_ch
>   which normally is "ch" becomes "oacc_kernels_ch", and
> - the pass name in pass_instances.def becomes pass_oacc_kernels_ch, such
>   that it doesn't count as numbered instance of pass_ch
> ?

Hmm.  I'd like to have sth that allows me to add "slp" to both
pass_slp_vectorize
instances, having them share the suffix (as no two functions are in both dumps).

We similarly have "duplicates" across the -Og vs. the -O[0-3] pipeline.

Basically make all dump file name suffixes manually specified which means moving
them from the class definition to the actual instance.

Well, just an idea.  In the distant future I'd like our pass pipeline to become more
dynamic, getting away from a static passes.def towards, say, a pass "script"
(to be able to say "if inlining did nothing skip this group" or similar).

Richard.


> Thanks,
> - Tom


Re: [PATCH 1/4][AArch64] Add scheduling and cost models for Exynos M1

2015-11-12 Thread James Greenhalgh
On Thu, Nov 05, 2015 at 11:31:33AM -0600, Evandro Menezes wrote:
> James,
> 
> Since other members of the "tune_params" structure were signed
> integers, even though negative numbers would make no sense for most
> either, I followed the same pattern.
> 
> Regardless, here's a patch with unsigned integers as you requested:
> 
>[AArch64] Add extra tuning parameters for target processors
> 
>2015-11-05  Evandro Menezes  
> 
>gcc/
> 
>* config/aarch64/aarch64-protos.h (tune_params): Add new members
>"max_case_values" and "cache_line_size".
>* config/aarch64/aarch64.c (aarch64_case_values_threshold): New
>function.
>(aarch64_override_options_internal): Tune heuristics based on new
>members in "tune_params".
>(TARGET_CASE_VALUES_THRESHOLD): Define macro.
> 
> Please, commit if it's alright.

Hi Evandro,

This is OK with a few nits.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 81792bc..ecf4685 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -195,6 +195,9 @@ struct tune_params
>int vec_reassoc_width;
>int min_div_recip_mul_sf;
>int min_div_recip_mul_df;
> +  unsigned int max_case_values; /* Case values threshold; or 0 for the 
> default.  */
> +
> +  unsigned int cache_line_size; /* Cache line size; or 0 for the default.  */
>  
>  /* An enum specifying how to take into account CPU autoprefetch capabilities
> during instruction scheduling:

I'd put the comments above the field, and make them slightly more
descriptive:

+  /* Value for aarch64_case_values_threshold; or 0 for the default.  */
+  unsigned int max_case_values;
+  /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default.  */
+  unsigned int cache_line_size;

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..e7f1c07 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13385,6 +13418,7 @@ aarch64_promoted_type (const_tree t)
>  return float_type_node;
>return NULL_TREE;
>  }
> +
>  #undef TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST aarch64_address_cost
>  

Drop this hunk.

I've applied the patch with those changes as revision 230261 on your behalf.

Thanks,
James
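
The "0 means use the default" convention for the two new fields can be sketched as follows. The struct tune_fields and the helper effective_value are hypothetical reductions for illustration; they are not the aarch64 back-end code, and the fallback values used in any call are made up.

```cpp
// Hypothetical reduction of the tuning structure to the two new fields.
struct tune_fields
{
  /* Value for the case-values threshold; or 0 for the default.  */
  unsigned int max_case_values;
  /* Cache line size; or 0 to use the default.  */
  unsigned int cache_line_size;
};

// Pick the per-CPU value when one is set, else fall back to the default.
inline unsigned int
effective_value (unsigned int tuned, unsigned int fallback)
{
  return tuned != 0 ? tuned : fallback;
}
```

Using unsigned fields, as requested in the review, keeps 0 as the only sentinel value.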



Re: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-11-12 Thread Jonathan Wakely

On 12/11/15 13:39 +, Jonathan Wakely wrote:

One downside of this change is that we introduce some (hopefully safe)
ODR violations, where inline functions and templates that depend on
_GLIBCXX_USE_C99_FOO might now be defined differently in C++98 and
C++11 code. Previously they had the same definition, even though in
C++11 mode the value of the _GLIBCXX_USE_C99_FOO macro might have been
sub-optimal (i.e. the C99 features were usable, but the macro said
they weren't). Those ODR violations could be avoided if needed, by
always using the _GLIBCXX98_USE_C99_FOO macro in code that can be
included from either C++98 or C++11. We could still use the
_GLIBCXX11_USE_C99_FOO macro in pure C++11 code (such as the numeric
conversion functions) and get most of the benefit of this change.


This patch (relative to the previous one) would avoid the ODR
problems, by only using the C++98 macro in code that gets used in
C++98 and later, and using the _GLIBCXX11_XXX ones in code that is
never compiled as C++98 (specifically, the numeric conversion
functions).

Maybe this is a safer, more conservative change.
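
To illustrate the kind of ODR problem being avoided, consider an inline function whose body depends on a configuration macro. This is a sketch only: DEMO_USE_C99_FOO and demo_path are hypothetical names standing in for a _GLIBCXX_USE_C99_FOO style macro and the library code guarded by it.

```cpp
// Hypothetical configuration macro whose value could differ between
// translation units compiled with different -std options.
#ifndef DEMO_USE_C99_FOO
# define DEMO_USE_C99_FOO 0
#endif

// An inline function whose body depends on the macro.  If two translation
// units see different macro values, they contain different definitions of
// the same inline function, which violates the one-definition rule.
inline int demo_path ()
{
#if DEMO_USE_C99_FOO
  return 1;   // would use the C99 facility
#else
  return 0;   // portable fallback
#endif
}
```

Keeping one macro for code shared by C++98 and C++11, and a separate one for code that is only ever compiled as C++11, means every shared inline function sees a single value.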

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index deefa04..11dee8e 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -960,7 +960,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 ])
 AC_MSG_RESULT($glibcxx_cv_c99_math_cxx98)
 if test x"$glibcxx_cv_c99_math_cxx98" = x"yes"; then
-  AC_DEFINE(_GLIBCXX98_USE_C99_MATH, 1,
+  AC_DEFINE(_GLIBCXX_USE_C99_MATH, 1,
 [Define if C99 functions or macros in  should be imported
 in  in namespace std for C++98.])
 fi
@@ -1029,7 +1029,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 fi
 AC_MSG_RESULT($glibcxx_cv_c99_complex_cxx98)
 if test x"$glibcxx_cv_c99_complex_cxx98" = x"yes"; then
-  AC_DEFINE(_GLIBCXX98_USE_C99_COMPLEX, 1,
+  AC_DEFINE(_GLIBCXX_USE_C99_COMPLEX, 1,
 [Define if C99 functions in  should be used in
  for C++98. Using compiler builtins for these functions
 requires corresponding C99 library functions to be present.])
@@ -1054,7 +1054,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 ])
 AC_MSG_RESULT($glibcxx_cv_c99_stdio_cxx98)
 if test x"$glibcxx_cv_c99_stdio_cxx98" = x"yes"; then
-  AC_DEFINE(_GLIBCXX98_USE_C99_STDIO, 1,
+  AC_DEFINE(_GLIBCXX_USE_C99_STDIO, 1,
 [Define if C99 functions or macros in  should be imported
 in  in namespace std for C++98.])
 fi
@@ -1100,7 +1100,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 
   AC_MSG_RESULT($glibcxx_cv_c99_wchar_cxx98)
   if test x"$glibcxx_cv_c99_wchar_cxx98" = x"yes"; then
-AC_DEFINE(_GLIBCXX98_USE_C99_WCHAR, 1,
+AC_DEFINE(_GLIBCXX_USE_C99_WCHAR, 1,
   [Define if C99 functions or macros in  should be imported
   in  in namespace std for C++98.])
   fi
diff --git a/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h b/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
index 055c5b6..b21726f 100644
--- a/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
+++ b/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
@@ -37,5 +37,8 @@
 #define _GLIBCXX_USE_C99_DYNAMIC (!(__ISO_C_VISIBLE >= 1999))
 #define _GLIBCXX_USE_C99_LONG_LONG_CHECK 1
 #define _GLIBCXX_USE_C99_LONG_LONG_DYNAMIC (_GLIBCXX_USE_C99_DYNAMIC || !defined __LONG_LONG_SUPPORTED)
+#define _GLIBCXX11_USE_C99_STDIO 1
+#define _GLIBCXX11_USE_C99_STDLIB 1
+#define _GLIBCXX11_USE_C99_WCHAR 1
 
 #endif
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index b3853cd..2fa345a 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -5396,7 +5396,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
-#if _GLIBCXX_USE_C99_STDLIB
+#if _GLIBCXX11_USE_C99_STDLIB
   // 21.4 Numeric Conversions [string.conversions].
   inline int
   stoi(const string& __str, size_t* __idx = 0, int __base = 10)
@@ -5435,9 +5435,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   inline long double
   stold(const string& __str, size_t* __idx = 0)
   { return __gnu_cxx::__stoa(&std::strtold, "stold", __str.c_str(), __idx); }
-#endif // _GLIBCXX_USE_C99_STDLIB
+#endif // _GLIBCXX11_USE_C99_STDLIB
 
-#if _GLIBCXX_USE_C99_STDIO
+#if _GLIBCXX11_USE_C99_STDIO
   // NB: (v)snprintf vs sprintf.
 
   // DR 1261.
@@ -5501,9 +5501,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 return __gnu_cxx::__to_xstring(&std::vsnprintf, __n,
 	   "%Lf", __val);
   }
-#endif // _GLIBCXX_USE_C99_STDIO
+#endif // _GLIBCXX11_USE_C99_STDIO
 
-#if defined(_GLIBCXX_USE_WCHAR_T) && defined(_GLIBCXX_USE_C99_WCHAR)
+#if defined(_GLIBCXX_USE_WCHAR_T) && defined(_GLIBCXX11_USE_C99_WCHAR)
   inline int 
   stoi(const wstring& __str, size_t* __idx = 0, int __base = 10)
   { return __gnu_cxx::__stoa(&std::wcstol, "stoi", __str.c_str(),
@@ -5605,7 +5605,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 	L"%Lf", __val);
   }
 #endi

[PATCH] Fix possible correctness issue in BB dependence analysis

2015-11-12 Thread Richard Biener

This fixes BB vectorization dependence analysis to not rely on
all instances being vectorized.  The dependence check

-   gimple *earlier_stmt = get_earlier_stmt (DR_STMT (dra), DR_STMT 
(drb));
-   if (DR_IS_READ (STMT_VINFO_DATA_REF (vinfo_for_stmt (earlier_stmt
- {
-   /* That only holds for load-store pairs taking part in 
vectorization.  */
-   if (STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (dra)))
- && STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (drb
-   return false;

effectively does that and we remove instances later.  I wasn't able
to build a testcase showing this defect as we always fail vectorization
for some other reason.

Still the following fixes this and implements this special case
instance-local only.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-12  Richard Biener  

* tree-vectorizer.h (vect_slp_analyze_data_ref_dependences):
Rename to vect_slp_analyze_instance_dependence.
* tree-vect-data-refs.c (vect_slp_analyze_data_ref_dependence):
Remove WAR special-case.
(vect_slp_analyze_node_dependences): Instead add more specific
code here, not relying on other instances being vectorized.
(vect_slp_analyze_instance_dependence): Adjust accordingly.
* tree-vect-slp.c (vect_build_slp_tree_1): Remove excessive
vertical space in dump files.
(vect_print_slp_tree): Likewise.
(vect_analyze_slp_instance): Dump a header for the final SLP tree.
(vect_slp_analyze_bb_1): Delay computing relevant stmts and
not vectorized stmts until after dependence analysis removed
instances.  Merge alignment and dependence checks.
* tree-vectorizer.c (pass_slp_vectorize::execute): Clear visited
flag on all stmts.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 230216)
--- gcc/tree-vectorizer.h   (working copy)
*** extern enum dr_alignment_support vect_su
*** 1009,1015 
  extern tree vect_get_smallest_scalar_type (gimple *, HOST_WIDE_INT *,
 HOST_WIDE_INT *);
  extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
! extern bool vect_slp_analyze_data_ref_dependences (bb_vec_info);
  extern bool vect_enhance_data_refs_alignment (loop_vec_info);
  extern bool vect_analyze_data_refs_alignment (loop_vec_info);
  extern bool vect_verify_datarefs_alignment (loop_vec_info);
--- 1009,1015 
  extern tree vect_get_smallest_scalar_type (gimple *, HOST_WIDE_INT *,
 HOST_WIDE_INT *);
  extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
! extern bool vect_slp_analyze_instance_dependence (slp_instance);
  extern bool vect_enhance_data_refs_alignment (loop_vec_info);
  extern bool vect_analyze_data_refs_alignment (loop_vec_info);
  extern bool vect_verify_datarefs_alignment (loop_vec_info);
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 230216)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_slp_analyze_data_ref_dependence (st
*** 537,568 
dump_printf (MSG_NOTE,  "\n");
  }
  
-   /* We do not vectorize basic blocks with write-write dependencies.  */
-   if (DR_IS_WRITE (dra) && DR_IS_WRITE (drb))
- return true;
- 
-   /* If we have a read-write dependence check that the load is before the 
store.
-  When we vectorize basic blocks, vector load can be only before
-  corresponding scalar load, and vector store can be only after its
-  corresponding scalar store.  So the order of the acceses is preserved in
-  case the load is before the store.  */
-   gimple *earlier_stmt = get_earlier_stmt (DR_STMT (dra), DR_STMT (drb));
-   if (DR_IS_READ (STMT_VINFO_DATA_REF (vinfo_for_stmt (earlier_stmt
- {
-   /* That only holds for load-store pairs taking part in vectorization.  
*/
-   if (STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (dra)))
- && STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (drb
-   return false;
- }
- 
return true;
  }
  
  
! /* Analyze dependences involved in the transform of SLP NODE.  */
  
  static bool
! vect_slp_analyze_node_dependences (slp_instance instance, slp_tree node)
  {
/* This walks over all stmts involved in the SLP load/store done
   in NODE verifying we can sink them up to the last stmt in the
--- 559,575 
dump_printf (MSG_NOTE,  "\n");
  }
  
return true;
  }
  
  
! /* Analyze dependences involved in the transform of SLP NODE.  STORES
!contain the vector of scalar stores of this instance if we are
!disambiguating the loads.  */
  
  static bool
! vect_slp_analyze_node_dependences (slp_instance instance, slp_tree node,
!   

Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-12 Thread James Greenhalgh
On Tue, Nov 10, 2015 at 11:50:12AM -0600, Evandro Menezes wrote:
>2015-11-10  Evandro Menezes 
> 
>gcc/
> 
>* config/aarch64/aarch64.md (predicated): Copy attribute from
>"arm.md".
> 
> This patch duplicates an attribute from arm.md so that the same
> pipeline model can be used for both AArch32 and AArch64.
> 
> Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.
> 
> Please, commit if it's alright.
> 
> -- 
> Evandro Menezes
> 
> 

> From 3b643a3c026350864713e1700dc44e4794d93809 Mon Sep 17 00:00:00 2001
> From: Evandro Menezes 
> Date: Mon, 9 Nov 2015 17:11:16 -0600
> Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
>  pipeline models
> 
> gcc/
>   * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
> ---
>  gcc/config/aarch64/aarch64.md | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 6b08850..2bc2ff5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -195,6 +195,11 @@
>  ;; 1 :=: yes
>  (define_attr "far_branch" "" (const_int 0))
>  
> +;; [For compatibility with ARM in pipeline models]
> +;; Attribute that specifies whether or not the instruction is executed
> +;; conditionally ( != "AL"? "yes": "no").

I'm not sure this  != "AL" [...] part makes sense to me (thinking only
of AArch64, I'd understand it on AArch32 :) ) and we should document that
this is never set for AArch64. Could you respin with a slightly clearer
comment?

Thanks,
James



Re: [PATCH] New version of libmpx with new memmove wrapper

2015-11-12 Thread Ilya Enkovich
2015-11-05 13:37 GMT+03:00 Aleksandra Tsvetkova :
> A new version of libmpx was added. There is a new function get_bd() that
> returns the bounds directory. The wrapper for memmove was modified: it now
> moves the data and then moves the corresponding bounds directly from one
> bounds table to another. This approach makes moving unaligned pointers
> possible. It also makes the memmove function faster for sizes bigger than
> 64 bytes.

+2015-10-27  Tsvetkova Alexandra  
+
+ * gcc.target/i386/mpx/memmove.c: New test for __mpx_wrapper_memmove.
+

Did you test it on different targets? It seems to me this test will
fail if you run it on a non-MPX target.  Please look at mpx-check.h and how
other MPX run tests use it.

+ * mpxrt/mpxrt.c (NUM_L1_BITS): Moved to mpxrt.h.
+ * mpxrt/mpxrt.c (REG_IP_IDX): Moved to mpxrt.h.
+ * mpxrt/mpxrt.c (REX_PREFIX): Moved to mpxrt.h.
+ * mpxrt/mpxrt.c (XSAVE_OFFSET_IN_FPMEM): Moved to mpxrt.h.
+ * mpxrt/mpxrt.c (MPX_L1_SIZE): Moved to mpxrt.h.

No need to repeat file name.

+ * libmpxwrap/mpx_wrappers.c: Rewrite __mpx_wrapper_memmove to make it faster.

You added new functions and types, and modified an existing function.  Make
the ChangeLog more detailed.

--- /dev/null
+++ b/libmpx/mpxrt/mpxrt.h
@@ -0,0 +1,75 @@
+/* mpxrt.h  -*-C++-*-
+ *
+ *
+ *
+ *  @copyright
+ *  Copyright (C) 2014, 2015, Intel Corporation
+ *  All rights reserved.

2015 only

+const uintptr_t MPX_L1_ADDR_MASK = 0xf000UL;
+const uintptr_t MPX_L2_ADDR_MASK = 0xfffcUL;
+const uintptr_t MPX_L2_VALID_MASK = 0x0001UL;

Use defines


--- a/libmpx/mpxwrap/Makefile.am
+++ b/libmpx/mpxwrap/Makefile.am
@@ -1,4 +1,5 @@
 ALCLOCAL_AMFLAGS = -I .. -I ../config
+AM_CPPFLAGS = -I $(top_srcdir)

This is not reflected in ChangeLog

+/* The mpx_bt_entry struct represents a cell in bounds table.
+   *lb is the lower bound, *ub is the upper bound,
+   *p is the stored pointer.  */

Bounds and pointer are in lb, ub, p, not in *lb, *ub, *p. Right?

+static inline void
+alloc_bt (void *ptr)
+{
+  __asm__ __volatile__ ("bndstx %%bnd0, (%0,%0)"::"r" (ptr):"%bnd0");
+}

This should be marked as bnd_legacy.

+/* move_bounds function copies N bytes from SRC to DST.

Really?

+   It also copies bounds for all pointers inside.
+   There are 3 parts of the algorithm:
+   1) We copy everything till the end of the first bounds table SRC)

SRC is not a bounds table

+   2) In loop we copy whole bound tables till the second-last one
+   3) Data in the last bounds table is copied separately, after the loop.
+   If one of bound tables in SRC doesn't exist,
+   we skip it because there are no pointers.
+   Depending on the arrangement of SRC and DST we copy from the beginning
+   or from the end.  */
+__attribute__ ((bnd_legacy)) static void *
+move_bounds (void *dst, const void *src, size_t n)

What is the returned value for?
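
The last sentence of the quoted comment, about copying from the beginning or from the end, follows the usual rule for overlapping copies. A byte-level sketch of that rule is below; it is an analogue for illustration, not the bounds-table code from the patch, and overlap_copy is a hypothetical name.

```cpp
#include <cstddef>
#include <functional>

// Byte-wise overlapping copy: forwards when dst is below src, backwards
// otherwise, so source bytes are read before they are overwritten.
inline void *overlap_copy (void *dst, const void *src, std::size_t n)
{
  unsigned char *d = static_cast<unsigned char *> (dst);
  const unsigned char *s = static_cast<const unsigned char *> (src);
  std::less<const unsigned char *> before;
  if (before (d, s))
    for (std::size_t i = 0; i < n; ++i)
      d[i] = s[i];
  else if (before (s, d))
    for (std::size_t i = n; i-- > 0; )
      d[i] = s[i];
  return dst;
}
```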

+void *
+__mpx_wrapper_memmove (void *dst, const void *src, size_t n)
+{
+  if (n == 0)
+return dst;
+
+  __bnd_chk_ptr_bounds (dst, n);
+  __bnd_chk_ptr_bounds (src, n);
+
+  memmove (dst, src, n);
+  move_bounds (dst, src, n);
+  return dst;
 }

You completely removed the old algorithm, which should be faster for small
sizes. __mpx_wrapper_memmove should become a dispatcher between the old
and new implementations depending on the target (32-bit or 64-bit) and N.
Since the old version performs both the data and the bounds copy, the BD
check should be moved into __mpx_wrapper_memmove so that it is never called
when MPX is disabled.
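
A rough sketch of such a dispatcher is shown below. Everything in it is an assumption made for illustration: the 64-byte limit is taken from the submitter's description, the mpx_enabled flag stands in for the bounds-directory check, the target (32-bit or 64-bit) criterion is left out, and the bounds handling itself is omitted. The function only reports which path it would take.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical threshold; the thread only says the new code is faster for
// sizes bigger than 64 bytes.
const std::size_t small_copy_limit = 64;

enum copy_path { path_none, path_plain, path_old, path_new };

// Hypothetical dispatcher: copies the data and returns which bounds-copy
// path would be chosen, so the decision is observable.
inline copy_path
dispatch_memmove (void *dst, const void *src, std::size_t n, bool mpx_enabled)
{
  if (n == 0)
    return path_none;
  std::memmove (dst, src, n);
  if (!mpx_enabled)
    return path_plain;           // no bounds directory: data copy only
  return n <= small_copy_limit ? path_old : path_new;
}
```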

Thanks,
Ilya


Re: [PATCH] nvptx: implement automatic storage in custom stacks

2015-11-12 Thread Alexander Monakov
On Thu, 12 Nov 2015, Bernd Schmidt wrote:
> > I've run it through make -k check-c regtesting.  These are new fails, all
> > mysterious:
> 
> These would have to be investigated first.

Any specific suggestions?  The PTX code emitted from GCC differs only in
prologue/epilogue, so whatever's broken... I think is unlikely due to this
change.  I can give it another try after upgrading CUDA driver and cuda-gdb
from 7.0 to latest.

> > + sz = (sz + keep_align - 1) & ~(keep_align - 1);
> 
> Use the ROUND_UP macro.

OK, thanks.
 
> > + fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
> > +  bits == 64 ? ".wide" : "", bits);
> 
> Use a shift.

I think mul is acceptable here: PTX JIT is handling it properly, according to
what I saw while investigating in cuda-gdb.  If I used a shift, I'd also have
to introduce another instruction for a widening integer conversion in the
64-bit case.  Do you insist?
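
The trade-off can be seen in a small host-side sketch: a widening multiply does the 32-to-64-bit conversion and the scaling in one step, while the shift form needs the conversion spelled out first. The function names and the constant 8 are illustrative only and are not taken from the patch.

```cpp
#include <cstdint>

// Widening multiply: 32-bit operands, 64-bit product.
constexpr std::uint64_t scale_mul (std::uint32_t x, std::uint32_t k)
{
  return static_cast<std::uint64_t> (x) * k;
}

// Shift form: the widening conversion is a separate, explicit step.
constexpr std::uint64_t scale_shift (std::uint32_t x, unsigned log2k)
{
  return static_cast<std::uint64_t> (x) << log2k;
}

// Both forms agree for a power-of-two factor, even when the result no
// longer fits in 32 bits.
static_assert (scale_mul (0xffffffffu, 8) == scale_shift (0xffffffffu, 3),
	       "multiply and shift agree");
```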

> > +
> > +  if (need_softstack_decl)
> > +{
> > +  fprintf (asm_out_file, ".extern .shared .u64 __nvptx_stacks[];\n;");
> > +}
> 
> Lose excess braces.

OK.
 
> > +.global .u64 %__softstack[16384];
> 
> > Maybe declare as .u8 so you don't have two different constants for the stack
> size?

OK, with ".align 8" to ensure 64-bit alignment.
 
> > +.reg .u64 %stackptr;
> > +mov.u64%stackptr, %__softstack;
> > +cvta.global.u64%stackptr, %stackptr;
> > +add.u64%stackptr, %stackptr, 131072;
> > +st.shared.u64  [__nvptx_stacks], %stackptr;
> > +
> 
> I'm guessing you have other missing pieces for setting this up for multiple
> threads.

This is crt0.s, which is linked in only for single-threaded testing with
-mmainkernel; for OpenMP, the intention is to handle it in the file that
implements libgomp_nvptx_main.

Thanks.
Alexander


Re: [PATCH] nvptx: implement automatic storage in custom stacks

2015-11-12 Thread Bernd Schmidt

On 11/12/2015 03:59 PM, Alexander Monakov wrote:

> On Thu, 12 Nov 2015, Bernd Schmidt wrote:
> > > I've run it through make -k check-c regtesting.  These are new fails, all
> > > mysterious:
> >
> > These would have to be investigated first.
>
> Any specific suggestions?  The PTX code emitted from GCC differs only in
> prologue/epilogue, so whatever's broken... I think is unlikely due to this
> change.  I can give it another try after upgrading CUDA driver and cuda-gdb
> from 7.0 to latest.

Yeah, load it into cuda-gdb, that may help show what's happening.

> > > + fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
> > > +  bits == 64 ? ".wide" : "", bits);
> >
> > Use a shift.
>
> I think mul is acceptable here: PTX JIT is handling it properly, according to
> what I saw while investigating in cuda-gdb.  If I used a shift, I'd also have
> to introduce another instruction for a widening integer conversion in the
> 64-bit case.  Do you insist?

Nah, it's fine.

> This is crt0.s, which is linked in only for single-threaded testing with
> -mmainkernel; for OpenMP, the intention is to handle it in the file that
> implements libgomp_nvptx_main.


Yeah, that's what I meant. It might be nice to see that too if it 
already exists.



Bernd



[PATCH][GCC] Make stackalign test LTO proof

2015-11-12 Thread Andre Vieira

Hi,

  This patch changes this testcase to make sure LTO will not optimize 
away the assignment of the local array to a global variable which was 
introduced to make sure stack space was made available for the test to work.


  This is correct because LTO is supposed to optimize this global away 
as at link time it knows this global will never be read. By adding a 
read of the global, LTO will no longer optimize it away.


  Tested by running regressions for this testcase for various ARM targets.

  Is this OK to commit?

  Thanks,
  Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

* gcc.dg/torture/stackalign/builtin-return-1.c: Added read
  to global such that a write is not optimized away by LTO.
From 6fbac447475c3b669baee84aa9d6291c3d09f1ab Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 6 Nov 2015 13:13:47 +
Subject: [PATCH] keep the stack testsuite fix

---
 gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
index af017532aeb3878ef7ad717a2743661a87a56b7d..1ccd109256de72419a3c71c2c1be6d07c423c005 100644
--- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
@@ -39,5 +39,10 @@ int main(void)
   if (bar(1) != 2)
 abort();
 
+  /* Make sure there is a read of the global after the function call to bar
+   * such that LTO does not optimize away the assignment above.  */
+  if (g != dummy)
+abort();
+
   return 0;
 }
-- 
1.9.1



[PATCH][GCC][ARM] testcase memset-inline-10.c uses -mfloat-abi=hard but does not check whether target supports it

2015-11-12 Thread Andre Vieira

Hi,

  This patch changes the memset-inline-10.c testcase to make sure that 
it is only compiled for ARM targets that support -mfloat-abi=hard using 
the fact that all non-thumb1 targets do.


  This is correct because all targets for which -mthumb causes the 
compiler to use thumb2 will support the generation of FP instructions.


  Tested by running regressions for this testcase for various ARM targets.

  Is this OK to commit?

  Thanks,
  Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

* gcc.target/arm/memset-inline-10.c: Added
dg-require-effective-target arm_thumb2_ok.



Re: [PATCH][GCC][ARM] testcase memset-inline-10.c uses -mfloat-abi=hard but does not check whether target supports it

2015-11-12 Thread Andre Vieira

On 12/11/15 15:08, Andre Vieira wrote:

> Hi,
>
>    This patch changes the memset-inline-10.c testcase to make sure that
> it is only compiled for ARM targets that support -mfloat-abi=hard using
> the fact that all non-thumb1 targets do.
>
>    This is correct because all targets for which -mthumb causes the
> compiler to use thumb2 will support the generation of FP instructions.
>
>    Tested by running regressions for this testcase for various ARM targets.
>
>    Is this OK to commit?
>
>    Thanks,
>    Andre Vieira
>
> gcc/testsuite/ChangeLog:
> 2015-11-06  Andre Vieira
>
>  * gcc.target/arm/memset-inline-10.c: Added
>  dg-require-effective-target arm_thumb2_ok.


Now with attachment, sorry about that.

Cheers,
Andre
From f6515d9cceacf99d213aea1236b7027c7839ab4b Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 6 Nov 2015 14:48:27 +
Subject: [PATCH] added check for thumb2_ok

---
 gcc/testsuite/gcc.target/arm/memset-inline-10.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/memset-inline-10.c b/gcc/testsuite/gcc.target/arm/memset-inline-10.c
index c1087c8e693fb723ca9396108f5fe872ede167e9..ce51c1d9eeb800cf67790fe06817ae23215399e9 100644
--- a/gcc/testsuite/gcc.target/arm/memset-inline-10.c
+++ b/gcc/testsuite/gcc.target/arm/memset-inline-10.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-march=armv7-a -mfloat-abi=hard -mfpu=neon -O2" } */
 /* { dg-skip-if "need SIMD instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
 /* { dg-skip-if "need SIMD instructions" { *-*-* } { "-mfpu=vfp*" } { "" } } */
-- 
1.9.1



[PATCH, doc] Document some standard pattern names

2015-11-12 Thread Ilya Enkovich
Hi,

This patch adds descriptions for several standard pattern names.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-12  Ilya Enkovich  

* doc/md.texi (vec_cmp@var{m}@var{n}): New item.
(vec_cmpu@var{m}@var{n}): New item.
(vcond@var{m}@var{n}): Specify comparison is signed.
(vcondu@var{m}@var{n}): New item.
(vcond_mask_@var{m}@var{n}): New item.
(maskload@var{m}@var{n}): New item.
(maskstore@var{m}@var{n}): New item.


diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 71a2791..7fdc935 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4749,17 +4749,51 @@ specify field index and operand 0 place to store value into.
 Initialize the vector to given values.  Operand 0 is the vector to initialize
 and operand 1 is parallel containing values for individual fields.
 
+@cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
+@item @samp{vec_cmp@var{m}@var{n}}
+Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
+predicate in operand 1 which is a signed vector comparison with operands of
+mode @var{m} in operands 2 and 3.  Predicate is computed by element-wise
+evaluation of the vector comparison with a truth value of all-ones and a false
+value of all-zeros.
+
+@cindex @code{vec_cmpu@var{m}@var{n}} instruction pattern
+@item @samp{vec_cmpu@var{m}@var{n}}
+Similar to @code{vec_cmp@var{m}@var{n}} but perform unsigned vector comparison.
+
 @cindex @code{vcond@var{m}@var{n}} instruction pattern
 @item @samp{vcond@var{m}@var{n}}
 Output a conditional vector move.  Operand 0 is the destination to
 receive a combination of operand 1 and operand 2, which are of mode @var{m},
-dependent on the outcome of the predicate in operand 3 which is a
+dependent on the outcome of the predicate in operand 3 which is a signed
 vector comparison with operands of mode @var{n} in operands 4 and 5.  The
 modes @var{m} and @var{n} should have the same size.  Operand 0
 will be set to the value @var{op1} & @var{msk} | @var{op2} & ~@var{msk}
 where @var{msk} is computed by element-wise evaluation of the vector
 comparison with a truth value of all-ones and a false value of all-zeros.
 
+@cindex @code{vcondu@var{m}@var{n}} instruction pattern
+@item @samp{vcondu@var{m}@var{n}}
+Similar to @code{vcond@var{m}@var{n}} but performs unsigned vector
+comparison.
+
+@cindex @code{vcond_mask_@var{m}@var{n}} instruction pattern
+@item @samp{vcond_mask_@var{m}@var{n}}
+Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
+result of vector comparison.
+
+@cindex @code{maskload@var{m}@var{n}} instruction pattern
+@item @samp{maskload@var{m}@var{n}}
+Perform a masked load of vector from memory operand 1 of mode @var{m}
+into register operand 0.  Mask is provided in register operand 2 of
+mode @var{n}.
+
+@cindex @code{maskstore@var{m}@var{n}} instruction pattern
+@item @samp{maskstore@var{m}@var{n}}
+Perform a masked store of vector from register operand 1 of mode @var{m}
+into memory operand 0.  Mask is provided in register operand 2 of
+mode @var{n}.
+
 @cindex @code{vec_perm@var{m}} instruction pattern
 @item @samp{vec_perm@var{m}}
 Output a (variable) vector permutation.  Operand 0 is the destination
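
To make the vcond wording above concrete, here is a small scalar model of the
mask semantics, written as a C++ sketch.  The four-lane vector type and the
function names are hypothetical and exist only for this illustration; they are
not GCC internals, and only a signed "less than" comparison is modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane vector of signed 32-bit elements.
using Vec4 = std::array<std::int32_t, 4>;

// Models a signed vector comparison (here "less than"): each lane of the
// result is all-ones when the comparison holds and all-zeros otherwise.
Vec4 vec_cmp_lt(const Vec4 &a, const Vec4 &b) {
  Vec4 msk{};
  for (std::size_t i = 0; i < msk.size(); ++i)
    msk[i] = a[i] < b[i] ? -1 : 0;
  return msk;
}

// Models the select step with a pre-computed mask:
// result = (op1 & msk) | (op2 & ~msk), lane by lane.
Vec4 vcond_mask(const Vec4 &op1, const Vec4 &op2, const Vec4 &msk) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (op1[i] & msk[i]) | (op2[i] & ~msk[i]);
  return r;
}

// Models the combined form: compare a and b, then select between op1 and op2.
Vec4 vcond_lt(const Vec4 &op1, const Vec4 &op2, const Vec4 &a, const Vec4 &b) {
  return vcond_mask(op1, op2, vec_cmp_lt(a, b));
}
```

Calling vcond_lt (a, b, a, b) in this model yields the lane-wise signed
minimum of a and b, which is one common use of the combined pattern.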


Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread David Malcolm
On Thu, 2015-11-12 at 15:06 +0100, Richard Biener wrote:
> On Thu, Nov 12, 2015 at 3:04 PM, Richard Biener
>  wrote:
> > On Thu, Nov 12, 2015 at 2:49 PM, Tom de Vries  
> > wrote:
> >> On 12/11/15 13:26, Richard Biener wrote:
> >>>
> >>> On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries 
> >>> wrote:
> 
>  Hi,
> 
>  [ See also related discussion at
>  https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]
> 
>  this patch removes the usage of first_pass_instance from pass_vrp.
> 
>  the patch:
>  - limits itself to pass_vrp, but my intention is to remove all
> usage of first_pass_instance
>  - lacks an update to gdbhooks.py
> 
>  Modifying the pass behaviour depending on the instance number, as
>  first_pass_instance does, break compositionality of the pass list. In
>  other
>  words, adding a pass instance in a pass list may change the behaviour of
>  another instance of that pass in the pass list. Which obviously makes it
>  harder to understand and change the pass list. [ I've filed this issue as
>  PR68247 - Remove pass_first_instance ]
> 
>  The solution is to make the difference in behaviour explicit in the pass
>  list, and no longer change behaviour depending on instance number.
> 
>  One obvious possible fix is to create a duplicate pass with a different
>  name, say 'pass_vrp_warn_array_bounds':
>  ...
> NEXT_PASS (pass_vrp_warn_array_bounds);
> ...
> NEXT_PASS (pass_vrp);
>  ...
> 
>  But, AFAIU that requires us to choose a different dump-file name for each
>  pass. And choosing vrp1 and vrp2 as new dump-file names still means that
>  -fdump-tree-vrp no longer works (which was mentioned as drawback here:
>  https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).
> 
>  This patch instead makes pass creation parameterizable. So in the pass
>  list,
>  we use:
>  ...
> NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
> ...
> NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
>  ...
> 
>  This approach gives us clarity in the pass list, similar to using a
>  duplicate pass 'pass_vrp_warn_array_bounds'.
> 
>  But it also means -fdump-tree-vrp still works as before.
> 
>  Good idea? Other comments?
> >>>
> >>>
> >>> It's good to get rid of the first_pass_instance hack.
> >>>
> >>> I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
> >>> we can just use NEXT_PASS with the extra argument being optional...
> >>
> >>
> >> I suppose I could use NEXT_PASS in the pass list, and expand into
> >> NEXT_PASS_WITH_ARG in pass-instances.def.
> >>
> >> An alternative would be to change the NEXT_PASS macro definitions into
> >> vararg variants. But the last time I submitted something with a vararg
> >> macro ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I got a
> >> question about it
> >> ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html ), so I tend to
> >> avoid using vararg macros.
> >>
> >>> I don't see the need for giving clone_with_args a new name, just use an
> >>> overload
> >>> of clone ()?
> >>
> >>
> >> That's what I tried initially, but I ran into:
> >> ...
> >> src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* opt_pass::clone()’
> >> was hidden [-Woverloaded-virtual]
> >>virtual opt_pass *clone ();
> >>  ^
> >> src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass*
> >> {anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
> >>opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp
> >> (m_ctxt, warn_array_bounds_p); }
> >> ...
> >>
> >> Googling the error message gives this discussion: (
> >> http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions
> >> ), and indeed adding
> >>   "using gimple_opt_pass::clone;"
> >> in class pass_vrp makes the warning disappear.
> >>
> >> I'll submit an updated version.
> >
> > Hmm, but actually the above means the pass does not expose the
> > non-argument clone
> > which is good!
> >
> > Or did you forget to add the virtual-with-arg variant to opt_pass?
> > That is, why's it
> > a virtual function in the first place?  (clone_with_arg)
> 
> That said,
> 
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -83,6 +83,7 @@ public:
> 
>   The default implementation prints an error message and aborts.  */
>virtual opt_pass *clone ();
> +  virtual opt_pass *clone_with_arg (bool);
> 
> 
> means the arg type is fixed at 'bool' (yeah, mimicking
> first_pass_instance).  That looks a bit limiting to me, but anyway.
> 
> Richard.
> 
> >> Thanks,
> >> - Tom
> >>
> >>
> >>> [ideally C++ would allow us to say that only one overload may be
> >>> implemented]
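
For readers unfamiliar with the -Woverloaded-virtual warning quoted above,
here is a minimal C++ sketch of the name hiding and of the using-declaration
fix.  The class names are hypothetical stand-ins, not GCC's opt_pass or
pass_vrp, and the functions return an integer tag instead of a new pass so
that the example stays self-contained.

```cpp
#include <cassert>

// Hypothetical stand-in for a pass base class.
struct base_pass {
  virtual ~base_pass() {}
  // No-argument variant; the return value tags which overload ran.
  virtual int clone() { return 0; }
};

// Declaring only clone(bool) in a derived class would hide base_pass::clone()
// for name lookup on the derived type, which is what -Woverloaded-virtual
// reports.  The using-declaration below makes both overloads visible again.
struct derived_pass : base_pass {
  using base_pass::clone;
  virtual int clone(bool warn_array_bounds_p) {
    return warn_array_bounds_p ? 1 : 2;
  }
};

// Exercises both overloads through the derived type.
inline void check_clone_overloads() {
  derived_pass p;
  assert(p.clone() == 0);       // inherited overload, visible via 'using'
  assert(p.clone(true) == 1);   // derived overload
  assert(p.clone(false) == 2);
}
```

Without the using-declaration, the call p.clone () in this sketch would fail
to compile, which matches the observation in the thread that the pass would
then no longer expose the no-argument clone.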

IIRC, the idea of the clone vfunc was to support state management of
passes: to allow all of the sibling passe

[PATCH] Enable libmpx by default on supported target

2015-11-12 Thread Ilya Enkovich
Hi,

libmpx was added close to the release date and was therefore disabled by
default for all targets.  This patch enables it by default on supported
targets.  Is it OK for trunk?

Thanks,
Ilya
--
2015-11-12  Tsvetkova Alexandra  

* configure.ac: Enable libmpx by default.
* configure: Regenerated.


diff --git a/configure.ac b/configure.ac
index cb6ca24..55f9ab0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -660,7 +660,7 @@ fi
 
 # Enable libmpx on supported systems by request.
 if test -d ${srcdir}/libmpx; then
-if test x$enable_libmpx = xyes; then
+if test x$enable_libmpx = x; then
AC_MSG_CHECKING([for libmpx support])
if (srcdir=${srcdir}/libmpx; \
. ${srcdir}/configure.tgt; \
@@ -671,8 +671,6 @@ if test -d ${srcdir}/libmpx; then
else
AC_MSG_RESULT([yes])
fi
-else
-   noconfigdirs="$noconfigdirs target-libmpx"
 fi
 fi
 

