[testsuite] Clean up effective_target cache

2015-08-25 Thread Christophe Lyon
Hi,

Some subsets of the tests override ALWAYS_CXXFLAGS or
TEST_ALWAYS_FLAGS and perform effective_target support tests using
these modified flags.

If these flags conflict with the effective_target tests, subsequent
tests will be UNSUPPORTED even though
ALWAYS_CXXFLAGS/TEST_ALWAYS_FLAGS have been reset and no longer
conflict.

In practice, we noticed this when running validation under 'ulimit -v
XXX', which can conflict with ASAN. We observed that sse2 and
stack_protector tests would randomly fail when tested from asan.exp,
making non-asan tests UNSUPPORTED.

This patch adds a new function 'clear_effective_target_cache', which
is called at the end of every .exp file which overrides
ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS.

I tested that it works well for asan.exp on x86_64; the changes in the
other .exp files seem mechanical.

However, I noticed that lib/g++.exp changes ALWAYS_CXXFLAGS but does
not appear to restore it. When in doubt, I left it unchanged.

OK?

Christophe.
2015-08-25  Christophe Lyon  

	* lib/target-supports.exp (clear_effective_target_cache): New.
	(check_cached_effective_target): Update et_prop_list.
	* lib/asan-dg.exp (asan_finish): Call clear_effective_target_cache.
	* g++.dg/compat/compat.exp: Likewise.
	* g++.dg/compat/struct-layout-1.exp: Likewise.
	* lib/asan-dg.exp: Likewise.
	* lib/atomic-dg.exp: Likewise.
	* lib/cilk-plus-dg.exp: Likewise.
	* lib/clearcap.exp: Likewise.
	* lib/mpx-dg.exp: Likewise.
	* lib/target-supports.exp: Likewise.
	* lib/tsan-dg.exp: Likewise.
	* lib/ubsan-dg.exp: Likewise.

diff --git a/gcc/testsuite/g++.dg/compat/compat.exp b/gcc/testsuite/g++.dg/compat/compat.exp
index 1272289..4c4b25f 100644
--- a/gcc/testsuite/g++.dg/compat/compat.exp
+++ b/gcc/testsuite/g++.dg/compat/compat.exp
@@ -78,6 +78,7 @@ proc compat-use-tst-compiler { } {
 	set ALWAYS_CXXFLAGS $save_always_cxxflags
 	set ld_library_path $save_ld_library_path
 	set_ld_library_path_env_vars
+	clear_effective_target_cache
 }
 }
 
diff --git a/gcc/testsuite/g++.dg/compat/struct-layout-1.exp b/gcc/testsuite/g++.dg/compat/struct-layout-1.exp
index d98..097a731 100644
--- a/gcc/testsuite/g++.dg/compat/struct-layout-1.exp
+++ b/gcc/testsuite/g++.dg/compat/struct-layout-1.exp
@@ -61,6 +61,7 @@ proc compat-use-alt-compiler { } {
 	set ld_library_path $alt_ld_library_path
 	set_ld_library_path_env_vars
 	restore_gcc_exec_prefix_env_var
+	clear_effective_target_cache
 }
 }
 
diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 141a479..3ce264e 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -138,6 +138,7 @@ proc asan_finish { args } {
 }
 set ld_library_path $asan_saved_library_path
 set_ld_library_path_env_vars
+clear_effective_target_cache
 }
 
 # Symbolize lines like
diff --git a/gcc/testsuite/lib/atomic-dg.exp b/gcc/testsuite/lib/atomic-dg.exp
index d9df227..fe24127 100644
--- a/gcc/testsuite/lib/atomic-dg.exp
+++ b/gcc/testsuite/lib/atomic-dg.exp
@@ -101,4 +101,5 @@ proc atomic_finish { args } {
 } else {
 	unset TEST_ALWAYS_FLAGS
 }
+clear_effective_target_cache
 }
diff --git a/gcc/testsuite/lib/cilk-plus-dg.exp b/gcc/testsuite/lib/cilk-plus-dg.exp
index 38e5400..7f38f37 100644
--- a/gcc/testsuite/lib/cilk-plus-dg.exp
+++ b/gcc/testsuite/lib/cilk-plus-dg.exp
@@ -101,4 +101,5 @@ proc cilkplus_finish { args } {
 } else {
 	unset TEST_ALWAYS_FLAGS
 }
+clear_effective_target_cache
 }
diff --git a/gcc/testsuite/lib/clearcap.exp b/gcc/testsuite/lib/clearcap.exp
index d41aa1e..3e2a88c 100644
--- a/gcc/testsuite/lib/clearcap.exp
+++ b/gcc/testsuite/lib/clearcap.exp
@@ -55,4 +55,5 @@ proc clearcap-finish { args } {
 } else {
 	unset TEST_ALWAYS_FLAGS
 }
+clear_effective_target_cache
 }
diff --git a/gcc/testsuite/lib/mpx-dg.exp b/gcc/testsuite/lib/mpx-dg.exp
index c8f64cd..b2bd40c 100644
--- a/gcc/testsuite/lib/mpx-dg.exp
+++ b/gcc/testsuite/lib/mpx-dg.exp
@@ -142,4 +142,5 @@ proc mpx_finish { args } {
 }
 set ld_library_path $mpx_saved_library_path
 set_ld_library_path_env_vars
+clear_effective_target_cache
 }
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1988301..e2084bb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -117,6 +117,7 @@ proc current_target_name { } {
 
 proc check_cached_effective_target { prop args } {
 global et_cache
+global et_prop_list
 
 set target [current_target_name]
 if {![info exists et_cache($prop,target)]
@@ -124,12 +125,30 @@ proc check_cached_effective_target { prop args } {
 	verbose "check_cached_effective_target $prop: checking $target" 2
 	set et_cache($prop,target) $target
 	set et_cache($prop,value) [uplevel eval $args]
+	lappend et_prop_list $prop
+	verbose "check_cached_effective_target cached list is now: $et_prop_list" 2
 }
 set value $et_cache($prop,value)
 verbose "check_cached_effective_targe

Re: [AArch64] [TLSIE][1/2] Rename test source file for reuse

2015-08-25 Thread Marcus Shawcroft
On 19 June 2015 at 10:15, Jiong Wang  wrote:
>
> Rename test source from tlsle.c into tls.c for reuse purpose.
>
> tls.c will be used as test source file for all TLS test, we just need to
> specify different tls options in different testcases.
>
> 2015-06-19  Jiong Wang  
>
> gcc/testsuite/
>   * gcc.target/aarch64/tlsle.c: Rename to tls.c
>   * gcc.target/aarch64/aarch64/tlsle12.c: Update source file name.
>   * gcc.target/aarch64/aarch64/tlsle24.c: Ditto.
>   * gcc.target/aarch64/aarch64/tlsle32.c: Ditto.

OK Thanks /Marcus


[PATCH, ARM] List Cs and US constraints as being used

2015-08-25 Thread Thomas Preud'homme
Hi,

The header in gcc/config/arm/constraints.md lists all the ARM-specific
constraints defined and the targets for which they are used, but misses a
couple of them. This patch adds the missing Cs and US constraints to the list.

The patch was tested by verifying that the arm-none-eabi-gcc cross-compiler
can still be built (i.e. the comment remains a comment).

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 42935a4..2d9ffb8 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -21,7 +21,7 @@
 ;; The following register constraints have been used:
 ;; - in ARM/Thumb-2 state: t, w, x, y, z
 ;; - in Thumb state: h, b
-;; - in both states: l, c, k, q, US
+;; - in both states: l, c, k, q, Cs, Ts, US
 ;; In ARM state, 'l' is an alias for 'r'
 ;; 'f' and 'v' were previously used for FPA and MAVERICK registers.


Committed as obvious with the following ChangeLog entry:

2015-08-25  Thomas Preud'homme  

* config/arm/constraints.md: Also list Cs and US ARM-specific
constraints as used.

Best regards,

Thomas




[PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-08-25 Thread Andre Vieira
Conditional branches have a maximum range of [-1048576, 1048572]. Any
destination further away cannot be reached by these.
To be able to have conditional branches in very large functions, we
invert the condition and change the destination to jump over an
unconditional branch to the original, far away, destination.


gcc/ChangeLog:
2015-08-07  Ramana Radhakrishnan  
Andre Vieira  

* config/aarch64/aarch64.md (*condjump): Handle functions > 1
Mib.
(*cb1): Idem.
(*tb1): Idem.
(*cb1): Idem.
* config/aarch64/iterators.md (inv_cb): New code attribute.
(inv_tb): Idem.
* config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.

gcc/testsuite/ChangeLog:
2015-08-07  Andre Vieira  

* gcc.target/aarch64/long-branch.c: New test.
From 9759c5a50c44b0421c7911014e63a6222dd9017d Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 14 Aug 2015 10:21:57 +0100
Subject: [PATCH] fix for far branches

---
 gcc/config/aarch64/aarch64-protos.h|   1 +
 gcc/config/aarch64/aarch64.c   |  23 +
 gcc/config/aarch64/aarch64.md  |  89 +++-
 gcc/config/aarch64/iterators.md|   6 +
 gcc/testsuite/gcc.target/aarch64/long_branch.c | 565 +
 5 files changed, 669 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/long_branch.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 32b5d0958a6e0b2356874736f858f007fe68cdda..87a26deb6a0dbf13e25275baeebec21a37a42f41 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -316,6 +316,7 @@ unsigned aarch64_trampoline_size (void);
 void aarch64_asm_output_labelref (FILE *, const char *);
 void aarch64_cpu_cpp_builtins (cpp_reader *);
 void aarch64_elf_asm_named_section (const char *, unsigned, tree);
+const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
 void aarch64_err_no_fpadvsimd (machine_mode, const char *);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7159f5aca5df97f154b3e654f60af9136354f335..3b491a232d81892b6511bad84e4174d939fa1be7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -587,6 +587,29 @@ static const char * const aarch64_condition_codes[] =
   "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
 };
 
+/* Generate code to enable conditional branches in functions over 1 MiB.  */
+const char *
+aarch64_gen_far_branch (rtx * operands, int pos_label, const char * dest,
+			const char * branch_format)
+{
+rtx_code_label * tmp_label = gen_label_rtx ();
+char label_buf[256];
+char buffer[128];
+ASM_GENERATE_INTERNAL_LABEL (label_buf, dest,
+ CODE_LABEL_NUMBER (tmp_label));
+const char *label_ptr = targetm.strip_name_encoding (label_buf);
+rtx dest_label = operands[pos_label];
+operands[pos_label] = tmp_label;
+
+snprintf (buffer, sizeof (buffer), "%s%s", branch_format, label_ptr);
+output_asm_insn (buffer, operands);
+
+snprintf (buffer, sizeof (buffer), "b\t%%l%d\n%s:", pos_label, label_ptr);
+operands[pos_label] = dest_label;
+output_asm_insn (buffer, operands);
+return "";
+}
+
 void
 aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 35255e91a95cdf20d52270470202f7499ba46bb2..74f6e3ec4bdcd076c2ad6d1431102aa50dfb5068 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -181,6 +181,13 @@
 	 (const_string "no")
 	] (const_string "yes")))
 
+;; Attribute that specifies whether we are dealing with a branch to a
+;; label that is far away, i.e. further away than the maximum/minimum
+;; representable in a signed 21-bits number.
+;; 0 :=: no
+;; 1 :=: yes
+(define_attr "far_branch" "" (const_int 0))
+
 ;; ---
 ;; Pipeline descriptions and scheduling
 ;; ---
@@ -308,8 +315,23 @@
 			   (label_ref (match_operand 2 "" ""))
 			   (pc)))]
   ""
-  "b%m0\\t%l2"
-  [(set_attr "type" "branch")]
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 2, "Lbcond", "b%M0\\t");
+else
+  return  "b%m0\\t%l2";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+		  (const_int 4)
+		  (const_int 8)))
+   (set (attr "far_branch")
+	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+		  (const_int 0)
+		

Re: [AArch64] [TLSIE][2/2] Implement TLS IE for tiny model

2015-08-25 Thread Marcus Shawcroft
On 19 June 2015 at 10:15, Jiong Wang  wrote:
>
> Currently, TLS IE is supported on the small model only. This patch implements
> TLS Initial-exec model support for the AArch64 tiny memory model.
>
> Under the tiny model, we only allow a 1M loadable segment size, so a single ldr
> instruction is enough for addressing the got entry for TLS IE directly.
>
> The code sequence is:
>
> A:  mrs  tp, tpidr_el0
> B0: ldr  t0, :gottprel:x1  R_AARCH64_TLSIE_LD_GOTTPREL_PREL19  x1
> B1: add  t0, t0, tp
>
> B0 and B1 should not be scheduled, as the pattern will be recognized
> later for linker IE model to LE model optimization.
>
> 2015-06-19  Marcus Shawcroft 
> Jiong Wang  
>
> gcc/
>   * config/aarch64/aarch64.md (UNSPEC_GOTTINYTLS): New UNSPEC.
>   (tlsie_tiny_): New define_insn.
>   (tlsie_tiny_sidi): Ditto.
>   * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Define
>   SYMBOL_TINY_TLSIE.
>   (aarch64_symbol_context): New comment for SYMBOL_TINY_TLSIE.
>   * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Support
>   SYMBOL_TINY_TLSIE.
>   (aarch64_expand_mov_immediate): Ditto.
>   (aarch64_print_operand): Ditto.
>   (arch64_classify_tls_symbol): Ditto.
>
> gcc/testsuite/
>   * gcc.target/aarch64/tlsie_tiny.c: New testcase.
>
> --
> Regards,
> Jiong
>

OK /Marcus


Re: [AArch64][TLSLE][2/3] Add the option "-mtls-size" for AArch64

2015-08-25 Thread Marcus Shawcroft
> 2015-08-19  Jiong Wang  
>
> gcc/
>   * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Rename
>   SYMBOL_TLSLE to SYMBOL_TLSLE24.
>   * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Likewise
>   (aarch64_expand_mov_immediate): Likewise
>   (aarch64_print_operand): Likewise
>   (aarch64_classify_symbol): Likewise
>

OK /Marcus


[PATCH, PR 57195] Allow mode iterators inside angle brackets

2015-08-25 Thread Michael Collison
This patch allows mode iterators inside angle brackets in machine
description files. I discovered the issue when attempting to use
iterators on match_operands as follows:


match_operand: 0 "s_register_operand" "=w")

The function 'read_name' does not properly handle ':' inside angle brackets.

Bootstrapped on arm-linux.

OK for trunk?

2015-08-25  Michael Collison  

PR other/57195
* read-md.c (read_name): Allow mode iterators inside angle
brackets in rtl expressions.

diff --git a/gcc/read-md.c b/gcc/read-md.c
index 9f158ec..0171fb0 100644
--- a/gcc/read-md.c
+++ b/gcc/read-md.c
@@ -399,17 +399,25 @@ read_name (struct md_name *name)
 {
   int c;
   size_t i;
+  bool in_angle_bracket;

   c = read_skip_spaces ();

   i = 0;
+  in_angle_bracket = false;
   while (1)
 {
+  if (c == '<')
+in_angle_bracket = true;
+
+  if (c == '>')
+in_angle_bracket = false;
+
   if (c == ' ' || c == '\n' || c == '\t' || c == '\f' || c == '\r'
   || c == EOF)
 break;
-  if (c == ':' || c == ')' || c == ']' || c == '"' || c == '/'
-  || c == '(' || c == '[')
+  if (((c == ':') and (!in_angle_bracket)) || c == ')' || c == ']'
+  || c == '"' || c == '/' || c == '(' || c == '[')
 {
   unread_char (c);
   break;

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [AArch64][TLSLE][3/3] Implement local executable mode for all memory model

2015-08-25 Thread Marcus Shawcroft
> 2015-08-19  Marcus Shawcroft  
> Jiong Wang  
> gcc/
>   * config/aarch64/aarch64.c (initialize_aarch64_tls_size): Set default
>   tls size for tiny, small, large memory model.
>   (aarch64_load_symref_appropriately): Support new symbol types.
>   (aarch64_expand_mov_immediate): Likewise.
>   (aarch64_print_operand): Likewise.
>   (aarch64_classify_tls_symbol): Likewise.
>   * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Likewise.
>   (aarch64_symbol_type): Likewise.
>   * config/aarch64/aarch64.md (tlsle): Deleted.
>   (tlsle12_): New define_insn.
>   (tlsle24_): Likewise.
>   (tlsle32_): Likewise.
>   (tlsle48_): Likewise.
>   * doc/sourcebuild.texi (AArch64-specific attributes): Document
>   "aarch64_tlsle32".
>
> gcc/testsuite/
>   * lib/target-supports.exp (check_effective_target_aarch64_tlsle32):
>   New test directive.
>   * gcc.target/aarch64/tlsle_1.x: New test source.
>   * gcc.target/aarch64/tlsle12.c: New testcase.
>   * gcc.target/aarch64/tlsle24.c: New testcase.
>   * gcc.target/aarch64/tlsle32.c: New testcase.
> --
>
OK /Marcus


Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-08-25 Thread Andrew Pinski
On Tue, Aug 25, 2015 at 5:37 PM, Andre Vieira
 wrote:
> Conditional branches have a maximum range of [-1048576, 1048572]. Any
> destination further away can not be reached by these.
> To be able to have conditional branches in very large functions, we invert
> the condition and change the destination to jump over an unconditional
> branch to the original, far away, destination.
>
> gcc/ChangeLog:
> 2015-08-07  Ramana Radhakrishnan  
> Andre Vieira  
>
> * config/aarch64/aarch64.md (*condjump): Handle functions > 1
> Mib.
> (*cb1): Idem.
> (*tb1): Idem.
> (*cb1): Idem.
> * config/aarch64/iterators.md (inv_cb): New code attribute.
> (inv_tb): Idem.
> * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
> * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.
>
> gcc/testsuite/ChangeLog:
> 2015-08-07  Andre Vieira  
>
> * gcc.target/aarch64/long-branch.c: New test.

Just a few comments about the testcase.  You could improve the size
(on disk) of the testcase by using the preprocessor some more:
Something like:
#define CASE_ENTRY2 (x) CASE_ENTRY ((x)) CASE_ENTRY ((x)+1)
#define CASE_ENTRY4 (x) CASE_ENTRY2 ((x)) CASE_ENTRY2 ((x)+2+1)
#define CASE_ENTRY8 (x) CASE_ENTRY4 ((x)) CASE_ENTRY4 ((x)+4+1)
#define CASE_ENTRY16 (x) CASE_ENTRY8 ((x)) CASE_ENTRY8 ((x)+8+1)
#define CASE_ENTRY32 (x) CASE_ENTRY16 ((x)) CASE_ENTRY16 ((x)+16)
#define CASE_ENTRY64 (x) CASE_ENTRY32 ((x)) CASE_ENTRY32 ((x)+32+1)
#define CASE_ENTRY128 (x) CASE_ENTRY64 ((x)) CASE_ENTRY16 ((x)+64+1)
#define CASE_ENTRY256 (x) CASE_ENTRY128 ((x)) CASE_ENTRY128 ((x)+128+1)

And then use
CASE_ENTRY256 (1)

You can do the same trick to reduce the size of CASE_ENTRY too.

Thanks,
Andrew Pinski


Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-08-25 Thread Andrew Pinski
On Tue, Aug 25, 2015 at 5:50 PM, Andrew Pinski  wrote:
> On Tue, Aug 25, 2015 at 5:37 PM, Andre Vieira
>  wrote:
>> Conditional branches have a maximum range of [-1048576, 1048572]. Any
>> destination further away can not be reached by these.
>> To be able to have conditional branches in very large functions, we invert
>> the condition and change the destination to jump over an unconditional
>> branch to the original, far away, destination.
>>
>> gcc/ChangeLog:
>> 2015-08-07  Ramana Radhakrishnan  
>> Andre Vieira  
>>
>> * config/aarch64/aarch64.md (*condjump): Handle functions > 1
>> Mib.
>> (*cb1): Idem.
>> (*tb1): Idem.
>> (*cb1): Idem.
>> * config/aarch64/iterators.md (inv_cb): New code attribute.
>> (inv_tb): Idem.
>> * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
>> * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.
>>
>> gcc/testsuite/ChangeLog:
>> 2015-08-07  Andre Vieira  
>>
>> * gcc.target/aarch64/long-branch.c: New test.
>
> Just a few comments about the testcase.  You could improve the size
> (on disk) of the testcase by using the preprocessor some more:
> Something like:
> #define CASE_ENTRY2 (x) CASE_ENTRY ((x)) CASE_ENTRY ((x)+1)
> #define CASE_ENTRY4 (x) CASE_ENTRY2 ((x)) CASE_ENTRY2 ((x)+2+1)
> #define CASE_ENTRY8 (x) CASE_ENTRY4 ((x)) CASE_ENTRY4 ((x)+4+1)
> #define CASE_ENTRY16 (x) CASE_ENTRY8 ((x)) CASE_ENTRY8 ((x)+8+1)
> #define CASE_ENTRY32 (x) CASE_ENTRY16 ((x)) CASE_ENTRY16 ((x)+16)
> #define CASE_ENTRY64 (x) CASE_ENTRY32 ((x)) CASE_ENTRY32 ((x)+32+1)
> #define CASE_ENTRY128 (x) CASE_ENTRY64 ((x)) CASE_ENTRY16 ((x)+64+1)
> #define CASE_ENTRY256 (x) CASE_ENTRY128 ((x)) CASE_ENTRY128 ((x)+128+1)


I do have an off-by-one error but you should get the idea.  Basically
instead of 200 lines, we only have 9 lines (log2(256) == 8).

Thanks,
Andrew

>
> And then use
> CASE_ENTRY256 (1)
>
> You can do the same trick to reduce the size of CASE_ENTRY too.
>
> Thanks,
> Andrew Pinski


[PATCH, PR other/67320] Fix wide add standard names

2015-08-25 Thread Michael Collison
The standard names for signed and unsigned vector wide adds are wrong in 
the documentation.


OK for trunk?

2015-08-25  Michael Collison  

PR other/67320
* doc/md.texi: Rename [su]sum_widen to widen_[su]sum to reflect the
correct standard names.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 0bffdc6..619259f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4946,10 +4946,10 @@ is of a wider mode, is computed and added to 
operand 3. Operand 3 is of a mode
 equal or wider than the mode of the absolute difference. The result is 
placed

 in operand 0, which is of the same mode as operand 3.

-@cindex @code{ssum_widen@var{m3}} instruction pattern
-@item @samp{ssum_widen@var{m3}}
-@cindex @code{usum_widen@var{m3}} instruction pattern
-@itemx @samp{usum_widen@var{m3}}
+@cindex @code{widen_ssum@var{m3}} instruction pattern
+@item @samp{widen_ssum@var{m3}}
+@cindex @code{widen_usum@var{m3}} instruction pattern
+@itemx @samp{widen_usum@var{m3}}
 Operands 0 and 2 are of the same mode, which is wider than the mode of
 operand 1. Add operand 1 to operand 2 and place the widened result in
 operand 0. (This is used express accumulation of elements into an 
accumulator


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



Re: [AArch64][TLSLE][1/3] Add the option "-mtls-size" for AArch64

2015-08-25 Thread Marcus Shawcroft
On 19 August 2015 at 15:26, Jiong Wang  wrote:

> 2015-08-19  Jiong Wang  
>
> gcc/
>   * config/aarch64/aarch64.opt (mtls-size): New entry.
>   * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
>   (aarch64_override_options_internal): Call initialize_aarch64_tls_size.
>   * doc/invoke.texi (AArch64 Options): Document -mtls-size.
>
> --
> Regards,
> Jiong
>

+case AARCH64_CMODEL_TINY:
+  /* The maximum TLS size allowed under tiny is 1M.  */
+  if (aarch64_tls_size > 20)
+ aarch64_tls_size = 20;

The only valid values of aarch64_tls_size handled/expected by the
remainder of the patch set are 12, 24, 32 and 48, so setting the value
to 20 here doesn't make sense.

/Marcus


[PATCH] Fix PR67306

2015-08-25 Thread Richard Biener

The following fixes ICEs due to the genmatch generated code for
GENERIC not verifying if builtin_decl_implicit returns non-NULL.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-08-25  Richard Biener  

PR middle-end/67306
* genmatch.c (expr::gen_transform): Verify the result of
builtin_decl_implicit.
(dt_simplify::gen_1): Likewise.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 227058)
+++ gcc/genmatch.c  (working copy)
@@ -2177,11 +2216,19 @@ expr::gen_transform (FILE *f, int indent
fprintf_indent (f, indent, "res = fold_build%d_loc (loc, %s, %s",
ops.length(), opr_name, type);
   else
-   fprintf_indent (f, indent, "res = build_call_expr_loc (loc, "
-   "builtin_decl_implicit (%s), %d", opr_name, 
ops.length());
+   {
+ fprintf_indent (f, indent, "{\n");
+ fprintf_indent (f, indent, "  tree decl = builtin_decl_implicit 
(%s);\n",
+ opr_name);
+ fprintf_indent (f, indent, "  if (!decl) return NULL_TREE;\n");
+ fprintf_indent (f, indent, "  res = build_call_expr_loc (loc, "
+ "decl, %d", ops.length());
+   }
   for (unsigned i = 0; i < ops.length (); ++i)
fprintf (f, ", ops%d[%u]", depth, i);
   fprintf (f, ");\n");
+  if (opr->kind != id_base::CODE)
+   fprintf_indent (f, indent, "}\n");
   if (*opr == CONVERT_EXPR)
{
  indent -= 2;
@@ -3069,13 +3147,24 @@ dt_simplify::gen_1 (FILE *f, int indent,
*e->operation == CONVERT_EXPR
? "NOP_EXPR" : e->operation->id);
  else
-   fprintf_indent (f, indent,
-   "res = build_call_expr_loc "
-   "(loc, builtin_decl_implicit (%s), %d",
-   e->operation->id, e->ops.length());
+   {
+ fprintf_indent (f, indent,
+ "{\n");
+ fprintf_indent (f, indent,
+ "  tree decl = builtin_decl_implicit 
(%s);\n",
+ e->operation->id);
+ fprintf_indent (f, indent,
+ "  if (!decl) return NULL_TREE;\n");
+ fprintf_indent (f, indent,
+ "  res = build_call_expr_loc "
+ "(loc, decl, %d",
+ e->ops.length());
+   }
  for (unsigned j = 0; j < e->ops.length (); ++j)
fprintf (f, ", res_op%d", j);
  fprintf (f, ");\n");
+ if (!is_a  (opr))
+   fprintf_indent (f, indent, "}\n");
}
}
}


Re: [PATCH 12/15][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix

2015-08-25 Thread Alan Lawrence
James Greenhalgh wrote:
>>
>> -  VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
>> +  VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf)
>
> Should this not use the appropriate "BUILTIN_..." iterator?

Indeed; BUILTIN_VQ_HSF it is.

>>VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
>>VAR1 (BINOP, float_truncate_hi_, 0, v8hf)

I could also use BUILTIN_VQ_HSF here (these two were added in a previous patch, 
before the VQ_HSF iterator was introduced). However, that goes against the 
principle that we should use the same iterator as the pattern (the pattern uses 
the  attribute of the VDF iterator), so I'm not sure whether that would 
be preferable (i.e. as a separate patch)?

>> -  VAR1 (UNOP, float_extend_lo_, 0, v2df)
>> +  VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf)
>
> Likewise.

Similarly, the required iterator does not exist, as float_extend_lo_ is named
after the  attribute of the VDF iterator. The nearest equivalents I can
see use two VAR1's rather than a VAR2, so I've updated the patch to do that too.

OK with those two changes? (patch attached and bootstrapped+check-gcc on 
aarch64-none-linux-gnu)

Thanks, Alan
---
 gcc/config/aarch64/aarch64-simd-builtins.def |  3 +-
 gcc/config/aarch64/aarch64-simd.md   | 63 ++--
 gcc/config/aarch64/arm_neon.h| 16 +--
 gcc/config/aarch64/iterators.md  | 18 +---
 4 files changed, 69 insertions(+), 31 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index c5b46aa..2c13cfb 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -361,11 +361,12 @@
   BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
   BUILTIN_VDQF (UNOP, abs, 2)
 
-  VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
+  BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
   VAR1 (BINOP, float_truncate_hi_, 0, v8hf)
 
   VAR1 (UNOP, float_extend_lo_, 0, v2df)
+  VAR1 (UNOP, float_extend_lo_,  0, v4sf)
   BUILTIN_VDF (UNOP, float_truncate_lo_, 0)
 
   /* Implemented by aarch64_ld1.  */
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index f8754cd..160acf9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1691,36 +1691,57 @@
 
 ;; Float widening operations.
 
-(define_insn "vec_unpacks_lo_v4sf"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-   (float_extend:V2DF
- (vec_select:V2SF
-   (match_operand:V4SF 1 "register_operand" "w")
-   (parallel [(const_int 0) (const_int 1)])
- )))]
+(define_insn "aarch64_simd_vec_unpacks_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+(float_extend: (vec_select:
+  (match_operand:VQ_HSF 1 "register_operand" "w")
+  (match_operand:VQ_HSF 2 "vect_par_cnst_lo_half" 
"")
+   )))]
   "TARGET_SIMD"
-  "fcvtl\\t%0.2d, %1.2s"
+  "fcvtl\\t%0., %1."
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "aarch64_float_extend_lo_v2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-   (float_extend:V2DF
- (match_operand:V2SF 1 "register_operand" "w")))]
+(define_expand "vec_unpacks_lo_"
+  [(match_operand: 0 "register_operand" "")
+   (match_operand:VQ_HSF 1 "register_operand" "")]
   "TARGET_SIMD"
-  "fcvtl\\t%0.2d, %1.2s"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+emit_insn (gen_aarch64_simd_vec_unpacks_lo_ (operands[0],
+  operands[1], p));
+DONE;
+  }
+)
+
+(define_insn "aarch64_simd_vec_unpacks_hi_"
+  [(set (match_operand: 0 "register_operand" "=w")
+(float_extend: (vec_select:
+  (match_operand:VQ_HSF 1 "register_operand" "w")
+  (match_operand:VQ_HSF 2 "vect_par_cnst_hi_half" 
"")
+   )))]
+  "TARGET_SIMD"
+  "fcvtl2\\t%0., %1."
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "vec_unpacks_hi_v4sf"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-   (float_extend:V2DF
- (vec_select:V2SF
-   (match_operand:V4SF 1 "register_operand" "w")
-   (parallel [(const_int 2) (const_int 3)])
- )))]
+(define_expand "vec_unpacks_hi_"
+  [(match_operand: 0 "register_operand" "")
+   (match_operand:VQ_HSF 1 "register_operand" "")]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, true);
+emit_insn (gen_aarch64_simd_vec_unpacks_lo_ (operands[0],
+  operands[1], p));
+DONE;
+  }
+)
+(define_insn "aarch64_float_extend_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (float_extend:
+ (match_operand:VDF 1 "register_operand" "w")))]
   "TARGET_SIMD"
-  "fcvtl2\\t%0.2d, %1.4s"
+  "fcvtl\\t%0, %1"
   [(set_attr "type" "neon_fp_c

[PATCH 0/5][tree-sra.c] PR/63679 Make SRA replace constant pool loads

2015-08-25 Thread Alan Lawrence
ssa-dom-cse-2.c fails on a number of platforms because the input array is pushed
out to the constant pool, preventing later stages from folding away the entire
computation. This patch series fixes the failure by extending SRA to pull the
constants back in.

This is my first patch(set) to SRA and as such I'd appreciate suggestions about
the approach. I think the first two patches, which essentially just extend SRA
to deal with ARRAY_TYPE as well as RECORD_TYPE, are fairly straightforward and
may stand alone. Later patches, in particular, may be better done in a different
way and I'd welcome feedback as to what a patch (series) should look like.

Also, the heuristic for controlling SRA when dealing with constant-pool loads
may want something better/other than the default
--param sra-max-scalarization-size-O{speed,size}, or else platforms where the
initializer is forced to memory will still suffer in terms of constant
propagation.



[RFC 4/5] Handle constant-pool entries

2015-08-25 Thread Alan Lawrence
This makes SRA replace loads of records/arrays from constant pool entries
with elementwise assignments of the constant values, hence overcoming the
fundamental problem in PR/63679.

As a first pass, the approach I took was to look for constant-pool loads as
we scanned through other accesses, and add them as candidates there; to build a
constant replacement_decl for any such accesses in completely_scalarize; and to
use any existing replacement_decl rather than creating a variable in
create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but
that does not allow addresses of labels, which can still end up in the constant
pool.)

Feedback as to the approach, or how it might be better structured or fitted
into SRA, is solicited ;).

Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly 
increases the number of testcases in which this code is exercised!

Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a 
stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, 
and sh.

gcc/ChangeLog:

* tree-sra.c (create_access): Scan for uses of constant pool and add
to candidates.
(subst_initial): New.
(scalarize_elem): Build replacement_decl using subst_initial.
(create_access_replacement): Use replacement_decl if set.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
sra-max-scalarization-size-Ospeed.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
 gcc/tree-sra.c| 56 +--
 2 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 9eccdc9..b13d583 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param 
sra-max-scalarization-size-Ospeed=32" } */
 
 int
 foo ()
@@ -17,7 +17,4 @@ foo ()
 /* After late unrolling the above loop completely DOM should be
able to optimize this to return 28.  */
 
-/* See PR63679 and PR64159, if the target forces the initializer to memory then
-   DOM is not able to perform this optimization.  */
-
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index af35fcc..a3ff2df 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
   else
 ptr = false;
 
+  /* FORNOW: scan for uses of constant pool as we go along.  */
+  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
+  && !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
+{
+  gcc_assert (!write);
+  bitmap_set_bit (candidate_bitmap, DECL_UID (base));
+      tree_node **slot = candidates->find_slot_with_hash (base, DECL_UID (base),
+							   INSERT);
+  *slot = base;
+}
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
 return NULL;
 
@@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 }
 }
 
+static tree
+subst_initial (tree expr, tree var)
+{
+  if (TREE_CODE (expr) == VAR_DECL)
+{
+  gcc_assert (DECL_IN_CONSTANT_POOL (expr));
+  gcc_assert (expr == var);
+  return DECL_INITIAL (expr);
+}
+  if (TREE_CODE (expr) == COMPONENT_REF)
+{
+  gcc_assert (TREE_CODE (TREE_OPERAND (expr, 1)) == FIELD_DECL);
+  gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+  return fold_build3 (COMPONENT_REF, TREE_TYPE (expr),
+ subst_initial (TREE_OPERAND (expr, 0), var),
+ TREE_OPERAND (expr, 1),
+ NULL_TREE);
+}
+  if (TREE_CODE (expr) == ARRAY_REF)
+{
+  gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+  gcc_assert (TREE_OPERAND (expr, 3) == NULL_TREE);
+  return fold (build4 (ARRAY_REF, TREE_TYPE (expr),
+  subst_initial (TREE_OPERAND (expr, 0), var),
+  TREE_OPERAND (expr, 1),
+  NULL_TREE,
+  NULL_TREE));
+}
+  gcc_unreachable ();
+}
+
 static void
 scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
 tree ref, tree type)
@@ -1033,6 +1075,9 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
   {
 struct access *access = create_access_1 (base, pos, size);
 

[RFC 5/5] Always completely replace constant pool entries

2015-08-25 Thread Alan Lawrence
I used this as a means of better-testing the previous changes, as it exercises
the constant replacement code a whole lot more. Indeed, quite a few tests are
now optimized away to nothing on AArch64...

Always pulling in constants is almost certainly not what we want, but we may
nonetheless want something more aggressive than the usual --param, e.g. for the
ssa-dom-cse-2.c test. Thoughts welcomed?
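The guard change amounts to one extra disjunct; a toy model (hypothetical sketch, not the GCC source) of the new condition in analyze_all_variable_accesses:

```python
def total_scalarization_allowed(var_size, max_scalarization_size,
                                in_constant_pool):
    """Constant-pool entries bypass the sra-max-scalarization-size-Ospeed
    limit; everything else is still subject to it."""
    return var_size <= max_scalarization_size or in_constant_pool
```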

Thanks, Alan

gcc/ChangeLog:

* tree-sra.c (analyze_all_variable_accesses): Bypass size limit for
constant-pool accesses.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove --param
sra-max-scalarization-size-Ospeed.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 2 +-
 gcc/tree-sra.c| 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index b13d583..370b785 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
 
 int
 foo ()
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index a3ff2df..2a741b8 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2651,7 +2651,8 @@ analyze_all_variable_accesses (void)
&& scalarizable_type_p (TREE_TYPE (var)))
  {
if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
-   <= max_scalarization_size)
+ <= max_scalarization_size
+   || DECL_IN_CONSTANT_POOL (var))
  {
create_total_scalarization_access (var);
completely_scalarize (var, TREE_TYPE (var), 0, var);
-- 
1.8.3



[PATCH 2/5] completely_scalarize arrays as well as records

2015-08-25 Thread Alan Lawrence
This changes the completely_scalarize_record path to also work on arrays (thus
allowing records containing arrays, etc.). This just required extending the
existing type_consists_of_records_p and completely_scalarize_record methods
to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both
methods so as not to mention 'record'.
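The extended predicate can be sketched as a toy recursion over a simplified type representation (a hypothetical model of scalarizable_type_p after the patch, not the GCC source):

```python
def scalarizable(t):
    """A type is scalarizable if it is a record whose fields are all
    scalars or scalarizable types (no bit-fields), or an array whose
    element type is a scalar or scalarizable.  Types are modeled as
    ('record', [fields...]) / ('array', elem); fields are 'scalar',
    'bitfield', or nested type tuples."""
    if isinstance(t, tuple) and t[0] == 'record':
        return all(f != 'bitfield' and (f == 'scalar' or scalarizable(f))
                   for f in t[1])
    if isinstance(t, tuple) and t[0] == 'array':
        return t[1] == 'scalar' or scalarizable(t[1])
    return False  # anything else is rejected, as in the default: case
```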

Bootstrapped + check-gcc on aarch64-none-linux-gnu, arm-none-linux-gnueabihf 
and x86_64-none-linux-gnu.

Have also verified the scan-tree-dump check in the new sra-15.c passes (using a 
stage 1 compiler only, no execution test) on alpha, hppa, powerpc, sparc, avr 
and sh.

gcc/ChangeLog:

* tree-sra.c (type_consists_of_records_p): Rename to...
(scalarizable_type_p): ...this, add case for ARRAY_TYPE.

(completely_scalarize_record): Rename to...
(completely_scalarize): ...this, add ARRAY_TYPE case, move some code to:
(scalarize_elem): New.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/sra-15.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c |  38 +
 gcc/tree-sra.c | 146 ++---
 2 files changed, 135 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
new file mode 100644
index 000..e251058
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
@@ -0,0 +1,38 @@
+/* Verify that SRA total scalarization works on records containing arrays.  */
+/* Test skipped for targets with small (often default) MOVE_RATIO.  */
+/* { dg-do run } */
+/* { dg-options "-O1 -fdump-tree-release_ssa --param sra-max-scalarization-size-Ospeed=32" } */
+
+extern void abort (void);
+
+struct S
+{
+  char c;
+  unsigned short f[2][2];
+  int i;
+  unsigned short f3, f4;
+};
+
+
+int __attribute__ ((noinline))
+foo (struct S *p)
+{
+  struct S l;
+
+  l = *p;
+  l.i++;
+  l.f[1][0] += 3;
+  *p = l;
+}
+
+int
+main (int argc, char **argv)
+{
+  struct S a = {0, { {5, 7}, {9, 11} }, 4, 0, 0};
+  foo (&a);
+  if (a.i != 5 || a.f[1][0] != 12)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index a0c92b0..08fa8dc 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -915,74 +915,122 @@ create_access (tree expr, gimple stmt, bool write)
 }
 
 
-/* Return true iff TYPE is a RECORD_TYPE with fields that are either of gimple
-   register types or (recursively) records with only these two kinds of fields.
-   It also returns false if any of these records contains a bit-field.  */
+/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or ARRAY_TYPE with
+   fields that are either of gimple register types (excluding bit-fields)
+   or (recursively) scalarizable types.  */
 
 static bool
-type_consists_of_records_p (tree type)
+scalarizable_type_p (tree type)
 {
-  tree fld;
+  gcc_assert (!is_gimple_reg_type (type));
 
-  if (TREE_CODE (type) != RECORD_TYPE)
-return false;
+  switch (TREE_CODE (type))
+  {
+  case RECORD_TYPE:
+for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
+  if (TREE_CODE (fld) == FIELD_DECL)
+   {
+ tree ft = TREE_TYPE (fld);
 
-  for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
-if (TREE_CODE (fld) == FIELD_DECL)
-  {
-   tree ft = TREE_TYPE (fld);
+ if (DECL_BIT_FIELD (fld))
+   return false;
 
-   if (DECL_BIT_FIELD (fld))
- return false;
+ if (!is_gimple_reg_type (ft)
+ && !scalarizable_type_p (ft))
+   return false;
+   }
 
-   if (!is_gimple_reg_type (ft)
-   && !type_consists_of_records_p (ft))
- return false;
-  }
+return true;
 
-  return true;
+  case ARRAY_TYPE:
+{
+  tree elem = TREE_TYPE (type);
+  if (DECL_P (elem) && DECL_BIT_FIELD (elem))
+   return false;
+  if (!is_gimple_reg_type (elem)
+&& !scalarizable_type_p (elem))
+   return false;
+  return true;
+}
+  default:
+return false;
+  }
 }
 
-/* Create total_scalarization accesses for all scalar type fields in DECL that
-   must be of a RECORD_TYPE conforming to type_consists_of_records_p.  BASE
-   must be the top-most VAR_DECL representing the variable, OFFSET must be the
-   offset of DECL within BASE.  REF must be the memory reference expression for
-   the given decl.  */
+static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
+
+/* Create total_scalarization accesses for all scalar fields of a member
+   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
+   must be the top-most VAR_DECL representing the variable; within that,
+   OFFSET locates the member and REF must be the memory reference expression for
+   the member.  */
 
 static void
-completely_scalarize_record (tree base, tree decl, HOST_WIDE_INT offset,
-tree 

[hsa] Fixes in gen_hsa_{unary,binary}_operation

2015-08-25 Thread Martin Jambor
Hi,

the patch below fixes two minor issues with new function
gen_hsa_unary_operation and gen_hsa_binary_operation.  First, they
should add new instructions to use list of pseudoregiters.  Second,
the type opcode should really be int.  (It was BrigType16_t, probably
BrigOpcode16_t was intended, but in our own representation, we use
negative values for instructions which are not actual HSA
instructions, such as PHI nodes).  Committed to the HSA branch.

Martin


2015-08-25  Martin Jambor  

* hsa-gen.c (gen_hsa_unary_operation): Use int for the opcode.
Add instruction to uses of register operands.
(gen_hsa_binary_operation): Likewise.
(gen_hsa_insns_for_operation_assignment): Use int for opcodes
passed to gen_hsa_binary_operation.

---
 gcc/ChangeLog.hsa |  8 
 gcc/hsa-gen.c | 16 ++--
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index bdb9cec..14cf890 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -2250,7 +2250,7 @@ gen_hsa_cmp_insn_from_gimple (enum tree_code code, tree 
lhs, tree rhs,
as a single operand.  */
 
 static void
-gen_hsa_unary_operation (BrigType16_t opcode, hsa_op_reg *dest,
+gen_hsa_unary_operation (int opcode, hsa_op_reg *dest,
 hsa_op_base *op1, hsa_bb *hbb)
 {
   gcc_checking_assert (dest);
@@ -2266,6 +2268,8 @@ gen_hsa_unary_operation (BrigType16_t opcode, hsa_op_reg *dest,
 }
 
   dest->set_definition (insn);
+  if (hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op1))
+    reg->uses.safe_push (insn);
   hbb->append_insn (insn);
 }
 
@@ -2274,7 +2276,7 @@ gen_hsa_unary_operation (BrigType16_t opcode, hsa_op_reg *dest,
and OP2.  */
 
 static void
-gen_hsa_binary_operation (BrigType16_t opcode, hsa_op_reg *dest,
+gen_hsa_binary_operation (int opcode, hsa_op_reg *dest,
  hsa_op_base *op1, hsa_op_base *op2, hsa_bb *hbb)
 {
   gcc_checking_assert (dest);
@@ -2290,6 +2292,10 @@ gen_hsa_binary_operation (BrigType16_t opcode, hsa_op_reg *dest,
   hsa_insn_basic *insn = new hsa_insn_basic (3, opcode, dest->type, dest,
 op1, op2);
   dest->set_definition (insn);
+  if (hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op1))
+    reg->uses.safe_push (insn);
+  if (hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op2))
+    reg->uses.safe_push (insn);
   hbb->append_insn (insn);
 }
 
@@ -2387,10 +2393,8 @@ gen_hsa_insns_for_operation_assignment (gimple assign, hsa_bb *hbb,
 case RROTATE_EXPR:
   {
hsa_insn_basic *insn = NULL;
-   BrigType16_t code1 = code == LROTATE_EXPR
- ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR;
-   BrigType16_t code2 = code != LROTATE_EXPR
- ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR;
+   int code1 = code == LROTATE_EXPR ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR;
+   int code2 = code != LROTATE_EXPR ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR;
BrigType16_t btype = hsa_type_for_scalar_tree_type (TREE_TYPE (lhs),
true);
 
-- 
2.4.6



[hsa] Support unary FP operations implementable with a single HSA instruction

2015-08-25 Thread Martin Jambor
Hi,

the patch below adds support for a few unary floating point builtins
that can be implemented with a single HSA instruction.  More effort in
the area of builtins is needed; the motivation for this was a
benchmark that previously failed with a sorry message.  Committed to
the hsa branch.
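The mapping the patch installs can be summarized as a table; the dict itself is just an illustrative summary of the switch cases in the diff, not GCC code:

```python
# Builtin -> HSA opcode, as wired up in gen_hsa_insns_for_call by this
# patch.  The N* ("native") opcodes trade precision for speed, hence the
# FIXMEs in the patch about perhaps gating them on -ffast-math.
UNARY_BUILTIN_OPCODES = {
    'fabs': 'BRIG_OPCODE_ABS',
    'ceil': 'BRIG_OPCODE_CEIL',
    'floor': 'BRIG_OPCODE_FLOOR',
    'rint': 'BRIG_OPCODE_RINT',
    'sqrt': 'BRIG_OPCODE_SQRT',
    'trunc': 'BRIG_OPCODE_TRUNC',
    'cos': 'BRIG_OPCODE_NCOS',
    'exp2': 'BRIG_OPCODE_NEXP2',
    'log2': 'BRIG_OPCODE_NLOG2',
    'sin': 'BRIG_OPCODE_NSIN',
}
```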

Martin


2015-08-25  Martin Jambor  

* hsa-gen.c (gen_hsa_unaryop_for_builtin): New function.
(gen_hsa_insns_for_call): Add support for a few unary fp
operations.

---
 gcc/ChangeLog.hsa |  4 +++
 gcc/hsa-gen.c | 86 +++
 2 files changed, 78 insertions(+), 12 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 14cf890..1e23996 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3213,6 +3213,26 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   hsa_cfun->kernel_dispatch_count++;
 }
 
+/* Helper function to create a single unary HSA operation out of a call to a
+   builtin.  OPCODE is the HSA operation to be generated.  STMT is a gimple
+   call to a builtin.  HBB is the HSA BB to which the instruction should be
+   added and SSA_MAP is used to map gimple SSA names to HSA pseudoregisters.  */
+
+static void
+gen_hsa_unaryop_for_builtin (int opcode, gimple stmt, hsa_bb *hbb,
+			     vec <hsa_op_reg_p> *ssa_map)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  /* FIXME: Since calls without a LHS are not removed, double check that
+ they cannot have side effects.  */
+  if (!lhs)
+return;
+  hsa_op_reg *dest = hsa_reg_for_gimple_ssa (lhs, ssa_map);
+  hsa_op_base *op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 0),
+   hbb, ssa_map, NULL);
+  gen_hsa_unary_operation (opcode, dest, op, hbb);
+}
+
 /* Generate HSA instructions for the given call statement STMT.  Instructions
will be appended to HBB.  SSA_MAP maps gimple SSA names to HSA pseudo
registers.  */
@@ -3284,22 +3304,64 @@ specialop:
break;
   }
 
+case BUILT_IN_FABS:
+case BUILT_IN_FABSF:
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_ABS, stmt, hbb, ssa_map);
+  break;
+
+case BUILT_IN_CEIL:
+case BUILT_IN_CEILF:
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_CEIL, stmt, hbb, ssa_map);
+  break;
+
+case BUILT_IN_FLOOR:
+case BUILT_IN_FLOORF:
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_FLOOR, stmt, hbb, ssa_map);
+  break;
+
+case BUILT_IN_RINT:
+case BUILT_IN_RINTF:
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_RINT, stmt, hbb, ssa_map);
+  break;
+
 case BUILT_IN_SQRT:
 case BUILT_IN_SQRTF:
-  /* FIXME: Since calls without a LHS are not removed, double check that
-they cannot have side effects.  */
-  if (!lhs)
-   return;
-  dest = hsa_reg_for_gimple_ssa (lhs, ssa_map);
-  insn = new hsa_insn_basic (2, BRIG_OPCODE_SQRT, dest->type);
-  insn->operands[0] = dest;
-  dest->set_definition (insn);
-  insn->operands[1]
-   = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 0),
- hbb, ssa_map, insn);
-  hbb->append_insn (insn);
+  /* TODO: Perhaps produce BRIG_OPCODE_NSQRT with -ffast-math?  */
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb, ssa_map);
+  break;
+
+case BUILT_IN_TRUNC:
+case BUILT_IN_TRUNCF:
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_TRUNC, stmt, hbb, ssa_map);
   break;
 
+case BUILT_IN_COS:
+case BUILT_IN_COSF:
+  /* FIXME: Using the native instruction may not be precise enough.
+Perhaps only allow if using -ffast-math?  */
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NCOS, stmt, hbb, ssa_map);
+  break;
+
+case BUILT_IN_EXP2:
+case BUILT_IN_EXP2F:
+  /* FIXME: Using the native instruction may not be precise enough.
+Perhaps only allow if using -ffast-math?  */
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NEXP2, stmt, hbb, ssa_map);
+  break;
+
+case BUILT_IN_LOG2:
+case BUILT_IN_LOG2F:
+  /* FIXME: Using the native instruction may not be precise enough.
+Perhaps only allow if using -ffast-math?  */
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NLOG2, stmt, hbb, ssa_map);
+  break;
+
+case BUILT_IN_SIN:
+case BUILT_IN_SINF:
+  /* FIXME: Using the native instruction may not be precise enough.
+Perhaps only allow if using -ffast-math?  */
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb, ssa_map);
+      break;
+
 case BUILT_IN_ATOMIC_LOAD_1:
 case BUILT_IN_ATOMIC_LOAD_2:
 case BUILT_IN_ATOMIC_LOAD_4:
-- 
2.4.6



[PATCH 1/5] Refactor completely_scalarize_var

2015-08-25 Thread Alan Lawrence
This is a small refactoring/renaming patch, it just moves the call to
"completely_scalarize_record" out from completely_scalarize_var, and renames
the latter to create_total_scalarization_access.

This is because the next patch needs to drop the "_record" suffix and I felt
it would be confusing to have both completely_scalarize and
completely_scalarize_var. However, it also makes the new function name
(create_total_scalarization_access) consistent with the existing code & comment.

Bootstrapped + check-gcc on x86_64.

gcc/ChangeLog:

* tree-sra.c (completely_scalarize_var): Rename to...
(create_total_scalarization_access): ... Here. Drop call to
completely_scalarize_record.

(analyze_all_variable_accesses): Replace completely_scalarize_var
with create_total_scalarization_access and completely_scalarize_record.
---
 gcc/tree-sra.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 818c290..a0c92b0 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -985,7 +985,7 @@ completely_scalarize_record (tree base, tree decl, HOST_WIDE_INT offset,
type_consists_of_records_p.   */
 
 static void
-completely_scalarize_var (tree var)
+create_total_scalarization_access (tree var)
 {
   HOST_WIDE_INT size = tree_to_uhwi (DECL_SIZE (var));
   struct access *access;
@@ -994,8 +994,6 @@ completely_scalarize_var (tree var)
   access->expr = var;
   access->type = TREE_TYPE (var);
   access->grp_total_scalarization = 1;
-
-  completely_scalarize_record (var, var, 0, var);
 }
 
 /* Return true if REF has an VIEW_CONVERT_EXPR somewhere in it.  */
@@ -2529,7 +2527,8 @@ analyze_all_variable_accesses (void)
if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
<= max_scalarization_size)
  {
-   completely_scalarize_var (var);
+   create_total_scalarization_access (var);
+   completely_scalarize_record (var, var, 0, var);
if (dump_file && (dump_flags & TDF_DETAILS))
  {
fprintf (dump_file, "Will attempt to totally scalarize ");
-- 
1.8.3



[PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-25 Thread Alan Lawrence
When SRA completely scalarizes an array, this patch changes the generated 
accesses from e.g.

   MEM[(int[8] *)&a + 4B] = 1;

to

   a[1] = 1;

This overcomes a limitation in dom2, that accesses to equivalent chunks of e.g.
MEM[(int[8] *)&a + 4B] are not hashable_expr_equal_p with accesses to e.g.
a[1]. This is necessary for constant propagation in the
ssa-dom-cse-2.c testcase (after the next patch that makes SRA handle 
constant-pool loads).

I tried to work around this by making dom2's hashable_expr_equal_p less 
conservative, but found that on platforms without AArch64's vectorized 
reductions (specifically Alpha, hppa, PowerPC, and SPARC, mentioned in 
ssa-dom-cse-2.c), I also needed to make MEM[(int[8] *)&a] equivalent to a[0], 
etc.; a complete overhaul of hashable_expr_equal_p seems like a larger task 
than this patch series.

I can't see how to write a testcase for this in C though as direct assignment 
to an array is not possible; such assignments occur only with constant pool 
data, which is dealt with in the next patch.
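The rewrite's arithmetic is simple: an aligned constant offset into an array becomes an index. A toy model (hypothetical sketch, not the GCC source) of the new build_ref_for_offset path:

```python
def array_index_for_offset(elem_size_bits, offset_bits):
    """If a constant bit offset into an array is a multiple of the element
    size, the access can be expressed as an ARRAY_REF index instead of a
    MEM_REF with a byte displacement."""
    if elem_size_bits == 0 or offset_bits % elem_size_bits != 0:
        return None  # the real code falls back to building a MEM_REF
    return offset_bits // elem_size_bits

# For int[8] (32-bit elements), the 4B offset from the example above is
# 32 bits, giving index 1: MEM[(int[8] *)&a + 4B] becomes a[1].
```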

Bootstrap + check-gcc on x86-none-linux-gnu, arm-none-linux-gnueabihf, 
aarch64-none-linux-gnu.

gcc/ChangeLog:

* tree-sra.c (completely_scalarize): Move some code into:
(get_elem_size): New.
(build_ref_for_offset): Build ARRAY_REF if base is aligned array.
---
 gcc/tree-sra.c | 110 -
 1 file changed, 69 insertions(+), 41 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 08fa8dc..af35fcc 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
   }
 }
 
+static bool
+get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)
+{
+  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
+  tree t_size = TYPE_SIZE (TREE_TYPE (type));
+  if (!t_size || !tree_fits_uhwi_p (t_size))
+return false;
+  unsigned HOST_WIDE_INT sz = tree_to_uhwi (t_size);
+  if (!sz)
+return false;
+  *sz_out = sz;
+  return true;
+}
+
 static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
 
 /* Create total_scalarization accesses for all scalar fields of a member
@@ -985,10 +999,9 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 case ARRAY_TYPE:
   {
tree elemtype = TREE_TYPE (decl_type);
-   tree elem_size = TYPE_SIZE (elemtype);
-   gcc_assert (elem_size && tree_fits_uhwi_p (elem_size));
-   int el_size = tree_to_uhwi (elem_size);
-   gcc_assert (el_size);
+   unsigned HOST_WIDE_INT el_size;
+   if (!get_elem_size (decl_type, &el_size))
+ gcc_assert (false);
 
tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
@@ -1563,7 +1576,7 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
   tree off;
   tree mem_ref;
   HOST_WIDE_INT base_offset;
-  unsigned HOST_WIDE_INT misalign;
+  unsigned HOST_WIDE_INT misalign, el_sz;
   unsigned int align;
 
   gcc_checking_assert (offset % BITS_PER_UNIT == 0);
@@ -1572,47 +1585,62 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
 
   /* get_addr_base_and_unit_offset returns NULL for references with a variable
  offset such as array[var_index].  */
-  if (!base)
-{
-  gassign *stmt;
-  tree tmp, addr;
-
-  gcc_checking_assert (gsi);
-  tmp = make_ssa_name (build_pointer_type (TREE_TYPE (prev_base)));
-  addr = build_fold_addr_expr (unshare_expr (prev_base));
-  STRIP_USELESS_TYPE_CONVERSION (addr);
-  stmt = gimple_build_assign (tmp, addr);
-  gimple_set_location (stmt, loc);
-  if (insert_after)
-   gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
-  else
-   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
-
-  off = build_int_cst (reference_alias_ptr_type (prev_base),
-  offset / BITS_PER_UNIT);
-  base = tmp;
-}
-  else if (TREE_CODE (base) == MEM_REF)
-{
-  off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)),
-  base_offset + offset / BITS_PER_UNIT);
-  off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off);
-  base = unshare_expr (TREE_OPERAND (base, 0));
+  if (base
+  && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE
+  && misalign == 0
+  && get_elem_size (TREE_TYPE (base), &el_sz)
+  && ((offset % el_sz) == 0)
+  && useless_type_conversion_p (exp_type, TREE_TYPE (TREE_TYPE (base)))
+  && (align >= TYPE_ALIGN (exp_type)))
+{
+      tree idx = build_int_cst (TYPE_DOMAIN (TREE_TYPE (base)), offset / el_sz);
+  base = unshare_expr (base);
+  mem_ref = build4 (ARRAY_REF, exp_type, base, idx, NULL_TREE, NULL_TREE);
 }
   else
 {
-  off = build_int_cst (reference_alias_ptr_type (base),
-  base_offset + offset / BITS_PER_UNIT);
-  base = build_fold_addr_expr (unshare_expr (base));
-

Re: [PATCH 0/15][ARM/AArch64] Add support for float16_t vectors (v3)

2015-08-25 Thread Alan Lawrence

Alan Lawrence wrote:
All AArch64 patches are unchanged from previous version. However, in response to 
discussion, the ARM patches are changed (much as I suggested 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02249.html); this version:


* Hides the existing vcvt_f16_f32 and vcvt_f32_f16 intrinsics, and 
float16x4_t type, unless we have a scalar __FP16 type (i.e. unless 
-mfp16-format=ieee/alternative is specified on the command line). Although this 
loses us the ability to write code that uses hardware instructions to work with 
either IEEE or Alternative formats according to the FPSCR bit, it is consistent 
with ACLE statements that the vector types (float16x4_t and float16x8_t) should 
only be available if the scalar type is, and that if the scalar type is 
available, then one or other of __ARM_FP16_FORMAT_IEEE or 
__ARM_FP16_FORMAT_ALTERNATIVE should be set.


(Straightforward interpretation of ACLE can be confusing because GCC has made
the choice of supporting the __FP16 type even when hardware is not available,
via software conversion routines; the -mfp16-format flag then picks which set
of software routines is in use.)


   * Makes all the new intrinsics available, similarly, only if we have a scalar 
__FP16 type. This means that (in contrast to previous versions of this patch 
series) we will not gain the ability to write programs that pass 
half-precision-float values through as "bags of bits".


I considered the alternative of making -mfp16-format default to ieee, but that 
makes the -mfp16-format=alternative option almost unusable, as one cannot link 
object files compiled with different -mfp16-format :(. We could set the default 
to be ieee only when neon-fp16 is specified, but that change is pretty much 
orthogonal to this patch series so can follow independently if desired.


   * To ease testing (including a couple of existing tests), I modified the 
arm_neon_fp16_ok functions in lib/target-supports.exp to try also flags 
specifying -mfp16-format=ieee (if flags without that fail to compile, presumably 
because of the absence of an __FP16 type; however, this still allows an explicit 
-mfp16-format=alternative if desired). On ARM targets, we then pass in 
-mfpu=neon-fp16 and -mfp16-format flags for all tests in advsimd-intrinsics.exp, 
unless these are overridden by an explicit multilib, in which case we will run 
the advsimd-intrinsics tests without the float16 variants (via #if).


Are these patches OK for trunk? If so I will commit along with the 
previously-approved fix to fold-const.c for HFmode, 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00696.html


Bootstrapped on arm-none-linux-gnueabihf (--with-arch=armv7-a --with-fpu=neon 
--with-float=hard), and aarch64-none-linux-gnu; cross-tested arm-none-eabi (a 
number of variants, especially for the advsimd-intrinsics tests in patch 13+14).


Thanks, Alan




Ping the final three patches: 2 * testsuite (aarch64/advsimd-intrinsics/, but
executed on both platforms, so ARM and AArch64 maintainer approval would be
required if these are OK), and the update to doc/sourcebuild.texi for ARM.




Re: Move remaining flag_unsafe_math_optimizations using simplify and match

2015-08-25 Thread Richard Biener
On Tue, Aug 25, 2015 at 5:29 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
> Please find attached the remaining part of patch.
>
> Tested the patch on AArch64 and X86 without any regressions.
>
> Please review the patch and let me know if any modifications are required.

Ok.

Thanks,
Richard.

> Thanks,
> Naveen
>
> ChangeLog
>
> 2015-08-25  Naveen H.S  
>
> * fold-const.c (fold_binary_loc) : Move Optimize
> root(x)*root(y) as root(x*y) to match.pd.
> Move Optimize expN(x)*expN(y) as expN(x+y) to match.pd.
> Move Optimize pow(x,y)*pow(x,z) as pow(x,y+z) to match.pd.
> Move Optimize a/root(b/c) into a*root(c/b) to match.pd.
> Move Optimize x/expN(y) into x*expN(-y) to match.pd.
>
> * match.pd (mult (root:s @0) (root:s @1)): New simplifier.
> (mult (POW:s @0 @1) (POW:s @0 @2)) : New simplifier.
> (mult (exps:s @0) (exps:s @1)) : New simplifier.
> (rdiv @0 (root:s (rdiv:s @1 @2))) : New simplifier.
> (rdiv @0 (exps:s @1)) : New simplifier.


Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-08-25 Thread Andre Vieira

On 25/08/15 10:52, Andrew Pinski wrote:

On Tue, Aug 25, 2015 at 5:50 PM, Andrew Pinski  wrote:

On Tue, Aug 25, 2015 at 5:37 PM, Andre Vieira
 wrote:

Conditional branches have a maximum range of [-1048576, 1048572]. Any
destination further away cannot be reached by these.
To be able to have conditional branches in very large functions, we invert
the condition and change the destination to jump over an unconditional
branch to the original, far away, destination.
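The quoted range follows from the AArch64 encoding: conditional branches (B.cond, CBZ/CBNZ) carry a 19-bit signed offset counted in 4-byte words. A quick sanity-check model (hypothetical, not GCC code):

```python
def fits_cond_branch(offset_bytes):
    """Return True if a byte offset is encodable in an AArch64 conditional
    branch: word-aligned, with the word count fitting in a signed 19-bit
    immediate.  This yields exactly the range [-1048576, 1048572]."""
    if offset_bytes % 4 != 0:
        return False  # branch targets are 4-byte aligned
    imm19 = offset_bytes // 4
    return -(1 << 18) <= imm19 <= (1 << 18) - 1
```

An unconditional B has a much larger (26-bit word) range, which is why jumping over an inverted short branch reaches far destinations.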

gcc/ChangeLog:
2015-08-07  Ramana Radhakrishnan  
 Andre Vieira  

 * config/aarch64/aarch64.md (*condjump): Handle functions > 1
 Mib.
 (*cb1): Idem.
 (*tb1): Idem.
 (*cb1): Idem.
 * config/aarch64/iterators.md (inv_cb): New code attribute.
 (inv_tb): Idem.
 * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
 * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.

gcc/testsuite/ChangeLog:
2015-08-07  Andre Vieira  

 * gcc.target/aarch64/long-branch.c: New test.


Just a few comments about the testcase.  You could improve the size
(on disk) of the testcase by using the preprocessor some more:
Something like:
#define CASE_ENTRY2 (x) CASE_ENTRY ((x)) CASE_ENTRY ((x)+1)
#define CASE_ENTRY4 (x) CASE_ENTRY2 ((x)) CASE_ENTRY2 ((x)+2+1)
#define CASE_ENTRY8 (x) CASE_ENTRY4 ((x)) CASE_ENTRY4 ((x)+4+1)
#define CASE_ENTRY16 (x) CASE_ENTRY8 ((x)) CASE_ENTRY8 ((x)+8+1)
#define CASE_ENTRY32 (x) CASE_ENTRY16 ((x)) CASE_ENTRY16 ((x)+16)
#define CASE_ENTRY64 (x) CASE_ENTRY32 ((x)) CASE_ENTRY32 ((x)+32+1)
#define CASE_ENTRY128 (x) CASE_ENTRY64 ((x)) CASE_ENTRY16 ((x)+64+1)
#define CASE_ENTRY256 (x) CASE_ENTRY128 ((x)) CASE_ENTRY128 ((x)+128+1)



I do have an off by one error but you should get the idea.  Basically
instead of 200 lines, we only have 9 lines (log2(256) == 8).

Thanks,
Andrew



And then use
CASE_ENTRY256 (1)

You can do the same trick to reduce the size of CASE_ENTRY too.

Thanks,
Andrew Pinski
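The doubling scheme Andrew sketches (he notes the macros have an off-by-one) can be checked to really produce 2**k distinct labels from only k definitions; a hypothetical Python model of the intended expansion:

```python
def case_entries(k, x=1):
    """Model of the doubling trick: CASE_ENTRY<2n>(x) expands to two
    CASE_ENTRY<n> expansions over disjoint argument ranges, so 2**k case
    labels need only k macro definitions (log2 growth in source size)."""
    if k == 0:
        return [x]
    half = 1 << (k - 1)
    # first half starts at x, second half starts just past the first
    return case_entries(k - 1, x) + case_entries(k - 1, x + half)
```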




Conditional branches have a maximum range of [-1048576, 1048572]. Any
destination further away cannot be reached by these.
To be able to have conditional branches in very large functions, we 
invert the condition and change the destination to jump over an 
unconditional branch to the original, far away, destination.


gcc/ChangeLog:
2015-08-07  Ramana Radhakrishnan  
Andre Vieira  

* config/aarch64/aarch64.md (*condjump): Handle functions > 1 Mib.
(*cb1): Likewise.
(*tb1): Likewise.
(*cb1): Likewise.
* config/aarch64/iterators.md (inv_cb): New code attribute.
(inv_tb): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.

gcc/testsuite/ChangeLog:
2015-08-07  Andre Vieira  

* gcc.target/aarch64/long_branch_1.c: New test.
From e34022ecd6f914b5a713594ca5b21b33929a3a1f Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Tue, 25 Aug 2015 13:12:11 +0100
Subject: [PATCH] fix for far branches

---
 gcc/config/aarch64/aarch64-protos.h  |  1 +
 gcc/config/aarch64/aarch64.c | 23 ++
 gcc/config/aarch64/aarch64.md| 89 +++
 gcc/config/aarch64/iterators.md  |  6 ++
 gcc/testsuite/gcc.target/aarch64/long_branch_1.c | 91 
 5 files changed, 195 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/long_branch_1.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4b3cbedbd0a5fa186619e05c0c0b400c8257b1c0..9afb7ef9afadf2b3dfeb24db230829344201deba 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -322,6 +322,7 @@ unsigned aarch64_trampoline_size (void);
 void aarch64_asm_output_labelref (FILE *, const char *);
 void aarch64_cpu_cpp_builtins (cpp_reader *);
 void aarch64_elf_asm_named_section (const char *, unsigned, tree);
+const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
 void aarch64_err_no_fpadvsimd (machine_mode, const char *);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 87bbf6e7988e4ef796c09075ee584822483cbbce..188d0dd555d3d765aff7e78623a4e938497bec3f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -586,6 +586,29 @@ static const char * const aarch64_condition_codes[] =
   "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
 };
 
+/* Generate code to enable conditional branches in functions over 1 MiB.  */
+const char *
+aarch64_gen_far_branch (rtx * operands, int pos_label, const char * dest,
+			const char * branch_format)
+{
+  rtx_code_label * tmp_label = gen_label_rtx ();
+  char label_buf[256];
+  char buffer[128];
+  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest,
+ C

Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran"

2015-08-25 Thread Ulrich Weigand
Ian Lance Taylor wrote:
> Hans-Peter Nilsson  writes:
> 
> > * configure.ac: Only compile with -fPIC if the target
> > supports it.
> > * configure: Regenerate.
> 
> This is OK.

I'm now running into the same problem on SPU, but unfortunately
this patch still doesn't fix the problem.

Now, the SPU does not support dynamic loading and the loader
does not support (most) run-time relocations.  There is no
support for shared libraries on the SPU.  On the SPU, all
GCC target libraries are built as static libraries, and
should be compiled without -fPIC.

However, the compiler actually does accept -fPIC.  If the flag is
present, we attempt to generate relocatable code, but only to the
extent the compiler can do that without support for run-time
relocations.  The most significant restriction is that statically
initializing a global variable to a pointer will not work.
(This is useful for some special cases of self-relocating code.
Such code normally can work around this restriction.)

Now, with the patch above, libbacktrace is still compiled with
-fPIC on SPU, but some files do in fact contain just such global
initializers, causing compilation to fail:

gcc-head/src/libbacktrace/elf.c:241:27: error: creating run-time relocation for '*.LC2'
 static const char * const debug_section_names[DEBUG_MAX] =
   ^
The other GCC run-time libraries rely on libtool to figure out
that even though -fPIC works, dynamic libraries are still not
supported on the platform, and thus compile everything for
static linking (i.e. without -fPIC).

I'm wondering if we couldn't use the same libtool mechanism here:
if the architecture does not support dynamic linking at all, no
target library will be built as shared library, and thus there is
no need to build libbacktrace with -fPIC either.  (My understanding
is that we need to build libbacktrace with -fPIC because it might
get linked into some other shared target library.)

The libbacktrace configure script actually incorporates all the
libtool init code that makes this determination, and sets the
shell variable "can_build_shared" to "no" on SPU.  Would it be
valid to use this variable in the test whether to use -fPIC?
(I'm not sure which of the many libtool variables are intended
to be used outside, and which are private ...)
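For illustration, a sketch (hypothetical, not actual libbacktrace configure code) of how that libtool variable could gate the flag:

```shell
# Hypothetical configure-time logic: only use -fPIC when libtool's init
# code decided shared libraries can be built at all.  "can_build_shared"
# is the shell variable the libtool machinery sets; on SPU it is "no".
can_build_shared=no        # stand-in for the value libtool computed

PIC_FLAG=
if test "$can_build_shared" != no; then
  PIC_FLAG=-fPIC
fi
echo "PIC_FLAG is '$PIC_FLAG'"
```

Whether can_build_shared is a stable, public part of the libtool interface is exactly the open question above.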

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite

2015-08-25 Thread Christophe Lyon
On 28 July 2015 at 13:26, Alan Lawrence  wrote:
> This is a respin of
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the
> testsuite for float16 vectors. Relative to the previous version, most of the
> additions to the tests are now within #if..#endif such that they are only
> compiled if we have a scalar __fp16 type (the exception is hfloat16_t: since
> this is actually an integer type, we can define and use it without any
> compiler fp16 support). Also we  try to use add_options_for_arm_neon_fp16
> for all tests (on ARM targets), falling back to add_options_for_arm_neon if
> the previous fails.
>
> Cross-tested on many multilibs, including -march=armv6,
> -march=armv7-a{,-mfpu=neon-fp16}, -march=armv7-a/-mfpu=neon,
> -march=armv7-a/-mfp16-format=none{,/-mfpu=neon-fp16,/-mfpu=neon},
> -march=armv7-a/-mfp16-format=alternative .
>
Hi Alan,

It looks OK.
Did you also run the tests on AArch64?

> Note that on bigendian, this requires patch at
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00696.html , which I will
> commit at the same time.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
> Set additional_flags for neon-fp16 if supported, else fallback to
> neon.
>
> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
> (hfloat16_t): New.
> (result, expected, clean_results, DECL_VARIABLE_64BITS_VARIANTS,
> DECL_VARIABLE_128BITS_VARIANTS): Add float16x4_t and float16x8_t
> cases
> if supported.
> (CHECK_RESULTS): Redefine using CHECK_RESULTS_NAMED.
> (CHECK_RESULTS_NAMED): Move body to CHECK_RESULTS_NAMED_NO_FP16;
> redefine in terms of CHECK_RESULTS_NAMED_NO_FP16 with float16
> variants
> when those are supported.
> (CHECK_RESULTS_NAMED_NO_FP16, CHECK_RESULTS_NO_FP16): New.
> (vdup_n_f16): New.
>
> * gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h (buffer,
> buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and
> float16x8_t
> cases if supported.
>
> * gcc.target/aarch64/advsimd-intrinsics/vbsl.c (exec_vbsl):
> Use CHECK_RESULTS_NO_FP16 in place of CHECK_RESULTS.
> * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
> (exec_vdup_vmov):
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
> (exec_vdup_lane):
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vext.c (exec_vext):
> Likewise.
>
> * gcc.target/aarch64/advsimd-intrinsics/vcombine.c (expected):
> Add float16x8_t case.
> (main, exec_vcombine): test float16x4_t -> float16x8_t, if
> supported.
> * gcc.target/aarch64/advsimd-intrinsics/vcreate.c (expected,
> main, exec_vcreate): Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vget_high.c (expected,
>  exec_vget_high): Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vget_low.c (expected,
> exec_vget_low): Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vld1.c (expected,
> exec_vld1):
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c (expected,
> exec_vld1_dup): Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c (expected,
> exec_vld1_lane): Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vldX.c (expected,
> exec_vldX):
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c (expected,
> exec_vldX_dup): Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c (expected,
> exec_vldX_lane): Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vset_lane.c (expected,
> exec_vset_lane): Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c (expected,
>  exec_vst1_lane): Likewise.


Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_{f16_f32,f32_f16}

2015-08-25 Thread Christophe Lyon
On 28 July 2015 at 13:27, Alan Lawrence  wrote:
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
> set additional flags for neon-fp16 support.
> * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.

Is that the right version of the patch?

The advsimd-intrinsics.exp part conflicts with patch 13/15.

Am I missing something?

Christophe.


Openacc launch API

2015-08-25 Thread Nathan Sidwell

Jakub,

This patch changes the launch API for openacc parallels.  The current scheme 
passes the launch dimensions as 3 separate parameters to the GOACC_parallel 
function.  This is problematic for a couple of reasons:


1) these must be validated in the host compiler

2) they provide no extension to support a variety of different offload devices 
with different geometry requirements.


This patch changes things so that the function tables emitted by (ptx) 
mkoffload include the geometry triplet for each function.  This allows them to 
be validated and/or manipulated in the offload compiler.  However, this only 
works for compile-time known dimensions -- which is a common case.  To deal with 
runtime-computed dimensions we have to retain the host-side compiler's 
calculation and pass that into the GOACC_parallel function.  We change 
GOACC_parallel to take a variadic list of keyed operands ending with a sentinel 
marker.  These keyed operands have a slot for expansion to support multiple 
different offload devices.


We also extend the functionality of the 'oacc function' internal attribute. 
Rather than being a simple marker, it now has a value, which is a TREE_LIST of 
the geometry required.  The geometry is held as INTEGER_CSTs on the TREE_VALUE 
slots.  Runtime-calculated values are represented by an INTEGER_CST of zero. 
We'll also use this representation for  'routines', where the TREE_PURPOSE slot 
will be used to indicate the levels at which a routine might spawn a partitioned 
loop.  Again, to allow future expansion supporting a number of different offload 
devices, this can become a list-of-lists, keyed by an offload device 
identifier.  The offload compiler can manipulate this data, and a later patch 
will do this within a new oacc-xform pass.


I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide 
a forwarding function. However, as the mkoffload data is incompatible, this is 
probably overkill.  I've had to increment the (just committed) version number to 
detect the change in data representation.  So any attempt to run an old binary 
with a new libgomp will fail at the loading point.  We could simply keep the 
same 'GOACC_parallel' name and not need any new symbols.  WDYT?


ok?

nathan
2015-08-25  Nathan Sidwell  

	include/
	* gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment.
	(GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR, GOMP_DIM_MAX,
	GOMP_DIM_MASK): New.
	(GOMP_LAUNCH_END, GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC,
	GOMP_LAUNCH_WAIT): New.
	(GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT,
	GOMP_LAUNCH_OP_SHIFT): New.
	(GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE,
	GOMP_LAUNCH_OP): New.
	(GOMP_LAUNCH_OP_MAX): New.

	libgomp/
	* libgomp.h (acc_dispatch_t): Replace separate geometry args with
	array.
	* libgomp.map (GOACC_parallel_keyed): New.
	* oacc-parallel.c (goacc_wait): Take pointer to va_list.  Adjust
	all callers.
	(GOACC_parallel_keyed): New interface.  Lose geometry arguments
	and take keyed varargs list.  Adjust call to exec_func.
	(GOACC_parallel): Forward to GOACC_parallel_keyed.
	* libgomp_g.h (GOACC_parallel): Remove.
	(GOACC_parallel_keyed): Declare.
	* plugin/plugin-nvptx.c (struct targ_fn_launch): New struct.
	(struct targ_fn_descriptor): Replace name field with launch field.
	(nvptx_exec): Lose separate geometry args, take array.  Process
	dynamic dimensions and adjust.
	(struct nvptx_tdata): Replace fn_names field with fn_descs.
	(GOMP_OFFLOAD_load_image): Adjust for change in function table
	data.
	(GOMP_OFFLOAD_openacc_parallel): Adjust for change in dimension
	passing.
	* oacc-host.c (host_openacc_exec): Adjust for change in dimension
	passing.

	gcc/
	* config/nvptx/nvptx.c: Include omp-low.h and gomp-constants.h.
	(nvptx_record_offload_symbol): Record function execution geometry.
	* config/nvptx/mkoffload.c (process): Include launch geometry in
	function data.
	* omp-low.c (oacc_launch_pack): New.
	(replace_oacc_fn_attrib): New.
	(set_oacc_fn_attrib): New.
	(get_oacc_fn_attrib): New.
	(expand_omp_target): Create keyed varargs for GOACC_parallel call
	generation.
	* omp-low.h (get_oacc_fn_attrib): Declare.
	* builtin-types.def (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* tree.h (OMP_CLAUSE_EXPR): New.
	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Change target fn name.

	gcc/lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/c-family/
	* c-common.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* types.def (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

Index: include/gomp-constants.h
===
--- include/gomp-constants.h	(revision 227137)
+++ include/gomp-constants.h	(working copy)
@@ -115,11 +115,34 @@ enum gomp_map_kind
 
 /* Versions of libgomp and dev

Re: [PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite

2015-08-25 Thread Alan Lawrence

Christophe Lyon wrote:

On 28 July 2015 at 13:26, Alan Lawrence  wrote:

This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the
testsuite for float16 vectors. Relative to the previous version, most of the
additions to the tests are now within #if..#endif such that they are only
compiled if we have a scalar __fp16 type (the exception is hfloat16_t: since
this is actually an integer type, we can define and use it without any
compiler fp16 support). Also we  try to use add_options_for_arm_neon_fp16
for all tests (on ARM targets), falling back to add_options_for_arm_neon if
the previous fails.

Cross-tested on many multilibs, including -march=armv6,
-march=armv7-a{,-mfpu=neon-fp16}, -march=armv7-a/-mfpu=neon,
-march=armv7-a/-mfp16-format=none{,/-mfpu=neon-fp16,/-mfpu=neon},
-march=armv7-a/-mfp16-format=alternative .


Hi Alan,

It looks OK.
Did you also run the tests on AArch64?


Sorry, yes, I did - aarch64-none-linux-gnu, and aarch64_be-none-elf also.

Thanks, Alan



Re: [PATCH, PR other/67320] Fix wide add standard names

2015-08-25 Thread Joseph Myers
On Tue, 25 Aug 2015, Michael Collison wrote:

> The standard names for signed and unsigned vector wide adds are wrong in the
> documentation.
> 
> OK for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/2] driver: support state cleanup

2015-08-25 Thread Joseph Myers
On Thu, 6 Aug 2015, David Malcolm wrote:

> gcc/ChangeLog:
>   * gcc-main.c (main): Add params to driver ctor.
>   * gcc.c (class env_manager): New.
>   (env): New global.
>   (env_manager::init): New.
>   (env_manager::get): New.
>   (env_manager::xput): New.
>   (env_manager::restore): New.
>   Poison getenv and putenv.
>   (DEFAULT_TARGET_SYSTEM_ROOT): New.
>   (target_system_root): Update initialization to use
>   DEFAULT_TARGET_SYSTEM_ROOT.
>   (struct spec_list): Add field "default_ptr".
>   (INIT_STATIC_SPEC): Initialize new field "default_ptr".
>   (init_spec): Likewise.
>   (set_spec): Clear field "default_ptr".
>   (read_specs): Free "spec" and "buffer".
>   (xputenv): Reimplement in terms of env_manager.
>   (process_command): Replace ::getenv calls with calls to the
>   env_manager singleton.
>   (process_brace_body): Free string in three places.
>   (driver::driver): New.
>   (driver::~driver): New.
>   (used_arg): Convert from a function to...
>   (class used_arg_t): ...this class, and...
>   (used_arg): ...this new global instance.
>   (used_arg_t::finalize): New function.
>   (getenv_spec_function): Add "const" to local "value".  Replace
>   ::getenv call with call to the env_manager singleton.
>   (path_prefix_reset): New function.
>   (driver::finalize): New function.
>   * gcc.h (driver::driver): New.
>   (driver::~driver): New.
>   (driver::finalize): New.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_i{f32_f16,f16_f32}

2015-08-25 Thread Alan Lawrence
Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has
moved to the previous patch! This version also fixes some whitespace issues.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
* lib/target-supports.exp
(check_effective_target_arm_neon_fp16_hw_ok): New.
---
 .../aarch64/advsimd-intrinsics/vcvt_f16.c  | 98 ++
 gcc/testsuite/lib/target-supports.exp  | 15 
 2 files changed, 113 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
new file mode 100644
index 000..a2cfd38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
@@ -0,0 +1,98 @@
+/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include 
+
+/* Expected results for vcvt.  */
+VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170,
+   0x4160, 0x4150 };
+VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
+
+/* Expected results for vcvt_high_f32_f16.  */
+VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130,
+0xc120, 0xc110 };
+/* Expected results for vcvt_high_f16_f32.  */
+VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000,
+0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+void
+exec_vcvt (void)
+{
+  clean_results ();
+
+#define TEST_MSG vcvt_f32_f16
+  {
+VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 };
+
+DECL_VARIABLE (vector_src, float, 16, 4);
+
+VLOAD (vector_src, buffer_src, , float, f, 16, 4);
+DECL_VARIABLE (vector_res, float, 32, 4) =
+   vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+  VECT_VAR (vector_res, float, 32, 4));
+
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
+  }
+#undef TEST_MSG
+
+  clean_results ();
+
+#define TEST_MSG vcvt_f16_f32
+  {
+VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 };
+DECL_VARIABLE (vector_src, float, 32, 4);
+
+VLOAD (vector_src, buffer_src, q, float, f, 32, 4);
+DECL_VARIABLE (vector_res, float, 16, 4) =
+  vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4));
+vst1_f16 (VECT_VAR (result, float, 16, 4),
+ VECT_VAR (vector_res, float, 16 ,4));
+
+CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  }
+#undef TEST_MSG
+
+#if defined (__aarch64__)
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f32_f16"
+  {
+DECL_VARIABLE (vector_src, float, 16, 8);
+VLOAD (vector_src, buffer, q, float, f, 16, 8);
+DECL_VARIABLE (vector_res, float, 32, 4);
+VECT_VAR (vector_res, float, 32, 4) =
+  vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+  VECT_VAR (vector_res, float, 32, 4));
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, "");
+  }
+#undef TEST_MSG
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f16_f32"
+  {
+DECL_VARIABLE (vector_low, float, 16, 4);
+VDUP (vector_low, , float, f, 16, 4, 2.0);
+
+DECL_VARIABLE (vector_src, float, 32, 4);
+VLOAD (vector_src, buffer, q, float, f, 32, 4);
+
+DECL_VARIABLE (vector_res, float, 16, 8) =
+  vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4),
+VECT_VAR (vector_src, float, 32, 4));
+vst1q_f16 (VECT_VAR (result, float, 16, 8),
+  VECT_VAR (vector_res, float, 16, 8));
+
+CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, "");
+  }
+#endif
+}
+
+int
+main (void)
+{
+  exec_vcvt ();
+  return 0;
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9aec02d..0a22c95 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2730,6 +2730,21 @@ proc check_effective_target_arm_neon_fp16_ok { } {
check_effective_target_arm_neon_fp16_ok_nocache]
 }
 
+proc check_effective_target_arm_neon_fp16_hw_ok { } {
+if {! [check_effective_target_arm_neon_fp16_ok] } {
+   return 0
+}
+global et_arm_neon_fp16_flags
+check_runtime_nocache arm_neon_fp16_hw_ok {
+   int
+   main (int argc, char **argv)
+   {
+ asm ("vcvt.f32.f16 q1, d0");
+ return 0;
+   }
+} $et_arm_neon_fp16_flags
+}
+
 proc add_options_for_arm_neon_fp16 { flags } {
 if { ! [check_effective_target_arm_neon_fp16_ok] } {
return "$flags"
-- 
1.8.3



Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_{f16_f32,f32_f16}

2015-08-25 Thread Alan Lawrence

Christophe Lyon wrote:

On 28 July 2015 at 13:27, Alan Lawrence  wrote:

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
set additional flags for neon-fp16 support.
* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.


Is that the right version of the patch?

The advsimd-intrinsics.exp part conflicts with patch 13/15.

Am I missing something?

Christophe.



Oh, sorry, thanks for pointing this out. Looks like I reposted the previous 
version, rather than what I'd been testing, which drops the conflicting hunk.


Thanks,
Alan



[nvptx] fix thinko

2015-08-25 Thread Nathan Sidwell
I've committed this.  A thinko regarding what asm_operands wants for an 
argument.  Fixes a couple of tests.


nathan
2015-08-25  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_write_function_decl): Reformat.
	(nvptx_reorg_subreg): Pass insn pattern to asm_noperands.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 227128)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -321,7 +321,8 @@ nvptx_write_function_decl (std::stringst
 
   /* Declare argument types.  */
   if ((args != NULL_TREE
-   && !(TREE_CODE (args) == TREE_LIST && TREE_VALUE (args) == void_type_node))
+   && !(TREE_CODE (args) == TREE_LIST
+	&& TREE_VALUE (args) == void_type_node))
   || is_main
   || return_in_mem
   || DECL_STATIC_CHAIN (decl))
@@ -1917,7 +1918,7 @@ nvptx_reorg_subreg (void)
 {
   next = NEXT_INSN (insn);
   if (!NONDEBUG_INSN_P (insn)
-	  || asm_noperands (insn) >= 0
+	  || asm_noperands (PATTERN (insn)) >= 0
 	  || GET_CODE (PATTERN (insn)) == USE
 	  || GET_CODE (PATTERN (insn)) == CLOBBER)
 	continue;


Indirect jumps

2015-08-25 Thread Nathan Sidwell
Ptx is one of those rare (unique?) machines that doesn't have an indirect
branch.  optabs is prepared for such a target and emits a sorry when an
indirect branch is needed.  However, it then goes on to try to emit such an
instruction anyway and ends up ICEing.


Fixed thusly, ok?  (Or is the right solution to define a dummy indirect branch
in the PTX md file?)


nathan
2015-08-25  Nathan Sidwell  

	* optabs.c (emit_indirect_jump): Don't try to emit a jump if the
	target doesn't have one.

Index: gcc/optabs.c
===
--- gcc/optabs.c	(revision 227128)
+++ gcc/optabs.c	(working copy)
@@ -4488,11 +4488,13 @@ emit_indirect_jump (rtx loc)
 {
   if (!targetm.have_indirect_jump ())
 sorry ("indirect jumps are not available on this target");
-
-  struct expand_operand ops[1];
-  create_address_operand (&ops[0], loc);
-  expand_jump_insn (targetm.code_for_indirect_jump, 1, ops);
-  emit_barrier ();
+  else
+{
+  struct expand_operand ops[1];
+  create_address_operand (&ops[0], loc);
+  expand_jump_insn (targetm.code_for_indirect_jump, 1, ops);
+  emit_barrier ();
+}
 }
 
 


Re: [Patch] Add to the libgfortran/newlib bodge to "detect" ftruncate support in ARM/AArch64/SH

2015-08-25 Thread James Greenhalgh

On Fri, Aug 21, 2015 at 11:05:47AM +0100, James Greenhalgh wrote:
> On Thu, Aug 20, 2015 at 10:50:47AM +0100, Marcus Shawcroft wrote:
> > On 20 August 2015 at 09:31, James Greenhalgh  
> > wrote:
> > >
> > > Hi,
> > >
> > > Steve's patch in 2013 [1] to fix the MIPS newlib/libgfortran build
> > > causes subtle issues for an ARM/AArch64 newlib/libgfortran build. The
> > > problem is that ARM/AArch64 (and SH) define a stub function for
> > > ftruncate, which we would previously have auto-detected, but which is not
> > > part of the hardwiring Steve added.
> > >
> > > Continuing the tradition of building bodge on bodge on bodge, this patch
> > > hardwires HAVE_FTRUNCATE on for ARM/AArch64/SH, which does fix the issue
> > > I was seeing.
> >
> > This is the second breakage I'm aware of due to the introduction of
> > this hardwire code, the first being related to strtold.  My
> > recollection is that it is only the mips target that requires the
> > newlib API hardwiring. Ideally we should rely only on the
> > AC_CHECK_FUNCS_ONCE probe code and avoid the hardwire entirely.
> >
> > Perhaps a better approach for trunk would be something along the lines of:
> >
> > case "${host}--x${with_newlib}" in
> > mips*--xyes)
> >   hardwire_newlib=1;;
> > esac
> > if test "${hardwire_newlib:-0}" -eq 1; then
> >   ... existing AC_DEFINES hardwire code
> > else
> >   ... existing AC_CHECK_FUNCS_ONCE probe code
> > fi
> >
> > In effect limiting the hardwire to just the target which is unable to
> > probe.  For backport to 4.9 and 5 I think James' more conservative
> > patch is probably more appropriate.
> >
> > What do folks think?
>
> (+CC fort...@gcc.gnu.org - who I should have CCed from the start).
>
> This runs in to issues with a newlib build [1] (newlib provides a 'kill'
> symbol for linking, but does not provide a declaration in signal.h, so
> we take a -Werror=implicit-function-declaration).

This is what the patch you suggested would look like.

I've sent a patch to the newlib list [1] which unconditionally declares
'kill'. With that in place, we can then autodetect the presence of the
functions newlib provides. I'd expect that you would need to apply that
newlib patch if you were testing this patch locally.

I've tested this with a build of arm-none-eabi and aarch64-none-elf to
check that I now get HAVE_FTRUNCATE defined, and that the build completes.

OK?

Thanks,
James

---
2015-08-25  James Greenhalgh  

* configure.ac: Auto-detect newlib function support unless we
know there are issues when configuring for a host.
* configure: Regenerate.

---
[1]: https://sourceware.org/ml/newlib/2015/msg00632.html

diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 35a8b39..1e9914c 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -273,8 +273,13 @@ GCC_HEADER_STDINT(gstdint.h)
 
 AC_CHECK_MEMBERS([struct stat.st_blksize, struct stat.st_blocks, struct stat.st_rdev])
 
+case "${host}--x${with_newlib}" in
+  mips*--xyes)
+hardwire_newlib=1;;
+esac
+
 # Check for library functions.
-if test "x${with_newlib}" = "xyes"; then
+if test "${hardwire_newlib:-0}" -eq 1; then
# We are being configured with a cross compiler.  AC_REPLACE_FUNCS
# may not work correctly, because the compiler may not be able to
# link executables.


[PATCH, rs6000] Fix vec_shr define_expand

2015-08-25 Thread Pat Haugen
The following patch fixes the vec_shr expander to do a shift instead of 
a rotate. CPU2006 benchmark 482.sphinx3 recently started failing due to 
this issue.  Bootstrapped and tested on ppc64/ppc64le with no new 
regressions. Ok for trunk? And ok for 4.9/5 (with equivalent change to 
vec_shl expander which exists in those releases) after bootstrap/regtest?


-Pat


2015-08-25  Pat Haugen  

* config/rs6000/vector.md (vec_shr_): Fix to do a shift
instead of a rotate.


gcc/testsuite:
* gcc.target/powerpc/vec-shr.c: New.



Index: gcc/config/rs6000/vector.md
===
--- gcc/config/rs6000/vector.md	(revision 227041)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -977,6 +977,8 @@ (define_expand "movmisalign"
 ;; General shift amounts can be supported using vsro + vsr. We're
 ;; not expecting to see these yet (the vectorizer currently
 ;; generates only shifts by a whole number of vector elements).
+;; Note that the vec_shr operation is actually defined as 
+;; 'shift toward element 0' so is a shr for LE and shl for BE.
 (define_expand "vec_shr_"
   [(match_operand:VEC_L 0 "vlogical_operand" "")
(match_operand:VEC_L 1 "vlogical_operand" "")
@@ -987,6 +989,7 @@ (define_expand "vec_shr_"
   rtx bitshift = operands[2];
   rtx shift;
   rtx insn;
+  rtx zero_reg, op1, op2;
   HOST_WIDE_INT bitshift_val;
   HOST_WIDE_INT byteshift_val;
 
@@ -996,19 +999,29 @@ (define_expand "vec_shr_"
   if (bitshift_val & 0x7)
 FAIL;
   byteshift_val = (bitshift_val >> 3);
+  zero_reg = gen_reg_rtx(mode);
+  emit_move_insn (zero_reg, CONST0_RTX (mode));
   if (!BYTES_BIG_ENDIAN)
-byteshift_val = 16 - byteshift_val;
+{
+  byteshift_val = 16 - byteshift_val;
+  op1 = zero_reg;
+  op2 = operands[1];
+}
+  else
+{
+  op1 = operands[1];
+  op2 = zero_reg;
+}
+
   if (TARGET_VSX && (byteshift_val & 0x3) == 0)
 {
   shift = gen_rtx_CONST_INT (QImode, byteshift_val >> 2);
-  insn = gen_vsx_xxsldwi_ (operands[0], operands[1], operands[1],
- shift);
+  insn = gen_vsx_xxsldwi_ (operands[0], op1, op2, shift);
 }
   else
 {
   shift = gen_rtx_CONST_INT (QImode, byteshift_val);
-  insn = gen_altivec_vsldoi_ (operands[0], operands[1], operands[1],
-	shift);
+  insn = gen_altivec_vsldoi_ (operands[0], op1, op2, shift);
 }
 
   emit_insn (insn);
Index: gcc/testsuite/gcc.target/powerpc/vec-shr.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-shr.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-shr.c	(working copy)
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-inline" } */
+
+#include 
+
+typedef struct { double r, i; } complex;
+#define LEN 30
+complex c[LEN];
+double d[LEN];
+
+void
+foo (complex *c, double *d, int len1)
+{
+  int i;
+  for (i = 0; i < len1; i++)
+{
+  c[i].r = d[i];
+  c[i].i = 0.0;
+}
+}
+
+int
+main (void)
+{
+  int i;
+  for (i = 0; i < LEN; i++)
+d[i] = (double) i;
+  foo (c, d, LEN);
+  for (i=0;i

Re: [AArch64][TLSLE][1/3] Add the option "-mtls-size" for AArch64

2015-08-25 Thread Jiong Wang

Marcus Shawcroft writes:

> On 19 August 2015 at 15:26, Jiong Wang  wrote:
>
>> 2015-08-19  Jiong Wang  
>>
>> gcc/
>>   * config/aarch64/aarch64.opt (mtls-size): New entry.
>>   * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
>>   (aarch64_override_options_internal): Call initialize_aarch64_tls_size.
>>   * doc/invoke.texi (AArch64 Options): Document -mtls-size.
>>
>> --
>> Regards,
>> Jiong
>>
>
> +case AARCH64_CMODEL_TINY:
> +  /* The maximum TLS size allowed under tiny is 1M.  */
> +  if (aarch64_tls_size > 20)
> + aarch64_tls_size = 20;
>
> The only valid values of aarch64_tls_size handled/expected by the
> remainder of the patch set are 12, 24, 32, 48, so setting the value to 20
> here doesn't make sense.

Thanks for pointing this out, how about the new patch attached?

2015-08-25  Jiong Wang  

gcc/
  * config/aarch64/aarch64.opt (mtls-size): New entry.
  * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
  (aarch64_override_options_internal): Call initialize_aarch64_tls_size.
  * doc/invoke.texi (AArch64 Options): Document -mtls-size.

commit 36736a1a2133ffc949d3e00efdced8ef2c53cddd
Author: Jiong Wang 
Date:   Tue Aug 25 11:13:44 2015 +0100

1

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 382be2c..318b852 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7492,6 +7492,40 @@ aarch64_parse_one_override_token (const char* token,
   return;
 }
 
+/* A checking mechanism for the implementation of the tls size.  */
+
+static void
+initialize_aarch64_tls_size (struct gcc_options *opts)
+{
+  if (aarch64_tls_size == 0)
+aarch64_tls_size = 24;
+
+  switch (opts->x_aarch64_cmodel_var)
+{
+case AARCH64_CMODEL_TINY:
+  /* Both the default and maximum TLS size allowed under tiny is 1M which
+	 needs two instructions to address, so we clamp the size to 24.  */
+  if (aarch64_tls_size > 24)
+	aarch64_tls_size = 24;
+  break;
+case AARCH64_CMODEL_SMALL:
+  /* The maximum TLS size allowed under small is 4G.  */
+  if (aarch64_tls_size > 32)
+	aarch64_tls_size = 32;
+  break;
+case AARCH64_CMODEL_LARGE:
+  /* The maximum TLS size allowed under large is 16E.
+	 FIXME: 16E should be 64bit, we only support 48bit offset now.  */
+  if (aarch64_tls_size > 48)
+	aarch64_tls_size = 48;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  return;
+}
+
 /* Parse STRING looking for options in the format:
  string	:: option:string
  option	:: name=substring
@@ -7584,6 +7618,7 @@ aarch64_override_options_internal (struct gcc_options *opts)
 }
 
   initialize_aarch64_code_model (opts);
+  initialize_aarch64_tls_size (opts);
 
   aarch64_override_options_after_change_1 (opts);
 }
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 37c2c50..8642bdb 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -96,6 +96,25 @@ mtls-dialect=
 Target RejectNegative Joined Enum(tls_type) Var(aarch64_tls_dialect) Init(TLS_DESCRIPTORS) Save
 Specify TLS dialect
 
+mtls-size=
+Target RejectNegative Joined Var(aarch64_tls_size) Enum(aarch64_tls_size)
+Specifies bit size of immediate TLS offsets.  Valid values are 12, 24, 32, 48.
+
+Enum
+Name(aarch64_tls_size) Type(int)
+
+EnumValue
+Enum(aarch64_tls_size) String(12) Value(12)
+
+EnumValue
+Enum(aarch64_tls_size) String(24) Value(24)
+
+EnumValue
+Enum(aarch64_tls_size) String(32) Value(32)
+
+EnumValue
+Enum(aarch64_tls_size) String(48) Value(48)
+
 march=
 Target RejectNegative ToLower Joined Var(aarch64_arch_string)
 -march=ARCH	Use features of architecture ARCH
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27be317..f990bef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -514,6 +514,7 @@ Objective-C and Objective-C++ Dialects}.
 -mstrict-align @gol
 -momit-leaf-frame-pointer  -mno-omit-leaf-frame-pointer @gol
 -mtls-dialect=desc  -mtls-dialect=traditional @gol
+-mtls-size=@var{size} @gol
 -mfix-cortex-a53-835769  -mno-fix-cortex-a53-835769 @gol
 -mfix-cortex-a53-843419  -mno-fix-cortex-a53-843419 @gol
 -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
@@ -12409,6 +12410,11 @@ of TLS variables.  This is the default.
 Use traditional TLS as the thread-local storage mechanism for dynamic accesses
 of TLS variables.
 
+@item -mtls-size=@var{size}
+@opindex mtls-size
+Specify bit size of immediate TLS offsets.  Valid values are 12, 24, 32, 48.
+This option requires a binutils version newer than 2.25.
+
 @item -mfix-cortex-a53-835769
 @itemx -mno-fix-cortex-a53-835769
 @opindex mfix-cortex-a53-835769


RE: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation

2015-08-25 Thread Wilco Dijkstra
> Richard Henderson wrote:
> On 08/12/2015 08:59 AM, Wilco Dijkstra wrote:
> > I looked at the statistics of AArch64 immediate generation a while ago.
> > The interesting thing is ~95% of calls are queries, and the same query is on
> > average repeated 10 times in a row. So (a) it is not important to cache the
> > expansions, and (b) the high repetition rate means a single-entry cache
> > has a 90% hitrate. We already have a patch for this and could collect stats
> > comparing the approaches. If a single-entry cache can provide a similar
> > benefit as caching all immediates then my preference would be to keep things
> > simple and just cache the last query.
> 
> Interesting.  That's already more detailed investigation than I'd done.  I had
> no idea the queries were so clustered.  I assumed that the queries would be
> scattered across various passes, and so the various constants across the
> function would get checked in sequence.
> 
> I would be very interested in seeing those stats when you're done.

Caching improves average build time by 0.1-0.2%; your patch seems to be slightly
faster than caching just 1 query, so that suggests caching a few entries would
be beneficial.

However, looking at the immediates that are generated by the loops, it's feasible
to avoid linear/quadratic search loops altogether. So I think a generic immediate
caching scheme won't be useful for AArch64.
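
The single-entry cache discussed above can be sketched as follows. This is a minimal illustration, not AArch64 backend code: `expensive_immediate_check` is a hypothetical stand-in for the real immediate-validity query, and the 12-bit criterion is invented for the example.

```cpp
#include <cstdint>

// Hypothetical stand-in for the real (expensive) backend query; here
// "valid" simply means the value fits in 12 bits, purely for illustration.
static bool expensive_immediate_check (uint64_t val)
{
  return (val >> 12) == 0;
}

// Single-entry cache: remember only the most recent query and its answer.
// With the same query repeated ~10 times in a row, this alone hits ~90%.
static bool cached_immediate_p (uint64_t val)
{
  static uint64_t last_val;
  static bool last_result;
  static bool have_entry;   // whether the cache holds a previous query

  if (have_entry && val == last_val)
    return last_result;     // hit: repeated query answered without rescanning

  last_val = val;
  last_result = expensive_immediate_check (val);
  have_entry = true;
  return last_result;
}
```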

Wilco




Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_i{f32_f16,f16_f32}

2015-08-25 Thread Christophe Lyon
On 25 August 2015 at 15:57, Alan Lawrence  wrote:
> Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has
> moved to the previous patch! This version also fixes some whitespace issues.
>

This looks OK to me now, thanks.

> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
> * lib/target-supports.exp
> (check_effective_target_arm_neon_fp16_hw_ok): New.
> ---
>  .../aarch64/advsimd-intrinsics/vcvt_f16.c  | 98 ++
>  gcc/testsuite/lib/target-supports.exp  | 15
>  2 files changed, 113 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
> new file mode 100644
> index 000..a2cfd38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
> @@ -0,0 +1,98 @@
> +/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */
> +#include 
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +#include 
> +
> +/* Expected results for vcvt.  */
> +VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170,
> +   0x4160, 0x4150 };
> +VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
> +
> +/* Expected results for vcvt_high_f32_f16.  */
> +VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130,
> +0xc120, 0xc110 };
> +/* Expected results for vcvt_high_f16_f32.  */
> +VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000,
> +						 0xcc00, 0xcb80, 0xcb00, 0xca80 };
> +
> +void
> +exec_vcvt (void)
> +{
> +  clean_results ();
> +
> +#define TEST_MSG vcvt_f32_f16
> +  {
> +VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 };
> +
> +DECL_VARIABLE (vector_src, float, 16, 4);
> +
> +VLOAD (vector_src, buffer_src, , float, f, 16, 4);
> +DECL_VARIABLE (vector_res, float, 32, 4) =
> +   vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4));
> +vst1q_f32 (VECT_VAR (result, float, 32, 4),
> +  VECT_VAR (vector_res, float, 32, 4));
> +
> +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
> +  }
> +#undef TEST_MSG
> +
> +  clean_results ();
> +
> +#define TEST_MSG vcvt_f16_f32
> +  {
> +VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 };
> +DECL_VARIABLE (vector_src, float, 32, 4);
> +
> +VLOAD (vector_src, buffer_src, q, float, f, 32, 4);
> +DECL_VARIABLE (vector_res, float, 16, 4) =
> +  vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4));
> +vst1_f16 (VECT_VAR (result, float, 16, 4),
> + VECT_VAR (vector_res, float, 16 ,4));
> +
> +CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
> +  }
> +#undef TEST_MSG
> +
> +#if defined (__aarch64__)
> +  clean_results ();
> +
> +#define TEST_MSG "vcvt_high_f32_f16"
> +  {
> +DECL_VARIABLE (vector_src, float, 16, 8);
> +VLOAD (vector_src, buffer, q, float, f, 16, 8);
> +DECL_VARIABLE (vector_res, float, 32, 4);
> +VECT_VAR (vector_res, float, 32, 4) =
> +  vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8));
> +vst1q_f32 (VECT_VAR (result, float, 32, 4),
> +  VECT_VAR (vector_res, float, 32, 4));
> +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, "");
> +  }
> +#undef TEST_MSG
> +  clean_results ();
> +
> +#define TEST_MSG "vcvt_high_f16_f32"
> +  {
> +DECL_VARIABLE (vector_low, float, 16, 4);
> +VDUP (vector_low, , float, f, 16, 4, 2.0);
> +
> +DECL_VARIABLE (vector_src, float, 32, 4);
> +VLOAD (vector_src, buffer, q, float, f, 32, 4);
> +
> +DECL_VARIABLE (vector_res, float, 16, 8) =
> +  vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4),
> +VECT_VAR (vector_src, float, 32, 4));
> +vst1q_f16 (VECT_VAR (result, float, 16, 8),
> +  VECT_VAR (vector_res, float, 16, 8));
> +
> +CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, "");
> +  }
> +#endif
> +}
> +
> +int
> +main (void)
> +{
> +  exec_vcvt ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 9aec02d..0a22c95 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -2730,6 +2730,21 @@ proc check_effective_target_arm_neon_fp16_ok { } {
> check_effective_target_arm_neon_fp16_ok_nocache]
>  }
>
> +proc check_effective_target_arm_neon_fp16_hw_ok { } {
> +if {! [check_effective_target_arm_neon_fp16_ok] } {
> +   return 0
> +}
> +global et_arm_neon_fp16_flags
> +check_runtime_nocache arm_neon_fp16_hw_ok {
> +   int

[hsa] Fix omp declare target support

2015-08-25 Thread Martin Jambor
Hi,

it was brought to my attention that omp declare target functions were
not properly translated to HSA functions.  Until the grand scheme with
an IPA pass is complete, this will do.  And having a single predicate
to decide what should be an HSA function cannot be bad.  Committed to
the hsa branch.

Thanks,

Martin


2015-08-25  Martin Jambor  

* hsa.h (hsa_callable_function_p): Declare.
* hsa.c (hsa_callable_function_p): New function.
* hsa-gen.c (gen_hsa_insns_for_call): Use it.
(pass_gen_hsail::execute): Likewise.
---
 gcc/ChangeLog.hsa | 7 +++
 gcc/hsa-gen.c | 5 ++---
 gcc/hsa.c | 9 +
 gcc/hsa.h | 1 +
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog.hsa b/gcc/ChangeLog.hsa
index 4ad8414..922c917 100644
--- a/gcc/ChangeLog.hsa
+++ b/gcc/ChangeLog.hsa
@@ -1,5 +1,12 @@
 2015-08-25  Martin Jambor  
 
+   * hsa.h (hsa_callable_function_p): Declare.
+   * hsa.c (hsa_callable_function_p): New function.
+   * hsa-gen.c (gen_hsa_insns_for_call): Use it.
+   (pass_gen_hsail::execute): Likewise.
+
+2015-08-25  Martin Jambor  
+
* hsa-gen.c (gen_hsa_unaryop_for_builtin): New function.
 
 2015-08-25  Martin Jambor  
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 1e23996..7190dce 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3255,7 +3255,7 @@ gen_hsa_insns_for_call (gimple stmt, hsa_bb *hbb,
  return;
}
 
-  if (lookup_attribute ("hsafunc", DECL_ATTRIBUTES (function_decl)))
+  if (hsa_callable_function_p (function_decl))
 gen_hsa_insns_for_direct_call (stmt, hbb, ssa_map);
   else if (!gen_hsa_insns_for_known_library_call (stmt, hbb, ssa_map))
sorry ("HSA does support only call for functions with 'hsafunc' "
@@ -4102,8 +4102,7 @@ pass_gen_hsail::execute (function *)
   || lookup_attribute ("hsakernel",
   DECL_ATTRIBUTES (current_function_decl)))
 return generate_hsa (true);
-  else if (lookup_attribute ("hsafunc",
-DECL_ATTRIBUTES (current_function_decl)))
+  else if (hsa_callable_function_p (current_function_decl))
 return generate_hsa (false);
   else
 return wrap_all_hsa_calls ();
diff --git a/gcc/hsa.c b/gcc/hsa.c
index 13a2ace..4ad44fe 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -103,6 +103,15 @@ hash_table  
*hsa_global_variable_symbols;
 /* True if compilation unit-wide data are already allocated and initialized.  
*/
 static bool compilation_unit_data_initialized;
 
+/* Return true if FNDECL represents an HSA-callable function.  */
+
+bool
+hsa_callable_function_p (tree fndecl)
+{
+  return lookup_attribute ("hsafunc", DECL_ATTRIBUTES (fndecl))
+|| lookup_attribute ("omp declare target", DECL_ATTRIBUTES (fndecl));
+}
+
 /* Allocate HSA structures that are are used when dealing with different
functions.  */
 
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 3956676..f9bcc80 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -898,6 +898,7 @@ extern struct hsa_function_representation *hsa_cfun;
 extern hash_table  *hsa_global_variable_symbols;
 extern hash_map  *> *hsa_decl_kernel_dependencies;
 extern unsigned hsa_kernel_calls_counter;
+bool hsa_callable_function_p (tree fndecl);
 void hsa_init_compilation_unit_data (void);
 void hsa_deinit_compilation_unit_data (void);
 bool hsa_machine_large_p (void);
-- 
2.4.6



Re: [Patch] Add to the libgfortran/newlib bodge to "detect" ftruncate support in ARM/AArch64/SH

2015-08-25 Thread FX
> 2015-08-25  James Greenhalgh  
> 
>   * configure.ac: Auto-detect newlib function support unless we
>   know there are issues when configuring for a host.
>   * configure: Regenerate.

Thanks for CC’ing the fortran list.

Given that this is newlib-specific code, even though it’s in libgfortran
configury, you should decide and commit what’s best. I don’t think we have any
newlib experts among the Fortran maintainers.

Wait for 48 hours to see if anyone else objects, though.

Cheers,
FX

[gomp4] another routine test

2015-08-25 Thread Nathan Sidwell

I've committed this test to check 2-dimensional loops inside a routine.

nathan
2015-08-24  Nathan Sidwell  

	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: New.

Index: testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c
===
--- testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c	(revision 0)
+++ testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c	(revision 0)
@@ -0,0 +1,75 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O1" } */
+
+#include 
+#include 
+
+#define NUM_WORKERS 16
+#define NUM_VECTORS 32
+#define WIDTH 64
+#define HEIGHT 32
+
+#define WORK_ID(I,N)		\
+  (acc_on_device (acc_device_nvidia)\
+   ? ({unsigned __r;		\
+   __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (__r));	\
+   __r; }) : (I % N))
+#define VEC_ID(I,N)		\
+  (acc_on_device (acc_device_nvidia)\
+   ? ({unsigned __r;		\
+   __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (__r));	\
+   __r; }) : (I % N))
+
+#pragma acc routine worker
+void __attribute__ ((noinline))
+  WorkVec (int *ptr, int w, int h, int nw, int nv)
+{
+#pragma acc loop worker
+  for (int i = 0; i < h; i++)
+#pragma acc loop vector
+for (int j = 0; j < w; j++)
+  ptr[i*w + j] = (WORK_ID (i, nw) << 8) | VEC_ID(j, nv);
+}
+
+int DoWorkVec (int nw)
+{
+  int ary[HEIGHT][WIDTH];
+  int err = 0;
+
+  for (int ix = 0; ix != HEIGHT; ix++)
+for (int jx = 0; jx != WIDTH; jx++)
+  ary[ix][jx] = 0xdeadbeef;
+
+  printf ("spawning %d ...", nw); fflush (stdout);
+  
+#pragma acc parallel num_workers(nw) vector_length (NUM_VECTORS) copy (ary)
+  {
+WorkVec ((int *)ary, WIDTH, HEIGHT, nw, NUM_VECTORS);
+  }
+
+  for (int ix = 0; ix != HEIGHT; ix++)
+for (int jx = 0; jx != WIDTH; jx++)
+  {
+	int exp = ((ix % nw) << 8) | (jx % NUM_VECTORS);
+	
+	if (ary[ix][jx] != exp)
+	  {
+	printf ("\nary[%d][%d] = %#x expected %#x", ix, jx,
+		ary[ix][jx], exp);
+	err = 1;
+	  }
+  }
+  printf (err ? " failed\n" : " ok\n");
+  
+  return err;
+}
+
+int main ()
+{
+  int err = 0;
+
+  for (int W = 1; W <= NUM_WORKERS; W <<= 1)
+err |= DoWorkVec (W);
+
+  return err;
+}


Expand comment on struct switchstr

2015-08-25 Thread Joseph Myers
This patch, extracted from a larger change on gomp-4_0-branch, expands
a comment documenting struct switchstr in gcc.c.

Committed.

2015-08-25  Thomas Schwinge  
Joseph Myers  

* gcc.c (struct switchstr): Expand comment.

Index: gcc.c
===
--- gcc.c   (revision 227171)
+++ gcc.c   (working copy)
@@ -3068,10 +3068,15 @@
SWITCH_LIVE to indicate this switch is true in a conditional spec.
SWITCH_FALSE to indicate this switch is overridden by a later switch.
SWITCH_IGNORE to indicate this switch should be ignored (used in %

Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-08-25 Thread Joseph Myers
On reviewing in more detail the changes to pass offloading targets
from the driver to libgomp at link time to identify the minimal
self-contained pieces that can go to trunk, I found that the use of
fnmatch to match against target names was completely unnecessary; the
ISO C90 functions strstr and strncmp could be used instead, so
avoiding the need to add configure tests for fnmatch.  This patch duly
removes the use of and configure tests for fnmatch.
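
As a minimal sketch of that substitution (a hypothetical wrapper, purely for illustration: the real `offload_target_to_plugin_name` calls `gomp_fatal` instead of returning a null pointer for unknown targets):

```cpp
#include <cstring>

// Hypothetical mirror of offload_target_to_plugin_name's matching logic.
static const char *plugin_name_for (const char *offload_target)
{
  // was: fnmatch ("*-intelmic*", offload_target, 0) == 0
  if (std::strstr (offload_target, "-intelmic") != nullptr)
    return "intelmic";
  // was: fnmatch ("nvptx*", offload_target, 0) == 0
  if (std::strncmp (offload_target, "nvptx", 5) == 0)
    return "nvptx";
  return nullptr;
}
```

Since the first pattern only needs "contains" matching and the second only needs "prefix" matching, the ISO C90 functions cover both cases with no configure test.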

Will commit to gomp-4_0-branch subject to test results.

2015-08-25  Joseph Myers  

* plugin/configfrag.ac: Don't test for fnmatch.h or fnmatch.
* configure, config.h.in: Regenerate.
* target.c [PLUGIN_SUPPORT]: Don't include .
(offload_target_to_plugin_name): Use strstr and strncmp instead of
fnmatch.

Index: libgomp/config.h.in
===
--- libgomp/config.h.in (revision 227169)
+++ libgomp/config.h.in (working copy)
@@ -24,12 +24,6 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_DLFCN_H
 
-/* Define to 1 if you have the `fnmatch' function. */
-#undef HAVE_FNMATCH
-
-/* Define to 1 if you have the  header file. */
-#undef HAVE_FNMATCH_H
-
 /* Define to 1 if you have the `getloadavg' function. */
 #undef HAVE_GETLOADAVG
 
Index: libgomp/target.c
===
--- libgomp/target.c(revision 227169)
+++ libgomp/target.c(working copy)
@@ -41,7 +41,6 @@
 
 #ifdef PLUGIN_SUPPORT
 #include 
-#include 
 #include "plugin-suffix.h"
 #endif
 
@@ -1271,9 +1270,9 @@
 static const char *
 offload_target_to_plugin_name (const char *offload_target)
 {
-  if (fnmatch ("*-intelmic*", offload_target, 0) == 0)
+  if (strstr (offload_target, "-intelmic") != NULL)
 return "intelmic";
-  if (fnmatch ("nvptx*", offload_target, 0) == 0)
+  if (strncmp (offload_target, "nvptx", 5) == 0)
 return "nvptx";
   gomp_fatal ("Unknown offload target: %s", offload_target);
 }
Index: libgomp/configure
===
--- libgomp/configure   (revision 227169)
+++ libgomp/configure   (working copy)
@@ -15119,33 +15119,6 @@
 offload_targets=
 
 plugin_support=yes
-for ac_header in fnmatch.h
-do :
-  ac_fn_c_check_header_mongrel "$LINENO" "fnmatch.h" "ac_cv_header_fnmatch_h" "$ac_includes_default"
-if test "x$ac_cv_header_fnmatch_h" = x""yes; then :
-  cat >>confdefs.h <<_ACEOF
-#define HAVE_FNMATCH_H 1
-_ACEOF
-
-else
-  plugin_support=no
-fi
-
-done
-
-for ac_func in fnmatch
-do :
-  ac_fn_c_check_func "$LINENO" "fnmatch" "ac_cv_func_fnmatch"
-if test "x$ac_cv_func_fnmatch" = x""yes; then :
-  cat >>confdefs.h <<_ACEOF
-#define HAVE_FNMATCH 1
-_ACEOF
-
-else
-  plugin_support=no
-fi
-done
-
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl" >&5
 $as_echo_n "checking for dlsym in -ldl... " >&6; }
 if test "${ac_cv_lib_dl_dlsym+set}" = set; then :
Index: libgomp/plugin/configfrag.ac
===
--- libgomp/plugin/configfrag.ac(revision 227169)
+++ libgomp/plugin/configfrag.ac(working copy)
@@ -29,8 +29,6 @@
 offload_targets=
 AC_SUBST(offload_targets)
 plugin_support=yes
-AC_CHECK_HEADERS([fnmatch.h], , [plugin_support=no])
-AC_CHECK_FUNCS([fnmatch], , [plugin_support=no])
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, rs6000] Fix vec_shr define_expand

2015-08-25 Thread David Edelsohn
On Tue, Aug 25, 2015 at 10:14 AM, Pat Haugen
 wrote:
> The following patch fixes the vec_shr expander to do a shift instead of a
> rotate. CPU2006 benchmark 482.sphinx3 recently started failing due to this
> issue.  Bootstrapped and tested on ppc64/ppc64le with no new regressions. Ok
> for trunk? And ok for 4.9/5 (with equivalent change to vec_shl expander
> which exists in those releases) after bootstrap/regtest?
>
> -Pat
>
>
> 2015-08-25  Pat Haugen  
>
> * config/rs6000/vector.md (vec_shr_): Fix to do a shift
> instead of a rotate.
>
> gcc/testsuite:
> * gcc.target/powerpc/vec-shr.c: New.

This is okay.

As Peter and I noticed

+  zero_reg = gen_reg_rtx(mode);

This needs a space after gen_reg_rtx.

Thanks, David


PING: PATCH: Mention --enable-default-pie in gcc-6/changes.html

2015-08-25 Thread H.J. Lu
On Thu, May 28, 2015 at 6:49 AM, H.J. Lu  wrote:
> OK to install?
>
> H.J.
> ---
> Index: gcc-6/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
> retrieving revision 1.10
> diff -u -p -r1.10 changes.html
> --- gcc-6/changes.html  26 May 2015 10:12:08 -  1.10
> +++ gcc-6/changes.html  28 May 2015 13:49:00 -
> @@ -140,8 +140,12 @@ enum {
>
>
>  
> -
> +Other significant improvements
>
> +  
> +Added --enable-default-pie configure option to
> +   generate PIE by default.
> +  
>
>  
>  

PING.

-- 
H.J.


Re: [testsuite] Clean up effective_target cache

2015-08-25 Thread Mike Stump
On Aug 25, 2015, at 1:14 AM, Christophe Lyon  wrote:
> Some subsets of the tests override ALWAYS_CXXFLAGS or
> TEST_ALWAYS_FLAGS and perform effective_target support tests using
> these modified flags.

> This patch adds a new function 'clear_effective_target_cache', which
> is called at the end of every .exp file which overrides
> ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS.

So, it would be nice to have a simple English directive somewhere that says: if
one changes ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS, then one should do a
clear_effective_target_cache at the end, as the target cache can make decisions
based upon the flags, and those decisions need to be redone when the flags
change.

I do wonder, do we need to reexamine when setting the flags?  I’m thinking of a 
sequence like: non-thumb default, is_thumb, set flags (thumb), is_thumb.  
Anyway, safe to punt this until someone discovers it or is reasonably sure it 
happens.

Anyway, all looks good.  Ok.

> However, I noticed that lib/g++.exp changes ALWAYS_CXXFLAGS, but does
> not appear to restore it. In doubt, I didn't change it.

Yeah, I examined it.  It seems like it might not matter, as anyone setting and 
unsetting would come in cleared, and if they didn’t, it should be roughly the 
same exact state, meaning, no clearing necessary.  I think it is safe to punt 
this until someone finds a bug or can see a way that it would matter.  I also 
don’t think it would hurt to clear, if someone wanted to refactor the code a 
bit and make the clearing and the cleanup a little more automatic.  I’m 
thinking of a RAII style code in which the dtor runs the clear.  Not sure if 
that is even possible in tcl.  [ checking ] Nope, maybe not.  Oh well.
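
For what it's worth, the RAII idea can be sketched in C++ terms. Everything here is a hypothetical analogue of the Tcl state, not real testsuite code: the flags string stands in for ALWAYS_CXXFLAGS, and `clear_effective_target_cache` is a stub for the DejaGnu procedure of the same name.

```cpp
#include <string>

// Stand-ins for the Tcl state: the flags variable and the cache state.
static bool cache_cleared;
static void clear_effective_target_cache () { cache_cleared = true; }

// RAII guard: the constructor installs temporary flags, the destructor
// restores the saved flags and clears the effective-target cache.
class flag_override
{
  std::string &flags_;
  std::string saved_;
public:
  flag_override (std::string &flags, const std::string &temp)
    : flags_ (flags), saved_ (flags)
  {
    flags_ = temp;
    cache_cleared = false;   // cached decisions may now be stale
  }
  ~flag_override ()
  {
    flags_ = saved_;                  // restore the original flags
    clear_effective_target_cache ();  // redo decisions under restored flags
  }
};
```

The point of the pattern is exactly what the thread asks for: forgetting to clear the cache becomes impossible, because scope exit does it.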

[PATCH] Don't ICE on invalid weak decl (PR middle-end/67330)

2015-08-25 Thread Marek Polacek
Here we are ICEing on an invalid code: symtab_node::get asserts that it's
dealing with a function or a static or external variable, but an invalid
decl is rejected too late.  So don't try to mark_weak an invalid decl and
also don't duplicate the "declared weak after being used" check -- that is
already in mark_weak.
Perhaps we should also punt if (!TARGET_SUPPORTS_WEAK)?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-08-25  Marek Polacek  

PR middle-end/67330
* varasm.c (declare_weak): Return after giving an error.

* c-common.c (handle_weak_attribute): Don't check whether the
visibility can be changed here.

* gcc.dg/weak/weak-18.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index ff502e5..7691035 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -8328,12 +8328,7 @@ handle_weak_attribute (tree *node, tree name,
   return NULL_TREE;
 }
   else if (VAR_OR_FUNCTION_DECL_P (*node))
-{
-  struct symtab_node *n = symtab_node::get (*node);
-  if (n && n->refuse_visibility_changes)
-   error ("%+D declared weak after being used", *node);
-  declare_weak (*node);
-}
+declare_weak (*node);
   else
 warning (OPT_Wattributes, "%qE attribute ignored", name);
 
diff --git gcc/testsuite/gcc.dg/weak/weak-18.c gcc/testsuite/gcc.dg/weak/weak-18.c
index e69de29..ebeb4d5 100644
--- gcc/testsuite/gcc.dg/weak/weak-18.c
+++ gcc/testsuite/gcc.dg/weak/weak-18.c
@@ -0,0 +1,9 @@
+/* PR middle-end/67330 */
+/* { dg-do compile } */
+/* { dg-require-weak "" } */
+
+void
+f (void)
+{
+  __attribute__ ((weak)) int a; /* { dg-error "weak declaration of .a. must be public" } */
+}
diff --git gcc/varasm.c gcc/varasm.c
index 7fa2e7b..d9290a1 100644
--- gcc/varasm.c
+++ gcc/varasm.c
@@ -5403,7 +5403,10 @@ declare_weak (tree decl)
 {
   gcc_assert (TREE_CODE (decl) != FUNCTION_DECL || !TREE_ASM_WRITTEN (decl));
   if (! TREE_PUBLIC (decl))
-error ("weak declaration of %q+D must be public", decl);
+{
+  error ("weak declaration of %q+D must be public", decl);
+  return;
+}
   else if (!TARGET_SUPPORTS_WEAK)
 warning (0, "weak declaration of %q+D not supported", decl);
 

Marek


Re: [libgfortran,patch] Remove never-used debugging code

2015-08-25 Thread FX
Turns out I missed some of the dead code. And I now also fixed comments and 
some formatting.
libgfortran/runtime/environ.c is now much more readable than before.
The patch is still a no-op, in terms of user functionality.
OK to commit to trunk?

FX




unusedcode.ChangeLog
Description: Binary data


unusedcode.diff
Description: Binary data


[nvptx] More gcc testsuite markup

2015-08-25 Thread Nathan Sidwell

I've committed this to markup more test requirements.  Most are obvious enough.

nvptx doesn't expose a normal stack, so stack-based tests fail.  It also 
requires correct typing on function calls, so lying about that results in 
assembler errors.  Finally, it doesn't accept string constants, requiring 
expansion to an array of ints.


nathan
2015-08-25  Nathan Sidwell  

	* gcc.dg/20001117-1.c: Needs return_address.
	* gcc.dg/20020415-1.c: Needs alloca.
	* gcc.dg/graphite/id-pr44676.c: Needs profiling.
	* gcc.dg/graphite/pr60979.c: Needs nonlocal_goto.
	* gcc.dg/pr63186.c: Needs label_values.
	* gcc.dg/torture/pr33848.c: Likewise.

	* lib/target-supports.exp (check_effective_target_fopenacc,
	check_effective_target_fopenmp): Disable for nvptx.
	* gcc.dg/graphite/run-id-pr47653.c: Disable for nvptx.
	* gcc.dg/stack-usage-1.c: Likewise.
	* gcc.dg/stack-usage-2.c: Likewise.
	* gcc.dg/unused-5.c: Likewise.
	* gcc.dg/unwind-1.c: Likewise.

Index: gcc.dg/20001117-1.c
===
--- gcc.dg/20001117-1.c	(revision 227166)
+++ gcc.dg/20001117-1.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -finstrument-functions" } */
+/* { dg-require-effective-target return_address } */
 
 extern void abort (void);
 extern void exit (int);
Index: gcc.dg/20020415-1.c
===
--- gcc.dg/20020415-1.c	(revision 227166)
+++ gcc.dg/20020415-1.c	(working copy)
@@ -1,9 +1,9 @@
 /* PR target/6303
This testcase ICEd because s390 did not define
ASM_SIMPLIFY_DWARF_ADDR hook.  */
-/* { dg-require-effective-target alloca } */
 /* { dg-do compile { target fpic } } */
 /* { dg-options "-O2 -fpic -g" } */
+/* { dg-require-effective-target alloca } */
 
 static inline char *
 bar (unsigned long x, char *y)
Index: gcc.dg/graphite/id-pr44676.c
===
--- gcc.dg/graphite/id-pr44676.c	(revision 227166)
+++ gcc.dg/graphite/id-pr44676.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-options "-O2 -fgraphite-identity -fprofile-generate" } */
+/* { dg-require-profiling "-fprofile-generate" } */
 
 int
 extend_options (int h, int map, int x, int y, int dx)
Index: gcc.dg/graphite/pr60979.c
===
--- gcc.dg/graphite/pr60979.c	(revision 227166)
+++ gcc.dg/graphite/pr60979.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-options "-O -fgraphite-identity" } */
+/* { dg-require-effective-target nonlocal_goto } */
 
 #include 
 
Index: gcc.dg/graphite/run-id-pr47653.c
===
--- gcc.dg/graphite/run-id-pr47653.c	(revision 227166)
+++ gcc.dg/graphite/run-id-pr47653.c	(working copy)
@@ -1,4 +1,6 @@
 /* { dg-options "-O -fstack-check=generic -ftree-pre -fgraphite-identity" } */
+/* nvptx doesn't expose a stack.  */
+/* { dg-skip-if "" { nvptx-*-* } { "*" } { "" } } */
 
 int main ()
 {
Index: gcc.dg/pr63186.c
===
--- gcc.dg/pr63186.c	(revision 227166)
+++ gcc.dg/pr63186.c	(working copy)
@@ -1,5 +1,7 @@
 /* { dg-do link } */
 /* { dg-options "-O2" } */
+/* { dg-require-effective-target label_values } */
+
 void *a;
 int b, c, d;
 
Index: gcc.dg/stack-usage-1.c
===
--- gcc.dg/stack-usage-1.c	(revision 227166)
+++ gcc.dg/stack-usage-1.c	(working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-fstack-usage" } */
+/* nvptx doesn't have a reg allocator, and hence no stack usage data.  */
+/* { dg-skip-if "" { nvptx-*-* } { "*" } { "" } } */
 
 /* This is aimed at testing basic support for -fstack-usage in the back-ends.
See the SPARC back-end for example (grep flag_stack_usage_info in sparc.c).
Index: gcc.dg/stack-usage-2.c
===
--- gcc.dg/stack-usage-2.c	(revision 227166)
+++ gcc.dg/stack-usage-2.c	(working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-Wstack-usage=512" } */
+/* nvptx gets very upset with mismatched function types.  */
+/* { dg-skip-if "" { nvptx-*-* } { "*" } { "" } } */
 
 int foo1 (void)  /* { dg-bogus "stack usage" } */
 {
Index: gcc.dg/torture/pr33848.c
===
--- gcc.dg/torture/pr33848.c	(revision 227166)
+++ gcc.dg/torture/pr33848.c	(working copy)
@@ -1,4 +1,3 @@
-/* { dg-require-effective-target label_values } */
 /* &&foo should be hoisted, but on most targets, excess register pressure
forces it to be rematerialized before "data != &&foo".  On targets that
have a "branch if registers are equal" instruction, this leads to the
@@ -7,6 +6,7 @@
say that &&foo was the target of the branch, and the real target would
then be removed as dead.  */
 /* { dg-do link } */
+/* { dg-require-effective-target label_values } */
 #define NVARS 3

[libgfortran,committed] Fix default SIGN mode on preconnected/internal units

2015-08-25 Thread FX
Preconnected and internal units currently have their sign mode set to 
SIGN_SUPPRESS, rather than the logical value of SIGN_UNSPECIFIED. This does not 
matter in most cases, since our chosen processor-dependent behavior is to 
suppress optional plus signs anyway…

… except when one tries to override the default behavior with environment 
variable GFORTRAN_OPTIONAL_PLUS, which is thus currently broken on internal and 
preconnected units. Take the following code:

  character(len=20) :: s
  print *, 42.
  write(s,"(G0)") 42.
  print *, s
  end

without the patch, run with “GFORTRAN_OPTIONAL_PLUS=y”, it will still output:

   42.000
 42.000  

while with the patch, it will now correctly output (with 
GFORTRAN_OPTIONAL_PLUS=y):

  +42.000
 +42.000 


I regtested on x86_64-apple-darwin15, and committed as trivial.
(I couldn’t figure out how to test that in the testsuite, though.)

FX





sign.ChangeLog
Description: Binary data


sign.diff
Description: Binary data


[PATCH] Update wwwdocs for --with-advance-toolchain=at

2015-08-25 Thread Michael Meissner
I installed the following patch on wwwdocs to document the
--with-advance-toolchain= option I added in June:

2015-08-25  Michael Meissner  

* changes.html (PowerPC options): Document new configure option
--with-advance-toolchain=at.

Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.21
diff -p -c -r1.21 changes.html
*** htdocs/gcc-6/changes.html   13 Aug 2015 12:32:16 -  1.21
--- htdocs/gcc-6/changes.html   25 Aug 2015 16:38:13 -
*** enum {
*** 182,187 
--- 182,195 
  
  
  
+   
+ A new configuration option 
---with-advance-toolchain=at
+ was added for PowerPC 64-bit Linux systems to use the header files, 
library
+ files, and the dynamic linker from a specific Advance Toolchain release
+ instead of the default versions that are provided by the Linux
+ distribution.  In general, this option is intended for the developers of
+ GCC, and it is not intended for general use.
+   
  
  
  

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [libgfortran,patch] Remove never-used debugging code

2015-08-25 Thread Steve Kargl
On Tue, Aug 25, 2015 at 06:17:13PM +0200, FX wrote:
> Turns out I missed some of the dead code. And I now also fixed comments and 
> some formatting.
> libgfortran/runtime/environ.c is now much more readable than before.
> The patch is still a no-op, in terms of user functionality.
> OK to commit to trunk?
> 

Certainly, the dead code can go.  But, is this changing
the library ABI?

troutmask:fvwm:kargl[764] nm /mnt/sgk/work/6/lib/libgfortran.a | grep show_
 T _gfortrani_show_variables
 t show_boolean
 t show_integer
 t show_sep
 t show_string
 T _gfortrani_show_locus




-- 
Steve


Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran"

2015-08-25 Thread Hans-Peter Nilsson
TL;DR: See last...

> From: Ulrich Weigand 
> Date: Tue, 25 Aug 2015 14:59:05 +0200

> However, the compiler actually does accept -fPIC.  If the flag is
> present, we attempt to generate relocatable code, but only to the
> extent the compiler can do that without support for run-time
> relocations.  The most significant restriction is that statically
> initializing a global variable to a pointer will not work.
> (This is useful for some special cases of self-relocating code.
> Such code normally can work around this restriction.)

Still, things like that are why I chose to emit a hard error for
-fPIC/-fpic where it's not supported for *all* code...


> Now, with the patch above, libbacktrace is still compiled with
> -fPIC on SPU, but some files do in fact contain just such global
> initializers, causing compilation to fail:
> 
> gcc-head/src/libbacktrace/elf.c:241:27: error: creating run-time relocation 
> for '*.LC2'
>  static const char * const debug_section_names[DEBUG_MAX] =
>^
> The other GCC run-time libraries rely on libtool to figure out
> that even though -fPIC works, dynamic libraries are still not
> supported on the platform, and thus compile everything for
> static linking (i.e. without -fPIC).

That's not what I see, at least not the "figuring out" part.
(They mostly use libtool as-is; some test tuples, but some test
version-script support and add it then.)

> I'm wondering if we couldn't use the same libtool mechanism here:
> if the architecture does not support dynamic linking at all, no
> target library will be built as shared library, and thus there is
> no need to build libbacktrace with -fPIC either.  (My understanding
> is that we need to build libbacktrace with -fPIC because it might
> get linked into some other shared target library.)

Yes, that's what the comment in the patch context says, as
happens for libgfortran.

> The libbacktrace configure script actually incorporates all the
> libtool init code that makes this determination, and sets the
> shell variable "can_build_shared" to "no" on SPU.  Would it be
> valid to use this variable in the test whether to use -fPIC?
> (I'm not sure which of the many libtool variables are intended
> to be used outside, and which are private ...)

I momentarily pondered this too, when I found the libtool
PIC-test-code grepping libtool/configure, but I chose the
simpler TRY_COMPILE test partly for the same
maybe-internal-variable reason.  A visit to the libtool
documentation shows can_build_shared is for some reason not
listed among the documented variables and a STFW doesn't yield
more information for the first few pages of hits (just some
indexed random libtool copies).  On the other hand, libtool.m4
is in the top directory, so we'd notice if we ever switched to a
version without can_build_shared.

I'll leave that to you to sort out, but if you choose to use
$can_build_shared, consider also setting PIC_FLAG to $pic_flag
(instead of plain -fPIC).  In the meantime I'll commit my patch
as it solves *some* of the breakage; for targets erroring on -fPIC.

...but reading the libtool documentation I think I found a much
better solution: Let's just add -prefer-pic when compiling
libbacktrace.  It leaves everything to libtool.  Can you please
test this?

libbacktrace:
* configure.ac: Use libtool option -prefer-pic, not -fPIC.
* configure: Regenerate.

diff -upr /expvol/pp_slask/hp/checkout/gcchead/gcc/libbacktrace/configure.ac 
libbacktrace/configure.ac
--- libbacktrace/configure.ac   2015-05-29 17:23:20.0 +0200
+++ libbacktrace/configure.ac   2015-08-24 17:31:18.0 +0200
@@ -163,10 +163,11 @@ fi

 # When building as a target library, shared libraries may want to link
 # this in.  We don't want to provide another shared library to
-# complicate dependencies.  Instead, we just compile with -fPIC.
+# complicate dependencies.  Instead, we prefer PIC, if the target
+# supports that through libtool.
 PIC_FLAG=
 if test -n "${with_target_subdir}"; then
-  PIC_FLAG=-fPIC
+  PIC_FLAG=-prefer-pic
 fi
 # Similarly, use -fPIC with --enable-host-shared:
 AC_ARG_ENABLE(host-shared,

brgds, H-P


[PATCH] rs6000: Fix PR67344

2015-08-25 Thread Segher Boessenkool
The "*and3_imm_dot_shifted" pattern is a define_insn_and_split,
like most "dot" patterns: if its output is not assigned cr0 but some
other cr reg, it splits to a non-dot insn and a compare.

Unfortunately that non-dot insn will clobber cr0 as well.  We could
add another clobber (with "=X,x"), but then that second alternative
is never useful; instead, just remove that second alternative.

Bootstrapped and tested on powerpc64-linux; is this okay for trunk?


Segher


2015-08-25  Segher Boessenkool  

PR target/67344
* config/rs6000/rs6000.md (*and3_imm_dot_shifted): Change to
a define_insn, remove second alternative.

---
 gcc/config/rs6000/rs6000.md | 29 -
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 527ad98..2138184 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3037,15 +3037,15 @@ (define_insn_and_split "*and3_imm_mask_dot2"
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
 
-(define_insn_and_split "*and3_imm_dot_shifted"
-  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y")
+(define_insn "*and3_imm_dot_shifted"
+  [(set (match_operand:CC 3 "cc_reg_operand" "=x")
(compare:CC
  (and:GPR
-   (lshiftrt:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,r")
- (match_operand:SI 4 "const_int_operand" "n,n"))
-   (match_operand:GPR 2 "const_int_operand" "n,n"))
+   (lshiftrt:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r")
+ (match_operand:SI 4 "const_int_operand" "n"))
+   (match_operand:GPR 2 "const_int_operand" "n"))
  (const_int 0)))
-   (clobber (match_scratch:GPR 0 "=r,r"))]
+   (clobber (match_scratch:GPR 0 "=r"))]
   "logical_const_operand (GEN_INT (UINTVAL (operands[2])
   << INTVAL (operands[4])),
  DImode)
@@ -3054,23 +3054,10 @@ (define_insn_and_split "*and3_imm_dot_shifted"
&& rs6000_gen_cell_microcode"
 {
   operands[2] = GEN_INT (UINTVAL (operands[2]) << INTVAL (operands[4]));
-  if (which_alternative == 0)
-return "andi%e2. %0,%1,%u2";
-  else
-return "#";
+  return "andi%e2. %0,%1,%u2";
 }
-  "&& reload_completed && cc_reg_not_cr0_operand (operands[3], CCmode)"
-  [(set (match_dup 0)
-   (and:GPR (lshiftrt:GPR (match_dup 1)
-  (match_dup 4))
-(match_dup 2)))
-   (set (match_dup 3)
-   (compare:CC (match_dup 0)
-   (const_int 0)))]
-  ""
   [(set_attr "type" "logical")
-   (set_attr "dot" "yes")
-   (set_attr "length" "4,8")])
+   (set_attr "dot" "yes")])
 
 
 (define_insn "and3_mask"
-- 
1.8.1.4



[PATCH] rs6000: Fix PR67346

2015-08-25 Thread Segher Boessenkool
"*ior_mask" is a define_insn_and_split, so it shouldn't use
can_create_pseudo in its instruction condition, because IRA can then
create such an instruction, and the condition becomes false before
the insn is split.  Use a scratch instead.

Bootstrapped and tested on powerpc64-linux; okay for trunk?


Segher


2015-08-25  Segher Boessenkool  

PR target/67346
* config/rs6000/rs6000.md (*ior_mask): Use a match_scratch.

---
 gcc/config/rs6000/rs6000.md | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2138184..1710b5e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3652,9 +3652,9 @@ (define_split
 (define_insn_and_split "*ior_mask"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(ior:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
-(match_operand:GPR 2 "const_int_operand" "n")))]
-  "can_create_pseudo_p ()
-   && !logical_const_operand (operands[2], mode)
+(match_operand:GPR 2 "const_int_operand" "n")))
+   (clobber (match_scratch:GPR 3 "=r"))]
+  "!logical_const_operand (operands[2], mode)
&& rs6000_is_valid_mask (operands[2], NULL, NULL, mode)"
   "#"
   "&& 1"
@@ -3669,7 +3669,8 @@ (define_insn_and_split "*ior_mask"
 {
   int nb, ne;
   rs6000_is_valid_mask (operands[2], &nb, &ne, mode);
-  operands[3] = gen_reg_rtx (mode);
+  if (GET_CODE (operands[3]) == SCRATCH)
+operands[3] = gen_reg_rtx (mode);
   operands[4] = GEN_INT (ne);
   operands[5] = GEN_INT (~UINTVAL (operands[2]));
 }
-- 
1.8.1.4



Re: [libgfortran,patch] Remove never-used debugging code

2015-08-25 Thread FX
> Certainly, the dead code can go.  But,is this changing the library ABI?
> 
> troutmask:fvwm:kargl[764] nm /mnt/sgk/work/6/lib/libgfortran.a | grep show_
>  T _gfortrani_show_variables
>  t show_boolean
>  t show_integer
>  t show_sep
>  t show_string
>  T _gfortrani_show_locus

Nope, none of those functions are actually publicly exported. They are not in 
gfortran.map, being either static, or having _gfortrani_ prefix which means 
internal libgfortran use.

FX

[gomp4] optimize routine calling

2015-08-25 Thread Nathan Sidwell
When forking to call a partitioned routine there is no need to propagate local 
state from the active thread to the forked threads.


I've committed this patch to implement that optimization.

nathan
2015-08-25  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_emit_forking): Add is_call argument,
	propagate it into mask.
	(nvptx_emit_joining): Likewise.
	(nvptx_expand_call): Move emit_forking call to later.  Add is_call
	argument.
	(nvptx_expand_oacc_fork, nvptx_expand_oacc_join): Adjust calls.
	(nvptx_discover_pars): Don't look for predecessor insn in call
	forks and joins.
	(nvptx_process_pars): Don't emit propagation code for a call.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 227159)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -1047,16 +1047,16 @@ nvptx_expand_compare (rtx compare)
 /* Emit forking instructions for MASK.  */
 
 static void
-nvptx_emit_forking (unsigned mask)
+nvptx_emit_forking (unsigned mask, bool is_call)
 {
   mask &= (GOMP_DIM_MASK (GOMP_DIM_WORKER)
 	   | GOMP_DIM_MASK (GOMP_DIM_VECTOR));
   if (mask)
 {
-  rtx op = GEN_INT (mask);
+  rtx op = GEN_INT (mask | (is_call << GOMP_DIM_MAX));
   
   /* Emit fork for worker level.  */
-  if (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
+  if (!is_call && mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
 	emit_insn (gen_nvptx_fork (op));
   emit_insn (gen_nvptx_forked (op));
 }
@@ -1065,16 +1065,19 @@ nvptx_emit_forking (unsigned mask)
 /* Emit joining instructions for MASK.  */
 
 static void
-nvptx_emit_joining (unsigned mask)
+nvptx_emit_joining (unsigned mask, bool is_call)
 {
   mask &= (GOMP_DIM_MASK (GOMP_DIM_WORKER)
 	   | GOMP_DIM_MASK (GOMP_DIM_VECTOR));
   if (mask)
 {
-  rtx op = GEN_INT (mask);
+  rtx op = GEN_INT (mask | (is_call << GOMP_DIM_MAX));
 
-  /* Emit joining for all pars.  */
-  emit_insn (gen_nvptx_joining (op));
+  /* Emit joining for all non-call pars to ensure there's a single
+	 predecessor for the block the join insn ends up in.  This is
+	 needed for skipping entire loops.  */
+  if (!is_call)
+	emit_insn (gen_nvptx_joining (op));
   emit_insn (gen_nvptx_join (op));
 }
 }
@@ -1135,8 +1138,6 @@ nvptx_expand_call (rtx retval, rtx addre
 	}
 }
 
-  nvptx_emit_forking (parallel);
-
   if (cfun->machine->funtype
   /* It's possible to construct testcases where we call a variable.
 	 See compile/20020129-1.c.  stdarg_p will crash so avoid calling it
@@ -1195,11 +1196,12 @@ nvptx_expand_call (rtx retval, rtx addre
 	  write_func_decl_from_insn (func_decls, retval, pat, callee);
 	}
 }
+  nvptx_emit_forking (parallel, true);
   emit_call_insn (pat);
   if (tmp_retval != retval)
 emit_move_insn (retval, tmp_retval);
 
-  nvptx_emit_joining (parallel);
+  nvptx_emit_joining (parallel, true);
 }
 
 /* Expand the oacc fork & join primitive into ptx-required unspecs.  */
@@ -1207,13 +1209,13 @@ nvptx_expand_call (rtx retval, rtx addre
 void
 nvptx_expand_oacc_fork (unsigned mode)
 {
-  nvptx_emit_forking (GOMP_DIM_MASK (mode));
+  nvptx_emit_forking (GOMP_DIM_MASK (mode), false);
 }
 
 void
 nvptx_expand_oacc_join (unsigned mode)
 {
-  nvptx_emit_joining (GOMP_DIM_MASK (mode));
+  nvptx_emit_joining (GOMP_DIM_MASK (mode), false);
 }
 
 /* Expander for reduction locking and unlocking.  We expect SRC to be
@@ -2611,7 +2613,8 @@ nvptx_discover_pars (bb_insn_map_t *map)
 		l = new parallel (l, mask);
 		l->forked_block = block;
 		l->forked_insn = end;
-		if (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
+		if (!(mask & GOMP_DIM_MASK (GOMP_DIM_MAX))
+		&& (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
 		  l->fork_insn
 		= nvptx_discover_pre (block, CODE_FOR_nvptx_fork);
 	  }
@@ -2626,7 +2629,8 @@ nvptx_discover_pars (bb_insn_map_t *map)
 		gcc_assert (l->mask == mask);
 		l->join_block = block;
 		l->join_insn = end;
-		if (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
+		if (!(mask & GOMP_DIM_MASK (GOMP_DIM_MAX))
+		&& (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
 		  l->joining_insn
 		= nvptx_discover_pre (block, CODE_FOR_nvptx_joining);
 		l = l->parent;
@@ -3013,7 +3017,9 @@ nvptx_process_pars (parallel *par)
   inner_mask |= par->inner_mask;
 }
 
-  if (par->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
+  if (par->mask & GOMP_DIM_MASK (GOMP_DIM_MAX))
+{ /* No propagation needed for a call.  */ }
+  else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
 {
   nvptx_wpropagate (false, par->forked_block, par->forked_insn);
   nvptx_wpropagate (true, par->forked_block, par->fork_insn);


Re: [PATCH] rs6000: Fix PR67344

2015-08-25 Thread David Edelsohn
On Tue, Aug 25, 2015 at 1:08 PM, Segher Boessenkool
 wrote:
> The "*and3_imm_dot_shifted" pattern is a define_insn_and_split,
> like most "dot" patterns: if its output is not assigned cr0 but some
> other cr reg, it splits to a non-dot insn and a compare.
>
> Unfortunately that non-dot insn will clobber cr0 as well.  We could
> add another clobber (with "=X,x"), but then that second alternative
> is never useful; instead, just remove that second alternative.
>
> Bootstrapped and tested on powerpc64-linux; is this okay for trunk?
>
>
> Segher
>
>
> 2015-08-25  Segher Boessenkool  
>
> PR target/67344
> * config/rs6000/rs6000.md (*and3_imm_dot_shifted): Change to
> a define_insn, remove second alternative.

Okay.

Thanks, David


Re: [PATCH] rs6000: Fix PR67346

2015-08-25 Thread David Edelsohn
On Tue, Aug 25, 2015 at 1:09 PM, Segher Boessenkool
 wrote:
> "*ior_mask" is a define_insn_and_split, so it shouldn't use
> can_create_pseudo in its instruction condition, because IRA can then
> create such an instruction, and the condition becomes false before
> the insn is split.  Use a scratch instead.
>
> Bootstrapped and tested on powerpc64-linux; okay for trunk?
>
>
> Segher
>
>
> 2015-08-25  Segher Boessenkool  
>
> PR target/67346
> * config/rs6000/rs6000.md (*ior_mask): Use a match_scratch.

Okay.

Thanks, David


[gomp4.1] comment some stuff

2015-08-25 Thread Aldy Hernandez
I'm obviously not smart enough to understand libgomp's tasking runtime, 
and rth and you get 0 for commenting skills ;-).


I had some notes scribbled down while reading the code, and figured 
someone else might read this code some day.  It's still in dire need of 
commenting, but this mildly helps.


OK for branch?
commit 5fc2816946c9250c4cca43d002b364b2d6400919
Author: Aldy Hernandez 
Date:   Tue Aug 25 10:32:48 2015 -0700

* env.c: Make gomp_max_task_priority_var static.
* libgomp.h (struct gomp_task_depend_entry): Add comment.
* task.c (gomp_clear_parent): Document function.
(GOMP_task): Same.
(gomp_task_run_pre): Add comments.
(gomp_task_run_post_handle_dependers): Same.
(gomp_task_run_post_remove_parent): Same.
(gomp_task_run_post_remove_taskgroup): Same.
(GOMP_taskwait): Same.
(gomp_task_maybe_wait_for_dependencies): Same.

diff --git a/libgomp/env.c b/libgomp/env.c
index 65a6851..0569521 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -69,7 +69,7 @@ struct gomp_task_icv gomp_global_icv = {
 
 unsigned long gomp_max_active_levels_var = INT_MAX;
 bool gomp_cancel_var = false;
-int gomp_max_task_priority_var = 0;
+static int gomp_max_task_priority_var = 0;
 #ifndef HAVE_SYNC_BUILTINS
 gomp_mutex_t gomp_managed_threads_lock;
 #endif
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9031649..3d705ef 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -279,6 +279,7 @@ struct gomp_task_depend_entry
   struct gomp_task_depend_entry *next;
   struct gomp_task_depend_entry *prev;
   struct gomp_task *task;
+  /* Depend entry is of type "IN".  */
   bool is_in;
   bool redundant;
   bool redundant_out;
diff --git a/libgomp/task.c b/libgomp/task.c
index f2a0fae..7c7bae4 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -92,6 +92,8 @@ gomp_end_task (void)
   thr->task = task->parent;
 }
 
+/* Orphan the task in CHILDREN and all its siblings.  */
+
 static inline void
 gomp_clear_parent (struct gomp_task *children)
 {
@@ -110,7 +112,12 @@ static void gomp_task_maybe_wait_for_dependencies (void 
**depend);
 
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
false, then we must not delay in executing the task.  If UNTIED is true,
-   then the task may be executed by any member of the team.  */
+   then the task may be executed by any member of the team.
+
+   DEPEND is an array containing:
+   depend[0]: number of depend elements.
+   depend[1]: number of depend elements of type "out".
+   depend[N+2]: address of [0..N]th depend element.  */
 
 void
 GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
@@ -444,8 +451,10 @@ gomp_task_run_pre (struct gomp_task *child_task, struct 
gomp_task *parent,
 {
   if (parent)
 {
+  /* Remove child_task from parent.  */
   if (parent->children == child_task)
parent->children = child_task->next_child;
+
   if (__builtin_expect (child_task->parent_depends_on, 0)
  && parent->taskwait->last_parent_depends_on == child_task)
{
@@ -456,8 +465,10 @@ gomp_task_run_pre (struct gomp_task *child_task, struct 
gomp_task *parent,
parent->taskwait->last_parent_depends_on = NULL;
}
 }
+  /* Remove child_task from taskgroup.  */
   if (taskgroup && taskgroup->children == child_task)
 taskgroup->children = child_task->next_taskgroup;
+
   child_task->prev_queue->next_queue = child_task->next_queue;
   child_task->next_queue->prev_queue = child_task->prev_queue;
   if (team->task_queue == child_task)
@@ -528,6 +539,7 @@ gomp_task_run_post_handle_dependers (struct gomp_task 
*child_task,
  if (parent->taskwait && parent->taskwait->last_parent_depends_on
  && !task->parent_depends_on)
{
+ /* Put task in last_parent_depends_on.  */
  struct gomp_task *last_parent_depends_on
= parent->taskwait->last_parent_depends_on;
  task->next_child = last_parent_depends_on->next_child;
@@ -535,6 +547,7 @@ gomp_task_run_post_handle_dependers (struct gomp_task 
*child_task,
}
  else
{
+ /* Put task at the top of the sibling list.  */
  task->next_child = parent->children;
  task->prev_child = parent->children->prev_child;
  parent->children = task;
@@ -544,6 +557,7 @@ gomp_task_run_post_handle_dependers (struct gomp_task 
*child_task,
}
  else
{
+ /* Put task in the sibling list.  */
  task->next_child = task;
  task->prev_child = task;
  parent->children = task;
@@ -628,12 +642,18 @@ gomp_task_run_post_handle_depend (struct gomp_task 
*child_task,
   return gomp_task_run_post_handle_dependers (child_task, team);
 }
 
+/* Remove CHILD_TASK from its parent.  */
+
 static inline void
 gomp_task_

Re: [gomp4.1] comment some stuff

2015-08-25 Thread Aldy Hernandez

On 08/25/2015 10:35 AM, Aldy Hernandez wrote:

-int gomp_max_task_priority_var = 0;
+static int gomp_max_task_priority_var = 0;


Sorry I snuck that in there.  The variable is unused elsewhere, might as 
well make it static.


Aldy


[gomp-4.1] fix incorrect memory size in goacc_new_thread

2015-08-25 Thread Aldy Hernandez
This is either blatantly wrong or subtly correct, in which case it needs 
a comment.  My guess is the former.


OK for branch?
commit 330391636113ed9a9067e6eb639755fb0f4723dc
Author: Aldy Hernandez 
Date:   Tue Aug 25 10:41:28 2015 -0700

* oacc-init.c (goacc_new_thread): Use correct size of goacc_thread
when allocating memory.

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index c91731e..e6d2c03 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -312,7 +312,7 @@ acc_shutdown_1 (acc_device_t d)
 static struct goacc_thread *
 goacc_new_thread (void)
 {
-  struct goacc_thread *thr = gomp_malloc (sizeof (struct gomp_thread));
+  struct goacc_thread *thr = gomp_malloc (sizeof (struct goacc_thread));
 
 #if defined HAVE_TLS || defined USE_EMUTLS
   goacc_tls_data = thr;


Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran"

2015-08-25 Thread Ulrich Weigand
Hans-Peter Nilsson wrote:
> > From: Ulrich Weigand 
> > Date: Tue, 25 Aug 2015 14:59:05 +0200
> 
> > The other GCC run-time libraries rely on libtool to figure out
> > that even though -fPIC works, dynamic libraries are still not
> > supported on the platform, and thus compile everything for
> > static linking (i.e. without -fPIC).
> 
> That's not what I see, at least not the "figuring out" part.
> (They mostly use libtool as-is; some test tuples, but some test
> version-script support and add it then.)

Well, the "figuring out" is implicit; because libtool knows the
platform does not support dynamic linking, it defaults to
--disable-shared, which means only static libraries are being
built; and the default compile option when building static
libraries does not use -fPIC.


> I'll leave that to you to sort out, but if you chose to use
> $can_build_shared, consider also setting PIC_FLAG to $pic_flag
> (instead of plain -fPIC).  In the meantime I'll commit my patch
> as it solves *some* of the breakage; for targets erroring on -fPIC.
> 
> ...but reading the libtool documentation I think I found a much
> better solution: Let's just add -prefer-pic when compiling
> libbacktrace.  It leaves everything to libtool.  Can you please
> test this?

Hmm, reading the documentation, an even simpler approach with
equivalent effect to yours would be just adding the pic-only
option when calling LT_INIT.

However, neither works for the SPU, because in both cases libtool
will only do the test whether the target supports the -fPIC option.
It will not test whether the target supports dynamic libraries.

[ It will do that test; and default to --disable-shared on SPU.
That is a no-op for libbacktrace however, since it calls LT_INIT
with the disable-shared option anyway.  When adding back the -fPIC
flag due to either the pic-only LT_INIT option or the -prefer-pic
libtool command line option, it does not check for that again.  ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [libgfortran,patch] Remove never-used debugging code

2015-08-25 Thread Steve Kargl
On Tue, Aug 25, 2015 at 07:10:23PM +0200, FX wrote:
> > Certainly, the dead code can go.  But,is this changing the library ABI?
> > 
> > troutmask:fvwm:kargl[764] nm /mnt/sgk/work/6/lib/libgfortran.a | grep show_
> >  T _gfortrani_show_variables
> >  t show_boolean
> >  t show_integer
> >  t show_sep
> >  t show_string
> >  T _gfortrani_show_locus
> 
> Nope, none of those functions are actually publicly exported.
> They are not in gfortran.map, being either static, or having
> _gfortrani_ prefix which means internal libgfortran use.
> 

OK. Just checking.  Thanks for the code cleanup.

-- 
Steve


RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2015-08-25 Thread Moore, Catherine


> -Original Message-
> From: Andrew Bennett [mailto:andrew.benn...@imgtec.com]
> Sent: Tuesday, July 21, 2015 10:15 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Matthew Fortune; Moore, Catherine
> Subject: [PATCH] MIPS: If a test in the MIPS testsuite requires standard
> library support check the sysroot supports the required test options.
> 
> Hi,
> 
> The recent changes to the MIPS GCC Linux sysroot
> (https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01014.html) have meant
> that the include directory is now not global and is provided only for each
> multi-lib configuration.  This means that for any test in the MIPS GCC
> Testsuite that requires standard library support we need to check if there is
> multi-lib support for the test options, otherwise it might fail to compile.
> 
> This patch adds this support to the testsuite and mips.exp files.  Firstly any
> test that requires standard library support has the implicit option
> "(REQUIRES_STDLIB)" added to its dg-options.  Secondly in mips.exp a pre-
> processor check is performed to ensure that when expanding a testcase
> containing a "#include " using the current set of test options we do
> not get file not found errors.  If this happens we mark the testcase as
> unsupported.
> 
> The patch has been tested on the mti/img elf/linux-gnu toolchains, and there
> have been no new regressions.
> 
> The patch and ChangeLog are below.
> 
> Ok to commit?
> 
> 
Yes.  This looks good.


Re: Indirect jumps

2015-08-25 Thread Jeff Law

On 08/25/2015 08:11 AM, Nathan Sidwell wrote:

Ptx is one of those rare (unique?) machines that doesn't have an
indirect branch.  optabs is prepared for such a target and emits a
sorry when an indirect branch is needed.  However it then goes on to try
to emit such an instruction and ends up ICEing.

Fixed thusly, ok?  (Or is the right solution to define a dummy indirect
branch in the PTX  md file?)

I think we're trying to generally get away from dummy patterns.

We could emulate by creating a new stack frame and shoving the target of 
the branch into the stack, then executing a return.  However, I don't 
think that's worth doing ;-)


I think the patch is fine for the trunk.

jeff


[gomp4] add reduction lock initializer

2015-08-25 Thread Nathan Sidwell
Cesar discovered another quirk of PTX.  In spite of PTX documenting that static 
variables can be initialized and default to zero, there's a little note that it 
doesn't work for .shared variables.  Thus we need code to initialize the worker 
lock variable used for reductions.


This implements a new internal function 'IFN_GOACC_LOCK_INIT', with the same 
arguments as the LOCK and UNLOCK functions.  The intent is that it is emitted at 
the reduction setup point and expands to target-specific code.


For PTX it's deleted for everything but worker level, and for that we expand to 
an initialization of the lock variable.  We can simply use the same insn as the 
unlocker, but I renamed it to be less confusing.


nathan
2015-08-25  Nathan Sidwell  

	* targhooks.h (default_goacc_lock_unlock): Rename to ...
	(default_goacc_lock): ... here.  Adjust.
	* config/nvptx/nvptx.md (oacc_expand_lock, oacc_expand_unlock):
	Adjust call to lock expander.
	(oacc_expand_lock_init): New.
	(nvptx_spinlock, nvptx_spinunlock): Rename to ...
	(nvptx_spin_lock, nvptx_spin_reset): ... here.
	* config/nvptx/nvptx.c (nvptx_expand_oacc_lock_unlock): Rename to ...
	(nvptx_expand_oacc_lock): ... here.  Deal with init too.
	(nvptx_xform_lock_unlock): Rename to ...
	(nvptx_xform_lock): ... here.  Deal with init too.
	(TARGET_GOACC_LOCK_UNLOCK): Replace with ...
	(TARGET_GOACC_LOCK): ... this.
	* omp-low.c (execute_oacc_transform): Deal with
	IFN_GOACC_LOCK_INIT.
	(default_goacc_lock_unlock): Rename to ...
	(default_goacc_lock): ... here.  Deal with init too.
	* internal-fn.c (expand_GOACC_LOCK_INIT): New.
	* internal-fn.def (GOACC_LOCK_INIT): New.
	* doc/tm.texi.in (TARGET_GOACC_LOCK_UNLOCK): Replace with ...
	(TARGET_GOACC_LOCK): ... this.
	* doc/tm.texi: Rebuilt.
	* target.def (goacc lock_unlock): Replace with ...
	(goacc lock): ... this.  Deal with init too.

Index: gcc/targhooks.h
===
--- gcc/targhooks.h	(revision 227174)
+++ gcc/targhooks.h	(working copy)
@@ -110,7 +110,7 @@ extern void default_destroy_cost_data (v
 extern bool default_goacc_validate_dims (tree, int [], int);
 extern unsigned default_goacc_dim_limit (unsigned);
 extern bool default_goacc_fork_join (gimple, const int [], bool);
-extern bool default_goacc_lock_unlock (gimple, const int [], bool);
+extern bool default_goacc_lock (gimple, const int [], unsigned);
 
 /* These are here, and not in hooks.[ch], because not all users of
hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */
Index: gcc/config/nvptx/nvptx.md
===
--- gcc/config/nvptx/nvptx.md	(revision 227174)
+++ gcc/config/nvptx/nvptx.md	(working copy)
@@ -1371,7 +1371,7 @@
 		   UNSPECV_LOCK)]
   ""
 {
-  nvptx_expand_oacc_lock_unlock (operands[0], true);
+  nvptx_expand_oacc_lock (operands[0], 0);
   DONE;
 })
 
@@ -1381,7 +1381,17 @@
 		   UNSPECV_LOCK)]
   ""
 {
-  nvptx_expand_oacc_lock_unlock (operands[0], false);
+  nvptx_expand_oacc_lock (operands[0], +1);
+  DONE;
+})
+
+(define_expand "oacc_lock_init"
+  [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")
+		(match_operand:SI 1 "const_int_operand" "")]
+		   UNSPECV_LOCK)]
+  ""
+{
+  nvptx_expand_oacc_lock (operands[0], -1);
   DONE;
 })
 
@@ -1592,8 +1602,8 @@
   ""
   "membar%B0;")
 
-;; spinlock and unlock
-(define_insn "nvptx_spinlock"
+;; spin lock and reset
+(define_insn "nvptx_spin_lock"
[(parallel
  [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")
 			(match_operand:SI 1 "const_int_operand" "i")]
@@ -1604,7 +1614,7 @@
""
"%4:\\tatom%R1.cas.b32 %2,%0,0,1;setp.ne.u32 %3,%2,0;@%3 bra.uni %4;")
 
-(define_insn "nvptx_spinunlock"
+(define_insn "nvptx_spin_reset"
[(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")
 		  (match_operand:SI 1 "const_int_operand" "i")]
 		  UNSPECV_LOCK)
Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 227174)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -1220,7 +1220,7 @@ nvptx_expand_oacc_join (unsigned mode)
gang or worker level.  */
 
 void
-nvptx_expand_oacc_lock_unlock (rtx src, bool lock)
+nvptx_expand_oacc_lock (rtx src, int direction)
 {
   unsigned HOST_WIDE_INT kind;
   rtx pat;
@@ -1230,22 +1230,26 @@ nvptx_expand_oacc_lock_unlock (rtx src,
 
   rtx mem = gen_rtx_MEM (SImode, lock_syms[kind]);
   rtx space = GEN_INT (lock_space[kind]);
-  rtx barrier = gen_nvptx_membar (GEN_INT (lock_level[kind]));
+  rtx barrier = NULL_RTX;
   rtx tmp = gen_reg_rtx (SImode);
 
-  if (!lock)
+  if (direction >= 0)
+barrier = gen_nvptx_membar (GEN_INT (lock_level[kind]));
+
+  if (direction > 0)
 emit_insn (barrier);
-  if (lock)
+  if (!direction)
 {
   rtx_code_label *label = gen_label_rtx ();
 
   LABEL_NUSES (label)++;
-  pat = gen_nvptx_spinlock (mem, space, tmp, gen_reg_rtx (BImode

Re: [PATCH 1/5] Refactor completely_scalarize_var

2015-08-25 Thread Jeff Law

On 08/25/2015 05:06 AM, Alan Lawrence wrote:

This is a small refactoring/renaming patch, it just moves the call to
"completely_scalarize_record" out from completely_scalarize_var, and renames
the latter to create_total_scalarization_access.

This is because the next patch needs to drop the "_record" suffix and I felt
it would be confusing to have both completely_scalarize and
completely_scalarize_var. However, it also makes the new function name
(create_total_scalarization_access) consistent with the existing code & comment.

Bootstrapped + check-gcc on x86_64.

gcc/ChangeLog:

* tree-sra.c (completely_scalarize_var): Rename to...
(create_total_scalarization_access): ... Here. Drop call to
completely_scalarize_record.

(analyze_all_variable_accesses): Replace completely_scalarize_var
with create_total_scalarization_access and completely_scalarize_record.

OK.
Jeff



Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-25 Thread Jeff Law

On 08/25/2015 05:06 AM, Alan Lawrence wrote:

This changes the completely_scalarize_record path to also work on arrays (thus
allowing records containing arrays, etc.). This just required extending the
existing type_consists_of_records_p and completely_scalarize_record methods
to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both
methods so as not to mention 'record'.

Bootstrapped + check-gcc on aarch64-none-linux-gnu, arm-none-linux-gnueabihf 
and x86_64-none-linux-gnu.

Have also verified the scan-tree-dump check in the new sra-15.c passes (using a 
stage 1 compiler only, no execution test) on alpha, hppa, powerpc, sparc, avr 
and sh.

gcc/ChangeLog:

* tree-sra.c (type_consists_of_records_p): Rename to...
(scalarizable_type_p): ...this, add case for ARRAY_TYPE.

(completely_scalarize_record): Rename to...
(completely_scalarize): ...this, add ARRAY_TYPE case, move some code to:
(scalarize_elem): New.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/sra-15.c: New.
---
  gcc/testsuite/gcc.dg/tree-ssa/sra-15.c |  38 +
  gcc/tree-sra.c | 146 ++---
  2 files changed, 135 insertions(+), 49 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c 
b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
new file mode 100644
index 000..e251058
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
@@ -0,0 +1,38 @@
+/* Verify that SRA total scalarization works on records containing arrays.  */
+/* Test skipped for targets with small (often default) MOVE_RATIO.  */
?!?  I don't see anything that skips this test for any targets. 
Presumably this was copied from sra-12.c.  I suspect this comment should 
just be removed.


With that comment removed from the testcase, this is OK.

jeff




Re: Indirect jumps

2015-08-25 Thread Nathan Sidwell

On 08/25/15 15:10, Jeff Law wrote:

On 08/25/2015 08:11 AM, Nathan Sidwell wrote:



We could emulate by creating a new stack frame and shoving the target of the
branch into the stack, then executing a return.  However, I don't think that's
worth doing ;-)


And wouldn't work for PTX anyway -- all the return addr handling is hidden away.


I think the patch is fine for the trunk.


ok.



Go patch committed: Don't crash on erroneous array types

2015-08-25 Thread Ian Lance Taylor
This patch by Chris Manghane fixes the Go frontend to not crash on
erroneous array types.  This fixes https://golang.org/issue/11546 .
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227160)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-81810917af7ba19e1f9f8efc8b1989f7d6419d30
+d6d59d5927c4ea0c02468ebc6a2df431fb64595a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 226846)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -5781,6 +5781,8 @@ Array_type::verify_length()
 bool
 Array_type::do_verify()
 {
+  if (this->element_type()->is_error_type())
+return false;
   if (!this->verify_length())
 this->length_ = Expression::make_error(this->length_->location());
   return true;


Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-25 Thread Jeff Law

On 08/25/2015 05:06 AM, Alan Lawrence wrote:

When SRA completely scalarizes an array, this patch changes the
generated accesses from e.g.

MEM[(int[8] *)&a + 4B] = 1;

to

a[1] = 1;

This overcomes a limitation in dom2, that accesses to equivalent
chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with
accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant
propagation in the ssa-dom-cse-2.c testcase (after the next patch
that makes SRA handle constant-pool loads).

I tried to work around this by making dom2's hashable_expr_equal_p
less conservative, but found that on platforms without AArch64's
vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC,
mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8]
*)&a] equivalent to a[0], etc.; a complete overhaul of
hashable_expr_equal_p seems like a larger task than this patch
series.

I can't see how to write a testcase for this in C though as direct
assignment to an array is not possible; such assignments occur only
with constant pool data, which is dealt with in the next patch.

It's a general issue that if there's > 1 common way to represent an
expression, then DOM will often miss discovery of the CSE opportunity
because of the way it hashes expressions.

Ideally we'd be moving to a canonical form, but I also realize that in
the case of memory references like this, that may not be feasible.

It does make me wonder how many CSEs we're really missing due to the two
ways to represent array accesses.



Bootstrap + check-gcc on x86-none-linux-gnu,
arm-none-linux-gnueabihf, aarch64-none-linux-gnu.

gcc/ChangeLog:

* tree-sra.c (completely_scalarize): Move some code into:
(get_elem_size): New.
(build_ref_for_offset): Build ARRAY_REF if base is aligned array.
---
 gcc/tree-sra.c | 110 -
 1 file changed, 69 insertions(+), 41 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 08fa8dc..af35fcc 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
 }
 }

+static bool
+get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)

Function comment needed.

I may have missed it in the earlier patches, but can you please make
sure any new functions you created have comments in those as well.  Such 
patches are pre-approved.


With the added function comment, this patch is fine.

jeff




Re: [RFC 5/5] Always completely replace constant pool entries

2015-08-25 Thread Jeff Law

On 08/25/2015 05:06 AM, Alan Lawrence wrote:

I used this as a means of better-testing the previous changes, as it exercises
the constant replacement code a whole lot more. Indeed, quite a few tests are
now optimized away to nothing on AArch64...

Always pulling in constants is almost certainly not what we want, but we may
nonetheless want something more aggressive than the usual --param, e.g. for the
ssa-dom-cse-2.c test. Thoughts welcomed?
I'm of the opinion that we have too many knobs already.  So I'd perhaps 
ask whether or not this option is likely to be useful to end users?


As for the patch itself, any thoughts on reasonable heuristics for when 
to pull in the constants?  Clearly we don't want the patch as-is, but 
are there cases we can identify when we want to be more aggressive?


jeff




[nvptx] disable another test

2015-08-25 Thread Nathan Sidwell
I've committed this to avoid tripping over another ptx assembler problem. 
memcpy is used for block move, and the FUNCTION_DECL for that is created 
uniquely in expr.c.  It's not linked up to any other decl for memcpy.


PTX requires declarations for external symbols, so we emit one for the blkmove 
variant.  We also emit a definition in this particular testcase, and the PTX 
assembler complains that we've said both 'external' and 'definition'.  Rather 
than go through heroics for just this testcase, I skip it.


nathan
2015-08-25  Nathan Sidwell  

	* gcc.c-torture/execute/builtins/20010124-1.x: New.

Index: gcc.c-torture/execute/builtins/20010124-1.x
===
--- gcc.c-torture/execute/builtins/20010124-1.x	(revision 0)
+++ gcc.c-torture/execute/builtins/20010124-1.x	(working copy)
@@ -0,0 +1,10 @@
+load_lib target-supports.exp
+
+if [istarget "nvptx-*-*"] {
+# This test uses memcpy for block move in the same file as it
+# defines it.  The two decls are not the same, by design, and we
+# end up emitting a definition of memcpy, along with a .extern
+# declaration. This confuses the ptx assembler.
+return 1
+}
+return 0


Re: [RFC 4/5] Handle constant-pool entries

2015-08-25 Thread Jeff Law

On 08/25/2015 05:06 AM, Alan Lawrence wrote:

This makes SRA replace loads of records/arrays from constant pool entries,
with elementwise assignments of the constant values, hence, overcoming the
fundamental problem in PR/63679.

As a first pass, the approach I took was to look for constant-pool loads as
we scanned through other accesses, and add them as candidates there; to build a
constant replacement_decl for any such accesses in completely_scalarize; and to
use any existing replacement_decl rather than creating a variable in
create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but
that does not allow addresses of labels, which can still end up in the constant
pool.)

Feedback as to the approach or how it might be better structured / fitted into
SRA, is solicited ;).

Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly 
increases the number of testcases in which this code is exercised!

Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a 
stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, 
and sh.

gcc/ChangeLog:

* tree-sra.c (create_access): Scan for uses of constant pool and add
to candidates.
(subst_initial): New.
(scalarize_elem): Build replacement_decl using subst_initial.
(create_access_replacement): Use replacement_decl if set.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
sra-max-scalarization-size-Ospeed.
---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
  gcc/tree-sra.c| 56 +--
  2 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index af35fcc..a3ff2df 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
else
  ptr = false;

+  /* FORNOW: scan for uses of constant pool as we go along.  */
I'm not sure why you have this marked as FORNOW.  If I'm reading all 
this code correctly, you're lazily adding items from the constant pool 
into the candidates table when you find they're used.  That seems better 
than walking the entire constant pool adding them all to the candidates.


I don't see this as fundamentally wrong or unclean.

The question I have is why this differs from the effects of patch #5. 
That would seem to indicate that there's things we're not getting into 
the candidate tables with this approach?!?





@@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, 
HOST_WIDE_INT offset, tree ref)
  }
  }

+static tree
+subst_initial (tree expr, tree var)

Function comment.

I think this patch is fine with the function comment added and removing 
the FORNOW part of the comment in create_access.  It may be worth noting 
in create_access's comment that it can add new items to the candidates 
tables for constant pool entries.



Jeff


Re: [PATCH] Don't ICE on invalid weak decl (PR middle-end/67330)

2015-08-25 Thread Jeff Law

On 08/25/2015 09:44 AM, Marek Polacek wrote:

Here we are ICEing on an invalid code: symtab_node::get asserts that it's
dealing with a function or a static or external variable, but an invalid
decl is rejected too late.  So don't try to mark_weak an invalid decl and
also don't duplicate the "declared weak after being used" check -- that is
already in mark_weak.
Perhaps we should also punt if (!TARGET_SUPPORTS_WEAK)?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-08-25  Marek Polacek  

PR middle-end/67330
* varasm.c (declare_weak): Return after giving an error.

* c-common.c (handle_weak_attribute): Don't check whether the
visibility can be changed here.

* gcc.dg/weak/weak-18.c: New test.

OK.
jeff



Re: [testsuite] Clean up effective_target cache

2015-08-25 Thread Jeff Law

On 08/25/2015 02:14 AM, Christophe Lyon wrote:

Hi,

Some subsets of the tests override ALWAYS_CXXFLAGS or
TEST_ALWAYS_FLAGS and perform effective_target support tests using
these modified flags.

In case these flags conflict with the effective_target tests, it means
that subsequent tests will be UNSUPPORTED even though
ALWAYS_CXXFLAGS/TEST_ALWAYS_FLAGS have been reset and no longer
conflict.

In practice, we noticed this when running validation under 'ulimit -v
XXX', which can conflict with ASAN. We observed that sse2 and
stack_protector tests would randomly fail when tested from asan.exp,
making non-asan tests UNSUPPORTED.

This patch adds a new function 'clear_effective_target_cache', which
is called at the end of every .exp file which overrides
ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS.

I tested it works well for asan.exp on x86_64 but the changes in other
.exp files seem mechanical.

However, I noticed that lib/g++.exp changes ALWAYS_CXXFLAGS, but does
not appear to restore it. In doubt, I didn't change it.

OK?
OK after a full regression test.  While I agree the change is 
mechanical, there may be interactions that are non-obvious.


Jeff


Re: [PATCH 1/2] driver: support state cleanup

2015-08-25 Thread David Malcolm
On Tue, 2015-08-25 at 13:48 +, Joseph Myers wrote:
> On Thu, 6 Aug 2015, David Malcolm wrote:
> 
> > gcc/ChangeLog:
> > * gcc-main.c (main): Add params to driver ctor.
> > * gcc.c (class env_manager): New.
> > (env): New global.
> > (env_manager::init): New.
> > (env_manager::get): New.
> > (env_manager::xput): New.
> > (env_manager::restore): New.
> > Poison getenv and putenv.
> > (DEFAULT_TARGET_SYSTEM_ROOT): New.
> > (target_system_root): Update initialization to use
> > DEFAULT_TARGET_SYSTEM_ROOT.
> > (struct spec_list): Add field "default_ptr".
> > (INIT_STATIC_SPEC): Initialize new field "default_ptr".
> > (init_spec): Likewise.
> > (set_spec): Clear field "default_ptr".
> > (read_specs): Free "spec" and "buffer".
> > (xputenv): Reimplement in terms of env_manager.
> > (process_command): Replace ::getenv calls with calls to the
> > env_manager singleton.
> > (process_brace_body): Free string in three places.
> > (driver::driver): New.
> > (driver::~driver): New.
> > (used_arg): Convert from a function to...
> > (class used_arg_t): ...this class, and...
> > (used_arg): ...this new global instance.
> > (used_arg_t::finalize): New function.
> > (getenv_spec_function): Add "const" to local "value".  Replace
> > ::getenv call with call to the env_manager singleton.
> > (path_prefix_reset): New function.
> > (driver::finalize): New function.
> > * gcc.h (driver::driver): New.
> > (driver::~driver): New.
> > (driver::finalize): New.
> 
> OK.

Thanks.

I've committed the combination of patches 1 and 2 to trunk as r227188,
having double-checked that it still bootstraps & passes regression
testing (on x86_64-pc-linux-gnu).

[some people on the jit list were specifically interested in this patch
kit, hence this email].



Go patch committed: Allow string slices with start == len

2015-08-25 Thread Ian Lance Taylor
This patch by Chris Manghane fixes the Go frontend to permit string
slice expressions when the start of the slice is the length of the
string.  These were previously erroneously forbidden when using a
constant index.  This fixes https://golang.org/issue/11522 .
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227184)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d6d59d5927c4ea0c02468ebc6a2df431fb64595a
+14ca4b6130b9a7132d132f418e9ea283b3a52c08
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 227039)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -10341,7 +10341,10 @@ String_index_expression::do_check_types(
 {
   ival_valid = true;
   if (mpz_sgn(ival) < 0
- || (sval_valid && mpz_cmp_ui(ival, sval.length()) >= 0))
+ || (sval_valid
+ && (this->end_ == NULL
+ ? mpz_cmp_ui(ival, sval.length()) >= 0
+ : mpz_cmp_ui(ival, sval.length()) > 0)))
{
  error_at(this->start_->location(), "string index out of bounds");
  this->set_is_error();


Re: [PATCH] [ping] Use single shared memory block pool for all pool allocators

2015-08-25 Thread Richard Sandiford
Mikhail Maltsev  writes:
> Hi, all.
> I'm pinging this patch: 
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00030.html

A week and no reply, so FWIW: I really hope this goes in.  malloc always
shows up high in my profiles too.

Just a couple of very minor things to show that I read the patch :-)

> diff --git a/gcc/virtual-memory.cc b/gcc/virtual-memory.cc
> new file mode 100644
> index 000..69bda37
> --- /dev/null
> +++ b/gcc/virtual-memory.cc
> @@ -0,0 +1,66 @@
> +/* 
> +   Copyright (C) 2015 Free Software Foundation, Inc.

Missing a description line.  Same for the .h.

> +/* Free all memory allocated by memory_block_pool.  */

Strictly speaking I think this is "all unused memory".

> +  size_t size = reinterpret_cast<_obstack_chunk *> (chunk)->limit -
> +reinterpret_cast<char *>(chunk);

The emacs formatting rule would make this:

  size_t size = (reinterpret_cast<_obstack_chunk *> (chunk)->limit
 - reinterpret_cast<char *> (chunk));

Still not sure what the rule is supposed to be wrt space before "<"
or after ">".

> +#ifndef VIRTUAL_MEMORY_H
> +#define  VIRTUAL_MEMORY_H

Tab indentation for the #define but not the #ifndef.  (Think it should
be a space for both.)

> +/* Allocate single block.  Reuse previously returned block, if possible.  */

Pedantic, but: "a single block", "a previously returned block".

> +inline void *
> +memory_block_pool::allocate ()
> +{
> +  if (instance.m_blocks == NULL)
> +return XNEWVEC (char, block_size);
> +
> +  void *result = instance.m_blocks;
> +  instance.m_blocks = instance.m_blocks->m_next;
> +  return result;
> +}
> +
> +/* Return UNCAST_BLOCK to pool.  */

"to the pool"

> +inline void
> +memory_block_pool::remove (void *uncast_block)
> +{
> +  block_list *block = reinterpret_cast<block_list *> (uncast_block);

For aliasing purposes, should this instead be a placement new,
to show that a new block_list object is being created?

> +extern void *mempool_obstack_chunk_alloc(size_t) ATTRIBUTE_MALLOC;
> +extern void mempool_obstack_chunk_free(void *);

Space before "(".

Thanks,
Richard


Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.

2015-08-25 Thread Jeff Law

On 08/21/2015 10:30 AM, Ilya Enkovich wrote:

If we're checking an optab to drive an optimization, then we're probably on
the wrong track.


That's totally similar to VEC_COND_EXPR which we generate comparison into.
It is.  The vectorizer is riddled with this stuff.  Sigh.  So I won't 
consider this a negative for the scalar mask support.





I think this ties into the overall discussion about
whether or not to represent these masks in gimple or try to handle them
later during gimple->rtl expansion.


Currently we don't have any abstraction for masks, it is supported
using vector of integers. When I expand it I have no idea whether it
is just a vector of integers to be stored or a mask to be used for
MASK_LOAD. Actually it may be used in both cases at the same time.

Handling it in RTL means we have to undo bool->int transformation made
in GIMPLE. For trivial cases it may be easy but in generic it can be
challenging.  I want to avoid it from the beginning.
I wasn't suggesting handling them in RTL, but at the border between 
gimple and RTL.


But if we can't reliably determine how a particular mask is going to be 
used at that point, then doing things at the gimple/RTL border may be a 
spectacularly bad idea ;-)


jeff


Go patch committed: accept numeric literals with leading zeroes

2015-08-25 Thread Ian Lance Taylor
This patch by Chris Manghane fixes the Go frontend to accept numeric
literals with leading zeroes, even if they don't turn out to be octal.
This fixes https://golang.org/issue/11532 and
https://golang.org/issue/11533 .  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227191)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-14ca4b6130b9a7132d132f418e9ea283b3a52c08
+f97d579fa8431af5cfde9b0a48604caabfd09377
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/lex.cc
===
--- gcc/go/gofrontend/lex.cc(revision 226846)
+++ gcc/go/gofrontend/lex.cc(working copy)
@@ -1047,7 +1047,7 @@ Lex::gather_number()
  pnum = p;
  while (p < pend)
{
- if (*p < '0' || *p > '7')
+ if (*p < '0' || *p > '9')
break;
  ++p;
}
@@ -1060,7 +1060,13 @@ Lex::gather_number()
  std::string s(pnum, p - pnum);
  mpz_t val;
  int r = mpz_init_set_str(val, s.c_str(), base);
- go_assert(r == 0);
+  if (r != 0)
+{
+  if (base == 8)
+error_at(this->location(), "invalid octal literal");
+  else
+error_at(this->location(), "invalid hex literal");
+}
 
  if (neg)
mpz_neg(val, val);


Re: [PATCH 1/2] driver: support state cleanup

2015-08-25 Thread Joseph Myers
I've committed as obvious this patch to fix a build failure this 
introduced for targets not defining EXTRA_SPECS, in which case extra_specs 
is not declared.  (nvptx-none was the target for which I came across such 
a failure.)

2015-08-25  Joseph Myers  

* gcc.c (driver::finalize): Only assign to extra_specs if
[EXTRA_SPECS].

Index: gcc.c
===
--- gcc.c   (revision 227191)
+++ gcc.c   (working copy)
@@ -9813,7 +9813,9 @@ driver::finalize ()
}
   *(sl->ptr_spec) = sl->default_ptr;
 }
+#ifdef EXTRA_SPECS
   extra_specs = NULL;
+#endif
 
   processing_spec_function = 0;
 

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-25 Thread Jeff Law

On 08/21/2015 06:17 AM, Ilya Enkovich wrote:


Hmm, I don't see how vector masks are more difficult to operate with.


There are just no instructions for that, but you have to pretend you
have them to get code vectorized.




Also according to vector ABI integer mask should be used for mask
operand in case of masked vector call.


What ABI?  The function signature of the intrinsics?  How would that
come into play here?


Not intrinsics. I mean OpenMP vector functions which require integer
arg for a mask in case of 512-bit vector.
That's what I assumed -- you can pass in a mask as an argument and it's 
supposed to be a simple integer, right?








Current implementation of masked loads, masked stores and bool
patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
really call it a canonical representation for all targets?


No idea - we'll revisit when another targets adds a similar capability.


AVX-512 is such target. Current representation forces multiple scalar
mask -> vector mask and back transformations which are artificially
introduced by current bool patterns and are hard to optimize out.
I'm a bit surprised they're so prevalent and hard to optimize away. 
ISTM PRE ought to handle this kind of thing with relative ease.




Fact is GCC already copes with vector masks generated by vector compares
just fine everywhere and I'd rather leave it as that.


Nope. Currently a vector mask is obtained from a vec_cond.  AND and IOR on
bools are also expressed via additional vec_cond.  I don't think the
vectorizer ever generates vector comparisons.

And I wouldn't say it's fine 'everywhere' because there is a single
target utilizing them. Masked loads and stored for AVX-512 just don't
work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
512-bit vector then we get an ugly inefficient code. The question is
where to fight with this inefficiency: in RTL or in GIMPLE. I want to
fight with it where it appears, i.e. in GIMPLE by preventing bool ->
int conversions applied everywhere even if target doesn't need it.
You should expect pushback anytime target dependencies are added to 
gimple, even if it's stuff in the vectorizer, which is infested with 
target dependencies.




If we don't want to support both types of masks in GIMPLE then it's
more reasonable to make bool -> int conversion in expand for targets
requiring it, rather than do it for everyone and then leave it to
target to transform it back and try to get rid of all those redundant
transformations. I'd give vector<bool> a chance to become a canonical
mask representation for that.

Might be worth some experimentation.

Jeff


Re: [PATCH 1/5] Refactor completely_scalarize_var

2015-08-25 Thread Martin Jambor
Hi,

On Tue, Aug 25, 2015 at 12:06:13PM +0100, Alan Lawrence wrote:
> This is a small refactoring/renaming patch, it just moves the call to
> "completely_scalarize_record" out from completely_scalarize_var, and renames
> the latter to create_total_scalarization_access.
> 
> This is because the next patch needs to drop the "_record" suffix and I felt
> it would be confusing to have both completely_scalarize and
> completely_scalarize_var. However, it also makes the new function name
> (create_total_scalarization_access) consistent with the existing code & 
> comment.
> 
> Bootstrapped + check-gcc on x86_64.
> 
> gcc/ChangeLog:
> 
>   * tree-sra.c (completely_scalarize_var): Rename to...
>   (create_total_scalarization_access): ... Here. Drop call to
>   completely_scalarize_record.
> 
>   (analyze_all_variable_accesses): Replace completely_scalarize_var
>   with create_total_scalarization_access and completely_scalarize_record.
> ---
>  gcc/tree-sra.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 818c290..a0c92b0 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -985,7 +985,7 @@ completely_scalarize_record (tree base, tree decl, 
> HOST_WIDE_INT offset,
> type_consists_of_records_p.   */
>  
>  static void
> -completely_scalarize_var (tree var)
> +create_total_scalarization_access (tree var)

If you change what the function does, you have to change the comment
too.  If I am not mistaken, even with the whole patch set applied, the
first sentence would still be: "Create total_scalarization accesses
for all scalar type fields in VAR and for VAR as a whole." And with
this change, only the part after "and" will remain true.

Thanks,

Martin

>  {
>HOST_WIDE_INT size = tree_to_uhwi (DECL_SIZE (var));
>struct access *access;
> @@ -994,8 +994,6 @@ completely_scalarize_var (tree var)
>access->expr = var;
>access->type = TREE_TYPE (var);
>access->grp_total_scalarization = 1;
> -
> -  completely_scalarize_record (var, var, 0, var);
>  }
>  
>  /* Return true if REF has an VIEW_CONVERT_EXPR somewhere in it.  */
> @@ -2529,7 +2527,8 @@ analyze_all_variable_accesses (void)
>   if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>   <= max_scalarization_size)
> {
> - completely_scalarize_var (var);
> + create_total_scalarization_access (var);
> + completely_scalarize_record (var, var, 0, var);
>   if (dump_file && (dump_flags & TDF_DETAILS))
> {
>   fprintf (dump_file, "Will attempt to totally scalarize ");
> -- 
> 1.8.3
> 


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-25 Thread Jeff Law

On 08/21/2015 04:49 AM, Ilya Enkovich wrote:


I want a work with bitmasks to be expressed in a natural way using
regular integer operations. Currently all masks manipulations are
emulated via vector statements (mostly using a bunch of vec_cond). For
complex predicates it may be nontrivial to transform it back to scalar
masks and get an efficient code. Also the same vector may be used as
both a mask and an integer vector. Things become more complex if you
additionally have broadcasts and vector pack/unpack code. It also
should be transformed into a scalar masks manipulations somehow.
Or why not model the conversion at the gimple level using a 
CONVERT_EXPR?   In fact, the more I think about it, that seems to make 
more sense to me.


We pick a canonical form for the mask, whatever it may be.  We use that 
canonical form and model conversions between it and the other form via 
CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant 
conversions.  If it's not up to the task, we should really look into why 
and resolve.


Yes, that does mean we have two forms which I'm not terribly happy about 
and it means some target dependencies on what the masked vector 
operation looks like (ie, does it accept a simple integer or vector 
mask), but I'm starting to wonder if, as distasteful as I find it, it's 
the right thing to do.




But I don't like changing our IL so much as to allow 'integer' masks everywhere.

I'm warming up to that idea...

jeff



Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-25 Thread Martin Jambor
Hi,

On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote:
> This changes the completely_scalarize_record path to also work on arrays (thus
> allowing records containing arrays, etc.). This just required extending the
> existing type_consists_of_records_p and completely_scalarize_record methods
> to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both
> methods so as not to mention 'record'.

thanks for working on this.  I see Jeff has already approved the
patch, but I have two comments nevertheless.  First, I would be much
happier if you added a proper comment to scalarize_elem function which
you forgot completely.  The name is not very descriptive and it has
quite few parameters too.

Second, this patch should also fix PR 67283.  It would be great if you
could verify that and add it to the changelog when committing if that
is indeed the case.

Thanks,

Martin


> 
> Bootstrapped + check-gcc on aarch64-none-linux-gnu, arm-none-linux-gnueabihf 
> and x86_64-none-linux-gnu.
> 
> Have also verified the scan-tree-dump check in the new sra-15.c passes (using 
> a stage 1 compiler only, no execution test) on alpha, hppa, powerpc, sparc, 
> avr and sh.
> 
> gcc/ChangeLog:
> 
>   * tree-sra.c (type_consists_of_records_p): Rename to...
>   (scalarizable_type_p): ...this, add case for ARRAY_TYPE.
> 
>   (completely_scalarize_record): Rename to...
>   (completely_scalarize): ...this, add ARRAY_TYPE case, move some code to:
>   (scalarize_elem): New.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/tree-ssa/sra-15.c: New.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/sra-15.c |  38 +
>  gcc/tree-sra.c | 146 
> ++---
>  2 files changed, 135 insertions(+), 49 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
> new file mode 100644
> index 000..e251058
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
> @@ -0,0 +1,38 @@
> +/* Verify that SRA total scalarization works on records containing arrays.  
> */
> +/* Test skipped for targets with small (often default) MOVE_RATIO.  */
> +/* { dg-do run } */
> +/* { dg-options "-O1 -fdump-tree-release_ssa --param 
> sra-max-scalarization-size-Ospeed=32" } */
> +
> +extern void abort (void);
> +
> +struct S
> +{
> +  char c;
> +  unsigned short f[2][2];
> +  int i;
> +  unsigned short f3, f4;
> +};
> +
> +
> +int __attribute__ ((noinline))
> +foo (struct S *p)
> +{
> +  struct S l;
> +
> +  l = *p;
> +  l.i++;
> +  l.f[1][0] += 3;
> +  *p = l;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  struct S a = {0, { {5, 7}, {9, 11} }, 4, 0, 0};
> +  foo (&a);
> +  if (a.i != 5 || a.f[1][0] != 12)
> +abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index a0c92b0..08fa8dc 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -915,74 +915,122 @@ create_access (tree expr, gimple stmt, bool write)
>  }
>  
>  
> -/* Return true iff TYPE is a RECORD_TYPE with fields that are either of 
> gimple
> -   register types or (recursively) records with only these two kinds of 
> fields.
> -   It also returns false if any of these records contains a bit-field.  */
> +/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or ARRAY_TYPE 
> with
> +   fields that are either of gimple register types (excluding bit-fields)
> +   or (recursively) scalarizable types.  */
>  
>  static bool
> -type_consists_of_records_p (tree type)
> +scalarizable_type_p (tree type)
>  {
> -  tree fld;
> +  gcc_assert (!is_gimple_reg_type (type));
>  
> -  if (TREE_CODE (type) != RECORD_TYPE)
> -return false;
> +  switch (TREE_CODE (type))
> +  {
> +  case RECORD_TYPE:
> +for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> +  if (TREE_CODE (fld) == FIELD_DECL)
> + {
> +   tree ft = TREE_TYPE (fld);
>  
> -  for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> -if (TREE_CODE (fld) == FIELD_DECL)
> -  {
> - tree ft = TREE_TYPE (fld);
> +   if (DECL_BIT_FIELD (fld))
> + return false;
>  
> - if (DECL_BIT_FIELD (fld))
> -   return false;
> +   if (!is_gimple_reg_type (ft)
> +   && !scalarizable_type_p (ft))
> + return false;
> + }
>  
> - if (!is_gimple_reg_type (ft)
> - && !type_consists_of_records_p (ft))
> -   return false;
> -  }
> +return true;
>  
> -  return true;
> +  case ARRAY_TYPE:
> +{
> +  tree elem = TREE_TYPE (type);
> +  if (DECL_P (elem) && DECL_BIT_FIELD (elem))
> + return false;
> +  if (!is_gimple_reg_type (elem)
> +  && !scalarizable_type_p (elem))
> + return false;
> +  return true;
> +}
> +  default:
> +return false;
> +  }
>  }
>  
> -/* Create total_scalarization accesses for all sca

Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-25 Thread Jeff Law

On 08/25/2015 03:42 PM, Martin Jambor wrote:

Hi,

On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote:

This changes the completely_scalarize_record path to also work on arrays (thus
allowing records containing arrays, etc.). This just required extending the
existing type_consists_of_records_p and completely_scalarize_record methods
to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both
methods so as not to mention 'record'.


thanks for working on this.  I see Jeff has already approved the
patch, but I have two comments nevertheless.  First, I would be much
happier if you added a proper comment to the scalarize_elem function,
which you forgot completely.  The name is not very descriptive and it has
quite a few parameters too.
Right.  I mentioned that I missed the lack of function comments when 
looking at #3 and asked Alan to go back and fix them in #1 and #2.

Second, this patch should also fix PR 67283.  It would be great if you
could verify that and add it to the changelog when committing if that
is indeed the case.

Excellent.  Yes, definitely mention the BZ.

jeff



Re: PING: PATCH: Mention --enable-default-pie in gcc-6/changes.html

2015-08-25 Thread Gerald Pfeifer
On Tue, 25 Aug 2015, H.J. Lu wrote:
>> +  <li>Added <code>--enable-default-pie</code> configure option to
>> +      generate PIE by default.
>> +  </li>
> PING.

How about something more like "The ... configure option enables
generation of PIE by default"?  That feels more consistent with
the rest of the page.

Okay with that change.

Gerald


Re: [PATCH] Update wwwdocs for --with-advance-toolchain=at

2015-08-25 Thread Gerald Pfeifer
Hi Michael,

On Tue, 25 Aug 2015, Michael Meissner wrote:
> + A new configuration option 
> ---with-advance-toolchain=at
> + was added for PowerPC 64-bit Linux systems to use the header files, 
> library

should this be GNU/Linux in the above (per guidance from the FSF)?

> + files, and the dynamic linker from a specific Advance Toolchain release
> + instead of the default versions that are provided by the Linux
> + distribution.

Same here?

Gerald


Re: top-level configure.ac: factor the libgomp check for posix-like OS

2015-08-25 Thread Gary Funck

Ended up using the same approach as libatomic,
moving the checking logic into libgupc/configure.tgt.

+# Disable libgupc on unsupported systems.
+if test -d ${srcdir}/libgupc; then
+if test x$enable_libgupc = x; then
+   AC_MSG_CHECKING([for libgupc support])
+   if (srcdir=${srcdir}/libgupc; \
+   . ${srcdir}/configure.tgt; \
+   test -n "$UNSUPPORTED")
+   then
+   AC_MSG_RESULT([no])
+   noconfigdirs="$noconfigdirs target-libgupc"
+   else
+   AC_MSG_RESULT([yes])
+   fi
+fi
+fi
+

Thanks,
- Gary



Re: [PATCH] Update wwwdocs for --with-advance-toolchain=at

2015-08-25 Thread Michael Meissner
On Tue, Aug 25, 2015 at 11:58:21PM +0200, Gerald Pfeifer wrote:
> Hi Michael,
> 
> On Tue, 25 Aug 2015, Michael Meissner wrote:
> > + A new configuration option 
> > ---with-advance-toolchain=at
> > + was added for PowerPC 64-bit Linux systems to use the header files, 
> > library
> 
> should this be GNU/Linux in the above (per guidance from the FSF)?
> 
> > + files, and the dynamic linker from a specific Advance Toolchain 
> > release
> > + instead of the default versions that are provided by the Linux
> > + distribution.
> 
> Same here?
> 
> Gerald

I checked in a fix.  Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-25 Thread Martin Jambor
Hi,

On Tue, Aug 25, 2015 at 12:06:15PM +0100, Alan Lawrence wrote:
> When SRA completely scalarizes an array, this patch changes the
> generated accesses from e.g.
> 
>MEM[(int[8] *)&a + 4B] = 1;
> 
> to
> 
>a[1] = 1;
> 
> This overcomes a limitation in dom2, that accesses to equivalent
> chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with
> accesses in ARRAY_REF form such as a[0]. This is necessary for constant
> propagation in the ssa-dom-cse-2.c testcase (after the next patch
> that makes SRA handle constant-pool loads).
> 
> I tried to work around this by making dom2's hashable_expr_equal_p
> less conservative, but found that on platforms without AArch64's
> vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC,
> mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8]
> *)&a] equivalent to a[0], etc.; a complete overhaul of
> hashable_expr_equal_p seems like a larger task than this patch
> series.
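> 
> [For context, the shape of code involved — a simplified sketch in the
> spirit of the ssa-dom-cse-2.c testcase, not the exact file: once SRA
> scalarizes the constant-initialized local array, DOM/CSE should fold
> foo() down to returning the constant 36.]
> 
> ```c
> #include <assert.h>
> #include <stdio.h>
> 
> int foo (void)
> {
>   int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
>   int i, sum = 0;
> 
>   /* With full scalarization plus constant propagation, this loop
>      is expected to reduce to "return 36" at -O2.  */
>   for (i = 0; i < 8; i++)
>     sum += a[i];
>   return sum;
> }
> 
> int main (void)
> {
>   assert (foo () == 36);
>   printf ("%d\n", foo ());
>   return 0;
> }
> ```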

Uff.  Well, while I am obviously not excited about such a workaround
landing in SRA, if the maintainers agree that it is reasonable, I suppose
I'll manage to live with it.

I also have more specific comments:

> 
> I can't see how to write a testcase for this in C though as direct assignment 
> to an array is not possible; such assignments occur only with constant pool 
> data, which is dealt with in the next patch.
> 
> Bootstrap + check-gcc on x86-none-linux-gnu, arm-none-linux-gnueabihf, 
> aarch64-none-linux-gnu.
> 
> gcc/ChangeLog:
> 
>   * tree-sra.c (completely_scalarize): Move some code into:
>   (get_elem_size): New.
>   (build_ref_for_offset): Build ARRAY_REF if base is aligned array.
> ---
>  gcc/tree-sra.c | 110 
> -
>  1 file changed, 69 insertions(+), 41 deletions(-)
> 
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 08fa8dc..af35fcc 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
>}
>  }
>  
> +static bool
> +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)

As Jeff already pointed out, this function needs a comment.

> +{
> +  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> +  tree t_size = TYPE_SIZE (TREE_TYPE (type));
> +  if (!t_size || !tree_fits_uhwi_p (t_size))
> +return false;
> +  unsigned HOST_WIDE_INT sz = tree_to_uhwi (t_size);
> +  if (!sz)
> +return false;
> +  *sz_out = sz;
> +  return true;
> +}
> +
>  static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
>  
>  /* Create total_scalarization accesses for all scalar fields of a member
> @@ -985,10 +999,9 @@ completely_scalarize (tree base, tree decl_type, 
> HOST_WIDE_INT offset, tree ref)
>  case ARRAY_TYPE:
>{
>   tree elemtype = TREE_TYPE (decl_type);
> - tree elem_size = TYPE_SIZE (elemtype);
> - gcc_assert (elem_size && tree_fits_uhwi_p (elem_size));
> - int el_size = tree_to_uhwi (elem_size);
> - gcc_assert (el_size);
> + unsigned HOST_WIDE_INT el_size;
> + if (!get_elem_size (decl_type, &el_size))
> +   gcc_assert (false);

This is usually written as gcc_unreachable ()

>  
>   tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
>   tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
> @@ -1563,7 +1576,7 @@ build_ref_for_offset (location_t loc, tree base, 
> HOST_WIDE_INT offset,
>tree off;
>tree mem_ref;
>HOST_WIDE_INT base_offset;
> -  unsigned HOST_WIDE_INT misalign;
> +  unsigned HOST_WIDE_INT misalign, el_sz;
>unsigned int align;
>  
>gcc_checking_assert (offset % BITS_PER_UNIT == 0);
> @@ -1572,47 +1585,62 @@ build_ref_for_offset (location_t loc, tree base, 
> HOST_WIDE_INT offset,
>  
>/* get_addr_base_and_unit_offset returns NULL for references with a 
> variable
>   offset such as array[var_index].  */
> -  if (!base)
> -{
> -  gassign *stmt;
> -  tree tmp, addr;
> -
> -  gcc_checking_assert (gsi);
> -  tmp = make_ssa_name (build_pointer_type (TREE_TYPE (prev_base)));
> -  addr = build_fold_addr_expr (unshare_expr (prev_base));
> -  STRIP_USELESS_TYPE_CONVERSION (addr);
> -  stmt = gimple_build_assign (tmp, addr);
> -  gimple_set_location (stmt, loc);
> -  if (insert_after)
> - gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> -  else
> - gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> -
> -  off = build_int_cst (reference_alias_ptr_type (prev_base),
> -offset / BITS_PER_UNIT);
> -  base = tmp;
> -}
> -  else if (TREE_CODE (base) == MEM_REF)
> -{
> -  off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)),
> -base_offset + offset / BITS_PER_UNIT);
> -  off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off);
> -  base = unshare_expr (TREE_OPERAND (base, 0));
> +  if (base
> +  && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE
> +  && misalign == 0
> +  && get_elem_size (T

Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-08-25 Thread Ian Lance Taylor
On Tue, Aug 18, 2015 at 1:36 PM, Lynn A. Boger
 wrote:
>
> libgo/
> PR target/66870
> configure.ac:  When gccgo for building libgo uses the gold version
> containing split stack support on ppc64, ppc64le, define
> LINKER_SUPPORTS_SPLIT_STACK.
> configure:  Regenerate.

Your version test for gold isn't robust: if the major version >= 3,
then presumably split stack is supported.  And since you have numbers,
I suggest not trying to use switch, but instead writing something like
if expr "$gold_minor" == 25; then
...
elif expr "$gold_minor" > 25; then
...
fi
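
[A runnable sketch of the comparison Ian describes — the helper name and
the exact version boundaries are hypothetical, not the actual configure.ac
logic: assume split stack for gold from binutils 2.25 on, and for any
major version >= 3.]

```shell
# Hypothetical helper: decide whether this gold version supports
# split stack on ppc64/ppc64le, given major and minor version numbers.
check_gold_split_stack () {
  gold_major=$1
  gold_minor=$2
  if test "$gold_major" -ge 3; then
    echo yes                       # any future major version
  elif test "$gold_major" -eq 2 && test "$gold_minor" -ge 25; then
    echo yes                       # 2.25 and later
  else
    echo no                        # older releases lack support
  fi
}

check_gold_split_stack 2 24   # no
check_gold_split_stack 2 25   # yes
check_gold_split_stack 3 0    # yes
```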

If that is fixed, I'm fine with the libgo part of this patch.

Ian


Go patch committed: disallow unary ^ on boolean values

2015-08-25 Thread Ian Lance Taylor
The Go frontend erroneously permitted unary ^ on boolean values.  This
patch by Chris Manghane fixes the problem, fixing
https://golang.org/issue/11529 .  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.
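
[For illustration, how unary ^ behaves after the fix — a standalone
sketch, not part of the patch:]

```go
// Go's unary ^ (bitwise complement) is defined for integer operands only;
// applying it to a bool is now rejected with "expected integer".
package main

import "fmt"

func main() {
	x := 5
	fmt.Println(^x) // ^5 == -6 for a signed int (two's complement)

	// After the fix, the following no longer compiles:
	//   b := true
	//   _ = ^b // error: expected integer
}
```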

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227193)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-f97d579fa8431af5cfde9b0a48604caabfd09377
+d5e6af4e6dd456075a1ec1c03d0dc41cbea5eb36
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 227191)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -3943,9 +3943,8 @@ Unary_expression::do_check_types(Gogo*)
   break;
 
 case OPERATOR_XOR:
-  if (type->integer_type() == NULL
- && !type->is_boolean_type())
-   this->report_error(_("expected integer or boolean type"));
+  if (type->integer_type() == NULL)
+   this->report_error(_("expected integer"));
   break;
 
 case OPERATOR_AND:


Re: [PATCH] rs6000: Fix PR67344

2015-08-25 Thread Alan Modra
On Tue, Aug 25, 2015 at 10:08:54AM -0700, Segher Boessenkool wrote:
> -(define_insn_and_split "*and3_imm_dot_shifted"
> -  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y")
> +(define_insn "*and3_imm_dot_shifted"
> +  [(set (match_operand:CC 3 "cc_reg_operand" "=x")

Is this really the best solution?  The operand predicate allows any
cr, but the constraint only cr0.  In the past we've seen this sort of
thing result in "insn does not satisfy its constraints" errors, and if
the operand is successfully reloaded you'll get slow mcrf insns.  At
-O1 the testcase generates:

andi. 8,3,0x16
mcrf 4,0

I started throwing together a patch yesterday, before you claimed the
bug.  With this patch, I see what looks to be better code despite it
being larger:

li 9,22
and 9,3,9
cmpdi 4,9,0


diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 87abd6e..a9eea80 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3060,17 +3060,18 @@
 return "#";
 }
   "&& reload_completed && cc_reg_not_cr0_operand (operands[3], CCmode)"
-  [(set (match_dup 0)
-   (and:GPR (lshiftrt:GPR (match_dup 1)
-  (match_dup 4))
-(match_dup 2)))
+  [(set (match_dup 0) (match_dup 2))
+   (set (match_dup 0) (and:GPR (match_dup 1) (match_dup 0)))
(set (match_dup 3)
(compare:CC (match_dup 0)
(const_int 0)))]
-  ""
+  "
+{
+  operands[2] = GEN_INT (UINTVAL (operands[2]) << INTVAL (operands[4]));
+}"
   [(set_attr "type" "logical")
(set_attr "dot" "yes")
-   (set_attr "length" "4,8")])
+   (set_attr "length" "4,12")])
 
 
 (define_insn "and3_mask"

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH], PowerPC IEEE 128-bit patch #5

2015-08-25 Thread Michael Meissner
On Wed, Aug 19, 2015 at 07:41:24AM -0500, Segher Boessenkool wrote:
> On Fri, Aug 14, 2015 at 11:46:03AM -0400, Michael Meissner wrote:
> > +;; Like int_reg_operand, but don't return true for pseudo registers
> > +(define_predicate "int_reg_operand_not_pseudo"
> > +  (match_operand 0 "register_operand")
> > +{
> > +  if ((TARGET_E500_DOUBLE || TARGET_SPE) && invalid_e500_subreg (op, mode))
> > +return 0;
> > +
> > +  if (GET_CODE (op) == SUBREG)
> > +op = SUBREG_REG (op);
> > +
> > +  if (!REG_P (op))
> > +return 0;
> > +
> > +  if (REGNO (op) >= FIRST_PSEUDO_REGISTER)
> > +return 0;
> > +
> > +  return INT_REGNO_P (REGNO (op));
> > +})
> 
> Since you use this only once, maybe it is easier (to read, etc.) if you
> just test it there?  Hard regs do not get subregs.

I was worried about hard regs appearing before reload, and possibly being
subregs, so I'll remove the SUBREG test.

> > +(define_insn_and_split "ieee_128bit_vsx_neg2"
> > +  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
> > +   (neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
> > +   (clobber (match_scratch:V16QI 2 "=v"))]
> > +  "TARGET_FLOAT128 && FLOAT128_IEEE_P (mode)"
> > +  "#"
> > +  "&& 1"
> > +  [(parallel [(set (match_dup 0)
> > +  (neg:TFIFKF (match_dup 1)))
> > + (use (match_dup 2))])]
> > +{
> > +  if (GET_CODE (operands[2]) == SCRATCH)
> > +operands[2] = gen_reg_rtx (V16QImode);
> > +
> > +  operands[3] = gen_reg_rtx (V16QImode);
> > +  emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
> > +}
> > +  [(set_attr "length" "8")
> > +   (set_attr "type" "vecsimple")])
> 
> Where is operands[3] used?  I guess that whole line should be deleted?

Good catch. It was from the earlier patch before the fix for PR 67071, which
added better support for vector constants that can be constructed with several
vector operations, including a vector octet shift.

> > +(define_insn "*ieee_128bit_vsx_neg2_internal"
> > +  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
> > +   (neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
> > +   (use (match_operand:V16QI 2 "register_operand" "=v"))]
> > +  "TARGET_FLOAT128"
> > +  "xxlxor %x0,%x1,%x2"
> > +  [(set_attr "length" "4")
> > +   (set_attr "type" "vecsimple")])
> 
> Length 4 is default, you can just leave it out (like we do for most
> machine insns already).

Ok, though I tend to always put them in.

Here is the revised patch. Is it ok to install?

2015-08-25  Michael Meissner  

* config/rs6000/predicates.md (int_reg_operand_not_pseudo): New
predicate for only GPR hard registers.

* config/rs6000/rs6000.md (FP): Add IEEE 128-bit floating point
modes to iterators. Add new iterators for moving 128-bit values in
scalar FPR registers and VSX registers.
(FMOVE128): Likewise.
(FMOVE128_FPR): Likewise.
(FMOVE128_GPR): Likewise.
(FMOVE128_VSX): Likewise.
(FLOAT128_SFDFTF): New iterators for IEEE 128-bit floating point
in VSX registers.
(IFKF): Likewise.
(IBM128): Likewise.
(TFIFKF): Likewise.
(RELOAD): Add IEEE 128-bit floating point modes.
(signbittf2): Convert TF insns to add support for new IEEE 128-bit
floating point in VSX registers modes.
(signbit2, IBM128 iterator): Likewise.
(mov_64bit_dm, FMOVE128_FPR iterator): Likewise.
(mov_32bit, FMOVE128_FPR iterator): Likewise.
(negtf2): Likewise.
(neg2, TFIFKF iterator): Likewise.
(negtf2_internal): Likewise.
(abstf2): Likewise.
(abs2, TFIFKF iterator): Likewise.
(ieee_128bit_negative_zero): New IEEE 128-bit floating point in
VSX insn support for negate, absolute value, and negative absolute
value.
(ieee_128bit_vsx_neg2): Likewise.
(ieee_128bit_vsx_neg2_internal): Likewise.
(ieee_128bit_vsx_abs2): Likewise.
(ieee_128bit_vsx_abs2_internal): Likewise.
(ieee_128bit_vsx_nabs2): Likewise.
(ieee_128bit_vsx_nabs2_internal): Likewise.
(FP128_64): Update pack/unpack 128-bit insns for IEEE 128-bit
floating point in VSX registers.
(unpack_dm): Likewise.
(unpack_nodm): Likewise.
(pack): Likewise.
(unpackv1ti): Likewise.
(unpack, FMOVE128_VSX iterator): Likewise.
(packv1ti): Likewise.
(pack, FMOVE128_VSX iterator): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 227180)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -239,6 +239,25 @@ (define_predicate "int_reg_operand"
   return INT_REGNO_P (REGNO (op));
 })
 
+;; Like int_reg_operand, but don't return true for pseudo registers
