[testsuite] Clean up effective_target cache
Hi, Some subsets of the tests override ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS and perform effective_target support tests using these modified flags. If these flags conflict with the effective_target tests, subsequent tests will be UNSUPPORTED even though ALWAYS_CXXFLAGS/TEST_ALWAYS_FLAGS have been reset and no longer conflict. In practice, we noticed this when running validation under 'ulimit -v XXX', which can conflict with ASAN. We observed that sse2 and stack_protector tests would randomly fail when tested from asan.exp, making non-asan tests UNSUPPORTED. This patch adds a new function 'clear_effective_target_cache', which is called at the end of every .exp file that overrides ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS. I tested that it works well for asan.exp on x86_64; the changes in the other .exp files seem mechanical. However, I noticed that lib/g++.exp changes ALWAYS_CXXFLAGS, but does not appear to restore it. When in doubt, I didn't change it. OK? Christophe. 2015-08-25 Christophe Lyon * lib/target-supports.exp (clear_effective_target_cache): New. (check_cached_effective_target): Update et_prop_list. * lib/asan-dg.exp (asan_finish): Call clear_effective_target_cache. * g++.dg/compat/compat.exp: Likewise. * g++.dg/compat/struct-layout-1.exp: Likewise. * lib/asan-dg.exp: Likewise. * lib/atomic-dg.exp: Likewise. * lib/cilk-plus-dg.exp: Likewise. * lib/clearcap.exp: Likewise. * lib/mpx-dg.exp: Likewise. * lib/target-supports.exp: Likewise. * lib/tsan-dg.exp: Likewise. * lib/ubsan-dg.exp: Likewise. diff --git a/gcc/testsuite/g++.dg/compat/compat.exp b/gcc/testsuite/g++.dg/compat/compat.exp index 1272289..4c4b25f 100644 --- a/gcc/testsuite/g++.dg/compat/compat.exp +++ b/gcc/testsuite/g++.dg/compat/compat.exp @@ -78,6 +78,7 @@ proc compat-use-tst-compiler { } { set ALWAYS_CXXFLAGS $save_always_cxxflags set ld_library_path $save_ld_library_path set_ld_library_path_env_vars + clear_effective_target_cache } } diff --git a/gcc/testsuite/g++.dg/compat/struct-layout-1.exp b/gcc/testsuite/g++.dg/compat/struct-layout-1.exp index d98..097a731 100644 --- a/gcc/testsuite/g++.dg/compat/struct-layout-1.exp +++ b/gcc/testsuite/g++.dg/compat/struct-layout-1.exp @@ -61,6 +61,7 @@ proc compat-use-alt-compiler { } { set ld_library_path $alt_ld_library_path set_ld_library_path_env_vars restore_gcc_exec_prefix_env_var + clear_effective_target_cache } } diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp index 141a479..3ce264e 100644 --- a/gcc/testsuite/lib/asan-dg.exp +++ b/gcc/testsuite/lib/asan-dg.exp @@ -138,6 +138,7 @@ proc asan_finish { args } { } set ld_library_path $asan_saved_library_path set_ld_library_path_env_vars +clear_effective_target_cache } # Symbolize lines like diff --git a/gcc/testsuite/lib/atomic-dg.exp b/gcc/testsuite/lib/atomic-dg.exp index d9df227..fe24127 100644 --- a/gcc/testsuite/lib/atomic-dg.exp +++ b/gcc/testsuite/lib/atomic-dg.exp @@ -101,4 +101,5 @@ proc atomic_finish { args } { } else { unset TEST_ALWAYS_FLAGS } +clear_effective_target_cache } diff --git a/gcc/testsuite/lib/cilk-plus-dg.exp b/gcc/testsuite/lib/cilk-plus-dg.exp index 38e5400..7f38f37 100644 --- a/gcc/testsuite/lib/cilk-plus-dg.exp +++ b/gcc/testsuite/lib/cilk-plus-dg.exp @@ -101,4 +101,5 @@ proc cilkplus_finish { args } { } else { unset TEST_ALWAYS_FLAGS } +clear_effective_target_cache } diff --git a/gcc/testsuite/lib/clearcap.exp b/gcc/testsuite/lib/clearcap.exp index d41aa1e..3e2a88c 100644 --- a/gcc/testsuite/lib/clearcap.exp +++ b/gcc/testsuite/lib/clearcap.exp @@ -55,4 +55,5 @@
proc clearcap-finish { args } { } else { unset TEST_ALWAYS_FLAGS } +clear_effective_target_cache } diff --git a/gcc/testsuite/lib/mpx-dg.exp b/gcc/testsuite/lib/mpx-dg.exp index c8f64cd..b2bd40c 100644 --- a/gcc/testsuite/lib/mpx-dg.exp +++ b/gcc/testsuite/lib/mpx-dg.exp @@ -142,4 +142,5 @@ proc mpx_finish { args } { } set ld_library_path $mpx_saved_library_path set_ld_library_path_env_vars +clear_effective_target_cache } diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 1988301..e2084bb 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -117,6 +117,7 @@ proc current_target_name { } { proc check_cached_effective_target { prop args } { global et_cache +global et_prop_list set target [current_target_name] if {![info exists et_cache($prop,target)] @@ -124,12 +125,30 @@ proc check_cached_effective_target { prop args } { verbose "check_cached_effective_target $prop: checking $target" 2 set et_cache($prop,target) $target set et_cache($prop,value) [uplevel eval $args] + lappend et_prop_list $prop + verbose "check_cached_effective_target cached list is now: $et_prop_list" 2 } set value $et_cache($prop,value) verbose "check_cached_effective_targe
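For readers who don't live in the DejaGnu sources: the underlying bug is ordinary memoization whose key omits part of the environment (the flags in effect). A minimal C sketch of the same pattern, illustrative only — the real implementation is Tcl, and try_compile_with/current_flags are made-up stand-ins:

#include <stdbool.h>

/* Hypothetical stand-in for "compile a test program with these flags".  */
extern bool try_compile_with (const char *flags);

static const char *current_flags = "";  /* ALWAYS_CXXFLAGS analogue */
static bool cached_valid;               /* et_cache analogue */
static bool cached_value;

static bool
check_effective_target (void)
{
  if (!cached_valid)
    {
      /* The answer depends on current_flags, yet only the flags in
         effect on the first call are ever consulted; if asan.exp asks
         first with conflicting flags, every later caller inherits a
         stale UNSUPPORTED verdict.  */
      cached_value = try_compile_with (current_flags);
      cached_valid = true;
    }
  return cached_value;
}

/* The fix: an .exp file that restores the flags also drops the cache.  */
static void
clear_effective_target_cache (void)
{
  cached_valid = false;
}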
Re: [AArch64] [TLSIE][1/2] Rename test source file for reuse
On 19 June 2015 at 10:15, Jiong Wang wrote: > > Rename test source from tlsle.c into tls.c for reuse purpose. > > tls.c will be used as test source file for all TLS test, we just need to > specify different tls options in different testcases. > > 2015-06-19 Jiong Wang > > gcc/testsuite/ > * gcc.target/aarch64/tlsle.c: Rename to tls.c > * gcc.target/aarch64/aarch64/tlsle12.c: Update source file name. > * gcc.target/aarch64/aarch64/tlsle24.c: Ditto. > * gcc.target/aarch64/aarch64/tlsle32.c: Ditto. OK Thanks /Marcus
[PATCH, ARM] List Cs and US constraints as being used
Hi, The header in gcc/config/arm/constraints.md lists all the ARM-specific constraints defined and the targets they apply to, but misses a couple of them. This patch adds the missing Cs and US constraints to the list. The patch was tested by verifying that the arm-none-eabi-gcc cross-compiler can still be built (i.e. the comment remains a comment). diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index 42935a4..2d9ffb8 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/config/arm/constraints.md @@ -21,7 +21,7 @@ ;; The following register constraints have been used: ;; - in ARM/Thumb-2 state: t, w, x, y, z ;; - in Thumb state: h, b -;; - in both states: l, c, k, q, US +;; - in both states: l, c, k, q, Cs, Ts, US ;; In ARM state, 'l' is an alias for 'r' ;; 'f' and 'v' were previously used for FPA and MAVERICK registers. Committed as obvious with the following ChangeLog entry: 2015-08-25 Thomas Preud'homme * config/arm/constraints.md: Also list Cs and US ARM-specific constraints as used. Best regards, Thomas
[PATCH][AARCH64]Fix for branch offsets over 1 MiB
Conditional branches have a maximum range of [-1048576, 1048572]. Any destination further away can not be reached by these. To be able to have conditional branches in very large functions, we invert the condition and change the destination to jump over an unconditional branch to the original, far away, destination. gcc/ChangeLog: 2015-08-07 Ramana Radhakrishnan Andre Vieira * config/aarch64/aarch64.md (*condjump): Handle functions > 1 Mib. (*cb1): Idem. (*tb1): Idem. (*cb1): Idem. * config/aarch64/iterators.md (inv_cb): New code attribute. (inv_tb): Idem. * config/aarch64/aarch64.c (aarch64_gen_far_branch): New. * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New. gcc/testsuite/ChangeLog: 2015-08-07 Andre Vieira * gcc.target/aarch64/long-branch.c: New test. From 9759c5a50c44b0421c7911014e63a6222dd9017d Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Fri, 14 Aug 2015 10:21:57 +0100 Subject: [PATCH] fix for far branches --- gcc/config/aarch64/aarch64-protos.h| 1 + gcc/config/aarch64/aarch64.c | 23 + gcc/config/aarch64/aarch64.md | 89 +++- gcc/config/aarch64/iterators.md| 6 + gcc/testsuite/gcc.target/aarch64/long_branch.c | 565 + 5 files changed, 669 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/long_branch.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 32b5d0958a6e0b2356874736f858f007fe68cdda..87a26deb6a0dbf13e25275baeebec21a37a42f41 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -316,6 +316,7 @@ unsigned aarch64_trampoline_size (void); void aarch64_asm_output_labelref (FILE *, const char *); void aarch64_cpu_cpp_builtins (cpp_reader *); void aarch64_elf_asm_named_section (const char *, unsigned, tree); +const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *); void aarch64_err_no_fpadvsimd (machine_mode, const char *); void aarch64_expand_epilogue (bool); void aarch64_expand_mov_immediate (rtx, rtx); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 7159f5aca5df97f154b3e654f60af9136354f335..3b491a232d81892b6511bad84e4174d939fa1be7 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -587,6 +587,29 @@ static const char * const aarch64_condition_codes[] = "hi", "ls", "ge", "lt", "gt", "le", "al", "nv" }; +/* Generate code to enable conditional branches in functions over 1 MiB. 
*/ +const char * +aarch64_gen_far_branch (rtx * operands, int pos_label, const char * dest, + const char * branch_format) +{ +rtx_code_label * tmp_label = gen_label_rtx (); +char label_buf[256]; +char buffer[128]; +ASM_GENERATE_INTERNAL_LABEL (label_buf, dest, + CODE_LABEL_NUMBER (tmp_label)); +const char *label_ptr = targetm.strip_name_encoding (label_buf); +rtx dest_label = operands[pos_label]; +operands[pos_label] = tmp_label; + +snprintf (buffer, sizeof (buffer), "%s%s", branch_format, label_ptr); +output_asm_insn (buffer, operands); + +snprintf (buffer, sizeof (buffer), "b\t%%l%d\n%s:", pos_label, label_ptr); +operands[pos_label] = dest_label; +output_asm_insn (buffer, operands); +return ""; +} + void aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg) { diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 35255e91a95cdf20d52270470202f7499ba46bb2..74f6e3ec4bdcd076c2ad6d1431102aa50dfb5068 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -181,6 +181,13 @@ (const_string "no") ] (const_string "yes"))) +;; Attribute that specifies whether we are dealing with a branch to a +;; label that is far away, i.e. further away than the maximum/minimum +;; representable in a signed 21-bits number. +;; 0 :=: no +;; 1 :=: yes +(define_attr "far_branch" "" (const_int 0)) + ;; --- ;; Pipeline descriptions and scheduling ;; --- @@ -308,8 +315,23 @@ (label_ref (match_operand 2 "" "")) (pc)))] "" - "b%m0\\t%l2" - [(set_attr "type" "branch")] + { +if (get_attr_length (insn) == 8) + return aarch64_gen_far_branch (operands, 2, "Lbcond", "b%M0\\t"); +else + return "b%m0\\t%l2"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) + (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) + (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (const_int 0) +
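The md fragments above encode the reachability test twice: once to choose the insn length (4 or 8 bytes) and once to set the far_branch attribute. A sketch of that test in C (illustrative only, not GCC code; the bounds are copied from the length attribute). A branch that fails the test is emitted via aarch64_gen_far_branch as the inverted condition branching over an unconditional b to the real target:

/* Can a conditional branch at PC reach DEST?  The signed 21-bit
   offset gives the bounds used by the length attribute above.  */
static int
fits_cond_branch_p (long pc, long dest)
{
  long off = dest - pc;
  return off >= -1048576 && off < 1048572;
}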
Re: [AArch64] [TLSIE][2/2] Implement TLS IE for tiny model
On 19 June 2015 at 10:15, Jiong Wang wrote: > > Currently, TLS IE is supported on small model only. This patch implement > TLS Initial-exec model support for AArch64 tiny memory model. > > Under tiny model, we only allow 1M loadable segment size, one single ldr > instruction is enough for addressing the got entry for TLS IE directly. > > The code sequence is: > > A: mrs tp, tpidr_el0 > B0: ldr t0, :gottprel:x1 R_AARCH64_TLSIE_LD_GOTTPREL_PREL19 x1 > B1: add t0, t0, tp > > B0 and B1 should not be scheduled, as the pattern will be recognized > later for linker IE model to LE model optimization. > > 2015-06-19 Marcus Shawcroft > Jiong Wang > > gcc/ > * config/aarch64/aarch64.md (UNSPEC_GOTTINYTLS): New UNSPEC. > (tlsie_tiny_): New define_insn. > (tlsie_tiny_sidi): Ditto. > * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Define > SYMBOL_TINY_TLSIE. > (aarch64_symbol_context): New comment for SYMBOL_TINY_TLSIE. > * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Support > SYMBOL_TINY_TLSIE. > (aarch64_expand_mov_immediate): Ditto. > (aarch64_print_operand): Ditto. > (arch64_classify_tls_symbol): Ditto. > > gcc/testsuite/ > * gcc.target/aarch64/tlsie_tiny.c: New testcase. > > -- > Regards, > Jiong > OK /Marcus
Re: [AArch64][TLSLE][2/3] Add the option "-mtls-size" for AArch64
> 2015-08-19 Jiong Wang > > gcc/ > * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Rename > SYMBOL_TLSLE to SYMBOL_TLSLE24. > * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Likewise > (aarch64_expand_mov_immediate): Likewise > (aarch64_print_operand): Likewise > (aarch64_classify_symbol): Likewise > OK /Marcus
[PATCH, PR 57195] Allow mode iterators inside angle brackets
This patch allows mode iterators inside angle brackets in machine description files. I discovered the issue when attempting to use iterators on match_operands as follows: match_operand: 0 "s_register_operand" "=w") The function 'read_name' is not properly handling ':' inside angle brackets. Bootstrapped on arm-linux. OK for trunk? 2015-08-25 Michael Collison PR other/57195 * read-md.c (read_name): Allow mode iterators inside angle brackets in rtl expressions. diff --git a/gcc/read-md.c b/gcc/read-md.c index 9f158ec..0171fb0 100644 --- a/gcc/read-md.c +++ b/gcc/read-md.c @@ -399,17 +399,25 @@ read_name (struct md_name *name) { int c; size_t i; + bool in_angle_bracket; c = read_skip_spaces (); i = 0; + in_angle_bracket = false; while (1) { + if (c == '<') +in_angle_bracket = true; + + if (c == '>') +in_angle_bracket = false; + if (c == ' ' || c == '\n' || c == '\t' || c == '\f' || c == '\r' || c == EOF) break; - if (c == ':' || c == ')' || c == ']' || c == '"' || c == '/' - || c == '(' || c == '[') + if (((c == ':') and (!in_angle_bracket)) || c == ')' || c == ']' + || c == '"' || c == '/' || c == '(' || c == '[') { unread_char (c); break; -- Michael Collison Linaro Toolchain Working Group michael.colli...@linaro.org
Re: [AArch64][TLSLE][3/3] Implement local executable mode for all memory model
> 2015-08-19 Marcus Shawcroft > Jiong Wang > gcc/ > * config/aarch64/aarch64.c (initialize_aarch64_tls_size): Set default > tls size for tiny, small, large memory model. > (aarch64_load_symref_appropriately): Support new symbol types. > (aarch64_expand_mov_immediate): Likewise. > (aarch64_print_operand): Likewise. > (aarch64_classify_tls_symbol): Likewise. > * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Likewise. > (aarch64_symbol_type): Likewise. > * config/aarch64/aarch64.md (tlsle): Deleted. > (tlsle12_): New define_insn. > (tlsle24_): Likewise. > (tlsle32_): Likewise. > (tlsle48_): Likewise. > * doc/sourcebuild.texi (AArch64-specific attributes): Document > "aarch64_tlsle32". > > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_aarch64_tlsle32): > New test directive. > * gcc.target/aarch64/tlsle_1.x: New test source. > * gcc.target/aarch64/tlsle12.c: New testcase. > * gcc.target/aarch64/tlsle24.c: New testcase. > * gcc.target/aarch64/tlsle32.c: New testcase. > -- > OK /Marcus
Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB
On Tue, Aug 25, 2015 at 5:37 PM, Andre Vieira wrote: > Conditional branches have a maximum range of [-1048576, 1048572]. Any > destination further away can not be reached by these. > To be able to have conditional branches in very large functions, we invert > the condition and change the destination to jump over an unconditional > branch to the original, far away, destination. > > gcc/ChangeLog: > 2015-08-07 Ramana Radhakrishnan > Andre Vieira > > * config/aarch64/aarch64.md (*condjump): Handle functions > 1 > Mib. > (*cb1): Idem. > (*tb1): Idem. > (*cb1): Idem. > * config/aarch64/iterators.md (inv_cb): New code attribute. > (inv_tb): Idem. > * config/aarch64/aarch64.c (aarch64_gen_far_branch): New. > * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New. > > gcc/testsuite/ChangeLog: > 2015-08-07 Andre Vieira > > * gcc.target/aarch64/long-branch.c: New test. Just a few comments about the testcase. You could improve the size (on disk) of the testcase by using the preprocessor some more: Something like: #define CASE_ENTRY2 (x) CASE_ENTRY ((x)) CASE_ENTRY ((x)+1) #define CASE_ENTRY4 (x) CASE_ENTRY2 ((x)) CASE_ENTRY2 ((x)+2+1) #define CASE_ENTRY8 (x) CASE_ENTRY4 ((x)) CASE_ENTRY4 ((x)+4+1) #define CASE_ENTRY16 (x) CASE_ENTRY8 ((x)) CASE_ENTRY8 ((x)+8+1) #define CASE_ENTRY32 (x) CASE_ENTRY16 ((x)) CASE_ENTRY16 ((x)+16) #define CASE_ENTRY64 (x) CASE_ENTRY32 ((x)) CASE_ENTRY32 ((x)+32+1) #define CASE_ENTRY128 (x) CASE_ENTRY64 ((x)) CASE_ENTRY16 ((x)+64+1) #define CASE_ENTRY256 (x) CASE_ENTRY128 ((x)) CASE_ENTRY128 ((x)+128+1) And then use CASE_ENTRY256 (1) You can do the same trick to reduce the size of CASE_ENTRY too. Thanks, Andrew Pinski
Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB
On Tue, Aug 25, 2015 at 5:50 PM, Andrew Pinski wrote: > On Tue, Aug 25, 2015 at 5:37 PM, Andre Vieira > wrote: >> Conditional branches have a maximum range of [-1048576, 1048572]. Any >> destination further away can not be reached by these. >> To be able to have conditional branches in very large functions, we invert >> the condition and change the destination to jump over an unconditional >> branch to the original, far away, destination. >> >> gcc/ChangeLog: >> 2015-08-07 Ramana Radhakrishnan >> Andre Vieira >> >> * config/aarch64/aarch64.md (*condjump): Handle functions > 1 >> Mib. >> (*cb1): Idem. >> (*tb1): Idem. >> (*cb1): Idem. >> * config/aarch64/iterators.md (inv_cb): New code attribute. >> (inv_tb): Idem. >> * config/aarch64/aarch64.c (aarch64_gen_far_branch): New. >> * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New. >> >> gcc/testsuite/ChangeLog: >> 2015-08-07 Andre Vieira >> >> * gcc.target/aarch64/long-branch.c: New test. > > Just a few comments about the testcase. You could improve the size > (on disk) of the testcase by using the preprocessor some more: > Something like: > #define CASE_ENTRY2 (x) CASE_ENTRY ((x)) CASE_ENTRY ((x)+1) > #define CASE_ENTRY4 (x) CASE_ENTRY2 ((x)) CASE_ENTRY2 ((x)+2+1) > #define CASE_ENTRY8 (x) CASE_ENTRY4 ((x)) CASE_ENTRY4 ((x)+4+1) > #define CASE_ENTRY16 (x) CASE_ENTRY8 ((x)) CASE_ENTRY8 ((x)+8+1) > #define CASE_ENTRY32 (x) CASE_ENTRY16 ((x)) CASE_ENTRY16 ((x)+16) > #define CASE_ENTRY64 (x) CASE_ENTRY32 ((x)) CASE_ENTRY32 ((x)+32+1) > #define CASE_ENTRY128 (x) CASE_ENTRY64 ((x)) CASE_ENTRY16 ((x)+64+1) > #define CASE_ENTRY256 (x) CASE_ENTRY128 ((x)) CASE_ENTRY128 ((x)+128+1) I do have an off by one error but you should get the idea. Basically instead of 200 lines, we only have 9 lines (log2(256) == 8). Thanks, Andrew > > And then use > CASE_ENTRY256 (1) > > You can do the same trick to reduce the size of CASE_ENTRY too. > > Thanks, > Andrew Pinski
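For the record, here is a corrected version of the doubling scheme (the sketch above has the admitted off-by-one, and the space before (x) would make each definition an object-like macro rather than a function-like one). CASE_ENTRY is a made-up base macro standing in for whatever the testcase emits per value:

/* Hypothetical base macro: one switch entry per value.  */
#define CASE_ENTRY(x)    case (x): return 2 * (x);

/* Each level expands to twice as many contiguous entries.  */
#define CASE_ENTRY2(x)   CASE_ENTRY (x)    CASE_ENTRY ((x) + 1)
#define CASE_ENTRY4(x)   CASE_ENTRY2 (x)   CASE_ENTRY2 ((x) + 2)
#define CASE_ENTRY8(x)   CASE_ENTRY4 (x)   CASE_ENTRY4 ((x) + 4)
#define CASE_ENTRY16(x)  CASE_ENTRY8 (x)   CASE_ENTRY8 ((x) + 8)
#define CASE_ENTRY32(x)  CASE_ENTRY16 (x)  CASE_ENTRY16 ((x) + 16)
#define CASE_ENTRY64(x)  CASE_ENTRY32 (x)  CASE_ENTRY32 ((x) + 32)
#define CASE_ENTRY128(x) CASE_ENTRY64 (x)  CASE_ENTRY64 ((x) + 64)
#define CASE_ENTRY256(x) CASE_ENTRY128 (x) CASE_ENTRY128 ((x) + 128)

int
dispatch (int i)
{
  switch (i)
    {
      CASE_ENTRY256 (1)  /* expands to the 256 cases 1..256 */
    }
  return -1;
}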
[PATCH, PR other/67320] Fix wide add standard names
The standard names for signed and unsigned vector wide adds are wrong in the documentation. OK for trunk? 2015-08-25 Michael Collison PR other/67320 * doc/md.texi: Rename [su]sum_widen to widen_[su]sum to reflect correct standard names diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 0bffdc6..619259f 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4946,10 +4946,10 @@ is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the absolute difference. The result is placed in operand 0, which is of the same mode as operand 3. -@cindex @code{ssum_widen@var{m3}} instruction pattern -@item @samp{ssum_widen@var{m3}} -@cindex @code{usum_widen@var{m3}} instruction pattern -@itemx @samp{usum_widen@var{m3}} +@cindex @code{widen_ssum@var{m3}} instruction pattern +@item @samp{widen_ssum@var{m3}} +@cindex @code{widen_usum@var{m3}} instruction pattern +@itemx @samp{widen_usum@var{m3}} Operands 0 and 2 are of the same mode, which is wider than the mode of operand 1. Add operand 1 to operand 2 and place the widened result in operand 0. (This is used express accumulation of elements into an accumulator -- Michael Collison Linaro Toolchain Working Group michael.colli...@linaro.org
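As a reader's aid, this is the kind of source the vectorizer maps onto these patterns — the accumulator (operands 0 and 2) is wider than the summed elements (operand 1). Illustrative C only; whether a widening-sum instruction is actually used depends on the target providing a widen_ssum pattern:

/* Widening signed sum: 16-bit elements accumulated into 32 bits.  */
int
widening_sum (const short *a, int n)
{
  int acc = 0;                 /* wide mode: operands 0 and 2 */
  for (int i = 0; i < n; i++)
    acc += a[i];               /* narrow mode: operand 1 */
  return acc;
}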
Re: [AArch64][TLSLE][1/3] Add the option "-mtls-size" for AArch64
On 19 August 2015 at 15:26, Jiong Wang wrote: > 2015-08-19 Jiong Wang > > gcc/ > * config/aarch64/aarch64.opt (mtls-size): New entry. > * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function. > (aarch64_override_options_internal): Call initialize_aarch64_tls_size. > * doc/invoke.texi (AArch64 Options): Document -mtls-size. > > -- > Regards, > Jiong > +case AARCH64_CMODEL_TINY: + /* The maximum TLS size allowed under tiny is 1M. */ + if (aarch64_tls_size > 20) + aarch64_tls_size = 20; The only valid values of aarch64_tls_size handled/expected by the remainder of the patch set are 12, 24, 32 and 48, so setting the value to 20 here doesn't make sense. /Marcus
[PATCH] Fix PR67306
The following fixes ICEs due to the genmatch generated code for GENERIC not verifying if builtin_decl_implicit returns non-NULL. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-08-25 Richard Biener PR middle-end/67306 * genmatch.c (expr::gen_transform): Verify the result of builtin_decl_implicit. (dt_simplify::gen_1): Likewise. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 227058) +++ gcc/genmatch.c (working copy) @@ -2177,11 +2216,19 @@ expr::gen_transform (FILE *f, int indent fprintf_indent (f, indent, "res = fold_build%d_loc (loc, %s, %s", ops.length(), opr_name, type); else - fprintf_indent (f, indent, "res = build_call_expr_loc (loc, " - "builtin_decl_implicit (%s), %d", opr_name, ops.length()); + { + fprintf_indent (f, indent, "{\n"); + fprintf_indent (f, indent, " tree decl = builtin_decl_implicit (%s);\n", + opr_name); + fprintf_indent (f, indent, " if (!decl) return NULL_TREE;\n"); + fprintf_indent (f, indent, " res = build_call_expr_loc (loc, " + "decl, %d", ops.length()); + } for (unsigned i = 0; i < ops.length (); ++i) fprintf (f, ", ops%d[%u]", depth, i); fprintf (f, ");\n"); + if (opr->kind != id_base::CODE) + fprintf_indent (f, indent, "}\n"); if (*opr == CONVERT_EXPR) { indent -= 2; @@ -3069,13 +3147,24 @@ dt_simplify::gen_1 (FILE *f, int indent, *e->operation == CONVERT_EXPR ? "NOP_EXPR" : e->operation->id); else - fprintf_indent (f, indent, - "res = build_call_expr_loc " - "(loc, builtin_decl_implicit (%s), %d", - e->operation->id, e->ops.length()); + { + fprintf_indent (f, indent, + "{\n"); + fprintf_indent (f, indent, + " tree decl = builtin_decl_implicit (%s);\n", + e->operation->id); + fprintf_indent (f, indent, + " if (!decl) return NULL_TREE;\n"); + fprintf_indent (f, indent, + " res = build_call_expr_loc " + "(loc, decl, %d", + e->ops.length()); + } for (unsigned j = 0; j < e->ops.length (); ++j) fprintf (f, ", res_op%d", j); fprintf (f, ");\n"); + if (!is_a (opr)) + fprintf_indent (f, indent, "}\n"); } } }
Re: [PATCH 12/15][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix
James Greenhalgh wrote: >> >> - VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) >> + VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf) > > Should this not use the appropriate "BUILTIN_..." iterator? Indeed; BUILTIN_VQ_HSF it is. >>VAR1 (BINOP, float_truncate_hi_, 0, v4sf) >>VAR1 (BINOP, float_truncate_hi_, 0, v8hf) I could also use BUILTIN_VQ_HSF here (these two were added in a previous patch, before the VQ_HSF iterator was introduced). However, that goes against the principle that we should use the same iterator as the pattern (the pattern uses the attribute of the VDF iterator), so I'm not sure whether that would be preferable (i.e. as a separate patch)? >> - VAR1 (UNOP, float_extend_lo_, 0, v2df) >> + VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf) > > Likewise. Similarly, the required iterator does not exist, as float_extend_lo_ is named after the attribute of the VDF iterator. The nearest equivalents I can see use two VAR1's rather than a VAR2, so I've updated the patch to do that too. OK with those two changes? (patch attached and bootstrapped+check-gcc on aarch64-none-linux-gnu) Thanks, Alan --- gcc/config/aarch64/aarch64-simd-builtins.def | 3 +- gcc/config/aarch64/aarch64-simd.md | 63 ++-- gcc/config/aarch64/arm_neon.h| 16 +-- gcc/config/aarch64/iterators.md | 18 +--- 4 files changed, 69 insertions(+), 31 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index c5b46aa..2c13cfb 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -361,11 +361,12 @@ BUILTIN_VSDQ_I_DI (UNOP, abs, 0) BUILTIN_VDQF (UNOP, abs, 2) - VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) + BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10) VAR1 (BINOP, float_truncate_hi_, 0, v4sf) VAR1 (BINOP, float_truncate_hi_, 0, v8hf) VAR1 (UNOP, float_extend_lo_, 0, v2df) + VAR1 (UNOP, float_extend_lo_, 0, v4sf) BUILTIN_VDF (UNOP, float_truncate_lo_, 0) /* Implemented by aarch64_ld1. */ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f8754cd..160acf9 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1691,36 +1691,57 @@ ;; Float widening operations. -(define_insn "vec_unpacks_lo_v4sf" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (vec_select:V2SF - (match_operand:V4SF 1 "register_operand" "w") - (parallel [(const_int 0) (const_int 1)]) - )))] +(define_insn "aarch64_simd_vec_unpacks_lo_" + [(set (match_operand: 0 "register_operand" "=w") +(float_extend: (vec_select: + (match_operand:VQ_HSF 1 "register_operand" "w") + (match_operand:VQ_HSF 2 "vect_par_cnst_lo_half" "") + )))] "TARGET_SIMD" - "fcvtl\\t%0.2d, %1.2s" + "fcvtl\\t%0., %1." 
[(set_attr "type" "neon_fp_cvt_widen_s")] ) -(define_insn "aarch64_float_extend_lo_v2df" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (match_operand:V2SF 1 "register_operand" "w")))] +(define_expand "vec_unpacks_lo_" + [(match_operand: 0 "register_operand" "") + (match_operand:VQ_HSF 1 "register_operand" "")] "TARGET_SIMD" - "fcvtl\\t%0.2d, %1.2s" + { +rtx p = aarch64_simd_vect_par_cnst_half (mode, false); +emit_insn (gen_aarch64_simd_vec_unpacks_lo_ (operands[0], + operands[1], p)); +DONE; + } +) + +(define_insn "aarch64_simd_vec_unpacks_hi_" + [(set (match_operand: 0 "register_operand" "=w") +(float_extend: (vec_select: + (match_operand:VQ_HSF 1 "register_operand" "w") + (match_operand:VQ_HSF 2 "vect_par_cnst_hi_half" "") + )))] + "TARGET_SIMD" + "fcvtl2\\t%0., %1." [(set_attr "type" "neon_fp_cvt_widen_s")] ) -(define_insn "vec_unpacks_hi_v4sf" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (vec_select:V2SF - (match_operand:V4SF 1 "register_operand" "w") - (parallel [(const_int 2) (const_int 3)]) - )))] +(define_expand "vec_unpacks_hi_" + [(match_operand: 0 "register_operand" "") + (match_operand:VQ_HSF 1 "register_operand" "")] + "TARGET_SIMD" + { +rtx p = aarch64_simd_vect_par_cnst_half (mode, true); +emit_insn (gen_aarch64_simd_vec_unpacks_lo_ (operands[0], + operands[1], p)); +DONE; + } +) +(define_insn "aarch64_float_extend_lo_" + [(set (match_operand: 0 "register_operand" "=w") + (float_extend: + (match_operand:VDF 1 "register_operand" "w")))] "TARGET_SIMD" - "fcvtl2\\t%0.2d, %1.4s" + "fcvtl\\t%0, %1" [(set_attr "type" "neon_fp_c
[PATCH 0/5][tree-sra.c] PR/63679 Make SRA replace constant pool loads
ssa-dom-cse-2.c fails on a number of platforms because the input array is pushed out to the constant pool, preventing later stages from folding away the entire computation. This patch series fixes the failure by extending SRA to pull the constants back in. This is my first patch(set) to SRA and as such I'd appreciate suggestions about the approach. I think the first two patches, which essentially just extend SRA to deal with ARRAY_TYPE as well as RECORD_TYPE, are fairly straightforward and may stand alone. Later patches, in particular, may be better done in a different way and I'd welcome feedback as to what a patch (series) should look like. Also the heuristic for controlling SRA, when dealing with constant-pool loads, may want something better/other than the default --param sra-max-scalarization-size-O{speed,size}, or else platforms where the initializer is forced to memory will still suffer in terms of constant propagation.
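For context, the testcase is essentially of the following shape (a paraphrase, not the exact testsuite text). After complete unrolling everything is constant, so DOM should fold the function to "return 28;"; on targets that instead materialize the initializer as a constant-pool copy, the loads survive and the folding fails:

int
foo (void)
{
  int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };  /* may become a .rodata copy */
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += a[i];
  return sum;  /* 0 + 1 + ... + 7 == 28 */
}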
[RFC 4/5] Handle constant-pool entries
This makes SRA replace loads of records/arrays from constant pool entries, with elementwise assignments of the constant values, hence, overcoming the fundamental problem in PR/63679. As a first pass, the approach I took was to look for constant-pool loads as we scanned through other accesses, and add them as candidates there; to build a constant replacement_decl for any such accesses in completely_scalarize; and to use any existing replacement_decl rather than creating a variable in create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but that does not allow addresses of labels, which can still end up in the constant pool.) Feedback as to the approach or how it might be better structured / fitted into SRA, is solicited ;). Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly increases the number of testcases in which this code is exercised! Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, and sh. gcc/ChangeLog: * tree-sra.c (create_access): Scan for uses of constant pool and add to candidates. (subst_initial): New. (scalarize_elem): Build replacement_decl using subst_initial. (create_access_replacement): Use replacement_decl if set. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param sra-max-scalarization-size-Ospeed. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 7 +--- gcc/tree-sra.c| 56 +-- 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c index 9eccdc9..b13d583 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */ +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */ int foo () @@ -17,7 +17,4 @@ foo () /* After late unrolling the above loop completely DOM should be able to optimize this to return 28. */ -/* See PR63679 and PR64159, if the target forces the initializer to memory then - DOM is not able to perform this optimization. */ - -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */ +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */ diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index af35fcc..a3ff2df 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write) else ptr = false; + /* FORNOW: scan for uses of constant pool as we go along. 
*/ + if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base) + && !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) +{ + gcc_assert (!write); + bitmap_set_bit (candidate_bitmap, DECL_UID (base)); + tree_node **slot = candidates->find_slot_with_hash (base, DECL_UID (base), + INSERT); + *slot = base; +} + if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) } } +static tree +subst_initial (tree expr, tree var) +{ + if (TREE_CODE (expr) == VAR_DECL) +{ + gcc_assert (DECL_IN_CONSTANT_POOL (expr)); + gcc_assert (expr == var); + return DECL_INITIAL (expr); +} + if (TREE_CODE (expr) == COMPONENT_REF) +{ + gcc_assert (TREE_CODE (TREE_OPERAND (expr, 1)) == FIELD_DECL); + gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE); + return fold_build3 (COMPONENT_REF, TREE_TYPE (expr), + subst_initial (TREE_OPERAND (expr, 0), var), + TREE_OPERAND (expr, 1), + NULL_TREE); +} + if (TREE_CODE (expr) == ARRAY_REF) +{ + gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE); + gcc_assert (TREE_OPERAND (expr, 3) == NULL_TREE); + return fold (build4 (ARRAY_REF, TREE_TYPE (expr), + subst_initial (TREE_OPERAND (expr, 0), var), + TREE_OPERAND (expr, 1), + NULL_TREE, + NULL_TREE)); +} + gcc_unreachable (); +} + static void scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, tree ref, tree type) @@ -1033,6 +1075,9 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, { struct access *access = create_access_1 (base, pos, size);
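To make subst_initial concrete, consider a hypothetical function whose aggregate initializer the target spills to the constant pool. Element loads then reference the pool entry, and subst_initial folds each one through DECL_INITIAL by walking the COMPONENT_REF/ARRAY_REF chain, leaving plain constants behind:

struct s { int x[2]; int y; };

int
first (void)
{
  /* Assume the target emits this initializer as a constant-pool copy.  */
  struct s v = { { 40, 2 }, 7 };
  /* v.x[1] and v.y scalarize to the constants 2 and 7, so the whole
     function can fold to 9.  */
  return v.x[1] + v.y;
}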
[RFC 5/5] Always completely replace constant pool entries
I used this as a means of better testing the previous changes, as it exercises the constant replacement code a whole lot more. Indeed, quite a few tests are now optimized away to nothing on AArch64... Always pulling in constants is almost certainly not what we want, but we may nonetheless want something more aggressive than the usual --param, e.g. for the ssa-dom-cse-2.c test. Thoughts welcomed? Thanks, Alan gcc/ChangeLog: * tree-sra.c (analyze_all_variable_accesses): Bypass size limit for constant-pool accesses. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove --param sra-max-scalarization-size-Ospeed. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 2 +- gcc/tree-sra.c| 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c index b13d583..370b785 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */ +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */ int foo diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index a3ff2df..2a741b8 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -2651,7 +2651,8 @@ analyze_all_variable_accesses (void) && scalarizable_type_p (TREE_TYPE (var))) { if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) - <= max_scalarization_size) + <= max_scalarization_size + || DECL_IN_CONSTANT_POOL (var)) { create_total_scalarization_access (var); completely_scalarize (var, TREE_TYPE (var), 0, var); -- 1.8.3
[PATCH 2/5] completely_scalarize arrays as well as records
This changes the completely_scalarize_record path to also work on arrays (thus allowing records containing arrays, etc.). This just required extending the existing type_consists_of_records_p and completely_scalarize_record methods to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both methods so as not to mention 'record'. Bootstrapped + check-gcc on aarch64-none-linux-gnu, arm-none-linux-gnueabihf and x86_64-none-linux-gnu. Have also verified the scan-tree-dump check in the new sra-15.c passes (using a stage 1 compiler only, no execution test) on alpha, hppa, powerpc, sparc, avr and sh. gcc/ChangeLog: * tree-sra.c (type_consists_of_records_p): Rename to... (scalarizable_type_p): ...this, add case for ARRAY_TYPE. (completely_scalarize_record): Rename to... (completely_scalarize): ...this, add ARRAY_TYPE case, move some code to: (scalarize_elem): New. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/sra-15.c: New. --- gcc/testsuite/gcc.dg/tree-ssa/sra-15.c | 38 + gcc/tree-sra.c | 146 ++--- 2 files changed, 135 insertions(+), 49 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c new file mode 100644 index 000..e251058 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c @@ -0,0 +1,38 @@ +/* Verify that SRA total scalarization works on records containing arrays. */ +/* Test skipped for targets with small (often default) MOVE_RATIO. */ +/* { dg-do run } */ +/* { dg-options "-O1 -fdump-tree-release_ssa --param sra-max-scalarization-size-Ospeed=32" } */ + +extern void abort (void); + +struct S +{ + char c; + unsigned short f[2][2]; + int i; + unsigned short f3, f4; +}; + + +int __attribute__ ((noinline)) +foo (struct S *p) +{ + struct S l; + + l = *p; + l.i++; + l.f[1][0] += 3; + *p = l; +} + +int +main (int argc, char **argv) +{ + struct S a = {0, { {5, 7}, {9, 11} }, 4, 0, 0}; + foo (&a); + if (a.i != 5 || a.f[1][0] != 12) +abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */ diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index a0c92b0..08fa8dc 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -915,74 +915,122 @@ create_access (tree expr, gimple stmt, bool write) } -/* Return true iff TYPE is a RECORD_TYPE with fields that are either of gimple - register types or (recursively) records with only these two kinds of fields. - It also returns false if any of these records contains a bit-field. */ +/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or ARRAY_TYPE with + fields that are either of gimple register types (excluding bit-fields) + or (recursively) scalarizable types. 
*/ static bool -type_consists_of_records_p (tree type) +scalarizable_type_p (tree type) { - tree fld; + gcc_assert (!is_gimple_reg_type (type)); - if (TREE_CODE (type) != RECORD_TYPE) -return false; + switch (TREE_CODE (type)) + { + case RECORD_TYPE: +for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld)) + if (TREE_CODE (fld) == FIELD_DECL) + { + tree ft = TREE_TYPE (fld); - for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld)) -if (TREE_CODE (fld) == FIELD_DECL) - { - tree ft = TREE_TYPE (fld); + if (DECL_BIT_FIELD (fld)) + return false; - if (DECL_BIT_FIELD (fld)) - return false; + if (!is_gimple_reg_type (ft) + && !scalarizable_type_p (ft)) + return false; + } - if (!is_gimple_reg_type (ft) - && !type_consists_of_records_p (ft)) - return false; - } +return true; - return true; + case ARRAY_TYPE: +{ + tree elem = TREE_TYPE (type); + if (DECL_P (elem) && DECL_BIT_FIELD (elem)) + return false; + if (!is_gimple_reg_type (elem) +&& !scalarizable_type_p (elem)) + return false; + return true; +} + default: +return false; + } } -/* Create total_scalarization accesses for all scalar type fields in DECL that - must be of a RECORD_TYPE conforming to type_consists_of_records_p. BASE - must be the top-most VAR_DECL representing the variable, OFFSET must be the - offset of DECL within BASE. REF must be the memory reference expression for - the given decl. */ +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree); + +/* Create total_scalarization accesses for all scalar fields of a member + of type DECL_TYPE conforming to scalarizable_type_p. BASE + must be the top-most VAR_DECL representing the variable; within that, + OFFSET locates the member and REF must be the memory reference expression for + the member. */ static void -completely_scalarize_record (tree base, tree decl, HOST_WIDE_INT offset, -tree
[hsa] Fixes in gen_hsa_{unary,binary}_operation
Hi, the patch below fixes two minor issues with the new functions gen_hsa_unary_operation and gen_hsa_binary_operation. First, they should add new instructions to the use lists of pseudoregisters. Second, the type of the opcode should really be int. (It was BrigType16_t; probably BrigOpcode16_t was intended, but in our own representation we use negative values for instructions which are not actual HSA instructions, such as PHI nodes.) Committed to the HSA branch. Martin 2015-08-25 Martin Jambor * hsa-gen.c (gen_hsa_unary_operation): Use int for the opcode. Add instruction to uses of register operands. (gen_hsa_binary_operation): Likewise. (gen_hsa_insns_for_operation_assignment): Use int for opcodes passed to gen_hsa_binary_operation. --- gcc/ChangeLog.hsa | 8 gcc/hsa-gen.c | 16 ++-- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index bdb9cec..14cf890 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -2250,7 +2250,7 @@ gen_hsa_cmp_insn_from_gimple (enum tree_code code, tree lhs, tree rhs, as a single operand. */ static void -gen_hsa_unary_operation (BrigType16_t opcode, hsa_op_reg *dest, +gen_hsa_unary_operation (int opcode, hsa_op_reg *dest, hsa_op_base *op1, hsa_bb *hbb) { gcc_checking_assert (dest); @@ -2266,6 +2266,8 @@ gen_hsa_unary_operation (BrigType16_t opcode, hsa_op_reg *dest, } dest->set_definition (insn); + if (hsa_op_reg *reg = dyn_cast (op1)) +reg->uses.safe_push (insn); hbb->append_insn (insn); } @@ -2274,7 +2276,7 @@ gen_hsa_unary_operation (BrigType16_t opcode, hsa_op_reg *dest, and OP2. */ static void -gen_hsa_binary_operation (BrigType16_t opcode, hsa_op_reg *dest, +gen_hsa_binary_operation (int opcode, hsa_op_reg *dest, hsa_op_base *op1, hsa_op_base *op2, hsa_bb *hbb) { gcc_checking_assert (dest); @@ -2290,6 +2292,10 @@ gen_hsa_binary_operation (BrigType16_t opcode, hsa_op_reg *dest, hsa_insn_basic *insn = new hsa_insn_basic (3, opcode, dest->type, dest, op1, op2); dest->set_definition (insn); + if (hsa_op_reg *reg = dyn_cast (op1)) +reg->uses.safe_push (insn); + if (hsa_op_reg *reg = dyn_cast (op2)) +reg->uses.safe_push (insn); hbb->append_insn (insn); } @@ -2387,10 +2393,8 @@ gen_hsa_insns_for_operation_assignment (gimple assign, hsa_bb *hbb, case RROTATE_EXPR: { hsa_insn_basic *insn = NULL; - BrigType16_t code1 = code == LROTATE_EXPR - ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR; - BrigType16_t code2 = code != LROTATE_EXPR - ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR; + int code1 = code == LROTATE_EXPR ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR; + int code2 = code != LROTATE_EXPR ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR; BrigType16_t btype = hsa_type_for_scalar_tree_type (TREE_TYPE (lhs), true); -- 2.4.6
[hsa] Support unary FP operations implementable with a single HSA instruction
Hi, the patch below adds support for a few unary floating point builtins that can be implemented with a single HSA instruction. More effort in the area of builtins is needed; the motivation for this was a benchmark that previously failed with a sorry message. Committed to the hsa branch. Martin 2015-08-25 Martin Jambor * hsa-gen.c (gen_hsa_unaryop_for_builtin): New function. (gen_hsa_insns_for_call): Add support for a few unary fp operations. --- gcc/ChangeLog.hsa | 4 +++ gcc/hsa-gen.c | 86 +++ 2 files changed, 78 insertions(+), 12 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index 14cf890..1e23996 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -3213,6 +3213,26 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call) hsa_cfun->kernel_dispatch_count++; } +/* Helper functions to create a single unary HSA operations out of calls to + builtins. OPCODE is the HSA operation to be generated. STMT is a gimple + call to a builtin. HBB is the HSA BB to which the instruction should be + added and SSA_MAP is used to map gimple SSA names to HSA pseudoreisters. */ + +static void +gen_hsa_unaryop_for_builtin (int opcode, gimple stmt, hsa_bb *hbb, +vec *ssa_map) +{ + tree lhs = gimple_call_lhs (stmt); + /* FIXME: Since calls without a LHS are not removed, double check that + they cannot have side effects. */ + if (!lhs) +return; + hsa_op_reg *dest = hsa_reg_for_gimple_ssa (lhs, ssa_map); + hsa_op_base *op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 0), + hbb, ssa_map, NULL); + gen_hsa_unary_operation (opcode, dest, op, hbb); +} + /* Generate HSA instructions for the given call statement STMT. Instructions will be appended to HBB. SSA_MAP maps gimple SSA names to HSA pseudo registers. */ @@ -3284,22 +3304,64 @@ specialop: break; } +case BUILT_IN_FABS: +case BUILT_IN_FABSF: + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_ABS, stmt, hbb, ssa_map); + break; + +case BUILT_IN_CEIL: +case BUILT_IN_CEILF: + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_CEIL, stmt, hbb, ssa_map); + break; + +case BUILT_IN_FLOOR: +case BUILT_IN_FLOORF: + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_FLOOR, stmt, hbb, ssa_map); + break; + +case BUILT_IN_RINT: +case BUILT_IN_RINTF: + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_RINT, stmt, hbb, ssa_map); + break; + case BUILT_IN_SQRT: case BUILT_IN_SQRTF: - /* FIXME: Since calls without a LHS are not removed, double check that -they cannot have side effects. */ - if (!lhs) - return; - dest = hsa_reg_for_gimple_ssa (lhs, ssa_map); - insn = new hsa_insn_basic (2, BRIG_OPCODE_SQRT, dest->type); - insn->operands[0] = dest; - dest->set_definition (insn); - insn->operands[1] - = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 0), - hbb, ssa_map, insn); - hbb->append_insn (insn); + /* TODO: Perhaps produce BRIG_OPCODE_NSQRT with -ffast-math? */ + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb, ssa_map); + break; + +case BUILT_IN_TRUNC: +case BUILT_IN_TRUNCF: + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_TRUNC, stmt, hbb, ssa_map); break; +case BUILT_IN_COS: +case BUILT_IN_COSF: + /* FIXME: Using the native instruction may not be precise enough. +Perhaps only allow if using -ffast-math? */ + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NCOS, stmt, hbb, ssa_map); + break; + +case BUILT_IN_EXP2: +case BUILT_IN_EXP2F: + /* FIXME: Using the native instruction may not be precise enough. +Perhaps only allow if using -ffast-math? 
*/ + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NEXP2, stmt, hbb, ssa_map); + break; + +case BUILT_IN_LOG2: +case BUILT_IN_LOG2F: + /* FIXME: Using the native instruction may not be precise enough. +Perhaps only allow if using -ffast-math? */ + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NLOG2, stmt, hbb, ssa_map); + break; + +case BUILT_IN_SIN: +case BUILT_IN_SINF: + /* FIXME: Using the native instruction may not be precise enough. +Perhaps only allow if using -ffast-math? */ + gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb, ssa_map); + case BUILT_IN_ATOMIC_LOAD_1: case BUILT_IN_ATOMIC_LOAD_2: case BUILT_IN_ATOMIC_LOAD_4: -- 2.4.6
[PATCH 1/5] Refactor completely_scalarize_var
This is a small refactoring/renaming patch, it just moves the call to "completely_scalarize_record" out from completely_scalarize_var, and renames the latter to create_total_scalarization_access. This is because the next patch needs to drop the "_record" suffix and I felt it would be confusing to have both completely_scalarize and completely_scalarize_var. However, it also makes the new function name (create_total_scalarization_access) consistent with the existing code & comment. Bootstrapped + check-gcc on x86_64. gcc/ChangeLog: * tree-sra.c (completely_scalarize_var): Rename to... (create_total_scalarization_access): ... Here. Drop call to completely_scalarize_record. (analyze_all_variable_accesses): Replace completely_scalarize_var with create_total_scalarization_access and completely_scalarize_record. --- gcc/tree-sra.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 818c290..a0c92b0 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -985,7 +985,7 @@ completely_scalarize_record (tree base, tree decl, HOST_WIDE_INT offset, type_consists_of_records_p. */ static void -completely_scalarize_var (tree var) +create_total_scalarization_access (tree var) { HOST_WIDE_INT size = tree_to_uhwi (DECL_SIZE (var)); struct access *access; @@ -994,8 +994,6 @@ completely_scalarize_var (tree var) access->expr = var; access->type = TREE_TYPE (var); access->grp_total_scalarization = 1; - - completely_scalarize_record (var, var, 0, var); } /* Return true if REF has an VIEW_CONVERT_EXPR somewhere in it. */ @@ -2529,7 +2527,8 @@ analyze_all_variable_accesses (void) if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) <= max_scalarization_size) { - completely_scalarize_var (var); + create_total_scalarization_access (var); + completely_scalarize_record (var, var, 0, var); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Will attempt to totally scalarize "); -- 1.8.3
[PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.
When SRA completely scalarizes an array, this patch changes the generated accesses from e.g. MEM[(int[8] *)&a + 4B] = 1; to a[1] = 1; This overcomes a limitation in dom2, that differently-spelled accesses to equivalent chunks of memory, e.g. MEM[(int[8] *)&a + 4B] and a[1], are not hashable_expr_equal_p. This is necessary for constant propagation in the ssa-dom-cse-2.c testcase (after the next patch that makes SRA handle constant-pool loads). I tried to work around this by making dom2's hashable_expr_equal_p less conservative, but found that on platforms without AArch64's vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC, mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8] *)&a] equivalent to a[0], etc.; a complete overhaul of hashable_expr_equal_p seems like a larger task than this patch series. I can't see how to write a testcase for this in C though as direct assignment to an array is not possible; such assignments occur only with constant pool data, which is dealt with in the next patch. Bootstrap + check-gcc on x86-none-linux-gnu, arm-none-linux-gnueabihf, aarch64-none-linux-gnu. gcc/ChangeLog: * tree-sra.c (completely_scalarize): Move some code into: (get_elem_size): New. (build_ref_for_offset): Build ARRAY_REF if base is aligned array. --- gcc/tree-sra.c | 110 - 1 file changed, 69 insertions(+), 41 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 08fa8dc..af35fcc 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -957,6 +957,20 @@ scalarizable_type_p (tree type) } } +static bool +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out) +{ + gcc_assert (TREE_CODE (type) == ARRAY_TYPE); + tree t_size = TYPE_SIZE (TREE_TYPE (type)); + if (!t_size || !tree_fits_uhwi_p (t_size)) +return false; + unsigned HOST_WIDE_INT sz = tree_to_uhwi (t_size); + if (!sz) +return false; + *sz_out = sz; + return true; +} + static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree); /* Create total_scalarization accesses for all scalar fields of a member @@ -985,10 +999,9 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) case ARRAY_TYPE: { tree elemtype = TREE_TYPE (decl_type); - tree elem_size = TYPE_SIZE (elemtype); - gcc_assert (elem_size && tree_fits_uhwi_p (elem_size)); - int el_size = tree_to_uhwi (elem_size); - gcc_assert (el_size); + unsigned HOST_WIDE_INT el_size; + if (!get_elem_size (decl_type, &el_size)) + gcc_assert (false); tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type)); tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type)); @@ -1563,7 +1576,7 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset, tree off; tree mem_ref; HOST_WIDE_INT base_offset; - unsigned HOST_WIDE_INT misalign; + unsigned HOST_WIDE_INT misalign, el_sz; unsigned int align; gcc_checking_assert (offset % BITS_PER_UNIT == 0); @@ -1572,47 +1585,62 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset, /* get_addr_base_and_unit_offset returns NULL for references with a variable offset such as array[var_index]. 
*/ - if (!base) -{ - gassign *stmt; - tree tmp, addr; - - gcc_checking_assert (gsi); - tmp = make_ssa_name (build_pointer_type (TREE_TYPE (prev_base))); - addr = build_fold_addr_expr (unshare_expr (prev_base)); - STRIP_USELESS_TYPE_CONVERSION (addr); - stmt = gimple_build_assign (tmp, addr); - gimple_set_location (stmt, loc); - if (insert_after) - gsi_insert_after (gsi, stmt, GSI_NEW_STMT); - else - gsi_insert_before (gsi, stmt, GSI_SAME_STMT); - - off = build_int_cst (reference_alias_ptr_type (prev_base), - offset / BITS_PER_UNIT); - base = tmp; -} - else if (TREE_CODE (base) == MEM_REF) -{ - off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), - base_offset + offset / BITS_PER_UNIT); - off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off); - base = unshare_expr (TREE_OPERAND (base, 0)); + if (base + && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE + && misalign == 0 + && get_elem_size (TREE_TYPE (base), &el_sz) + && ((offset % el_sz) == 0) + && useless_type_conversion_p (exp_type, TREE_TYPE (TREE_TYPE (base))) + && (align >= TYPE_ALIGN (exp_type))) +{ + tree idx = build_int_cst (TYPE_DOMAIN (TREE_TYPE (base)), offset / el_sz); + base = unshare_expr (base); + mem_ref = build4 (ARRAY_REF, exp_type, base, idx, NULL_TREE, NULL_TREE); } else { - off = build_int_cst (reference_alias_ptr_type (base), - base_offset + offset / BITS_PER_UNIT); - base = build_fold_addr_expr (unshare_expr (base)); -
Re: [PATCH 0/15][ARM/AArch64] Add support for float16_t vectors (v3)
Alan Lawrence wrote: All AArch64 patches are unchanged from previous version. However, in response to discussion, the ARM patches are changed (much as I suggested https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02249.html); this version: * Hides the existing vcvt_f16_f32 and vcvt_f32_f16 intrinsics, and float16x4_t type, unless we have a scalar __FP16 type (i.e. unless -mfp16-format=ieee/alternative is specified on the command line). Although this loses us the ability to write code that uses hardware instructions to work with either IEEE or Alternative formats according to the FPSCR bit, it is consistent with ACLE statements that the vector types (float16x4_t and float16x8_t) should only be available if the scalar type is, and that if the scalar type is available, then one or other of __ARM_FP16_FORMAT_IEEE or __ARM_FP16_FORMAT_ALTERNATIVE should be set. (Straightforward interpretation of ACLE can be confusing because GCC has made the choice of supporting the __FP16 type even when hardware is not available, via software conversion routines - the -mfp16-format flag then picking which set of sw routines are in use.) * Makes all the new intrinsics available, similarly, only if we have a scalar __FP16 type. This means that (in contrast to previous versions of this patch series) we will not gain the ability to write programs that pass half-precision-float values through as "bags of bits". I considered the alternative of making -mfp16-format default to ieee, but that makes the -mfp16-format=alternative option almost unusable, as one cannot link object files compiled with different -mfp16-format :(. We could set the default to be ieee only when neon-fp16 is specified, but that change is pretty much orthogonal to this patch series so can follow independently if desired. * To ease testing (including a couple of existing tests), I modified the arm_neon_fp16_ok functions in lib/target-supports.exp to also try flags specifying -mfp16-format=ieee (if flags without that fail to compile, presumably because of the absence of an __FP16 type; however, this still allows an explicit -mfp16-format=alternative if desired). On ARM targets, we then pass in -mfpu=neon-fp16 and -mfp16-format flags for all tests in advsimd-intrinsics.exp, unless these are overridden by an explicit multilib, in which case we will run the advsimd-intrinsics tests without the float16 variants (via #if). Are these patches OK for trunk? If so I will commit along with the previously-approved fix to fold-const.c for HFmode, https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00696.html Bootstrapped on arm-none-linux-gnueabihf (--with-arch=armv7-a --with-fpu=neon --with-float=hard), and aarch64-none-linux-gnu; cross-tested arm-none-eabi (a number of variants, especially for the advsimd-intrinsics tests in patch 13+14). Thanks, Alan Ping the final three patches: the two testsuite patches (in aarch64/advsimd-intrinsics/, but executed on both platforms, so both ARM and AArch64 maintainer approval would be required if these are OK), and the update to doc/sourcebuild.texi for ARM.
Re: Move remaining flag_unsafe_math_optimizations using simplify and match
On Tue, Aug 25, 2015 at 5:29 AM, Hurugalawadi, Naveen wrote: > Hi, > > Please find attached the remaining part of patch. > > Tested the patch on AArch64 and X86 without any regressions. > > Please review the patch and let me know if any modifications are required. Ok. Thanks, Richard. > Thanks, > Naveen > > ChangeLog > > 2015-08-25 Naveen H.S > > * fold-const.c (fold_binary_loc) : Move Optimize > root(x)*root(y) as root(x*y) to match.pd. > Move Optimize expN(x)*expN(y) as expN(x+y) to match.pd. > Move Optimize pow(x,y)*pow(x,z) as pow(x,y+z) to match.pd. > Move Optimize a/root(b/c) into a*root(c/b) to match.pd. > Move Optimize x/expN(y) into x*expN(-y) to match.pd. > > * match.pd (mult (root:s @0) (root:s @1)): New simplifier. > (mult (POW:s @0 @1) (POW:s @0 @2)) : New simplifier. > (mult (exps:s @0) (exps:s @1)) : New simplifier. > (rdiv @0 (root:s (rdiv:s @1 @2))) : New simplifier. > (rdiv @0 (exps:s @1)) : New simplifier.
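For illustration, the rewrites being moved, written out as C (they fire only under -funsafe-math-optimizations, since they can change rounding and overflow behaviour):

#include <math.h>

double f1 (double x, double y) { return sqrt (x) * sqrt (y); }               /* -> sqrt (x * y) */
double f2 (double x, double y) { return exp (x) * exp (y); }                 /* -> exp (x + y) */
double f3 (double x, double y, double z) { return pow (x, y) * pow (x, z); } /* -> pow (x, y + z) */
double f4 (double x, double y) { return x / sqrt (y / 42.0); }               /* -> x * sqrt (42.0 / y) */
double f5 (double x, double y) { return x / exp (y); }                       /* -> x * exp (-y) */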
Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB
On 25/08/15 10:52, Andrew Pinski wrote: On Tue, Aug 25, 2015 at 5:50 PM, Andrew Pinski wrote: On Tue, Aug 25, 2015 at 5:37 PM, Andre Vieira wrote:

Conditional branches have a maximum range of [-1048576, 1048572]. Any destination further away can not be reached by these. To be able to have conditional branches in very large functions, we invert the condition and change the destination to jump over an unconditional branch to the original, far away, destination.

gcc/ChangeLog:
2015-08-07 Ramana Radhakrishnan Andre Vieira
* config/aarch64/aarch64.md (*condjump): Handle functions > 1 MiB.
(*cb1): Idem.
(*tb1): Idem.
(*cb1): Idem.
* config/aarch64/iterators.md (inv_cb): New code attribute.
(inv_tb): Idem.
* config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.

gcc/testsuite/ChangeLog:
2015-08-07 Andre Vieira
* gcc.target/aarch64/long-branch.c: New test.

Just a few comments about the testcase. You could improve the size (on disk) of the testcase by using the preprocessor some more. Something like:

#define CASE_ENTRY2 (x) CASE_ENTRY ((x)) CASE_ENTRY ((x)+1)
#define CASE_ENTRY4 (x) CASE_ENTRY2 ((x)) CASE_ENTRY2 ((x)+2+1)
#define CASE_ENTRY8 (x) CASE_ENTRY4 ((x)) CASE_ENTRY4 ((x)+4+1)
#define CASE_ENTRY16 (x) CASE_ENTRY8 ((x)) CASE_ENTRY8 ((x)+8+1)
#define CASE_ENTRY32 (x) CASE_ENTRY16 ((x)) CASE_ENTRY16 ((x)+16)
#define CASE_ENTRY64 (x) CASE_ENTRY32 ((x)) CASE_ENTRY32 ((x)+32+1)
#define CASE_ENTRY128 (x) CASE_ENTRY64 ((x)) CASE_ENTRY16 ((x)+64+1)
#define CASE_ENTRY256 (x) CASE_ENTRY128 ((x)) CASE_ENTRY128 ((x)+128+1)

I do have an off by one error but you should get the idea. Basically instead of 200 lines, we only have 9 lines (log2(256) == 8). Thanks, Andrew

And then use CASE_ENTRY256 (1). You can do the same trick to reduce the size of CASE_ENTRY too. Thanks, Andrew Pinski

Conditional branches have a maximum range of [-1048576, 1048572]. Any destination further away can not be reached by these. To be able to have conditional branches in very large functions, we invert the condition and change the destination to jump over an unconditional branch to the original, far away, destination.

gcc/ChangeLog:
2015-08-07 Ramana Radhakrishnan Andre Vieira
* config/aarch64/aarch64.md (*condjump): Handle functions > 1 MiB.
(*cb1): Likewise.
(*tb1): Likewise.
(*cb1): Likewise.
* config/aarch64/iterators.md (inv_cb): New code attribute.
(inv_tb): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.

gcc/testsuite/ChangeLog:
2015-08-07 Andre Vieira
* gcc.target/aarch64/long_branch_1.c: New test.
From e34022ecd6f914b5a713594ca5b21b33929a3a1f Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Tue, 25 Aug 2015 13:12:11 +0100 Subject: [PATCH] fix for far branches --- gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64.c | 23 ++ gcc/config/aarch64/aarch64.md| 89 +++ gcc/config/aarch64/iterators.md | 6 ++ gcc/testsuite/gcc.target/aarch64/long_branch_1.c | 91 5 files changed, 195 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/long_branch_1.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 4b3cbedbd0a5fa186619e05c0c0b400c8257b1c0..9afb7ef9afadf2b3dfeb24db230829344201deba 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -322,6 +322,7 @@ unsigned aarch64_trampoline_size (void); void aarch64_asm_output_labelref (FILE *, const char *); void aarch64_cpu_cpp_builtins (cpp_reader *); void aarch64_elf_asm_named_section (const char *, unsigned, tree); +const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *); void aarch64_err_no_fpadvsimd (machine_mode, const char *); void aarch64_expand_epilogue (bool); void aarch64_expand_mov_immediate (rtx, rtx); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 87bbf6e7988e4ef796c09075ee584822483cbbce..188d0dd555d3d765aff7e78623a4e938497bec3f 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -586,6 +586,29 @@ static const char * const aarch64_condition_codes[] = "hi", "ls", "ge", "lt", "gt", "le", "al", "nv" }; +/* Generate code to enable conditional branches in functions over 1 MiB. */ +const char * +aarch64_gen_far_branch (rtx * operands, int pos_label, const char * dest, + const char * branch_format) +{ +rtx_code_label * tmp_label = gen_label_rtx (); +char label_buf[256]; +char buffer[128]; +ASM_GENERATE_INTERNAL_LABEL (label_buf, dest, + C
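For reference, a corrected sketch of Andrew's doubling-macro suggestion above; the CASE_ENTRY body and the driver function are placeholders, not the committed test. Note the macros must be defined with no space before the parameter list (otherwise they become object-like macros), and each doubling step offsets its second half by the size of the first, which also fixes the off-by-one:

  extern void dummy (int);

  #define CASE_ENTRY(x)    case (x): dummy (x); break;
  #define CASE_ENTRY2(x)   CASE_ENTRY (x)     CASE_ENTRY ((x) + 1)
  #define CASE_ENTRY4(x)   CASE_ENTRY2 (x)    CASE_ENTRY2 ((x) + 2)
  #define CASE_ENTRY8(x)   CASE_ENTRY4 (x)    CASE_ENTRY4 ((x) + 4)
  #define CASE_ENTRY16(x)  CASE_ENTRY8 (x)    CASE_ENTRY8 ((x) + 8)
  #define CASE_ENTRY32(x)  CASE_ENTRY16 (x)   CASE_ENTRY16 ((x) + 16)
  #define CASE_ENTRY64(x)  CASE_ENTRY32 (x)   CASE_ENTRY32 ((x) + 32)
  #define CASE_ENTRY128(x) CASE_ENTRY64 (x)   CASE_ENTRY64 ((x) + 64)
  #define CASE_ENTRY256(x) CASE_ENTRY128 (x)  CASE_ENTRY128 ((x) + 128)

  void
  big_switch (int i)
  {
    switch (i)
      {
      CASE_ENTRY256 (1)   /* expands to cases 1 .. 256 */
      }
  }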
Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran"
Ian Lance Taylor wrote: > Hans-Peter Nilsson writes: > > > * configure.ac: Only compile with -fPIC if the target > > supports it. > > * configure: Regenerate. > > This is OK.

I'm now running into the same problem on SPU, but unfortunately this patch still doesn't fix the problem. Now, the SPU does not support dynamic loading and the loader does not support (most) run-time relocations. There is no support for shared libraries on the SPU. On the SPU, all GCC target libraries are built as static libraries, and should be compiled without -fPIC.

However, the compiler actually does accept -fPIC. If the flag is present, we attempt to generate relocatable code, but only to the extent the compiler can do that without support for run-time relocations. The most significant restriction is that statically initializing a global variable to a pointer will not work. (This is useful for some special cases of self-relocating code. Such code normally can work around this restriction.)

Now, with the patch above, libbacktrace is still compiled with -fPIC on SPU, but some files do in fact contain just such global initializers, causing compilation to fail:

gcc-head/src/libbacktrace/elf.c:241:27: error: creating run-time relocation for '*.LC2' static const char * const debug_section_names[DEBUG_MAX] = ^

The other GCC run-time libraries rely on libtool to figure out that even though -fPIC works, dynamic libraries are still not supported on the platform, and thus compile everything for static linking (i.e. without -fPIC). I'm wondering if we couldn't use the same libtool mechanism here: if the architecture does not support dynamic linking at all, no target library will be built as shared library, and thus there is no need to build libbacktrace with -fPIC either. (My understanding is that we need to build libbacktrace with -fPIC because it might get linked into some other shared target library.)

The libbacktrace configure script actually incorporates all the libtool init code that makes this determination, and sets the shell variable "can_build_shared" to "no" on SPU. Would it be valid to use this variable in the test whether to use -fPIC? (I'm not sure which of the many libtool variables are intended to be used outside, and which are private ...)

Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
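A minimal sketch of the construct Ulrich quotes from elf.c; compiled with -fPIC on SPU, the static pointer initializer would need a run-time relocation, which the target cannot express (names here are illustrative):

  /* Each array element is the load address of a string literal, so the
     initializer cannot be resolved purely at static link time.  */
  static const char *const section_names[] =
    { ".debug_info", ".debug_line" };

  const char *
  first_section (void)
  {
    return section_names[0];
  }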
Re: [PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite
On 28 July 2015 at 13:26, Alan Lawrence wrote: > This is a respin of > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the > testsuite for float16 vectors. Relative to the previous version, most of the > additions to the tests are now within #if..#endif such that they are only > compiled if we have a scalar __fp16 type (the exception is hfloat16_t: since > this is actually an integer type, we can define and use it without any > compiler fp16 support). Also we try to use add_options_for_arm_neon_fp16 > for all tests (on ARM targets), falling back to add_options_for_arm_neon if > the previous fails. > > Cross-tested on many multilibs, including -march=armv6, > -march=armv7-a{,-mfpu=neon-fp16}, -march=armv7-a/-mfpu=neon, > -march=armv7-a/-mfp16-format=none{,/-mfpu=neon-fp16,/-mfpu=neon}, > -march=armv7-a/-mfp16-format=alternative . > Hi Alan, It looks OK. Did you also run the tests on AArch64? > Note that on bigendian, this requires patch at > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00696.html , which I will > commit at the same time. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: > Set additional_flags for neon-fp16 if supported, else fallback to > neon. > > * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h > (hfloat16_t): New. > (result, expected, clean_results, DECL_VARIABLE_64BITS_VARIANTS, > DECL_VARIABLE_128BITS_VARIANTS): Add float16x4_t and float16x8_t > cases > if supported. > (CHECK_RESULTS): Redefine using CHECK_RESULTS_NAMED. > (CHECK_RESULTS_NAMED): Move body to CHECK_RESULTS_NAMED_NO_FP16; > redefine in terms of CHECK_RESULTS_NAMED_NO_FP16 with float16 > variants > when those are supported. > (CHECK_RESULTS_NAMED_NO_FP16, CHECK_RESULTS_NO_FP16): New. > (vdup_n_f16): New. > > * gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h (buffer, > buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and > float16x8_t > cases if supported. > > * gcc.target/aarch64/advsimd-intrinsics/vbsl.c (exec_vbsl): > Use CHECK_RESULTS_NO_FP16 in place of CHECK_RESULTS. > * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c > (exec_vdup_vmov): > Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c > (exec_vdup_lane): > Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vext.c (exec_vext): > Likewise. > > * gcc.target/aarch64/advsimd-intrinsics/vcombine.c (expected): > Add float16x8_t case. > (main, exec_vcombine): test float16x4_t -> float16x8_t, if > supported. > * gcc.target/aarch64/advsimd-intrinsics/vcreate.c (expected, > main, exec_vcreate): Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vget_high (expected, > exec_vget_high): Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vget_low.c (expected, > exec_vget_low): Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vld1.c (expected, > exec_vld1): > Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c (expected, > exec_vld1_dup): Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c (expected, > exec_vld1_lane): Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vldX.c (expected, > exec_vldX): > Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c (expected, > exec_vldX_dup): Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c (expected, > exec_vldX_lane): Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vset_lane.c (expected, > exec_vset_lane): Likewise. > * gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c (expected, > exec_vst1_lane): Likewise.
Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_{f16_f32,f32_f16}
On 28 July 2015 at 13:27, Alan Lawrence wrote: > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: > set additional flags for neon-fp16 support. > * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. Is that the right version of the patch? The advsimd-intrinsics.exp part conflicts with patch 13/15. Am I missing something? Christophe.
Openacc launch API
Jakub,

This patch changes the launch API for openacc parallels. The current scheme passes the launch dimensions as 3 separate parameters to the GOACC_parallel function. This is problematic for a couple of reasons:

1) these must be validated in the host compiler

2) they provide no extension to support a variety of different offload devices with different geometry requirements.

This patch changes things so that the function tables emitted by the (ptx) mkoffload include the geometry triplet for each function. This allows them to be validated and/or manipulated in the offload compiler. However, this only works for compile-time known dimensions -- which is a common case. To deal with runtime-computed dimensions we have to retain the host-side compiler's calculation and pass that into the GOACC_parallel function. We change GOACC_parallel to take a variadic list of keyed operands ending with a sentinel marker. These keyed operands have a slot for expansion to support multiple different offload devices.

We also extend the functionality of the 'oacc function' internal attribute. Rather than being a simple marker, it now has a value, which is a TREE_LIST of the geometry required. The geometry is held as INTEGER_CSTs on the TREE_VALUE slots. Runtime-calculated values are represented by an INTEGER_CST of zero. We'll also use this representation for 'routines', where the TREE_PURPOSE slot will be used to indicate the levels at which a routine might spawn a partitioned loop. Again, to allow future expansion supporting a number of different offload devices, this can become a list-of-lists, keyed by an offload device identifier. The offload compiler can manipulate this data, and a later patch will do this within a new oacc-xform pass.

I did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide a forwarding function. However, as the mkoffload data is incompatible, this is probably overkill. I've had to increment the (just committed) version number to detect the change in data representation. So any attempt to run an old binary with a new libgomp will fail at the loading point. We could simply keep the same 'GOACC_parallel' name and not need any new symbols. WDYT? ok? nathan

2015-08-25 Nathan Sidwell

include/
* gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment. (GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR, GOMP_DIM_MAX, GOMP_DIM_MASK): New. (GOMP_LAUNCH_END, GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC, GOMP_LAUNCH_WAIT): New. (GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT, GOMP_LAUNCH_OP_SHIFT): New. (GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE, GOMP_LAUNCH_OP): New. (GOMP_LAUNCH_OP_MAX): New.

libgomp/
* libgomp.h (acc_dispatch_t): Replace separate geometry args with array. * libgomp.map (GOACC_parallel_keyed): New. * oacc-parallel.c (goacc_wait): Take pointer to va_list. Adjust all callers. (GOACC_parallel_keyed): New interface. Lose geometry arguments and take keyed varargs list. Adjust call to exec_func. (GOACC_parallel): Forward to GOACC_parallel_keyed. * libgomp_g.h (GOACC_parallel): Remove. (GOACC_parallel_keyed): Declare. * plugin/plugin-nvptx.c (struct targ_fn_launch): New struct. (struct targ_fn_descriptor): Replace name field with launch field. (nvptx_exec): Lose separate geometry args, take array. Process dynamic dimensions and adjust. (struct nvptx_tdata): Replace fn_names field with fn_descs. (GOMP_OFFLOAD_load_image): Adjust for change in function table data. (GOMP_OFFLOAD_openacc_parallel): Adjust for change in dimension passing.
* oacc-host.c (host_openacc_exec): Adjust for change in dimension passing. gcc/ * config/nvptx/nvptx.c: Include omp-low.h and gomp-constants.h. (nvptx_record_offload_symbol): Record function execution geometry. * config/nvptx/mkoffload.c (process): Include launch geometry in function data. * omp-low.c (oacc_launch_pack): New. (replace_oacc_fn_attrib): New. (set_oacc_fn_attrib): New. (get_oacc_fn_attrib): New. (expand_omp_target): Create keyed varargs for GOACC_parallel call generation. * omp-low.h (get_oacc_fn_attrib): Declare. * builtin-types.def (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. * tree.h (OMP_CLAUSE_EXPR): New. * omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Change target fn name. gcc/lto/ * lto-lang.c (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. gcc/c-family/ * c-common.c (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. gcc/fortran/ * f95-lang.c (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. * types.def (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. Index: include/gomp-constants.h === --- include/gomp-constants.h (revision 227137) +++ include/gomp-constants.h (working copy) @@ -115,11 +115,34 @@ enum gomp_map_kind /* Versions of libgomp and dev
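A sketch of what a keyed launch could look like from the compiler's side; the GOMP_LAUNCH_* and GOMP_DIM_* macros are the ones named in the ChangeLog, but the prototype, argument order, and macro operand layout here are reconstructed guesses, not the committed interface:

  #include <stddef.h>
  #include "gomp-constants.h"   /* assumed home of GOMP_LAUNCH_* / GOMP_DIM_* */

  /* Prototype reconstructed from the ChangeLog; the real declaration
     would live in libgomp_g.h.  */
  extern void GOACC_parallel_keyed (int, void (*) (void *), size_t,
                                    void **, size_t *, unsigned short *, ...);

  static void
  launch (int device, void (*fn) (void *), size_t mapnum, void **hostaddrs,
          size_t *sizes, unsigned short *kinds, int num_gangs)
  {
    GOACC_parallel_keyed
      (device, fn, mapnum, hostaddrs, sizes, kinds,
       /* One runtime-computed dimension, the gang count; compile-time
          known dimensions ride in the offload function table instead.  */
       GOMP_LAUNCH_PACK (GOMP_LAUNCH_DIM, 0, GOMP_DIM_MASK (GOMP_DIM_GANG)),
       num_gangs,
       /* Sentinel terminating the keyed operand list.  */
       GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0));
  }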
Re: [PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite
Christophe Lyon wrote: On 28 July 2015 at 13:26, Alan Lawrence wrote: This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the testsuite for float16 vectors. Relative to the previous version, most of the additions to the tests are now within #if..#endif such that they are only compiled if we have a scalar __fp16 type (the exception is hfloat16_t: since this is actually an integer type, we can define and use it without any compiler fp16 support). Also we try to use add_options_for_arm_neon_fp16 for all tests (on ARM targets), falling back to add_options_for_arm_neon if the previous fails. Cross-tested on many multilibs, including -march=armv6, -march=armv7-a{,-mfpu=neon-fp16}, -march=armv7-a/-mfpu=neon, -march=armv7-a/-mfp16-format=none{,/-mfpu=neon-fp16,/-mfpu=neon}, -march=armv7-a/-mfp16-format=alternative . Hi Alan, It looks OK. Did you also run the tests on AArch64? Sorry, yes, I did - aarch64-none-linux-gnu, and aarch64_be-none-elf also. Thanks, Alan
Re: [PATCH, PR other/67320] Fix wide add standard names
On Tue, 25 Aug 2015, Michael Collison wrote: > The standard names for signed and unsigned vector wide adds are wrong in the > documentation. > > OK for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 1/2] driver: support state cleanup
On Thu, 6 Aug 2015, David Malcolm wrote: > gcc/ChangeLog: > * gcc-main.c (main): Add params to driver ctor. > * gcc.c (class env_manager): New. > (env): New global. > (env_manager::init): New. > (env_manager::get): New. > (env_manager::xput): New. > (env_manager::restore): New. > Poison getenv and putenv. > (DEFAULT_TARGET_SYSTEM_ROOT): New. > (target_system_root): Update initialization to use > DEFAULT_TARGET_SYSTEM_ROOT. > (struct spec_list): Add field "default_ptr". > (INIT_STATIC_SPEC): Initialize new field "default_ptr". > (init_spec): Likewise. > (set_spec): Clear field "default_ptr". > (read_specs): Free "spec" and "buffer". > (xputenv): Reimplement in terms of env_manager. > (process_command): Replace ::getenv calls with calls to the > env_manager singleton. > (process_brace_body): Free string in three places. > (driver::driver): New. > (driver::~driver): New. > (used_arg): Convert from a function to... > (class used_arg_t): ...this class, and... > (used_arg): ...this new global instance. > (used_arg_t::finalize): New function. > (getenv_spec_function): Add "const" to local "value". Replace > ::getenv call with call to the env_manager singleton. > (path_prefix_reset): New function. > (driver::finalize): New function. > * gcc.h (driver::driver): New. > (driver::~driver): New. > (driver::finalize): New. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_{f16_f32,f32_f16}
Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has moved to the previous patch! This version also fixes some whitespace issues. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. * lib/target-supports.exp (check_effective_target_arm_neon_fp16_hw_ok): New. --- .../aarch64/advsimd-intrinsics/vcvt_f16.c | 98 ++ gcc/testsuite/lib/target-supports.exp | 15 2 files changed, 113 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c new file mode 100644 index 000..a2cfd38 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c @@ -0,0 +1,98 @@ +/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */ +#include +#include "arm-neon-ref.h" +#include "compute-ref-data.h" +#include + +/* Expected results for vcvt. */ +VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170, + 0x4160, 0x4150 }; +VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 }; + +/* Expected results for vcvt_high_f32_f16. */ +VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130, +0xc120, 0xc110 }; +/* Expected results for vcvt_high_f16_f32. */ +VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000, +0xcc00, 0xcb80, 0xcb00, 0xca80 }; + +void +exec_vcvt (void) +{ + clean_results (); + +#define TEST_MSG vcvt_f32_f16 + { +VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 }; + +DECL_VARIABLE (vector_src, float, 16, 4); + +VLOAD (vector_src, buffer_src, , float, f, 16, 4); +DECL_VARIABLE (vector_res, float, 32, 4) = + vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4)); +vst1q_f32 (VECT_VAR (result, float, 32, 4), + VECT_VAR (vector_res, float, 32, 4)); + +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, ""); + } +#undef TEST_MSG + + clean_results (); + +#define TEST_MSG vcvt_f16_f32 + { +VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 }; +DECL_VARIABLE (vector_src, float, 32, 4); + +VLOAD (vector_src, buffer_src, q, float, f, 32, 4); +DECL_VARIABLE (vector_res, float, 16, 4) = + vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4)); +vst1_f16 (VECT_VAR (result, float, 16, 4), + VECT_VAR (vector_res, float, 16 ,4)); + +CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, ""); + } +#undef TEST_MSG + +#if defined (__aarch64__) + clean_results (); + +#define TEST_MSG "vcvt_high_f32_f16" + { +DECL_VARIABLE (vector_src, float, 16, 8); +VLOAD (vector_src, buffer, q, float, f, 16, 8); +DECL_VARIABLE (vector_res, float, 32, 4); +VECT_VAR (vector_res, float, 32, 4) = + vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8)); +vst1q_f32 (VECT_VAR (result, float, 32, 4), + VECT_VAR (vector_res, float, 32, 4)); +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, ""); + } +#undef TEST_MSG + clean_results (); + +#define TEST_MSG "vcvt_high_f16_f32" + { +DECL_VARIABLE (vector_low, float, 16, 4); +VDUP (vector_low, , float, f, 16, 4, 2.0); + +DECL_VARIABLE (vector_src, float, 32, 4); +VLOAD (vector_src, buffer, q, float, f, 32, 4); + +DECL_VARIABLE (vector_res, float, 16, 8) = + vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4), +VECT_VAR (vector_src, float, 32, 4)); +vst1q_f16 (VECT_VAR (result, float, 16, 8), + VECT_VAR (vector_res, float, 16, 8)); + +CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, ""); + } +#endif +} + +int +main (void) +{ + 
exec_vcvt (); + return 0; +} diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 9aec02d..0a22c95 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2730,6 +2730,21 @@ proc check_effective_target_arm_neon_fp16_ok { } { check_effective_target_arm_neon_fp16_ok_nocache] } +proc check_effective_target_arm_neon_fp16_hw_ok { } { +if {! [check_effective_target_arm_neon_fp16_ok] } { + return 0 +} +global et_arm_neon_fp16_flags +check_runtime_nocache arm_neon_fp16_hw_ok { + int + main (int argc, char **argv) + { + asm ("vcvt.f32.f16 q1, d0"); + return 0; + } +} $et_arm_neon_fp16_flags +} + proc add_options_for_arm_neon_fp16 { flags } { if { ! [check_effective_target_arm_neon_fp16_ok] } { return "$flags" -- 1.8.3
Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_{f16_f32,f32_f16}
Christophe Lyon wrote: On 28 July 2015 at 13:27, Alan Lawrence wrote: gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: set additional flags for neon-fp16 support. * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. Is that the right version of the patch? The advsimd-intrinsics.exp part conflicts with patch 13/15. Am I missing something? Christophe. Oh, sorry, thanks for pointing this out. Looks like I reposted the previous version, rather than what I'd been testing, which drops the conflicting hunk. Thanks, Alan
[nvptx] fix thinko
I've committed this. A thinko regarding what asm_operands wants for an argument. Fixes a couple of tests. nathan 2015-08-25 Nathan Sidwell * config/nvptx/nvptx.c (nvptx_write_function_decl): Reformat. (nvptx_reorg_subreg): Pass insn pattern to asm_operands. Index: gcc/config/nvptx/nvptx.c === --- gcc/config/nvptx/nvptx.c (revision 227128) +++ gcc/config/nvptx/nvptx.c (working copy) @@ -321,7 +321,8 @@ nvptx_write_function_decl (std::stringst /* Declare argument types. */ if ((args != NULL_TREE - && !(TREE_CODE (args) == TREE_LIST && TREE_VALUE (args) == void_type_node)) + && !(TREE_CODE (args) == TREE_LIST + && TREE_VALUE (args) == void_type_node)) || is_main || return_in_mem || DECL_STATIC_CHAIN (decl)) @@ -1917,7 +1918,7 @@ nvptx_reorg_subreg (void) { next = NEXT_INSN (insn); if (!NONDEBUG_INSN_P (insn) - || asm_noperands (insn) >= 0 + || asm_noperands (PATTERN (insn)) >= 0 || GET_CODE (PATTERN (insn)) == USE || GET_CODE (PATTERN (insn)) == CLOBBER) continue;
Indirect jumps
Ptx is one of those rare (unique?) machines that doesn't have an indirect branch. optabs is prepared for such a target and emits a sorry when an indirect branch is needed. However, it then goes on to try and emit such an instruction and ends up ICEing. Fixed thusly, ok? (Or is the right solution to define a dummy indirect branch in the PTX md file?) nathan 2015-08-25 Nathan Sidwell * optabs.c (emit_indirect_jump): Don't try to emit a jump if the target doesn't have one. Index: gcc/optabs.c === --- gcc/optabs.c (revision 227128) +++ gcc/optabs.c (working copy) @@ -4488,11 +4488,13 @@ emit_indirect_jump (rtx loc) { if (!targetm.have_indirect_jump ()) sorry ("indirect jumps are not available on this target"); - - struct expand_operand ops[1]; - create_address_operand (&ops[0], loc); - expand_jump_insn (targetm.code_for_indirect_jump, 1, ops); - emit_barrier (); + else +{ + struct expand_operand ops[1]; + create_address_operand (&ops[0], loc); + expand_jump_insn (targetm.code_for_indirect_jump, 1, ops); + emit_barrier (); +} }
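A minimal GNU C reproducer sketch (not from the patch; the function name is illustrative): a computed goto requires an indirect branch, so on nvptx it should now draw only the sorry () instead of an ICE.

  /* GNU C computed goto; there is no PTX instruction to implement the
     resulting indirect branch.  */
  void
  jump_to (void *addr)
  {
    goto *addr;
  }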
Re: [Patch] Add to the libgfortran/newlib bodge to "detect" ftruncate support in ARM/AArch64/SH
On Fri, Aug 21, 2015 at 11:05:47AM +0100, James Greenhalgh wrote: > On Thu, Aug 20, 2015 at 10:50:47AM +0100, Marcus Shawcroft wrote: > > On 20 August 2015 at 09:31, James Greenhalgh > > wrote: > > > > > > Hi, > > > > > > Steve's patch in 2013 [1] to fix the MIPS newlib/libgfortran build > > > causes subtle issues for an ARM/AArch64 newlib/libgfortran build. The > > > problem is that ARM/AArch64 (and SH) define a stub function for > > > ftruncate, which we would previously have auto-detected, but which is not > > > part of the hardwiring Steve added. > > > > > > Continuing the tradition of building bodge on bodge on bodge, this patch > > > hardwires HAVE_FTRUNCATE on for ARM/AArch64/SH, which does fix the issue > > > I was seeing. > > > > This is the second breakage I'm aware of due to the introduction of > > this hardwire code, the first being related to strtold. My > > recollection is that it is only the mips target that requires the > > newlib API hardwiring. Ideally we should rely only on the > > AC_CHECK_FUNCS_ONCE probe code and avoid the hardwire entirely. > > > > Perhaps a better approach for trunk would be something along the lines of: > > > > case "${host}--x${with_newlib}" in > > mips*--xyes) > > hardwire_newlib=1;; > > esac > > if test "${hardwire_newlib:-0}" -eq 1; then > > ... existing AC_DEFINES hardwire code > > else > > ... existing AC_CHECK_FUNCS_ONCE probe code > > fi > > > > In effect limiting the hardwire to just the target which is unable to > > probe. For backport to 4.9 and 5 I think James' more conservative > > patch is probably more appropriate. > > > > What do folks think? > > (+CC fort...@gcc.gnu.org - who I should have CCed from the start). > > This runs in to issues with a newlib build [1] (newlib provides a 'kill' > symbol for linking, but does not provide a declaration in signal.h, so > we take a -Werror=implicit-function-declaration). This is what the patch you suggested would look like. I've sent a patch to the newlib list [1] which unconditionally declares 'kill'. With that in place, we can then autodetect the presence of the functions newlib provides. I'd expect that you would need to apply that newlib patch if you were testing this patch locally. I've tested this with a build of arm-none-eabi and aarch64-none-elf to check that I now get HAVE_FTRUNCATE defined, and that the build completes. OK? Thanks, James --- 2015-08-25 James Greenhalgh * configure.ac: Auto-detect newlib function support unless we know there are issues when configuring for a host. * configure: Regenerate. --- [1]: https://sourceware.org/ml/newlib/2015/msg00632.html diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac index 35a8b39..1e9914c 100644 --- a/libgfortran/configure.ac +++ b/libgfortran/configure.ac @@ -273,8 +273,13 @@ GCC_HEADER_STDINT(gstdint.h) AC_CHECK_MEMBERS([struct stat.st_blksize, struct stat.st_blocks, struct stat.st_rdev]) +case "${host}--x${with_newlib}" in + mips*--xyes) +hardwire_newlib=1;; +esac + # Check for library functions. -if test "x${with_newlib}" = "xyes"; then +if test "${hardwire_newlib:-0}" -eq 1; then # We are being configured with a cross compiler. AC_REPLACE_FUNCS # may not work correctly, because the compiler may not be able to # link executables.
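A reduced sketch of the failure mode James describes, assuming newlib provides the kill symbol at link time but no prototype in <signal.h> (the function here is illustrative, not libgfortran source):

  #include <signal.h>

  int
  stop_child (int pid)
  {
    /* The symbol links, but with no declaration in scope this is an
       implicit declaration, fatal under
       -Werror=implicit-function-declaration.  */
    return kill (pid, SIGTERM);
  }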
[PATCH, rs6000] Fix vec_shr define_expand
The following patch fixes the vec_shr expander to do a shift instead of a rotate. CPU2006 benchmark 482.sphinx3 recently started failing due to this issue. Bootstrapped and tested on ppc64/ppc64le with no new regressions. Ok for trunk? And ok for 4.9/5 (with equivalent change to vec_shl expander which exists in those releases) after bootstrap/regtest? -Pat

2015-08-25 Pat Haugen
* config/rs6000/vector.md (vec_shr_<mode>): Fix to do a shift instead of a rotate.
gcc/testsuite:
* gcc.target/powerpc/vec-shr.c: New.

Index: gcc/config/rs6000/vector.md === --- gcc/config/rs6000/vector.md (revision 227041) +++ gcc/config/rs6000/vector.md (working copy) @@ -977,6 +977,8 @@ (define_expand "movmisalign<mode>" ;; General shift amounts can be supported using vsro + vsr. We're ;; not expecting to see these yet (the vectorizer currently ;; generates only shifts by a whole number of vector elements). +;; Note that the vec_shr operation is actually defined as +;; 'shift toward element 0' so is a shr for LE and shl for BE. (define_expand "vec_shr_<mode>" [(match_operand:VEC_L 0 "vlogical_operand" "") (match_operand:VEC_L 1 "vlogical_operand" "") @@ -987,6 +989,7 @@ (define_expand "vec_shr_<mode>" rtx bitshift = operands[2]; rtx shift; rtx insn; + rtx zero_reg, op1, op2; HOST_WIDE_INT bitshift_val; HOST_WIDE_INT byteshift_val; @@ -996,19 +999,29 @@ (define_expand "vec_shr_<mode>" if (bitshift_val & 0x7) FAIL; byteshift_val = (bitshift_val >> 3); + zero_reg = gen_reg_rtx(<MODE>mode); + emit_move_insn (zero_reg, CONST0_RTX (<MODE>mode)); if (!BYTES_BIG_ENDIAN) -byteshift_val = 16 - byteshift_val; +{ + byteshift_val = 16 - byteshift_val; + op1 = zero_reg; + op2 = operands[1]; +} + else +{ + op1 = operands[1]; + op2 = zero_reg; +} + if (TARGET_VSX && (byteshift_val & 0x3) == 0) { shift = gen_rtx_CONST_INT (QImode, byteshift_val >> 2); - insn = gen_vsx_xxsldwi_<mode> (operands[0], operands[1], operands[1], - shift); + insn = gen_vsx_xxsldwi_<mode> (operands[0], op1, op2, shift); } else { shift = gen_rtx_CONST_INT (QImode, byteshift_val); - insn = gen_altivec_vsldoi_<mode> (operands[0], operands[1], operands[1], - shift); + insn = gen_altivec_vsldoi_<mode> (operands[0], op1, op2, shift); } emit_insn (insn);

Index: gcc/testsuite/gcc.target/powerpc/vec-shr.c === --- gcc/testsuite/gcc.target/powerpc/vec-shr.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-shr.c (working copy) @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -fno-inline" } */ + +#include + +typedef struct { double r, i; } complex; +#define LEN 30 +complex c[LEN]; +double d[LEN]; + +void +foo (complex *c, double *d, int len1) +{ + int i; + for (i = 0; i < len1; i++) +{ + c[i].r = d[i]; + c[i].i = 0.0; +} +} + +int +main (void) +{ + int i; + for (i = 0; i < LEN; i++) +d[i] = (double) i; + foo (c, d, LEN); + for (i=0;i
Re: [AArch64][TLSLE][1/3] Add the option "-mtls-size" for AArch64
Marcus Shawcroft writes: > On 19 August 2015 at 15:26, Jiong Wang wrote: > >> 2015-08-19 Jiong Wang >> >> gcc/ >> * config/aarch64/aarch64.opt (mtls-size): New entry. >> * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function. >> (aarch64_override_options_internal): Call initialize_aarch64_tls_size. >> * doc/invoke.texi (AArch64 Options): Document -mtls-size. >> >> -- >> Regards, >> Jiong >> > > +case AARCH64_CMODEL_TINY: > + /* The maximum TLS size allowed under tiny is 1M. */ > + if (aarch64_tls_size > 20) > + aarch64_tls_size = 20; > > The only valid values of aarch64_tls_size handled/expected by the > remainder of the patch set is 12,24,32,48 so setting the value to 20 > here doesn;t make sense. Thanks for pointing this out, how about the new patch attached? 2015-08-25 Jiong Wang gcc/ * config/aarch64/aarch64.opt (mtls-size): New entry. * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function. (aarch64_override_options_internal): Call initialize_aarch64_tls_size. * doc/invoke.texi (AArch64 Options): Document -mtls-size. commit 36736a1a2133ffc949d3e00efdced8ef2c53cddd Author: Jiong Wang Date: Tue Aug 25 11:13:44 2015 +0100 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 382be2c..318b852 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7492,6 +7492,40 @@ aarch64_parse_one_override_token (const char* token, return; } +/* A checking mechanism for the implementation of the tls size. */ + +static void +initialize_aarch64_tls_size (struct gcc_options *opts) +{ + if (aarch64_tls_size == 0) +aarch64_tls_size = 24; + + switch (opts->x_aarch64_cmodel_var) +{ +case AARCH64_CMODEL_TINY: + /* Both the default and maximum TLS size allowed under tiny is 1M which + needs two instructions to address, so we clamp the size to 24. */ + if (aarch64_tls_size > 24) + aarch64_tls_size = 24; + break; +case AARCH64_CMODEL_SMALL: + /* The maximum TLS size allowed under small is 4G. */ + if (aarch64_tls_size > 32) + aarch64_tls_size = 32; + break; +case AARCH64_CMODEL_LARGE: + /* The maximum TLS size allowed under large is 16E. + FIXME: 16E should be 64bit, we only support 48bit offset now. */ + if (aarch64_tls_size > 48) + aarch64_tls_size = 48; + break; +default: + gcc_unreachable (); +} + + return; +} + /* Parse STRING looking for options in the format: string :: option:string option :: name=substring @@ -7584,6 +7618,7 @@ aarch64_override_options_internal (struct gcc_options *opts) } initialize_aarch64_code_model (opts); + initialize_aarch64_tls_size (opts); aarch64_override_options_after_change_1 (opts); } diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 37c2c50..8642bdb 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -96,6 +96,25 @@ mtls-dialect= Target RejectNegative Joined Enum(tls_type) Var(aarch64_tls_dialect) Init(TLS_DESCRIPTORS) Save Specify TLS dialect +mtls-size= +Target RejectNegative Joined Var(aarch64_tls_size) Enum(aarch64_tls_size) +Specifies bit size of immediate TLS offsets. Valid values are 12, 24, 32, 48. 
+ +Enum +Name(aarch64_tls_size) Type(int) + +EnumValue +Enum(aarch64_tls_size) String(12) Value(12) + +EnumValue +Enum(aarch64_tls_size) String(24) Value(24) + +EnumValue +Enum(aarch64_tls_size) String(32) Value(32) + +EnumValue +Enum(aarch64_tls_size) String(48) Value(48) + march= Target RejectNegative ToLower Joined Var(aarch64_arch_string) -march=ARCH Use features of architecture ARCH diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 27be317..f990bef 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -514,6 +514,7 @@ Objective-C and Objective-C++ Dialects}. -mstrict-align @gol -momit-leaf-frame-pointer -mno-omit-leaf-frame-pointer @gol -mtls-dialect=desc -mtls-dialect=traditional @gol +-mtls-size=@var{size} @gol -mfix-cortex-a53-835769 -mno-fix-cortex-a53-835769 @gol -mfix-cortex-a53-843419 -mno-fix-cortex-a53-843419 @gol -march=@var{name} -mcpu=@var{name} -mtune=@var{name}} @@ -12409,6 +12410,11 @@ of TLS variables. This is the default. Use traditional TLS as the thread-local storage mechanism for dynamic accesses of TLS variables. +@item -mtls-size=@var{size} +@opindex mtls-size +Specify bit size of immediate TLS offsets. Valid values are 12, 24, 32, 48. +This option depends on binutils higher than 2.25. + @item -mfix-cortex-a53-835769 @itemx -mno-fix-cortex-a53-835769 @opindex mfix-cortex-a53-835769
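For illustration, the sort of TLS access the new option constrains; a sketch assuming the local-exec model (file and function names are illustrative, not from the patch):

  /* Compile with e.g.
       gcc -O2 -ftls-model=local-exec -mtls-size=12 -c tls.c
     so offsets within the thread-local block are assumed to fit in
     12 bits (4KiB), allowing a shorter addressing sequence.  */
  __thread int counter;

  int
  bump (void)
  {
    return ++counter;
  }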
RE: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
> Richard Henderson wrote: > On 08/12/2015 08:59 AM, Wilco Dijkstra wrote: > > I looked at the statistics of AArch64 immediate generation a while ago. > > The interesting thing is ~95% of calls are queries, and the same query is on > > average repeated 10 times in a row. So (a) it is not important to cache the > > expansions, and (b) the high repetition rate means a single-entry cache > > has a 90% hitrate. We already have a patch for this and could collect stats > > comparing the approaches. If a single-entry cache can provide a similar > > benefit as caching all immediates then my preference would be to keep things > > simple and just cache the last query. > > Interesting. That's already more detailed investigation than I'd done. I had > no idea the queries were so clustered. I assumed that the queries would be > scattered across various passes, and so the various constants across the > function would get checked in sequence. > > I would be very interested in seeing those stats when you've done. Caching improves average buildtime by 0.1-0.2% - your patch seems to be slightly faster than caching just 1 query, so that suggests caching a few entries would be beneficial. However looking at the immediates that are generated by the loops, it's feasible to avoid linear/quadratic search loops altogether. So I think a generic immediate caching scheme won't be useful for AArch64. Wilco
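A minimal sketch of the single-entry cache idea under discussion; valid_immediate_p stands in for the real (expensive) target predicate, and hwi for GCC's HOST_WIDE_INT:

  #include <stdbool.h>

  typedef long long hwi;                /* stand-in for HOST_WIDE_INT */

  extern bool valid_immediate_p (hwi);  /* hypothetical expensive query */

  static bool
  cached_valid_immediate_p (hwi val)
  {
    static hwi last_val;
    static bool last_ok, cache_valid;

    if (cache_valid && val == last_val)
      return last_ok;                   /* ~90% of queries land here */
    last_val = val;
    last_ok = valid_immediate_p (val);
    cache_valid = true;
    return last_ok;
  }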
Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_{f16_f32,f32_f16}
On 25 August 2015 at 15:57, Alan Lawrence wrote: > Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has > moved to the previous patch! This version also fixes some whitespace issues. > This looks OK to me now, thanks. > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. > * lib/target-supports.exp > (check_effective_target_arm_neon_fp16_hw_ok): New. > --- > .../aarch64/advsimd-intrinsics/vcvt_f16.c | 98 > ++ > gcc/testsuite/lib/target-supports.exp | 15 > 2 files changed, 113 insertions(+) > create mode 100644 > gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c > > diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c > b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c > new file mode 100644 > index 000..a2cfd38 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c > @@ -0,0 +1,98 @@ > +/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } > } */ > +#include > +#include "arm-neon-ref.h" > +#include "compute-ref-data.h" > +#include > + > +/* Expected results for vcvt. */ > +VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170, > + 0x4160, 0x4150 }; > +VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 }; > + > +/* Expected results for vcvt_high_f32_f16. */ > +VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130, > +0xc120, 0xc110 }; > +/* Expected results for vcvt_high_f16_f32. */ > +VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, > 0x4000, > +0xcc00, 0xcb80, 0xcb00, > 0xca80 }; > + > +void > +exec_vcvt (void) > +{ > + clean_results (); > + > +#define TEST_MSG vcvt_f32_f16 > + { > +VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 }; > + > +DECL_VARIABLE (vector_src, float, 16, 4); > + > +VLOAD (vector_src, buffer_src, , float, f, 16, 4); > +DECL_VARIABLE (vector_res, float, 32, 4) = > + vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4)); > +vst1q_f32 (VECT_VAR (result, float, 32, 4), > + VECT_VAR (vector_res, float, 32, 4)); > + > +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, ""); > + } > +#undef TEST_MSG > + > + clean_results (); > + > +#define TEST_MSG vcvt_f16_f32 > + { > +VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 }; > +DECL_VARIABLE (vector_src, float, 32, 4); > + > +VLOAD (vector_src, buffer_src, q, float, f, 32, 4); > +DECL_VARIABLE (vector_res, float, 16, 4) = > + vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4)); > +vst1_f16 (VECT_VAR (result, float, 16, 4), > + VECT_VAR (vector_res, float, 16 ,4)); > + > +CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, ""); > + } > +#undef TEST_MSG > + > +#if defined (__aarch64__) > + clean_results (); > + > +#define TEST_MSG "vcvt_high_f32_f16" > + { > +DECL_VARIABLE (vector_src, float, 16, 8); > +VLOAD (vector_src, buffer, q, float, f, 16, 8); > +DECL_VARIABLE (vector_res, float, 32, 4); > +VECT_VAR (vector_res, float, 32, 4) = > + vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8)); > +vst1q_f32 (VECT_VAR (result, float, 32, 4), > + VECT_VAR (vector_res, float, 32, 4)); > +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, ""); > + } > +#undef TEST_MSG > + clean_results (); > + > +#define TEST_MSG "vcvt_high_f16_f32" > + { > +DECL_VARIABLE (vector_low, float, 16, 4); > +VDUP (vector_low, , float, f, 16, 4, 2.0); > + > +DECL_VARIABLE (vector_src, float, 32, 4); > +VLOAD (vector_src, buffer, q, float, f, 32, 4); > + > +DECL_VARIABLE (vector_res, float, 
16, 8) = > + vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4), > +VECT_VAR (vector_src, float, 32, 4)); > +vst1q_f16 (VECT_VAR (result, float, 16, 8), > + VECT_VAR (vector_res, float, 16, 8)); > + > +CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, ""); > + } > +#endif > +} > + > +int > +main (void) > +{ > + exec_vcvt (); > + return 0; > +} > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 9aec02d..0a22c95 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -2730,6 +2730,21 @@ proc check_effective_target_arm_neon_fp16_ok { } { > check_effective_target_arm_neon_fp16_ok_nocache] > } > > +proc check_effective_target_arm_neon_fp16_hw_ok { } { > +if {! [check_effective_target_arm_neon_fp16_ok] } { > + return 0 > +} > +global et_arm_neon_fp16_flags > +check_runtime_nocache arm_neon_fp16_hw_ok { > + int
[hsa] Fix omp declare target support
Hi, it was brought to my attention that omp declare target functions were not properly translated to HSA functions. Until the grand scheme with an IPA pass is complete, this will do. And having a single predicate to decide what should be an HSA function cannot be bad. Committed to the hsa branch. Thanks, Martin 2015-08-25 Martin Jambor * hsa.h (hsa_callable_function_p): Declare. * hsa.c (hsa_callable_function_p): New function. * hsa-gen.c (gen_hsa_insns_for_call): Use it. (pass_gen_hsail::execute): Likewise. --- gcc/ChangeLog.hsa | 7 +++ gcc/hsa-gen.c | 5 ++--- gcc/hsa.c | 9 + gcc/hsa.h | 1 + 4 files changed, 19 insertions(+), 3 deletions(-) diff --git a/gcc/ChangeLog.hsa b/gcc/ChangeLog.hsa index 4ad8414..922c917 100644 --- a/gcc/ChangeLog.hsa +++ b/gcc/ChangeLog.hsa @@ -1,5 +1,12 @@ 2015-08-25 Martin Jambor + * hsa.h (hsa_callable_function_p): Declare. + * hsa.c (hsa_callable_function_p): New function. + * hsa-gen.c (gen_hsa_insns_for_call): Use it. + (pass_gen_hsail::execute): Likewise. + +2015-08-25 Martin Jambor + * hsa-gen.c (gen_hsa_unaryop_for_builtin): New function. 2015-08-25 Martin Jambor diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index 1e23996..7190dce 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -3255,7 +3255,7 @@ gen_hsa_insns_for_call (gimple stmt, hsa_bb *hbb, return; } - if (lookup_attribute ("hsafunc", DECL_ATTRIBUTES (function_decl))) + if (hsa_callable_function_p (function_decl)) gen_hsa_insns_for_direct_call (stmt, hbb, ssa_map); else if (!gen_hsa_insns_for_known_library_call (stmt, hbb, ssa_map)) sorry ("HSA does support only call for functions with 'hsafunc' " @@ -4102,8 +4102,7 @@ pass_gen_hsail::execute (function *) || lookup_attribute ("hsakernel", DECL_ATTRIBUTES (current_function_decl))) return generate_hsa (true); - else if (lookup_attribute ("hsafunc", -DECL_ATTRIBUTES (current_function_decl))) + else if (hsa_callable_function_p (current_function_decl)) return generate_hsa (false); else return wrap_all_hsa_calls (); diff --git a/gcc/hsa.c b/gcc/hsa.c index 13a2ace..4ad44fe 100644 --- a/gcc/hsa.c +++ b/gcc/hsa.c @@ -103,6 +103,15 @@ hash_table *hsa_global_variable_symbols; /* True if compilation unit-wide data are already allocated and initialized. */ static bool compilation_unit_data_initialized; +/* Return true if FNDECL represents an HSA-callable function. */ + +bool +hsa_callable_function_p (tree fndecl) +{ + return lookup_attribute ("hsafunc", DECL_ATTRIBUTES (fndecl)) +|| lookup_attribute ("omp declare target", DECL_ATTRIBUTES (fndecl)); +} + /* Allocate HSA structures that are are used when dealing with different functions. */ diff --git a/gcc/hsa.h b/gcc/hsa.h index 3956676..f9bcc80 100644 --- a/gcc/hsa.h +++ b/gcc/hsa.h @@ -898,6 +898,7 @@ extern struct hsa_function_representation *hsa_cfun; extern hash_table *hsa_global_variable_symbols; extern hash_map *> *hsa_decl_kernel_dependencies; extern unsigned hsa_kernel_calls_counter; +bool hsa_callable_function_p (tree fndecl); void hsa_init_compilation_unit_data (void); void hsa_deinit_compilation_unit_data (void); bool hsa_machine_large_p (void); -- 2.4.6
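For illustration (not part of the patch), a function declared like this carries the "omp declare target" attribute that hsa_callable_function_p now accepts, so it is translated to an HSA function just like one marked hsafunc:

  #pragma omp declare target
  int
  square (int x)
  {
    return x * x;
  }
  #pragma omp end declare target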
Re: [Patch] Add to the libgfortran/newlib bodge to "detect" ftruncate support in ARM/AArch64/SH
> 2015-08-25 James Greenhalgh > > * configure.ac: Auto-detect newlib function support unless we > know there are issues when configuring for a host. > * configure: Regenerate. Thanks for CC’ing the fortran list. Given that this is newlib-specific code, even though it’s in libgfortran configury, you should decide and commit what’s best. I don’t think we have any newlib expert in the Fortran maintainers. Wait for 48 hours to see if anyone else objects, though. Cheers, FX
[gomp4] another routine test
I've committed this test to check 2-dimensional loops inside a routine. nathan 2015-08-24 Nathan Sidwell * testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: New. Index: testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c === --- testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c (revision 0) +++ testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c (revision 0) @@ -0,0 +1,75 @@ +/* { dg-do run } */ +/* { dg-additional-options "-O1" } */ + +#include +#include + +#define NUM_WORKERS 16 +#define NUM_VECTORS 32 +#define WIDTH 64 +#define HEIGHT 32 + +#define WORK_ID(I,N) \ + (acc_on_device (acc_device_nvidia)\ + ? ({unsigned __r; \ + __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (__r)); \ + __r; }) : (I % N)) +#define VEC_ID(I,N) \ + (acc_on_device (acc_device_nvidia)\ + ? ({unsigned __r; \ + __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (__r)); \ + __r; }) : (I % N)) + +#pragma acc routine worker +void __attribute__ ((noinline)) + WorkVec (int *ptr, int w, int h, int nw, int nv) +{ +#pragma acc loop worker + for (int i = 0; i < h; i++) +#pragma acc loop vector +for (int j = 0; j < w; j++) + ptr[i*w + j] = (WORK_ID (i, nw) << 8) | VEC_ID(j, nv); +} + +int DoWorkVec (int nw) +{ + int ary[HEIGHT][WIDTH]; + int err = 0; + + for (int ix = 0; ix != HEIGHT; ix++) +for (int jx = 0; jx != WIDTH; jx++) + ary[ix][jx] = 0xdeadbeef; + + printf ("spawning %d ...", nw); fflush (stdout); + +#pragma acc parallel num_workers(nw) vector_length (NUM_VECTORS) copy (ary) + { +WorkVec ((int *)ary, WIDTH, HEIGHT, nw, NUM_VECTORS); + } + + for (int ix = 0; ix != HEIGHT; ix++) +for (int jx = 0; jx != WIDTH; jx++) + { + int exp = ((ix % nw) << 8) | (jx % NUM_VECTORS); + + if (ary[ix][jx] != exp) + { + printf ("\nary[%d][%d] = %#x expected %#x", ix, jx, + ary[ix][jx], exp); + err = 1; + } + } + printf (err ? " failed\n" : " ok\n"); + + return err; +} + +int main () +{ + int err = 0; + + for (int W = 1; W <= NUM_WORKERS; W <<= 1) +err |= DoWorkVec (W); + + return err; +}
Expand comment on struct switchstr
This patch, extracted from a larger change on gomp-4_0-branch, expands a comment documenting struct switchstr in gcc.c. Committed. 2015-08-25 Thomas Schwinge Joseph Myers * gcc.c (struct switchstr): Expand comment. Index: gcc.c === --- gcc.c (revision 227171) +++ gcc.c (working copy) @@ -3068,10 +3068,15 @@ SWITCH_LIVE to indicate this switch is true in a conditional spec. SWITCH_FALSE to indicate this switch is overridden by a later switch. SWITCH_IGNORE to indicate this switch should be ignored (used in %
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On reviewing in more detail the changes to pass offloading targets from the driver to libgomp at link time to identify the minimal self-contained pieces that can go to trunk, I found that the use of fnmatch to match against target names was completely unnecessary; the ISO C90 functions strstr and strncmp could be used instead, so avoiding the need to add configure tests for fnmatch. This patch duly removes the use of and configure tests for fnmatch. Will commit to gomp-4_0-branch subject to test results. 2015-08-25 Joseph Myers * plugin/configfrag.ac: Don't test for fnmatch.h or fnmatch. * configure, config.h.in: Regenerate. * target.c [PLUGIN_SUPPORT]: Don't include . (offload_target_to_plugin_name): Use strstr and strncmp instead of fnmatch. Index: libgomp/config.h.in === --- libgomp/config.h.in (revision 227169) +++ libgomp/config.h.in (working copy) @@ -24,12 +24,6 @@ /* Define to 1 if you have the header file. */ #undef HAVE_DLFCN_H -/* Define to 1 if you have the `fnmatch' function. */ -#undef HAVE_FNMATCH - -/* Define to 1 if you have the header file. */ -#undef HAVE_FNMATCH_H - /* Define to 1 if you have the `getloadavg' function. */ #undef HAVE_GETLOADAVG Index: libgomp/target.c === --- libgomp/target.c(revision 227169) +++ libgomp/target.c(working copy) @@ -41,7 +41,6 @@ #ifdef PLUGIN_SUPPORT #include -#include #include "plugin-suffix.h" #endif @@ -1271,9 +1270,9 @@ static const char * offload_target_to_plugin_name (const char *offload_target) { - if (fnmatch ("*-intelmic*", offload_target, 0) == 0) + if (strstr (offload_target, "-intelmic") != NULL) return "intelmic"; - if (fnmatch ("nvptx*", offload_target, 0) == 0) + if (strncmp (offload_target, "nvptx", 5) == 0) return "nvptx"; gomp_fatal ("Unknown offload target: %s", offload_target); } Index: libgomp/configure === --- libgomp/configure (revision 227169) +++ libgomp/configure (working copy) @@ -15119,33 +15119,6 @@ offload_targets= plugin_support=yes -for ac_header in fnmatch.h -do : - ac_fn_c_check_header_mongrel "$LINENO" "fnmatch.h" "ac_cv_header_fnmatch_h" "$ac_includes_default" -if test "x$ac_cv_header_fnmatch_h" = x""yes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_FNMATCH_H 1 -_ACEOF - -else - plugin_support=no -fi - -done - -for ac_func in fnmatch -do : - ac_fn_c_check_func "$LINENO" "fnmatch" "ac_cv_func_fnmatch" -if test "x$ac_cv_func_fnmatch" = x""yes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_FNMATCH 1 -_ACEOF - -else - plugin_support=no -fi -done - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl" >&5 $as_echo_n "checking for dlsym in -ldl... " >&6; } if test "${ac_cv_lib_dl_dlsym+set}" = set; then : Index: libgomp/plugin/configfrag.ac === --- libgomp/plugin/configfrag.ac(revision 227169) +++ libgomp/plugin/configfrag.ac(working copy) @@ -29,8 +29,6 @@ offload_targets= AC_SUBST(offload_targets) plugin_support=yes -AC_CHECK_HEADERS([fnmatch.h], , [plugin_support=no]) -AC_CHECK_FUNCS([fnmatch], , [plugin_support=no]) AC_CHECK_LIB(dl, dlsym, , [plugin_support=no]) if test x"$plugin_support" = xyes; then AC_DEFINE(PLUGIN_SUPPORT, 1, -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, rs6000] Fix vec_shr define_expand
On Tue, Aug 25, 2015 at 10:14 AM, Pat Haugen wrote: > The following patch fixes the vec_shr expander to do a shift instead of a > rotate. CPU2006 benchmark 482.sphinx3 recently started failing due to this > issue. Bootstrapped and tested on ppc64/ppc64le with no new regressions. Ok > for trunk? And ok for 4.9/5 (with equivalent change to vec_shl expander > which exists in those releases) after bootstrap/regtest? > > -Pat > > > 2015-08-25 Pat Haugen > > * config/rs6000/vector.md (vec_shr_<mode>): Fix to do a shift > instead of a rotate. > > gcc/testsuite: > * gcc.target/powerpc/vec-shr.c: New. This is okay. As Peter and I noticed + zero_reg = gen_reg_rtx(<MODE>mode); This needs a space after gen_reg_rtx. Thanks, David
PING: PATCH: Mention --enable-default-pie in gcc-6/changes.html
On Thu, May 28, 2015 at 6:49 AM, H.J. Lu wrote: > OK to install? > > H.J. > --- > Index: gcc-6/changes.html > === > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v > retrieving revision 1.10 > diff -u -p -r1.10 changes.html > --- gcc-6/changes.html 26 May 2015 10:12:08 - 1.10 > +++ gcc-6/changes.html 28 May 2015 13:49:00 - > @@ -140,8 +140,12 @@ enum { > > > > - > +Other significant improvements > > + > +Added --enable-default-pie configure option to > + generate PIE by default. > + > > > PING. -- H.J.
Re: [testsuite] Clean up effective_target cache
On Aug 25, 2015, at 1:14 AM, Christophe Lyon wrote: > Some subsets of the tests override ALWAYS_CXXFLAGS or > TEST_ALWAYS_FLAGS and perform effective_target support tests using > these modified flags. > This patch adds a new function 'clear_effective_target_cache', which > is called at the end of every .exp file which overrides > ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS. So, a simple English directive somewhere that says, if one changes ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS then they should do a clear_effective_target_cache at the end as the target cache can make decisions based upon the flags, and those decisions need to be redone when the flags change would be nice. I do wonder, do we need to reexamine when setting the flags? I’m thinking of a sequence like: non-thumb default, is_thumb, set flags (thumb), is_thumb. Anyway, safe to punt this until someone discovers it or is reasonable sure it happens. Anyway, all looks good. Ok. > However, I noticed that lib/g++.exp changes ALWAYS_CXXFLAGS, but does > not appear to restore it. In doubt, I didn't change it. Yeah, I examined it. It seems like it might not matter, as anyone setting and unsetting would come in cleared, and if they didn’t, it should be roughly the same exact state, meaning, no clearing necessary. I think it is safe to punt this until someone finds a bug or can see a way that it would matter. I also don’t think it would hurt to clear, if someone wanted to refactor the code a bit and make the clearing and the cleanup a little more automatic. I’m thinking of a RAII style code in which the dtor runs the clear. Not sure if that is even possible in tcl. [ checking ] Nope, maybe not. Oh well.
[PATCH] Don't ICE on invalid weak decl (PR middle-end/67330)
Here we are ICEing on an invalid code: symtab_node::get asserts that it's dealing with a function or a static or external variable, but an invalid decl is rejected too late. So don't try to mark_weak an invalid decl and also don't duplicate the "declared weak after being used" check -- that is already in mark_weak. Perhaps we should also punt if (!TARGET_SUPPORTS_WEAK)? Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-08-25 Marek Polacek PR middle-end/67330 * varasm.c (declare_weak): Return after giving an error. * c-common.c (handle_weak_attribute): Don't check whether the visibility can be changed here. * gcc.dg/weak/weak-18.c: New test. diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c index ff502e5..7691035 100644 --- gcc/c-family/c-common.c +++ gcc/c-family/c-common.c @@ -8328,12 +8328,7 @@ handle_weak_attribute (tree *node, tree name, return NULL_TREE; } else if (VAR_OR_FUNCTION_DECL_P (*node)) -{ - struct symtab_node *n = symtab_node::get (*node); - if (n && n->refuse_visibility_changes) - error ("%+D declared weak after being used", *node); - declare_weak (*node); -} +declare_weak (*node); else warning (OPT_Wattributes, "%qE attribute ignored", name); diff --git gcc/testsuite/gcc.dg/weak/weak-18.c gcc/testsuite/gcc.dg/weak/weak-18.c index e69de29..ebeb4d5 100644 --- gcc/testsuite/gcc.dg/weak/weak-18.c +++ gcc/testsuite/gcc.dg/weak/weak-18.c @@ -0,0 +1,9 @@ +/* PR middle-end/67330 */ +/* { dg-do compile } */ +/* { dg-require-weak "" } */ + +void +f (void) +{ + __attribute__ ((weak)) int a; /* { dg-error "weak declaration of .a. must be public" } */ +} diff --git gcc/varasm.c gcc/varasm.c index 7fa2e7b..d9290a1 100644 --- gcc/varasm.c +++ gcc/varasm.c @@ -5403,7 +5403,10 @@ declare_weak (tree decl) { gcc_assert (TREE_CODE (decl) != FUNCTION_DECL || !TREE_ASM_WRITTEN (decl)); if (! TREE_PUBLIC (decl)) -error ("weak declaration of %q+D must be public", decl); +{ + error ("weak declaration of %q+D must be public", decl); + return; +} else if (!TARGET_SUPPORTS_WEAK) warning (0, "weak declaration of %q+D not supported", decl); Marek
Re: [libgfortran,patch] Remove never-used debugging code
Turns out I missed some of the dead code. And I now also fixed comments and some formatting. libgfortran/runtime/environ.c is now much more readable than before. The patch is still a no-op, in terms of user functionality. OK to commit to trunk? FX unusedcode.ChangeLog Description: Binary data unusedcode.diff Description: Binary data
[nvptx] More gcc testsuite markup
I've committed this to markup more test requirements. Most are obvious enough. nvptx doesn't expose a normal stack, so stack-based tests fail. It also requires correct typing on function calls, so lying about that results in assembler errors. Finally, it doesn't accept string constants, requiring expansion to an array of ints. nathan 2015-08-25 Nathan Sidwell * gcc.dg/20001117-1.c: Needs return_address. * gcc.dg/20020415-1.c: Needs alloca. * gcc.dg/graphite/id-pr44676.c: Needs profiling. * gcc.dg/graphite/pr60979.c: Needs nonlocal_goto. * gcc.dg/pr63186.c: Needs label_values. * gcc.dg/torture/pr33848.c: Likewise. * lib/target-supports.exp (check_effective_target_fopenacc, check_effective_target_fopenmp): Disable for nvptx. * gcc.dg/graphite/run-id-pr47653.c: Disable for nvptx. * gcc.dg/stack-usage-1.c: Likewise. * gcc.dg/stack-usage-2.c: Likewise. * gcc.dg/unused-5.c: Likewise. * gcc.dg/unwind-1.c: Likewise. Index: gcc.dg/20001117-1.c === --- gcc.dg/20001117-1.c (revision 227166) +++ gcc.dg/20001117-1.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O2 -finstrument-functions" } */ +/* { dg-require-effective-target return_address } */ extern void abort (void); extern void exit (int); Index: gcc.dg/20020415-1.c === --- gcc.dg/20020415-1.c (revision 227166) +++ gcc.dg/20020415-1.c (working copy) @@ -1,9 +1,9 @@ /* PR target/6303 This testcase ICEd because s390 did not define ASM_SIMPLIFY_DWARF_ADDR hook. */ -/* { dg-require-effective-target alloca } */ /* { dg-do compile { target fpic } } */ /* { dg-options "-O2 -fpic -g" } */ +/* { dg-require-effective-target alloca } */ static inline char * bar (unsigned long x, char *y) Index: gcc.dg/graphite/id-pr44676.c === --- gcc.dg/graphite/id-pr44676.c (revision 227166) +++ gcc.dg/graphite/id-pr44676.c (working copy) @@ -1,4 +1,5 @@ /* { dg-options "-O2 -fgraphite-identity -fprofile-generate" } */ +/* { dg-require-profiling "-fprofile-generate" } */ int extend_options (int h, int map, int x, int y, int dx) Index: gcc.dg/graphite/pr60979.c === --- gcc.dg/graphite/pr60979.c (revision 227166) +++ gcc.dg/graphite/pr60979.c (working copy) @@ -1,4 +1,5 @@ /* { dg-options "-O -fgraphite-identity" } */ +/* { dg-require-effective-target nonlocal_goto } */ #include Index: gcc.dg/graphite/run-id-pr47653.c === --- gcc.dg/graphite/run-id-pr47653.c (revision 227166) +++ gcc.dg/graphite/run-id-pr47653.c (working copy) @@ -1,4 +1,6 @@ /* { dg-options "-O -fstack-check=generic -ftree-pre -fgraphite-identity" } */ +/* nvptx doesn't expose a stack. */ +/* { dg-skip-if "" { nvptx-*-* } { "*" } { "" } } */ int main () { Index: gcc.dg/pr63186.c === --- gcc.dg/pr63186.c (revision 227166) +++ gcc.dg/pr63186.c (working copy) @@ -1,5 +1,7 @@ /* { dg-do link } */ /* { dg-options "-O2" } */ +/* { dg-require-effective-target label_values } */ + void *a; int b, c, d; Index: gcc.dg/stack-usage-1.c === --- gcc.dg/stack-usage-1.c (revision 227166) +++ gcc.dg/stack-usage-1.c (working copy) @@ -1,5 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-fstack-usage" } */ +/* nvptx doesn't have a reg allocator, and hence no stack usage data. */ +/* { dg-skip-if "" { nvptx-*-* } { "*" } { "" } } */ /* This is aimed at testing basic support for -fstack-usage in the back-ends. See the SPARC back-end for example (grep flag_stack_usage_info in sparc.c). 
Index: gcc.dg/stack-usage-2.c === --- gcc.dg/stack-usage-2.c (revision 227166) +++ gcc.dg/stack-usage-2.c (working copy) @@ -1,5 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-Wstack-usage=512" } */ +/* nvptx gets very upset with mismatched function types. */ +/* { dg-skip-if "" { nvptx-*-* } { "*" } { "" } } */ int foo1 (void) /* { dg-bogus "stack usage" } */ { Index: gcc.dg/torture/pr33848.c === --- gcc.dg/torture/pr33848.c (revision 227166) +++ gcc.dg/torture/pr33848.c (working copy) @@ -1,4 +1,3 @@ -/* { dg-require-effective-target label_values } */ /* &&foo should be hoisted, but on most targets, excess register pressure forces it to be rematerialized before "data != &&foo". On targets that have a "branch if registers are equal" instruction, this leads to the @@ -7,6 +6,7 @@ say that &&foo was the target of the branch, and the real target would then be removed as dead. */ /* { dg-do link } */ +/* { dg-require-effective-target label_values } */ #define NVARS 3
[libgfortran,committed] Fix default SIGN mode on preconnected/internal units
Preconnected and internal units currently have their sign mode set to SIGN_SUPPRESS, rather than the logical value of SIGN_UNSPECIFIED. This does not matter in most cases, since our chosen processor-dependent behavior is to suppress optional plus signs anyway… … except when one tries to override the default behavior with environment variable GFORTRAN_OPTIONAL_PLUS, which is thus currently broken on internal and preconnected units. Take the following code:

  character(len=20) :: s
  print *, 42.
  write(s,"(G0)") 42.
  print *, s
  end

Without the patch, run with “GFORTRAN_OPTIONAL_PLUS=y”, it will still output:

  42.000
  42.000

while with the patch, it will now correctly output (with GFORTRAN_OPTIONAL_PLUS=y):

  +42.000
  +42.000

I regtested on x86_64-apple-darwin15, and committed as trivial. (I couldn’t figure out how to test that in the testsuite, though.) FX sign.ChangeLog Description: Binary data sign.diff Description: Binary data
[PATCH] Update wwwdocs for --with-advance-toolchain=at
I installed the following patch on wwwdocs to document the --with-advance-toolchain= option I added in June: 2015-08-25 Michael Meissner * changes.html (PowerPC options): Document new configure option --with-advance-toolchain=at. Index: htdocs/gcc-6/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v retrieving revision 1.21 diff -p -c -r1.21 changes.html *** htdocs/gcc-6/changes.html 13 Aug 2015 12:32:16 - 1.21 --- htdocs/gcc-6/changes.html 25 Aug 2015 16:38:13 - *** enum { *** 182,187 --- 182,195 + + A new configuration option --with-advance-toolchain=at + was added for PowerPC 64-bit Linux systems to use the header files, library + files, and the dynamic linker from a specific Advance Toolchain release + instead of the default versions that are provided by the Linux + distribution. In general, this option is intended for the developers of + GCC, and it is not intended for general use. + -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [libgfortran,patch] Remove never-used debugging code
On Tue, Aug 25, 2015 at 06:17:13PM +0200, FX wrote: > Turns out I missed some of the dead code. And I now also fixed comments and > some formatting. > libgfortran/runtime/environ.c is now much more readable than before. > The patch is still a no-op, in terms of user functionality. > OK to commit to trunk? > Certainly, the dead code can go. But, is this changing the library ABI? troutmask:fvwm:kargl[764] nm /mnt/sgk/work/6/lib/libgfortran.a | grep show_ T _gfortrani_show_variables t show_boolean t show_integer t show_sep t show_string T _gfortrani_show_locus -- Steve
Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran"
TL;DR: See last... > From: Ulrich Weigand > Date: Tue, 25 Aug 2015 14:59:05 +0200 > However, the compiler actually does accept -fPIC. If the flag is > present, we attempt to generate relocatable code, but only to the > extent the compiler can do that without support for run-time > relocations. The most significant restriction is that statically > initializing a global variable to a pointer will not work. > (This is useful for some special cases of self-relocating code. > Such code normally can work around this restriction.) Still, things like that are why I chose to emit a hard error for -fPIC/-fpic where it's not supported for *all* code... > Now, with the patch above, libbacktrace is still compiled with > -fPIC on SPU, but some files do in fact contain just such global > initializers, causing compilation to fail: > > gcc-head/src/libbacktrace/elf.c:241:27: error: creating run-time relocation > for '*.LC2' > static const char * const debug_section_names[DEBUG_MAX] = >^ > The other GCC run-time libraries rely on libtool to figure out > that even though -fPIC works, dynamic libraries are still not > supported on the platform, and thus compile everything for > static linking (i.e. without -fPIC). That's not what I see, at least not the "figuring out" part. (They mostly use libtool as-is; some test tuples, but some test version-script support and add it then.) > I'm wondering if we couldn't use the same libtool mechanism here: > if the architecture does not support dynamic linking at all, no > target library will be built as shared library, and thus there is > no need to build libbacktrace with -fPIC either. (My understanding > is that we need to build libbacktrace with -fPIC because it might > get linked into some other shared target library.) Yes, that's what the comment in the patch context says, as happens for libgfortran. > The libbacktrace configure script actually incorporates all the > libtool init code that makes this determination, and sets the > shell variable "can_build_shared" to "no" on SPU. Would it be > valid to use this variable in the test whether to use -fPIC? > (I'm not sure which of the many libtool variables are intended > to be used outside, and which are private ...) I momentarily pondered this too, when I found the libtool PIC-test-code grepping libtool/configure, but I chose the simpler TRY_COMPILE test partly for the same maybe-internal-variable reason. A visit to the libtool documentation shows can_build_shared is for some reason not listed among the documented variables and a STFW doesn't yield more information for the first few pages of hits (just some indexed random libtool copies). On the other hand, libtool.m4 is in the top directory, so we know if we switch to some version without can_build_shared. I'll leave that to you to sort out, but if you choose to use $can_build_shared, consider also setting PIC_FLAG to $pic_flag (instead of plain -fPIC). In the meantime I'll commit my patch as it solves *some* of the breakage; for targets erroring on -fPIC. ...but reading the libtool documentation I think I found a much better solution: Let's just add -prefer-pic when compiling libbacktrace. It leaves everything to libtool. Can you please test this? libbacktrace: * configure.ac: Use libtool option -prefer-pic, not -fPIC. * configure: Regenerate. 
diff -upr /expvol/pp_slask/hp/checkout/gcchead/gcc/libbacktrace/configure.ac libbacktrace/configure.ac --- libbacktrace/configure.ac 2015-05-29 17:23:20.0 +0200 +++ libbacktrace/configure.ac 2015-08-24 17:31:18.0 +0200 @@ -163,10 +163,11 @@ fi # When building as a target library, shared libraries may want to link # this in. We don't want to provide another shared library to -# complicate dependencies. Instead, we just compile with -fPIC. +# complicate dependencies. Instead, we prefer PIC, if the target +# supports that through libtool. PIC_FLAG= if test -n "${with_target_subdir}"; then - PIC_FLAG=-fPIC + PIC_FLAG=-prefer-pic fi # Similarly, use -fPIC with --enable-host-shared: AC_ARG_ENABLE(host-shared, brgds, H-P
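To make the run-time relocation restriction Ulrich describes concrete, here is a minimal sketch (a hypothetical example, not code from libbacktrace) of the kind of initializer that breaks, plus the workaround self-relocating code can use:

  /* Under -fPIC the address of 'buf' is only known at load time, so this
     initializer needs a run-time relocation, which SPU cannot do.  */
  static char buf[16];
  static char *p = buf;

  /* Workaround: compute the address at run time instead of storing it
     in initialized data.  */
  static char *get_p (void) { return buf; }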
[PATCH] rs6000: Fix PR67344
The "*and3_imm_dot_shifted" pattern is a define_insn_and_split, like most "dot" patterns: if its output is not assigned cr0 but some other cr reg, it splits to a non-dot insn and a compare. Unfortunately that non-dot insn will clobber cr0 as well. We could add another clobber (with "=X,x"), but then that second alternative is never useful; instead, just remove that second alternative. Bootstrapped and tested on powerpc64-linux; is this okay for trunk? Segher 2015-08-25 Segher Boessenkool PR target/67344 * config/rs6000/rs6000.md (*and3_imm_dot_shifted): Change to a define_insn, remove second alternative. --- gcc/config/rs6000/rs6000.md | 29 - 1 file changed, 8 insertions(+), 21 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 527ad98..2138184 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3037,15 +3037,15 @@ (define_insn_and_split "*and3_imm_mask_dot2" (set_attr "dot" "yes") (set_attr "length" "4,8")]) -(define_insn_and_split "*and3_imm_dot_shifted" - [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y") +(define_insn "*and3_imm_dot_shifted" + [(set (match_operand:CC 3 "cc_reg_operand" "=x") (compare:CC (and:GPR - (lshiftrt:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,r") - (match_operand:SI 4 "const_int_operand" "n,n")) - (match_operand:GPR 2 "const_int_operand" "n,n")) + (lshiftrt:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r") + (match_operand:SI 4 "const_int_operand" "n")) + (match_operand:GPR 2 "const_int_operand" "n")) (const_int 0))) - (clobber (match_scratch:GPR 0 "=r,r"))] + (clobber (match_scratch:GPR 0 "=r"))] "logical_const_operand (GEN_INT (UINTVAL (operands[2]) << INTVAL (operands[4])), DImode) @@ -3054,23 +3054,10 @@ (define_insn_and_split "*and3_imm_dot_shifted" && rs6000_gen_cell_microcode" { operands[2] = GEN_INT (UINTVAL (operands[2]) << INTVAL (operands[4])); - if (which_alternative == 0) -return "andi%e2. %0,%1,%u2"; - else -return "#"; + return "andi%e2. %0,%1,%u2"; } - "&& reload_completed && cc_reg_not_cr0_operand (operands[3], CCmode)" - [(set (match_dup 0) - (and:GPR (lshiftrt:GPR (match_dup 1) - (match_dup 4)) -(match_dup 2))) - (set (match_dup 3) - (compare:CC (match_dup 0) - (const_int 0)))] - "" [(set_attr "type" "logical") - (set_attr "dot" "yes") - (set_attr "length" "4,8")]) + (set_attr "dot" "yes")]) (define_insn "and3_mask" -- 1.8.1.4
[PATCH] rs6000: Fix PR67346
"*ior_mask" is a define_insn_and_split, so it shouldn't use can_create_pseudo in its instruction condition, because IRA can then create such an instruction, and the condition becomes false before the insn is split. Use a scratch instead. Bootstrapped and tested on powerpc64-linux; okay for trunk? Segher 2015-08-25 Segher Boessenkool PR target/67346 * config/rs6000/rs6000.md (*ior_mask): Use a match_scratch. --- gcc/config/rs6000/rs6000.md | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 2138184..1710b5e 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3652,9 +3652,9 @@ (define_split (define_insn_and_split "*ior_mask" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (ior:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") -(match_operand:GPR 2 "const_int_operand" "n")))] - "can_create_pseudo_p () - && !logical_const_operand (operands[2], mode) +(match_operand:GPR 2 "const_int_operand" "n"))) + (clobber (match_scratch:GPR 3 "=r"))] + "!logical_const_operand (operands[2], mode) && rs6000_is_valid_mask (operands[2], NULL, NULL, mode)" "#" "&& 1" @@ -3669,7 +3669,8 @@ (define_insn_and_split "*ior_mask" { int nb, ne; rs6000_is_valid_mask (operands[2], &nb, &ne, mode); - operands[3] = gen_reg_rtx (mode); + if (GET_CODE (operands[3]) == SCRATCH) +operands[3] = gen_reg_rtx (mode); operands[4] = GEN_INT (ne); operands[5] = GEN_INT (~UINTVAL (operands[2])); } -- 1.8.1.4
Re: [libgfortran,patch] Remove never-used debugging code
> Certainly, the dead code can go. But,is this changing the library ABI? > > troutmask:fvwm:kargl[764] nm /mnt/sgk/work/6/lib/libgfortran.a | grep show_ > T _gfortrani_show_variables > t show_boolean > t show_integer > t show_sep > t show_string > T _gfortrani_show_locus Nope, none of those functions are actually publicly exported. They are not in gfortran.map, being either static, or having _gfortrani_ prefix which means internal libgfortran use. FX
[gomp4] optimize routine calling
When forking to call a partitioned routine there is no need to propagate local state from the active thread to the forked threads. I've committed this patch to implement that optimization. nathan 2015-08-25 Nathan Sidwell * config/nvptx/nvptx.c (nvptx_emit_forking): Add is_call argument, propagate it into mask. (nvptx_emit_joining): Likewise. (nvptx_expand_call): Move emit_forking call to later. Add is_call argument. (nvptx_expand_oacc_fork, nvptx_expand_oacc_join): Adjust calls. (nvptx_discover_pars): Don't look for predecessor insn in call forks and joins. (nvptx_process_pars): Don't emit propagation code for a call. Index: gcc/config/nvptx/nvptx.c === --- gcc/config/nvptx/nvptx.c (revision 227159) +++ gcc/config/nvptx/nvptx.c (working copy) @@ -1047,16 +1047,16 @@ nvptx_expand_compare (rtx compare) /* Emit forking instructions for MASK. */ static void -nvptx_emit_forking (unsigned mask) +nvptx_emit_forking (unsigned mask, bool is_call) { mask &= (GOMP_DIM_MASK (GOMP_DIM_WORKER) | GOMP_DIM_MASK (GOMP_DIM_VECTOR)); if (mask) { - rtx op = GEN_INT (mask); + rtx op = GEN_INT (mask | (is_call << GOMP_DIM_MAX)); /* Emit fork for worker level. */ - if (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) + if (!is_call && mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) emit_insn (gen_nvptx_fork (op)); emit_insn (gen_nvptx_forked (op)); } @@ -1065,16 +1065,19 @@ nvptx_emit_forking (unsigned mask) /* Emit joining instructions for MASK. */ static void -nvptx_emit_joining (unsigned mask) +nvptx_emit_joining (unsigned mask, bool is_call) { mask &= (GOMP_DIM_MASK (GOMP_DIM_WORKER) | GOMP_DIM_MASK (GOMP_DIM_VECTOR)); if (mask) { - rtx op = GEN_INT (mask); + rtx op = GEN_INT (mask | (is_call << GOMP_DIM_MAX)); - /* Emit joining for all pars. */ - emit_insn (gen_nvptx_joining (op)); + /* Emit joining for all non-call pars to ensure there's a single + predecessor for the block the join insn ends up in. This is + needed for skipping entire loops. */ + if (!is_call) + emit_insn (gen_nvptx_joining (op)); emit_insn (gen_nvptx_join (op)); } } @@ -1135,8 +1138,6 @@ nvptx_expand_call (rtx retval, rtx addre } } - nvptx_emit_forking (parallel); - if (cfun->machine->funtype /* It's possible to construct testcases where we call a variable. See compile/20020129-1.c. stdarg_p will crash so avoid calling it */ @@ -1195,11 +1196,12 @@ nvptx_expand_call (rtx retval, rtx addre write_func_decl_from_insn (func_decls, retval, pat, callee); } } + nvptx_emit_forking (parallel, true); emit_call_insn (pat); if (tmp_retval != retval) emit_move_insn (retval, tmp_retval); - nvptx_emit_joining (parallel); + nvptx_emit_joining (parallel, true); } /* Expand the oacc fork & join primitive into ptx-required unspecs. */ @@ -1207,13 +1209,13 @@ nvptx_expand_call (rtx retval, rtx addre void nvptx_expand_oacc_fork (unsigned mode) { - nvptx_emit_forking (GOMP_DIM_MASK (mode)); + nvptx_emit_forking (GOMP_DIM_MASK (mode), false); } void nvptx_expand_oacc_join (unsigned mode) { - nvptx_emit_joining (GOMP_DIM_MASK (mode)); + nvptx_emit_joining (GOMP_DIM_MASK (mode), false); } /* Expander for reduction locking and unlocking. 
We expect SRC to be @@ -2611,7 +2613,8 @@ nvptx_discover_pars (bb_insn_map_t *map) l = new parallel (l, mask); l->forked_block = block; l->forked_insn = end; - if (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) + if (!(mask & GOMP_DIM_MASK (GOMP_DIM_MAX)) + && (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))) l->fork_insn = nvptx_discover_pre (block, CODE_FOR_nvptx_fork); } @@ -2626,7 +2629,8 @@ nvptx_discover_pars (bb_insn_map_t *map) gcc_assert (l->mask == mask); l->join_block = block; l->join_insn = end; - if (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) + if (!(mask & GOMP_DIM_MASK (GOMP_DIM_MAX)) + && (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))) l->joining_insn = nvptx_discover_pre (block, CODE_FOR_nvptx_joining); l = l->parent; @@ -3013,7 +3017,9 @@ nvptx_process_pars (parallel *par) inner_mask |= par->inner_mask; } - if (par->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) + if (par->mask & GOMP_DIM_MASK (GOMP_DIM_MAX)) +{ /* No propagation needed for a call. */ } + else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) { nvptx_wpropagate (false, par->forked_block, par->forked_insn); nvptx_wpropagate (true, par->forked_block, par->fork_insn);
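A hedged sketch of the mask encoding as read from this patch: bits below GOMP_DIM_MAX select the partitioning axes, and the bit at position GOMP_DIM_MAX (set via is_call << GOMP_DIM_MAX) distinguishes a routine call from an OpenACC loop, which is what nvptx_discover_pars and nvptx_process_pars test for.

  unsigned loop_mask = GOMP_DIM_MASK (GOMP_DIM_WORKER);   /* loop fork */
  unsigned call_mask = loop_mask | (1 << GOMP_DIM_MAX);   /* call fork */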
Re: [PATCH] rs6000: Fix PR67344
On Tue, Aug 25, 2015 at 1:08 PM, Segher Boessenkool wrote: > The "*and3_imm_dot_shifted" pattern is a define_insn_and_split, > like most "dot" patterns: if its output is not assigned cr0 but some > other cr reg, it splits to a non-dot insn and a compare. > > Unfortunately that non-dot insn will clobber cr0 as well. We could > add another clobber (with "=X,x"), but then that second alternative > is never useful; instead, just remove that second alternative. > > Bootstrapped and tested on powerpc64-linux; is this okay for trunk? > > > Segher > > > 2015-08-25 Segher Boessenkool > > PR target/67344 > * config/rs6000/rs6000.md (*and3_imm_dot_shifted): Change to > a define_insn, remove second alternative. Okay. Thanks, David
Re: [PATCH] rs6000: Fix PR67346
On Tue, Aug 25, 2015 at 1:09 PM, Segher Boessenkool wrote: > "*ior_mask" is a define_insn_and_split, so it shouldn't use > can_create_pseudo in its instruction condition, because IRA can then > create such an instruction, and the condition becomes false before > the insn is split. Use a scratch instead. > > Bootstrapped and tested on powerpc64-linux; okay for trunk? > > > Segher > > > 2015-08-25 Segher Boessenkool > > PR target/67346 > * config/rs6000/rs6000.md (*ior_mask): Use a match_scratch. Okay. Thanks, David
[gomp4.1] comment some stuff
I'm obviously not smart enough to understand libgomp's tasking runtime, and rth and you get 0 for commenting skills ;-). I had some notes scribbled down while reading the code, and figured someone else might read this code some day. It's still in dire need of commenting, but this mildly helps. OK for branch? commit 5fc2816946c9250c4cca43d002b364b2d6400919 Author: Aldy Hernandez Date: Tue Aug 25 10:32:48 2015 -0700 * env.c: Make gomp_max_task_priority_var static. * libgomp.h (struct gomp_task_depend_entry): Add comment. * task.c (gomp_clear_parent): Document function. (GOMP_task): Same. (gomp_task_run_pre): Add comments. (gomp_task_run_post_handle_dependers): Same. (gomp_task_run_post_remove_parent): Same. (gomp_task_run_post_remove_taskgroup): Same. (GOMP_taskwait): Same. (gomp_task_maybe_wait_for_dependencies): Same. diff --git a/libgomp/env.c b/libgomp/env.c index 65a6851..0569521 100644 --- a/libgomp/env.c +++ b/libgomp/env.c @@ -69,7 +69,7 @@ struct gomp_task_icv gomp_global_icv = { unsigned long gomp_max_active_levels_var = INT_MAX; bool gomp_cancel_var = false; -int gomp_max_task_priority_var = 0; +static int gomp_max_task_priority_var = 0; #ifndef HAVE_SYNC_BUILTINS gomp_mutex_t gomp_managed_threads_lock; #endif diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 9031649..3d705ef 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -279,6 +279,7 @@ struct gomp_task_depend_entry struct gomp_task_depend_entry *next; struct gomp_task_depend_entry *prev; struct gomp_task *task; + /* Depend entry is of type "IN". */ bool is_in; bool redundant; bool redundant_out; diff --git a/libgomp/task.c b/libgomp/task.c index f2a0fae..7c7bae4 100644 --- a/libgomp/task.c +++ b/libgomp/task.c @@ -92,6 +92,8 @@ gomp_end_task (void) thr->task = task->parent; } +/* Orphan the task in CHILDREN and all its siblings. */ + static inline void gomp_clear_parent (struct gomp_task *children) { @@ -110,7 +112,12 @@ static void gomp_task_maybe_wait_for_dependencies (void **depend); /* Called when encountering an explicit task directive. If IF_CLAUSE is false, then we must not delay in executing the task. If UNTIED is true, - then the task may be executed by any member of the team. */ + then the task may be executed by any member of the team. + + DEPEND is an array containing: + depend[0]: number of depend elements. + depend[1]: number of depend elements of type "out". + depend[N+2]: address of [0..N]th depend element. */ void GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *), @@ -444,8 +451,10 @@ gomp_task_run_pre (struct gomp_task *child_task, struct gomp_task *parent, { if (parent) { + /* Remove child_task from parent. */ if (parent->children == child_task) parent->children = child_task->next_child; + if (__builtin_expect (child_task->parent_depends_on, 0) && parent->taskwait->last_parent_depends_on == child_task) { @@ -456,8 +465,10 @@ gomp_task_run_pre (struct gomp_task *child_task, struct gomp_task *parent, parent->taskwait->last_parent_depends_on = NULL; } } + /* Remove child_task from taskgroup. 
*/ if (taskgroup && taskgroup->children == child_task) taskgroup->children = child_task->next_taskgroup; + child_task->prev_queue->next_queue = child_task->next_queue; child_task->next_queue->prev_queue = child_task->prev_queue; if (team->task_queue == child_task) @@ -528,6 +539,7 @@ gomp_task_run_post_handle_dependers (struct gomp_task *child_task, if (parent->taskwait && parent->taskwait->last_parent_depends_on && !task->parent_depends_on) { + /* Put task in last_parent_depends_on. */ struct gomp_task *last_parent_depends_on = parent->taskwait->last_parent_depends_on; task->next_child = last_parent_depends_on->next_child; @@ -535,6 +547,7 @@ gomp_task_run_post_handle_dependers (struct gomp_task *child_task, } else { + /* Put task at the top of the sibling list. */ task->next_child = parent->children; task->prev_child = parent->children->prev_child; parent->children = task; @@ -544,6 +557,7 @@ gomp_task_run_post_handle_dependers (struct gomp_task *child_task, } else { + /* Put task in the sibling list. */ task->next_child = task; task->prev_child = task; parent->children = task; @@ -628,12 +642,18 @@ gomp_task_run_post_handle_depend (struct gomp_task *child_task, return gomp_task_run_post_handle_dependers (child_task, team); } +/* Remove CHILD_TASK from its parent. */ + static inline void gomp_task_
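To make the new GOMP_task comment concrete, here is a hedged sketch of the depend array a compiler might pass for "depend(out: x) depend(in: y)"; the out-before-in ordering of the addresses is an assumption of this illustration:

  extern int x, y;
  void *depend[4];
  depend[0] = (void *) 2;  /* total number of depend elements */
  depend[1] = (void *) 1;  /* how many of them are "out"/"inout" */
  depend[2] = &x;          /* addresses follow, "out" entries first */
  depend[3] = &y;          /* then the "in" entries */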
Re: [gomp4.1] comment some stuff
On 08/25/2015 10:35 AM, Aldy Hernandez wrote: -int gomp_max_task_priority_var = 0; +static int gomp_max_task_priority_var = 0; Sorry I snuck that in there. The variable is unused elsewhere, might as well make it static. Aldy
[gomp-4.1] fix incorrect memory size in goacc_new_thread
This is either blatantly wrong or subtly correct, in which case it needs a comment. My guess is the former. OK for branch? commit 330391636113ed9a9067e6eb639755fb0f4723dc Author: Aldy Hernandez Date: Tue Aug 25 10:41:28 2015 -0700 * oacc-init.c (goacc_new_thread): Use correct size of goacc_thread when allocating memory. diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c index c91731e..e6d2c03 100644 --- a/libgomp/oacc-init.c +++ b/libgomp/oacc-init.c @@ -312,7 +312,7 @@ acc_shutdown_1 (acc_device_t d) static struct goacc_thread * goacc_new_thread (void) { - struct goacc_thread *thr = gomp_malloc (sizeof (struct gomp_thread)); + struct goacc_thread *thr = gomp_malloc (sizeof (struct goacc_thread)); #if defined HAVE_TLS || defined USE_EMUTLS goacc_tls_data = thr;
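A hedged sketch of an idiom that makes this class of bug harder to write (an illustration, not part of the patch): size the allocation from the pointer being assigned rather than repeating a type name.

  struct goacc_thread *thr = gomp_malloc (sizeof (*thr));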
Re: Fix libbacktrace -fPIC breakage from "Use libbacktrace in libgfortran"
Hans-Peter Nilsson wrote: > > From: Ulrich Weigand > > Date: Tue, 25 Aug 2015 14:59:05 +0200 > > > The other GCC run-time libraries rely on libtool to figure out > > that even though -fPIC works, dynamic libraries are still not > > supported on the platform, and thus compile everything for > > static linking (i.e. without -fPIC). > > That's not what I see, at least not the "figuring out" part. > (They mostly use libtool as-is; some test tuples, but some test > version-script support and add it then.) Well, the "figuring out" is implicit; because libtool knows the platform does not support dynamic linking, it defaults to --disable-shared, which means only static libraries are being built; and the default compile option when building static libraries does not use -fPIC. > I'll leave that to you to sort out, but if you chose to use > $can_build_shared, consider also setting PIC_FLAG to $pic_flag > (instead of plain -fPIC). In the meantime I'll commit my patch > as it solves *some* of the breakage; for targets erroring on -fPIC. > > ...but reading the libtool documentation I think I found a much > better solution: Let's just add -prefer-pic when compiling > libbacktrace. It leaves everything to libtool. Can you please > test this? Hmm, reading the documentation, an even simpler version with equivalent effect to yours would be just adding the pic-only option when calling LT_INIT. However, neither works for the SPU, because in both cases libtool will only test whether the target supports the -fPIC option. It will not test whether the target supports dynamic libraries. [ It will do that test; and default to --disable-shared on SPU. That is a no-op for libbacktrace however, since it calls LT_INIT with the disable-shared option anyway. When adding back the -fPIC flag due to either the pic-only LT_INIT option or the -prefer-pic libtool command line option, it does not check for that again. ] Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [libgfortran,patch] Remove never-used debugging code
On Tue, Aug 25, 2015 at 07:10:23PM +0200, FX wrote: > > Certainly, the dead code can go. But,is this changing the library ABI? > > > > troutmask:fvwm:kargl[764] nm /mnt/sgk/work/6/lib/libgfortran.a | grep show_ > > T _gfortrani_show_variables > > t show_boolean > > t show_integer > > t show_sep > > t show_string > > T _gfortrani_show_locus > > Nope, none of those functions are actually publicly exported. > They are not in gfortran.map, being either static, or having > _gfortrani_ prefix which means internal libgfortran use. > OK. Just checking. Thanks for the code cleanup. -- Steve
RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.
> -Original Message- > From: Andrew Bennett [mailto:andrew.benn...@imgtec.com] > Sent: Tuesday, July 21, 2015 10:15 AM > To: gcc-patches@gcc.gnu.org > Cc: Matthew Fortune; Moore, Catherine > Subject: [PATCH] MIPS: If a test in the MIPS testsuite requires standard > library support check the sysroot supports the required test options. > > Hi, > > The recent changes to the MIPS GCC Linux sysroot > (https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01014.html) have meant > that the include directory is now not global and is provided only for each > multi-lib configuration. This means that for any test in the MIPS GCC > Testsuite that requires standard library support we need to check if there is > a > multi-lib support for the test options, otherwise it might fail to compile. > > This patch adds this support to the testsuite and mips.exp files. Firstly any > test that requires standard library support has the implicit option > "(REQUIRES_STDLIB)" added to its dg-options. Secondly in mips.exp a pre- > processor check is performed to ensure that when expanding a testcase > containing a "#include " using the current set of test options we do > not get file not found errors. If this happens we mark the testcase as > unsupported. > > The patch has been tested on the mti/img elf/linux-gnu toolchains, and there > have been no new regressions. > > The patch and ChangeLog are below. > > Ok to commit? > > Yes. This looks good.
Re: Indirect jumps
On 08/25/2015 08:11 AM, Nathan Sidwell wrote: Ptx is one of those rare (unique?) machines that doesn't have an indirect branch. optabs is prepared for such a target and emits a sorry when an indirect branch is needed. However it then goes on to try and emit such an instruction and ends up ICEing. Fixed thusly, ok? (Or is the right solution to define a dummy indirect branch in the PTX md file?) I think we're trying to generally get away from dummy patterns. We could emulate by creating a new stack frame and shoving the target of the branch into the stack, then executing a return. However, I don't think that's worth doing ;-) I think the patch is fine for the trunk. jeff
[gomp4] add reduction lock initializer
Cesar discovered another quirk of PTX. In spite of PTX documenting that static variables can be initialized and default to zero, there's a little note that it doesn't work for .shared variables. Thus we need code to initialize the worker lock variable used for reductions. This implements a new internal function 'IFN_GOACC_LOCK_INIT', with the same arguments as the LOCK and UNLOCK functions. The intent is that it is emitted at the reduction setup point and expands to target-specific code. For PTX it's deleted for everything but worker level, and for that we expand to an initialization of the lock variable. We can simply use the same insn as the unlocker, but I renamed it to be less confusing. nathan 2015-08-25 Nathan Sidwell * targhooks.h (default_goacc_lock_unlock): Rename to ... (default_goacc_lock): ... here. Adjust. * config/nvptx/nvptx.md (oacc_expand_lock, oacc_expand_unlock): Adjust call to lock expander. (oacc_expand_lock_init): New. (nvptx_spinlock, nvptx_spinunlock): Rename to ... (nvptx_spin_lock, nvptx_spin_reset): ... here. * config/nvptx/nvptx.c (nvptx_expand_oacc_lock_unlock): Rename to ... (nvptx_expand_oacc_lock): ... here. Deal with init too. (nvptx_xform_lock_unlock): Rename to ... (nvptx_xform_lock): ... here. Deal with init too. (TARGET_GOACC_LOCK_UNLOCK): Replace with ... (TARGET_GOACC_LOCK): ... this. * omp-low.c (execute_oacc_transform): Deal with IFN_GOACC_LOCK_INIT. (default_goacc_lock_unlock): Rename to ... (default_goacc_lock): ... here. Deal with init too. * internal-fn.c (expand_GOACC_LOCK_INIT): New. * internal-fn.def (GOACC_LOCK_INIT): New. * doc/tm.texi.in (TARGET_GOACC_LOCK_UNLOCK): Replace with ... (TARGET_GOACC_LOCK): ... this. * doc/tm.texi: Rebuilt. * target.def (goacc lock_unlock): Replace with ... (goacc lock): ... this. Deal with init too. Index: gcc/targhooks.h === --- gcc/targhooks.h (revision 227174) +++ gcc/targhooks.h (working copy) @@ -110,7 +110,7 @@ extern void default_destroy_cost_data (v extern bool default_goacc_validate_dims (tree, int [], int); extern unsigned default_goacc_dim_limit (unsigned); extern bool default_goacc_fork_join (gimple, const int [], bool); -extern bool default_goacc_lock_unlock (gimple, const int [], bool); +extern bool default_goacc_lock (gimple, const int [], unsigned); /* These are here, and not in hooks.[ch], because not all users of hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS. 
*/ Index: gcc/config/nvptx/nvptx.md === --- gcc/config/nvptx/nvptx.md (revision 227174) +++ gcc/config/nvptx/nvptx.md (working copy) @@ -1371,7 +1371,7 @@ UNSPECV_LOCK)] "" { - nvptx_expand_oacc_lock_unlock (operands[0], true); + nvptx_expand_oacc_lock (operands[0], 0); DONE; }) @@ -1381,7 +1381,17 @@ UNSPECV_LOCK)] "" { - nvptx_expand_oacc_lock_unlock (operands[0], false); + nvptx_expand_oacc_lock (operands[0], +1); + DONE; +}) + +(define_expand "oacc_lock_init" + [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "") + (match_operand:SI 1 "const_int_operand" "")] + UNSPECV_LOCK)] + "" +{ + nvptx_expand_oacc_lock (operands[0], -1); DONE; }) @@ -1592,8 +1602,8 @@ "" "membar%B0;") -;; spinlock and unlock -(define_insn "nvptx_spinlock" +;; spin lock and reset +(define_insn "nvptx_spin_lock" [(parallel [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m") (match_operand:SI 1 "const_int_operand" "i")] @@ -1604,7 +1614,7 @@ "" "%4:\\tatom%R1.cas.b32 %2,%0,0,1;setp.ne.u32 %3,%2,0;@%3 bra.uni %4;") -(define_insn "nvptx_spinunlock" +(define_insn "nvptx_spin_reset" [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m") (match_operand:SI 1 "const_int_operand" "i")] UNSPECV_LOCK) Index: gcc/config/nvptx/nvptx.c === --- gcc/config/nvptx/nvptx.c (revision 227174) +++ gcc/config/nvptx/nvptx.c (working copy) @@ -1220,7 +1220,7 @@ nvptx_expand_oacc_join (unsigned mode) gang or worker level. */ void -nvptx_expand_oacc_lock_unlock (rtx src, bool lock) +nvptx_expand_oacc_lock (rtx src, int direction) { unsigned HOST_WIDE_INT kind; rtx pat; @@ -1230,22 +1230,26 @@ nvptx_expand_oacc_lock_unlock (rtx src, rtx mem = gen_rtx_MEM (SImode, lock_syms[kind]); rtx space = GEN_INT (lock_space[kind]); - rtx barrier = gen_nvptx_membar (GEN_INT (lock_level[kind])); + rtx barrier = NULL_RTX; rtx tmp = gen_reg_rtx (SImode); - if (!lock) + if (direction >= 0) +barrier = gen_nvptx_membar (GEN_INT (lock_level[kind])); + + if (direction > 0) emit_insn (barrier); - if (lock) + if (!direction) { rtx_code_label *label = gen_label_rtx (); LABEL_NUSES (label)++; - pat = gen_nvptx_spinlock (mem, space, tmp, gen_reg_rtx (BImode
Re: [PATCH 1/5] Refactor completely_scalarize_var
On 08/25/2015 05:06 AM, Alan Lawrence wrote: This is a small refactoring/renaming patch, it just moves the call to "completely_scalarize_record" out from completely_scalarize_var, and renames the latter to create_total_scalarization_access. This is because the next patch needs to drop the "_record" suffix and I felt it would be confusing to have both completely_scalarize and completely_scalarize_var. However, it also makes the new function name (create_total_scalarization_access) consistent with the existing code & comment. Bootstrapped + check-gcc on x86_64. gcc/ChangeLog: * tree-sra.c (completely_scalarize_var): Rename to... (create_total_scalarization_access): ... Here. Drop call to completely_scalarize_record. (analyze_all_variable_accesses): Replace completely_scalarize_var with create_total_scalarization_access and completely_scalarize_record. OK. Jeff
Re: [PATCH 2/5] completely_scalarize arrays as well as records
On 08/25/2015 05:06 AM, Alan Lawrence wrote: This changes the completely_scalarize_record path to also work on arrays (thus allowing records containing arrays, etc.). This just required extending the existing type_consists_of_records_p and completely_scalarize_record methods to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both methods so as not to mention 'record'. Bootstrapped + check-gcc on aarch64-none-linux-gnu, arm-none-linux-gnueabihf and x86_64-none-linux-gnu. Have also verified the scan-tree-dump check in the new sra-15.c passes (using a stage 1 compiler only, no execution test) on alpha, hppa, powerpc, sparc, avr and sh. gcc/ChangeLog: * tree-sra.c (type_consists_of_records_p): Rename to... (scalarizable_type_p): ...this, add case for ARRAY_TYPE. (completely_scalarize_record): Rename to... (completely_scalarize): ...this, add ARRAY_TYPE case, move some code to: (scalarize_elem): New. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/sra-15.c: New. --- gcc/testsuite/gcc.dg/tree-ssa/sra-15.c | 38 + gcc/tree-sra.c | 146 ++--- 2 files changed, 135 insertions(+), 49 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c new file mode 100644 index 000..e251058 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c @@ -0,0 +1,38 @@ +/* Verify that SRA total scalarization works on records containing arrays. */ +/* Test skipped for targets with small (often default) MOVE_RATIO. */ ?!? I don't see anything that skips this test for any targets. Presumably this was copied from sra-12.c. I suspect this comment should just be removed. With that comment removed from the testcase, this is OK. jeff
Re: Indirect jumps
On 08/25/15 15:10, Jeff Law wrote: On 08/25/2015 08:11 AM, Nathan Sidwell wrote: We could emulate by creating a new stack frame and shoving the target of the branch into the stack, then executing a return. However, I don't think that's worth doing ;-) And wouldn't work for PTX anyway -- all the return addr handling is hidden away. I think the patch is fine for the trunk. ok.
Go patch committed: Don't crash on erroneous array types
This patch by Chris Manghane fixes the Go frontend to not crash on erroneous array types. This fixes https://golang.org/issue/11546 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 227160) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -81810917af7ba19e1f9f8efc8b1989f7d6419d30 +d6d59d5927c4ea0c02468ebc6a2df431fb64595a The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/types.cc === --- gcc/go/gofrontend/types.cc (revision 226846) +++ gcc/go/gofrontend/types.cc (working copy) @@ -5781,6 +5781,8 @@ Array_type::verify_length() bool Array_type::do_verify() { + if (this->element_type()->is_error_type()) +return false; if (!this->verify_length()) this->length_ = Expression::make_error(this->length_->location()); return true;
Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.
On 08/25/2015 05:06 AM, Alan Lawrence wrote: When SRA completely scalarizes an array, this patch changes the generated accesses from e.g. MEM[(int[8] *)&a + 4B] = 1; to a[1] = 1; This overcomes a limitation in dom2, that accesses to equivalent chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant propagation in the ssa-dom-cse-2.c testcase (after the next patch that makes SRA handle constant-pool loads). I tried to work around this by making dom2's hashable_expr_equal_p less conservative, but found that on platforms without AArch64's vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC, mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8] *)&a] equivalent to a[0], etc.; a complete overhaul of hashable_expr_equal_p seems like a larger task than this patch series. I can't see how to write a testcase for this in C though as direct assignment to an array is not possible; such assignments occur only with constant pool data, which is dealt with in the next patch. It's a general issue that if there's > 1 common way to represent an expression, then DOM will often miss discovery of the CSE opportunity because of the way it hashes expressions. Ideally we'd be moving to a canonical form, but I also realize that in the case of memory references like this, that may not be feasible. It does make me wonder how many CSEs we're really missing due to the two ways to represent array accesses. Bootstrap + check-gcc on x86-none-linux-gnu, arm-none-linux-gnueabihf, aarch64-none-linux-gnu. gcc/ChangeLog: * tree-sra.c (completely_scalarize): Move some code into: (get_elem_size): New. (build_ref_for_offset): Build ARRAY_REF if base is aligned array. --- gcc/tree-sra.c | 110 - 1 file changed, 69 insertions(+), 41 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 08fa8dc..af35fcc 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -957,6 +957,20 @@ scalarizable_type_p (tree type) } } +static bool +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out) Function comment needed. I may have missed it in the earlier patches, but can you please make sure any new functions you created have comments in those as well. Such patches are pre-approved. With the added function comment, this patch is fine. jeff
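A sketch of the equivalences DOM would have to see through (illustrative gimple spellings, assuming 4-byte int and 'int a[8]'):

  MEM[(int[8] *)&a + 4B]   /* offset-based form */   vs.   a[1]
  MEM[(int[8] *)&a]        /* offset zero */         vs.   a[0]

hashable_expr_equal_p hashes these differently, so the CSE opportunity is missed unless SRA emits the ARRAY_REF form directly.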
Re: [RFC 5/5] Always completely replace constant pool entries
On 08/25/2015 05:06 AM, Alan Lawrence wrote: I used this as a means of better-testing the previous changes, as it exercises the constant replacement code a whole lot more. Indeed, quite a few tests are now optimized away to nothing on AArch64... Always pulling in constants, is almost certainly not what we want, but we may nonetheless want something more aggressive than the usual --param, e.g. for the ssa-dom-cse-2.c test. Thoughts welcomed? I'm of the opinion that we have too many knobs already. So I'd perhaps ask whether or not this option is likely to be useful to end users? As for the patch itself, any thoughts on reasonable heuristics for when to pull in the constants? Clearly we don't want the patch as-is, but are there cases we can identify when we want to be more aggressive? jeff
[nvptx] disable another test
I've committed this to avoid tripping over another ptx assembler problem. memcpy is used for block move, and the FUNCTION_DECL for that is created uniquely in expr.c. It's not linked up to any other decl for memcpy. PTX requires declarations for external symbols, so we emit one for the blkmove variant. We also emit a definition in this particular testcase, and the PTX assembler complains that we've said both 'external', and 'definition'. Rather than go through heroics for just this testcase, I skip it. nathan 2015-08-25 Nathan Sidwell * gcc.c-torture/execute/builtins/20010124-1.x: New. Index: gcc.c-torture/execute/builtins/20010124-1.x === --- gcc.c-torture/execute/builtins/20010124-1.x (revision 0) +++ gcc.c-torture/execute/builtins/20010124-1.x (working copy) @@ -0,0 +1,10 @@ +load_lib target-supports.exp + +if [istarget "nvptx-*-*"] { +# This test uses memcpy for block move in the same file as it +# defines it. The two decls are not the same, by design, and we +# end up emitting a definition of memcpy, along with a .extern +# declaration. This confuses the ptx assembler. +return 1 +} +return 0
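A hedged sketch of the conflicting shape of the test (simplified, not the actual testcase):

  /* The file provides its own definition of memcpy...  */
  void *memcpy (void *d, const void *s, __SIZE_TYPE__ n) { /* ... */ }

  struct big { char c[64]; };
  void
  copy (struct big *d, struct big *s)
  {
    *d = *s;  /* ...while the block move here emits an implicit memcpy
                 call whose separately created FUNCTION_DECL leads to a
                 PTX .extern declaration alongside the definition.  */
  }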
Re: [RFC 4/5] Handle constant-pool entries
On 08/25/2015 05:06 AM, Alan Lawrence wrote: This makes SRA replace loads of records/arrays from constant pool entries, with elementwise assignments of the constant values, hence, overcoming the fundamental problem in PR/63679. As a first pass, the approach I took was to look for constant-pool loads as we scanned through other accesses, and add them as candidates there; to build a constant replacement_decl for any such accesses in completely_scalarize; and to use any existing replacement_decl rather than creating a variable in create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but that does not allow addresses of labels, which can still end up in the constant pool.) Feedback as to the approach or how it might be better structured / fitted into SRA, is solicited ;). Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly increases the number of testcases in which this code is exercised! Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, and sh. gcc/ChangeLog: * tree-sra.c (create_access): Scan for uses of constant pool and add to candidates. (subst_initial): New. (scalarize_elem): Build replacement_decl using subst_initial. (create_access_replacement): Use replacement_decl if set. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param sra-max-scalarization-size-Ospeed. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 7 +--- gcc/tree-sra.c| 56 +-- 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index af35fcc..a3ff2df 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write) else ptr = false; + /* FORNOW: scan for uses of constant pool as we go along. */ I'm not sure why you have this marked as FORNOW. If I'm reading all this code correctly, you're lazily adding items from the constant pool into the candidates table when you find they're used. That seems better than walking the entire constant pool adding them all to the candidates. I don't see this as fundamentally wrong or unclean. The question I have is why this differs from the effects of patch #5. That would seem to indicate that there's things we're not getting into the candidate tables with this approach?!? @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) } } +static tree +subst_initial (tree expr, tree var) Function comment. I think this patch is fine with the function comment added and removing the FORNOW part of the comment in create_access. It may be worth noting in create_access's comment that it can add new items to the candidates tables for constant pool entries. Jeff
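A hedged sketch of the transformation this enables (hypothetical source; that the initializer lands in the constant pool is target-dependent):

  struct s { int a, b; };
  static const struct s init = { 1, 2 };  /* placed in the constant pool */

  int
  f (void)
  {
    struct s x = init;   /* before: an aggregate load from *.LC0 */
    return x.a + x.b;
  }

With the patch, SRA replaces the aggregate load with elementwise constants (in effect x$a = 1; x$b = 2;), letting later passes fold f to a constant.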
Re: [PATCH] Don't ICE on invalid weak decl (PR middle-end/67330)
On 08/25/2015 09:44 AM, Marek Polacek wrote: Here we are ICEing on an invalid code: symtab_node::get asserts that it's dealing with a function or a static or external variable, but an invalid decl is rejected too late. So don't try to mark_weak an invalid decl and also don't duplicate the "declared weak after being used" check -- that is already in mark_weak. Perhaps we should also punt if (!TARGET_SUPPORTS_WEAK)? Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-08-25 Marek Polacek PR middle-end/67330 * varasm.c (declare_weak): Return after giving an error. * c-common.c (handle_weak_attribute): Don't check whether the visibility can be changed here. * gcc.dg/weak/weak-18.c: New test. OK. jeff
Re: [testsuite] Clean up effective_target cache
On 08/25/2015 02:14 AM, Christophe Lyon wrote: Hi, Some subsets of the tests override ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS and perform effective_target support tests using these modified flags. In case these flags conflict with the effective_target tests, it means that subsequent tests will be UNSUPPORTED even though ALWAYS_CXXFLAGS/TEST_ALWAYS_FLAGS have been reset and no longer conflict. In practice, we noticed this when running validation under 'ulimit -v XXX', which can conflict with ASAN. We observed that sse2 and stack_protector tests would randomly fail when tested from asan.exp, making non-asan tests UNSUPPORTED. This patch adds a new function 'clear_effective_target_cache', which is called at the end of every .exp file which overrides ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS. I tested it works well for asan.exp on x86_64 but the changes in other .exp files seem mechanical. However, I noticed that lib/g++.exp changes ALWAYS_CXXFLAGS, but does not appear to restore it. In doubt, I didn't change it. OK? OK after a full regression test. While I agree the change is mechanical, there may be interactions that are non-obvious. Jeff
Re: [PATCH 1/2] driver: support state cleanup
On Tue, 2015-08-25 at 13:48 +, Joseph Myers wrote: > On Thu, 6 Aug 2015, David Malcolm wrote: > > > gcc/ChangeLog: > > * gcc-main.c (main): Add params to driver ctor. > > * gcc.c (class env_manager): New. > > (env): New global. > > (env_manager::init): New. > > (env_manager::get): New. > > (env_manager::xput): New. > > (env_manager::restore): New. > > Poison getenv and putenv. > > (DEFAULT_TARGET_SYSTEM_ROOT): New. > > (target_system_root): Update initialization to use > > DEFAULT_TARGET_SYSTEM_ROOT. > > (struct spec_list): Add field "default_ptr". > > (INIT_STATIC_SPEC): Initialize new field "default_ptr". > > (init_spec): Likewise. > > (set_spec): Clear field "default_ptr". > > (read_specs): Free "spec" and "buffer". > > (xputenv): Reimplement in terms of env_manager. > > (process_command): Replace ::getenv calls with calls to the > > env_manager singleton. > > (process_brace_body): Free string in three places. > > (driver::driver): New. > > (driver::~driver): New. > > (used_arg): Convert from a function to... > > (class used_arg_t): ...this class, and... > > (used_arg): ...this new global instance. > > (used_arg_t::finalize): New function. > > (getenv_spec_function): Add "const" to local "value". Replace > > ::getenv call with call to the env_manager singleton. > > (path_prefix_reset): New function. > > (driver::finalize): New function. > > * gcc.h (driver::driver): New. > > (driver::~driver): New. > > (driver::finalize): New. > > OK. Thanks. I've committed the combination of patches 1 and 2 to trunk as r227188, having double-checked that it still bootstraps & passes regression testing (on x86_64-pc-linux-gnu). [some people on the jit list were specifically interested in this patch kit, hence this email].
Go patch committed: Allow string slices with start == len
This patch by Chris Manghane fixes the Go frontend to permit string slice expressions when the start of the slice is the length of the string. These were previously erroneously forbidden when using a constant index. This fixes https://golang.org/issue/11522 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 227184) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -d6d59d5927c4ea0c02468ebc6a2df431fb64595a +14ca4b6130b9a7132d132f418e9ea283b3a52c08 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc(revision 227039) +++ gcc/go/gofrontend/expressions.cc(working copy) @@ -10341,7 +10341,10 @@ String_index_expression::do_check_types( { ival_valid = true; if (mpz_sgn(ival) < 0 - || (sval_valid && mpz_cmp_ui(ival, sval.length()) >= 0)) + || (sval_valid + && (this->end_ == NULL + ? mpz_cmp_ui(ival, sval.length()) >= 0 + : mpz_cmp_ui(ival, sval.length()) > 0))) { error_at(this->start_->location(), "string index out of bounds"); this->set_is_error();
Re: [PATCH] [ping] Use single shared memory block pool for all pool allocators
Mikhail Maltsev writes: > Hi, all. > I'm pinging this patch: > https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00030.html A week and no reply, so FWIW: I really hope this goes in. malloc always shows up high in my profiles too. Just a couple of very minor things to show that I read the patch :-) > diff --git a/gcc/virtual-memory.cc b/gcc/virtual-memory.cc > new file mode 100644 > index 000..69bda37 > --- /dev/null > +++ b/gcc/virtual-memory.cc > @@ -0,0 +1,66 @@ > +/* > + Copyright (C) 2015 Free Software Foundation, Inc. Missing a description line. Same for the .h. > +/* Free all memory allocated by memory_block_pool. */ Strictly speaking I think this is "all unused memory". > + size_t size = reinterpret_cast<_obstack_chunk *> (chunk)->limit - > +reinterpret_cast<char *> (chunk); The emacs formatting rule would make this: size_t size = (reinterpret_cast<_obstack_chunk *> (chunk)->limit - reinterpret_cast<char *> (chunk)); Still not sure what the rule is supposed to be wrt space before "<" or after ">". > +#ifndef VIRTUAL_MEMORY_H > +#define VIRTUAL_MEMORY_H Tab indentation for the #define but not the #ifndef. (Think it should be a space for both.) > +/* Allocate single block. Reuse previously returned block, if possible. */ Pedantic, but: "a single block", "a previously returned block". > +inline void * > +memory_block_pool::allocate () > +{ > + if (instance.m_blocks == NULL) > +return XNEWVEC (char, block_size); > + > + void *result = instance.m_blocks; > + instance.m_blocks = instance.m_blocks->m_next; > + return result; > +} > + > +/* Return UNCAST_BLOCK to pool. */ "to the pool" > +inline void > +memory_block_pool::remove (void *uncast_block) > +{ > + block_list *block = reinterpret_cast<block_list *> (uncast_block); For aliasing purposes, should this instead be a placement new, to show that a new block_list object is being created? > +extern void *mempool_obstack_chunk_alloc(size_t) ATTRIBUTE_MALLOC; > +extern void mempool_obstack_chunk_free(void *); Space before "(". Thanks, Richard
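A minimal sketch of the placement-new variant suggested above (assuming block_list is default-constructible and <new> is included):

  block_list *block = new (uncast_block) block_list;

This makes the start of the new object's lifetime explicit instead of relying on reinterpret_cast.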
Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
On 08/21/2015 10:30 AM, Ilya Enkovich wrote: If we're checking an optab to drive an optimization, then we're probably on the wrong track. That's totally similar to VEC_COND_EXPR which we generate comparison into. It is. The vectorizer is riddled with this stuff. Sigh. So I won't consider this a negative for the scalar mask support. I think this ties into the overall discussion about whether or not to represent these masks in gimple or try to handle them later during gimple->rtl expansion. Currently we don't have any abstraction for masks, it is supported using vector of integers. When I expand it I have no idea whether it is just a vector of integers to be stored or a mask to be used for MASK_LOAD. Actually it may be used in both cases at the same time. Handling it in RTL means we have to undo bool->int transformation made in GIMPLE. For trivial cases it may be easy but in generic it can be challenging. I want to avoid it from the beginning. I wasn't suggesting handling them in RTL, but at the border between gimple and RTL. But if we can't reliably determine how a particular mask is going to be used at that point, then doing things at the gimple/RTL border may be a spectacularly bad idea ;-) jeff
Go patch committed: accept numeric literals with leading zeroes
This patch by Chris Manghane fixes the Go frontend to accept numeric literals with leading zeroes, even if they don't turn out to be octal. This fixes https://golang.org/issue/11532 and https://golang.org/issue/11533 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 227191) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -14ca4b6130b9a7132d132f418e9ea283b3a52c08 +f97d579fa8431af5cfde9b0a48604caabfd09377 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/lex.cc === --- gcc/go/gofrontend/lex.cc(revision 226846) +++ gcc/go/gofrontend/lex.cc(working copy) @@ -1047,7 +1047,7 @@ Lex::gather_number() pnum = p; while (p < pend) { - if (*p < '0' || *p > '7') + if (*p < '0' || *p > '9') break; ++p; } @@ -1060,7 +1060,13 @@ Lex::gather_number() std::string s(pnum, p - pnum); mpz_t val; int r = mpz_init_set_str(val, s.c_str(), base); - go_assert(r == 0); + if (r != 0) +{ + if (base == 8) +error_at(this->location(), "invalid octal literal"); + else +error_at(this->location(), "invalid hex literal"); +} if (neg) mpz_neg(val, val);
Re: [PATCH 1/2] driver: support state cleanup
I've committed as obvious this patch to fix a build failure this introduced for targets not defining EXTRA_SPECS, in which case extra_specs is not declared. (nvptx-none was the target for which I came across such a failure.) 2015-08-25 Joseph Myers * gcc.c (driver::finalize): Only assign to extra_specs if [EXTRA_SPECS]. Index: gcc.c === --- gcc.c (revision 227191) +++ gcc.c (working copy) @@ -9813,7 +9813,9 @@ driver::finalize () } *(sl->ptr_spec) = sl->default_ptr; } +#ifdef EXTRA_SPECS extra_specs = NULL; +#endif processing_spec_function = 0; -- Joseph S. Myers jos...@codesourcery.com
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On 08/21/2015 06:17 AM, Ilya Enkovich wrote: Hmm, I don't see how vector masks are more difficult to operate with. There are just no instructions for that, but you have to pretend you have them to get code vectorized. Also, according to the vector ABI, an integer mask should be used for the mask operand in the case of a masked vector call. What ABI? The function signature of the intrinsics? How would that come into play here? Not intrinsics. I mean OpenMP vector functions, which require an integer argument for a mask in the case of a 512-bit vector. That's what I assumed -- you can pass in a mask as an argument and it's supposed to be a simple integer, right? The current implementation of masked loads, masked stores and bool patterns in the vectorizer just reflects SSE4 and AVX. Can (and should) we really call it a canonical representation for all targets? No idea - we'll revisit when another target adds a similar capability. AVX-512 is such a target. The current representation forces multiple scalar mask -> vector mask and back transformations, which are artificially introduced by the current bool patterns and are hard to optimize out. I'm a bit surprised they're so prevalent and hard to optimize away. ISTM PRE ought to handle this kind of thing with relative ease. Fact is GCC already copes with vector masks generated by vector compares just fine everywhere and I'd rather leave it at that. Nope. Currently a vector mask is obtained from a vec_cond. AND and IOR on bools are also expressed via additional vec_conds. I don't think the vectorizer ever generates a vector comparison. And I wouldn't say it's fine 'everywhere', because there is a single target utilizing them. Masked loads and stores for AVX-512 just don't work now. And if we extend the existing MASK_LOAD and MASK_STORE optabs to 512-bit vectors then we get ugly, inefficient code. The question is where to fight this inefficiency: in RTL or in GIMPLE. I want to fight it where it appears, i.e. in GIMPLE, by preventing bool -> int conversions from being applied everywhere even if the target doesn't need them. You should expect pushback anytime target dependencies are added to gimple, even if it's stuff in the vectorizer, which is infested with target dependencies. If we don't want to support both types of masks in GIMPLE then it's more reasonable to do the bool -> int conversion in expand for targets requiring it, rather than do it for everyone and then leave it to the target to transform it back and try to get rid of all those redundant transformations. I'd give vector a chance to become a canonical mask representation for that. Might be worth some experimentation. Jeff
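For concreteness, the target-level difference the thread keeps circling, shown with Intel intrinsics (a hedged illustration, not GCC-internal code; the AVX-512 part assumes -mavx512f):

#include <immintrin.h>

/* SSE2 flavour: the comparison result is itself a vector, each 32-bit
   lane all-ones or all-zeros -- the form the current bool patterns model.  */
__m128i
vector_mask (__m128i a, __m128i b)
{
  return _mm_cmpgt_epi32 (a, b);
}

/* AVX-512 flavour: the comparison result is a scalar bitmask, one bit
   per lane, living in a mask register -- the form Ilya wants GIMPLE to
   be able to express directly.  */
__mmask16
scalar_mask (__m512i a, __m512i b)
{
  return _mm512_cmpgt_epi32_mask (a, b);
}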
Re: [PATCH 1/5] Refactor completely_scalarize_var
Hi, On Tue, Aug 25, 2015 at 12:06:13PM +0100, Alan Lawrence wrote: > This is a small refactoring/renaming patch, it just moves the call to > "completely_scalarize_record" out from completely_scalarize_var, and renames > the latter to create_total_scalarization_access. > > This is because the next patch needs to drop the "_record" suffix and I felt > it would be confusing to have both completely_scalarize and > completely_scalarize_var. However, it also makes the new function name > (create_total_scalarization_access) consistent with the existing code & > comment. > > Bootstrapped + check-gcc on x86_64. > > gcc/ChangeLog: > > * tree-sra.c (completely_scalarize_var): Rename to... > (create_total_scalarization_access): ... Here. Drop call to > completely_scalarize_record. > > (analyze_all_variable_accesses): Replace completely_scalarize_var > with create_total_scalarization_access and completely_scalarize_record. > --- > gcc/tree-sra.c | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index 818c290..a0c92b0 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -985,7 +985,7 @@ completely_scalarize_record (tree base, tree decl, > HOST_WIDE_INT offset, > type_consists_of_records_p. */ > > static void > -completely_scalarize_var (tree var) > +create_total_scalarization_access (tree var) If you change what the function does, you have to change the comment too. If I am not mistaken, even with the whole patch set applied, the first sentence would still be: "Create total_scalarization accesses for all scalar type fields in VAR and for VAR as a whole." And with this change, only the part after "and" will remain true. Thanks, Martin > { >HOST_WIDE_INT size = tree_to_uhwi (DECL_SIZE (var)); >struct access *access; > @@ -994,8 +994,6 @@ completely_scalarize_var (tree var) >access->expr = var; >access->type = TREE_TYPE (var); >access->grp_total_scalarization = 1; > - > - completely_scalarize_record (var, var, 0, var); > } > > /* Return true if REF has an VIEW_CONVERT_EXPR somewhere in it. */ > @@ -2529,7 +2527,8 @@ analyze_all_variable_accesses (void) > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > <= max_scalarization_size) > { > - completely_scalarize_var (var); > + create_total_scalarization_access (var); > + completely_scalarize_record (var, var, 0, var); > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "Will attempt to totally scalarize "); > -- > 1.8.3 >
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On 08/21/2015 04:49 AM, Ilya Enkovich wrote: I want work with bitmasks to be expressed in a natural way using regular integer operations. Currently all mask manipulations are emulated via vector statements (mostly using a bunch of vec_conds). For complex predicates it may be nontrivial to transform them back to scalar masks and get efficient code. Also, the same vector may be used as both a mask and an integer vector. Things become more complex if you additionally have broadcasts and vector pack/unpack code. That also has to be transformed into scalar mask manipulations somehow. Or why not model the conversion at the gimple level using a CONVERT_EXPR? In fact, the more I think about it, that seems to make more sense to me. We pick a canonical form for the mask, whatever it may be. We use that canonical form and model conversions between it and the other form via CONVERT_EXPR. We then let DOM/PRE find/eliminate the redundant conversions. If they're not up to the task, we should really look into why and resolve it. Yes, that does mean we have two forms, which I'm not terribly happy about, and it means some target dependencies on what the masked vector operation looks like (i.e., does it accept a simple integer or vector mask), but I'm starting to wonder if, as distasteful as I find it, it's the right thing to do. But I don't like changing our IL so much as to allow 'integer' masks everywhere. I'm warming up to that idea... jeff
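The conversions such a CONVERT_EXPR would model are easy to write down explicitly; a sketch of mine of the two directions, just to pin down what the "redundant conversions" to be eliminated look like:

#include <cstdint>

/* Pack a vector-style mask (lanes of 0 or -1) into a scalar bitmask --
   the direction a 512-bit target wants.  */
static uint16_t
vec_to_bits (const int32_t lanes[16])
{
  uint16_t bits = 0;
  for (int i = 0; i < 16; i++)
    if (lanes[i])
      bits |= (uint16_t) 1 << i;
  return bits;
}

/* The inverse, bitmask back to 0/-1 lanes -- the form the existing
   vec_cond-based code expects.  A round trip through these two is
   exactly the redundancy DOM/PRE would have to clean up.  */
static void
bits_to_vec (uint16_t bits, int32_t lanes[16])
{
  for (int i = 0; i < 16; i++)
    lanes[i] = (bits >> i) & 1 ? -1 : 0;
}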
Re: [PATCH 2/5] completely_scalarize arrays as well as records
Hi, On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote: > This changes the completely_scalarize_record path to also work on arrays (thus > allowing records containing arrays, etc.). This just required extending the > existing type_consists_of_records_p and completely_scalarize_record methods > to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both > methods so as not to mention 'record'. Thanks for working on this. I see Jeff has already approved the patch, but I have two comments nevertheless. First, I would be much happier if you added a proper comment to the scalarize_elem function, which you forgot completely. The name is not very descriptive and it has quite a few parameters too. Second, this patch should also fix PR 67283. It would be great if you could verify that and add it to the changelog when committing if that is indeed the case. Thanks, Martin > > Bootstrapped + check-gcc on aarch64-none-linux-gnu, arm-none-linux-gnueabihf > and x86_64-none-linux-gnu. > > Have also verified the scan-tree-dump check in the new sra-15.c passes (using > a stage 1 compiler only, no execution test) on alpha, hppa, powerpc, sparc, > avr and sh. > > gcc/ChangeLog: > > * tree-sra.c (type_consists_of_records_p): Rename to... > (scalarizable_type_p): ...this, add case for ARRAY_TYPE. > > (completely_scalarize_record): Rename to... > (completely_scalarize): ...this, add ARRAY_TYPE case, move some code to: > (scalarize_elem): New. > > gcc/testsuite/ChangeLog: > * gcc.dg/tree-ssa/sra-15.c: New. > --- > gcc/testsuite/gcc.dg/tree-ssa/sra-15.c | 38 + > gcc/tree-sra.c | 146 > ++--- > 2 files changed, 135 insertions(+), 49 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c > b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c > new file mode 100644 > index 000..e251058 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c > @@ -0,0 +1,38 @@ > +/* Verify that SRA total scalarization works on records containing arrays. > */ > +/* Test skipped for targets with small (often default) MOVE_RATIO. */ > +/* { dg-do run } */ > +/* { dg-options "-O1 -fdump-tree-release_ssa --param > sra-max-scalarization-size-Ospeed=32" } */ > + > +extern void abort (void); > + > +struct S > +{ > + char c; > + unsigned short f[2][2]; > + int i; > + unsigned short f3, f4; > +}; > + > + > +int __attribute__ ((noinline)) > +foo (struct S *p) > +{ > + struct S l; > + > + l = *p; > + l.i++; > + l.f[1][0] += 3; > + *p = l; > +} > + > +int > +main (int argc, char **argv) > +{ > + struct S a = {0, { {5, 7}, {9, 11} }, 4, 0, 0}; > + foo (&a); > + if (a.i != 5 || a.f[1][0] != 12) > +abort (); > + return 0; > +} > + > +/* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */ > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index a0c92b0..08fa8dc 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -915,74 +915,122 @@ create_access (tree expr, gimple stmt, bool write) > } > > > -/* Return true iff TYPE is a RECORD_TYPE with fields that are either of > gimple > - register types or (recursively) records with only these two kinds of > fields. > - It also returns false if any of these records contains a bit-field. */ > +/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or ARRAY_TYPE > with > + fields that are either of gimple register types (excluding bit-fields) > + or (recursively) scalarizable types. 
*/ > > static bool > -type_consists_of_records_p (tree type) > +scalarizable_type_p (tree type) > { > - tree fld; > + gcc_assert (!is_gimple_reg_type (type)); > > - if (TREE_CODE (type) != RECORD_TYPE) > -return false; > + switch (TREE_CODE (type)) > + { > + case RECORD_TYPE: > +for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld)) > + if (TREE_CODE (fld) == FIELD_DECL) > + { > + tree ft = TREE_TYPE (fld); > > - for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld)) > -if (TREE_CODE (fld) == FIELD_DECL) > - { > - tree ft = TREE_TYPE (fld); > + if (DECL_BIT_FIELD (fld)) > + return false; > > - if (DECL_BIT_FIELD (fld)) > - return false; > + if (!is_gimple_reg_type (ft) > + && !scalarizable_type_p (ft)) > + return false; > + } > > - if (!is_gimple_reg_type (ft) > - && !type_consists_of_records_p (ft)) > - return false; > - } > +return true; > > - return true; > + case ARRAY_TYPE: > +{ > + tree elem = TREE_TYPE (type); > + if (DECL_P (elem) && DECL_BIT_FIELD (elem)) > + return false; > + if (!is_gimple_reg_type (elem) > + && !scalarizable_type_p (elem)) > + return false; > + return true; > +} > + default: > +return false; > + } > } > > -/* Create total_scalarization accesses for all sca
Re: [PATCH 2/5] completely_scalarize arrays as well as records
On 08/25/2015 03:42 PM, Martin Jambor wrote: Hi, On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote: This changes the completely_scalarize_record path to also work on arrays (thus allowing records containing arrays, etc.). This just required extending the existing type_consists_of_records_p and completely_scalarize_record methods to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both methods so as not to mention 'record'. thanks for working on this. I see Jeff has already approved the patch, but I have two comments nevertheless. First, I would be much happier if you added a proper comment to scalarize_elem function which you forgot completely. The name is not very descriptive and it has quite few parameters too. Right. I mentioned that I missed the lack of function comments when looking at #3 and asked Alan to go back and fix them in #1 and #2. Second, this patch should also fix PR 67283. It would be great if you could verify that and add it to the changelog when committing if that is indeed the case. Excellent. Yes, definitely mention the BZ. jeff
Re: PING: PATCH: Mention --enable-default-pie in gcc-6/changes.html
On Tue, 25 Aug 2015, H.J. Lu wrote: >> +Added --enable-default-pie configure option to >> + generate PIE by default. >> + > PING. How about something more like "The ... configure option enables generation of PIE by default"? That feels more consistent with the rest of the page. Okay with that change. Gerald
Re: [PATCH] Update wwwdocs for --with-advance-toolchain=at
Hi Michael, On Tue, 25 Aug 2015, Michael Meissner wrote: > + A new configuration option > ---with-advance-toolchain=at > + was added for PowerPC 64-bit Linux systems to use the header files, > library should this be GNU/Linux in the above (per guidance from the FSF)? > + files, and the dynamic linker from a specific Advance Toolchain release > + instead of the default versions that are provided by the Linux > + distribution. Same here? Gerald
Re: top-level configure.ac: factor the libgomp check for posix-like OS
Ended up using the same approach as libatomic, moving the checking logic into libgupc/configure.tgt. +# Disable libgupc on unsupported systems. +if test -d ${srcdir}/libgupc; then +if test x$enable_libgupc = x; then + AC_MSG_CHECKING([for libgupc support]) + if (srcdir=${srcdir}/libgupc; \ + . ${srcdir}/configure.tgt; \ + test -n "$UNSUPPORTED") + then + AC_MSG_RESULT([no]) + noconfigdirs="$noconfigdirs target-libgupc" + else + AC_MSG_RESULT([yes]) + fi +fi +fi + Thanks, - Gary
Re: [PATCH] Update wwwdocs for --with-advance-toolchain=at
On Tue, Aug 25, 2015 at 11:58:21PM +0200, Gerald Pfeifer wrote: > Hi Michael, > > On Tue, 25 Aug 2015, Michael Meissner wrote: > > + A new configuration option > > ---with-advance-toolchain=at > > + was added for PowerPC 64-bit Linux systems to use the header files, > > library > > should this be GNU/Linux in the above (per guidance from the FSF)? > > > + files, and the dynamic linker from a specific Advance Toolchain > > release > > + instead of the default versions that are provided by the Linux > > + distribution. > > Same here? > > Gerald I checked in a fix. Thanks. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.
Hi, On Tue, Aug 25, 2015 at 12:06:15PM +0100, Alan Lawrence wrote: > When SRA completely scalarizes an array, this patch changes the > generated accesses from e.g. > >MEM[(int[8] *)&a + 4B] = 1; > > to > >a[1] = 1; > > This overcomes a limitation in dom2, that accesses to equivalent > chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with > accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant > propagation in the ssa-dom-cse-2.c testcase (after the next patch > that makes SRA handle constant-pool loads). > > I tried to work around this by making dom2's hashable_expr_equal_p > less conservative, but found that on platforms without AArch64's > vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC, > mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8] > *)&a] equivalent to a[0], etc.; a complete overhaul of > hashable_expr_equal_p seems like a larger task than this patch > series. Uff. Well, while I am obviously not excited about such a workaround landing in SRA, if maintainers agree that it is reasonable, I suppose I'll manage to live with it. I also have more specific comments: > > I can't see how to write a testcase for this in C though as direct assignment > to an array is not possible; such assignments occur only with constant pool > data, which is dealt with in the next patch. > > Bootstrap + check-gcc on x86-none-linux-gnu, arm-none-linux-gnueabihf, > aarch64-none-linux-gnu. > > gcc/ChangeLog: > > * tree-sra.c (completely_scalarize): Move some code into: > (get_elem_size): New. > (build_ref_for_offset): Build ARRAY_REF if base is aligned array. > --- > gcc/tree-sra.c | 110 > - > 1 file changed, 69 insertions(+), 41 deletions(-) > > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index 08fa8dc..af35fcc 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -957,6 +957,20 @@ scalarizable_type_p (tree type) >} > } > > +static bool > +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out) As Jeff already pointed out, this function needs a comment. 
> +{ > + gcc_assert (TREE_CODE (type) == ARRAY_TYPE); > + tree t_size = TYPE_SIZE (TREE_TYPE (type)); > + if (!t_size || !tree_fits_uhwi_p (t_size)) > +return false; > + unsigned HOST_WIDE_INT sz = tree_to_uhwi (t_size); > + if (!sz) > +return false; > + *sz_out = sz; > + return true; > +} > + > static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree); > > /* Create total_scalarization accesses for all scalar fields of a member > @@ -985,10 +999,9 @@ completely_scalarize (tree base, tree decl_type, > HOST_WIDE_INT offset, tree ref) > case ARRAY_TYPE: >{ > tree elemtype = TREE_TYPE (decl_type); > - tree elem_size = TYPE_SIZE (elemtype); > - gcc_assert (elem_size && tree_fits_uhwi_p (elem_size)); > - int el_size = tree_to_uhwi (elem_size); > - gcc_assert (el_size); > + unsigned HOST_WIDE_INT el_size; > + if (!get_elem_size (decl_type, &el_size)) > + gcc_assert (false); This is usually written as gcc_unreachable () > > tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type)); > tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type)); > @@ -1563,7 +1576,7 @@ build_ref_for_offset (location_t loc, tree base, > HOST_WIDE_INT offset, >tree off; >tree mem_ref; >HOST_WIDE_INT base_offset; > - unsigned HOST_WIDE_INT misalign; > + unsigned HOST_WIDE_INT misalign, el_sz; >unsigned int align; > >gcc_checking_assert (offset % BITS_PER_UNIT == 0); > @@ -1572,47 +1585,62 @@ build_ref_for_offset (location_t loc, tree base, > HOST_WIDE_INT offset, > >/* get_addr_base_and_unit_offset returns NULL for references with a > variable > offset such as array[var_index]. */ > - if (!base) > -{ > - gassign *stmt; > - tree tmp, addr; > - > - gcc_checking_assert (gsi); > - tmp = make_ssa_name (build_pointer_type (TREE_TYPE (prev_base))); > - addr = build_fold_addr_expr (unshare_expr (prev_base)); > - STRIP_USELESS_TYPE_CONVERSION (addr); > - stmt = gimple_build_assign (tmp, addr); > - gimple_set_location (stmt, loc); > - if (insert_after) > - gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > - else > - gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > - > - off = build_int_cst (reference_alias_ptr_type (prev_base), > -offset / BITS_PER_UNIT); > - base = tmp; > -} > - else if (TREE_CODE (base) == MEM_REF) > -{ > - off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), > -base_offset + offset / BITS_PER_UNIT); > - off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off); > - base = unshare_expr (TREE_OPERAND (base, 0)); > + if (base > + && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE > + && misalign == 0 > + && get_elem_size (T
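For context, the kind of code the series is ultimately after (modelled from memory on the ssa-dom-cse-2.c test mentioned above, so treat the details as approximate):

int
foo (void)
{
  int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += a[i];
  /* Once the constant-pool initializer is scalarized into a[0]..a[7] as
     ARRAY_REFs, dom2 can match each load with its store and fold the
     whole function to "return 36".  */
  return sum;
}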
Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack
On Tue, Aug 18, 2015 at 1:36 PM, Lynn A. Boger wrote: > > libgo/ > PR target/66870 > configure.ac: When gccgo for building libgo uses the gold version > containing split stack support on ppc64, ppc64le, define > LINKER_SUPPORTS_SPLIT_STACK. > configure: Regenerate. Your version test for gold isn't robust: if the major version >= 3, then presumably split stack is supported. And since you have numbers, I suggest not trying to use switch, but instead writing something like if expr "$gold_minor" == 25; then ... elif expr "$gold_minor" > 25; then ... fi If that is fixed, I'm fine with the libgo part of this patch. Ian
Go patch committed: disallow unary ^ on boolean values
The Go frontend erroneously permitted unary ^ on boolean values. This patch by Chris Manghane fixes the problem, fixing https://golang.org/issue/11529 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 227193) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -f97d579fa8431af5cfde9b0a48604caabfd09377 +d5e6af4e6dd456075a1ec1c03d0dc41cbea5eb36 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc(revision 227191) +++ gcc/go/gofrontend/expressions.cc(working copy) @@ -3943,9 +3943,8 @@ Unary_expression::do_check_types(Gogo*) break; case OPERATOR_XOR: - if (type->integer_type() == NULL - && !type->is_boolean_type()) - this->report_error(_("expected integer or boolean type")); + if (type->integer_type() == NULL) + this->report_error(_("expected integer")); break; case OPERATOR_AND:
Re: [PATCH] rs6000: Fix PR67344
On Tue, Aug 25, 2015 at 10:08:54AM -0700, Segher Boessenkool wrote: > -(define_insn_and_split "*and<mode>3_imm_dot_shifted" > - [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y") > +(define_insn "*and<mode>3_imm_dot_shifted" > + [(set (match_operand:CC 3 "cc_reg_operand" "=x") Is this really the best solution? The operand predicate allows any cr, but the constraint only cr0. In the past we've seen this sort of thing result in "insn does not satisfy its constraints" errors, and if the operand is successfully reloaded you'll get slow mcrf insns. At -O1 the testcase generates: andi. 8,3,0x16 mcrf 4,0 I started throwing together a patch yesterday, before you claimed the bug. With this patch, I see what looks to be better code despite it being larger: li 9,22 and 9,3,9 cmpdi 4,9,0 diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 87abd6e..a9eea80 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3060,17 +3060,18 @@ return "#"; } "&& reload_completed && cc_reg_not_cr0_operand (operands[3], CCmode)" - [(set (match_dup 0) - (and:GPR (lshiftrt:GPR (match_dup 1) - (match_dup 4)) -(match_dup 2))) + [(set (match_dup 0) (match_dup 2)) + (set (match_dup 0) (and:GPR (match_dup 1) (match_dup 0))) (set (match_dup 3) (compare:CC (match_dup 0) (const_int 0)))] - "" + " +{ + operands[2] = GEN_INT (UINTVAL (operands[2]) << INTVAL (operands[4])); +}" [(set_attr "type" "logical") (set_attr "dot" "yes") - (set_attr "length" "4,8")]) + (set_attr "length" "4,12")]) (define_insn "and<mode>3_mask" -- Alan Modra Australia Development Lab, IBM
Re: [PATCH], PowerPC IEEE 128-bit patch #5
On Wed, Aug 19, 2015 at 07:41:24AM -0500, Segher Boessenkool wrote: > On Fri, Aug 14, 2015 at 11:46:03AM -0400, Michael Meissner wrote: > > +;; Like int_reg_operand, but don't return true for pseudo registers > > +(define_predicate "int_reg_operand_not_pseudo" > > + (match_operand 0 "register_operand") > > +{ > > + if ((TARGET_E500_DOUBLE || TARGET_SPE) && invalid_e500_subreg (op, mode)) > > +return 0; > > + > > + if (GET_CODE (op) == SUBREG) > > +op = SUBREG_REG (op); > > + > > + if (!REG_P (op)) > > +return 0; > > + > > + if (REGNO (op) >= FIRST_PSEUDO_REGISTER) > > +return 0; > > + > > + return INT_REGNO_P (REGNO (op)); > > +}) > > Since you use this only once, maybe it is easier (to read, etc.) if you > just test it there? Hard regs do not get subregs. I was worried about hard regs appearing before reload, and possibly being subregs, so I'll remove the SUBREG test. > > +(define_insn_and_split "ieee_128bit_vsx_neg<mode>2" > > + [(set (match_operand:TFIFKF 0 "register_operand" "=wa") > > + (neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa"))) > > + (clobber (match_scratch:V16QI 2 "=v"))] > > + "TARGET_FLOAT128 && FLOAT128_IEEE_P (<MODE>mode)" > > + "#" > > + "&& 1" > > + [(parallel [(set (match_dup 0) > > + (neg:TFIFKF (match_dup 1))) > > + (use (match_dup 2))])] > > +{ > > + if (GET_CODE (operands[2]) == SCRATCH) > > +operands[2] = gen_reg_rtx (V16QImode); > > + > > + operands[3] = gen_reg_rtx (V16QImode); > > + emit_insn (gen_ieee_128bit_negative_zero (operands[2])); > > +} > > + [(set_attr "length" "8") > > + (set_attr "type" "vecsimple")]) > > Where is operands[3] used? I guess that whole line should be deleted? Good catch. It was from the earlier patch before the fix for PR 67071, which added better support for vector constants that can be constructed with several vector operations, including a vector octet shift. > > +(define_insn "*ieee_128bit_vsx_neg<mode>2_internal" > > + [(set (match_operand:TFIFKF 0 "register_operand" "=wa") > > + (neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa"))) > > + (use (match_operand:V16QI 2 "register_operand" "=v"))] > > + "TARGET_FLOAT128" > > + "xxlxor %x0,%x1,%x2" > > + [(set_attr "length" "4") > > + (set_attr "type" "vecsimple")]) > > Length 4 is default, you can just leave it out (like we do for most > machine insns already). Ok, though I tend to always put them in. Here is the revised patch. Is it ok to install? 2015-08-25 Michael Meissner * config/rs6000/predicates.md (int_reg_operand_not_pseudo): New predicate for only GPR hard registers. * config/rs6000/rs6000.md (FP): Add IEEE 128-bit floating point modes to iterators. Add new iterators for moving 128-bit values in scalar FPR registers and VSX registers. (FMOVE128): Likewise. (FMOVE128_FPR): Likewise. (FMOVE128_GPR): Likewise. (FMOVE128_VSX): Likewise. (FLOAT128_SFDFTF): New iterators for IEEE 128-bit floating point in VSX registers. (IFKF): Likewise. (IBM128): Likewise. (TFIFKF): Likewise. (RELOAD): Add IEEE 128-bit floating point modes. (signbittf2): Convert TF insns to add support for new IEEE 128-bit floating point in VSX registers modes. (signbit<mode>2, IBM128 iterator): Likewise. (mov<mode>_64bit_dm, FMOVE128_FPR iterator): Likewise. (mov<mode>_32bit, FMOVE128_FPR iterator): Likewise. (negtf2): Likewise. (neg<mode>2, TFIFKF iterator): Likewise. (negtf2_internal): Likewise. (abstf2): Likewise. (abs<mode>2, TFIFKF iterator): Likewise. (ieee_128bit_negative_zero): New IEEE 128-bit floating point in VSX insn support for negate, absolute value, and negative absolute value. (ieee_128bit_vsx_neg<mode>2): Likewise. 
(ieee_128bit_vsx_neg<mode>2_internal): Likewise. (ieee_128bit_vsx_abs<mode>2): Likewise. (ieee_128bit_vsx_abs<mode>2_internal): Likewise. (ieee_128bit_vsx_nabs<mode>2): Likewise. (ieee_128bit_vsx_nabs<mode>2_internal): Likewise. (FP128_64): Update pack/unpack 128-bit insns for IEEE 128-bit floating point in VSX registers. (unpack<mode>_dm): Likewise. (unpack<mode>_nodm): Likewise. (pack<mode>): Likewise. (unpackv1ti): Likewise. (unpack<mode>, FMOVE128_VSX iterator): Likewise. (packv1ti): Likewise. (pack<mode>, FMOVE128_VSX iterator): Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 227180) +++ gcc/config/rs6000/predicates.md (working copy) @@ -239,6 +239,25 @@ (define_predicate "int_reg_operand" return INT_REGNO_P (REGNO (op)); }) +;; Like int_reg_operand, but don't return true for pseudo regi