[committed] Fix lto build if WCONTINUED is not defined (PR lto/60571)
Hi!

WCONTINUED is (recent) Linux specific, so it doesn't have to be defined on
other hosts, or could be missing even on older Linux distros (e.g. glibc
2.3.2 doesn't have it).

Fixed thusly, committed as obvious.

2014-03-19  Jakub Jelinek

	PR lto/60571
	* lto.c (wait_for_child): Define WCONTINUED if not defined to 0.
	Fix formatting.

--- gcc/lto/lto.c.jj	2014-03-03 08:24:32.0 +0100
+++ gcc/lto/lto.c	2014-03-19 08:12:39.235144361 +0100
@@ -2476,7 +2476,10 @@ wait_for_child ()
   int status;
   do
     {
-      int w = waitpid(0, &status, WUNTRACED | WCONTINUED);
+#ifndef WCONTINUED
+#define WCONTINUED 0
+#endif
+      int w = waitpid (0, &status, WUNTRACED | WCONTINUED);
       if (w == -1)
         fatal_error ("waitpid failed");
@@ -2485,7 +2488,7 @@ wait_for_child ()
       else if (WIFSIGNALED (status))
         fatal_error ("streaming subprocess was killed by signal");
     }
-  while (!WIFEXITED(status) && !WIFSIGNALED(status));
+  while (!WIFEXITED (status) && !WIFSIGNALED (status));
 }
 #endif

	Jakub
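For illustration, a minimal standalone sketch of the same fallback outside GCC. It is not lto.c code: spawn_and_wait is a hypothetical helper that forks a child exiting with a given status and reaps it with the same loop shape as wait_for_child. One deliberate difference: lto.c passes 0 as the pid (any child in the caller's process group), while the sketch waits on the specific child so unrelated children are not reaped. On a host without WCONTINUED the flag collapses to 0, so waitpid just never reports continued children, which the loop tolerates because it only stops on WIFEXITED or WIFSIGNALED.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same fallback as the patch: where WCONTINUED is missing, OR-ing in 0
   leaves the other waitpid flags unchanged.  */
#ifndef WCONTINUED
#define WCONTINUED 0
#endif

/* Hypothetical helper: fork a child that exits with EXIT_CODE and wait
   until it has exited or been killed, skipping stop/continue reports.
   Returns the child's exit status, or -1 on error or death by signal.  */
static int
spawn_and_wait (int exit_code)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit (exit_code);  /* Child: leave without flushing parent buffers.  */

  int status;
  do
    {
      pid_t w = waitpid (pid, &status, WUNTRACED | WCONTINUED);
      if (w == -1)
        return -1;
    }
  while (!WIFEXITED (status) && !WIFSIGNALED (status));

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Because the macro is only defined when the system headers lack it, the same source builds unchanged on hosts that do provide WCONTINUED.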
[PATCH, ARM] Fix ICE due to out of bound.
Hi,

ICE when compiling gcc.target/arm/neon-modes-3.c with "-g" in
arm_dwarf_register_span since parts[8] is out of bound for XImode.
GET_MODE_SIZE (XImode) / 4 is 16. "rtx parts[8]" can not hold all the
registers.

According to arm-modes.def, 16 should be the biggest number. So the
patch updates parts to

  rtx parts[16];

Bootstrap and no make check regression on ARM Chrome book.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-03-19  Zhenqiang Chen

	* config/arm/arm.c (arm_dwarf_register_span): Update the element number
	of parts.

testsuite/ChangeLog:
2014-03-19  Zhenqiang Chen

	* gcc.target/arm/neon-modes-3.c: Add "-g" option.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a68ed8d..c4466c1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28692,7 +28692,7 @@ arm_dwarf_register_span (rtx rtl)
 {
   enum machine_mode mode;
   unsigned regno;
-  rtx parts[8];
+  rtx parts[16];
   int nregs;
   int i;

diff --git a/gcc/testsuite/gcc.target/arm/neon-modes-3.c b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
index fe81875..f3e4f33 100644
--- a/gcc/testsuite/gcc.target/arm/neon-modes-3.c
+++ b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_neon_ok } */
-/* { dg-options "-O" } */
+/* { dg-options "-O -g" } */
 /* { dg-add-options arm_neon } */

 #include
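A small standalone sketch of the arithmetic behind the fix, assuming (as the mail states) that arm_dwarf_register_span splits a value into GET_MODE_SIZE (mode) / 4 pieces and that XImode is 64 bytes, which is what makes the quotient 16. The helper names and the 32-byte comparison mode are hypothetical; the point is only that an 8-slot buffer is too small for XImode while a 16-slot one is exactly large enough.

```c
#include <stdbool.h>
#include <stddef.h>

/* Capacity of parts[] before and after the patch.  */
enum { OLD_PARTS_CAPACITY = 8, NEW_PARTS_CAPACITY = 16 };

/* Hypothetical helper: number of 4-byte pieces for a mode that is
   MODE_SIZE bytes wide, mirroring "GET_MODE_SIZE (mode) / 4".  */
static int
span_pieces (int mode_size)
{
  return mode_size / 4;
}

/* True if a parts[] array with CAPACITY slots can hold every piece,
   i.e. the fill loop never writes past the end of the array.  */
static bool
span_fits (int mode_size, size_t capacity)
{
  return (size_t) span_pieces (mode_size) <= capacity;
}
```

With a 64-byte mode span_pieces returns 16, so span_fits fails for the old capacity of 8 and succeeds for 16; a 32-byte mode (8 pieces) fit even before the patch, which is consistent with the bug only showing up for the widest mode.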
Re: [PATCH] Fix ICE with MASK_LOAD and -fno-tree-dce (PR tree-optimization/60559)
On Tue, 18 Mar 2014, Jakub Jelinek wrote:

> Hi!
>
> With -fno-tree-dce the scalar MASK_LOAD isn't removed from the IL and we ICE
> on it during expansion (as we support only the vector loads, if those aren't
> supported, MASK_LOAD is either not created by if-conversion at all, or
> vectorization refuses to vectorize the loop and thus it is cfg cleaned up
> away.
>
> The following patch replaces the scalar MASK_LOAD manually with load of
> zero, similarly how we do it for vectorizable calls.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2014-03-18  Jakub Jelinek
>
> 	PR tree-optimization/60559
> 	* vectorizable_mask_load_store): Replace scalar MASK_LOAD
> 	with build_zero_cst assignment.
>
> 	* g++.dg/vect/pr60559.cc: New test.
>
> --- gcc/tree-vect-stmts.c.jj	2014-03-03 08:24:33.0 +0100
> +++ gcc/tree-vect-stmts.c	2014-03-18 14:01:40.969657763 +0100
> @@ -2038,6 +2038,15 @@ vectorizable_mask_load_store (gimple stm
>            STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
>            prev_stmt_info = vinfo_for_stmt (new_stmt);
>          }
> +
> +      /* Ensure that even with -fno-tree-dce the scalar MASK_LOAD is removed
> +         from the IL.  */
> +      tree lhs = gimple_call_lhs (stmt);
> +      new_stmt = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
> +      set_vinfo_for_stmt (new_stmt, stmt_info);
> +      set_vinfo_for_stmt (stmt, NULL);
> +      STMT_VINFO_STMT (stmt_info) = new_stmt;
> +      gsi_replace (gsi, new_stmt, true);
>        return true;
>      }
>    else if (is_store)
> @@ -2149,6 +2158,18 @@ vectorizable_mask_load_store (gimple stm
>          }
>      }
>
> +  if (!is_store)
> +    {
> +      /* Ensure that even with -fno-tree-dce the scalar MASK_LOAD is removed
> +         from the IL.  */
> +      tree lhs = gimple_call_lhs (stmt);
> +      new_stmt = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
> +      set_vinfo_for_stmt (new_stmt, stmt_info);
> +      set_vinfo_for_stmt (stmt, NULL);
> +      STMT_VINFO_STMT (stmt_info) = new_stmt;
> +      gsi_replace (gsi, new_stmt, true);
> +    }
> +
>    return true;
>  }
>
> --- gcc/testsuite/g++.dg/vect/pr60559.cc.jj	2014-03-18 14:04:55.173449250 +0100
> +++ gcc/testsuite/g++.dg/vect/pr60559.cc	2014-03-18 14:05:26.610273088 +0100
> @@ -0,0 +1,8 @@
> +// PR tree-optimization/60559
> +// { dg-do compile }
> +// { dg-additional-options "-O3 -std=c++11 -fnon-call-exceptions -fno-tree-dce" }
> +// { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } }
> +
> +#include "pr60023.cc"
> +
> +// { dg-final { cleanup-tree-dump "vect" } }
>
> 	Jakub
>
>

-- 
Richard Biener
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
[PATCH] Fix PR59543
This fixes PR59543 (confirmed by Jakub for the testcase at least) by not
dropping debug stmts during WPA phase.

LTO profiled-bootstrapped on x86_64-unknown-linux-gnu, applied.

Honza - you can always come up with a better fix for 4.10.

Richard.

2014-03-19  Richard Biener

	PR lto/59543
	* lto-streamer-in.c (input_function): In WPA stage do not drop
	debug stmts.

Index: lto-streamer-in.c
===
--- lto-streamer-in.c	(revision 208642)
+++ lto-streamer-in.c	(working copy)
@@ -988,7 +988,7 @@ input_function (tree fn_decl, struct dat
              We can't remove them earlier because this would cause uid
              mismatches in fixups, but we can do it at this point, as
              long as debug stmts don't require fixups.  */
-          if (!MAY_HAVE_DEBUG_STMTS && is_gimple_debug (stmt))
+          if (!MAY_HAVE_DEBUG_STMTS && !flag_wpa && is_gimple_debug (stmt))
             {
               gimple_stmt_iterator gsi = bsi;
               gsi_next (&bsi);
Re: [PATCH, ARM] Fix ICE due to out of bound.
On 03/19/14 08:42, Zhenqiang Chen wrote:
> Hi,
>
> ICE when compiling gcc.target/arm/neon-modes-3.c with "-g" in
> arm_dwarf_register_span since parts[8] is out of bound for XImode.
> GET_MODE_SIZE (XImode) / 4 is 16. "rtx parts[8]" can not hold all the
> registers.
>
> According to arm-modes.def, 16 should be the biggest number. So the
> patch updates parts to
>
> rtx parts[16];
>
> Bootstrap and no make check regression on ARM Chrome book.
>
> OK for trunk?
>

It may be time in 4.10 or 5.0 (whatever we call it :)), to deal with the FIXME
in arm_dwarf_register_span to deal with DW_OP_piece. I'm surprised that it's
taken so long to hit this.

This is OK for stage4 - it looks sane to me but this needs an RM ack before
applying.

regards
Ramana

> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2014-03-19  Zhenqiang Chen
>
> 	* config/arm/arm.c (arm_dwarf_register_span): Update the element number
> 	of parts.
>
> testsuite/ChangeLog:
> 2014-03-19  Zhenqiang Chen
>
> 	* gcc.target/arm/neon-modes-3.c: Add "-g" option.
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index a68ed8d..c4466c1 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -28692,7 +28692,7 @@ arm_dwarf_register_span (rtx rtl)
>  {
>    enum machine_mode mode;
>    unsigned regno;
> -  rtx parts[8];
> +  rtx parts[16];
>    int nregs;
>    int i;
>
> diff --git a/gcc/testsuite/gcc.target/arm/neon-modes-3.c b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> index fe81875..f3e4f33 100644
> --- a/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> +++ b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target arm_neon_ok } */
> -/* { dg-options "-O" } */
> +/* { dg-options "-O -g" } */
>  /* { dg-add-options arm_neon } */
>
>  #include

-- 
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.
Re: [PATCH, ARM] Fix ICE due to out of bound.
On Wed, 19 Mar 2014, Ramana Radhakrishnan wrote:

> On 03/19/14 08:42, Zhenqiang Chen wrote:
> > Hi,
> >
> > ICE when compiling gcc.target/arm/neon-modes-3.c with "-g" in
> > arm_dwarf_register_span since parts[8] is out of bound for XImode.
> > GET_MODE_SIZE (XImode) / 4 is 16. "rtx parts[8]" can not hold all the
> > registers.
> >
> > According to arm-modes.def, 16 should be the biggest number. So the
> > patch updates parts to
> >
> > rtx parts[16];
> >
> > Bootstrap and no make check regression on ARM Chrome book.
> >
> > OK for trunk?
> >
>
> It may be time in 4.10 or 5.0 (whatever we call it :)), to deal with the FIXME
> in arm_dwarf_register_span to deal with DW_OP_piece. I'm surprised that it's
> taken so long to hit this.
>
> This is OK for stage4 - it looks sane to me but this needs an RM ack before
> applying.

Ok (it can't possibly break anything).

Richard.

> regards
> Ramana
>
> > Thanks!
> > -Zhenqiang
> >
> > ChangeLog:
> > 2014-03-19  Zhenqiang Chen
> >
> > 	* config/arm/arm.c (arm_dwarf_register_span): Update the element number
> > 	of parts.
> >
> > testsuite/ChangeLog:
> > 2014-03-19  Zhenqiang Chen
> >
> > 	* gcc.target/arm/neon-modes-3.c: Add "-g" option.
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index a68ed8d..c4466c1 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -28692,7 +28692,7 @@ arm_dwarf_register_span (rtx rtl)
> >  {
> >    enum machine_mode mode;
> >    unsigned regno;
> > -  rtx parts[8];
> > +  rtx parts[16];
> >    int nregs;
> >    int i;
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/neon-modes-3.c b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> > index fe81875..f3e4f33 100644
> > --- a/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> > +++ b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target arm_neon_ok } */
> > -/* { dg-options "-O" } */
> > +/* { dg-options "-O -g" } */
> >  /* { dg-add-options arm_neon } */
> >
> >  #include
> >
>

-- 
Richard Biener
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
[ARM/AArch64][0/3] Handle bitwise/bytewise reverse operations more effectively
Hi all,

This patch series attempts to improve code generation on arm and aarch64
for various bitwise operations that can be expressed with rev16
instructions in those architectures. In particular, expressions of the form:

((x & 0x00ff00ff) << 8) | ((x & 0xff00ff00) >> 8)

can appear in places like the Linux kernel and can be mapped directly to a
single rev16 instruction.

This series has 3 parts:

[1/3] Add a new field to the rtx costs tables to represent the latency of
the rev* group of instructions that will be used to accurately model the
cost of these operations. Use it to properly cost existing patterns that
generate rev16 (for bswap operations).

[2/3] Add aarch64 combine patterns to recognise the above bitwise
operations and map them to rev16. Model the cost appropriately and add
helper functions that can be reused by the arm backend.

[3/3] Define similar combine patterns for arm and reuse the helper
functions introduced in patch 2/3 to properly cost them.

I'm proposing these for next stage-1 of course.

Thanks,
Kyrill
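To make the identity concrete, a short C sketch (not part of the series) that compares the expression above with a reference version swapping the two bytes of each 16-bit halfword separately. The input/output pair 0x12345678 and 0x34127856 is the one the arm testcase in patch 3/3 uses; the function names are illustrative.

```c
#include <stdint.h>

/* The cover-letter expression: swap the bytes inside each 16-bit
   halfword of a 32-bit value (what a single rev16 computes).  */
static uint32_t
rev16_32 (uint32_t x)
{
  return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* Reference version: split into halfwords and byte-swap each one.  */
static uint32_t
rev16_32_ref (uint32_t x)
{
  uint32_t hi = (x >> 16) & 0xffffu;
  uint32_t lo = x & 0xffffu;
  hi = ((hi & 0xffu) << 8) | (hi >> 8);
  lo = ((lo & 0xffu) << 8) | (lo >> 8);
  return (hi << 16) | lo;
}
```

Applying rev16_32 twice returns the original value, since swapping the bytes of a halfword is its own inverse.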
[PATCH][ARM][1/3] Add rev field to rtx cost tables
Hi all, In order to properly cost the rev16 instruction we need a new field in the cost tables. This patch adds that and specifies its value for the existing cost tables. Since rev16 is used to implement the BSWAP operation we add handling of that in the rtx cost function using the new field. Tested on arm-none-eabi and bootstrapped on an arm linux target. Does it look ok for stage1? Thanks, Kyrill 2014-03-19 Kyrylo Tkachov * config/arm/aarch-common-protos.h (alu_cost_table): Add rev field. * config/arm/aarch-cost-tables.h (generic_extra_costs): Specify rev cost. (cortex_a53_extra_costs): Likewise. (cortex_a57_extra_costs): Likewise. * config/arm/arm.c (cortexa9_extra_costs): Likewise. (cortexa7_extra_costs): Likewise. (cortexa12_extra_costs): Likewise. (cortexa15_extra_costs): Likewise. (v7m_extra_costs): Likewise. (arm_new_rtx_costs): Handle BSWAP.commit 13b2976a9448565beabc41055fdcbd209cde949f Author: Kyrylo Tkachov Date: Wed Feb 26 15:55:13 2014 + Add rev field to rtx costs. diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 2b33626..4ff18cd 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -54,6 +54,7 @@ struct alu_cost_table const int bfi; /* Bit-field insert. */ const int bfx; /* Bit-field extraction. */ const int clz; /* Count Leading Zeros. */ + const int rev; /* Reverse bits/bytes. */ const int non_exec; /* Extra cost when not executing insn. */ const bool non_exec_costs_exec; /* True if non-execution must add the exec cost. */ diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h index c30ea2f..adf8708 100644 --- a/gcc/config/arm/aarch-cost-tables.h +++ b/gcc/config/arm/aarch-cost-tables.h @@ -39,6 +39,7 @@ const struct cpu_cost_table generic_extra_costs = 0, /* bfi. */ 0, /* bfx. */ 0, /* clz. */ +0, /* rev. */ COSTS_N_INSNS (1), /* non_exec. */ false /* non_exec_costs_exec. 
*/ }, @@ -139,6 +140,7 @@ const struct cpu_cost_table cortexa53_extra_costs = COSTS_N_INSNS (1), /* bfi. */ COSTS_N_INSNS (1), /* bfx. */ 0, /* clz. */ +0, /* rev. */ 0, /* non_exec. */ true /* non_exec_costs_exec. */ }, @@ -239,6 +241,7 @@ const struct cpu_cost_table cortexa57_extra_costs = COSTS_N_INSNS (1), /* bfi. */ 0, /* bfx. */ 0, /* clz. */ +0, /* rev. */ 0, /* non_exec. */ true /* non_exec_costs_exec. */ }, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e69911c..a72ee1e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -982,6 +982,7 @@ const struct cpu_cost_table cortexa9_extra_costs = COSTS_N_INSNS (1), /* bfi. */ COSTS_N_INSNS (1), /* bfx. */ 0, /* clz. */ +0, /* rev. */ 0, /* non_exec. */ true /* non_exec_costs_exec. */ }, @@ -1083,6 +1084,7 @@ const struct cpu_cost_table cortexa7_extra_costs = COSTS_N_INSNS (1), /* bfi. */ COSTS_N_INSNS (1), /* bfx. */ COSTS_N_INSNS (1), /* clz. */ +COSTS_N_INSNS (1), /* rev. */ 0, /* non_exec. */ true /* non_exec_costs_exec. */ }, @@ -1184,6 +1186,7 @@ const struct cpu_cost_table cortexa12_extra_costs = 0, /* bfi. */ COSTS_N_INSNS (1), /* bfx. */ COSTS_N_INSNS (1), /* clz. */ +COSTS_N_INSNS (1), /* rev. */ 0, /* non_exec. */ true /* non_exec_costs_exec. */ }, @@ -1284,6 +1287,7 @@ const struct cpu_cost_table cortexa15_extra_costs = COSTS_N_INSNS (1), /* bfi. */ 0, /* bfx. */ 0, /* clz. */ +0, /* rev. */ 0, /* non_exec. */ true /* non_exec_costs_exec. */ }, @@ -1384,6 +1388,7 @@ const struct cpu_cost_table v7m_extra_costs = 0, /* bfi. */ 0, /* bfx. */ 0, /* clz. */ +0, /* rev. */ COSTS_N_INSNS (1), /* non_exec. */ false /* non_exec_costs_exec. */ }, @@ -9334,6 +9339,47 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, *cost = LIBCALL_COST (2); return false; +case BSWAP: + if (arm_arch6) +{ + if (mode == SImode) +{ + *cost = COSTS_N_INSNS (1); + if (speed_p) +*cost += extra_cost->alu.rev; + + return false; +} +} + else +{ +/* No rev instruction available. 
Look at arm_legacy_rev + and thumb_legacy_rev for the form of RTL used then. */ + if (TARGET_THUMB) +{ + *cost = COSTS_N_INSNS (10); + + if (speed_p) +{ + *cost += 6 * extra_cost->alu.shift; + *cost += 3 * extra_cost->alu.logical; +} +} + else +
[PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data
Hi all, This patch adds a recogniser for the bitmask,shift,orr sequence of instructions that can be used to reverse the bytes in 16-bit halfwords (for the sequence itself look at the testcase included in the patch). This can be implemented with a rev16 instruction. Since the shifts can occur in any order and there are no canonicalisation rules for where they appear in the expression we have to have two patterns to match both cases. The rtx costs function is updated to recognise the pattern and cost it appropriately by using the rev field of the cost tables introduced in patch [1/3]. The rtx costs helper functions that are used to recognise those bitwise operations are placed in config/arm/aarch-common.c so that they can be reused by both arm and aarch64. I've added an execute testcase but no scan-assembler tests since conceptually in the future the combiner might decide to not use a rev instruction due to rtx costs. We can at least test that the code generated is functionally correct though. Tested aarch64-none-elf. Ok for stage1? [gcc/] 2014-03-19 Kyrylo Tkachov * config/aarch64/aarch64.md (rev162): New pattern. (rev162_alt): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case. * config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New. (aarch_rev16_shleft_mask_imm_p): Likewise. (aarch_rev16_p_1): Likewise. (aarch_rev16_p): Likewise. * config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern. (aarch_rev16_shright_mask_imm_p): Likewise. (aarch_rev16_shleft_mask_imm_p): Likewise. 
[gcc/testsuite/] 2014-03-19 Kyrylo Tkachov * gcc.target/aarch64/rev16_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ebd58c0..41761ae 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4682,6 +4682,16 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED, return false; case IOR: + if (aarch_rev16_p (x)) +{ + *cost = COSTS_N_INSNS (1); + + if (speed) +*cost += extra_cost->alu.rev; + + return true; +} +/* Fall through. */ case XOR: case AND: cost_logic: diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 99a6ac8..a23452b 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3173,6 +3173,38 @@ [(set_attr "type" "rev")] ) +;; There are no canonicalisation rules for the position of the lshiftrt, ashift +;; operations within an IOR/AND RTX, therefore we have two patterns matching +;; each valid permutation. + +(define_insn "rev162" + [(set (match_operand:GPI 0 "register_operand" "=r") +(ior:GPI (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r") + (const_int 8)) + (match_operand:GPI 3 "const_int_operand" "n")) + (and:GPI (lshiftrt:GPI (match_dup 1) +(const_int 8)) + (match_operand:GPI 2 "const_int_operand" "n"] + "aarch_rev16_shleft_mask_imm_p (operands[3], mode) + && aarch_rev16_shright_mask_imm_p (operands[2], mode)" + "rev16\\t%0, %1" + [(set_attr "type" "rev")] +) + +(define_insn "rev162_alt" + [(set (match_operand:GPI 0 "register_operand" "=r") +(ior:GPI (and:GPI (lshiftrt:GPI (match_operand:GPI 1 "register_operand" "r") +(const_int 8)) + (match_operand:GPI 2 "const_int_operand" "n")) + (and:GPI (ashift:GPI (match_dup 1) + (const_int 8)) + (match_operand:GPI 3 "const_int_operand" "n"] + "aarch_rev16_shleft_mask_imm_p (operands[3], mode) + && aarch_rev16_shright_mask_imm_p (operands[2], mode)" + "rev16\\t%0, %1" + [(set_attr "type" "rev")] +) + ;; zero_extend version of above (define_insn "*bswapsi2_uxtw" 
[(set (match_operand:DI 0 "register_operand" "=r") diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index d97ee61..08c4c7a 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -23,6 +23,9 @@ #ifndef GCC_AARCH_COMMON_PROTOS_H #define GCC_AARCH_COMMON_PROTOS_H +extern bool aarch_rev16_p (rtx); +extern bool aarch_rev16_shleft_mask_imm_p (rtx, enum machine_mode); +extern bool aarch_rev16_shright_mask_imm_p (rtx, enum machine_mode); extern int arm_early_load_addr_dep (rtx, rtx); extern int arm_early_store_addr_dep (rtx, rtx); extern int arm_mac_accumulator_is_mul_result (rtx, rtx); diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c index c11f7e9..75ed3fd 100644 --- a/gcc/config/arm/aarch-common.c +++ b/gcc/config/arm/aarch-common.c @@ -155,6 +155,79 @@ arm_get_set_operands (rtx producer, rtx consumer, return 0; } +bool +aarch_rev16_s
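The 2/3 patterns match the shift-then-mask shape, (and (ashift x 8) mask) IOR (and (lshiftrt x 8) mask), and accept it in either operand order. A quick C check of that shape on 64-bit data may help when reading them. This is an illustrative sketch, not the aarch_rev16_* helpers: the mask constants are worked out by hand for DImode, and the two functions differ only in the order of the final OR, which is the reason two define_insn patterns are needed.

```c
#include <stdint.h>

/* After each shift by 8, keep only the bytes that moved to the other
   half of their 16-bit halfword.  */
#define SHL_MASK_64 UINT64_C (0xff00ff00ff00ff00)
#define SHR_MASK_64 UINT64_C (0x00ff00ff00ff00ff)

/* Shift-then-mask, left-shift term first (first pattern).  */
static uint64_t
rev16_64 (uint64_t x)
{
  return ((x << 8) & SHL_MASK_64) | ((x >> 8) & SHR_MASK_64);
}

/* Same computation with the IOR operands swapped (the _alt pattern).  */
static uint64_t
rev16_64_alt (uint64_t x)
{
  return ((x >> 8) & SHR_MASK_64) | ((x << 8) & SHL_MASK_64);
}
```

Both forms also equal the mask-then-shift spelling used in C sources, ((x & 0x00ff...) << 8) | ((x & 0xff00...) >> 8), since masking before or after a shift of 8 selects the same bytes.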
[PATCH][ARM][3/3] Recognise bitwise operations leading to SImode rev16
Hi all, This is the arm equivalent of patch [2/3] in the series that adds combine patterns for the bitwise operations leading to a rev16 instruction. It reuses the functions that were put in aarch-common.c to properly cost these operations. I tried matching a DImode rev16 (with the intent of splitting it into two rev16 ops) like aarch64 but combine wouldn't try to match that bitwise pattern in DImode like aarch64 does. Instead it tries various exotic combinations with subregs. Tested arm-none-eabi, bootstrap on arm-none-linux-gnueabihf. Ok for stage1? [gcc/] 2014-03-19 Kyrylo Tkachov * config/arm/arm.md (arm_rev16si2): New pattern. (arm_rev16si2_alt): Likewise. * config/arm/arm.c (arm_new_rtx_costs): Handle rev16 case. [gcc/testsuite/] 2014-03-19 Kyrylo Tkachov * gcc.target/arm/rev16.c: New test.commit 04e60723bd1fa2f8e2adcfeed676390643ffec0c Author: Kyrylo Tkachov Date: Tue Feb 25 15:26:52 2014 + [ARM] Implement SImode rev16 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 8d1d721..ed603f0 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9716,8 +9716,17 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, /* Vector mode? */ *cost = LIBCALL_COST (2); return false; +case IOR: + if (mode == SImode && arm_arch6 && aarch_rev16_p (x)) +{ + *cost = COSTS_N_INSNS (1); + if (speed_p) +*cost += extra_cost->alu.rev; -case AND: case XOR: case IOR: + return true; +} +/* Fall through. */ +case AND: case XOR: if (mode == SImode) { enum rtx_code subcode = GET_CODE (XEXP (x, 0)); diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 4df24a2..47bc747 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -12668,6 +12668,44 @@ (set_attr "type" "rev")] ) +;; There are no canonicalisation rules for the position of the lshiftrt, ashift +;; operations within an IOR/AND RTX, therefore we have two patterns matching +;; each valid permutation. 
+ +(define_insn "arm_rev16si2" + [(set (match_operand:SI 0 "register_operand" "=l,l,r") +(ior:SI (and:SI (ashift:SI (match_operand:SI 1 "register_operand" "l,l,r") + (const_int 8)) +(match_operand:SI 3 "const_int_operand" "n,n,n")) +(and:SI (lshiftrt:SI (match_dup 1) + (const_int 8)) +(match_operand:SI 2 "const_int_operand" "n,n,n"] + "arm_arch6 + && aarch_rev16_shleft_mask_imm_p (operands[3], SImode) + && aarch_rev16_shright_mask_imm_p (operands[2], SImode)" + "rev16\\t%0, %1" + [(set_attr "arch" "t1,t2,32") + (set_attr "length" "2,2,4") + (set_attr "type" "rev")] +) + +(define_insn "arm_rev16si2_alt" + [(set (match_operand:SI 0 "register_operand" "=l,l,r") +(ior:SI (and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "l,l,r") + (const_int 8)) +(match_operand:SI 2 "const_int_operand" "n,n,n")) +(and:SI (ashift:SI (match_dup 1) + (const_int 8)) +(match_operand:SI 3 "const_int_operand" "n,n,n"] + "arm_arch6 + && aarch_rev16_shleft_mask_imm_p (operands[3], SImode) + && aarch_rev16_shright_mask_imm_p (operands[2], SImode)" + "rev16\\t%0, %1" + [(set_attr "arch" "t1,t2,32") + (set_attr "length" "2,2,4") + (set_attr "type" "rev")] +) + (define_expand "bswaphi2" [(set (match_operand:HI 0 "s_register_operand" "=r") (bswap:HI (match_operand:HI 1 "s_register_operand" "r")))] diff --git a/gcc/testsuite/gcc.target/arm/rev16.c b/gcc/testsuite/gcc.target/arm/rev16.c new file mode 100644 index 000..1c869b3 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/rev16.c @@ -0,0 +1,35 @@ +/* { dg-options "-O2" } */ +/* { dg-do run } */ + +extern void abort (void); + +typedef unsigned int __u32; + +__u32 +__rev16_32_alt (__u32 x) +{ + return (((__u32)(x) & (__u32)0xff00ff00UL) >> 8) + | (((__u32)(x) & (__u32)0x00ff00ffUL) << 8); +} + +__u32 +__rev16_32 (__u32 x) +{ + return (((__u32)(x) & (__u32)0x00ff00ffUL) << 8) + | (((__u32)(x) & (__u32)0xff00ff00UL) >> 8); +} + +int +main (void) +{ + volatile __u32 in32 = 0x12345678; + volatile __u32 expected32 = 0x34127856; + + if (__rev16_32 
(in32) != expected32) +abort (); + + if (__rev16_32_alt (in32) != expected32) +abort (); + + return 0; +}
[PATCH][AArch64] Add handling of bswap operations in rtx costs
Hi all,

This patch depends on the series started at
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00933.html but is not really
a part of it. It just adds costing of the bswap operation using the new
rev field in the rtx cost tables since we have patterns in aarch64.md that
handle bswap by generating rev16 instructions.

Tested aarch64-none-elf.

Ok for stage1 after that series goes in?

2014-03-19  Kyrylo Tkachov

	* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle BSWAP.

commit b9771a71dbf62522d423e16ce03353624c1ccd5a
Author: Kyrylo Tkachov
Date:   Thu Feb 27 11:55:27 2014 +

    [AArch64] Cost bswap operations properly

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 901ad3d..28c8841 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4678,6 +4678,14 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
       return false;
 
+    case BSWAP:
+      *cost = COSTS_N_INSNS (1);
+
+      if (speed)
+        *cost += extra_cost->alu.rev;
+
+      return false;
+
     case IOR:
     case XOR:
     case AND:
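For reference, the BSWAP rtx costed here is what GCC produces for __builtin_bswap32 (and its 16/64-bit siblings). A hedged sketch of a portable shift-and-mask equivalent, checked against the builtin, shows how much work a single rev-family instruction replaces. bswap32_shifts is an illustrative name, and the sketch does not claim to match the exact sequence any backend emits.

```c
#include <stdint.h>

/* Reverse the four bytes of X using only masks, shifts and ORs.  */
static uint32_t
bswap32_shifts (uint32_t x)
{
  return ((x & 0x000000ffu) << 24)
         | ((x & 0x0000ff00u) << 8)
         | ((x & 0x00ff0000u) >> 8)
         | ((x & 0xff000000u) >> 24);
}
```

Eleven ALU operations against one instruction is why costing BSWAP as COSTS_N_INSNS (1) plus the per-core rev latency seems the natural model where rev is available.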
stray warning from gcc's cpp
I observe the following minor annoyance on FreeBSD systems where cpp is
GCC's cpp. If a DTrace script has the following shebang line:

#!/usr/sbin/dtrace -Cs

then the following warning is produced when the script is run:

cc1: warning: is shorter than expected

Some details. dtrace(1) first forks. Then a child seeks on a file
descriptor associated with the script file, so that the shebang line is
skipped (because otherwise it would confuse cpp). Then the child makes the
file descriptor its standard input and then it execs cpp. cpp performs
fstat(2) on its standard input descriptor and determines that it points to
a regular file. Then it verifies that the number of bytes it reads from the
file is the same as the size of the file. The check makes sense if the file
is opened by cpp itself, but it does not always make sense for stdin as
described above.

The following patch seems to fix the issue, but perhaps there is a better /
smarter alternative.

--- a/libcpp/files.c
+++ b/libcpp/files.c
@@ -601,7 +601,8 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file)
       return false;
     }
 
-  if (regular && total != size && STAT_SIZE_RELIABLE (file->st))
+  if (regular && total != size && file->fd != 0
+      && STAT_SIZE_RELIABLE (file->st))
     cpp_error (pfile, CPP_DL_WARNING,
                "%s is shorter than expected", file->path);
 
-- 
Andriy Gapon
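The mismatch can be reproduced without dtrace or cpp. The sketch below is illustrative only (bytes_left_after_shebang is a hypothetical helper, not libcpp code): it writes a script to a temporary file, seeks past the first line as dtrace(1) does, and compares what fstat(2) reports with what read(2) actually returns. It also computes st_size minus the current offset, obtained with lseek (fd, 0, SEEK_CUR), as one possible alternative to special-casing fd 0; whether that fits libcpp's handling of non-regular files and STAT_SIZE_RELIABLE is untested.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write CONTENTS to a temporary file, position the descriptor just past
   the first line (as dtrace(1) does with the shebang) and read the rest.
   *ST_SIZE gets the size fstat() reports; *REMAINING gets st_size minus
   the current offset.  Returns the number of bytes read, or -1.  */
static long
bytes_left_after_shebang (const char *contents, long *st_size,
                          long *remaining)
{
  char path[] = "/tmp/shebangXXXXXX";
  int fd = mkstemp (path);
  if (fd == -1)
    return -1;
  unlink (path);  /* The open descriptor keeps the file alive.  */

  size_t len = strlen (contents);
  const char *nl = strchr (contents, '\n');
  off_t skip = nl ? (off_t) (nl - contents + 1) : 0;

  struct stat st;
  long total = -1;
  if (write (fd, contents, len) == (ssize_t) len
      && lseek (fd, skip, SEEK_SET) != (off_t) -1
      && fstat (fd, &st) == 0)
    {
      off_t pos = lseek (fd, 0, SEEK_CUR);
      *st_size = (long) st.st_size;
      *remaining = (long) (st.st_size - pos);

      char buf[256];
      ssize_t n;
      total = 0;
      while ((n = read (fd, buf, sizeof buf)) > 0)
        total += n;
      if (n < 0)
        total = -1;
    }
  close (fd);
  return total;
}
```

For a 42-byte script whose shebang line is 23 bytes, fstat still reports 42 while only 19 bytes can be read; the shortfall is exactly the offset dtrace skipped, so a check based on the remaining size would not warn.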
Re: [AArch64] 64-bit float vreinterpret implemention
On 28 February 2014 10:30, Alex Velenko wrote:
> Hi Richard,
> Thank you for your suggestion. Attached is a patch that includes
> implementation of your proposition. A testsuite was run on LE and BE
> compilers with no regressions.
>
> Here is the description of the patch:
>
> This patch introduces vreinterpret implementation for vectors with 64-bit
> float lanes and adds testcase for those intrinsics.

The aarch64_init_simd_builtins() infrastructure requires the presence of
named RTL patterns in order to construct the types of the SIMD intrinsics
even when an intrinsic is emitted as tree. This seems rather ugly to me.
At some point we should figure out how to clean up this aspect of
aarch64_init_simd_builtins() and remove the otherwise unused .md patterns.

This aside I think your patch is fine as it stands and can be committed in
stage-1.

Cheers
/Marcus
Re: [PATCH, ARM] Fix ICE due to out of bound.
On Wed, Mar 19, 2014 at 09:46:56AM +, Ramana Radhakrishnan wrote:
> On 03/19/14 08:42, Zhenqiang Chen wrote:
> >ICE when compiling gcc.target/arm/neon-modes-3.c with "-g" in
> >arm_dwarf_register_span since parts[8] is out of bound for XImode.
> >GET_MODE_SIZE (XImode) / 4 is 16. "rtx parts[8]" can not hold all the
> >registers.
> >
> >According to arm-modes.def, 16 should be the biggest number. So the
> >patch updates parts to
> >
> >rtx parts[16];
> >
> >Bootstrap and no make check regression on ARM Chrome book.
> >
> >OK for trunk?
> >
>
> It may be time in 4.10 or 5.0 (whatever we call it :)), to deal with
> the FIXME in arm_dwarf_register_span to deal with DW_OP_piece. I'm
> surprised that it's taken so long to hit this.
>
> This is OK for stage4 - it looks sane to me but this needs an RM ack
> before applying.

Ok.

	Jakub
[PATCH] Avoid ggc_collect () after WPA forking
This patch avoids calling ggc_collect after we possibly forked during WPA phase as that necessarily causes a lot of page unsharing. I have verified that during a LTO bootstrap we do not allocate GC memory during (or after) lto_wpa_write_files, thus the effect on memory use should be positive (the patch below contains checking code making sure that we don't alloc). LTO bootstrapped on x86_64-unknown-linux-gnu, will apply shortly (without the checking code of course). That should fix the WPA memory explosion Martin sees with building Chromium. Richard. 2014-03-19 Richard Biener * lto.c (lto_wpa_write_files): Move call to lto_promote_cross_file_statics ... (do_whole_program_analysis): ... here, into the partitioning block. Do not ggc_collect after lto_wpa_write_files but for a last time before it. Index: gcc/ggc-page.c === --- gcc/ggc-page.c (revision 208642) +++ gcc/ggc-page.c (working copy) @@ -1199,6 +1199,8 @@ ggc_round_alloc_size (size_t requested_s return size; } +int may_alloc = 1; + /* Allocate a chunk of memory of SIZE bytes. Its contents are undefined. */ void * @@ -1208,6 +1210,9 @@ ggc_internal_alloc_stat (size_t size MEM struct page_entry *entry; void *result; + if (!may_alloc) +fatal_error ("allocating GC memory"); + ggc_round_alloc_size_1 (size, &order, &object_size); /* If there are non-full pages for this size allocation, they are at Index: gcc/lto/lto.c === --- gcc/lto/lto.c (revision 208642) +++ gcc/lto/lto.c (working copy) @@ -2565,11 +2566,6 @@ lto_wpa_write_files (void) FOR_EACH_VEC_ELT (ltrans_partitions, i, part) lto_stats.num_output_symtab_nodes += lto_symtab_encoder_size (part->encoder); - /* Find out statics that need to be promoted - to globals with hidden visibility because they are accessed from multiple - partitions. 
*/ - lto_promote_cross_file_statics (); - timevar_pop (TV_WHOPR_WPA); timevar_push (TV_WHOPR_WPA_IO); @@ -3281,11 +3277,25 @@ do_whole_program_analysis (void) node->aux = NULL; lto_stats.num_cgraph_partitions += ltrans_partitions.length (); + + /* Find out statics that need to be promoted + to globals with hidden visibility because they are accessed from multiple + partitions. */ + lto_promote_cross_file_statics (); timevar_pop (TV_WHOPR_PARTITIONING); timevar_stop (TV_PHASE_OPT_GEN); - timevar_start (TV_PHASE_STREAM_OUT); + /* Collect a last time - in lto_wpa_write_files we may end up forking + with the idea that this doesn't increase memory usage. So we + absoultely do not want to collect after that. */ + ggc_collect (); +{ + extern int may_alloc; + may_alloc = 0; +} + + timevar_start (TV_PHASE_STREAM_OUT); if (!quiet_flag) { fprintf (stderr, "\nStreaming out"); @@ -3294,10 +3304,8 @@ do_whole_program_analysis (void) lto_wpa_write_files (); if (!quiet_flag) fprintf (stderr, "\n"); - timevar_stop (TV_PHASE_STREAM_OUT); - ggc_collect (); if (post_ipa_mem_report) { fprintf (stderr, "Memory consumption after IPA\n");
[PATCH] Fix ubsan ICE (PR sanitizer/60569)
Apparently with LTO we can get a TYPE_NAME without a DECL_NAME, so check
that it exists before accessing it. Note that the test has to be run; only
compiling wasn't enough to provoke the ICE.

Ran ubsan testsuite on x86_64-linux, ok for trunk?

2014-03-19  Marek Polacek

	PR sanitizer/60569
	* ubsan.c (ubsan_type_descriptor): Check that DECL_NAME is nonnull
	before accessing it.
testsuite/
	* g++.dg/ubsan/pr60569.C: New test.

diff --git gcc/testsuite/g++.dg/ubsan/pr60569.C gcc/testsuite/g++.dg/ubsan/pr60569.C
index e69de29..df6b7a4 100644
--- gcc/testsuite/g++.dg/ubsan/pr60569.C
+++ gcc/testsuite/g++.dg/ubsan/pr60569.C
@@ -0,0 +1,21 @@
+// PR sanitizer/60569
+// { dg-do run }
+// { dg-require-effective-target lto }
+// { dg-options "-fsanitize=undefined -flto" }
+
+struct A
+{
+  void foo ();
+  struct
+  {
+    int i;
+    void bar () { i = 0; }
+  } s;
+};
+
+void A::foo () { s.bar (); }
+
+int
+main ()
+{
+}
diff --git gcc/ubsan.c gcc/ubsan.c
index 7c7a893..22470da 100644
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -318,7 +318,7 @@ ubsan_type_descriptor (tree type, bool want_pointer_type_p)
     {
       if (TREE_CODE (TYPE_NAME (type2)) == IDENTIFIER_NODE)
         tname = IDENTIFIER_POINTER (TYPE_NAME (type2));
-      else
+      else if (DECL_NAME (TYPE_NAME (type2)) != NULL)
         tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type2)));
     }

	Marek
Re: [PATCH] Fix ubsan ICE (PR sanitizer/60569)
On Wed, Mar 19, 2014 at 12:13:57PM +0100, Marek Polacek wrote: > Apparently with LTO we can get a TYPE_NAME without a DECL_NAME, > so check that it exists before accessing it. > Note that the test has to be run; only compiling wasn't enough > to provoke the ICE. ?? Shouldn't // { dg-do link } be sufficient? > --- gcc/ubsan.c > +++ gcc/ubsan.c > @@ -318,7 +318,7 @@ ubsan_type_descriptor (tree type, bool > want_pointer_type_p) > { >if (TREE_CODE (TYPE_NAME (type2)) == IDENTIFIER_NODE) > tname = IDENTIFIER_POINTER (TYPE_NAME (type2)); > - else > + else if (DECL_NAME (TYPE_NAME (type2)) != NULL) > tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type2))); > } This looks good to me. Jakub
Re: [PATCH] Avoid ggc_collect () after WPA forking
On Wed, Mar 19, 2014 at 12:10 PM, Richard Biener wrote: > Index: gcc/ggc-page.c > === > --- gcc/ggc-page.c (revision 208642) > +++ gcc/ggc-page.c (working copy) > @@ -1199,6 +1199,8 @@ ggc_round_alloc_size (size_t requested_s >return size; > } > > +int may_alloc = 1; "bool may_alloc"? Ciao! Steven
Re: [testsuite] Fix gcc.dg/tls/pr58595.c on Solaris 9
Jakub Jelinek writes: > On Tue, Mar 18, 2014 at 11:19:52AM +0100, Rainer Orth wrote: >> The new gcc.dg/tls/pr58595.c testcase FAILs on Solaris 9: >> >> FAIL: gcc.dg/tls/pr58595.c (test for excess errors) >> Excess errors: >> Undefined first referenced >> symbol in file >> ___tls_get_addr /var/tmp//ccuBbAna.o >> ld: fatal: Symbol referencing errors. No output written to ./pr58595.exe >> WARNING: gcc.dg/tls/pr58595.c compilation failed to produce executable >> >> Fixed as follows, tested with the appropriate runtest invocation on >> i386-pc-solaris2.9, i386-pc-solaris2.11, and x86_64-unknown-linux-gnu, >> installed on mainline. > > Can you please also change > /* { dg-require-effective-target tls } */ > to > /* { dg-require-effective-target tls_runtime } */ > ? Sure, done as follows after retesting as before: 2014-03-19 Rainer Orth * gcc.dg/tls/pr58595.c: Require tls_runtime instead of tls. changeset: 13384:d1c2de35507e tag: tip user:Rainer Orth date:Wed Mar 19 13:04:36 2014 +0100 summary: Require tls_runtime in gcc.dg/tls/pr58595.c diff --git a/gcc/testsuite/gcc.dg/tls/pr58595.c b/gcc/testsuite/gcc.dg/tls/pr58595.c --- a/gcc/testsuite/gcc.dg/tls/pr58595.c +++ b/gcc/testsuite/gcc.dg/tls/pr58595.c @@ -3,7 +3,7 @@ /* { dg-options "-O2" } */ /* { dg-additional-options "-fpic" { target fpic } } */ /* { dg-add-options tls } */ -/* { dg-require-effective-target tls } */ +/* { dg-require-effective-target tls_runtime } */ /* { dg-require-effective-target sync_int_long } */ struct S { unsigned long a, b; }; > BTW, don't know if dg-add-options tls can come before that or not. It can: the tls_runtime check takes care of adding the options itself. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Fix ubsan ICE (PR sanitizer/60569)
On Wed, Mar 19, 2014 at 12:17:19PM +0100, Jakub Jelinek wrote: > On Wed, Mar 19, 2014 at 12:13:57PM +0100, Marek Polacek wrote: > > Apparently with LTO we can get a TYPE_NAME without a DECL_NAME, > > so check that it exists before accessing it. > > Note that the test has to be run; only compiling wasn't enough > > to provoke the ICE. > > ?? Shouldn't // { dg-do link } be sufficient? Ah, forgot about that, it is sufficient. Ok with dg-do link instead of dg-do run? > > --- gcc/ubsan.c > > +++ gcc/ubsan.c > > @@ -318,7 +318,7 @@ ubsan_type_descriptor (tree type, bool > > want_pointer_type_p) > > { > >if (TREE_CODE (TYPE_NAME (type2)) == IDENTIFIER_NODE) > > tname = IDENTIFIER_POINTER (TYPE_NAME (type2)); > > - else > > + else if (DECL_NAME (TYPE_NAME (type2)) != NULL) > > tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type2))); > > } > > This looks good to me. Thanks. Marek
Re: [PATCH] Avoid ggc_collect () after WPA forking
On Wed, 19 Mar 2014, Steven Bosscher wrote: > On Wed, Mar 19, 2014 at 12:10 PM, Richard Biener wrote: > > Index: gcc/ggc-page.c > > === > > --- gcc/ggc-page.c (revision 208642) > > +++ gcc/ggc-page.c (working copy) > > @@ -1199,6 +1199,8 @@ ggc_round_alloc_size (size_t requested_s > >return size; > > } > > > > +int may_alloc = 1; > > "bool may_alloc"? It's only checking code I didn't commit. We may of course alloc but I wanted to prove we don't. Richard.
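Checking code of the kind Richard describes amounts to a global flag consulted by the allocator. A minimal sketch follows, assuming a hypothetical checked_alloc wrapper rather than GCC's real ggc allocation entry points, and using bool as Steven suggests. The flag is cleared right after the last ggc_collect and before lto_wpa_write_files forks, so any later allocation trips the assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Set to false once the final collection has run, before forking.  */
bool may_alloc = true;

/* Number of allocation requests seen, kept so the sketch can be checked.  */
static unsigned long n_allocs;

/* Hypothetical wrapper that every GC allocation would funnel through,
   so a stray allocation after the cut-off trips the assertion (only
   when assertions are enabled, i.e. NDEBUG is not defined).  */
static void *
checked_alloc (size_t size)
{
  assert (may_alloc && "GC allocation after the final collection");
  n_allocs++;
  return malloc (size);
}
```

Because this only proves the absence of allocations in a test run, it makes sense to keep it out of the committed patch, as Richard did.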
[PATCH] Reduce GC walk recursion depth for types
This reduces GC walk recursion depth in two ways. First by re-ordering tree_type_common members to move 'name' last and 'canonical' before 'next_variant'. That makes us first recurse downward (type, pointer_to/reference_to), then on the same level (canonical, next_variant, main_variant) and finally upward (context, name->decl_context). For TS_TYPE_NON_COMMON we still walk down afterwards via values, on the same level via minval/maxval and upwards via binfo, so that the patch helps is maybe too much handwaving? (but it helps a reduced testcase without doing the 2nd part) Second by choosing sth different for chain_next for types than TREE_CHAIN (which is TYPE_STUB_DECL, no chain at all). That makes the unreduced testcase work and apart from the issue below should be obvious enough (though there usually shouldn't be so many type variants - still if for every type we save two or three recursions that still helps). Martin verified this fixes PR60553. I've changed chain_next only for the LTO frontend as while (ggc_test_and_set_mark (xlimit)) xlimit = (CODE_CONTAINS_STRUCT (TREE_CODE (&(*xlimit).generic), TS_TYPE_COMMON) ? ((union lang_tree_node *) (*xlimit).generic.type_common.next_variant) : CODE_CONTAINS_STRUCT (TREE_CODE (&(*xlimit).generic), TS_COMMON) ? ((union lang_tree_node *) (*xlimit).generic.common.chain) : NULL); likely doesn't create great code ... (note duplicate tree checks with checking here for other frontends, fixed LTO with the patch below). LTO bootstrap running on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Richard. 2014-03-19 Richard Biener PR middle-end/60553 * tree-core.h (tree_type_common): Re-order pointer members to reduce recursion depth during GC walks. lto/ * lto-tree.h (lang_tree_node): For types use TYPE_NEXT_VARIANT instead of TREE_CHAIN as chain_next. 
Index: gcc/tree-core.h === --- gcc/tree-core.h (revision 208642) +++ gcc/tree-core.h (working copy) @@ -1265,11 +1265,11 @@ struct GTY(()) tree_type_common { const char * GTY ((tag ("TYPE_SYMTAB_IS_POINTER"))) pointer; struct die_struct * GTY ((tag ("TYPE_SYMTAB_IS_DIE"))) die; } GTY ((desc ("debug_hooks->tree_type_symtab_field"))) symtab; - tree name; + tree canonical; tree next_variant; tree main_variant; tree context; - tree canonical; + tree name; }; struct GTY(()) tree_type_with_lang_specific { Index: gcc/lto/lto-tree.h === --- gcc/lto/lto-tree.h (revision 208642) +++ gcc/lto/lto-tree.h (working copy) @@ -48,7 +48,7 @@ enum lto_tree_node_structure_enum { }; union GTY((desc ("lto_tree_node_structure (&%h)"), - chain_next ("CODE_CONTAINS_STRUCT (TREE_CODE (&%h.generic), TS_COMMON) ? ((union lang_tree_node *) TREE_CHAIN (&%h.generic)) : NULL"))) + chain_next ("CODE_CONTAINS_STRUCT (TREE_CODE (&%h.generic), TS_TYPE_COMMON) ? ((union lang_tree_node *) %h.generic.type_common.next_variant) : CODE_CONTAINS_STRUCT (TREE_CODE (&%h.generic), TS_COMMON) ? ((union lang_tree_node *) %h.generic.common.chain) : NULL"))) lang_tree_node { union tree_node GTY ((tag ("TS_LTO_GENERIC"),
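The second half of the patch relies on how gengtype uses chain_next: instead of recursing into the "next" pointer, the generated marker sets marks along the chain in a loop and only then recurses into the remaining fields. The sketch below is a simplified model, not gengtype output; struct type_node keeps only two of tree_type_common's pointers, and the depth counters are there purely to measure recursion. It contrasts a naive recursive marker with a chain_next-style marker on a long next_variant chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified type node with only the two pointers
   needed to show the effect; GCC's tree_type_common has many more.  */
struct type_node
{
  bool marked;
  struct type_node *pointer_to;   /* Walks "downward".  */
  struct type_node *next_variant; /* Walks "on the same level".  */
};

static int depth, max_depth;

static void
enter (void)
{
  if (++depth > max_depth)
    max_depth = depth;
}

/* Naive marker: recurses into every pointer field, so a chain of N
   variants needs N nested calls.  */
static void
mark_recursive (struct type_node *t)
{
  if (t == NULL || t->marked)
    return;
  t->marked = true;
  enter ();
  mark_recursive (t->pointer_to);
  mark_recursive (t->next_variant);
  depth--;
}

/* Marker in the style of gengtype's chain_next: first set marks along
   next_variant in a loop, then recurse only into the other fields of
   the nodes that were newly marked.  */
static void
mark_chained (struct type_node *t)
{
  struct type_node *limit = t;

  enter ();
  while (limit != NULL && !limit->marked)
    {
      limit->marked = true;
      limit = limit->next_variant;
    }
  for (; t != limit; t = t->next_variant)
    if (t->pointer_to != NULL)
      mark_chained (t->pointer_to);
  depth--;
}
```

For a chain of 1000 variants the naive marker nests 1000 calls deep, while the chained marker stays at depth 1, which is the effect the LTO chain_next change aims for when a type has many variants.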
Re: [PATCH] Reduce GC walk recursion depth for types
On Wed, Mar 19, 2014 at 02:02:10PM +0100, Richard Biener wrote: > LTO bootstrap running on x86_64-unknown-linux-gnu. > > Ok for trunk? > > Thanks, > Richard. > > 2014-03-19 Richard Biener > > PR middle-end/60553 > * tree-core.h (tree_type_common): Re-order pointer members > to reduce recursion depth during GC walks. > > lto/ > * lto-tree.h (lang_tree_node): For types use TYPE_NEXT_VARIANT > instead of TREE_CHAIN as chain_next. LGTM. Jakub
[Fortran][PATCH][gomp4]: Transform OpenACC loop directive
Hi Tobias! This patch implements transformation of OpenACC loop directive from Fortran AST to GENERIC. Successfully bootstrapped and tested with no new regressions on x86_64-unknown-linux-gnu. OK for gomp4 branch? -- Ilmir. >From de2dd5ba0c48500e8e9084bd46cbfac2f21352fe Mon Sep 17 00:00:00 2001 From: Ilmir Usmanov Date: Wed, 19 Mar 2014 15:12:36 +0400 Subject: [PATCH] Transform OpenACC loop directive from fortran AST to GENERIC --- * gcc/fortran/trans-openmp.c (gfc_trans_oacc_loop): New function. (gfc_trans_oacc_combined_directive): Call it. (gfc_trans_oacc_directive): Likewise. * gcc/tree-pretty-print (dump_omp_clause): Fix WORKER and VECTOR. * gcc/testsuite/gfortran.dg/goacc/loop-tree.f95: New test. diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 29364f4..cb7c970 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -1571,11 +1571,181 @@ typedef struct dovar_init_d { tree init; } dovar_init; + +static tree +gfc_trans_oacc_loop (gfc_code *code, stmtblock_t *pblock, + gfc_omp_clauses *loop_clauses) +{ + gfc_se se; + tree dovar, stmt, from, to, step, type, init, cond, incr; + tree count = NULL_TREE, cycle_label, tmp, omp_clauses; + stmtblock_t block; + stmtblock_t body; + gfc_omp_clauses *clauses = code->ext.omp_clauses; + int i, collapse = clauses->collapse; + vec<dovar_init> inits = vNULL; + dovar_init *di; + unsigned ix; + + if (collapse <= 0) +collapse = 1; + + code = code->block->next; + gcc_assert (code->op == EXEC_DO || code->op == EXEC_DO_CONCURRENT); + + init = make_tree_vec (collapse); + cond = make_tree_vec (collapse); + incr = make_tree_vec (collapse); + + if (pblock == NULL) +{ + gfc_start_block (&block); + pblock = &block; +} + + omp_clauses = gfc_trans_omp_clauses (pblock, loop_clauses, code->loc); + + for (i = 0; i < collapse; i++) +{ + int simple = 0; + + /* Evaluate all the expressions in the iterator.
*/ + gfc_init_se (&se, NULL); + gfc_conv_expr_lhs (&se, code->ext.iterator->var); + gfc_add_block_to_block (pblock, &se.pre); + dovar = se.expr; + type = TREE_TYPE (dovar); + gcc_assert (TREE_CODE (type) == INTEGER_TYPE); + + gfc_init_se (&se, NULL); + gfc_conv_expr_val (&se, code->ext.iterator->start); + gfc_add_block_to_block (pblock, &se.pre); + from = gfc_evaluate_now (se.expr, pblock); + + gfc_init_se (&se, NULL); + gfc_conv_expr_val (&se, code->ext.iterator->end); + gfc_add_block_to_block (pblock, &se.pre); + to = gfc_evaluate_now (se.expr, pblock); + + gfc_init_se (&se, NULL); + gfc_conv_expr_val (&se, code->ext.iterator->step); + gfc_add_block_to_block (pblock, &se.pre); + step = gfc_evaluate_now (se.expr, pblock); + + /* Special case simple loops. */ + if (TREE_CODE (dovar) == VAR_DECL) + { + if (integer_onep (step)) + simple = 1; + else if (tree_int_cst_equal (step, integer_minus_one_node)) + simple = -1; + } + + /* Loop body. */ + if (simple) + { + TREE_VEC_ELT (init, i) = build2_v (MODIFY_EXPR, dovar, from); + /* The condition should not be folded. */ + TREE_VEC_ELT (cond, i) = build2_loc (input_location, simple > 0 + ? LE_EXPR : GE_EXPR, + boolean_type_node, dovar, to); + TREE_VEC_ELT (incr, i) = fold_build2_loc (input_location, PLUS_EXPR, + type, dovar, step); + TREE_VEC_ELT (incr, i) = fold_build2_loc (input_location, + MODIFY_EXPR, + type, dovar, + TREE_VEC_ELT (incr, i)); + } + else + { + /* STEP is not 1 or -1. 
Use: + for (count = 0; count < (to + step - from) / step; count++) + { + dovar = from + count * step; + body; + cycle_label:; + } */ + tmp = fold_build2_loc (input_location, MINUS_EXPR, type, step, from); + tmp = fold_build2_loc (input_location, PLUS_EXPR, type, to, tmp); + tmp = fold_build2_loc (input_location, TRUNC_DIV_EXPR, type, tmp, + step); + tmp = gfc_evaluate_now (tmp, pblock); + count = gfc_create_var (type, "count"); + TREE_VEC_ELT (init, i) = build2_v (MODIFY_EXPR, count, + build_int_cst (type, 0)); + /* The condition should not be folded. */ + TREE_VEC_ELT (cond, i) = build2_loc (input_location, LT_EXPR, + boolean_type_node, + count, tmp); + TREE_VEC_ELT (incr, i) = fold_build2_loc (input_location, PLUS_EXPR, + type, count, + build_int_cst (type, 1)); + TREE_VEC_ELT (incr, i) = fold_build2_loc (input_location, + MODIFY_EXPR, type, count, + TREE_VEC_ELT (incr, i)); + + /* Initialize DOVAR. */ + tmp = fold_build2_loc (input_location, MULT_EXPR, type, count, step); + tmp = fold_build2_loc (input_location, PLUS_EXPR, type, from, tmp); + dovar_init e = {dovar, tmp}; + inits.safe_push (e); + } + + if (i + 1 < collapse) + code = code->block->next; +} + + if (pblock != &block) +{ + pushlevel (); +
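The non-unit-step rewrite described in the comment above (iterate count from 0 while count < (to + step - from) / step, and recompute dovar = from + count * step) can be checked with plain integer arithmetic. This sketch models only that arithmetic, not the GENERIC construction. The function names are hypothetical, it assumes step != 0, and it assumes the output arrays are large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Fill OUT with the values DOVAR takes in "DO dovar = from, to, step",
   using the count-based rewrite.  Returns the number of iterations.
   Assumes step != 0 and that OUT is large enough.  */
static long
collapsed_do_values (long from, long to, long step, long *out)
{
  /* C's "/" truncates toward zero, like GCC's TRUNC_DIV_EXPR.  */
  long niters = (to + step - from) / step;
  long count;

  for (count = 0; count < niters; count++)
    out[count] = from + count * step; /* dovar = from + count * step */
  return count;
}

/* Reference: the straightforward loop the rewrite must match.  */
static long
direct_do_values (long from, long to, long step, long *out)
{
  long n = 0;

  if (step > 0)
    for (long v = from; v <= to; v += step)
      out[n++] = v;
  else
    for (long v = from; v >= to; v += step)
      out[n++] = v;
  return n;
}
```

A zero or negative quotient (for example from = 5, to = 1, step = 2) simply means the count loop never runs, which matches a Fortran DO loop with zero trips.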
Re: [C++ PATCH] [gomp4] Initial OpenACC support to C++ front-end
Ping. On 13.03.2014 21:05, Ilmir Usmanov wrote: On 07.03.2014 15:37, Ilmir Usmanov wrote: Hi Thomas! I prepared simple patch to add support of OpenACC data, kernels and parallel constructs to C++ FE. It adds support of data clauses too. OK to gomp4 branch? Fixed subject: changed file extensions of tests and fixed comments. OK to gomp4 branch? -- Ilmir.
Re: [patch] gcc fstack-protector-explicit
Well, finally I have the assignment, could you please review this patch? On Wed, Nov 20, 2013 at 4:13 PM, Jeff Law wrote: > On 11/19/13 07:04, Marcos Díaz wrote: >> >> My employer is working on the signature of the papers. Could someone >> please do the review meanwhile? > > I'd prefer to wait until the assignment process is complete. If something > were to happen and we can't use your code the review time would have been > wasted (and such things have certainly happened in the past). > > Once the assignment is recorded, please ping this patch. > > Jeff > -- __ Marcos Díaz Software Engineer San Lorenzo 47, 3rd Floor, Office 5 Córdoba, Argentina Phone: +54 351 4217888 / +54 351 4218211/ +54 351 7617452 Skype: markdiaz22
Re: [patch] gcc fstack-protector-explicit
On 03/19/14 08:06, Marcos Díaz wrote: Well, finally I have the assignment, could you please review this patch? Thanks. I'll take a look once we open up stage1 development again (should be soon as 4.9 is getting close to being ready). jeff
Re: [PATCH] Avoid ggc_collect () after WPA forking
On Wed, 19 Mar 2014, Martin Liška wrote: > There are stats for Firefox with LTO and -O2. According to graphs it > looks that memory consumption for parallel WPA phase is similar. > When I disable parallel WPA, wpa footprint is ~4GB, but ltrans memory > footprint is similar to parallel WPA that reduces libxul.so linking by ~10%. Ok, so I suppose this tracks RSS, not virtual memory use (what is "used" and what is "active")? And it is WPA plus LTRANS stages, WPA ends where memory use first goes down to zero? I wonder if you can identify the point where parallel streaming starts and where it ends ... ;) Btw, I have another patch in my local tree, limiting the exponential growth of blocks we allocate when outputting sections. But it shouldn't be _that_ bad ... maybe you can try if it has any effect? Thanks, Richard. Index: gcc/lto-section-out.c === --- gcc/lto-section-out.c (revision 208642) +++ gcc/lto-section-out.c (working copy) @@ -99,13 +99,19 @@ lto_end_section (void) } +/* We exponentially grow the size of the blocks as we need to make + room for more data to be written. Start with a single page and go up + to 2MB pages for this. */ +#define FIRST_BLOCK_SIZE 4096 +#define MAX_BLOCK_SIZE (2 * 1024 * 1024) + /* Write all of the chars in OBS to the assembler. Recycle the blocks in obs as this is being done. */ void lto_write_stream (struct lto_output_stream *obs) { - unsigned int block_size = 1024; + unsigned int block_size = FIRST_BLOCK_SIZE; struct lto_char_ptr_base *block; struct lto_char_ptr_base *next_block; if (!obs->first_block) @@ -135,6 +141,7 @@ lto_write_stream (struct lto_output_stre else lang_hooks.lto.append_data (base, num_chars, block); block_size *= 2; + block_size = MIN (MAX_BLOCK_SIZE, block_size); } } @@ -152,7 +159,7 @@ lto_append_block (struct lto_output_stre { /* This is the first time the stream has been written into. 
*/ - obs->block_size = 1024; + obs->block_size = FIRST_BLOCK_SIZE; new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size); obs->first_block = new_block; } @@ -162,6 +169,7 @@ lto_append_block (struct lto_output_stre /* Get a new block that is twice as big as the last block and link it into the list. */ obs->block_size *= 2; + obs->block_size = MIN (MAX_BLOCK_SIZE, obs->block_size); new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size); /* The first bytes of the block are reserved as a pointer to the next block. Set the chain of the full block to the
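The sizing policy in the patch can be modeled on its own. FIRST_BLOCK_SIZE and MAX_BLOCK_SIZE below are taken from the patch; next_block_size and bytes_allocated_for are hypothetical helpers that model only the arithmetic. They ignore the chain-pointer header at the start of each block and the real lto_output_stream bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Constants from the lto-section-out.c patch.  */
#define FIRST_BLOCK_SIZE 4096
#define MAX_BLOCK_SIZE (2 * 1024 * 1024)

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Size of the next block: double the previous one, but never exceed
   MAX_BLOCK_SIZE.  A previous size of 0 means "first block".  */
static unsigned int
next_block_size (unsigned int prev)
{
  if (prev == 0)
    return FIRST_BLOCK_SIZE;
  return MIN (MAX_BLOCK_SIZE, prev * 2);
}

/* Total bytes allocated in blocks to hold at least NBYTES of data.  */
static unsigned long
bytes_allocated_for (unsigned long nbytes)
{
  unsigned long total = 0;
  unsigned int size = 0;

  while (total < nbytes)
    {
      size = next_block_size (size);
      total += size;
    }
  return total;
}
```

With unbounded doubling the last block can be about as large as all previous blocks combined, so the worst-case overshoot approaches the amount of data written. With the cap, the overshoot stays below one 2MB block; for 100MB of section data the sketch allocates less than 2MB extra.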
Re: [PATCH] Avoid ggc_collect () after WPA forking
On 03/19/2014 03:55 PM, Richard Biener wrote: On Wed, 19 Mar 2014, Martin Liška wrote: There are stats for Firefox with LTO and -O2. According to graphs it looks that memory consumption for parallel WPA phase is similar. When I disable parallel WPA, wpa footprint is ~4GB, but ltrans memory footprint is similar to parallel WPA that reduces libxul.so linking by ~10%. Ok, so I suppose this tracks RSS, not virtual memory use (what is "used" and what is "active")? Data are given by vmstat, according to: http://stackoverflow.com/questions/18529723/what-is-active-memory-and-inactive-memory *Active memory* is memory that is being used by a particular process. *Inactive memory* is memory that was allocated to a process that is no longer running. So please follow just 'blue' line that displays really used memory. According to man, vmstat tracks virtual memory statistics. And it is WPA plus LTRANS stages, WPA ends where memory use first goes down to zero? I wonder if you can identify the point where parallel streaming starts and where it ends ... ;) Exactly, WPA ends when it goes to zero. Btw, I have another patch in my local tree, limiting the exponential growth of blocks we allocate when outputting sections. But it shouldn't be _that_ bad ... maybe you can try if it has any effect? I can apply it. Martin Thanks, Richard. Index: gcc/lto-section-out.c === --- gcc/lto-section-out.c (revision 208642) +++ gcc/lto-section-out.c (working copy) @@ -99,13 +99,19 @@ lto_end_section (void) } +/* We exponentially grow the size of the blocks as we need to make + room for more data to be written. Start with a single page and go up + to 2MB pages for this. */ +#define FIRST_BLOCK_SIZE 4096 +#define MAX_BLOCK_SIZE (2 * 1024 * 1024) + /* Write all of the chars in OBS to the assembler. Recycle the blocks in obs as this is being done.
*/ void lto_write_stream (struct lto_output_stream *obs) { - unsigned int block_size = 1024; + unsigned int block_size = FIRST_BLOCK_SIZE; struct lto_char_ptr_base *block; struct lto_char_ptr_base *next_block; if (!obs->first_block) @@ -135,6 +141,7 @@ lto_write_stream (struct lto_output_stre else lang_hooks.lto.append_data (base, num_chars, block); block_size *= 2; + block_size = MIN (MAX_BLOCK_SIZE, block_size); } } @@ -152,7 +159,7 @@ lto_append_block (struct lto_output_stre { /* This is the first time the stream has been written into. */ - obs->block_size = 1024; + obs->block_size = FIRST_BLOCK_SIZE; new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size); obs->first_block = new_block; } @@ -162,6 +169,7 @@ lto_append_block (struct lto_output_stre /* Get a new block that is twice as big as the last block and link it into the list. */ obs->block_size *= 2; + obs->block_size = MIN (MAX_BLOCK_SIZE, obs->block_size); new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size); /* The first bytes of the block are reserved as a pointer to the next block. Set the chain of the full block to the
Re: [ARM] [Trivial] Fix shortening of field name extend.
On Mon, Feb 24, 2014 at 09:13:45AM +, James Greenhalgh wrote: > *ping*, CCing Jakub. *ping x2* This was OKed by ramana, but we wanted release manager approval. I would have committed the patch as obvious if we were not in stage 4. Thanks, James > On Wed, Feb 12, 2014 at 12:43:10PM +, Ramana Radhakrishnan wrote: > > On 02/12/14 12:19, James Greenhalgh wrote: > > > > > > Hi, > > > > > > In aarch-common-protos.h we define a field in alu_cost_table: > > > > > >"extnd" > > > > > > On its own this is an upsetting optimization of the > > > English language, but this trouble is compounded by the > > > comment attached to this field throughout the cost tables > > > themselves: > > > > > >/* Extend. */ > > > > > > This patch fixes the spelling of extend to match that in the > > > comments. > > > > > > I've checked that AArch64 and AArch32 build with this patch > > > applied. > > > > > > OK for trunk/stage-1 (I don't mind which)? > > > > I am happy for this to go in now - > > > > Jakub ? > > > > > > regards > > Ramana > 2014-03-19 James Greenhalgh * config/arm/aarch-common-protos.h (alu_cost_table): Fix spelling of "extend". * config/arm/arm.c (arm_new_rtx_costs): Fix spelling of "extend". diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 056fe56..a5ff6b4 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -48,8 +48,8 @@ struct alu_cost_table const int arith_shift_reg; /* ... and when the shift is by a reg. */ const int log_shift; /* Additional when logic also shifts... */ const int log_shift_reg; /* ... and when the shift is by a reg. */ - const int extnd; /* Zero/sign extension. */ - const int extnd_arith; /* Extend and arith. */ + const int extend;/* Zero/sign extension. */ + const int extend_arith; /* Extend and arith. */ const int bfi; /* Bit-field insert. */ const int bfx; /* Bit-field extraction. */ const int clz; /* Count Leading Zeros.
*/ diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index a68ed8d..31df089 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9594,7 +9594,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, { /* UXTA[BH] or SXTA[BH]. */ if (speed_p) - *cost += extra_cost->alu.extnd_arith; + *cost += extra_cost->alu.extend_arith; *cost += (rtx_cost (XEXP (XEXP (x, 0), 0), ZERO_EXTEND, 0, speed_p) + rtx_cost (XEXP (x, 1), PLUS, 0, speed_p)); @@ -10311,7 +10311,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, *cost = COSTS_N_INSNS (1); *cost += rtx_cost (XEXP (x, 0), code, 0, speed_p); if (speed_p) - *cost += extra_cost->alu.extnd; + *cost += extra_cost->alu.extend; } else if (GET_MODE (XEXP (x, 0)) != SImode) { @@ -10364,7 +10364,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, *cost = COSTS_N_INSNS (1); *cost += rtx_cost (XEXP (x, 0), code, 0, speed_p); if (speed_p) - *cost += extra_cost->alu.extnd; + *cost += extra_cost->alu.extend; } else if (GET_MODE (XEXP (x, 0)) != SImode) {
Re: [ARM] [Trivial] Fix shortening of field name extend.
On Wed, Mar 19, 2014 at 03:13:40PM +, James Greenhalgh wrote: > On Mon, Feb 24, 2014 at 09:13:45AM +, James Greenhalgh wrote: > > *ping*, CCing Jakub. > > *ping x2* This was OKed by ramana, but we wanted release manager approval. > I would have committed the patch as obvious if we were not in stage 4. This is ok even in stage4. Jakub
Re: [Patch AArch64] Define TARGET_FLAGS_REGNUM
On 28 February 2014 09:32, Ramana Radhakrishnan wrote: > Hi, > > This defines TARGET_FLAGS_REGNUM for AArch64 to be CC_REGNUM. > Noticed this turns on the cmpelim pass after reload and in a few examples > and a couple of benchmarks I noticed a number of comparisons getting > deleted. A similar patch for AArch32 is being tested. > > Tested cross with aarch64-none-elf on a model with no regressions. > > Ok for stage1 ? OK /Marcus
Re: [PATCH] Avoid ggc_collect () after WPA forking
On Wed, 19 Mar 2014, Martin Liška wrote: > > On 03/19/2014 03:55 PM, Richard Biener wrote: > > On Wed, 19 Mar 2014, Martin Liška wrote: > > > > > There are stats for Firefox with LTO and -O2. According to graphs it > > > looks that memory consumption for parallel WPA phase is similar. > > > When I disable parallel WPA, wpa footprint is ~4GB, but ltrans memory > > > footprint is similar to parallel WPA that reduces libxul.so linking by > > > ~10%. > > Ok, so I suppose this tracks RSS, not virtual memory use (what is > > "used" and what is "active")? > > Data are given by vmstat, according to: > http://stackoverflow.com/questions/18529723/what-is-active-memory-and-inactive-memory > > *Active memory* is memory that is being used by a particular process. > *Inactive memory* is memory that was allocated to a process that is no longer > running. > > So please follow just 'blue' line that displays really used memory. According > to man, vmstat tracks virtual memory statistics. But 'blue' is neither active nor inactive ... what is 'used'? Does it correspond to 'swpd'? If it is virtual memory in use then this is expected to grow when fork()ing as the virtual memory space is obviously copied (just the pages are still shared). For me allocating a GB of memory and clearing it increases "active" by 1GB and then forking doesn't increase any of the metrics vmstat -a outputs in any significant way. > > And it is WPA plus LTRANS stages, WPA ends where memory use first goes > > down to zero? > > I wonder if you can identify the point where parallel streaming > > starts and where it ends ... ;) > > Exactly, WPA ends when it goes to zero. So the difference isn't that big (8GB vs. 7.2GB), and is likely attributable to heap memory we allocate during the stream-out. For example we need some for the tree-ref-encoders (I remember that can be a significant amount of memory, but I improved that already as far as possible...).
So yes, we _do_ allocate memory during stream-out and that is now required N times. > > Btw, I have another patch in my local tree, limiting the > > exponential growth of blocks we allocate when outputting sections. > > But it shouldn't be _that_ bad ... maybe you can try if it has > > any effect? > > I can apply it. Thanks, Richard.
PATCH: PR testsuite/60590: Can't recreate the same executable in testsuite
On Wed, Mar 19, 2014 at 8:41 AM, H.J. Lu wrote: > GNU linker sets DT_RPATH from the environment variable LD_RUN_PATH. > set_ld_library_path_env_vars sets a few environment variables including > LD_RUN_PATH. This patch logs all environment variables set by > set_ld_library_path_env_vars so that one can recreate the same > executable as "make check" run. OK to install? > > Thanks. > > H.J. > --- > 2014-03-19 H.J. Lu > > PR testsuite/60590 > * lib/target-libpath.exp (set_ld_library_path_env_vars): Log > LD_LIBRARY_PATH, LD_RUN_PATH, SHLIB_PATH, LD_LIBRARY_PATH_32, > LD_LIBRARY_PATH_64 and DYLD_LIBRARY_PATH. > > diff --git a/gcc/testsuite/lib/target-libpath.exp > b/gcc/testsuite/lib/target-libpath.exp > index 603ed8a..1891088 100644 > --- a/gcc/testsuite/lib/target-libpath.exp > +++ b/gcc/testsuite/lib/target-libpath.exp > @@ -155,7 +155,12 @@ proc set_ld_library_path_env_vars { } { > setenv DYLD_LIBRARY_PATH "$ld_library_path" >} > > - verbose -log "set_ld_library_path_env_vars: > ld_library_path=$ld_library_path" > + verbose -log "LD_LIBRARY_PATH=[getenv LD_LIBRARY_PATH]" > + verbose -log "LD_RUN_PATH=[getenv LD_RUN_PATH]" > + verbose -log "SHLIB_PATH=[getenv SHLIB_PATH]" > + verbose -log "LD_LIBRARY_PATH_32=[getenv LD_LIBRARY_PATH_32]" > + verbose -log "LD_LIBRARY_PATH_64=[getenv LD_LIBRARY_PATH_64]" > + verbose -log "DYLD_LIBRARY_PATH=[getenv DYLD_LIBRARY_PATH]" > } > > ### Correction. It is a testsuite issue. -- H.J.
Re: [patch testsuite]: g++.dg/abi
On Mar 18, 2014, at 6:16 AM, Kai Tietz wrote: > this patch skips anon2.C and anon3.C test for mingw target. Issue > here is that weak under pe-coff is different to ELF-targets and > therefore test doesn't apply for So, what does the output look like? There should be a trace of weak of some sort in the output.
Re: [C++ Patch / RFC] PR 51474
OK. Jason
PATCH: PR target/60590: Can't recreate the same executable in testsuite
GNU linker sets DT_RPATH from the environment variable LD_RUN_PATH. set_ld_library_path_env_vars sets a few environment variables including LD_RUN_PATH. This patch logs all environment variables set by set_ld_library_path_env_vars so that one can recreate the same executable as "make check" run. OK to install? Thanks. H.J. --- 2014-03-19 H.J. Lu PR target/60590 * lib/target-libpath.exp (set_ld_library_path_env_vars): Log LD_LIBRARY_PATH, LD_RUN_PATH, SHLIB_PATH, LD_LIBRARY_PATH_32, LD_LIBRARY_PATH_64 and DYLD_LIBRARY_PATH. diff --git a/gcc/testsuite/lib/target-libpath.exp b/gcc/testsuite/lib/target-libpath.exp index 603ed8a..1891088 100644 --- a/gcc/testsuite/lib/target-libpath.exp +++ b/gcc/testsuite/lib/target-libpath.exp @@ -155,7 +155,12 @@ proc set_ld_library_path_env_vars { } { setenv DYLD_LIBRARY_PATH "$ld_library_path" } - verbose -log "set_ld_library_path_env_vars: ld_library_path=$ld_library_path" + verbose -log "LD_LIBRARY_PATH=[getenv LD_LIBRARY_PATH]" + verbose -log "LD_RUN_PATH=[getenv LD_RUN_PATH]" + verbose -log "SHLIB_PATH=[getenv SHLIB_PATH]" + verbose -log "LD_LIBRARY_PATH_32=[getenv LD_LIBRARY_PATH_32]" + verbose -log "LD_LIBRARY_PATH_64=[getenv LD_LIBRARY_PATH_64]" + verbose -log "DYLD_LIBRARY_PATH=[getenv DYLD_LIBRARY_PATH]" } ###
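The patch itself is DejaGnu Tcl. The same idea expressed in C is a getenv loop over the six variable names, printing "NAME=value" lines that can be re-exported before relinking by hand (LD_RUN_PATH matters because GNU ld uses it to set DT_RPATH). A minimal sketch follows; format_env_var and log_libpath_vars are hypothetical helpers, and printing an empty value for unset variables is this sketch's choice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The variables the patch logs after set_ld_library_path_env_vars.  */
static const char *const libpath_vars[] = {
  "LD_LIBRARY_PATH", "LD_RUN_PATH", "SHLIB_PATH",
  "LD_LIBRARY_PATH_32", "LD_LIBRARY_PATH_64", "DYLD_LIBRARY_PATH"
};

#define N_LIBPATH_VARS (sizeof libpath_vars / sizeof libpath_vars[0])

/* Format "NAME=value" into BUF, with an empty value when NAME is unset.
   Values longer than LEN - 1 characters are truncated.  */
static void
format_env_var (char *buf, size_t len, const char *name)
{
  const char *value = getenv (name);
  snprintf (buf, len, "%s=%s", name, value != NULL ? value : "");
}

/* Print each variable on its own line, so the environment of a test
   run can be re-exported before relinking the executable by hand.  */
static void
log_libpath_vars (FILE *out)
{
  char buf[4096];

  for (size_t i = 0; i < N_LIBPATH_VARS; i++)
    {
      format_env_var (buf, sizeof buf, libpath_vars[i]);
      fprintf (out, "%s\n", buf);
    }
}
```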
Re: [patch testsuite]: g++.dg/abi
2014-03-19 17:23 GMT+01:00 Mike Stump : > On Mar 18, 2014, at 6:16 AM, Kai Tietz wrote: >> this patch skips anon2.C and anon3.C test for mingw target. Issue >> here is that weak under pe-coff is different to ELF-targets and >> therefore test doesn't apply for > > So, what does the output look like? There should be a trace of weak of some sort in the output. No, there is none. Output looks like: .seh_proc _ZN2N43._91CIiE3fn2ES2_ _ZN2N43._91CIiE3fn2ES2_: .LFB11: .seh_endprologue ret .seh_endproc .globl _ZN2N41qE .data .align 8 _ZN2N41qE: .quad _ZN2N43._91CIiE3fn2ES2_ .globl _ZN2N41pE .align 8 _ZN2N41pE: .quad _ZN2N43._91CIiE3fn1ENS0_1BE .globl _ZN2N31qE .align 8 _ZN2N31qE: .quad _ZN2N31D1CIiE3fn2ES2_... The concept of weak - as present in ELF - isn't known in COFF in general. There is some weak, but it works only for static libraries and in a limited way. Therefore we can't (and don't) use it for COFF targets. Kai PS: I have another patch with similar reasoning for g++.dg/abi/thunk5.C on my pile too.
Re: PATCH: PR target/60590: Can't recreate the same executable in testsuite
On Mar 19, 2014, at 8:41 AM, H.J. Lu wrote: > GNU linker sets DT_RPATH from the environment variable LD_RUN_PATH. > set_ld_library_path_env_vars sets a few environment variables including > LD_RUN_PATH. This patch logs all environment variables set by > set_ld_library_path_env_vars so that one can recreate the same > executable as "make check" run. OK to install? Ok. If someone complains about the log size clutter, we can consider bumping it up to higher verbosity.
[jit] Tighten up the distinction between pointers and arrays
Committed to branch dmalcolm/jit: https://github.com/davidmalcolm/pygccjit/pull/3#issuecomment-37883129 showed a problem where a parameter expecting a (char *) was passed a char[1024] cast to a (char *) as its argument, leading to an ICE: libgccjit.so: internal compiler error: in convert_move, at expr.c:320 0x7fffebea98ad convert_move(rtx_def*, rtx_def*, int) ../../src/gcc/expr.c:320 0x7fffebec31cb expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) ../../src/gcc/expr.c:8105 0x7fffec88d768 expand_gimple_stmt_1 ../../src/gcc/cfgexpand.c:2321 0x7fffec88d9cc expand_gimple_stmt ../../src/gcc/cfgexpand.c:2381 The issue was that the recording::type::dereference method is used for both pointers and for arrays, leading to sloppiness about where lvalues and rvalues can be pointers vs arrays. This commit introduces is_pointer and is_array methods, using them to tighten up type-checking, converting the above ICE into a type-check error when the cast is attempted: libgccjit.so: error: gcc_jit_context_new_cast: cannot cast buffer from type: char[1024] to type: char * The correct way to use an array as a pointer in the JIT API is to use gcc_jit_lvalue_get_address on the array, which gives you an rvalue representing the address of the initial element, and then to cast that rvalue as necessary. gcc/jit * internal-api.c (gcc::jit::recording::memento_of_get_pointer:: accepts_writes_from): Accept writes from pointers, but not arrays. * internal-api.h (gcc::jit::recording::type::is_pointer): New. (gcc::jit::recording::type::is_array): New. (gcc::jit::recording::memento_of_get_type::accepts_writes_from): Allow (void *) to accept writes of pointers, but not arrays. (gcc::jit::recording::memento_of_get_type::is_pointer): New. (gcc::jit::recording::memento_of_get_type::is_array): New. (gcc::jit::recording::memento_of_get_pointer::is_pointer): New. (gcc::jit::recording::memento_of_get_pointer::is_array): New.
(gcc::jit::recording::memento_of_get_const::is_pointer): New. (gcc::jit::recording::memento_of_get_const::is_array): New. (gcc::jit::recording::memento_of_get_volatile::is_pointer): New. (gcc::jit::recording::memento_of_get_volatile::is_array): New. (gcc::jit::recording::array_type::is_pointer): New. (gcc::jit::recording::array_type::is_array): New. (gcc::jit::recording::function_type::is_pointer): New. (gcc::jit::recording::function_type::is_array): New. (gcc::jit::recording::struct_::is_pointer): New. (gcc::jit::recording::struct_::is_array): New. * libgccjit.c (gcc_jit_context_new_rvalue_from_ptr): Require the pointer_type to be a pointer, not an array. (gcc_jit_context_null): Likewise. (is_valid_cast): Require pointer casts to be between pointer types, not arrays. (gcc_jit_context_new_array_access): Update error message from "not a pointer" to "not a pointer or array". (gcc_jit_rvalue_dereference_field): Require the pointer arg to be of pointer type, not an array. (gcc_jit_rvalue_dereference): Likewise. gcc/testsuite/ * jit.dg/test-array-as-pointer.c: New test case, verifying that there's a way to treat arrays as pointers. * jit.dg/test-combination.c: Add test-array-as-pointer.c... (create_code): ...here and... (verify_code): ...here. 
* jit.dg/test-error-array-as-pointer.c: New test case, verifying that bogus casts from array to pointer are caught by the type system, rather than leading to ICEs seen in: https://github.com/davidmalcolm/pygccjit/pull/3#issuecomment-37883129 --- gcc/jit/ChangeLog.jit | 35 +++ gcc/jit/internal-api.c | 2 +- gcc/jit/internal-api.h | 18 +++- gcc/jit/libgccjit.c| 14 +-- gcc/testsuite/ChangeLog.jit| 13 +++ gcc/testsuite/jit.dg/test-array-as-pointer.c | 101 + gcc/testsuite/jit.dg/test-combination.c| 9 ++ gcc/testsuite/jit.dg/test-error-array-as-pointer.c | 99 8 files changed, 282 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/jit.dg/test-array-as-pointer.c create mode 100644 gcc/testsuite/jit.dg/test-error-array-as-pointer.c diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 8244eba..efb1931 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,38 @@ +2014-03-19 David Malcolm + + * internal-api.c (gcc::jit::recording::memento_of_get_pointer:: + accepts_writes_from): Accept writes from pointers, but not arrays. + + * internal-api.h (gcc::jit::recording::type::is_pointer): New. + (gcc::jit::recording::type::is_array): New. +
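To illustrate the distinction the patch draws, here is a minimal, self-contained C++ sketch of a type hierarchy where pointer and array types answer is_pointer/is_array separately, so a pointer cast from char[1024] to char * is rejected while taking the address of the first element yields a castable pointer. The class names, is_valid_pointer_cast and address_of_first_element are hypothetical stand-ins for illustration only; this is not libgccjit's internal code, and returning the pointee/element type (or nullptr) from the queries is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of a JIT type hierarchy; the method names echo
// the patch (is_pointer, is_array), but this is not libgccjit's code.
struct jit_type {
  virtual ~jit_type() = default;
  // In this sketch each query returns the pointee/element type, or
  // nullptr when the type is not of that kind.
  virtual const jit_type *is_pointer() const { return nullptr; }
  virtual const jit_type *is_array() const { return nullptr; }
  virtual std::string name() const = 0;
};

struct char_type : jit_type {
  std::string name() const override { return "char"; }
};

struct pointer_type : jit_type {
  explicit pointer_type(const jit_type *pointee) : m_pointee(pointee) {}
  const jit_type *is_pointer() const override { return m_pointee; }
  std::string name() const override { return m_pointee->name() + " *"; }
  const jit_type *m_pointee;
};

struct array_type : jit_type {
  array_type(const jit_type *elem, int n) : m_elem(elem), m_n(n) {}
  const jit_type *is_array() const override { return m_elem; }
  std::string name() const override {
    return m_elem->name() + "[" + std::to_string(m_n) + "]";
  }
  const jit_type *m_elem;
  int m_n;
};

// Pointer casts are accepted only between two pointer types, so an array
// can no longer slip through as if it were a pointer.
bool is_valid_pointer_cast(const jit_type &src, const jit_type &dst) {
  return src.is_pointer() != nullptr && dst.is_pointer() != nullptr;
}

// Models taking the address of an array's initial element: the result is
// a pointer to the element type, which may then be cast as needed.
pointer_type address_of_first_element(const array_type &arr) {
  return pointer_type(arr.is_array());
}
```

The design choice mirrored here is that "array of T" and "pointer to T" share an element type but never answer the same kind query, so any check that needs a pointer must ask for one explicitly.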
Re: [patch testsuite]: g++.dg/abi
Kai Tietz writes:

> 2014-03-19 17:23 GMT+01:00 Mike Stump :
>> On Mar 18, 2014, at 6:16 AM, Kai Tietz wrote:
>>> this patch skips anon2.C and anon3.C test for mingw target.  Issue
>>> here is that weak under pe-coff is different to ELF-targets and
>>> therefore test doesn't apply for
>>
>> So, what does the output look like?  There should be a trace of weak of
>> some sort in the output.
>
> No, there is none.  Output looks like:
>
>         .seh_proc       _ZN2N43._91CIiE3fn2ES2_
> _ZN2N43._91CIiE3fn2ES2_:
> .LFB11:
>         .seh_endprologue
>         ret
>         .seh_endproc
>         .globl  _ZN2N41qE
>         .data
>         .align 8
> _ZN2N41qE:
>         .quad   _ZN2N43._91CIiE3fn2ES2_
>         .globl  _ZN2N41pE
>         .align 8
> _ZN2N41pE:
>         .quad   _ZN2N43._91CIiE3fn1ENS0_1BE
>         .globl  _ZN2N31qE
>         .align 8
> _ZN2N31qE:
>         .quad   _ZN2N31D1CIiE3fn2ES2_...
>
> The concept of weak - as present in ELF - isn't known in COFF in
> general.  There is some weak, but it works only for static library and
> in a limited way.  Therefore we can't (and don't) use it for COFF
> targets.

In that case, it seems far better to have
gcc/testsuite/lib/target-support.exp (check_weak_available) reflect that
instead of lying about weak support.  This way, everything else simply
falls into place; no need to special-case many individual testcases.

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [patch testsuite]: g++.dg/abi
On Mar 19, 2014, at 9:49 AM, Rainer Orth wrote:
>> The concept of weak - as present in ELF - isn't known in COFF in
>> general.  There is some weak, but it works only for static library and
>> in a limited way.  Therefore we can't (and don't) use it for COFF
>> targets.
>
> In that case, it seems far better to have
> gcc/testsuite/lib/target-support.exp (check_weak_available) reflect that
> instead of lying about weak support.

Yeah, this is the direction I was headed… :-)
[PATCH 2/2, AARCH64] Test case changes: Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
Hi Marcus,

On 14 March 2014 19:42, Marcus Shawcroft wrote:
>>>
>>> Do we need a new effective target test, why is the existing
>>> "fstack_protector" not appropriate?
>>
>> "stack_protector" does a run time test. It failed in cross compilation
>> environment and these are compile only tests.
>
> This works fine in my cross environment, how does yours fail?
>
>
>> Also I thought richard suggested me to add a new option for this.
>> ref: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03358.html
>
> I read that comment to mean use an effective target test instead of
> matching triples. I don't see that re-using an existing effective
> target test contradicts that suggestion.
>
> Looking through the test suite I see that there are:
>
> 6 tests that use dg-do compile with dg-require-effective-target
> fstack_protector
>
> 4 tests that use dg-do run with dg-require-effective-target fstack_protector
>
> 2 tests that use dg-do run {target native} dg-require-effective-target
> fstack_protector
>
> and finally the 2 tests we are discussing that use dg-compile with a
> triple test.
>
> so there are already tests in the testsuite that use dg-do compile
> with the existing effective target test.
>
> I see no immediately obvious reason why the two tests that require
> target native require the native constraint... but I guess that is a
> different issue.
>

I used the existing dg-require-effective-target check,
"stack_protector", and added it on a separate line.

ChangeLog.

2014-03-19  Venkataramanan Kumar

	* g++.dg/fstack-protector-strong.C: Add effective target check for
	stack protection.
	* gcc.dg/fstack-protector-strong.c: Likewise.

These two tests now pass for the aarch64-none-linux-gnu target under QEMU.

Let me know if I can upstream these two patches.

regards,
Venkat.
Index: gcc/testsuite/g++.dg/fstack-protector-strong.C
===
--- gcc/testsuite/g++.dg/fstack-protector-strong.C	(revision 208609)
+++ gcc/testsuite/g++.dg/fstack-protector-strong.C	(working copy)
@@ -1,7 +1,8 @@
 /* Test that stack protection is done on chosen functions. */
 
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fstack-protector-strong" } */
+/* { dg-require-effective-target fstack_protector } */
 
 class A
 {
Index: gcc/testsuite/gcc.dg/fstack-protector-strong.c
===
--- gcc/testsuite/gcc.dg/fstack-protector-strong.c	(revision 208609)
+++ gcc/testsuite/gcc.dg/fstack-protector-strong.c	(working copy)
@@ -1,7 +1,8 @@
 /* Test that stack protection is done on chosen functions. */
 
-/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fstack-protector-strong" } */
+/* { dg-require-effective-target fstack_protector } */
 
 #include
[C++ Patch] PR 60384
Hi,

in this minor regression we ICE during error recovery, when
push_class_level_binding_1 (called by finish_member_declaration via
pushdecl_class_level) gets a TEMPLATE_ID_EXPR as the name argument.
It's a regression because, since r199779, invalid declarations more
often get through (with TREE_TYPE an error_mark_node, like TREE_TYPE (x)
in the case at issue).  Hence the additional check I'm suggesting.

Tested x86_64-linux.

Thanks,
Paolo.

//
/cp
2014-03-19  Paolo Carlini

	PR c++/60384
	* name-lookup.c (push_class_level_binding_1): Check identifier_p
	on the name argument.

/testsuite
2014-03-19  Paolo Carlini

	PR c++/60384
	* g++.dg/cpp1y/pr60384.C: New.

Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 208682)
+++ cp/name-lookup.c	(working copy)
@@ -3112,7 +3112,9 @@ push_class_level_binding_1 (tree name, tree x)
   if (!class_binding_level)
     return true;
 
-  if (name == error_mark_node)
+  if (name == error_mark_node
+      /* Can happen for an erroneous declaration (c++/60384).  */
+      || !identifier_p (name))
     return false;
 
   /* Check for invalid member names.  But don't worry about a default
Index: testsuite/g++.dg/cpp1y/pr60384.C
===
--- testsuite/g++.dg/cpp1y/pr60384.C	(revision 0)
+++ testsuite/g++.dg/cpp1y/pr60384.C	(working copy)
@@ -0,0 +1,9 @@
+// PR c++/60384
+// { dg-do compile { target c++1y } }
+
+template int foo();
+
+struct A
+{
+  typedef auto foo<>();  // { dg-error "typedef declared 'auto'" }
+};
Re: [patch testsuite]: g++.dg/abi
On Wed, 19 Mar 2014, Kai Tietz wrote:

> The concept of weak - as present in ELF - isn't known in COFF in
> general.  There is some weak, but it works only for static library and
> in a limited way.  Therefore we can't (and don't) use it for COFF
> targets.

There are already two different checks (check_weak_available and
check_weak_override_available), reflecting what different testcases need.
Is the requirement for these tests logically different from both of those?
If so, maybe there should be a third such check (even if in fact it does
the same thing as check_weak_override_available).

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH 1/2, AARCH64]: Machine descriptions: Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
Hi Marcus,

On 14 March 2014 19:42, Marcus Shawcroft wrote:
> Hi Venkat
>
> On 5 February 2014 10:29, Venkataramanan Kumar
> wrote:
>> Hi Marcus,
>>
>>> +  "ldr\\t%x2, %1\;str\\t%x2, %0\;mov\t%x2,0"
>>> +  [(set_attr "length" "12")])
>>>
>>> This pattern emits an opaque sequence of instructions that cannot be
>>> scheduled, is that necessary?  Can we not expand individual
>>> instructions or at least split ?
>>
>> Almost all the ports emit a template of assembly instructions.
>> I'm not sure why they have to be generated this way.
>> But the purpose of these patterns is to clear the register that holds
>> the canary value immediately after its use.
>
> I've just read the thread Andrew pointed out, thanks, I'm happy that
> there is a good reason to do it this way.  Andrew, thanks for
> providing the background.
>
> +  [(set_attr "length" "12")])
> +
>
> These patterns should also set the "type" attribute, a reasonable
> value would be "multiple".
>

I have incorporated your review comments and split the patch into two.

The first patch attached here contains AArch64 machine descriptions
for the stack protect patterns.

ChangeLog.

2014-03-19  Venkataramanan Kumar

	* config/aarch64/aarch64.md (stack_protect_set, stack_protect_test)
	(stack_protect_set_<mode>, stack_protect_test_<mode>): Add
	machine descriptions for Stack Smashing Protector.

Tested for aarch64-none-linux-gnu target under QEMU.

regards,
Venkat.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 208609)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -102,6 +102,8 @@
     UNSPEC_TLSDESC
     UNSPEC_USHL_2S
     UNSPEC_VSTRUCTDUMMY
+    UNSPEC_SP_SET
+    UNSPEC_SP_TEST
 ])
 
 (define_c_enum "unspecv" [
@@ -3634,6 +3636,67 @@
   DONE;
 })
 
+;; Named patterns for stack smashing protection.
+(define_expand "stack_protect_set"
+  [(match_operand 0 "memory_operand")
+   (match_operand 1 "memory_operand")]
+  ""
+{
+  enum machine_mode mode = GET_MODE (operands[0]);
+
+  emit_insn ((mode == DImode
+              ? gen_stack_protect_set_di
+              : gen_stack_protect_set_si) (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "stack_protect_set_<mode>"
+  [(set (match_operand:PTR 0 "memory_operand" "=m")
+        (unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")]
+         UNSPEC_SP_SET))
+   (set (match_scratch:PTR 2 "=&r") (const_int 0))]
+  ""
+  "ldr\\t%x2, %1\;str\\t%x2, %0\;mov\t%x2,0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
+(define_expand "stack_protect_test"
+  [(match_operand 0 "memory_operand")
+   (match_operand 1 "memory_operand")
+   (match_operand 2)]
+  ""
+{
+
+  rtx result = gen_reg_rtx (Pmode);
+
+  enum machine_mode mode = GET_MODE (operands[0]);
+
+  emit_insn ((mode == DImode
+              ? gen_stack_protect_test_di
+              : gen_stack_protect_test_si) (result,
+                                            operands[0],
+                                            operands[1]));
+
+  if (mode == DImode)
+    emit_jump_insn (gen_cbranchdi4 (gen_rtx_EQ (VOIDmode, result, const0_rtx),
+                                    result, const0_rtx, operands[2]));
+  else
+    emit_jump_insn (gen_cbranchsi4 (gen_rtx_EQ (VOIDmode, result, const0_rtx),
+                                    result, const0_rtx, operands[2]));
+  DONE;
+})
+
+(define_insn "stack_protect_test_<mode>"
+  [(set (match_operand:PTR 0 "register_operand")
+        (unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")
+                     (match_operand:PTR 2 "memory_operand" "m")]
+         UNSPEC_SP_TEST))
+   (clobber (match_scratch:PTR 3 "=&r"))]
+  ""
+  "ldr\t%x3, %x1\;ldr\t%x0, %x2\;eor\t%x0, %x3, %x0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
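As a reading aid for the two patterns above, the C++ below models their data flow: stack_protect_set copies the guard into the frame slot through a scratch register and then zeroes the scratch, and stack_protect_test loads both copies and XORs them, with a zero result meaning the canary is intact (the expander then branches on result == 0). This is only an illustrative model under the assumption that "scratch" behaves like a register; the struct and function names are hypothetical and no real AArch64 code is generated.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ model of what the two AArch64 patterns do; the real
// patterns emit three instructions each, and "scratch" is a register.
struct frame {
  std::uint64_t canary_slot;  // the stack slot (operand 0 of stack_protect_set)
};

// stack_protect_set: ldr scratch, guard; str scratch, slot; mov scratch, 0
void stack_protect_set(frame &f, const std::uint64_t &guard,
                       std::uint64_t &scratch) {
  scratch = guard;          // ldr  x2, [guard]
  f.canary_slot = scratch;  // str  x2, [slot]
  scratch = 0;              // mov  x2, 0: do not leave the canary in a register
}

// stack_protect_test: load both copies and XOR them; a zero result means
// the canary is intact.
bool stack_protect_test(const frame &f, const std::uint64_t &guard) {
  std::uint64_t slot_copy = f.canary_slot;  // ldr  x3, [slot]
  std::uint64_t result = guard;             // ldr  x0, [guard]
  result = slot_copy ^ result;              // eor  x0, x3, x0
  return result == 0;
}
```

Keeping each sequence as one opaque insn, as discussed in the thread, is what guarantees the scratch register is cleared right after the copy instead of being left live for the scheduler to move around.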
Re: [PATCH 1/2, AARCH64]: Machine descriptions: Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
On 19 March 2014 17:11, Venkataramanan Kumar wrote:

> I have incorporated your review comments and split the patch into two.
>
> The first patch attached here contains AArch64 machine descriptions
> for the stack protect patterns.
>
> ChangeLog.
>
> 2014-03-19  Venkataramanan Kumar
>	* config/aarch64/aarch64.md (stack_protect_set, stack_protect_test)
>	(stack_protect_set_<mode>, stack_protect_test_<mode>): Add
>	machine descriptions for Stack Smashing Protector.
>
> Tested for aarch64-none-linux-gnu target under QEMU.
>
> regards,
> Venkat.

Hi,

This is OK for stage-1.

Thanks
/Marcus
Re: [RFA jit 2/2] introduce scoped_timevar
> "Trevor" == Trevor Saunders writes:

Trevor> thanks for doing this.  I wonder about naming, we already have
Trevor> auto_vec and while I don't really care whether we use auto_ or
Trevor> scoped_ it seems like being consistent would be nice.

Sounds reasonable to me, I've made this change for v2.

Tom
Re: [patch testsuite]: g++.dg/abi
2014-03-19 18:37 GMT+01:00 Joseph S. Myers :
> On Wed, 19 Mar 2014, Kai Tietz wrote:
>
>> The concept of weak - as present in ELF - isn't known in COFF in
>> general.  There is some weak, but it works only for static library and
>> in a limited way.  Therefore we can't (and don't) use it for COFF
>> targets.
>
> There are already two different checks (check_weak_available and
> check_weak_override_available), reflecting what different testcases need.
> Is the requirement for these tests logically different from both of those?
> If so, maybe there should be a third such check (even if in fact it does
> the same thing as check_weak_override_available).
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

On second thought, disabling weak-available for mingw targets seems
wrong.  Weak is actually present; it just has a different meaning.
Those testcases are, as far as I understand them, checking that weak is
available.  Nevertheless, the check here is meant to probe whether a
weak override is possible, since otherwise weak makes no sense here as
far as I can see.

I don't think we need to add a third check here.  It might be enough to
check for weak-override-available instead for those tests.

Kai
Re: [4.8, PATCH 5/26] Backport Power8 and LE support: Test adjustments
Oops.  Please ignore this for now.  I'm preparing a patch series and
sent this one prematurely.

Thanks,
Bill

On Wed, 2014-03-19 at 10:25 -0500, Bill Schmidt wrote:
> Hi,
>
> This patch (diff-le-tests) backports adjustments to a few tests for
> powerpc64le and the ELFv2 ABI.
>
> Thanks,
> Bill
[4.8, PATCH 5/26] Backport Power8 and LE support: Test adjustments
Hi,

This patch (diff-le-tests) backports adjustments to a few tests for
powerpc64le and the ELFv2 ABI.

Thanks,
Bill


2014-03-29  Bill Schmidt

	Backport from mainline
	2013-11-27  Bill Schmidt

	* gfortran.dg/nan_7.f90: Disable for little endian PowerPC.

	Backport from mainline r205106:

	2013-11-20  Ulrich Weigand

	* gcc.target/powerpc/darwin-longlong.c (msw): Make endian-safe.

	Backport from mainline r205046:

	2013-11-19  Ulrich Weigand

	* gcc.target/powerpc/ppc64-abi-2.c (MAKE_SLOT): New macro to
	construct parameter slot value in endian-independent way.
	(fcevv, fciievv, fcvevv): Use it.

Index: gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c
===
--- gcc-4_8-branch.orig/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c	2013-12-28 17:41:32.430628909 +0100
+++ gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c	2013-12-28 17:50:39.655337721 +0100
@@ -119,6 +119,12 @@ typedef union
   vector int v;
 } vector_int_t;
 
+#ifdef __LITTLE_ENDIAN__
+#define MAKE_SLOT(x, y) ((long)x | ((long)y << 32))
+#else
+#define MAKE_SLOT(x, y) ((long)y | ((long)x << 32))
+#endif
+
 /* Paramter passing.
    s : gpr 3
    v : vpr 2
@@ -226,8 +232,8 @@ fcevv (char *s, ...)
   sp = __builtin_frame_address(0);
   sp = sp->backchain;
 
-  if (sp->slot[2].l != 0x10002ULL
-      || sp->slot[4].l != 0x50006ULL)
+  if (sp->slot[2].l != MAKE_SLOT (1, 2)
+      || sp->slot[4].l != MAKE_SLOT (5, 6))
     abort();
 }
 
@@ -268,8 +274,8 @@ fciievv (char *s, int i, int j, ...)
   sp = __builtin_frame_address(0);
   sp = sp->backchain;
 
-  if (sp->slot[4].l != 0x10002ULL
-      || sp->slot[6].l != 0x50006ULL)
+  if (sp->slot[4].l != MAKE_SLOT (1, 2)
+      || sp->slot[6].l != MAKE_SLOT (5, 6))
     abort();
 }
 
@@ -296,8 +302,8 @@ fcvevv (char *s, vector int x, ...)
   sp = __builtin_frame_address(0);
   sp = sp->backchain;
 
-  if (sp->slot[4].l != 0x10002ULL
-      || sp->slot[6].l != 0x50006ULL)
+  if (sp->slot[4].l != MAKE_SLOT (1, 2)
+      || sp->slot[6].l != MAKE_SLOT (5, 6))
     abort();
 }
 
Index: gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c
===
--- gcc-4_8-branch.orig/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c	2013-12-28 17:41:32.430628909 +0100
+++ gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c	2013-12-28 17:50:39.659337741 +0100
@@ -11,7 +11,11 @@ int msw(long long in)
     int i[2];
   } ud;
   ud.ll = in;
+#ifdef __LITTLE_ENDIAN__
+  return ud.i[1];
+#else
   return ud.i[0];
+#endif
 }
 
 int main()
Index: gcc-4_8-branch/gcc/testsuite/gfortran.dg/nan_7.f90
===
--- gcc-4_8-branch.orig/gcc/testsuite/gfortran.dg/nan_7.f90	2013-12-28 17:41:32.430628909 +0100
+++ gcc-4_8-branch/gcc/testsuite/gfortran.dg/nan_7.f90	2013-12-28 17:50:39.662337756 +0100
@@ -2,6 +2,7 @@
 ! { dg-options "-fno-range-check" }
 ! { dg-require-effective-target fortran_real_16 }
 ! { dg-require-effective-target fortran_integer_16 }
+! { dg-skip-if "" { "powerpc*le-*-*" } { "*" } { "" } }
 ! PR47293 NAN not correctly read
 character(len=200) :: str
 real(16) :: r
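The MAKE_SLOT and msw changes both rest on the same fact: when two 32-bit words sit next to each other in memory, which one lands in the high half of the 64-bit value depends on byte order. The C++ sketch below checks that an endian-aware MAKE_SLOT agrees with actually reading two stored words back as one 64-bit slot, and shows an endian-safe msw. Two deliberate deviations from the patch are assumptions of this sketch: it keys off the GCC/Clang predefined __BYTE_ORDER__ macros (the patch uses __LITTLE_ENDIAN__, which not every target predefines), and it casts through uint32_t so negative values are not sign-extended into the other half. memcpy replaces the union type-punning, which is not well defined in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Endian-aware slot constructor in the spirit of MAKE_SLOT (assumes GCC or
// Clang, which predefine __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__).
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define MAKE_SLOT(x, y) \
  ((std::uint64_t)(std::uint32_t)(x) | ((std::uint64_t)(std::uint32_t)(y) << 32))
#else
#define MAKE_SLOT(x, y) \
  ((std::uint64_t)(std::uint32_t)(y) | ((std::uint64_t)(std::uint32_t)(x) << 32))
#endif

// Reads two consecutive 32-bit words back as one 64-bit slot, the way the
// test reads sp->slot[N].l out of the parameter save area.
std::uint64_t slot_from_words(std::int32_t first, std::int32_t second) {
  std::int32_t words[2] = {first, second};
  std::uint64_t slot;
  std::memcpy(&slot, words, sizeof slot);
  return slot;
}

// Endian-safe "most significant word", mirroring the darwin-longlong.c fix.
std::int32_t msw(std::int64_t in) {
  std::int32_t halves[2];
  std::memcpy(halves, &in, sizeof halves);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return halves[1];
#else
  return halves[0];
#endif
}
```

On a little-endian host, slot_from_words(1, 2) yields 0x0000000200000001, which is what MAKE_SLOT (1, 2) expands to there; on big-endian both give 0x0000000100000002.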
[RFA jit v2 1/2] introduce class toplev
This patch introduces a new "class toplev" and changes toplev_main and
toplev_finalize to be methods of this class.  Additionally, now the
timevars are automatically stopped when the object is destroyed.  This
cleans up "compile" a bit and makes it simpler to reuse the toplev
logic in other code.
---
 gcc/ChangeLog.jit      | 14 +
 gcc/diagnostic.c       |  2 +-
 gcc/jit/ChangeLog.jit  |  5 +
 gcc/jit/internal-api.c | 25 +-
 gcc/main.c             |  9
 gcc/toplev.c           | 56 +-
 gcc/toplev.h           | 20 --
 7 files changed, 76 insertions(+), 55 deletions(-)

diff --git a/gcc/ChangeLog.jit b/gcc/ChangeLog.jit
index 77ac44c..c590ab1 100644
--- a/gcc/ChangeLog.jit
+++ b/gcc/ChangeLog.jit
@@ -1,3 +1,17 @@
+2014-03-19  Tom Tromey
+
+	* diagnostic.c (bt_stop): Use toplev::main.
+	* main.c (main): Update.
+	* toplev.c (do_compile): Remove argument.  Don't check
+	use_TV_TOTAL.
+	(toplev::toplev, toplev::~toplev, toplev::start_timevars): New
+	functions.
+	(toplev::main): Rename from toplev_main.  Update.
+	(toplev::finalize): Rename from toplev_finalize.  Update.
+	* toplev.h (class toplev): New.
+	(struct toplev_options): Remove.
+	(toplev_main, toplev_finalize): Don't declare.
+
 2014-03-11  David Malcolm
 
 	* gcse.c (gcse_c_finalize): New, to clear test_insn between
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..56dc3ac 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -333,7 +333,7 @@ diagnostic_show_locus (diagnostic_context * context,
 static const char * const bt_stop[] =
 {
   "main",
-  "toplev_main",
+  "toplev::main",
   "execute_one_pass",
   "compile_file",
 };
diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index efb1931..e45d38c 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,8 @@
+2014-03-19  Tom Tromey
+
+	* internal-api.c (compile): Use toplev, not toplev_options.
+	Simplify.
+
 2014-03-19  David Malcolm
 
 	* internal-api.c (gcc::jit::recording::memento_of_get_pointer::
diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c
index e3ddc4d..95978bf 100644
--- a/gcc/jit/internal-api.c
+++ b/gcc/jit/internal-api.c
@@ -3650,7 +3650,7 @@ compile ()
 
   /* Call into the rest of gcc.
      For now, we have to assemble command-line options to pass into
-     toplev_main, so that they can be parsed. */
+     toplev::main, so that they can be parsed. */
 
   /* Pass in user-provided "progname", if any, so that it makes it
      into GCC's "progname" global, used in various diagnostics. */
@@ -3724,25 +3724,15 @@ compile ()
       ADD_ARG ("-fdump-ipa-all");
     }
 
-  toplev_options toplev_opts;
-  toplev_opts.use_TV_TOTAL = false;
+  toplev toplev (false);
 
-  if (time_report || !quiet_flag || flag_detailed_statistics)
-    timevar_init ();
-
-  timevar_start (TV_TOTAL);
-
-  toplev_main (num_args, const_cast <char **> (fake_args), &toplev_opts);
-  toplev_finalize ();
+  toplev.main (num_args, const_cast <char **> (fake_args));
+  toplev.finalize ();
 
   active_playback_ctxt = NULL;
 
   if (errors_occurred ())
-    {
-      timevar_stop (TV_TOTAL);
-      timevar_print (stderr);
-      return NULL;
-    }
+    return NULL;
 
   if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE))
     dump_generated_code ();
@@ -3765,8 +3755,6 @@ compile ()
     if (ret)
       {
         timevar_pop (TV_ASSEMBLE);
-        timevar_stop (TV_TOTAL);
-        timevar_print (stderr);
         return NULL;
       }
   }
@@ -3795,9 +3783,6 @@ compile ()
     timevar_pop (TV_LOAD);
   }
 
-  timevar_stop (TV_TOTAL);
-  timevar_print (stderr);
-
   return result_obj;
 }
 
diff --git a/gcc/main.c b/gcc/main.c
index b893308..4bba041 100644
--- a/gcc/main.c
+++ b/gcc/main.c
@@ -1,5 +1,5 @@
 /* main.c: defines main() for cc1, cc1plus, etc.
-   Copyright (C) 2007-2013 Free Software Foundation, Inc.
+   Copyright (C) 2007-2014 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -26,15 +26,14 @@ along with GCC; see the file COPYING3.  If not see
 
 int main (int argc, char **argv);
 
-/* We define main() to call toplev_main(), which is defined in toplev.c.
+/* We define main() to call toplev::main(), which is defined in toplev.c.
    We do this in a separate file in order to allow the language front-end
    to define a different main(), if it so desires.  */
 
 int
 main (int argc, char **argv)
 {
-  toplev_options toplev_opts;
-  toplev_opts.use_TV_TOTAL = true;
+  toplev toplev (true);
 
-  return toplev_main (argc, argv, &toplev_opts);
+  return toplev.main (argc, argv);
 }
diff --git a/gcc/toplev.c b/gcc/toplev.c
index f1ac560..5284621 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1,5 +1,5 @@
 /* Top level of GCC compilers (cc1, cc1plus, etc.)
-   Copyright (C) 1987-2013 Free Software Foundation, Inc.
+   Copyright (C) 1987-2014 Free Software
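The main effect of the toplev change on the JIT's compile() is that the total-time timer is owned by an object, so the destructor becomes the single place where timing is stopped and reported, and each early return no longer repeats timevar_stop/timevar_print. The C++ sketch below models only that ownership idea; toplev_sketch, compile_sketch and the two counters are hypothetical stand-ins, and it deliberately does not reproduce the constructor's bool argument or how the real class decides when to start timevars.

```cpp
#include <cassert>

// Hypothetical counters standing in for timevar_stop/timevar_print.
static bool g_total_running = false;
static int g_reports_printed = 0;

// Miniature of the toplev idea: the object owns the total-time timer, so
// its destructor is the single place where timing is stopped and reported.
class toplev_sketch {
public:
  toplev_sketch() { g_total_running = true; }  // like timevar_start (TV_TOTAL)
  ~toplev_sketch() {                           // like timevar_stop + timevar_print
    g_total_running = false;
    ++g_reports_printed;
  }
  toplev_sketch(const toplev_sketch &) = delete;
  toplev_sketch &operator=(const toplev_sketch &) = delete;

  int main(int /*argc*/, char ** /*argv*/) { return 0; }  // compilation elided
};

// Shaped like the JIT's compile(): several exits, no per-exit cleanup.
const char *compile_sketch(bool errors_occurred, bool assemble_failed) {
  toplev_sketch toplev;
  toplev.main(0, nullptr);
  if (errors_occurred)
    return nullptr;
  if (assemble_failed)
    return nullptr;
  return "result";
}
```

Every path out of compile_sketch, early or not, produces exactly one report, which is the property the removed timevar_stop/timevar_print pairs were maintaining by hand.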
[RFA jit v2 0/2] minor refactorings for reuse
Here's a second revision of my patches to the jit branch to clean up
toplev and timevar uses a bit.

The first revision was here:

http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00895.html

Compared with that revision, this one hopefully includes the
ChangeLog.jit entries; and I took Trevor's suggestion and renamed the
timevar class to "auto_timevar".

Tom
[RFA jit v2 2/2] introduce auto_timevar
This introduces a new auto_timevar class.  It pushes a given timevar
in its constructor, and pops it in the destructor, giving a much
simpler way to use timevars in the typical case where they can be
scoped.
---
 gcc/ChangeLog.jit      |  4
 gcc/jit/ChangeLog.jit  |  4
 gcc/jit/internal-api.c | 16 +---
 gcc/timevar.h          | 26 +-
 4 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/gcc/ChangeLog.jit b/gcc/ChangeLog.jit
index c590ab1..ee1df88 100644
--- a/gcc/ChangeLog.jit
+++ b/gcc/ChangeLog.jit
@@ -1,5 +1,9 @@
 2014-03-19  Tom Tromey
 
+	* timevar.h (auto_timevar): New class.
+
+2014-03-19  Tom Tromey
+
 	* diagnostic.c (bt_stop): Use toplev::main.
 	* main.c (main): Update.
 	* toplev.c (do_compile): Remove argument.  Don't check
diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index e45d38c..69f2412 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,9 @@
 2014-03-19  Tom Tromey
 
+	* internal-api.c (compile): Use auto_timevar.
+
+2014-03-19  Tom Tromey
+
 	* internal-api.c (compile): Use toplev, not toplev_options.
 	Simplify.
 
diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c
index 95978bf..090d351 100644
--- a/gcc/jit/internal-api.c
+++ b/gcc/jit/internal-api.c
@@ -3737,8 +3737,6 @@ compile ()
   if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE))
     dump_generated_code ();
 
-  timevar_push (TV_ASSEMBLE);
-
   /* Gross hacks follow:
      We have a .s file; we want a .so file.
      We could reuse parts of gcc/gcc.c to do this.
@@ -3746,6 +3744,8 @@ compile ()
    */
   /* FIXME: totally faking it for now, not even using pex */
   {
+    auto_timevar assemble_timevar (TV_ASSEMBLE);
+
     char cmd[1024];
     snprintf (cmd, 1024, "gcc -shared %s -o %s",
               m_path_s_file, m_path_so_file);
@@ -3753,20 +3753,16 @@ compile ()
       printf ("cmd: %s\n", cmd);
     int ret = system (cmd);
     if (ret)
-      {
-        timevar_pop (TV_ASSEMBLE);
-        return NULL;
-      }
+      return NULL;
   }
-  timevar_pop (TV_ASSEMBLE);
 
   // TODO: split out assembles vs linker
 
   /* dlopen the .so file. */
   {
-    const char *error;
+    auto_timevar load_timevar (TV_LOAD);
 
-    timevar_push (TV_LOAD);
+    const char *error;
 
     /* Clear any existing error.  */
     dlerror ();
@@ -3779,8 +3775,6 @@ compile ()
       result_obj = new result (handle);
     else
       result_obj = NULL;
-
-    timevar_pop (TV_LOAD);
   }
 
   return result_obj;
diff --git a/gcc/timevar.h b/gcc/timevar.h
index dc2a8bc..f018e39 100644
--- a/gcc/timevar.h
+++ b/gcc/timevar.h
@@ -1,5 +1,5 @@
 /* Timing variables for measuring compiler performance.
-   Copyright (C) 2000-2013 Free Software Foundation, Inc.
+   Copyright (C) 2000-2014 Free Software Foundation, Inc.
    Contributed by Alex Samuel
 
 This file is part of GCC.
@@ -110,6 +110,30 @@ timevar_pop (timevar_id_t tv)
     timevar_pop_1 (tv);
 }
 
+// This is a simple timevar wrapper class that pushes a timevar in its
+// constructor and pops the timevar in its destructor.
+class auto_timevar
+{
+ public:
+  auto_timevar (timevar_id_t tv)
+    : m_tv (tv)
+  {
+    timevar_push (m_tv);
+  }
+
+  ~auto_timevar ()
+  {
+    timevar_pop (m_tv);
+  }
+
+ private:
+
+  // Private to disallow copies.
+  auto_timevar (const auto_timevar &);
+
+  timevar_id_t m_tv;
+};
+
 extern void print_time (const char *, long);
 
 #endif /* ! GCC_TIMEVAR_H */
-- 
1.8.5.3
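The scoped push/pop discipline that auto_timevar provides can be exercised in isolation. The C++ sketch below reuses the class's shape but runs it against a hypothetical stand-in timevar stack that only records nesting (no timing), and a function shaped like the reworked assemble/load part of compile(), showing that the early return still pops TV_ASSEMBLE. Two choices here are this sketch's own rather than the patch's: the constructor is marked explicit, and copying is disabled with C++11 "= delete" instead of the patch's private, undefined copy constructor (GCC was built as C++98 at the time).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for GCC's timevar stack; only the push/pop
// nesting discipline is modelled, not the timing itself.
enum timevar_id_t { TV_TOTAL, TV_ASSEMBLE, TV_LOAD };
static std::vector<timevar_id_t> g_timevar_stack;

void timevar_push(timevar_id_t tv) { g_timevar_stack.push_back(tv); }

void timevar_pop(timevar_id_t tv) {
  // Pops must match the most recent push.
  assert(!g_timevar_stack.empty() && g_timevar_stack.back() == tv);
  g_timevar_stack.pop_back();
}

// Same shape as the patch's class, with explicit and "= delete" added.
class auto_timevar {
public:
  explicit auto_timevar(timevar_id_t tv) : m_tv(tv) { timevar_push(m_tv); }
  ~auto_timevar() { timevar_pop(m_tv); }
  auto_timevar(const auto_timevar &) = delete;
  auto_timevar &operator=(const auto_timevar &) = delete;

private:
  timevar_id_t m_tv;
};

// Mirrors the reworked compile(): the early return needs no explicit pop.
int assemble_then_load(bool assemble_fails) {
  {
    auto_timevar assemble_timevar(TV_ASSEMBLE);
    if (assemble_fails)
      return -1;  // ~auto_timevar pops TV_ASSEMBLE here
  }
  auto_timevar load_timevar(TV_LOAD);
  return 0;  // ~auto_timevar pops TV_LOAD here
}
```

Because destructors run in reverse order of construction, nested auto_timevar objects automatically pop in the order timevar_pop expects.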
Re: [patch testsuite]: g++.dg/abi
On Mar 19, 2014, at 9:38 AM, Kai Tietz wrote:

> 2014-03-19 17:23 GMT+01:00 Mike Stump :
>> On Mar 18, 2014, at 6:16 AM, Kai Tietz wrote:
>>> this patch skips anon2.C and anon3.C test for mingw target.  Issue
>>> here is that weak under pe-coff is different to ELF-targets and
>>> therefore test doesn't apply for
>>
>> So, what does the output look like?  There should be a trace of weak of
>> some sort in the output.
>
> No, there is none.

So, does the target support weak?
Re: [PATCH 2/2, AARCH64] Test case changes: Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
On 19 March 2014 17:18, Venkataramanan Kumar wrote:

> I used the existing dg-require-effective-target check,
> "stack_protector", and added it on a separate line.
>
> ChangeLog.
>
> 2014-03-19  Venkataramanan Kumar
>	* g++.dg/fstack-protector-strong.C: Add effective target check for
>	stack protection.
>	* gcc.dg/fstack-protector-strong.c: Likewise.
>
> These two tests now pass for the aarch64-none-linux-gnu target under QEMU.

Venkat,

I think this change is reasonable (for stage-1), but I'd like one of the
testsuite maintainers to ACK the change.

Cheers
/Marcus
Re: [patch testsuite]: g++.dg/abi
2014-03-19 17:54 GMT+01:00 Mike Stump :
> On Mar 19, 2014, at 9:49 AM, Rainer Orth wrote:
>>> The concept of weak - as present in ELF - isn't known in COFF in
>>> general.  There is some weak, but it works only for static library and
>>> in a limited way.  Therefore we can't (and don't) use it for COFF
>>> targets.
>>
>> In that case, it seems far better to have
>> gcc/testsuite/lib/target-support.exp (check_weak_available) reflect that
>> instead of lying about weak support.
>
> Yeah, this is the direction I was headed... :-)

OK, I will send a patch to change target-support.exp.

And yes, the target supports a kind of weak, but not the expected GNU weak.

Thanks,
Kai
[jit] Avoid shadowing progname global
Committed to branch dmalcolm/jit:

gcc/jit/
	* internal-api.c (gcc::jit::recording::context::add_error_va):
	Rename local "progname" to "ctxt_progname" to avoid shadowing
	the related global, for clarity.
	(gcc::jit::playback::context::compile): Likewise.
---
 gcc/jit/ChangeLog.jit  |  7 +++
 gcc/jit/internal-api.c | 22 --
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index efb1931..265242e 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,12 @@
 2014-03-19  David Malcolm
 
+	* internal-api.c (gcc::jit::recording::context::add_error_va):
+	Rename local "progname" to "ctxt_progname" to avoid shadowing
+	the related global, for clarity.
+	(gcc::jit::playback::context::compile): Likewise.
+
+2014-03-19  David Malcolm
+
 	* internal-api.c (gcc::jit::recording::memento_of_get_pointer::
 	accepts_writes_from): Accept writes from pointers, but not arrays.
 
diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c
index e3ddc4d..819800a 100644
--- a/gcc/jit/internal-api.c
+++ b/gcc/jit/internal-api.c
@@ -610,18 +610,19 @@ recording::context::add_error_va (location *loc, const char *fmt, va_list ap)
   char buf[1024];
   vsnprintf (buf, sizeof (buf) - 1, fmt, ap);
 
-  const char *progname = get_str_option (GCC_JIT_STR_OPTION_PROGNAME);
-  if (!progname)
-    progname = "libgccjit.so";
+  const char *ctxt_progname =
+    get_str_option (GCC_JIT_STR_OPTION_PROGNAME);
+  if (!ctxt_progname)
+    ctxt_progname = "libgccjit.so";
 
   if (loc)
     fprintf (stderr, "%s: %s: error: %s\n",
-             progname,
+             ctxt_progname,
              loc->get_debug_string (),
              buf);
   else
     fprintf (stderr, "%s: error: %s\n",
-             progname,
+             ctxt_progname,
              buf);
 
   if (!m_error_count)
@@ -3629,8 +3630,8 @@ playback::context::
 compile ()
 {
   void *handle = NULL;
+  const char *ctxt_progname;
   result *result_obj = NULL;
-  const char *progname;
   const char *fake_args[20];
   unsigned int num_args;
 
@@ -3652,10 +3653,11 @@ compile ()
      For now, we have to assemble command-line options to pass into
      toplev_main, so that they can be parsed. */
 
-  /* Pass in user-provided "progname", if any, so that it makes it
-     into GCC's "progname" global, used in various diagnostics. */
-  progname = get_str_option (GCC_JIT_STR_OPTION_PROGNAME);
-  fake_args[0] = progname ? progname : "libgccjit.so";
+  /* Pass in user-provided program name as argv0, if any, so that it
+     makes it into GCC's "progname" global, used in various diagnostics. */
+  ctxt_progname = get_str_option (GCC_JIT_STR_OPTION_PROGNAME);
+  fake_args[0] =
+    (ctxt_progname ? ctxt_progname : "libgccjit.so");
   fake_args[1] = m_path_c_file;
   num_args = 2;
 
-- 
1.8.5.3
Re: [RFA jit v2 1/2] introduce class toplev
> "Tom" == Tom Tromey writes: Tom> This patch introduces a new "class toplev" and changes toplev_main and Tom> toplev_finalize to be methods of this class. Additionally, now the Tom> timevars are automatically stopped when the object is destroyed. This Tom> cleans up "compile" a bit and makes it simpler to reuse the toplev Tom> logic in other code. David asked me off-list to rename the field in class toplev, so here's a new patch that does this. Tom commit 66f92863ef55c26f673d02dd39027f340940a3bf Author: Tom Tromey Date: Tue Mar 18 08:07:40 2014 -0600 introduce class toplev This patch introduces a new "class toplev" and changes toplev_main and toplev_finalize to be methods of this class. Additionally, now the timevars are automatically stopped when the object is destroyed. This cleans up "compile" a bit and makes it simpler to reuse the toplev logic in other code. diff --git a/gcc/ChangeLog.jit b/gcc/ChangeLog.jit index 77ac44c..c590ab1 100644 --- a/gcc/ChangeLog.jit +++ b/gcc/ChangeLog.jit @@ -1,3 +1,17 @@ +2014-03-19 Tom Tromey + + * diagnostic.c (bt_stop): Use toplev::main. + * main.c (main): Update. + * toplev.c (do_compile): Remove argument. Don't check + use_TV_TOTAL. + (toplev::toplev, toplev::~toplev, toplev::start_timevars): New + functions. + (toplev::main): Rename from toplev_main. Update. + (toplev::finalize): Rename from toplev_finalize. Update. + * toplev.h (class toplev): New. + (struct toplev_options): Remove. + (toplev_main, toplev_finalize): Don't declare. 
+ 2014-03-11 David Malcolm * gcse.c (gcse_c_finalize): New, to clear test_insn between diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 36094a1..56dc3ac 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -333,7 +333,7 @@ diagnostic_show_locus (diagnostic_context * context, static const char * const bt_stop[] = { "main", - "toplev_main", + "toplev::main", "execute_one_pass", "compile_file", }; diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index efb1931..e45d38c 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,8 @@ +2014-03-19 Tom Tromey + + * internal-api.c (compile): Use toplev, not toplev_options. + Simplify. + 2014-03-19 David Malcolm * internal-api.c (gcc::jit::recording::memento_of_get_pointer:: diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index e3ddc4d..95978bf 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -3650,7 +3650,7 @@ compile () /* Call into the rest of gcc. For now, we have to assemble command-line options to pass into - toplev_main, so that they can be parsed. */ + toplev::main, so that they can be parsed. */ /* Pass in user-provided "progname", if any, so that it makes it into GCC's "progname" global, used in various diagnostics. 
*/ @@ -3724,25 +3724,15 @@ compile () ADD_ARG ("-fdump-ipa-all"); } - toplev_options toplev_opts; - toplev_opts.use_TV_TOTAL = false; + toplev toplev (false); - if (time_report || !quiet_flag || flag_detailed_statistics) -timevar_init (); - - timevar_start (TV_TOTAL); - - toplev_main (num_args, const_cast (fake_args), &toplev_opts); - toplev_finalize (); + toplev.main (num_args, const_cast (fake_args)); + toplev.finalize (); active_playback_ctxt = NULL; if (errors_occurred ()) -{ - timevar_stop (TV_TOTAL); - timevar_print (stderr); - return NULL; -} +return NULL; if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE)) dump_generated_code (); @@ -3765,8 +3755,6 @@ compile () if (ret) { timevar_pop (TV_ASSEMBLE); - timevar_stop (TV_TOTAL); - timevar_print (stderr); return NULL; } } @@ -3795,9 +3783,6 @@ compile () timevar_pop (TV_LOAD); } - timevar_stop (TV_TOTAL); - timevar_print (stderr); - return result_obj; } diff --git a/gcc/main.c b/gcc/main.c index b893308..4bba041 100644 --- a/gcc/main.c +++ b/gcc/main.c @@ -1,5 +1,5 @@ /* main.c: defines main() for cc1, cc1plus, etc. - Copyright (C) 2007-2013 Free Software Foundation, Inc. + Copyright (C) 2007-2014 Free Software Foundation, Inc. This file is part of GCC. @@ -26,15 +26,14 @@ along with GCC; see the file COPYING3. If not see int main (int argc, char **argv); -/* We define main() to call toplev_main(), which is defined in toplev.c. +/* We define main() to call toplev::main(), which is defined in toplev.c. We do this in a separate file in order to allow the language front-end to define a different main(), if it so desires. */ int main (int argc, char **argv) { - toplev_options toplev_opts; - toplev_opts.use_TV_TOTAL = true; + toplev toplev (true); - return toplev_main (argc, argv, &toplev_opts); + return toplev.main (argc, argv); } diff --git a/gcc/toplev.c b/gcc/tople
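The heart of this change is that timer cleanup moves out of every exit path of the jit's compile () and into the destructor of a stack object. A minimal sketch of that RAII pattern follows. toplev_sketch, g_timer and compile_sketch are hypothetical stand-ins, not GCC's timevar API, and the meaning given to the constructor flag (whether this object starts the total timer itself) is an assumption.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for GCC's timevar state (not the real timevar API).
struct timer_state
{
  bool running = false;
  int stops = 0;
};
static timer_state g_timer;

// Sketch of the "class toplev" idea: main () may start the total timer and
// the destructor stops and reports it, so callers need no cleanup code.
class toplev_sketch
{
public:
  explicit toplev_sketch (bool use_TV_TOTAL) : m_use_TV_TOTAL (use_TV_TOTAL) {}

  ~toplev_sketch ()
  {
    // Runs on every exit from the owning scope, including early returns.
    if (g_timer.running)
      {
        g_timer.running = false;
        ++g_timer.stops;
        std::fprintf (stderr, "total timer stopped\n");
      }
  }

  int main (int argc, char **argv)
  {
    (void) argc;
    (void) argv;
    // Assumption: the flag says whether this object owns the total timer.
    if (m_use_TV_TOTAL)
      start_timevars ();
    return 0;
  }

private:
  void start_timevars () { g_timer.running = true; }

  bool m_use_TV_TOTAL;
};

// Caller shaped loosely like the jit's compile (), with an early return.
static bool
compile_sketch (bool errors_occurred)
{
  toplev_sketch toplev (true);
  char arg0[] = "libgccjit.so";
  char *fake_args[] = { arg0, nullptr };
  toplev.main (1, fake_args);
  if (errors_occurred)
    return false;  // the destructor still stops the timer here
  return true;
}
```

With this shape, the three timevar_stop/timevar_print pairs that the patch removes from compile () are replaced by a single place in the destructor.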
[PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
This is a follow-on patch to one already committed: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html It implements patterns to simplify our RTL as follows: OR (Not:DI (A:DI), ZeroExtend:DI (B:SI)) --> the top half can be done with a MVN AND (Not:DI (A:DI), ZeroExtend:DI (B:SI)) --> the top half becomes zero. I've added test cases for both of these and also the existing anddi_notdi patterns. The tests all pass. Full regression runs passed. OK for stage 1? Cheers, Ian 2014-03-19 Ian Bolton gcc/ * config/arm/arm.md (*anddi_notdi_zesidi): New pattern * config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern. testsuite/ * gcc.target/arm/anddi_notdi-1.c: New test. * gcc.target/arm/iordi_notdi-1.c: New test case. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 2ddda02..d2d85ee 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -2962,6 +2962,28 @@ (set_attr "type" "multiple")] ) +(define_insn_and_split "*anddi_notdi_zesidi" + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") +(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r")) +(zero_extend:DI + (match_operand:SI 1 "s_register_operand" "r,r"] + "TARGET_32BIT" + "#" + "TARGET_32BIT && reload_completed" + [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (const_int 0))] + " + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[2] = gen_lowpart (SImode, operands[2]); + }" + [(set_attr "length" "8") + (set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no") + (set_attr "type" "multiple")] +) + (define_insn_and_split "*anddi_notsesidi_di" [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") (and:DI (not:DI (sign_extend:DI diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 467c619..10bc8b1 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1418,6 +1418,30 @@ (set_attr "type" "multiple")] ) +(define_insn_and_split 
"*iordi_notdi_zesidi" + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") + (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r")) + (zero_extend:DI +(match_operand:SI 1 "s_register_operand" "r,r"] + "TARGET_THUMB2" + "#" + "TARGET_THUMB2 && reload_completed" + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (not:SI (match_dup 4)))] + " + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[1] = gen_lowpart (SImode, operands[1]); +operands[4] = gen_highpart (SImode, operands[2]); +operands[2] = gen_lowpart (SImode, operands[2]); + }" + [(set_attr "length" "8") + (set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no") + (set_attr "type" "multiple")] +) + (define_insn_and_split "*iordi_notsesidi_di" [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") (ior:DI (not:DI (sign_extend:DI diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c new file mode 100644 index 000..cfb33fc --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c @@ -0,0 +1,65 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fno-inline --save-temps" } */ + +extern void abort (void); + +typedef long long s64int; +typedef int s32int; +typedef unsigned long long u64int; +typedef unsigned int u32int; + +s64int +anddi_di_notdi (s64int a, s64int b) +{ + return (a & ~b); +} + +s64int +anddi_di_notzesidi (s64int a, u32int b) +{ + return (a & ~(u64int) b); +} + +s64int +anddi_notdi_zesidi (s64int a, u32int b) +{ + return (~a & (u64int) b); +} + +s64int +anddi_di_notsesidi (s64int a, s32int b) +{ + return (a & ~(s64int) b); +} + +int main () +{ + s64int a64 = 0xdeadbeefll; + s64int b64 = 0x5f470112ll; + s64int c64 = 0xdeadbeef300fll; + + u32int c32 = 0x01124f4f; + s32int d32 = 0xabbaface; + + s64int z = anddi_di_notdi (c64, b64); + if (z != 0xdeadbeef2008ll) +abort (); + + z = anddi_di_notzesidi (a64, c32); + if (z != 
0xdeadbeefb0b0ll) +abort (); + + z = anddi_notdi_zesidi (c64, c32); + if (z != 0x01104f4fll) +abort (); + + z = anddi_di_notsesidi (a64, d32); + if (z != 0x0531ll) +abort (); + + return 0; +} + +/* { dg-final { scan-assembler-times "bic\t" 6 } } */ + +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c index cda9c0e..249f080 100644 --- a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c +++ b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c @@ -9,19 +9,25 @@ typedef unsigned long long u64int; typedef unsigned int u32int; s64int -iordi_notdi (s64int a, s64int b) +iordi_di_notdi
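Both splits rest on simple word-level identities. Zero-extending B makes its high word zero, so the high word of ~A & zext(B) is 0, and the high word of ~A | zext(B) is just ~A_hi. The sketch below checks those identities in portable C++. The function names are illustrative only. The instruction names in the comments (BIC, and ORN, which exists only in Thumb-2) are the usual ARM mappings and have not been checked against the generated assembly.

```cpp
#include <cassert>
#include <cstdint>

// Reference DImode semantics of the two RTL patterns.
static uint64_t
and_notdi_zesidi (uint64_t a, uint32_t b)
{
  return ~a & (uint64_t) b;
}

static uint64_t
ior_notdi_zesidi (uint64_t a, uint32_t b)
{
  return ~a | (uint64_t) b;
}

// Split forms: one SImode operation per word, as the post-reload splits do.
static uint64_t
and_split (uint64_t a, uint32_t b)
{
  uint32_t lo = ~(uint32_t) a & b;  // low word: BIC
  uint32_t hi = 0;                  // high word: constant zero
  return ((uint64_t) hi << 32) | lo;
}

static uint64_t
ior_split (uint64_t a, uint32_t b)
{
  uint32_t lo = ~(uint32_t) a | b;      // low word: ORN (Thumb-2)
  uint32_t hi = ~(uint32_t) (a >> 32);  // high word: a single MVN
  return ((uint64_t) hi << 32) | lo;
}
```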
Re: [PATCH] Fix PR60505
On Tue, Mar 18, 2014 at 4:43 AM, Richard Biener wrote: > > On Mon, 17 Mar 2014, Cong Hou wrote: > > > On Mon, Mar 17, 2014 at 6:44 AM, Richard Biener wrote: > > > On Fri, 14 Mar 2014, Cong Hou wrote: > > > > > >> On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener > > >> wrote: > > >> > On Fri, 14 Mar 2014, Jakub Jelinek wrote: > > >> > > > >> >> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote: > > >> >> > > Consider this fact and if there are alias checks, we can safely > > >> >> > > remove > > >> >> > > the epilogue if the maximum trip count of the loop is less than or > > >> >> > > equal to the calculated threshold. > > >> >> > > > >> >> > You have to consider n % vf != 0, so an argument on only maximum > > >> >> > trip count or threshold cannot work. > > >> >> > > >> >> Well, if you only check if maximum trip count is <= vf and you know > > >> >> that for n < vf the vectorized loop + it's epilogue path will not be > > >> >> taken, > > >> >> then perhaps you could, but it is a very special case. > > >> >> Now, the question is when we are guaranteed we enter the scalar > > >> >> versioned > > >> >> loop instead for n < vf, is that in case of versioning for alias or > > >> >> versioning for alignment? > > >> > > > >> > I think neither - I have plans to do the cost model check together > > >> > with the versioning condition but didn't get around to implement that. > > >> > That would allow stronger max bounds for the epilogue loop. > > >> > > >> In vect_transform_loop(), check_profitability will be set to true if > > >> th >= VF-1 and the number of iteration is unknown (we only consider > > >> unknown trip count here), where th is calculated based on the > > >> parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum > > >> value VF-1. 
If the loop needs to be versioned, then > > >> check_profitability with true value will be passed to > > >> vect_loop_versioning(), in which an enhanced loop bound check > > >> (considering cost) will be built. So I think if the loop is versioned > > >> and n < VF, then we must enter the scalar version, and in this case > > >> removing epilogue should be safe when the maximum trip count <= th+1. > > > > > > You mean exactly in the case where the profitability check ensures > > > that n % vf == 0? Thus effectively if n == maximum trip count? > > > That's quite a special case, no? > > > > > > Yes, it is a special case. But it is in this special case that those > > warnings are thrown out. Also, I think declaring an array with VF*N as > > length is not unusual. > > Ok, but then for the patch compute the cost model threshold once > in vect_analyze_loop_2 and store it in a new > LOOP_VINFO_COST_MODEL_THRESHOLD. Done. > Also you have to check > the return value from max_stmt_executions_int as that may return > -1 if the number cannot be computed (or isn't representable in > a HOST_WIDE_INT). It will be converted to unsigned type so that -1 means infinity. > You also should check for > LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT which should have the > same effect on the cost model check. Done. > > > The existing condition is already complicated enough - adding new > stuff warrants comments before the (sub-)checks. OK. Comments added. Below is the revised patch. Bootstrapped and tested on an x86-64 machine. Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e1d8666..eceefb3 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,18 @@ +2014-03-11 Cong Hou + + PR tree-optimization/60505 + * tree-vectorizer.h (struct _stmt_vec_info): Add th field as the + threshold of number of iterations below which no vectorization will be + done. + * tree-vect-loop.c (new_loop_vec_info): + Initialize LOOP_VINFO_COST_MODEL_THRESHOLD.
+ * tree-vect-loop.c (vect_analyze_loop_operations): + Set LOOP_VINFO_COST_MODEL_THRESHOLD. + * tree-vect-loop.c (vect_transform_loop): + Use LOOP_VINFO_COST_MODEL_THRESHOLD. + * tree-vect-loop.c (vect_analyze_loop_2): Check the maximum number + of iterations of the loop and see if we should build the epilogue. + 2014-03-10 Jakub Jelinek PR ipa/60457 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 41b6875..09ec1c0 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-03-11 Cong Hou + + PR tree-optimization/60505 + * gcc.dg/vect/pr60505.c: New test. + 2014-03-10 Jakub Jelinek PR ipa/60457 diff --git a/gcc/testsuite/gcc.dg/vect/pr60505.c b/gcc/testsuite/gcc.dg/vect/pr60505.c new file mode 100644 index 000..6940513 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr60505.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-Wall -Werror" } */ + +void foo(char *in, char *out, int num) +{ + int i; + char ovec[16] = {0}; + + for(i = 0; i < num ; ++i) +out[i] = (ovec[i] = in[i]); + out[num] = ovec[num/2]; +} diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index df6ab6f..1c78e11 100644 --- a/gcc/tree-vect-loop.c ++
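The reply contains two details that are easy to miss. First, max_stmt_executions_int reports an unknown bound as -1. Second, converting that value to an unsigned type turns it into the largest representable value, so the "bound too large" test succeeds automatically. The sketch below models only the unknown-trip-count decision as the thread states it: with versioning for alias or alignment, the epilogue can be dropped when the maximum trip count is at most th + 1. The function and parameter names are hypothetical, and the real check in vect_analyze_loop_2 has further sub-cases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the unknown-trip-count part of the decision.
// max_niters follows max_stmt_executions_int: -1 means "not known".
// th is the cost-model threshold (LOOP_VINFO_COST_MODEL_THRESHOLD).
static bool
needs_epilogue_sketch (int64_t max_niters, unsigned th,
                       bool versioned_for_alias_or_alignment)
{
  // In this model, only versioning routes n < VF to the scalar loop.
  if (!versioned_for_alias_or_alignment)
    return true;
  // -1 converts to UINT64_MAX, i.e. "infinity", so an unknown bound
  // always keeps the epilogue.
  return (uint64_t) max_niters > (uint64_t) th + 1;
}
```

In pr60505.c the trip count num is unknown, but the store to ovec[i] presumably bounds it by the 16-element array. That is the "array with VF*N as length" case the thread mentions.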
[4.8, PATCH 0/26] Backport Power8 and LE support
Hi, Support for Power8 features and the new powerpc64le-linux-gnu target, including the ELFv2 ABI, has been developed up till now on the ibm/gcc-4_8-branch. It was appropriate to use this separate branch while the support was unstable, but this branch will not represent a particularly good support mechanism for distributions going forward. Most distros are set up to pull from the major release branches, and having a separate branch for one target is quite inconvenient. Also, the ibm/gcc-4_8-branch's original purpose is to serve as the code base for IBM's Advance Toolchain 7.0. Over time the two purposes that the branch currently serves will diverge and make things even more complicated. The code is now tested and stable enough that we are ready to backport this support to the FSF 4.8 branch. This patch series constitutes that backport. Almost all of the changes are specific to PowerPC portions of the code, and for those patches I am only CCing David. However, some of the patches require changes to common code, and for these I will CC Richard and Jakub. Three of these are slightly unrelated but necessary patches, one to enable decimal float ABS builtins, and two others to fix PR54537 and PR56843. In addition there are patches that update configuration files throughout for the new target, and some small changes in common call support (call.c, expr.h, function.c) to support how the new ABI handles calls. I realize it is unusual to backport such a large amount of code, but we have been asked by distribution partners to do this, and we feel it makes good sense for long-term support. I have tested the patch series by applying it to a clean FSF 4.8 branch and comparing the test results against those from the IBM 4.8 branch on three systems: * Power8, little endian (--mcpu=power8) * Power8, big endian (--mcpu=power8) * Power7, big endian (--mcpu=power7) I also checked a recursive diff against the two source directories to ensure that no patches were missed. 
Thanks, Bill

[ 1/26] diff-p8
[ 2/26] diff-p8-htm
[ 3/26] diff-le-config
[ 4/26] diff-le-libtool
[ 5/26] diff-le-tests
[ 6/26] diff-le-dfp
[ 7/26] diff-le-vector
[ 8/26] diff-abi-compat
[ 9/26] diff-abi-calls
[10/26] diff-abi-elfv2
[11/26] diff-abi-gotest
[12/26] diff-le-align
[13/26] diff-abi-libffi
[14/26] diff-dfp-abs
[15/26] diff-pr54537
[16/26] diff-pr56843
[17/26] diff-direct-move
[18/26] diff-le-config-2
[19/26] diff-quad-memory
[20/26] diff-lra
[21/26] diff-le-vector-api
[22/26] diff-mcall
[23/26] diff-pr60137-pr60203
[24/26] diff-reload
[25/26] diff-v1ti
[26/26] diff-trunk-missing
[4.8, PATCH 3/26] Backport Power8 and LE support: Configury bits 1
Hi, This patch (diff-le-config) backports updates to more recent config.guess and config.sub versions to support the new powerpc64le target. Thanks, Bill 2014-03-29 Bill Schmidt Backport from mainline r203071: 2013-10-01 Joern Rennecke Import from savannah.gnu.org: * config.guess: Update to 2013-06-10 version. * config.sub: Update to 2013-10-01 version. Index: gcc-4_8-branch/config.guess === --- gcc-4_8-branch.orig/config.guess2013-12-28 17:41:32.765630566 +0100 +++ gcc-4_8-branch/config.guess 2013-12-28 17:50:37.995329461 +0100 @@ -1,10 +1,8 @@ #! /bin/sh # Attempt to guess a canonical system name. -# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, -# 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, -# 2011, 2012, 2013 Free Software Foundation, Inc. +# Copyright 1992-2013 Free Software Foundation, Inc. -timestamp='2012-12-30' +timestamp='2013-06-10' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -52,9 +50,7 @@ version="\ GNU config.guess ($timestamp) Originally written by Per Bothner. -Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, -2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, -2012, 2013 Free Software Foundation, Inc. +Copyright 1992-2013 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." @@ -136,6 +132,27 @@ UNAME_RELEASE=`(uname -r) 2>/dev/null` | UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown UNAME_VERSION=`(uname -v) 2>/dev/null` || UNAME_VERSION=unknown +case "${UNAME_SYSTEM}" in +Linux|GNU|GNU/*) + # If the system lacks a compiler, then just pick glibc. + # We could probably try harder. 
+ LIBC=gnu + + eval $set_cc_for_build + cat <<-EOF > $dummy.c + #include + #if defined(__UCLIBC__) + LIBC=uclibc + #elif defined(__dietlibc__) + LIBC=dietlibc + #else + LIBC=gnu + #endif + EOF + eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep '^LIBC'` + ;; +esac + # Note: order is significant - the case branches are not exclusive. case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in @@ -857,21 +874,21 @@ EOF exit ;; *:GNU:*:*) # the GNU system - echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-gnu`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'` + echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-${LIBC}`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'` exit ;; *:GNU/*:*:*) # other systems with GNU libc and userland - echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr '[A-Z]' '[a-z]'``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-gnu + echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr '[A-Z]' '[a-z]'``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-${LIBC} exit ;; i*86:Minix:*:*) echo ${UNAME_MACHINE}-pc-minix exit ;; aarch64:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-gnu + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; aarch64_be:Linux:*:*) UNAME_MACHINE=aarch64_be - echo ${UNAME_MACHINE}-unknown-linux-gnu + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; alpha:Linux:*:*) case `sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' < /proc/cpuinfo` in @@ -884,59 +901,54 @@ EOF EV68*) UNAME_MACHINE=alphaev68 ;; esac objdump --private-headers /bin/sh | grep -q ld.so.1 - if test "$?" = 0 ; then LIBC="libc1" ; else LIBC="" ; fi - echo ${UNAME_MACHINE}-unknown-linux-gnu${LIBC} + if test "$?" 
= 0 ; then LIBC="gnulibc1" ; fi + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} + exit ;; +arc:Linux:*:* | arceb:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; arm*:Linux:*:*) eval $set_cc_for_build if echo __ARM_EABI__ | $CC_FOR_BUILD -E - 2>/dev/null \ | grep -q __ARM_EABI__ then - echo ${UNAME_MACHINE}-unknown-linux-gnu + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} else if echo __ARM_PCS_VFP | $CC_FOR_BUILD -E - 2>/dev/null \ | grep -q __ARM_PCS_VFP then - echo ${UNAME_MACHINE}-unknown-linux-gnueabi + echo ${UNAME_MACHINE}-unknown-linux-${LIBC}eabi else - echo ${UNAME_MACHINE}-unknown-linux-gnueabihf + echo ${UNAME_MACHINE}-unknown-linux-${LIBC}eabihf fi fi exit ;; avr32*:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-gnu + ech
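The new probe picks LIBC from predefined macros, then appends it (plus, for ARM, an ABI suffix) to the triplet. Note the inverted sense of the ARM test: "echo __ARM_EABI__ | $CC_FOR_BUILD -E -" prints the literal token back only when the macro is not defined. A successful grep therefore means an old-ABI compiler. A C++ sketch of the same logic follows. The function names are hypothetical, and the sketch assumes that including a standard C++ header is enough to pull in the C library's feature macros.

```cpp
#include <cassert>
#include <string>

// Same macro tests as the config.guess probe. Assumes <string> has already
// pulled in the C library's feature macros; otherwise this reports "gnu".
static std::string
detect_libc_sketch ()
{
#if defined(__UCLIBC__)
  return "uclibc";
#elif defined(__dietlibc__)
  return "dietlibc";
#else
  return "gnu";
#endif
}

// Triplet for arm*:Linux, written as a pure function of the probe results.
static std::string
arm_linux_triplet (const std::string &machine, const std::string &libc,
                   bool arm_eabi_defined, bool arm_pcs_vfp_defined)
{
  std::string triplet = machine + "-unknown-linux-" + libc;
  if (!arm_eabi_defined)
    return triplet;  // old ABI: no suffix
  return triplet + (arm_pcs_vfp_defined ? "eabihf" : "eabi");
}
```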
[4.8, PATCH 5/26] Backport Power8 and LE support: Test adjustments
Hi, This patch (diff-le-tests) backports adjustments to a few tests for powerpc64le and the ELFv2 ABI. Thanks, Bill 2014-03-29 Bill Schmidt Backport from mainline 2013-11-27 Bill Schmidt * gfortran.dg/nan_7.f90: Disable for little endian PowerPC. Backport from mainline r205106: 2013-11-20 Ulrich Weigand * gcc.target/powerpc/darwin-longlong.c (msw): Make endian-safe. Backport from mainline r205046: 2013-11-19 Ulrich Weigand * gcc.target/powerpc/ppc64-abi-2.c (MAKE_SLOT): New macro to construct parameter slot value in endian-independent way. (fcevv, fciievv, fcvevv): Use it. Index: gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c === --- gcc-4_8-branch.orig/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c 2013-12-28 17:41:32.430628909 +0100 +++ gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c 2013-12-28 17:50:39.655337721 +0100 @@ -119,6 +119,12 @@ typedef union vector int v; } vector_int_t; +#ifdef __LITTLE_ENDIAN__ +#define MAKE_SLOT(x, y) ((long)x | ((long)y << 32)) +#else +#define MAKE_SLOT(x, y) ((long)y | ((long)x << 32)) +#endif + /* Paramter passing. s : gpr 3 v : vpr 2 @@ -226,8 +232,8 @@ fcevv (char *s, ...) sp = __builtin_frame_address(0); sp = sp->backchain; - if (sp->slot[2].l != 0x10002ULL - || sp->slot[4].l != 0x50006ULL) + if (sp->slot[2].l != MAKE_SLOT (1, 2) + || sp->slot[4].l != MAKE_SLOT (5, 6)) abort(); } @@ -268,8 +274,8 @@ fciievv (char *s, int i, int j, ...) sp = __builtin_frame_address(0); sp = sp->backchain; - if (sp->slot[4].l != 0x10002ULL - || sp->slot[6].l != 0x50006ULL) + if (sp->slot[4].l != MAKE_SLOT (1, 2) + || sp->slot[6].l != MAKE_SLOT (5, 6)) abort(); } @@ -296,8 +302,8 @@ fcvevv (char *s, vector int x, ...) 
sp = __builtin_frame_address(0); sp = sp->backchain; - if (sp->slot[4].l != 0x10002ULL - || sp->slot[6].l != 0x50006ULL) + if (sp->slot[4].l != MAKE_SLOT (1, 2) + || sp->slot[6].l != MAKE_SLOT (5, 6)) abort(); } Index: gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c === --- gcc-4_8-branch.orig/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c 2013-12-28 17:41:32.430628909 +0100 +++ gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c 2013-12-28 17:50:39.659337741 +0100 @@ -11,7 +11,11 @@ int msw(long long in) int i[2]; } ud; ud.ll = in; +#ifdef __LITTLE_ENDIAN__ + return ud.i[1]; +#else return ud.i[0]; +#endif } int main() Index: gcc-4_8-branch/gcc/testsuite/gfortran.dg/nan_7.f90 === --- gcc-4_8-branch.orig/gcc/testsuite/gfortran.dg/nan_7.f90 2013-12-28 17:41:32.430628909 +0100 +++ gcc-4_8-branch/gcc/testsuite/gfortran.dg/nan_7.f90 2013-12-28 17:50:39.662337756 +0100 @@ -2,6 +2,7 @@ ! { dg-options "-fno-range-check" } ! { dg-require-effective-target fortran_real_16 } ! { dg-require-effective-target fortran_integer_16 } +! { dg-skip-if "" { "powerpc*le-*-*" } { "*" } { "" } } ! PR47293 NAN not correctly read character(len=200) :: str real(16) :: r
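Both test fixes come down to where a 32-bit word lands inside a 64-bit value. The sketch below reproduces the MAKE_SLOT and msw logic and checks it against the actual memory layout using memcpy. It keys on the __BYTE_ORDER__ macro predefined by GCC and Clang rather than on the PowerPC-specific __LITTLE_ENDIAN__ that the tests use, so it also runs on other hosts. It also uses fixed-width types instead of long.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SKETCH_LITTLE_ENDIAN 1
#else
#define SKETCH_LITTLE_ENDIAN 0
#endif

// Value of a 64-bit slot holding x at the lower address and y after it.
static uint64_t
make_slot (uint32_t x, uint32_t y)
{
#if SKETCH_LITTLE_ENDIAN
  return (uint64_t) x | ((uint64_t) y << 32);
#else
  return (uint64_t) y | ((uint64_t) x << 32);
#endif
}

// The same slot built by storing the two words in memory order.
static uint64_t
slot_from_memory (uint32_t x, uint32_t y)
{
  uint32_t words[2] = { x, y };
  uint64_t slot;
  std::memcpy (&slot, words, sizeof slot);
  return slot;
}

// Most significant 32-bit word of a 64-bit value, read through memory.
static int32_t
msw (int64_t in)
{
  int32_t words[2];
  std::memcpy (words, &in, sizeof words);
  return SKETCH_LITTLE_ENDIAN ? words[1] : words[0];
}
```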
[4.8, PATCH 8/26] Backport Power8 and LE support: PR57949
Hi, This patch (diff-abi-compat) backports the ABI compatibility fix for PR57949. Thanks, Bill [gcc] 2014-03-29 Bill Schmidt Backport from mainline r201750. 2013-11-15 Ulrich Weigand Note: Default setting of -mcompat-align-parm inverted! 2013-08-14 Bill Schmidt PR target/57949 * doc/invoke.texi: Add documentation of mcompat-align-parm option. * config/rs6000/rs6000.opt: Add mcompat-align-parm option. * config/rs6000/rs6000.c (rs6000_function_arg_boundary): For AIX and Linux, correct BLKmode alignment when 128-bit alignment is required and compatibility flag is not set. (rs6000_gimplify_va_arg): For AIX and Linux, honor specified alignment for zero-size arguments when compatibility flag is not set. [gcc/testsuite] 2014-03-29 Bill Schmidt Backport from mainline r201750. 2013-11-15 Ulrich Weigand Note: Default setting of -mcompat-align-parm inverted! 2013-08-14 Bill Schmidt PR target/57949 * gcc.target/powerpc/pr57949-1.c: New. * gcc.target/powerpc/pr57949-2.c: New. Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -8680,8 +8680,8 @@ rs6000_function_arg_boundary (enum machi || (type && TREE_CODE (type) == VECTOR_TYPE && int_size_in_bytes (type) >= 16)) return 128; - else if (TARGET_MACHO - && rs6000_darwin64_abi + else if (((TARGET_MACHO && rs6000_darwin64_abi) +|| (DEFAULT_ABI == ABI_AIX && !rs6000_compat_align_parm)) && mode == BLKmode && type && TYPE_ALIGN (type) > 64) return 128; @@ -10233,8 +10233,9 @@ rs6000_gimplify_va_arg (tree valist, tre We don't need to check for pass-by-reference because of the test above. We can return a simplifed answer, since we know there's no offset to add. 
*/ - if (TARGET_MACHO - && rs6000_darwin64_abi + if (((TARGET_MACHO +&& rs6000_darwin64_abi) + || (DEFAULT_ABI == ABI_AIX && !rs6000_compat_align_parm)) && integer_zerop (TYPE_SIZE (type))) { unsigned HOST_WIDE_INT align, boundary; Index: gcc-4_8-test/gcc/config/rs6000/rs6000.opt === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.opt +++ gcc-4_8-test/gcc/config/rs6000/rs6000.opt @@ -550,6 +550,10 @@ mquad-memory Target Report Mask(QUAD_MEMORY) Var(rs6000_isa_flags) Generate the quad word memory instructions (lq/stq/lqarx/stqcx). +mcompat-align-parm +Target Report Var(rs6000_compat_align_parm) Init(1) Save +Generate aggregate parameter passing code with at most 64-bit alignment. + mupper-regs-df Target Undocumented Mask(UPPER_REGS_DF) Var(rs6000_isa_flags) Allow double variables in upper registers with -mcpu=power7 or -mvsx Index: gcc-4_8-test/gcc/doc/invoke.texi === --- gcc-4_8-test.orig/gcc/doc/invoke.texi +++ gcc-4_8-test/gcc/doc/invoke.texi @@ -17243,7 +17243,8 @@ following options: -mpopcntb -mpopcntd -mpowerpc64 @gol -mpowerpc-gpopt -mpowerpc-gfxopt -msingle-float -mdouble-float @gol -msimple-fpu -mstring -mmulhw -mdlmzb -mmfpgpr -mvsx @gol --mcrypto -mdirect-move -mpower8-fusion -mpower8-vector -mquad-memory} +-mcrypto -mdirect-move -mpower8-fusion -mpower8-vector -mquad-memory @gol +-mcompat-align-parm -mno-compat-align-parm} The particular options set for any particular CPU varies between compiler versions, depending on what setting seems to produce optimal @@ -18128,6 +18129,23 @@ stack location in the function prologue a pointer on AIX and 64-bit Linux systems. If the TOC value is not saved in the prologue, it is saved just before the call through the pointer. The @option{-mno-save-toc-indirect} option is the default. + +@item -mcompat-align-parm +@itemx -mno-compat-align-parm +@opindex mcompat-align-parm +Generate (do not generate) code to pass structure parameters with a +maximum alignment of 64 bits, for compatibility with older versions +of GCC. 
+ +Older versions of GCC (prior to 4.9.0) incorrectly did not align a +structure parameter on a 128-bit boundary when that structure contained +a member requiring 128-bit alignment. This is corrected in more +recent versions of GCC. This option may be used to generate code +that is compatible with functions compiled with older versions of +GCC. + +In this version of the compiler, the @option{-mcompat-align-parm} +is the default, except when using the Linux ELFv2 ABI. @end table @node RX Options Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr57949-1.c === --- /dev/null +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr57949-1.c @@ -0,0 +1,19 @@ +/* {
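The boundary change can be read as a small decision table. The sketch below models the new branch of rs6000_function_arg_boundary as the ChangeLog describes it. A BLKmode aggregate whose type alignment exceeds 64 bits gets a 128-bit boundary under the Darwin64 ABI. It now also gets one under ABI_AIX (AIX and 64-bit Linux), unless the compatibility flag is set. The enum, the parameter names and the 64-bit fallback are simplifications, and the real function's earlier vector cases are left out.

```cpp
#include <cassert>

// Hypothetical simplified model; not GCC's real types.
enum class abi_kind { aix, darwin64, other };

// A structure with a member that requires 128-bit alignment.
struct with_quad_member
{
  char tag;
  alignas (16) int quad[4];
};

static unsigned
arg_boundary_bits (abi_kind abi, bool compat_align_parm, bool blkmode,
                   unsigned type_align_bits)
{
  if (blkmode && type_align_bits > 64
      && (abi == abi_kind::darwin64
          || (abi == abi_kind::aix && !compat_align_parm)))
    return 128;
  return 64;  // assumed default parameter boundary on 64-bit targets
}
```

According to the documentation hunk, -mcompat-align-parm (the 64-bit result for ABI_AIX above) is the default in this version, except under the Linux ELFv2 ABI.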
[4.8, PATCH 9/26] Backport Power8 and LE support: ABI call support
Hi, This patch (diff-abi-calls) backports fixes to common code to support the new ELFv2 ABI. Copying Richard and Jakub for these bits. Thanks, Bill 2014-03-29 Bill Schmidt Backport from mainline r204798: 2013-11-14 Ulrich Weigand Alan Modra * function.c (assign_parms): Use all.reg_parm_stack_space instead of re-evaluating REG_PARM_STACK_SPACE target macro. (locate_and_pad_parm): New parameter REG_PARM_STACK_SPACE. Use it instead of evaluating target macro REG_PARM_STACK_SPACE every time. (assign_parm_find_entry_rtl): Update call. * calls.c (initialize_argument_information): Update call. (emit_library_call_value_1): Likewise. * expr.h (locate_and_pad_parm): Update prototype. Backport from mainline r204797: 2013-11-14 Ulrich Weigand * calls.c (store_unaligned_arguments_into_pseudos): Skip PARALLEL arguments. Backport from mainline r197003: 2013-03-23 Eric Botcazou * calls.c (expand_call): Add missing guard to code handling return of non-BLKmode structures in MSB. * function.c (expand_function_end): Likewise. Index: gcc-4_8-branch/gcc/calls.c === --- gcc-4_8-branch.orig/gcc/calls.c 2013-12-28 17:41:32.056627059 +0100 +++ gcc-4_8-branch/gcc/calls.c 2013-12-28 17:50:43.356356135 +0100 @@ -983,6 +983,7 @@ store_unaligned_arguments_into_pseudos ( for (i = 0; i < num_actuals; i++) if (args[i].reg != 0 && ! args[i].pass_on_stack + && GET_CODE (args[i].reg) != PARALLEL && args[i].mode == BLKmode && MEM_P (args[i].value) && (MEM_ALIGN (args[i].value) @@ -1327,6 +1328,7 @@ initialize_argument_information (int num #else args[i].reg != 0, #endif +reg_parm_stack_space, args[i].pass_on_stack ? 0 : args[i].partial, fndecl, args_size, &args[i].locate); #ifdef BLOCK_REG_PADDING @@ -3171,7 +3173,9 @@ expand_call (tree exp, rtx target, int i group load/store machinery below. 
*/ if (!structure_value_addr && !pcc_struct_value + && TYPE_MODE (rettype) != VOIDmode && TYPE_MODE (rettype) != BLKmode + && REG_P (valreg) && targetm.calls.return_in_msb (rettype)) { if (shift_return_value (TYPE_MODE (rettype), false, valreg)) @@ -3734,7 +3738,8 @@ emit_library_call_value_1 (int retval, r #else argvec[count].reg != 0, #endif - 0, NULL_TREE, &args_size, &argvec[count].locate); + reg_parm_stack_space, 0, + NULL_TREE, &args_size, &argvec[count].locate); if (argvec[count].reg == 0 || argvec[count].partial != 0 || reg_parm_stack_space > 0) @@ -3821,7 +3826,7 @@ emit_library_call_value_1 (int retval, r #else argvec[count].reg != 0, #endif - argvec[count].partial, + reg_parm_stack_space, argvec[count].partial, NULL_TREE, &args_size, &argvec[count].locate); args_size.constant += argvec[count].locate.size.constant; gcc_assert (!argvec[count].locate.size.var); Index: gcc-4_8-branch/gcc/function.c === --- gcc-4_8-branch.orig/gcc/function.c 2013-12-28 17:41:32.056627059 +0100 +++ gcc-4_8-branch/gcc/function.c 2013-12-28 17:50:43.362356165 +0100 @@ -2507,6 +2507,7 @@ assign_parm_find_entry_rtl (struct assig } locate_and_pad_parm (data->promoted_mode, data->passed_type, in_regs, + all->reg_parm_stack_space, entry_parm ? data->partial : 0, current_function_decl, &all->stack_args_size, &data->locate); @@ -3485,11 +3486,7 @@ assign_parms (tree fndecl) /* Adjust function incoming argument size for alignment and minimum length. */ -#ifdef REG_PARM_STACK_SPACE - crtl->args.size = MAX (crtl->args.size, - REG_PARM_STACK_SPACE (fndecl)); -#endif - + crtl->args.size = MAX (crtl->args.size, all.reg_parm_stack_space); crtl->args.size = CEIL_ROUND (crtl->args.size, PARM_BOUNDARY / BITS_PER_UNIT); @@ -3693,6 +3690,9 @@ gimplify_parameters (void) IN_REGS is nonzero if the argument will be passed in registers. It will never be set if REG_PARM_STACK_SPACE is not defined. 
+ REG_PARM_STACK_SPACE is the number of bytes of stack space reserved + for arguments which are passed in registers. + FNDECL is the function in which the argument was defined. There are two types of rounding that are done. The first, controlled by @@ -3713,19 +3713,16 @@ g
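After this change, assign_parms takes the reserved register-parameter area from all.reg_parm_stack_space, computed once, instead of re-evaluating the REG_PARM_STACK_SPACE macro. The resulting size arithmetic is sketched below. ceil_round is a power-of-two round-up in the style of CEIL_ROUND. The function name and sample byte counts are illustrative only, and the sketch assumes 8-bit units (BITS_PER_UNIT == 8).

```cpp
#include <algorithm>
#include <cassert>

// Round VALUE up to a multiple of ALIGN, which must be a power of two.
static long
ceil_round (long value, long align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Incoming argument area: at least the reserved register-parameter space,
// rounded to PARM_BOUNDARY expressed in bytes.
static long
incoming_args_size (long stack_args_size, long reg_parm_stack_space,
                    long parm_boundary_bits)
{
  long size = std::max (stack_args_size, reg_parm_stack_space);
  return ceil_round (size, parm_boundary_bits / 8);
}
```

Because the old macro call took fndecl as an argument, its value can vary from function to function. Threading one computed value through locate_and_pad_parm's new parameter keeps every caller on the same number.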
Re: [RFA jit v2 1/2] introduce class toplev
On Wed, 2014-03-19 at 12:10 -0600, Tom Tromey wrote: > > "Tom" == Tom Tromey writes: > > Tom> This patch introduces a new "class toplev" and changes toplev_main and > Tom> toplev_finalize to be methods of this class. Additionally, now the > Tom> timevars are automatically stopped when the object is destroyed. This > Tom> cleans up "compile" a bit and makes it simpler to reuse the toplev > Tom> logic in other code. > > David asked me off-list to rename the field in class toplev, so here's a > new patch that does this. Thanks! (yes, I greatly prefer having member data of a class to have a "m_" prefix, and for the ctor params to have equivalent names, without the prefix, which this patch does, for "toplev"). > Tom > > commit 66f92863ef55c26f673d02dd39027f340940a3bf > Author: Tom Tromey > Date: Tue Mar 18 08:07:40 2014 -0600 > > introduce class toplev > > This patch introduces a new "class toplev" and changes toplev_main and > toplev_finalize to be methods of this class. Additionally, now the > timevars are automatically stopped when the object is destroyed. This > cleans up "compile" a bit and makes it simpler to reuse the toplev > logic in other code. OK. Are you able to push this to my branch, or do you need me to do this?
[4.8, PATCH 11/26] Backport Power8 and LE support: gotest
Hi,

This patch (diff-abi-gotest) backports enablement of the Go testsuite for powerpc64le.

Thanks,
Bill

2014-03-29  Bill Schmidt

	Backport from mainline r205000.
	2013-11-19  Ulrich Weigand

	gotest: Recognize PPC ELF v2 function pointers in text section.

Index: gcc-4_8-branch/libgo/testsuite/gotest
===
--- gcc-4_8-branch.orig/libgo/testsuite/gotest 2013-12-28 17:41:31.783625708 +0100
+++ gcc-4_8-branch/libgo/testsuite/gotest 2013-12-28 17:50:45.671367653 +0100
@@ -369,7 +369,7 @@ localname() {
 {
 text="T"
 case "$GOARCH" in
- ppc64) text="D" ;;
+ ppc64) text="[TD]" ;;
 esac
 symtogo='sed -e s/_test/XXXtest/ -e s/.*_\([^_]*\.\)/\1/ -e s/XXXtest/_test/'
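The gotest change widens the nm symbol-type letter accepted for ppc64 from "D" to "[TD]". Under ELFv1, a function symbol names its descriptor in .opd, which is data, so nm prints "D". ELFv2 has no descriptors, so function symbols sit in text and nm prints "T". The sketch below restates that test as a C predicate. `is_function_symbol_type` is a hypothetical name, and it assumes the script compares only the single type letter from nm's output.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if an nm symbol-type letter TYPE should be treated as a
   function symbol for GOARCH.  On ppc64 both 'T' (ELFv2 entry points in
   text) and 'D' (ELFv1 function descriptors in .opd) are accepted;
   elsewhere only 'T' is.  */
static bool
is_function_symbol_type (const char *goarch, char type)
{
  if (strcmp (goarch, "ppc64") == 0)
    return type == 'T' || type == 'D';
  return type == 'T';
}
```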
[4.8, PATCH 12/26] Backport Power8 and LE support: Defaults
Hi, This patch (diff-le-align) sets some miscellaneous defaults for little endian support. Thanks, Bill 2014-03-29 Bill Schmidt Apply mainline r205060. 2013-11-20 Alan Modra * config/rs6000/sysv4.h (CC1_ENDIAN_LITTLE_SPEC): Define as empty. * config/rs6000/rs6000.c (rs6000_option_override_internal): Default to strict alignment on older processors when little-endian. * config/rs6000/linux64.h (PROCESSOR_DEFAULT64): Default to power8 for ELFv2. Index: gcc-4_8-branch/gcc/config/rs6000/linux64.h === --- gcc-4_8-branch.orig/gcc/config/rs6000/linux64.h 2013-12-28 17:50:44.252360594 +0100 +++ gcc-4_8-branch/gcc/config/rs6000/linux64.h 2013-12-28 17:50:46.356371060 +0100 @@ -71,7 +71,11 @@ extern int dot_symbols; #undef PROCESSOR_DEFAULT #define PROCESSOR_DEFAULT PROCESSOR_POWER7 #undef PROCESSOR_DEFAULT64 +#ifdef LINUX64_DEFAULT_ABI_ELFv2 +#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8 +#else #define PROCESSOR_DEFAULT64 PROCESSOR_POWER7 +#endif /* We don't need to generate entries in .fixup, except when -mrelocatable or -mrelocatable-lib is given. */ Index: gcc-4_8-branch/gcc/config/rs6000/rs6000.c === --- gcc-4_8-branch.orig/gcc/config/rs6000/rs6000.c 2013-12-28 17:50:44.219360429 +0100 +++ gcc-4_8-branch/gcc/config/rs6000/rs6000.c 2013-12-28 17:50:46.369371125 +0100 @@ -3206,6 +3206,12 @@ rs6000_option_override_internal (bool gl } } + /* If little-endian, default to -mstrict-align on older processors. + Testing for htm matches power8 and later. */ + if (!BYTES_BIG_ENDIAN + && !(processor_target_table[tune_index].target_enable & OPTION_MASK_HTM)) +rs6000_isa_flags |= ~rs6000_isa_flags_explicit & OPTION_MASK_STRICT_ALIGN; + /* Add some warnings for VSX. 
*/ if (TARGET_VSX) { Index: gcc-4_8-branch/gcc/config/rs6000/sysv4.h === --- gcc-4_8-branch.orig/gcc/config/rs6000/sysv4.h 2013-12-28 17:50:44.243360549 +0100 +++ gcc-4_8-branch/gcc/config/rs6000/sysv4.h2013-12-28 17:50:46.374371150 +0100 @@ -538,12 +538,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEF #defineCC1_ENDIAN_BIG_SPEC "" -#defineCC1_ENDIAN_LITTLE_SPEC "\ -%{!mstrict-align: %{!mno-strict-align: \ -%{!mcall-i960-old: \ - -mstrict-align \ -} \ -}}" +#defineCC1_ENDIAN_LITTLE_SPEC "" #defineCC1_ENDIAN_DEFAULT_SPEC "%(cc1_endian_big)"
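The new code in rs6000_option_override_internal turns on -mstrict-align only for bits the user did not set explicitly (`~rs6000_isa_flags_explicit & OPTION_MASK_STRICT_ALIGN`). It uses OPTION_MASK_HTM in the tuning processor's flags as a proxy for "power8 or later". The sketch below models that masking idiom in plain C. The mask values and the function name are hypothetical, not the real rs6000 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
#define MASK_STRICT_ALIGN ((uint64_t) 1 << 0)
#define MASK_HTM          ((uint64_t) 1 << 1)

/* Default to strict alignment for little-endian code tuned for a
   processor without HTM (i.e. older than power8), unless the user gave
   -mstrict-align or -mno-strict-align explicitly.  */
static uint64_t
apply_le_strict_align_default (uint64_t isa_flags,
                               uint64_t isa_flags_explicit,
                               uint64_t tune_cpu_flags,
                               bool bytes_big_endian)
{
  if (!bytes_big_endian && !(tune_cpu_flags & MASK_HTM))
    isa_flags |= ~isa_flags_explicit & MASK_STRICT_ALIGN;
  return isa_flags;
}
```

If the explicit mask has the strict-align bit set, the default cannot override the user's -mno-strict-align. That is why the sysv4.h spec that injected -mstrict-align can become empty.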
[4.8, PATCH 14/26] Backport Power8 and LE support: DFP absolute value
Hi, This patch (diff-dfp-abs) backports some unrelated but necessary work to enable the DFP absolute value builtins. Copying Jakub who was involved with the original patch. Thanks, Bill 2014-03-29 Bill Schmidt Backport from mainline 2013-08-19 Peter Bergner Jakub Jelinek * builtins.def (BUILT_IN_FABSD32): New DFP ABS builtin. (BUILT_IN_FABSD64): Likewise. (BUILT_IN_FABSD128): Likewise. * builtins.c (expand_builtin): Add support for new DFP ABS builtins. (fold_builtin_1): Likewise. * config/rs6000/dfp.md (*abstd2_fpr): Handle non-overlapping destination and source operands. (*nabstd2_fpr): Likewise. 2014-03-29 Bill Schmidt Backport from mainline 2013-08-19 Peter Bergner * gcc.target/powerpc/dfp-dd-2.c: New test. * gcc.target/powerpc/dfp-td-2.c: Likewise. * gcc.target/powerpc/dfp-td-3.c: Likewise. Index: gcc-4_8-test/gcc/builtins.c === --- gcc-4_8-test.orig/gcc/builtins.c +++ gcc-4_8-test/gcc/builtins.c @@ -5861,6 +5861,9 @@ expand_builtin (tree exp, rtx target, rt switch (fcode) { CASE_FLT_FN (BUILT_IN_FABS): +case BUILT_IN_FABSD32: +case BUILT_IN_FABSD64: +case BUILT_IN_FABSD128: target = expand_builtin_fabs (exp, target, subtarget); if (target) return target; @@ -10313,6 +10316,9 @@ fold_builtin_1 (location_t loc, tree fnd return fold_builtin_strlen (loc, type, arg0); CASE_FLT_FN (BUILT_IN_FABS): +case BUILT_IN_FABSD32: +case BUILT_IN_FABSD64: +case BUILT_IN_FABSD128: return fold_builtin_fabs (loc, arg0, type); case BUILT_IN_ABS: Index: gcc-4_8-test/gcc/builtins.def === --- gcc-4_8-test.orig/gcc/builtins.def +++ gcc-4_8-test/gcc/builtins.def @@ -252,6 +252,9 @@ DEF_C99_BUILTIN(BUILT_IN_EXPM1L, DEF_LIB_BUILTIN(BUILT_IN_FABS, "fabs", BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSF, "fabsf", BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSL, "fabsl", BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_GCC_BUILTIN(BUILT_IN_FABSD32, "fabsd32", BT_FN_DFLOAT32_DFLOAT32, 
ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_GCC_BUILTIN(BUILT_IN_FABSD64, "fabsd64", BT_FN_DFLOAT64_DFLOAT64, ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_GCC_BUILTIN(BUILT_IN_FABSD128, "fabsd128", BT_FN_DFLOAT128_DFLOAT128, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FDIM, "fdim", BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO) DEF_C99_BUILTIN(BUILT_IN_FDIMF, "fdimf", BT_FN_FLOAT_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING_ERRNO) DEF_C99_BUILTIN(BUILT_IN_FDIML, "fdiml", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO) Index: gcc-4_8-test/gcc/config/rs6000/dfp.md === --- gcc-4_8-test.orig/gcc/config/rs6000/dfp.md +++ gcc-4_8-test/gcc/config/rs6000/dfp.md @@ -148,18 +148,24 @@ "") (define_insn "*abstd2_fpr" - [(set (match_operand:TD 0 "gpc_reg_operand" "=d") - (abs:TD (match_operand:TD 1 "gpc_reg_operand" "d")))] + [(set (match_operand:TD 0 "gpc_reg_operand" "=d,d") + (abs:TD (match_operand:TD 1 "gpc_reg_operand" "0,d")))] "TARGET_HARD_FLOAT && TARGET_FPRS" - "fabs %0,%1" - [(set_attr "type" "fp")]) + "@ + fabs %0,%1 + fabs %0,%1\;fmr %L0,%L1" + [(set_attr "type" "fp") + (set_attr "length" "4,8")]) (define_insn "*nabstd2_fpr" - [(set (match_operand:TD 0 "gpc_reg_operand" "=d") - (neg:TD (abs:TD (match_operand:TD 1 "gpc_reg_operand" "d"] + [(set (match_operand:TD 0 "gpc_reg_operand" "=d,d") + (neg:TD (abs:TD (match_operand:TD 1 "gpc_reg_operand" "0,d"] "TARGET_HARD_FLOAT && TARGET_FPRS" - "fnabs %0,%1" - [(set_attr "type" "fp")]) + "@ + fnabs %0,%1 + fnabs %0,%1\;fmr %L0,%L1" + [(set_attr "type" "fp") + (set_attr "length" "4,8")]) ;; Hardware support for decimal floating point operations. Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/dfp-dd-2.c === --- /dev/null +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/dfp-dd-2.c @@ -0,0 +1,26 @@ +/* Test generation of DFP instructions for POWER6. 
*/ +/* { dg-do compile { target { powerpc*-*-linux* && powerpc_fprs } } } */ +/* { dg-options "-std=gnu99 -O1 -mcpu=power6" } */ + +/* { dg-final { scan-assembler-times "fneg" 1 } } */ +/* { dg-final { scan-assembler-times "fabs" 1 } } */ +/* { dg-final { scan-assembler-times "fnabs" 1 } } */ +/* { dg-final { scan-assembler-times "fmr" 0 } } */ + +_Decimal64 +func1 (_Decimal64 a, _Decimal64 b) +{ + return -b; +} + +_Decimal64 +func2 (_Decimal64 a, _Decimal64 b) +{ + return __builti
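In the dfp.md change, `*abstd2_fpr` and `*nabstd2_fpr` each gain a second alternative. A TDmode value occupies an FPR pair, and only the high doubleword holds the sign. When the source is tied to the destination (constraint "0"), `fabs`/`fnabs` on the high register is enough. When the registers differ, the low register must also be copied with `fmr %L0,%L1`; the original single-instruction pattern updated only the high register. The C model below represents the pair as two 64-bit words. It assumes the sign is bit 63 of the high doubleword, as in the IEEE 754-2008 decimal interchange formats, and `struct fpr_pair` is a hypothetical stand-in for the register pair.

```c
#include <stdint.h>

/* A TDmode value in an FPR pair: HI is the even-numbered register (most
   significant doubleword, holding the sign), LO the odd-numbered one.  */
struct fpr_pair
{
  uint64_t hi;
  uint64_t lo;
};

#define SIGN_BIT ((uint64_t) 1 << 63)

/* abs: "fabs %0,%1" on the high word, plus "fmr %L0,%L1" when the
   destination pair differs from the source pair.  */
static void
abs_td (struct fpr_pair *dst, const struct fpr_pair *src)
{
  dst->hi = src->hi & ~SIGN_BIT;  /* fabs %0,%1 */
  if (dst != src)
    dst->lo = src->lo;            /* fmr %L0,%L1 */
}

/* -abs: same shape with "fnabs %0,%1".  */
static void
nabs_td (struct fpr_pair *dst, const struct fpr_pair *src)
{
  dst->hi = src->hi | SIGN_BIT;   /* fnabs %0,%1 */
  if (dst != src)
    dst->lo = src->lo;            /* fmr %L0,%L1 */
}
```

This also explains the lengths the patch sets: 4 bytes for the tied alternative and 8 for the two-instruction one.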
[4.8, PATCH 17/26] Backport Power8 and LE support: Direct moves
Hi, This patch (diff-direct-move) backports support for the Power8 direct move instructions for little endian. Thanks, Bill 2014-03-19 Bill Schmidt Backport from mainline 2013-10-23 Pat Haugen * gcc.target/powerpc/direct-move.h: Fix header for executable tests. Back port from mainline 2014-01-16 Michael Meissner PR target/59844 * config/rs6000/rs6000.md (reload_vsx_from_gprsf): Add little endian support, remove tests for WORDS_BIG_ENDIAN. (p8_mfvsrd_3_): Likewise. (reload_gpr_from_vsx): Likewise. (reload_gpr_from_vsxsf): Likewise. (p8_mfvsrd_4_disf): Likewise. Index: gcc-4_8-test/gcc/config/rs6000/rs6000.md === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.md +++ gcc-4_8-test/gcc/config/rs6000/rs6000.md @@ -9438,7 +9438,7 @@ (unspec:SF [(match_operand:SF 1 "register_operand" "r")] UNSPEC_P8V_RELOAD_FROM_GPR)) (clobber (match_operand:DI 2 "register_operand" "=r"))] - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN" + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "#" "&& reload_completed" [(const_int 0)] @@ -9465,7 +9465,7 @@ [(set (match_operand:DF 0 "register_operand" "=r") (unspec:DF [(match_operand:FMOVE128_GPR 1 "register_operand" "wa")] UNSPEC_P8V_RELOAD_FROM_VSX))] - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN" + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "mfvsrd %0,%x1" [(set_attr "type" "mftgpr")]) @@ -9475,7 +9475,7 @@ [(match_operand:FMOVE128_GPR 1 "register_operand" "wa")] UNSPEC_P8V_RELOAD_FROM_VSX)) (clobber (match_operand:FMOVE128_GPR 2 "register_operand" "=wa"))] - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN" + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "#" "&& reload_completed" [(const_int 0)] @@ -9502,7 +9502,7 @@ (unspec:SF [(match_operand:SF 1 "register_operand" "wa")] UNSPEC_P8V_RELOAD_FROM_VSX)) (clobber (match_operand:V4SF 2 "register_operand" "=wa"))] - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN" + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "#" "&& reload_completed" [(const_int 0)] @@ -9524,7 
+9524,7 @@ [(set (match_operand:DI 0 "register_operand" "=r") (unspec:DI [(match_operand:V4SF 1 "register_operand" "wa")] UNSPEC_P8V_RELOAD_FROM_VSX))] - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN" + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" "mfvsrd %0,%x1" [(set_attr "type" "mftgpr")]) Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/direct-move.h === --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/direct-move.h +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/direct-move.h @@ -1,5 +1,7 @@ /* Test functions for direct move support. */ +#include +extern void abort (void); #ifndef VSX_REG_ATTR #define VSX_REG_ATTR "wa" @@ -111,7 +113,7 @@ const struct test_struct test_functions[ void __attribute__((__noinline__)) test_value (TYPE a) { - size_t i; + long i; for (i = 0; i < sizeof (test_functions) / sizeof (test_functions[0]); i++) { @@ -127,8 +129,7 @@ test_value (TYPE a) int main (void) { - size_t i; - long j; + long i,j; union { TYPE value; unsigned char bytes[sizeof (TYPE)];
[4.8, PATCH 6/26] Backport Power8 and LE support: TDmode for LE
Hi, This patch (diff-le-dfp) backports fixes for TDmode on a little endian target. Thanks, Bill 2014-03-19 Bill Schmidt Backport from mainline r205123: 2013-11-20 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_cannot_change_mode_class): Do not allow subregs of TDmode in FPRs of smaller size in little-endian. (rs6000_split_multireg_move): When splitting an access to TDmode in FPRs, do not use simplify_gen_subreg. Backport from mainline r204927: 2013-11-17 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_emit_move): Use low word of sdmode_stack_slot also in little-endian mode. Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -7963,7 +7963,9 @@ rs6000_emit_move (rtx dest, rtx source, } else if (INT_REGNO_P (REGNO (operands[1]))) { - rtx mem = adjust_address_nv (operands[0], mode, 4); + rtx mem = operands[0]; + if (BYTES_BIG_ENDIAN) + mem = adjust_address_nv (mem, mode, 4); mem = eliminate_regs (mem, VOIDmode, NULL_RTX); emit_insn (gen_movsd_hardfloat (mem, operands[1])); } @@ -7986,7 +7988,9 @@ rs6000_emit_move (rtx dest, rtx source, } else if (INT_REGNO_P (REGNO (operands[0]))) { - rtx mem = adjust_address_nv (operands[1], mode, 4); + rtx mem = operands[1]; + if (BYTES_BIG_ENDIAN) + mem = adjust_address_nv (mem, mode, 4); mem = eliminate_regs (mem, VOIDmode, NULL_RTX); emit_insn (gen_movsd_hardfloat (operands[0], mem)); } @@ -16082,6 +16086,13 @@ rs6000_cannot_change_mode_class (enum ma if (TARGET_IEEEQUAD && (to == TFmode || from == TFmode)) return true; + /* TDmode in floating-mode registers must always go into a register +pair with the most significant word in the even-numbered register +to match ISA requirements. In little-endian mode, this does not +match subreg numbering, so we cannot allow subregs. 
*/ + if (!BYTES_BIG_ENDIAN && (to == TDmode || from == TDmode)) + return true; + if (from_size < 8 || to_size < 8) return true; @@ -19028,6 +19039,39 @@ rs6000_split_multireg_move (rtx dst, rtx gcc_assert (reg_mode_size * nregs == GET_MODE_SIZE (mode)); + /* TDmode residing in FP registers is special, since the ISA requires that + the lower-numbered word of a register pair is always the most significant + word, even in little-endian mode. This does not match the usual subreg + semantics, so we cannnot use simplify_gen_subreg in those cases. Access + the appropriate constituent registers "by hand" in little-endian mode. + + Note we do not need to check for destructive overlap here since TDmode + can only reside in even/odd register pairs. */ + if (FP_REGNO_P (reg) && DECIMAL_FLOAT_MODE_P (mode) && !BYTES_BIG_ENDIAN) +{ + rtx p_src, p_dst; + int i; + + for (i = 0; i < nregs; i++) + { + if (REG_P (src) && FP_REGNO_P (REGNO (src))) + p_src = gen_rtx_REG (reg_mode, REGNO (src) + nregs - 1 - i); + else + p_src = simplify_gen_subreg (reg_mode, src, mode, +i * reg_mode_size); + + if (REG_P (dst) && FP_REGNO_P (REGNO (dst))) + p_dst = gen_rtx_REG (reg_mode, REGNO (dst) + nregs - 1 - i); + else + p_dst = simplify_gen_subreg (reg_mode, dst, mode, +i * reg_mode_size); + + emit_insn (gen_rtx_SET (VOIDmode, p_dst, p_src)); + } + + return; +} + if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst))) { /* Move register range backwards, if we might have destructive
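In the rs6000_split_multireg_move hunk, the ISA keeps TDmode's most significant doubleword in the even (lower-numbered) register of a pair, even in little-endian mode. Subreg and memory order, however, put the least significant doubleword at offset 0 on little-endian. The patch therefore maps piece `i` to `REGNO + nregs - 1 - i` by hand instead of calling simplify_gen_subreg. A small C sketch of that mapping follows. `td_piece_regno` is a hypothetical helper, and the big-endian branch shows the ordinary subreg numbering that still applies there.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the hard register holding piece I (in subreg/memory order) of
   a multi-register TDmode value that starts at FIRST_REGNO in the FPRs
   and spans NREGS registers.  */
static int
td_piece_regno (int first_regno, int nregs, int i, bool bytes_big_endian)
{
  assert (i >= 0 && i < nregs);
  if (bytes_big_endian)
    return first_regno + i;            /* ordinary subreg numbering */
  return first_regno + nregs - 1 - i;  /* reversed, as in the patch */
}
```

With f0 as the first register (hard register 32 in the rs6000 numbering) and a two-register value, little-endian piece 0 lives in register 33 and piece 1 in register 32.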
[4.8, PATCH 4/26] Backport Power8 and LE support: Libtool and configure bits 2
Hi, This patch (diff-le-libtool) backports changes to use a libtool.m4 that supports powerpc64le-*linux*. Thanks, Bill 2014-03-19 Bill Schmidt Backport from mainline 2013-11-22 Ulrich Weigand * libgo/config/libtool.m4: Update to mainline version. * libgo/configure: Regenerate. 2013-11-17 Ulrich Weigand * libgo/config/libtool.m4: Update to mainline version. * libgo/configure: Regenerate. 2013-11-15 Ulrich Weigand * libtool.m4: Update to mainline version. * libjava/libltdl/acinclude.m4: Likewise. * gcc/configure: Regenerate. * boehm-gc/configure: Regenerate. * libatomic/configure: Regenerate. * libbacktrace/configure: Regenerate. * libffi/configure: Regenerate. * libgfortran/configure: Regenerate. * libgomp/configure: Regenerate. * libitm/configure: Regenerate. * libjava/configure: Regenerate. * libjava/libltdl/configure: Regenerate. * libjava/classpath/configure: Regenerate. * libmudflap/configure: Regenerate. * libobjc/configure: Regenerate. * libquadmath/configure: Regenerate. * libsanitizer/configure: Regenerate. * libssp/configure: Regenerate. * libstdc++-v3/configure: Regenerate. * lto-plugin/configure: Regenerate. * zlib/configure: Regenerate. Backport from mainline 2013-09-20 Alan Modra * libtool.m4 (_LT_ENABLE_LOCK ): Remove non-canonical ppc host match. Support little-endian powerpc linux hosts. * configure: Regenerate. Index: gcc-4_8-branch/gcc/configure === --- gcc-4_8-branch.orig/gcc/configure 2013-12-28 17:41:32.733630408 +0100 +++ gcc-4_8-branch/gcc/configure2013-12-28 17:50:38.646332701 +0100 @@ -13589,7 +13589,7 @@ ia64-*-hpux*) rm -rf conftest* ;; -x86_64-*kfreebsd*-gnu|x86_64-*linux*|ppc*-*linux*|powerpc*-*linux*| \ +x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \ s390*-*linux*|s390*-*tpf*|sparc*-*linux*) # Find out which ABI we are using. 
echo 'int i;' > conftest.$ac_ext @@ -13614,7 +13614,10 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux* ;; esac ;; - ppc64-*linux*|powerpc64-*linux*) + powerpc64le-*linux*) + LD="${LD-ld} -m elf32lppclinux" + ;; + powerpc64-*linux*) LD="${LD-ld} -m elf32ppclinux" ;; s390x-*linux*) @@ -13633,7 +13636,10 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux* x86_64-*linux*) LD="${LD-ld} -m elf_x86_64" ;; - ppc*-*linux*|powerpc*-*linux*) + powerpcle-*linux*) + LD="${LD-ld} -m elf64lppc" + ;; + powerpc-*linux*) LD="${LD-ld} -m elf64ppc" ;; s390*-*linux*|s390*-*tpf*) @@ -17827,7 +17833,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 17830 "configure" +#line 17836 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -17933,7 +17939,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 17936 "configure" +#line 17942 "configure" #include "confdefs.h" #if HAVE_DLFCN_H Index: gcc-4_8-branch/libtool.m4 === --- gcc-4_8-branch.orig/libtool.m4 2013-12-28 17:41:32.728630383 +0100 +++ gcc-4_8-branch/libtool.m4 2013-12-28 17:50:38.652332731 +0100 @@ -1220,7 +1220,7 @@ ia64-*-hpux*) rm -rf conftest* ;; -x86_64-*kfreebsd*-gnu|x86_64-*linux*|ppc*-*linux*|powerpc*-*linux*| \ +x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \ s390*-*linux*|s390*-*tpf*|sparc*-*linux*) # Find out which ABI we are using. 
echo 'int i;' > conftest.$ac_ext @@ -1241,7 +1241,10 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux* ;; esac ;; - ppc64-*linux*|powerpc64-*linux*) + powerpc64le-*linux*) + LD="${LD-ld} -m elf32lppclinux" + ;; + powerpc64-*linux*) LD="${LD-ld} -m elf32ppclinux" ;; s390x-*linux*) @@ -1260,7 +1263,10 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux* x86_64-*linux*) LD="${LD-ld} -m elf_x86_64" ;; - ppc*-*linux*|powerpc*-*linux*) + powerpcle-*linux*) + LD="${LD-ld} -m elf64lppc" + ;; + powerpc-*linux*) LD="${LD-ld} -m elf64ppc" ;; s390*-*linux*|s390*-*tpf*) Index: gcc-4_8-branch/boehm-gc/configure === --- gcc-4_8-branch.orig/boehm-gc/configure 2013-12-28 17:41:32.733630408 +0100 +++ gcc-4_8-branch/boehm-gc/configure 2013-12-28
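The libtool.m4 hunks replace the non-canonical `ppc*` host patterns with explicit big- and little-endian ones. As I read the hunks, the first case list applies when the test object turns out to be 32-bit and the second when it is 64-bit. Each case selects the matching `ld -m` emulation. The sketch below replays those shell case patterns with POSIX `fnmatch`. `ppc_ld_emulation` and `emulation_is` are hypothetical names, and the 32/64 parameter stands in for the script's inspection of conftest.o.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the ld emulation the patched _LT_ENABLE_LOCK would pass via
   "ld -m" for powerpc Linux HOST when the test object has OBJ_BITS bits,
   or NULL when no override applies.  Patterns are the shell case
   patterns from libtool.m4.  */
static const char *
ppc_ld_emulation (const char *host, int obj_bits)
{
  if (obj_bits == 32)
    {
      if (fnmatch ("powerpc64le-*linux*", host, 0) == 0)
        return "elf32lppclinux";
      if (fnmatch ("powerpc64-*linux*", host, 0) == 0)
        return "elf32ppclinux";
    }
  else if (obj_bits == 64)
    {
      if (fnmatch ("powerpcle-*linux*", host, 0) == 0)
        return "elf64lppc";
      if (fnmatch ("powerpc-*linux*", host, 0) == 0)
        return "elf64ppc";
    }
  return NULL;
}

/* True if HOST/OBJ_BITS selects exactly the emulation WANT.  */
static bool
emulation_is (const char *host, int obj_bits, const char *want)
{
  const char *got = ppc_ld_emulation (host, obj_bits);
  return got != NULL && strcmp (got, want) == 0;
}
```

A host spelled `ppc64-*-linux*` no longer matches anything here, which corresponds to the ChangeLog's "Remove non-canonical ppc host match".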
[4.8, PATCH 16/26] Backport Power8 and LE support: PR56843
Hi, This patch (diff-pr56843) backports the fix for PR56843. Thanks, Bill [gcc] 2014-03-19 Bill Schmidt Backport from mainline 2013-04-05 Bill Schmidt PR target/56843 * config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove. (rs6000_emit_swdiv_low_precision): Remove. (rs6000_emit_swdiv): Rewrite to handle between one and four iterations of Newton-Raphson generally; modify required number of iterations for some cases. * config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove. [gcc/testsuite] 2014-03-19 Bill Schmidt Backport from mainline 2013-04-05 Bill Schmidt PR target/56843 * gcc.target/powerpc/recip-1.c: Modify expected output. * gcc.target/powerpc/recip-3.c: Likewise. * gcc.target/powerpc/recip-4.c: Likewise. * gcc.target/powerpc/recip-5.c: Add expected output for iterations. Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -29417,54 +29417,26 @@ rs6000_emit_nmsub (rtx dst, rtx m1, rtx emit_insn (gen_rtx_SET (VOIDmode, dst, r)); } -/* Newton-Raphson approximation of floating point divide with just 2 passes - (either single precision floating point, or newer machines with higher - accuracy estimates). Support both scalar and vector divide. Assumes no - trapping math and finite arguments. */ +/* Newton-Raphson approximation of floating point divide DST = N/D. If NOTE_P, + add a reg_note saying that this was a division. Support both scalar and + vector divide. Assumes no trapping math and finite arguments. 
*/ -static void -rs6000_emit_swdiv_high_precision (rtx dst, rtx n, rtx d) +void +rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool note_p) { enum machine_mode mode = GET_MODE (dst); - rtx x0, e0, e1, y1, u0, v0; - enum insn_code code = optab_handler (smul_optab, mode); - insn_gen_fn gen_mul = GEN_FCN (code); - rtx one = rs6000_load_constant_and_splat (mode, dconst1); - - gcc_assert (code != CODE_FOR_nothing); - - /* x0 = 1./d estimate */ - x0 = gen_reg_rtx (mode); - emit_insn (gen_rtx_SET (VOIDmode, x0, - gen_rtx_UNSPEC (mode, gen_rtvec (1, d), - UNSPEC_FRES))); - - e0 = gen_reg_rtx (mode); - rs6000_emit_nmsub (e0, d, x0, one); /* e0 = 1. - (d * x0) */ - - e1 = gen_reg_rtx (mode); - rs6000_emit_madd (e1, e0, e0, e0); /* e1 = (e0 * e0) + e0 */ - - y1 = gen_reg_rtx (mode); - rs6000_emit_madd (y1, e1, x0, x0); /* y1 = (e1 * x0) + x0 */ - - u0 = gen_reg_rtx (mode); - emit_insn (gen_mul (u0, n, y1)); /* u0 = n * y1 */ - - v0 = gen_reg_rtx (mode); - rs6000_emit_nmsub (v0, d, u0, n);/* v0 = n - (d * u0) */ - - rs6000_emit_madd (dst, v0, y1, u0); /* dst = (v0 * y1) + u0 */ -} + rtx one, x0, e0, x1, xprev, eprev, xnext, enext, u, v; + int i; -/* Newton-Raphson approximation of floating point divide that has a low - precision estimate. Assumes no trapping math and finite arguments. */ + /* Low precision estimates guarantee 5 bits of accuracy. High + precision estimates guarantee 14 bits of accuracy. SFmode + requires 23 bits of accuracy. DFmode requires 52 bits of + accuracy. Each pass at least doubles the accuracy, leading + to the following. */ + int passes = (TARGET_RECIP_PRECISION) ? 
1 : 3; + if (mode == DFmode || mode == V2DFmode) +passes++; -static void -rs6000_emit_swdiv_low_precision (rtx dst, rtx n, rtx d) -{ - enum machine_mode mode = GET_MODE (dst); - rtx x0, e0, e1, e2, y1, y2, y3, u0, v0, one; enum insn_code code = optab_handler (smul_optab, mode); insn_gen_fn gen_mul = GEN_FCN (code); @@ -29478,46 +29450,44 @@ rs6000_emit_swdiv_low_precision (rtx dst gen_rtx_UNSPEC (mode, gen_rtvec (1, d), UNSPEC_FRES))); - e0 = gen_reg_rtx (mode); - rs6000_emit_nmsub (e0, d, x0, one); /* e0 = 1. - d * x0 */ - - y1 = gen_reg_rtx (mode); - rs6000_emit_madd (y1, e0, x0, x0); /* y1 = x0 + e0 * x0 */ - - e1 = gen_reg_rtx (mode); - emit_insn (gen_mul (e1, e0, e0));/* e1 = e0 * e0 */ - - y2 = gen_reg_rtx (mode); - rs6000_emit_madd (y2, e1, y1, y1); /* y2 = y1 + e1 * y1 */ - - e2 = gen_reg_rtx (mode); - emit_insn (gen_mul (e2, e1, e1));/* e2 = e1 * e1 */ - - y3 = gen_reg_rtx (mode); - rs6000_emit_madd (y3, e2, y2, y2); /* y3 = y2 + e2 * y2 */ - - u0 = gen_reg_rtx (mode); - emit_insn (gen_mul (u0, n, y3)); /* u0 = n * y3 */ - - v0 = gen_reg_rtx (mode); - rs6000_emit_nmsub (v0, d, u0, n);/* v0 = n - d * u0 */ - - rs6000_emit_madd (dst, v0, y3, u0); /* dst = u0 + v0 * y3 */ -} + /* Each iteration but the
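The comment in the rewritten rs6000_emit_swdiv gives the pass count. A low-precision estimate is good to 5 bits and a high-precision one to 14, and each Newton-Raphson pass at least doubles the good bits. SFmode needs 23 bits and DFmode 52, so the code uses 3 or 1 passes plus one more for double. Below is a scalar C sketch of both the count and the refinement. The names are hypothetical. The final residual step (u = n*y, v = n - d*u, dst = u + v*y) is taken from the removed helpers, since the new function body is cut off above.

```c
#include <stdbool.h>

/* Passes chosen by the rewritten rs6000_emit_swdiv.  */
static int
swdiv_passes (bool recip_precision, bool double_mode)
{
  int passes = recip_precision ? 1 : 3;
  if (double_mode)
    passes++;
  return passes;
}

/* Good bits guaranteed after PASSES passes from START_BITS, under the
   "each pass at least doubles the accuracy" rule.  */
static int
bits_after (int start_bits, int passes)
{
  int bits = start_bits;
  for (int i = 0; i < passes; i++)
    bits *= 2;
  return bits;
}

/* Scalar model of the divide: refine an estimate X0 of 1/D, then form
   N/D with one residual correction.  */
static double
nr_divide (double n, double d, double x0, int passes)
{
  double x = x0;
  for (int i = 0; i < passes; i++)
    {
      double e = 1.0 - d * x;  /* error of the current reciprocal */
      x = x + e * x;           /* x_{k+1} = x_k + e_k * x_k */
    }
  double u = n * x;            /* u0 = n * y */
  double v = n - d * u;        /* v0 = n - d * u0 */
  return u + v * x;            /* dst = u0 + v0 * y */
}
```

Checking the arithmetic: with a low-precision estimate, 5 bits become 40 after three passes (at least 23 for SFmode) and 80 after four (at least 52 for DFmode). Two passes would give only 20 bits, which is too few for SFmode. With a high-precision estimate, 14 bits become 28 after one pass and 56 after two.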
[4.8, PATCH 18/26] Backport Power8 and LE support: Configure bits 2
Hi, This patch (diff-le-config-2) backports more configure changes, particularly for multilib/multiarch targeting powerpc64le. Thanks, Bill 2014-03-19 Bill Schmidt Apply mainline r202190, powerpc64le multilibs and multiarch dir 2013-09-03 Alan Modra * config.gcc (powerpc*-*-linux*): Add support for little-endian multilibs to big-endian target and vice versa. * config/rs6000/t-linux64: Use := assignment on all vars. (MULTILIB_EXTRA_OPTS): Remove fPIC. (MULTILIB_OSDIRNAMES): Specify using mapping from multilib_options. * config/rs6000/t-linux64le: New file. * config/rs6000/t-linux64bele: New file. * config/rs6000/t-linux64lebe: New file. Index: gcc-4_8-test/gcc/config.gcc === --- gcc-4_8-test.orig/gcc/config.gcc +++ gcc-4_8-test/gcc/config.gcc @@ -2081,7 +2081,7 @@ powerpc*-*-linux*) tmake_file="rs6000/t-fprules rs6000/t-ppcos ${tmake_file} rs6000/t-ppccomm" case ${target} in powerpc*le-*-*) - tm_file="${tm_file} rs6000/sysv4le.h" ;; + tm_file="${tm_file} rs6000/sysv4le.h" ;; esac maybe_biarch=yes case ${target} in @@ -2104,6 +2104,19 @@ powerpc*-*-linux*) fi tm_file="rs6000/biarch64.h ${tm_file} rs6000/linux64.h glibc-stdint.h" tmake_file="$tmake_file rs6000/t-linux64" + case ${target} in + powerpc*le-*-*) + tmake_file="$tmake_file rs6000/t-linux64le" + case ${enable_targets} in + all | *powerpc64-* | *powerpc-*) + tmake_file="$tmake_file rs6000/t-linux64lebe" ;; + esac ;; + *) + case ${enable_targets} in + all | *powerpc64le-* | *powerpcle-*) + tmake_file="$tmake_file rs6000/t-linux64bele" ;; + esac ;; + esac extra_options="${extra_options} rs6000/linux64.opt" ;; *) Index: gcc-4_8-test/gcc/config/rs6000/t-linux64 === --- gcc-4_8-test.orig/gcc/config/rs6000/t-linux64 +++ gcc-4_8-test/gcc/config/rs6000/t-linux64 @@ -25,8 +25,8 @@ # it doesn't tell anything about the 32bit libraries on those systems. Set # MULTILIB_OSDIRNAMES according to what is found on the target. 
-MULTILIB_OPTIONS= m64/m32 -MULTILIB_DIRNAMES = 64 32 -MULTILIB_EXTRA_OPTS = fPIC -MULTILIB_OSDIRNAMES= ../lib64$(call if_multiarch,:powerpc64-linux-gnu) -MULTILIB_OSDIRNAMES+= $(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call if_multiarch,:powerpc-linux-gnu) +MULTILIB_OPTIONS:= m64/m32 +MULTILIB_DIRNAMES := 64 32 +MULTILIB_EXTRA_OPTS := +MULTILIB_OSDIRNAMES := m64=../lib64$(call if_multiarch,:powerpc64-linux-gnu) +MULTILIB_OSDIRNAMES += m32=$(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call if_multiarch,:powerpc-linux-gnu) Index: gcc-4_8-test/gcc/config/rs6000/t-linux64bele === --- /dev/null +++ gcc-4_8-test/gcc/config/rs6000/t-linux64bele @@ -0,0 +1,7 @@ +#rs6000/t-linux64end + +MULTILIB_OPTIONS+= mlittle +MULTILIB_DIRNAMES += le +MULTILIB_OSDIRNAMES += $(subst =,.mlittle=,$(subst lible32,lib32le,$(subst lible64,lib64le,$(subst lib,lible,$(subst -linux,le-linux,$(MULTILIB_OSDIRNAMES)) +MULTILIB_OSDIRNAMES += $(subst $(if $(findstring 64,$(target)),m64,m32).,,$(filter $(if $(findstring 64,$(target)),m64,m32).mlittle%,$(MULTILIB_OSDIRNAMES))) +MULTILIB_MATCHES:= ${MULTILIB_MATCHES_ENDIAN} Index: gcc-4_8-test/gcc/config/rs6000/t-linux64le === --- /dev/null +++ gcc-4_8-test/gcc/config/rs6000/t-linux64le @@ -0,0 +1,3 @@ +#rs6000/t-linux64le + +MULTILIB_OSDIRNAMES := $(subst -linux,le-linux,$(MULTILIB_OSDIRNAMES)) Index: gcc-4_8-test/gcc/config/rs6000/t-linux64lebe === --- /dev/null +++ gcc-4_8-test/gcc/config/rs6000/t-linux64lebe @@ -0,0 +1,7 @@ +#rs6000/t-linux64leend + +MULTILIB_OPTIONS+= mbig +MULTILIB_DIRNAMES += be +MULTILIB_OSDIRNAMES += $(subst =,.mbig=,$(subst libbe32,lib32be,$(subst libbe64,lib64be,$(subst lib,libbe,$(subst le-linux,-linux,$(MULTILIB_OSDIRNAMES)) +MULTILIB_OSDIRNAMES += $(subst $(if $(findstring 64,$(target)),m64,m32).,,$(filter $(if $(findstring 64,$(target)),m64,m32).mbig%,$(MULTILIB_OSDIRNAMES))) +MULTILIB_MATCHES:= ${MULTILIB_MATCHES_ENDIAN} Index: 
gcc-4_8-test/libsanitizer/configure.tgt === --- gcc-4_8-test.orig/libsanitizer/configure.tgt +++ gcc-
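The config.gcc hunk appends rs6000/t-linux64le for little-endian targets. It adds rs6000/t-linux64lebe when --enable-targets also asks for big-endian powerpc, and rs6000/t-linux64bele in the mirror case. The C sketch below replays those shell `case` patterns with POSIX `fnmatch`. The function names and the bit encoding of the fragments are hypothetical, and an empty string stands for "--enable-targets not given".

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical encoding of the extra makefile fragments.  */
enum
{
  FRAG_LINUX64LE   = 1 << 0,  /* rs6000/t-linux64le */
  FRAG_LINUX64LEBE = 1 << 1,  /* rs6000/t-linux64lebe */
  FRAG_LINUX64BELE = 1 << 2   /* rs6000/t-linux64bele */
};

/* True if VALUE matches any of three shell-style patterns.  */
static bool
matches_any3 (const char *value, const char *p1, const char *p2,
              const char *p3)
{
  return fnmatch (p1, value, 0) == 0
         || fnmatch (p2, value, 0) == 0
         || fnmatch (p3, value, 0) == 0;
}

/* Fragments added after rs6000/t-linux64 for a biarch powerpc*-*-linux*
   TARGET, given ENABLE_TARGETS.  */
static int
linux64_extra_fragments (const char *target, const char *enable_targets)
{
  if (fnmatch ("powerpc*le-*-*", target, 0) == 0)
    {
      int frags = FRAG_LINUX64LE;
      if (matches_any3 (enable_targets, "all", "*powerpc64-*", "*powerpc-*"))
        frags |= FRAG_LINUX64LEBE;
      return frags;
    }
  if (matches_any3 (enable_targets, "all", "*powerpc64le-*", "*powerpcle-*"))
    return FRAG_LINUX64BELE;
  return 0;
}
```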
[4.8, PATCH 15/26] Backport Power8 and LE support: PR54537
Hi, This patch (diff-pr54537) backports a fix for PR54537 which is unrelated but necessary. Copying Richard and Jakub for the common code. Thanks, Bill [libstdc++-v3] 2014-03-29 Bill Schmidt Backport from mainline 2013-08-01 Fabien Chêne PR c++/54537 * include/tr1/cmath: Remove pow(double,double) overload, remove a duplicated comment about DR 550. Add a comment to explain the issue. * testsuite/tr1/8_c_compatibility/cmath/pow_cmath.cc: New. [gcc/cp] 2014-03-29 Bill Schmidt Back port from mainline 2013-08-01 Fabien Chêne PR c++/54537 * cp-tree.h: Check OVL_USED with OVERLOAD_CHECK. * name-lookup.c (do_nonmember_using_decl): Make sure we have an OVERLOAD before calling OVL_USED. Call diagnose_name_conflict instead of issuing an error without mentioning the conflicting declaration. [gcc/testsuite] 2014-03-29 Bill Schmidt Back port from mainline 2013-08-01 Fabien Chêne Peter Bergner PR c++/54537 * g++.dg/overload/using3.C: New. * g++.dg/overload/using2.C: Adjust. * g++.dg/lookup/using9.C: Likewise. Index: gcc-4_8-test/gcc/cp/cp-tree.h === --- gcc-4_8-test.orig/gcc/cp/cp-tree.h +++ gcc-4_8-test/gcc/cp/cp-tree.h @@ -331,7 +331,7 @@ typedef struct ptrmem_cst * ptrmem_cst_t /* If set, this was imported in a using declaration. This is not to confuse with being used somewhere, which is not important for this node. */ -#define OVL_USED(NODE) TREE_USED (NODE) +#define OVL_USED(NODE) TREE_USED (OVERLOAD_CHECK (NODE)) /* If set, this OVERLOAD was created for argument-dependent lookup and can be freed afterward. */ #define OVL_ARG_DEPENDENT(NODE) TREE_LANG_FLAG_0 (OVERLOAD_CHECK (NODE)) Index: gcc-4_8-test/gcc/cp/name-lookup.c === --- gcc-4_8-test.orig/gcc/cp/name-lookup.c +++ gcc-4_8-test/gcc/cp/name-lookup.c @@ -2286,8 +2286,7 @@ push_overloaded_decl_1 (tree decl, int f && compparms (TYPE_ARG_TYPES (TREE_TYPE (fn)), TYPE_ARG_TYPES (TREE_TYPE (decl))) && ! 
decls_match (fn, decl)) - error ("%q#D conflicts with previous using declaration %q#D", - decl, fn); + diagnose_name_conflict (decl, fn); dup = duplicate_decls (decl, fn, is_friend); /* If DECL was a redeclaration of FN -- even an invalid @@ -2519,7 +2518,7 @@ do_nonmember_using_decl (tree scope, tre if (new_fn == old_fn) /* The function already exists in the current namespace. */ break; - else if (OVL_USED (tmp1)) + else if (TREE_CODE (tmp1) == OVERLOAD && OVL_USED (tmp1)) continue; /* this is a using decl */ else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (new_fn)), TYPE_ARG_TYPES (TREE_TYPE (old_fn @@ -2534,7 +2533,7 @@ do_nonmember_using_decl (tree scope, tre break; else { - error ("%qD is already declared in this scope", name); + diagnose_name_conflict (new_fn, old_fn); break; } } Index: gcc-4_8-test/gcc/testsuite/g++.dg/lookup/using9.C === --- gcc-4_8-test.orig/gcc/testsuite/g++.dg/lookup/using9.C +++ gcc-4_8-test/gcc/testsuite/g++.dg/lookup/using9.C @@ -21,11 +21,11 @@ void h() f('h'); f(1); // { dg-error "ambiguous" } // { dg-message "candidate" "candidate note" { target *-*-* } 22 } - void f(int); // { dg-error "previous using declaration" } + void f(int); // { dg-error "previous declaration" } } void m() { void f(int); - using B::f; // { dg-error "already declared" } + using B::f; // { dg-error "previous declaration" } } Index: gcc-4_8-test/gcc/testsuite/g++.dg/overload/using2.C === --- gcc-4_8-test.orig/gcc/testsuite/g++.dg/overload/using2.C +++ gcc-4_8-test/gcc/testsuite/g++.dg/overload/using2.C @@ -45,7 +45,7 @@ using std::C1; extern "C" void exit (int) throw (); extern "C" void *malloc (__SIZE_TYPE__) throw () __attribute__((malloc)); - void abort (void) throw (); + void abort (void) throw (); // { dg-message "previous" } void _exit (int) throw (); // { dg-error "conflicts" "conflicts" } // { dg-message "void _exit" "_exit" { target *-*-* } 49 } @@ -54,14 +54,14 @@ using std::C1; // { dg-message "void C1" "C1" { target *-*-* } 53 } extern "C" void c2
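The cp-tree.h and name-lookup.c changes belong together. `OVL_USED` now goes through `OVERLOAD_CHECK`, so with tree checking enabled, applying it to a non-OVERLOAD node is an internal error. `do_nonmember_using_decl` must therefore test `TREE_CODE (tmp1) == OVERLOAD` first. Presumably the old unchecked read could mistake a plain FUNCTION_DECL whose TREE_USED flag was set for a using-declaration. The miniature model below shows the checked-accessor pattern. The node layout, `overload_check`, and `is_using_decl` are hypothetical simplifications, not GCC's tree representation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature tree node with just the fields needed here.  */
enum tree_code { FUNCTION_DECL, OVERLOAD };

struct tree_node
{
  enum tree_code code;
  bool used;  /* stands in for TREE_USED */
};

/* Checked accessor in the spirit of OVERLOAD_CHECK: reading the flag
   through a non-OVERLOAD node trips an assertion instead of silently
   returning an unrelated bit.  */
static struct tree_node *
overload_check (struct tree_node *t)
{
  assert (t->code == OVERLOAD);
  return t;
}

#define OVL_USED(NODE) (overload_check (NODE)->used)

/* The fixed test: consult OVL_USED only once TMP1 is known to be an
   OVERLOAD.  */
static bool
is_using_decl (struct tree_node *tmp1)
{
  return tmp1->code == OVERLOAD && OVL_USED (tmp1);
}
```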
[4.8, PATCH 20/26] Backport Power8 and LE support: LRA
Hi, This patch (diff-lra) backports the changes to enable -mlra for the PowerPC back end. Thanks, Bill 2014-03-19 Bill Schmidt Backport from mainline 2014-02-04 Michael Meissner * config/rs6000/rs6000.opt (-mlra): Add switch to enable the LRA register allocator. * config/rs6000/rs6000.c (TARGET_LRA_P): Add support for -mlra to enable the LRA register allocator. Back port the changes from the trunk to enable LRA. (rs6000_legitimate_offset_address_p): Likewise. (legitimate_lo_sum_address_p): Likewise. (use_toc_relative_ref): Likewise. (rs6000_legitimate_address_p): Likewise. (rs6000_emit_move): Likewise. (rs6000_secondary_memory_needed_mode): Likewise. (rs6000_alloc_sdmode_stack_slot): Likewise. (rs6000_lra_p): Likewise. * config/rs6000/sync.md (load_lockedti): Copy TI/PTI variables by 64-bit parts to force the register allocator to allocate even/odd register pairs for the quad word atomic instructions. (store_conditionalti): Likewise. Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -1,5 +1,5 @@ /* Subroutines used for code generation on IBM RS/6000. - Copyright (C) 1991-2013 Free Software Foundation, Inc. + Copyright (C) 1991-2014 Free Software Foundation, Inc. Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu) This file is part of GCC. 
@@ -56,6 +56,7 @@ #include "intl.h" #include "params.h" #include "tm-constrs.h" +#include "ira.h" #include "opts.h" #include "tree-vectorizer.h" #include "dumpfile.h" @@ -1563,6 +1564,9 @@ static const struct attribute_spec rs600 #undef TARGET_MODE_DEPENDENT_ADDRESS_P #define TARGET_MODE_DEPENDENT_ADDRESS_P rs6000_mode_dependent_address_p +#undef TARGET_LRA_P +#define TARGET_LRA_P rs6000_lra_p + #undef TARGET_CAN_ELIMINATE #define TARGET_CAN_ELIMINATE rs6000_can_eliminate @@ -6242,7 +6246,7 @@ rs6000_legitimate_offset_address_p (enum return false; if (!reg_offset_addressing_ok_p (mode)) return virtual_stack_registers_memory_p (x); - if (legitimate_constant_pool_address_p (x, mode, strict)) + if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress)) return true; if (GET_CODE (XEXP (x, 1)) != CONST_INT) return false; @@ -6383,9 +6387,21 @@ legitimate_lo_sum_address_p (enum machin if (TARGET_ELF || TARGET_MACHO) { + bool large_toc_ok; + if (DEFAULT_ABI == ABI_V4 && flag_pic) return false; - if (TARGET_TOC) + /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls +push_reload from reload pass code. LEGITIMIZE_RELOAD_ADDRESS +recognizes some LO_SUM addresses as valid although this +function says opposite. In most cases, LRA through different +transformations can generate correct code for address reloads. +It can not manage only some LO_SUM cases. So we need to add +code analogous to one in rs6000_legitimize_reload_address for +LOW_SUM here saying that some addresses are still valid. */ + large_toc_ok = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL + && small_toc_ref (x, VOIDmode)); + if (TARGET_TOC && ! 
large_toc_ok) return false; if (GET_MODE_NUNITS (mode) != 1) return false; @@ -6395,7 +6411,7 @@ legitimate_lo_sum_address_p (enum machin && (mode == DFmode || mode == DDmode))) return false; - return CONSTANT_P (x); + return CONSTANT_P (x) || large_toc_ok; } return false; @@ -7106,7 +7122,6 @@ use_toc_relative_ref (rtx sym) && ASM_OUTPUT_SPECIAL_POOL_ENTRY_P (get_pool_constant (sym), get_pool_mode (sym))) || (TARGET_CMODEL == CMODEL_MEDIUM - && !CONSTANT_POOL_ADDRESS_P (sym) && SYMBOL_REF_LOCAL_P (sym))); } @@ -7394,7 +7409,8 @@ rs6000_legitimate_address_p (enum machin if (reg_offset_p && legitimate_small_data_p (mode, x)) return 1; if (reg_offset_p - && legitimate_constant_pool_address_p (x, mode, reg_ok_strict)) + && legitimate_constant_pool_address_p (x, mode, +reg_ok_strict || lra_in_progress)) return 1; /* For TImode, if we have load/store quad and TImode in VSX registers, only allow register indirect addresses. This will allow the values to go in @@ -7680,6 +7696,7 @@ rs6000_conditional_register_usage (void) fixed_regs[i] = call_used_regs[i] = call_really_used_regs[i] = 1; } } + /* Try to output insns to set TARGET equal to the constant C if it can be done in less than N insns. Do all computations in MODE. @@ -8112,6 +8129,68 @@ rs6000_emit_move (rtx dest, rtx s
[4.8, PATCH 26/26] Backport Power8 and LE support: Missing support
Hi,

This patch (diff-trunk-missing) backports some LE pieces that were found not to have been backported from trunk to the IBM 4.8 branch until relatively recently.

Thanks,
Bill


2014-03-19  Bill Schmidt

	Back port from trunk
	2013-04-25  Alan Modra
	PR target/57052
	* config/rs6000/rs6000.md (rotlsi3_internal7): Rename to
	rotlsi3_internal7le and condition on !BYTES_BIG_ENDIAN.
	(rotlsi3_internal8be): New BYTES_BIG_ENDIAN insn.
	Repeat for many other rotate/shift and mask patterns using subregs.
	Name lshiftrt insns.
	(ashrdisi3_noppc64): Rename to ashrdisi3_noppc64be and condition
	on WORDS_BIG_ENDIAN.

	2013-06-07  Alan Modra
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
	override user -mfp-in-toc.
	(offsettable_ok_by_alignment): Consider just the current access
	rather than the whole object, unless BLKmode.  Handle
	CONSTANT_POOL_ADDRESS_P constants that lack a decl too.
	(use_toc_relative_ref): Allow CONSTANT_POOL_ADDRESS_P constants
	for -mcmodel=medium.
	* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't
	override user -mfp-in-toc or -msum-in-toc.  Default to
	-mno-fp-in-toc for -mcmodel=medium.

	2013-06-18  Alan Modra
	* config/rs6000/rs6000.h (enum data_align): New.
	(LOCAL_ALIGNMENT, DATA_ALIGNMENT): Use rs6000_data_alignment.
	(DATA_ABI_ALIGNMENT): Define.
	(CONSTANT_ALIGNMENT): Correct comment.
	* config/rs6000/rs6000-protos.h (rs6000_data_alignment): Declare.
	* config/rs6000/rs6000.c (rs6000_data_alignment): New function.

	2013-07-11  Ulrich Weigand
	* config/rs6000/rs6000.md ("*tls_gd_low"): Require GOT register as
	additional operand in UNSPEC.
	("*tls_ld_low"): Likewise.
	("*tls_got_dtprel_low"): Likewise.
	("*tls_got_tprel_low"): Likewise.
	("*tls_gd"): Update splitter.
	("*tls_ld"): Likewise.
	("tls_got_dtprel_"): Likewise.
	("tls_got_tprel_"): Likewise.

	2014-01-23  Pat Haugen
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
	force flag_ira_loop_pressure if set via command line.
2014-02-06 Alan Modra PR target/60032 * config/rs6000/rs6000.c (rs6000_secondary_memory_needed_mode): Only change SDmode to DDmode when lra_in_progress. Index: gcc-4_8-test/gcc/config/rs6000/linux64.h === --- gcc-4_8-test.orig/gcc/config/rs6000/linux64.h +++ gcc-4_8-test/gcc/config/rs6000/linux64.h @@ -149,8 +149,11 @@ extern int dot_symbols; SET_CMODEL (CMODEL_MEDIUM); \ if (rs6000_current_cmodel != CMODEL_SMALL)\ { \ - TARGET_NO_FP_IN_TOC = 0; \ - TARGET_NO_SUM_IN_TOC = 0; \ + if (!global_options_set.x_TARGET_NO_FP_IN_TOC) \ + TARGET_NO_FP_IN_TOC \ + = rs6000_current_cmodel == CMODEL_MEDIUM; \ + if (!global_options_set.x_TARGET_NO_SUM_IN_TOC) \ + TARGET_NO_SUM_IN_TOC = 0; \ } \ } \ } \ Index: gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-protos.h +++ gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h @@ -152,6 +152,7 @@ extern void rs6000_split_logical (rtx [] #endif /* RTX_CODE */ #ifdef TREE_CODE +extern unsigned int rs6000_data_alignment (tree, unsigned int, enum data_align); extern unsigned int rs6000_special_round_type_align (tree, unsigned int, unsigned int); extern unsigned int darwin_rs6000_special_round_type_align (tree, unsigned int, Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -3031,7 +3031,8 @@ rs6000_option_override_internal (bool gl calculation works better for RTL loop invariant motion on targets with enough (>= 32) registers. It is an expensive optimization. So it is on only for peak performance. */ - if (optimize >= 3 && global_init_p) + if (optimize >= 3 && global_init_p + && !global_options_set.x_flag_ira_loop_pressure) flag_ira_loop_pressure = 1; /* Set the pointer size. */ @@ -3520,7 +3521,8 @@ rs6000_option_override_internal (bool gl /* Place FP constants i
[4.8, PATCH 19/26] Backport Power8 and LE support: Quad memory atomic
Hi, This patch (diff-quad-memory) backports support for quad-memory atomic operations. Thanks, Bill [gcc/testsuite] 2014-03-19 Bill Schmidt Back port from mainline 2014-01-23 Michael Meissner PR target/59909 * gcc.target/powerpc/quad-atomic.c: New file to test power8 quad word atomic functions at runtime. [gcc] 2014-03-19 Bill Schmidt Back port from mainline 2014-01-23 Michael Meissner PR target/59909 * doc/invoke.texi (RS/6000 and PowerPC Options): Document -mquad-memory-atomic. Update -mquad-memory documentation to say it is only used for non-atomic loads/stores. * config/rs6000/predicates.md (quad_int_reg_operand): Allow either -mquad-memory or -mquad-memory-atomic switches. * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Add -mquad-memory-atomic to ISA 2.07 support. * config/rs6000/rs6000.opt (-mquad-memory-atomic): Add new switch to separate support of normal quad word memory operations (ldq, stq) from the atomic quad word memory operations. * config/rs6000/rs6000.c (rs6000_option_override_internal): Add support to separate non-atomic quad word operations from atomic quad word operations. Disable non-atomic quad word operations in little endian mode so that we don't have to swap words after the load and before the store. (quad_load_store_p): Add comment about atomic quad word support. (rs6000_opt_masks): Add -mquad-memory-atomic to the list of options printed with -mdebug=reg. * config/rs6000/rs6000.h (TARGET_SYNC_TI): Use -mquad-memory-atomic as the test for whether we have quad word atomic instructions. (TARGET_SYNC_HI_QI): If either -mquad-memory-atomic, -mquad-memory, or -mp8-vector are used, allow byte/half-word atomic operations. * config/rs6000/sync.md (load_lockedti): Insure that the address is a proper indexed or indirect address for the lqarx instruction. On little endian systems, swap the hi/lo registers after the lqarx instruction. 
(load_lockedpti): Use indexed_or_indirect_operand predicate to insure the address is valid for the lqarx instruction. (store_conditionalti): Insure that the address is a proper indexed or indirect address for the stqcrx. instruction. On little endian systems, swap the hi/lo registers before doing the stqcrx. instruction. (store_conditionalpti): Use indexed_or_indirect_operand predicate to insure the address is valid for the stqcrx. instruction. * gcc/config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define __QUAD_MEMORY__ and __QUAD_MEMORY_ATOMIC__ based on what type of quad memory support is available. Index: gcc-4_8-test/gcc/config/rs6000/predicates.md === --- gcc-4_8-test.orig/gcc/config/rs6000/predicates.md +++ gcc-4_8-test/gcc/config/rs6000/predicates.md @@ -270,7 +270,7 @@ { HOST_WIDE_INT r; - if (!TARGET_QUAD_MEMORY) + if (!TARGET_QUAD_MEMORY && !TARGET_QUAD_MEMORY_ATOMIC) return 0; if (GET_CODE (op) == SUBREG) @@ -633,6 +633,7 @@ (match_test "offsettable_nonstrict_memref_p (op)"))) ;; Return 1 if the operand is suitable for load/store quad memory. +;; This predicate only checks for non-atomic loads/stores. 
(define_predicate "quad_memory_operand" (match_code "mem") { Index: gcc-4_8-test/gcc/config/rs6000/rs6000-c.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-c.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000-c.c @@ -337,6 +337,10 @@ rs6000_target_modify_macros (bool define rs6000_define_or_undefine_macro (define_p, "__HTM__"); if ((flags & OPTION_MASK_P8_VECTOR) != 0) rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__"); + if ((flags & OPTION_MASK_QUAD_MEMORY) != 0) +rs6000_define_or_undefine_macro (define_p, "__QUAD_MEMORY__"); + if ((flags & OPTION_MASK_QUAD_MEMORY_ATOMIC) != 0) +rs6000_define_or_undefine_macro (define_p, "__QUAD_MEMORY_ATOMIC__"); if ((flags & OPTION_MASK_CRYPTO) != 0) rs6000_define_or_undefine_macro (define_p, "__CRYPTO__"); Index: gcc-4_8-test/gcc/config/rs6000/rs6000-cpus.def === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-cpus.def +++ gcc-4_8-test/gcc/config/rs6000/rs6000-cpus.def @@ -53,7 +53,8 @@ | OPTION_MASK_CRYPTO \ | OPTION_MASK_DIRECT_MOVE \ | OPTION_MASK_HTM \ -| OPTION_MASK_QUAD_MEMORY) +| OPTION_MASK_QUAD_MEMORY
Re: [RFA jit v2 2/2] introduce auto_timevar
On Wed, 2014-03-19 at 11:52 -0600, Tom Tromey wrote:
> This introduces a new auto_timevar class.  It pushes a given timevar
> in its constructor, and pops it in the destructor, giving a much
> simpler way to use timevars in the typical case where they can be
> scoped.
> ---
>  gcc/ChangeLog.jit      |  4
>  gcc/jit/ChangeLog.jit  |  4
>  gcc/jit/internal-api.c | 16 +---
>  gcc/timevar.h          | 26 +-
>  4 files changed, 38 insertions(+), 12 deletions(-)

OK (and it fixes a bug in the earlier version of the patch in the dtor, which pushed rather than popped).

Are you able to push this to my branch yourself, or do you need me to do this?
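To make the scoped push/pop idea concrete, here is a minimal C++ sketch. It is not GCC's timevar.h: the vector-based stack, the simplified timevar_push/timevar_pop bodies and the TV_EXAMPLE_* ids are hypothetical stand-ins. The point it illustrates is that the destructor pops the same timevar the constructor pushed, which is exactly what the dtor fix mentioned above restores.

```cpp
#include <cassert>
#include <vector>

// Hypothetical timevar ids, for this sketch only.
enum timevar_id_t { TV_EXAMPLE_OUTER, TV_EXAMPLE_INNER };

// Simplified stand-in for GCC's timevar stack: it only records which
// timevars are currently pushed, not any timing data.
static std::vector<timevar_id_t> timevar_stack;

static void timevar_push (timevar_id_t tv)
{
  timevar_stack.push_back (tv);
}

static void timevar_pop (timevar_id_t tv)
{
  // A pop must match the most recent push.
  assert (!timevar_stack.empty () && timevar_stack.back () == tv);
  timevar_stack.pop_back ();
}

// Push in the constructor, pop in the destructor, so the timevar is
// active exactly for the lifetime of the enclosing scope.
class auto_timevar
{
public:
  explicit auto_timevar (timevar_id_t tv) : m_tv (tv) { timevar_push (m_tv); }
  ~auto_timevar () { timevar_pop (m_tv); }

  // Copying would pop the same timevar twice, so forbid it.
  auto_timevar (const auto_timevar &) = delete;
  auto_timevar &operator= (const auto_timevar &) = delete;

private:
  timevar_id_t m_tv;
};
```

Because the pop lives in the destructor, an early return out of the scope still pops the timevar, which is the main advantage over hand-written push/pop pairs.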
[4.8, PATCH 1/26 too big]
Hi, The main patch for this series was too large for the mailer to accept. Sorry about that. This piece is all powerpc-related and seems to have been delivered to David ok. If anyone else wants a copy of the patch, please contact me privately and I'll send it your way. Thanks, Bill
Re: [PATCH] [gomp4] Initial support of OpenACC loop directive in C front-end.
Hi Ilmir!

On Tue, 18 Mar 2014 16:37:24 +0400, Ilmir Usmanov wrote:
> This patch introduces support of OpenACC loop directive (and combined
> directives) in C front-end up to GENERIC. Currently no clause is allowed.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/loop-1.c
> @@ -0,0 +1,89 @@
> +/* { dg-do compile } */
> +
> +int test1()
> +{
> +  int i, j, k, b[10];
> +  int a[30];
> +  double d;
> +  float r;
> +  i = 0;
> +  #pragma acc loop
> +  for (i = 1; i < 10; i++)
> +    {
> +    }

Do you intend to support loop constructs that are not nested in a parallel or kernels construct? As I'm reading it, the specification is not clear on this. (I guess I'll raise this question with the OpenACC guys.)

Grüße,
 Thomas
[4.8, PATCH 24/26] Backport Power8 and LE support: Reload issues
Hi,

This patch (diff-reload) backports fixes for a couple of problems in PowerPC reload handling.

Thanks,
Bill


2014-03-19  Bill Schmidt

	Apply mainline r207798
	2014-02-26  Alan Modra
	PR target/58675
	PR target/57935
	* config/rs6000/rs6000.c (rs6000_secondary_reload_inner): Use
	find_replacement on parts of insn rtl that might be reloaded.

	Backport from mainline r208287
	2014-03-03  Bill Schmidt
	* config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
	reload of PLUS rtx's outside of GENERAL_REGS or BASE_REGS; relax
	constraint on constants to permit them being loaded into
	GENERAL_REGS or BASE_REGS.


Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
===
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
@@ -16380,7 +16380,7 @@ rs6000_secondary_reload_inner (rtx reg,
     rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
   rclass = REGNO_REG_CLASS (regno);
-  addr = XEXP (mem, 0);
+  addr = find_replacement (&XEXP (mem, 0));
 
   switch (rclass)
     {
@@ -16391,19 +16391,18 @@ rs6000_secondary_reload_inner (rtx reg,
       if (GET_CODE (addr) == AND)
         {
           and_op2 = XEXP (addr, 1);
-          addr = XEXP (addr, 0);
+          addr = find_replacement (&XEXP (addr, 0));
         }
 
       if (GET_CODE (addr) == PRE_MODIFY)
         {
-          scratch_or_premodify = XEXP (addr, 0);
+          scratch_or_premodify = find_replacement (&XEXP (addr, 0));
           if (!REG_P (scratch_or_premodify))
             rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
-          if (GET_CODE (XEXP (addr, 1)) != PLUS)
+          addr = find_replacement (&XEXP (addr, 1));
+          if (GET_CODE (addr) != PLUS)
             rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
-
-          addr = XEXP (addr, 1);
         }
 
       if (GET_CODE (addr) == PLUS
@@ -16411,6 +16410,8 @@ rs6000_secondary_reload_inner (rtx reg,
               || !rs6000_legitimate_offset_address_p (PTImode, addr,
                                                        false, true)))
         {
+          /* find_replacement already recurses into both operands of
+             PLUS so we don't need to call it here.  */
           addr_op1 = XEXP (addr, 0);
           addr_op2 = XEXP (addr, 1);
           if (!legitimate_indirect_address_p (addr_op1, false))
@@ -16486,7 +16487,7 @@ rs6000_secondary_reload_inner (rtx reg,
               || !VECTOR_MEM_ALTIVEC_P (mode)))
         {
           and_op2 = XEXP (addr, 1);
-          addr = XEXP (addr, 0);
+          addr = find_replacement (&XEXP (addr, 0));
         }
 
       /* If we aren't using a VSX load, save the PRE_MODIFY register and use it
@@ -16498,14 +16499,13 @@ rs6000_secondary_reload_inner (rtx reg,
               || and_op2 != NULL_RTX
               || !legitimate_indexed_address_p (XEXP (addr, 1), false)))
         {
-          scratch_or_premodify = XEXP (addr, 0);
+          scratch_or_premodify = find_replacement (&XEXP (addr, 0));
           if (!legitimate_indirect_address_p (scratch_or_premodify, false))
             rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
-          if (GET_CODE (XEXP (addr, 1)) != PLUS)
+          addr = find_replacement (&XEXP (addr, 1));
+          if (GET_CODE (addr) != PLUS)
             rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
-
-          addr = XEXP (addr, 1);
         }
 
       if (legitimate_indirect_address_p (addr, false) /* reg */
@@ -16765,8 +16765,14 @@ rs6000_preferred_reload_class (rtx x, en
       && easy_vector_constant (x, mode))
     return ALTIVEC_REGS;
 
-  if (CONSTANT_P (x) && reg_classes_intersect_p (rclass, FLOAT_REGS))
-    return NO_REGS;
+  if ((CONSTANT_P (x) || GET_CODE (x) == PLUS))
+    {
+      if (reg_class_subset_p (GENERAL_REGS, rclass))
+        return GENERAL_REGS;
+      if (reg_class_subset_p (BASE_REGS, rclass))
+        return BASE_REGS;
+      return NO_REGS;
+    }
 
   if (GET_MODE_CLASS (mode) == MODE_INT && rclass == NON_SPECIAL_REGS)
     return GENERAL_REGS;
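The new CONSTANT_P/PLUS branch in rs6000_preferred_reload_class can be modeled in isolation. The sketch below assumes register classes are plain bitsets and uses a made-up layout (sixteen GPRs, sixteen FPRs); only the selection order, GENERAL_REGS first, then BASE_REGS, else NO_REGS, follows the patch. BASE_REGS is drawn as GENERAL_REGS minus r0, since r0 in the base-register position of a PowerPC D-form access reads as zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a register class is a bitset of hard registers.
// Made-up layout: bits 0-15 are "r0..r15", bits 16-31 are "f0..f15".
typedef std::uint32_t reg_class_t;

constexpr reg_class_t NO_REGS = 0;
constexpr reg_class_t GENERAL_REGS = 0x0000FFFFu; // r0..r15
constexpr reg_class_t BASE_REGS = 0x0000FFFEu;    // r1..r15 (no r0)
constexpr reg_class_t FLOAT_REGS = 0xFFFF0000u;   // f0..f15

// True if every register in A is also in B.
constexpr bool reg_class_subset_p (reg_class_t a, reg_class_t b)
{
  return (a & ~b) == 0;
}

static_assert (reg_class_subset_p (BASE_REGS, GENERAL_REGS),
               "BASE_REGS must be a subset of GENERAL_REGS");

// Models the new branch for a constant or PLUS operand: narrow to
// GENERAL_REGS or BASE_REGS when RCLASS fully contains them, otherwise
// refuse to reload the value into RCLASS at all.
static reg_class_t preferred_class_for_const_or_plus (reg_class_t rclass)
{
  if (reg_class_subset_p (GENERAL_REGS, rclass))
    return GENERAL_REGS;
  if (reg_class_subset_p (BASE_REGS, rclass))
    return BASE_REGS;
  return NO_REGS;
}
```

A mixed class such as GENERAL_REGS | FLOAT_REGS is narrowed to GENERAL_REGS, and a pure FPR class yields NO_REGS, which is the behavior the ChangeLog describes for PLUS rtx's.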
[4.8, PATCH 22/26] Backport Power8 and LE support: -mcall-* endianness
Hi,

This patch (diff-mcall) fixes big-endian assumptions for -mcall-aixdesc and various others.

Thanks,
Bill


2014-03-19  Bill Schmidt

	Backport from mainline r207658
	2014-02-06  Ulrich Weigand
	* config/rs6000/sysv4.h (ENDIAN_SELECT): Do not attempt to enforce
	big-endian mode for -mcall-aixdesc, -mcall-freebsd, -mcall-netbsd,
	-mcall-openbsd, or -mcall-linux.
	(CC1_ENDIAN_BIG_SPEC): Remove.
	(CC1_ENDIAN_LITTLE_SPEC): Remove.
	(CC1_ENDIAN_DEFAULT_SPEC): Remove.
	(CC1_SPEC): Remove (always empty) %cc1_endian_... spec.
	(SUBTARGET_EXTRA_SPECS): Remove %cc1_endian_big, %cc1_endian_little,
	and %cc1_endian_default.
	* config/rs6000/sysv4le.h (CC1_ENDIAN_DEFAULT_SPEC): Remove.


Index: gcc-4_8-test/gcc/config/rs6000/sysv4.h
===
--- gcc-4_8-test.orig/gcc/config/rs6000/sysv4.h
+++ gcc-4_8-test/gcc/config/rs6000/sysv4.h
@@ -522,8 +522,6 @@ extern int fixuplabelno;
 #define ENDIAN_SELECT(BIG_OPT, LITTLE_OPT, DEFAULT_OPT)  \
 "%{mlittle|mlittle-endian:"LITTLE_OPT ";"  \
 "mbig|mbig-endian:" BIG_OPT";"  \
-  "mcall-aixdesc|mcall-freebsd|mcall-netbsd|"  \
-  "mcall-openbsd|mcall-linux:" BIG_OPT";"  \
   "mcall-i960-old:"LITTLE_OPT ";"  \
   ":" DEFAULT_OPT "}"
 
@@ -536,20 +534,12 @@ extern int fixuplabelno;
 %{memb|msdata=eabi: -memb}" \
 ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 
-#define CC1_ENDIAN_BIG_SPEC ""
-
-#define CC1_ENDIAN_LITTLE_SPEC ""
-
-#define CC1_ENDIAN_DEFAULT_SPEC "%(cc1_endian_big)"
-
 #ifndef CC1_SECURE_PLT_DEFAULT_SPEC
 #define CC1_SECURE_PLT_DEFAULT_SPEC ""
 #endif
 
-/* Pass -G xxx to the compiler and set correct endian mode.  */
+/* Pass -G xxx to the compiler.  */
 #define CC1_SPEC "%{G*} %(cc1_cpu)" \
-  ENDIAN_SELECT(" %(cc1_endian_big)", " %(cc1_endian_little)", \
-  " %(cc1_endian_default)") \
 "%{meabi: %{!mcall-*: -mcall-sysv }} \
 %{!meabi: %{!mno-eabi: \
 %{mrelocatable: -meabi } \
@@ -903,9 +893,6 @@ ncrtn.o%s"
   { "link_os_netbsd", LINK_OS_NETBSD_SPEC }, \
   { "link_os_openbsd", LINK_OS_OPENBSD_SPEC }, \
   { "link_os_default", LINK_OS_DEFAULT_SPEC }, \
-  { "cc1_endian_big", CC1_ENDIAN_BIG_SPEC }, \
-  { "cc1_endian_little", CC1_ENDIAN_LITTLE_SPEC }, \
-  { "cc1_endian_default", CC1_ENDIAN_DEFAULT_SPEC }, \
   { "cc1_secure_plt_default", CC1_SECURE_PLT_DEFAULT_SPEC }, \
   { "cpp_os_ads", CPP_OS_ADS_SPEC }, \
   { "cpp_os_yellowknife", CPP_OS_YELLOWKNIFE_SPEC }, \
Index: gcc-4_8-test/gcc/config/rs6000/sysv4le.h
===
--- gcc-4_8-test.orig/gcc/config/rs6000/sysv4le.h
+++ gcc-4_8-test/gcc/config/rs6000/sysv4le.h
@@ -22,9 +22,6 @@
 #undef TARGET_DEFAULT
 #define TARGET_DEFAULT MASK_LITTLE_ENDIAN
 
-#undef CC1_ENDIAN_DEFAULT_SPEC
-#define CC1_ENDIAN_DEFAULT_SPEC "%(cc1_endian_little)"
-
 #undef DEFAULT_ASM_ENDIAN
 #define DEFAULT_ASM_ENDIAN " -mlittle"
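The effect of dropping the -mcall-* clause from ENDIAN_SELECT can be illustrated with a small model of GCC's %{S:X; T:Y; :D} spec alternation, in which the clauses are tried in the order they are written and the first one whose switch was given wins. The function and enum names below are hypothetical; this is a sketch of the post-patch selection, not code from sysv4.h.

```cpp
#include <cassert>
#include <set>
#include <string>

enum endian_choice { ENDIAN_BIG, ENDIAN_LITTLE, ENDIAN_DEFAULT };

// OPTS holds the switches given on the command line, without the
// leading '-'.  Clauses are checked in spec order, as %{...; ...} does.
static endian_choice endian_select (const std::set<std::string> &opts)
{
  auto given = [&opts] (const char *sw) { return opts.count (sw) != 0; };

  if (given ("mlittle") || given ("mlittle-endian"))
    return ENDIAN_LITTLE;
  if (given ("mbig") || given ("mbig-endian"))
    return ENDIAN_BIG;
  // Before the patch, a clause here mapped -mcall-aixdesc, -mcall-freebsd,
  // -mcall-netbsd, -mcall-openbsd and -mcall-linux to BIG_OPT.
  if (given ("mcall-i960-old"))
    return ENDIAN_LITTLE;
  return ENDIAN_DEFAULT;
}
```

With the clause gone, -mcall-linux alone falls through to DEFAULT_OPT, so a little-endian default configuration such as sysv4le.h is no longer forced to big-endian.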
[4.8, PATCH 23/26] Backport Power8 and LE support: PR60137, PR60203
Hi, This patch (diff-pr60137-pr60203) backports fixes for two little-endian vector mode problems. Thanks, Bill [gcc] 2014-03-19 Bill Schmidt Backport from mainline r207699. 2014-02-11 Michael Meissner PR target/60137 * config/rs6000/rs6000.md (128-bit GPR splitter): Add a splitter for VSX/Altivec vectors that land in GPR registers. Backport from mainline r207808. 2014-02-15 Michael Meissner PR target/60203 * config/rs6000/rs6000.md (rreg): Add TFmode, TDmode constraints. (mov_internal, TFmode/TDmode): Split TFmode/TDmode moves into 64-bit and 32-bit moves. On 64-bit moves, add support for using direct move instructions on ISA 2.07. Also adjust instruction length for 64-bit. (mov_64bit, TFmode/TDmode): Likewise. (mov_32bit, TFmode/TDmode): Likewise. Backport from mainline r207868. 2014-02-18 Michael Meissner PR target/60203 * config/rs6000/rs6000.md (mov_64bit, TF/TDmode moves): Split 64-bit moves into 2 patterns. Do not allow the use of direct move for TDmode in little endian, since the decimal value has little endian bytes within a word, but the 64-bit pieces are ordered in a big endian fashion, and normal subreg's of TDmode are not allowed. (mov_64bit_dm): Likewise. (movtd_64bit_nodm): Likewise. [gcc/testsuite] 2014-03-19 Bill Schmidt Backport from mainline r207699. 2014-02-11 Michael Meissner PR target/60137 * gcc.target/powerpc/pr60137.c: New file. Backport from mainline r207808. 2014-02-15 Michael Meissner PR target/60203 * gcc.target/powerpc/pr60203.c: New testsuite. Index: gcc-4_8-test/gcc/config/rs6000/rs6000.md === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.md +++ gcc-4_8-test/gcc/config/rs6000/rs6000.md @@ -378,6 +378,8 @@ (define_mode_attr rreg [(SF "f") (DF "ws") + (TF "f") + (TD "f") (V4SF "wf") (V2DF "wd")]) @@ -8990,10 +8992,40 @@ ;; It's important to list Y->r and r->Y before r->r because otherwise ;; reload, given m->r, will try to pick r->r and reload it, which ;; doesn't make progress. 
-(define_insn_and_split "*mov_internal" + +;; We can't split little endian direct moves of TDmode, because the words are +;; not swapped like they are for TImode or TFmode. Subregs therefore are +;; problematical. Don't allow direct move for this case. + +(define_insn_and_split "*mov_64bit_dm" + [(set (match_operand:FMOVE128 0 "nonimmediate_operand" "=m,d,d,Y,r,r,r,wm") + (match_operand:FMOVE128 1 "input_operand" "d,m,d,r,YGHF,r,wm,r"))] + "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_POWERPC64 + && (mode != TDmode || WORDS_BIG_ENDIAN) + && (gpc_reg_operand (operands[0], mode) + || gpc_reg_operand (operands[1], mode))" + "#" + "&& reload_completed" + [(pc)] +{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; } + [(set_attr "length" "8,8,8,12,12,8,8,8")]) + +(define_insn_and_split "*movtd_64bit_nodm" + [(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,Y,r,r") + (match_operand:TD 1 "input_operand" "d,m,d,r,YGHF,r"))] + "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_POWERPC64 && !WORDS_BIG_ENDIAN + && (gpc_reg_operand (operands[0], TDmode) + || gpc_reg_operand (operands[1], TDmode))" + "#" + "&& reload_completed" + [(pc)] +{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; } + [(set_attr "length" "8,8,8,12,12,8")]) + +(define_insn_and_split "*mov_32bit" [(set (match_operand:FMOVE128 0 "nonimmediate_operand" "=m,d,d,Y,r,r") (match_operand:FMOVE128 1 "input_operand" "d,m,d,r,YGHF,r"))] - "TARGET_HARD_FLOAT && TARGET_FPRS + "TARGET_HARD_FLOAT && TARGET_FPRS && !TARGET_POWERPC64 && (gpc_reg_operand (operands[0], mode) || gpc_reg_operand (operands[1], mode))" "#" @@ -9429,6 +9461,15 @@ [(set_attr "length" "12") (set_attr "type" "three")]) +(define_split + [(set (match_operand:FMOVE128_GPR 0 "nonimmediate_operand" "") + (match_operand:FMOVE128_GPR 1 "input_operand" ""))] + "reload_completed + && (int_reg_operand (operands[0], mode) + || int_reg_operand (operands[1], mode))" + [(pc)] +{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }) 
+ ;; Move SFmode to a VSX from a GPR register. Because scalar floating point ;; type is stored internally as double precision in the VSX registers, we have ;; to convert it from the vector format. Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr60137.c === --- /dev/null +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr60137.c @@ -0,0 +1,17 @@ +/* { dg-do compile { tar
Re: [Fortran][PATCH][gomp4]: Transform OpenACC loop directive
Hi Ilmir,

Ilmir Usmanov wrote:
> This patch implements transformation of OpenACC loop directive from
> Fortran AST to GENERIC.

If I followed correctly, with this patch the Fortran FE implementation of OpenACC is complete, except for:
* !$acc cache() - parsing is supported, but compilation then aborts with a not-implemented error
* OpenACC 2.0a additions.
Am I right?

> Successfully bootstrapped and tested with no new regressions on
> x86_64-unknown-linux-gnu. OK for gomp4 branch?

I leave the review of the gcc/tree-pretty-print.c part (looks good to me) to Thomas, who might also have a comment on the Fortran part.

For a DO loop, the code looks okay. For DO CONCURRENT, it is not.

I think we should really consider rejecting DO CONCURRENT with a "not permitted" error. It is currently not explicitly supported by OpenACC, and we can still worry about it when it is explicitly added to OpenACC. Otherwise, see gfc_trans_do_concurrent for how to handle DO CONCURRENT loops.

Issues with DO CONCURRENT:

* You use "code->ext.iterator->var". That's fine with DO but not with DO CONCURRENT, which uses "code->ext.forall_iterator".

* DO CONCURRENT also handles multiple index variables in a single statement, such as:

    integer :: i, j, b(3,5)
    DO CONCURRENT(i=1:3, j=1:5:2)
      b(i, j) = -42
    END DO
    end

* DO CONCURRENT also supports masks:

    logical :: my_mask(3)
    integer :: i, b(3)
    b = [5, 5, 2]
    my_mask = [.true., .false., .true.]
    do concurrent (i=1:3, b(i) == 5 .and. my_mask(i))
      b(i) = -42
    end do
    end

Tobias
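For readers less familiar with DO CONCURRENT, the C++ sketch below spells out what the two example loops compute. It is a hypothetical illustration of the semantics, not gfortran's lowering (gfc_trans_do_concurrent does the real work). The mask is evaluated for every index value before any body runs, and the ascending iteration order shown is just one permitted order, since DO CONCURRENT leaves the order unspecified.

```cpp
#include <array>
#include <cassert>

// do concurrent (i=1:3, b(i) == 5 .and. my_mask(i))
//   b(i) = -42
// end do
// Fortran index i maps to C++ index i - 1.
static std::array<int, 3> masked_do_concurrent (std::array<int, 3> b,
                                                const std::array<bool, 3> &my_mask)
{
  // First evaluate the mask for every index value...
  std::array<bool, 3> active{};
  for (int i = 1; i <= 3; ++i)
    active[i - 1] = (b[i - 1] == 5) && my_mask[i - 1];

  // ...then run the body for the active iterations only.
  for (int i = 1; i <= 3; ++i)
    if (active[i - 1])
      b[i - 1] = -42;
  return b;
}

// DO CONCURRENT(i=1:3, j=1:5:2)  b(i, j) = -42
// Both index variables range together: 3 values of i times 3 values of
// j (1, 3, 5).  Returns how many elements of b were assigned.
static int multi_index_do_concurrent_count ()
{
  std::array<int, 3 * 5> b{};  // column-major storage, as in Fortran
  for (int j = 1; j <= 5; j += 2)
    for (int i = 1; i <= 3; ++i)
      b[(i - 1) + 3 * (j - 1)] = -42;

  int assigned = 0;
  for (int v : b)
    if (v == -42)
      ++assigned;
  return assigned;
}
```

A translation keyed on a single iterator variable can express neither the second index nor the mask, which is the gap described in the review.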
Re: [Patch, Fortran] PRs 60283/60543: Fix two wrong-code bugs related for implicit pure
Early *ping* - I think this wrong-code GCC 4.7/4.8/4.9 issue is pretty severe.

Tobias Burnus wrote:
> This patch fixes two issues, where gfortran claims that a function is
> implicit pure, but it is not. That will cause a wrong-code optimization in
> the middle end.
>
> First problem, cf. PR60543, is that implicit pure was not set to 0 for
> calls to impure intrinsic subroutines. (BTW: There are no impure intrinsic
> functions.) Example:
>
>   module m
>   contains
>     REAL(8) FUNCTION random()
>       CALL RANDOM_NUMBER(random)
>     END FUNCTION random
>   end module m
>
>
> The second problem pops up if one adds a BLOCK ... END BLOCK around the
> random_number call after applying the patch of the PR, which just does:
>   gfc_current_ns->proc_name->attr.implicit_pure = 0.
>
> The problem is that one sets only the implicit_pure of the block to 0 and
> not of the function. That's the reason that the patch became much longer and
> that I added gfc_unset_implicit_pure as new function.
>
> Thus, the suspicion I had when reviewing the OpenACC patches turned out to
> be founded. Cf. PR60283.
>
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk and for the 4.7 and 4.8 branches?
>
> Note: I failed to create a test case.
>
> Tobias
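The BLOCK problem is essentially about which scope owns the flag. The toy model below uses hypothetical names (scope, unset_implicit_pure_current_only, unset_implicit_pure) that are neither gfortran's data structures nor the actual gfc_unset_implicit_pure from the patch. It only shows why clearing the innermost namespace leaves the enclosing function wrongly marked implicit pure, and how walking outward to the enclosing procedure avoids that.

```cpp
#include <cassert>

// Toy scope model: a FUNCTION scope that may contain BLOCK scopes.
// These fields are hypothetical, not gfortran's gfc_namespace layout.
struct scope
{
  scope *parent;      // enclosing scope, or nullptr at the top
  bool is_procedure;  // true for a FUNCTION/SUBROUTINE, false for a BLOCK
  bool implicit_pure;
};

// The buggy pattern: clear the flag on the current scope only.  Inside
// BLOCK ... END BLOCK this misses the enclosing function.
static void unset_implicit_pure_current_only (scope *ns)
{
  ns->implicit_pure = false;
}

// The idea behind the fix: walk outward from the current scope and clear
// the flag up to and including the enclosing procedure.
static void unset_implicit_pure (scope *ns)
{
  for (scope *s = ns; s != nullptr; s = s->parent)
    {
      s->implicit_pure = false;
      if (s->is_procedure)
        break;
    }
}
```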
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On Wed, Mar 19, 2014 at 02:23:58PM -0500, Bill Schmidt wrote: > Support for Power8 features and the new powerpc64le-linux-gnu target, > including the ELFv2 ABI, has been developed up till now on the > ibm/gcc-4_8-branch. It was appropriate to use this separate branch > while the support was unstable, but this branch will not represent a > particularly good support mechanism for distributions going forward. > Most distros are set up to pull from the major release branches, and > having a separate branch for one target is quite inconvenient. Also, > the ibm/gcc-4_8-branch's original purpose is to serve as the code base > for IBM's Advance Toolchain 7.0. Over time the two purposes that the > branch currently serves will diverge and make things even more > complicated. > > The code is now tested and stable enough that we are ready to backport > this support to the FSF 4.8 branch. This patch series constitutes that > backport. I guess the most important question is what guarantees there are that it won't affect non-powerpc* ports too much (my main concern is the 9/26 patch, plus the C++ FE / libstdc++ changes), and how much does this affect code generation and overall stability of the PowerPC big endian existing targets. Jakub
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On Wed, Mar 19, 2014 at 4:05 PM, Jakub Jelinek wrote:
> On Wed, Mar 19, 2014 at 02:23:58PM -0500, Bill Schmidt wrote:
>> Support for Power8 features and the new powerpc64le-linux-gnu target,
>> including the ELFv2 ABI, has been developed up till now on the
>> ibm/gcc-4_8-branch. It was appropriate to use this separate branch
>> while the support was unstable, but this branch will not represent a
>> particularly good support mechanism for distributions going forward.
>> Most distros are set up to pull from the major release branches, and
>> having a separate branch for one target is quite inconvenient. Also,
>> the ibm/gcc-4_8-branch's original purpose is to serve as the code base
>> for IBM's Advance Toolchain 7.0. Over time the two purposes that the
>> branch currently serves will diverge and make things even more
>> complicated.
>>
>> The code is now tested and stable enough that we are ready to backport
>> this support to the FSF 4.8 branch. This patch series constitutes that
>> backport.
>
> I guess the most important question is what guarantees there are that it
> won't affect non-powerpc* ports too much (my main concern is the 9/26 patch,
> plus the C++ FE / libstdc++ changes), and how much does this affect
> code generation and overall stability of the PowerPC big endian existing
> targets.

Before this patch is approved, we are going to thoroughly confirm that it does not harm any other PowerPC targets (big-endian PowerLinux, eABI, or AIX). Any help with testing from the PPC eABI community is appreciated.

- David
Re: [Patch, Fortran] PRs 60283/60543: Fix two wrong-code bugs related for implicit pure
Dear Tobias,

The patch looks OK to me. If nothing else, it offers a rationalisation of all the lines of code that unset the attribute!

I am somewhat puzzled by "Note: I failed to create a test case", whereas I find one at the end of the patch. Can you explain what you mean?

Cheers

Paul

On 19 March 2014 21:21, Tobias Burnus wrote:
> Early *ping* - I think this wrong-code GCC 4.7/4.8/4.9 issue is pretty
> severe.
>
>
> Tobias Burnus wrote:
>>
>> This patch fixes two issues, where gfortran claims that a function is
>> implicit pure, but it is not. That will cause a wrong-code optimization in
>> the middle end.
>>
>> First problem, cf. PR60543, is that implicit pure was not set to 0 for
>> calls to impure intrinsic subroutines. (BTW: There are no impure intrinsic
>> functions.) Example:
>>
>>   module m
>>   contains
>>     REAL(8) FUNCTION random()
>>       CALL RANDOM_NUMBER(random)
>>     END FUNCTION random
>>   end module m
>>
>>
>> The second problem pops up if one adds a BLOCK ... END BLOCK around the
>> random_number call after applying the patch of the PR, which just does:
>>   gfc_current_ns->proc_name->attr.implicit_pure = 0.
>>
>> The problem is that one sets only the implicit_pure of the block to 0 and
>> not of the function. That's the reason that the patch became much longer and
>> that I added gfc_unset_implicit_pure as new function.
>>
>> Thus, the suspicion I had when reviewing the OpenACC patches turned out to
>> be founded. Cf. PR60283.
>>
>> Build and regtested on x86-64-gnu-linux.
>> OK for the trunk and for the 4.7 and 4.8 branches?
>>
>> Note: I failed to create a test case.
>>
>> Tobias
>
>

-- 
The knack of flying is learning how to throw yourself at the ground and miss.
--Hitchhikers Guide to the Galaxy
[C++ PATCH] Fix ICE in build_zero_init_1 (PR c++/60572)
Hi!

On the following testcase, starting with r199779 we have a FIELD_DECL with error_mark_node type, on which we ICE. Fixed by ignoring such FIELD_DECLs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-03-19  Jakub Jelinek

	PR c++/60572
	* init.c (build_zero_init_1): Ignore fields with error_mark_node
	type.

	* g++.dg/init/pr60572.C: New test.

--- gcc/cp/init.c.jj  2014-03-10 10:50:14.0 +0100
+++ gcc/cp/init.c  2014-03-19 07:43:54.077795662 +0100
@@ -192,6 +192,9 @@ build_zero_init_1 (tree type, tree nelts
           if (TREE_CODE (field) != FIELD_DECL)
             continue;
 
+          if (TREE_TYPE (field) == error_mark_node)
+            continue;
+
           /* Don't add virtual bases for base classes if they are beyond
              the size of the current field, that means it is present
              somewhere else in the object.  */
--- gcc/testsuite/g++.dg/init/pr60572.C.jj  2014-03-19 07:46:33.607894844 +0100
+++ gcc/testsuite/g++.dg/init/pr60572.C  2014-03-19 07:46:49.752804722 +0100
@@ -0,0 +1,13 @@
+// PR c++/60572
+// { dg-do compile }
+
+struct A
+{
+  A x;  // { dg-error "incomplete type" }
+  virtual ~A () {}
+};
+
+struct B : A
+{
+  B () : A () {}
+};

	Jakub
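The shape of the fix can be shown with a toy field list. In this sketch, toy_type, toy_field and the error_mark object are hypothetical stand-ins for GCC trees and error_mark_node; the loop mirrors build_zero_init_1 only in that it skips non-FIELD_DECLs and, with the patch, fields whose type is the error marker, instead of trying to build a zero initializer for an erroneous member such as "A x;" above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for GCC trees.
struct toy_type
{
  bool is_error;
};

// Plays the role of error_mark_node: one shared object meaning "this type
// could not be computed because of an earlier error".
static const toy_type error_mark = { true };

struct toy_field
{
  const char *name;
  const toy_type *type;
  bool is_field_decl;  // false for e.g. member functions or TYPE_DECLs
};

// Collect the fields a zero initializer would be built for: skip
// non-FIELD_DECLs and, as in the fix, fields of erroneous type.
static std::vector<const char *>
fields_to_zero_init (const std::vector<toy_field> &fields)
{
  std::vector<const char *> out;
  for (const toy_field &f : fields)
    {
      if (!f.is_field_decl)
        continue;
      if (f.type == &error_mark)
        continue;
      out.push_back (f.name);
    }
  return out;
}
```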
Re: [RFA jit v2 1/2] introduce class toplev
David> OK. Are you able to push this to my branch, or do you need me to do David> this? Thanks, I was able to push them. Tom
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On Wed, 2014-03-19 at 21:05 +0100, Jakub Jelinek wrote: > I guess the most important question is what guarantees there are that it > won't affect non-powerpc* ports too much (my main concern is the 9/26 patch, > plus the C++ FE / libstdc++ changes), and how much does this affect > code generation and overall stability of the PowerPC big endian existing > targets. > > Jakub > The three pieces that are somewhat controversial for non-powerpc targets are 9/26, 10/26, 15/26. * Uli and Alan, can you speak to any concerns for 9/26? * 10/26 hits libstdc++, but only in a minor way for the extract_symvers script; it adds a sed to ignore a string added for powerpc64le, so shouldn't be a problem. * 15/26 might be one we can do without. I need to check with Peter Bergner, who originally backported Fabien's patch, but unfortunately he is on vacation. That patch fixed a problem that originated on an x86 platform. I can try respinning the patch series without this one and see what breaks, or if Peter happens to see this while he's on vacation, perhaps he can comment. For PowerPC targets, I have already checked out powerpc64-linux (big endian). As David mentioned, I need to apply the patch series on an AIX machine and test it before this can be accepted. We don't have any way of testing the eabi stuff, so community help would be very much appreciated there. Thanks, Bill
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On Wed, 2014-03-19 at 16:03 -0500, Bill Schmidt wrote:
> On Wed, 2014-03-19 at 21:05 +0100, Jakub Jelinek wrote:
>
> > I guess the most important question is what guarantees there are that it
> > won't affect non-powerpc* ports too much (my main concern is the 9/26 patch,
> > plus the C++ FE / libstdc++ changes), and how much does this affect
> > code generation and overall stability of the PowerPC big endian existing
> > targets.
> >
> >       Jakub
> >
>
> The three pieces that are somewhat controversial for non-powerpc targets
> are 9/26, 10/26, 15/26.

I forgot to mention that these bits have all been upstream in trunk
since last autumn, so there's been quite a bit of burn-in at that level.
Obviously that is not the same as being burned in on 4.8, but it does
help provide a bit of confidence.

Bill

>
> * Uli and Alan, can you speak to any concerns for 9/26?
>
> * 10/26 hits libstdc++, but only in a minor way for the extract_symvers
> script; it adds a sed to ignore a string added for powerpc64le, so
> shouldn't be a problem.
>
> * 15/26 might be one we can do without.  I need to check with Peter
> Bergner, who originally backported Fabien's patch, but unfortunately he
> is on vacation.  That patch fixed a problem that originated on an x86
> platform.  I can try respinning the patch series without this one and
> see what breaks, or if Peter happens to see this while he's on vacation,
> perhaps he can comment.
>
> For PowerPC targets, I have already checked out powerpc64-linux (big
> endian).  As David mentioned, I need to apply the patch series on an AIX
> machine and test it before this can be accepted.  We don't have any way
> of testing the eabi stuff, so community help would be very much
> appreciated there.
>
> Thanks,
> Bill
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On 03/19/14 15:03, Bill Schmidt wrote:
> On Wed, 2014-03-19 at 21:05 +0100, Jakub Jelinek wrote:
>> I guess the most important question is what guarantees there are that it
>> won't affect non-powerpc* ports too much (my main concern is the 9/26 patch,
>> plus the C++ FE / libstdc++ changes), and how much does this affect
>> code generation and overall stability of the PowerPC big endian existing
>> targets.
>>
>>       Jakub
>>
>
> The three pieces that are somewhat controversial for non-powerpc targets
> are 9/26, 10/26, 15/26.
>
> * Uli and Alan, can you speak to any concerns for 9/26?

I've got no concerns about 9/26.  Uli, Alan and myself worked through
this pretty thoroughly.  I've had those in the back of my mind as
something we're going to want to make sure to pull in.

Jeff
PR libstdc++/60587
I'm debugging http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60587 and have found a number of problems.

Firstly, the bug report is correct, this overload dereferences the __other argument without checking if that is OK:

  template<typename _Iterator, typename _Sequence, typename _InputIterator>
    inline bool
    __foreign_iterator_aux3(const _Safe_iterator<_Iterator, _Sequence>& __it,
                            _InputIterator __other,
                            std::true_type)

Secondly, in this testcase we should never even have reached that overload, because we should have gone to this overload of _aux2:

  template<typename _Iterator, typename _Sequence, typename _OtherIterator>
    inline bool
    __foreign_iterator_aux2(const _Safe_iterator<_Iterator, _Sequence>& __it,
                            const _Safe_iterator<_OtherIterator, _Sequence>& __other,
                            std::input_iterator_tag)
    { return __it._M_get_sequence() != __other._M_get_sequence(); }

However that is not chosen by overload resolution because this is a better match when __other is non-const:

  template<typename _Iterator, typename _Sequence, typename _InputIterator>
    inline bool
    __foreign_iterator_aux2(const _Safe_iterator<_Iterator, _Sequence>& __it,
                            _InputIterator __other,
                            std::random_access_iterator_tag)

Fixing the overload resolution bug makes the testcase in the PR pass, but the underlying problem of dereferencing an invalid iterator still exists and can be shown by changing the testcase slightly:

#define _GLIBCXX_DEBUG
#include <vector>

int main()
{
  std::vector<int> a;
  std::vector<int> b;
  a.push_back(1);
  a.insert(a.end(), b.begin(), b.end());
}

That still dereferences b.begin(), but that too can be fixed (either as suggested in the PR or by passing the begin and end iterators into the __foreign_iter function) but I think there's still another problem.

I'm looking again at the code that attempts to check if we have contiguous storage:

          if (std::addressof(*(__it._M_get_sequence()->_M_base().end() - 1))
              - std::addressof(*(__it._M_get_sequence()->_M_base().begin()))
              == __it._M_get_sequence()->size() - 1)

Are we really sure that ensures contiguous iterators?
What if we have a deque with three blocks laid out in memory like this:

  1XXX3XXx2XXX
  ^      ^
begin()  end()

1 is the start of the first block, 2 is the start of the second block and 3 is the start of the third block.
X is an element,
x is reserved but uninitialized capacity
. is unallocated memory (or memory not used by the deque)

Here we have end() - begin() == size() but non-contiguous memory. If the __other iterator happens to point to the unallocated memory between 1 and 3 then it will appear to be part of the deque, but isn't.

I think the safe thing to do is (as I suggested at the time) to have a trait saying which iterator types refer to contiguous memory. Our debug mode only supports our own containers, so the ones which are contiguous are known. For 4.9.0 I think the right option is simply to remove __foreign_iterator_aux3 and __foreign_iterator_aux4 completely. The fixed version of __foreign_iterator_aux2() can detect when we have iterators referring to the same sequence, which is what we really want to detect. That's what the attached patch does and what I'm going to test.

--- debug/functions.h.orig      2014-03-19 21:34:43.038647394 +
+++ debug/functions.h   2014-03-19 21:35:53.502617461 +
@@ -175,62 +175,6 @@
       return __first;
     }
 
-#if __cplusplus >= 201103L
-  // Default implementation.
-  template<typename _Iterator, typename _Sequence>
-    inline bool
-    __foreign_iterator_aux4(const _Safe_iterator<_Iterator, _Sequence>& __it,
-                            typename _Sequence::const_pointer __begin,
-                            typename _Sequence::const_pointer __other)
-    {
-      typedef typename _Sequence::const_pointer _PointerType;
-      constexpr std::less<_PointerType> __l{};
-
-      return (__l(__other, __begin)
-              || __l(std::addressof(*(__it._M_get_sequence()->_M_base().end()
-                                      - 1)), __other));
-    }
-
-  // Fallback when address type cannot be implicitely casted to sequence
-  // const_pointer.
-  template<typename _Iterator, typename _Sequence, typename _InputIterator>
-    inline bool
-    __foreign_iterator_aux4(const _Safe_iterator<_Iterator, _Sequence>&,
-                            _InputIterator, ...)
-    { return true; }
-
-  template<typename _Iterator, typename _Sequence, typename _InputIterator>
-    inline bool
-    __foreign_iterator_aux3(const _Safe_iterator<_Iterator, _Sequence>& __it,
-                            _InputIterator __other,
-                            std::true_type)
-    {
-      // Only containers with all elements in contiguous memory can have their
-      // elements passed through pointers.
-      // Arithmetics is here just to make sure we are not dereferencing
-      // past-the-end iterator.
-      if (__it._M_get_sequence()->_M_base().begin()
-          != __it._M_get_sequence()->_M_base().end())
-        if (std::addressof(*(__it._M_get_sequence()->_M_base().end() - 1))
-            - std::addressof(*(__it._M_get_sequence()->_M_base().begin()))
-            == __it._M_get_sequence()->size() - 1)
-          return (__foreign_iterator_aux4
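The address-arithmetic check really can be fooled in the way the deque example describes. A minimal sketch, assuming blocks of four slots and a four-slot gap between block 1 and block 3 (the gap size is an assumption, chosen so the old heuristic happens to hold). Every block lives inside one array so the pointer comparisons are well defined. SegmentedModel, looks_contiguous, outside_range and is_element are hypothetical names, not libstdc++ code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model of a segmented container:
//   block 1 = mem[0..3], gap = mem[4..7] (not owned by the container),
//   block 3 = mem[8..10] plus one spare slot mem[11], block 2 = mem[12..15].
struct SegmentedModel
{
  int mem[16] = {};
  std::vector<int*> elems;  // element addresses in logical order

  SegmentedModel()
  {
    for (int i = 0; i < 4; ++i) elems.push_back(&mem[i]);    // block 1
    for (int i = 12; i < 16; ++i) elems.push_back(&mem[i]);  // block 2
    for (int i = 8; i < 11; ++i) elems.push_back(&mem[i]);   // block 3
  }
  // elems points into mem, so a copy would hold dangling pointers.
  SegmentedModel(const SegmentedModel&) = delete;
  SegmentedModel& operator=(const SegmentedModel&) = delete;
};

// The removed heuristic: assume contiguous storage when
// &last - &first == size() - 1.
bool looks_contiguous(const SegmentedModel& s)
{
  return s.elems.back() - s.elems.front()
         == static_cast<std::ptrdiff_t>(s.elems.size()) - 1;
}

// Range test in the style of the removed __foreign_iterator_aux4: an
// address counts as foreign only if it lies before the first element
// or after the last one.
bool outside_range(const SegmentedModel& s, const int* p)
{
  std::less<const int*> lt;
  return lt(p, s.elems.front()) || lt(s.elems.back(), p);
}

// Ground truth: does p address one of the container's elements?
bool is_element(const SegmentedModel& s, const int* p)
{
  for (const int* e : s.elems)
    if (e == p)
      return true;
  return false;
}
```

With this layout the first element is mem[0] and the last is mem[10], so the difference is 10, which equals size() - 1 for 11 elements, and the heuristic reports contiguous storage. An address in the gap such as &mem[5] then passes the range check as if it belonged to the container, although it is not an element.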
Re: [Patch, Fortran] PRs 60283/60543: Fix two wrong-code bugs related to implicit pure
Paul Richard Thomas wrote:
> The patch looks OK to me. If nothing else, it offers a rationalisation of all the lines of code that unset the attribute!
>
> I am somewhat puzzled by "Note: I failed to create a test case", whereas I find one at the end of the patch. Can you explain what you mean?

What I meant was that I failed to create a run-time test case which fails without the patch. However, after I wrote that, I saw that there is a dg-* directive that permits checking the .mod file for a string. That's why I could include a test case.

Committed to the trunk as Rev. 208687.

While looking at the patch again for backporting, I saw that I had missed the following parts. I will commit them tomorrow as obvious, unless someone protests.

Tobias

2014-03-19  Tobias Burnus

        PR fortran/60543
        * io.c (check_io_constraints): Use gfc_unset_implicit_pure.
        * resolve.c (resolve_ordinary_assign): Ditto.

Index: gcc/fortran/io.c
===
--- gcc/fortran/io.c    (Revision 208687)
+++ gcc/fortran/io.c    (Arbeitskopie)
@@ -3259,9 +3259,8 @@ if (condition) \
                      "an internal file in a PURE procedure",
                      io_kind_name (k));
 
-      if (gfc_implicit_pure (NULL) && (k == M_READ || k == M_WRITE))
-        gfc_current_ns->proc_name->attr.implicit_pure = 0;
-
+      if (k == M_READ || k == M_WRITE)
+        gfc_unset_implicit_pure (NULL);
     }
 
   if (k != M_READ)
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c       (Revision 208687)
+++ gcc/fortran/resolve.c       (Arbeitskopie)
@@ -9165,7 +9165,7 @@ resolve_ordinary_assign (gfc_code *code, gfc_names
   if (lhs->expr_type == EXPR_VARIABLE
       && lhs->symtree->n.sym != gfc_current_ns->proc_name
       && lhs->symtree->n.sym->ns != gfc_current_ns)
-    gfc_current_ns->proc_name->attr.implicit_pure = 0;
+    gfc_unset_implicit_pure (NULL);
 
   if (lhs->ts.type == BT_DERIVED
       && lhs->expr_type == EXPR_VARIABLE
@@ -9173,11 +9173,11 @@ resolve_ordinary_assign (gfc_code *code, gfc_names
       && rhs->expr_type == EXPR_VARIABLE
       && (gfc_impure_variable (rhs->symtree->n.sym)
           || gfc_is_coindexed (rhs)))
-    gfc_current_ns->proc_name->attr.implicit_pure = 0;
+    gfc_unset_implicit_pure (NULL);
 
   /* Fortran 2008, C1283.  */
   if (gfc_is_coindexed (lhs))
-    gfc_current_ns->proc_name->attr.implicit_pure = 0;
+    gfc_unset_implicit_pure (NULL);
 }
 
 /* F2008, 7.2.1.2.  */
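The follow-up replaces each direct assignment to gfc_current_ns->proc_name->attr.implicit_pure with a call to gfc_unset_implicit_pure (NULL). A minimal sketch of why routing every such update through one helper is attractive, under an assumption made for illustration only: the current namespace may belong to something other than a procedure (say, a BLOCK construct), so the helper searches outwards for the enclosing procedure. Symbol, Namespace and unset_implicit_pure are hypothetical; this is not gfortran's actual implementation.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for gfortran's symbol and namespace
// structures (the real gfc_symbol / gfc_namespace look different).
struct Symbol
{
  bool is_procedure;
  bool implicit_pure;
};

struct Namespace
{
  Symbol *proc_name;   // may be null, or a non-procedure such as a BLOCK
  Namespace *parent;   // enclosing scope, or nullptr at the top level
};

// Clear implicit_pure on the nearest enclosing procedure, searching
// outwards from NS.  If no procedure is found this is a no-op, so callers
// need no separate "is the current unit implicit pure?" test first.
void unset_implicit_pure (Namespace *ns)
{
  for (; ns != nullptr; ns = ns->parent)
    if (ns->proc_name != nullptr && ns->proc_name->is_procedure)
      {
        ns->proc_name->implicit_pure = false;
        return;
      }
}
```

In this model, clearing the flag from inside the nested scope updates the enclosing procedure and leaves the non-procedure symbol untouched, whereas a direct assignment through the innermost namespace's proc_name would have modified the wrong symbol.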
Re: PR libstdc++/60587
On 19/03/14 21:39 +, Jonathan Wakely wrote:
> I think the safe thing to do is (as I suggested at the time) to have a
> trait saying which iterator types refer to contiguous memory. Our
> debug mode only supports our own containers, so the ones which are
> contiguous are known. For 4.9.0 I think the right option is simply
> to remove __foreign_iterator_aux3 and __foreign_iterator_aux4
> completely. The fixed version of __foreign_iterator_aux2() can detect
> when we have iterators referring to the same sequence, which is what
> we really want to detect. That's what the attached patch does and what
> I'm going to test.

With my suggested change we get an XPASS for
testsuite/23_containers/vector/debug/57779_neg.cc

An __is_contiguous trait would solve that.
Re: PR libstdc++/60587
Hi

> On 19/mar/2014, at 23:28, Jonathan Wakely wrote:
>
>> On 19/03/14 21:39 +, Jonathan Wakely wrote:
>> I think the safe thing to do is (as I suggested at the time) to have a
>> trait saying which iterator types refer to contiguous memory. Our
>> debug mode only supports our own containers, so the ones which are
>> contiguous are known. For 4.9.0 I think the right option is simply
>> to remove __foreign_iterator_aux3 and __foreign_iterator_aux4
>> completely. The fixed version of __foreign_iterator_aux2() can detect
>> when we have iterators referring to the same sequence, which is what
>> we really want to detect. That's what the attached patch does and what
>> I'm going to test.
>
> With my suggested change we get an XPASS for
> testsuite/23_containers/vector/debug/57779_neg.cc
>
> An __is_contiguous trait would solve that.

Funny, I thought we already had it...

Paolo
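The contiguity trait discussed in this thread could look roughly like the sketch below. It is hypothetical: is_contiguous_sequence is an invented name, not an existing libstdc++ trait, and it is keyed on the standard containers for brevity, whereas debug mode would specialize it for its own container wrappers. A check like the old __foreign_iterator_aux3 could then dispatch on the trait instead of inferring contiguity from element addresses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical trait: true only for sequences whose elements occupy one
// contiguous array of addressable objects.  Everything defaults to false,
// so std::deque and std::list need no specialization.
template<typename Sequence>
struct is_contiguous_sequence : std::false_type { };

template<typename T, typename Alloc>
struct is_contiguous_sequence<std::vector<T, Alloc>> : std::true_type { };

// std::vector<bool> stores packed bits, not bool objects, so exclude it.
// This partial specialization is more specialized than the one above.
template<typename Alloc>
struct is_contiguous_sequence<std::vector<bool, Alloc>> : std::false_type { };

template<typename T, std::size_t N>
struct is_contiguous_sequence<std::array<T, N>> : std::true_type { };

// basic_string storage is required to be contiguous since C++11.
template<typename CharT, typename Traits, typename Alloc>
struct is_contiguous_sequence<std::basic_string<CharT, Traits, Alloc>>
  : std::true_type { };
```

Because the answer comes from the container type rather than from element addresses, a deque whose blocks happen to satisfy the old arithmetic is still reported as non-contiguous.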
[patch committed SH] Fix target/60039
I've committed the attached patch to fix PR target/60039
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60039
which is a regression from 4.5 for some sh3 users.
Tested on sh4-unknown-linux-gnu with -mdiv=call-div1.
I'd like to backport it to 4.8 in a week or two as usual.

Regards,
        kaz
--
2014-03-19  Kaz Kojima

        PR target/60039
        * config/sh/sh.md (udivsi3_i1): Clobber R1 register.

--- ORIG/trunk/gcc/config/sh/sh.md      2014-03-02 09:49:58.0 +0900
+++ trunk/gcc/config/sh/sh.md   2014-03-18 14:43:26.515319735 +0900
@@ -2314,6 +2314,7 @@
         (udiv:SI (reg:SI R4_REG) (reg:SI R5_REG)))
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
    (clobber (reg:SI R4_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && TARGET_DIVIDE_CALL_DIV1"