Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 11/06/2015 08:16 PM, Kyrill Tkachov wrote:
On 06/11/15 17:09, Kyrill Tkachov wrote:
On 06/11/15 17:07, Nikolai Bozhenov wrote:
On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote:

Hi! I faced the same issue but I had somewhat different RTL for the consumer:

(insn 20 15 21 2 (set (reg/i:SI 0 r0)
        (minus:SI (subreg:SI (reg:DI 117) 4)
            (mult:SI (reg:SI 123)
                (reg:SI 114)))) gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one cycle in this case?

If the accumulator can be forwarded (i.e. a SImode register), there isn't a reason why a subreg:SI (reg:DI) will not get forwarded. The subreg:SI is an artifact before register allocation, thus it's a representation issue that the patch is fixing here unless I misunderstand your question.

I mean, in my example it is not the multiplication result that is forwarded but its upper part. So, shouldn't we check that offset in a subreg expression is zero? Or is it ok to forward only the upper part of a multiplication?

Could you please post the full RTL instruction we're talking about here as it appears in the scheduler dump? So that we're all on the same page about which case we're talking about.

Sorry, missed the above instruction. This subreg is just a pre-register allocation representation of the instruction and will go away after reload. This particular function only really has a real effect in post-reload scheduling as it's only there when the final register numbers are known.

I see. aarch_accumulator_forwarding always returns 0 for virtual registers. But isn't it overly pessimistic to assume that accumulator forwarding is never possible at sched1? I wonder if it would be better to be more optimistic about register allocation outcome. I mean, in case of virtual registers we could assume forwarding from A to B if B is the only consumer of A's result.
Something like this:

  if (REGNO (dest) >= FIRST_VIRTUAL_REGISTER
      || REGNO (accumulator) >= FIRST_VIRTUAL_REGISTER)
    return (DF_REG_USE_COUNT (REGNO (dest)) == 1)
           && (DF_REF_INSN (DF_REG_USE_CHAIN (REGNO (dest))) == consumer);
  else
    return REGNO (dest) == REGNO (accumulator);

Thanks,
Nikolai
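The idea behind the proposed condition can be illustrated with a small standalone model. This is not GCC code; `struct reg_uses` and `may_forward_before_ra` are made-up names standing in for the df information (`DF_REG_USE_COUNT` / `DF_REG_USE_CHAIN`) that the real check would consult. Before register allocation the exact hard registers are unknown, so the heuristic predicts forwarding whenever the producer's result has exactly one use and that use is the consumer being queried.

```c
#include <assert.h>

/* Hypothetical stand-in for the dataflow info GCC tracks per register;
   not a real GCC structure.  */
struct reg_uses
{
  int use_count;      /* number of uses of the produced value */
  int sole_use_insn;  /* uid of the only consumer, valid if use_count == 1 */
};

/* Model of the proposed optimistic pre-RA check: with virtual registers
   we cannot compare hard register numbers, so assume the accumulator
   will be forwarded iff the consumer is the value's only use.  */
static int
may_forward_before_ra (const struct reg_uses *dest, int consumer_uid)
{
  return dest->use_count == 1 && dest->sole_use_insn == consumer_uid;
}
```

Whether this single-consumer condition predicts the register allocator's choices well enough is exactly the benchmarking question raised later in the thread.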
FW: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures
[Forwarding to gcc-patches, doh!]

Best regards,

Thomas

--- Begin Message ---

Hi,

The ARM-specific thumb2-slow-flash-data.c testcase shows 2 failures when running for arm-none-eabi with -mcpu=cortex-m7:

FAIL: gcc.target/arm/thumb2-slow-flash-data.c (test for excess errors)
FAIL: gcc.target/arm/thumb2-slow-flash-data.c scan-assembler-times movt 13

The first one is due to a missing type specifier in the declaration of labelref, while the second one is due to different constant synthesis as a result of a different tuning for the CPU selected. This patch fixes these issues by adding the missing type specifier and checking for .word and similar directives instead of the number of movt instructions. The new test passes for all of -mcpu=cortex-m{3,4,7} but fails when removing the -mslow-flash-data switch.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2015-11-04  Thomas Preud'homme

	* gcc.target/arm/thumb2-slow-flash-data.c: Add missing typespec for
	labelref and check use of constant pool by looking for .word and
	similar directives.

diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
index 9852ea5..089a72b 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
@@ -50,7 +50,7 @@ int foo (int a, int b)
 {
   int i;
-  volatile *labelref = &&label1;
+  volatile int *labelref = &&label1;
 
   if (a > b)
     {
@@ -70,5 +70,4 @@ label1:
   return a + b;
 }
 
-/* { dg-final { scan-assembler-times "movt" 13 } } */
-/* { dg-final { scan-assembler-times "movt.*LC0\\+4" 1 } } */
+/* { dg-final { scan-assembler-not "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */

Is this ok for trunk?

Best regards,

Thomas

--- End Message ---
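To see what the new scan-assembler-not line rejects, here is a rough transcription of the test's Tcl regexp into POSIX ERE (in Tcl, `\d` means a digit and `\s` whitespace). The `uses_constant_pool` helper is only an illustration, not part of the testsuite: any data directive such as `.word` or `.4byte` whose operand does not begin with '.' (i.e. is not a label reference like `.LC0`) indicates a literal placed in the constant pool, which -mslow-flash-data is supposed to avoid.

```c
#include <assert.h>
#include <regex.h>

/* Approximate POSIX-ERE version of the Tcl pattern used by the test.
   Returns nonzero if the assembly line appears to emit a literal data
   value (constant pool use) rather than a label reference.  */
static int
uses_constant_pool (const char *asm_line)
{
  regex_t re;
  int r = regcomp (&re,
                   "\\.([0-9]?byte|float|l?double|short|int|long|quad|word)"
                   "[[:space:]]+[^.]",
                   REG_EXTENDED | REG_NOSUB);
  assert (r == 0);
  r = regexec (&re, asm_line, 0, 0, 0);
  regfree (&re);
  return r == 0;
}
```

This is why the rewritten test is robust against per-CPU differences in constant synthesis: it no longer counts movt instructions, it only insists that no literal ends up in flash.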
Re: [PING][PATCH][4.9]Backport "Fix register corruption bug in ree"
On Fri, 6 Nov 2015, Renlin Li wrote:

> Hi Richard,
>
> I am trying to come up with a simple test case, but have failed to do so.
>
> I have isolated the function where the bug is triggered. However, in order
> to make it a useful test case, the silently corrupted register should be
> used somewhere, and should affect the correctness of the program. I cannot
> create such a context.
>
> libstdc++-v3/testsuite/tr1/8_c_compatibility/complex/50880.cc is a concrete
> example which will trigger this bug.

Well, if that testcase exposes the bug you already have a testcase.

Richard.

> Here is a short description of why the test case failed:
>
> Wrong code-gen in the ree pass:
>
> (insn 279 278 531 2 (set (reg:TF 32 v0)
>         (zero_extend:TF (reg:DI 19 x19 [orig:150 __x+8 ] [150])))
>      /libstdc++-v3/include/complex:861 691 {aarch64_movtflow_di}
>      (nil))
> (insn 531 279 275 2 (set (reg:TF 3 x3)
>         (reg:TF 32 v0)) /libstdc++-v3/include/complex:861 -1
>      (nil))
> (insn 275 531 510 2 (set (reg:DI 23 x23 [orig:151 __y ] [151])
>         (mem:DI (plus:DI (reg/v/f:DI 0 x0 [orig:104 __z ] [104])
>                 (const_int 16 [0x10])) [9 MEM[(const long double &)__z_4(D) +
> 16]+0 S8 A128])) /libstdc++-v3/include/complex:859 34 {*movdi_aarch64}
>      (nil))
> (insn 510 275 19 2 (set (zero_extract:TF (reg:TF 32 v0)
>             (const_int 64 [0x40])
>             (const_int 64 [0x40]))
>         (zero_extend:TF (reg:DI 2 x2 [ __x+-8 ])))
>      /libstdc++-v3/include/complex:861 689 {aarch64_movtfhigh_di}
>      (nil))
>
> insn 531 clobbers x4 as well. That's exactly the same problem described in
> Wilco's email. The following is what the correct code should look like. It's
> generated after applying Wilco's fix.
> (insn 279 278 275 2 (set (reg:DI 3 x3 [orig:185 __x ] [185])
>         (reg:DI 19 x19 [orig:150 __x+8 ] [150]))
>      /libstdc++-v3/include/complex:861 34 {*movdi_aarch64}
>      (nil))
> (insn 275 279 509 2 (set (reg:DI 23 x23 [orig:151 __y ] [151])
>         (mem:DI (plus:DI (reg/v/f:DI 0 x0 [orig:104 __z ] [104])
>                 (const_int 16 [0x10])) [9 MEM[(const long double &)__z_4(D) +
> 16]+0 S8 A128])) /libstdc++-v3/include/complex:859 34 {*movdi_aarch64}
>      (nil))
> (insn 509 275 510 2 (set (reg:TF 32 v0)
>         (zero_extend:TF (reg:DI 3 x3 [orig:185 __x ] [185])))
>      /libstdc++-v3/include/complex:861 691 {aarch64_movtflow_di}
>      (nil))
> (insn 510 509 19 2 (set (zero_extract:TF (reg:TF 32 v0)
>             (const_int 64 [0x40])
>             (const_int 64 [0x40]))
>         (zero_extend:TF (reg:DI 2 x2 [ __x+-8 ])))
>      /libstdc++-v3/include/complex:861 689 {aarch64_movtfhigh_di}
>      (nil))
>
> Regards,
> Renlin Li
>
> On 29/10/15 16:33, Richard Biener wrote:
> > On October 29, 2015 4:37:08 PM GMT+01:00, Ramana Radhakrishnan wrote:
> > > On Thu, Jun 4, 2015 at 2:16 PM, Renlin Li wrote:
> > > > Ping ~
> > > >
> > > > Can somebody review it? Thank you!
> > > >
> > > > Regards,
> > > > Renlin Li
> > > >
> > > > On 16/04/15 10:32, Renlin Li wrote:
> > > > > Ping~
> > > > >
> > > > > Regards,
> > > > > Renlin Li
> > > > >
> > > > > On 16/04/15 10:09, wrote:
> > > > > > Ping~
> > > > > > Anybody has time to review it?
> > > > > >
> > > > > > Regards,
> > > > > > Renlin Li
> > > > > >
> > > > > > On 06/02/15 17:48, Renlin Li wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This is a backport patch for branch 4.9. You can find the original
> > > > > > > patch here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00356.html
> > > > > > > And it has been committed on the trunk as r215205.
> > > > > > >
> > > > > > > This fixes a few libstdc++-v3 test suite failures.
> > > > > > > x86_64 bootstraps Okay, aarch64_be-none-elf libstdc++-v3 tested Okay.
> > > > > > > Okay to commit on branch 4.9?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Renlin Li
> > > > > > >
> > > > > > > 2015-02-06  Renlin Li
> > > > > > >
> > > > > > >     Backport from mainline
> > > > > > >     2014-09-12  Wilco Dijkstra
> > > > > > >
> > > > > > >     * gcc/ree.c (combine_reaching_defs): Ensure inserted copy don't
> > > > > > >     change the number of hard registers.
> > >
> > > richi - an RM question -
> > >
> > > Is this something that can be pulled back to the GCC 4.9 branch assuming
> > > testing still shows no regressions - it breaks aarch64 be on GCC 4.9
> >
> > If it is a regression or wrong-code fix it's OK with a test case.
> >
> > Richard.
> >
> > > ...
> > >
> > > regards
> > > Ramana

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
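The reason insn 531 clobbers x4 can be shown with a toy calculation. The numbers are assumptions about AArch64 (X registers hold 8 bytes, TFmode is 16 bytes), and `hard_regno_nregs_model` / `copy_is_safe` are illustrative names, not GCC functions: a TFmode value assigned to x3 needs two consecutive hard registers, x3 and x4, so the ree-inserted TFmode copy silently overwrites x4. This is why the backported fix checks that the inserted copy does not change the number of hard registers.

```c
#include <assert.h>

/* Assumed size of an AArch64 general register, in bytes.  */
enum { UNITS_PER_XREG = 8 };

/* How many consecutive hard registers a value of the given byte size
   occupies (a rough model of GCC's hard_regno_nregs).  */
static int
hard_regno_nregs_model (int mode_size)
{
  return (mode_size + UNITS_PER_XREG - 1) / UNITS_PER_XREG;
}

/* The essence of the fix: a copy inserted by ree is only safe if the
   destination covers the same number of hard registers as the source,
   otherwise neighbouring registers get clobbered.  */
static int
copy_is_safe (int src_mode_size, int dest_mode_size)
{
  return hard_regno_nregs_model (src_mode_size)
         == hard_regno_nregs_model (dest_mode_size);
}
```

In the dump above, insn 531 copies a 16-byte TFmode value into x3, so the model reports two registers occupied (x3 and x4), while the original DImode value needed only one; the fixed code sequence keeps the copy in DImode instead.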
Re: [gomp4, committed] Remove superfluous pass_expand_omp_ssa
Hi! On Sun, 8 Nov 2015 23:00:31 +0100, Tom de Vries wrote: > This patch removes a superfluous pass_expand_omp_ssa. [...] > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -122,7 +122,6 @@ along with GCC; see the file COPYING3. If not see > late. */ > NEXT_PASS (pass_split_functions); >POP_INSERT_PASSES () > - NEXT_PASS (pass_expand_omp_ssa); >NEXT_PASS (pass_release_ssa_names); >NEXT_PASS (pass_rebuild_cgraph_edges); >NEXT_PASS (pass_inline_parameters); Committed to gomp-4_0-branch in r229994: commit 38d123c26f4b1f53258742fd285df2a128150159 Author: tschwinge Date: Mon Nov 9 08:36:54 2015 + Adjust test case for removed pass_expand_omp_ssa gcc/testsuite/ * gcc.dg/gomp/notify-new-function-3.c: Scan "ompexpssa2" instead of "ompexpssa3". git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229994 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/testsuite/ChangeLog.gomp | 5 + gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp index a300e38..22f1c74 100644 --- gcc/testsuite/ChangeLog.gomp +++ gcc/testsuite/ChangeLog.gomp @@ -1,3 +1,8 @@ +2015-11-09 Thomas Schwinge + + * gcc.dg/gomp/notify-new-function-3.c: Scan "ompexpssa2" instead + of "ompexpssa3". + 2015-11-08 Tom de Vries * g++.dg/tree-ssa/copyprop-1.C: Update for extra pass_dce in pass list. diff --git gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c index c3d097e..033a407 100644 --- gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c +++ gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c @@ -11,4 +11,4 @@ foo (int *__restrict a, int *__restrict b, int *__restrict c) /* Check for new function notification in ompexpssa dump. 
*/
-/* { dg-final { scan-tree-dump-times "Added new ssa gimple function foo\\.\[\\\$_\]loopfn\\.0 to callgraph" 1 "ompexpssa3" } } */
+/* { dg-final { scan-tree-dump-times "Added new ssa gimple function foo\\.\[\\\$_\]loopfn\\.0 to callgraph" 1 "ompexpssa2" } } */

Regards,
Thomas
[PATCH] Minor refactoring in tree-ssanames.c & freelists verifier
This is a piece of a larger refactoring. I'm postponing the larger refactoring until next stage1 -- I have some concerns about the compile-time impact that I'll need to measure. In the mean time I want to get knowledge of the freelists out of pass_release_ssa_names::execute. The larger refactoring will move the freelists into a class and pass_release_ssa_names won't have access to those private members. So the guts of that routine have been moved into what will ultimately be a method of the name manager. I'm also including a leak detector/verification routine for the freelists. This version will bootstrap on the trunk if the verifier is enabled at the end of flush_ssanames_freelist. However, the vectorizer leaks names horribly, thus it won't pass a regression test or a -O3 bootstrap. In the mean time, the verifier is still useful to help find leaks -- it's obviously not checking by default and the code is marked as a debug function. This wraps up the things I expect to address during this stage1 cycle. I've got a fair amount of patch review to cover in the immediate future as well as starting the bugfixing cycle. Bootstrapped and regression tested on x86_64-linux-gnu. Installed on the trunk. Jeff commit feba260ed75060c9acc265ff12a44557e280f5be Author: Jeff Law Date: Sun Nov 8 20:33:59 2015 -0700 [PATCH] Minor refactoring in tree-ssanames.c & freelists verifier * tree-into-ssa.c (names_to_release): No longer static. * tree-into-ssa.h (names_to_release): Declare. * tree-ssanames.c (verify_ssaname_freelists): New debug function. (release_free_names_and_compact_live_names): New function extracted from pass_release_ssa_names::execute. (pass_release_ssa_names::execute): Use it. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index dfb8350..c70ab87 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2015-11-09 Jeff Law + + * tree-into-ssa.c (names_to_release): No longer static. + * tree-into-ssa.h (names_to_release): Declare. 
+ * tree-ssanames.c (verify_ssaname_freelists): New debug function. + (release_free_names_and_compact_live_names): New function extracted + from pass_release_ssa_names::execute. + (pass_release_ssa_names::execute): Use it. + 2015-11-09 Alan Modra * gensupport.c (add_mnemonic_string): Make len param a size_t. @@ -11,7 +20,7 @@ * gcc/bb-reorder.c (reorder_basic_blocks_simple): Treat a conditional branch with only one successor just like unconditional branches. -2015-11-08 Jeff Law +2015-11-08 Jeff Law * tree-ssa-threadupdate.c (register_jump_thraed): Assert that a non-FSM path has no edges marked with EDGE_DFS_BACK. @@ -369,7 +378,7 @@ the dominance info; free it if we can't. (pass_call_cdce::execute): Don't free the dominance info here. -2015-11-06 Jeff Law +2015-11-06 Jeff Law * tree-ssa-threadedge.c (dummy_simplify): Remove. (thread_around_empty_blocks): Remove backedge_seen_p argument. @@ -402,7 +411,7 @@ (build_scop_scattering): Call build_pbb_minimal_scattering_polyhedrons. (build_poly_scop): Call build_scop_minimal_scattering. -2015-11-06 Jeff Law +2015-11-06 Jeff Law * cfg-flags.def (IGNORE): New edge flag. * tree-vrp.c (identify_jump_threads): Mark and clear edges @@ -1174,7 +1183,7 @@ * tree-ssa.c (gimple_uses_undefined_value_p): New. * tree-ssa.h (gimple_uses_undefined_value_p): Declare. -2015-11-02 Jeff Law +2015-11-02 Jeff Law * tree-ssa-threadupdate.c (valid_jump_thread_path): Also detect cases where the loop latch edge is in the middle of an FSM path. @@ -1238,7 +1247,7 @@ PR middle-end/68166 * fold-const.c: Include "md5.h". -2015-11-01 Jeff Law +2015-11-01 Jeff Law * vmsdbgout.c: Revert unused header file reduction patch. @@ -1287,7 +1296,7 @@ * tree-ssa-structalias.c (ipa_pta_execute): Use make_copy_constraint. 
-2015-10-30 Jeff Law +2015-10-30 Jeff Law Nathan Sidwell * config/nvptx/nvptx.h (HARD_REGNO_NREGS): Avoid warning on unused diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c index 6533998..3086f82 100644 --- a/gcc/tree-into-ssa.c +++ b/gcc/tree-into-ssa.c @@ -94,7 +94,7 @@ static sbitmap interesting_blocks; /* Set of SSA names that have been marked to be released after they were registered in the replacement table. They will be finally released after we finish updating the SSA web. */ -static bitmap names_to_release; +bitmap names_to_release; /* vec of vec of PHIs to rewrite in a basic block. Element I corresponds the to basic block with index I. Allocated once per compilation, *not* diff --git a/gcc/tree-into-ssa.h b/gcc/tree-into-ssa.h index c053f78..aed1e95 100644 --- a/gcc/tree-into-ssa.h +++ b/gcc/tree-into-ssa.h @@ -48,5 +48,6 @@ extern void dump_names_replac
Re: [PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins
On 08/11/15 00:26, charles.bay...@linaro.org wrote: > From: Charles Baylis > > gcc/ChangeLog: > > Charles Baylis > > PR target/63870 > * config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerator > qualifier_struct_load_store_lane_index. > (builtin_arg): New enumerator NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX. > (arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON > argument qualifiers. > (arm_expand_neon_builtin): Handle new NEON argument qualifier. > * config/arm/arm.h (ENDIAN_LANE_N): New macro. > > Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647 > --- > gcc/config/arm/arm-builtins.c | 48 > +++ > gcc/config/arm/arm.c | 1 + > gcc/config/arm/arm.h | 3 +++ > 3 files changed, 34 insertions(+), 18 deletions(-) > > diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c > index bad3dc3..6e3aad4 100644 > --- a/gcc/config/arm/arm-builtins.c > +++ b/gcc/config/arm/arm-builtins.c > @@ -67,7 +67,9 @@ enum arm_type_qualifiers >/* Polynomial types. */ >qualifier_poly = 0x100, >/* Lane indices - must be within range of previous argument = a vector. */ > - qualifier_lane_index = 0x200 > + qualifier_lane_index = 0x200, > + /* Lane indices for single lane structure loads and stores. */ > + qualifier_struct_load_store_lane_index = 0x400 > }; > > /* The qualifier_internal allows generation of a unary builtin from > @@ -1963,6 +1965,7 @@ typedef enum { >NEON_ARG_COPY_TO_REG, >NEON_ARG_CONSTANT, >NEON_ARG_LANE_INDEX, > + NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX, >NEON_ARG_MEMORY, >NEON_ARG_STOP > } builtin_arg; > @@ -2020,9 +2023,9 @@ neon_dereference_pointer (tree exp, tree type, > machine_mode mem_mode, > /* Expand a Neon builtin. */ > static rtx > arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, > - int icode, int have_retval, tree exp, ...) 
> + int icode, int have_retval, tree exp, > + builtin_arg *args) > { > - va_list ap; >rtx pat; >tree arg[SIMD_MAX_BUILTIN_ARGS]; >rtx op[SIMD_MAX_BUILTIN_ARGS]; > @@ -2037,13 +2040,11 @@ arm_expand_neon_args (rtx target, machine_mode > map_mode, int fcode, > || !(*insn_data[icode].operand[0].predicate) (target, tmode))) > target = gen_reg_rtx (tmode); > > - va_start (ap, exp); > - >formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode])); > >for (;;) > { > - builtin_arg thisarg = (builtin_arg) va_arg (ap, int); > + builtin_arg thisarg = args[argc]; > >if (thisarg == NEON_ARG_STOP) > break; > @@ -2079,6 +2080,18 @@ arm_expand_neon_args (rtx target, machine_mode > map_mode, int fcode, > op[argc] = copy_to_mode_reg (mode[argc], op[argc]); > break; > > + case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX: > + gcc_assert (argc > 1); > + if (CONST_INT_P (op[argc])) > + { > + neon_lane_bounds (op[argc], 0, > + GET_MODE_NUNITS (map_mode), exp); > + /* Keep to GCC-vector-extension lane indices in the RTL. */ > + op[argc] = > + GEN_INT (ENDIAN_LANE_N (map_mode, INTVAL (op[argc]))); > + } > + goto constant_arg; > + > case NEON_ARG_LANE_INDEX: > /* Previous argument must be a vector, which this indexes. */ > gcc_assert (argc > 0); > @@ -2089,19 +2102,22 @@ arm_expand_neon_args (rtx target, machine_mode > map_mode, int fcode, > } > /* Fall through - if the lane index isn't a constant then >the next case will error. */ > + > case NEON_ARG_CONSTANT: > +constant_arg: > if (!(*insn_data[icode].operand[opno].predicate) > (op[argc], mode[argc])) > - error_at (EXPR_LOCATION (exp), "incompatible type for argument > %d, " > -"expected %", argc + 1); > + { > + error ("%Kargument %d must be a constant immediate", > + exp, argc + 1); > + return const0_rtx; > + } > break; > + > case NEON_ARG_MEMORY: > /* Check if expand failed. */ > if (op[argc] == const0_rtx) > - { > - va_end (ap); > return 0; > - } > gcc_assert (MEM_P (op[argc])); > PUT_MODE (op[argc], mode[argc]); > /* ??? 
arm_neon.h uses the same built-in functions for signed > @@ -2122,8 +2138,6 @@ arm_expand_neon_args (rtx target, machine_mode > map_mode, int fcode, > } > } > > - va_end (ap); > - >if (have_retval) > switch (argc) >{ > @@ -2235,6 +2249,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx > target) > >if (d->quali
Re: RFC: C++ delayed folding merge
On Mon, 9 Nov 2015, Jason Merrill wrote:

> I'm planning to merge the C++ delayed folding branch this week, but I need to
> get approval of the back end changes (the first patch attached). Most of
> these are the introduction of non-folding variants of convert_to_*, but there
> are a few others.
>
> One question: The branch changes 'convert' to not fold its result, and it's
> not clear to me whether that's part of the expected behavior of a front end
> 'convert' function or not.

History. convert is purely frontend (but shared, unfortunately between all frontends). I would expect that FEs that do not do delayed folding expect convert to fold.

> Also, I'm a bit uncertain about merging this at the end of stage 1, since it's
> a large internal change with relatively small user impact; it just improves
> handling of constant expression corner cases. I'm inclined to go ahead with
> it at this point, but I'm interested in contrary opinions.

I welcome this change as it should allow cleaning up the FE-middle-end interface a bit more. It should be possible to remove all NON_LVALUE_EXPR adding/removal from the middle-end folders.

Looks like the backend patch included frontend parts, but as far as I skimmed it only

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 5e32901..d754a90 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2091,6 +2091,17 @@ fold_convert_const (enum tree_code code, tree type, tree arg1)
       else if (TREE_CODE (arg1) == REAL_CST)
 	return fold_convert_const_fixed_from_real (type, arg1);
     }
+  else if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (arg1) == VECTOR_CST
+	  && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (TREE_TYPE (arg1))
+	  && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1))
+	{
+	  tree r = copy_node (arg1);
+	  TREE_TYPE (arg1) = type;
+	  return r;
+	}
+    }

looks suspicious. The issue here is that the vector elements will have the wrong type after this simple handling.
If you fix that you can as well handle all kind of element type changes via recursing to fold_convert_const (that includes float to int / int to float changes). Not sure if we can even write a testcase for such conversions with the GCC vector extensions though. Thanks, Richard.
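The suggested recursion can be sketched with an ordinary C loop. This is a toy model only: the real fold_convert_const works on tree constants, not arrays, and `convert_vector_elements` is an invented name. The point is that converting a vector constant element by element gives every element the new element type, and the same recursion naturally covers int-to-float and float-to-int element changes, unlike retyping the VECTOR_CST wholesale.

```c
#include <assert.h>

enum { NELTS = 4 };

/* Toy model of recursing per element instead of retyping the whole
   vector constant: each element is converted individually (modeled
   here by a plain cast), so the elements end up with the right type
   too, and cross-kind conversions fall out for free.  */
static void
convert_vector_elements (double *dest, const int *src)
{
  int i;
  for (i = 0; i < NELTS; i++)
    dest[i] = (double) src[i];  /* per-element "fold_convert_const" */
}
```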
Re: [PATCH 2/4] [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier
On 08/11/15 00:26, charles.bay...@linaro.org wrote: > From: Charles Baylis > > gcc/ChangeLog: > > Charles Baylis > > PR target/63870 > * config/arm/arm-builtins.c: (arm_load1_qualifiers) Use > qualifier_struct_load_store_lane_index. > (arm_storestruct_lane_qualifiers) Likewise. > * config/arm/neon.md: (neon_vld1_lane) Reverse lane numbers for > big-endian. > (neon_vst1_lane) Likewise. > (neon_vld2_lane) Likewise. > (neon_vst2_lane) Likewise. > (neon_vld3_lane) Likewise. > (neon_vst3_lane) Likewise. > (neon_vld4_lane) Likewise. > (neon_vst4_lane) Likewise. > > Change-Id: Ic39898d288701bc5b712490265be688f5620c4e2 > --- > gcc/config/arm/arm-builtins.c | 4 ++-- > gcc/config/arm/neon.md| 49 > +++ > 2 files changed, 28 insertions(+), 25 deletions(-) > > diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c > index 6e3aad4..113e3da 100644 > --- a/gcc/config/arm/arm-builtins.c > +++ b/gcc/config/arm/arm-builtins.c > @@ -152,7 +152,7 @@ arm_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] > static enum arm_type_qualifiers > arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_none, qualifier_const_pointer_map_mode, > - qualifier_none, qualifier_immediate }; > + qualifier_none, qualifier_struct_load_store_lane_index }; > #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers) > > /* The first argument (return type) of a store should be void type, > @@ -171,7 +171,7 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS] > static enum arm_type_qualifiers > arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_void, qualifier_pointer_map_mode, > - qualifier_none, qualifier_immediate }; > + qualifier_none, qualifier_struct_load_store_lane_index }; > #define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers) > > #define v8qi_UP V8QImode > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > index e5a2b0f..e8db020 100644 > --- a/gcc/config/arm/neon.md > +++ b/gcc/config/arm/neon.md > @@ -4261,8 +4261,9 @@ if (BYTES_BIG_ENDIAN) > 
UNSPEC_VLD1_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[3]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); > + operands[3] = GEN_INT (lane); >if (lane < 0 || lane >= max) > error ("lane out of range"); >if (max == 1) > @@ -4281,8 +4282,9 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VLD1_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[3]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); > + operands[3] = GEN_INT (lane); >int regno = REGNO (operands[0]); >if (lane < 0 || lane >= max) > error ("lane out of range"); > @@ -4367,8 +4369,9 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VST1_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[2]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); > + operands[2] = GEN_INT (lane); >if (lane < 0 || lane >= max) > error ("lane out of range"); >if (max == 1) > @@ -4387,7 +4390,7 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VST1_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[2]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[1]); >if (lane < 0 || lane >= max) > @@ -4396,8 +4399,8 @@ if (BYTES_BIG_ENDIAN) > { >lane -= max / 2; >regno += 2; > - operands[2] = GEN_INT (lane); > } > + operands[2] = GEN_INT (lane); >operands[1] = gen_rtx_REG (mode, regno); >if (max == 2) > return "vst1.\t{%P1}, %A0"; > @@ -4457,7 +4460,7 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VLD2_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[3]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[0]); >rtx ops[4]; > @@ -4466,7 +4469,7 @@ if (BYTES_BIG_ENDIAN) >ops[0] = gen_rtx_REG (DImode, regno); >ops[1] = gen_rtx_REG 
(DImode, regno + 2); >ops[2] = operands[1]; > - ops[3] = operands[3]; > + ops[3] = GEN_INT (lane); >output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, %A2", ops); >return ""; > } > @@ -4482,7 +4485,7 @@ if (BYTES_BIG_ENDIAN) > UNSPEC_VLD2_LANE))] >"TARGET_NEON" > { > - HOST_WIDE_INT lane = INTVAL (operands[3]); > + HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[0]); >rtx ops[4]; > @@ -4572,7 +4575,7 @@ if (BYTES_BIG_ENDIAN) > UNSPE
Re: FW: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures
On 09/11/15 08:21, Thomas Preud'homme wrote: [Forwarding to gcc-patches, doh!] Best regards, Thomas [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures.eml Subject: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures From: Thomas Preud'homme Date: 09/11/15 08:02 To: Richard Earnshaw , Ramana Radhakrishnan , Kyrylo Tkachov Hi, ARM-specific thumb2-slow-flash-data.c testcase shows 2 failures when running for arm-none-eabi with -mcpu=cortex-m7: FAIL: gcc.target/arm/thumb2-slow-flash-data.c (test for excess errors) FAIL: gcc.target/arm/thumb2-slow-flash-data.c scan-assembler-times movt 13 The first one is due to a missing type specifier in the declaration of labelref while the second one is due to different constant synthesis as a result of a different tuning for the CPU selected. This patch fixes these issues by adding the missing type specifier and checking for .word and similar directive instead of the number of movt. The new test passes for all of -mcpu=cortex-m{3,4,7} but fail when removing the -mslow-flash-data switch. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-11-04 Thomas Preud'homme * gcc.target/arm/thumb2-slow-flash-data.c: Add missing typespec for labelref and check use of constant pool by looking for .word and similar directives. Ok. 
Thanks, Kyrill diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c index 9852ea5..089a72b 100644 --- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c +++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c @@ -50,7 +50,7 @@ int foo (int a, int b) { int i; - volatile *labelref = &&label1; + volatile int *labelref = &&label1; if (a > b) { @@ -70,5 +70,4 @@ label1: return a + b; } -/* { dg-final { scan-assembler-times "movt" 13 } } */ -/* { dg-final { scan-assembler-times "movt.*LC0\\+4" 1 } } */ +/* { dg-final { scan-assembler-not "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */ Is this ok for trunk? Best regards, Thomas
Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 09/11/15 08:14, Nikolai Bozhenov wrote:
On 11/06/2015 08:16 PM, Kyrill Tkachov wrote:
On 06/11/15 17:09, Kyrill Tkachov wrote:
On 06/11/15 17:07, Nikolai Bozhenov wrote:
On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote:

Hi! I faced the same issue but I had somewhat different RTL for the consumer:

(insn 20 15 21 2 (set (reg/i:SI 0 r0)
        (minus:SI (subreg:SI (reg:DI 117) 4)
            (mult:SI (reg:SI 123)
                (reg:SI 114)))) gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one cycle in this case?

If the accumulator can be forwarded (i.e. a SImode register), there isn't a reason why a subreg:SI (reg:DI) will not get forwarded. The subreg:SI is an artifact before register allocation, thus it's a representation issue that the patch is fixing here unless I misunderstand your question.

I mean, in my example it is not the multiplication result that is forwarded but its upper part. So, shouldn't we check that offset in a subreg expression is zero? Or is it ok to forward only the upper part of a multiplication?

Could you please post the full RTL instruction we're talking about here as it appears in the scheduler dump? So that we're all on the same page about which case we're talking about.

Sorry, missed the above instruction. This subreg is just a pre-register allocation representation of the instruction and will go away after reload. This particular function only really has a real effect in post-reload scheduling as it's only there when the final register numbers are known.

I see. aarch_accumulator_forwarding always returns 0 for virtual registers. But isn't it overly pessimistic to assume that accumulator forwarding is never possible at sched1? I wonder if it would be better to be more optimistic about register allocation outcome. I mean, in case of virtual registers we could assume forwarding from A to B if B is the only consumer of A's result.
Something like this:

  if (REGNO (dest) >= FIRST_VIRTUAL_REGISTER
      || REGNO (accumulator) >= FIRST_VIRTUAL_REGISTER)
    return (DF_REG_USE_COUNT (REGNO (dest)) == 1)
           && (DF_REF_INSN (DF_REG_USE_CHAIN (REGNO (dest))) == consumer);
  else
    return REGNO (dest) == REGNO (accumulator);

Interesting... As far as I know sched1 tries to minimise live ranges before register allocation rather than trying to perform the most exact pipeline modelling, since we only know the exact registers used in sched2. What you're proposing is a heuristic, so it would need benchmarking results and analysis to be considered.

Thanks,
Kyrill

Thanks,
Nikolai
Re: RFC: C++ delayed folding merge
> One question: The branch changes 'convert' to not fold its result, and > it's not clear to me whether that's part of the expected behavior of a > front end 'convert' function or not. I don't think that you should change the behavior for front-ends that have an internal representation distinct from the GENERIC trees and thus do a global translation to GENERIC at the end; e.g. in the Ada compiler we'd rather have *more* folding than less during this translation. -- Eric Botcazou
Re: [PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem
On 06/11/15 10:46, Kyrill Tkachov wrote: Hi all, In this wrong-code PR the vector setmem expansion and arm_block_set_aligned_vect in particular use the wrong offset when calling adjust_automodify_address. In the attached testcase during the initial zeroing out we get two V16QI stores, but they both are recorded by adjust_automodify_address as modifying x+0 rather than x+0 and x+12 (the total size to be written is 28). This led to the scheduling pass moving the store from "x.g = 2;" to before the zeroing stores. This patch fixes the problem by keeping track of the offset to which stores are emitted and passing it to adjust_automodify_address as appropriate. From inspection I see arm_block_set_unaligned_vect also has this issue so I performed the same fix in that function as well. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? This bug appears on GCC 5 too and I'm currently testing this patch there. Ok to backport to GCC 5 as well? Bootstrap and testing for arm-none-linux-gnueabihf on GCC 5 branch looks ok too. Kyrill Thanks, Kyrill 2015-11-06 Kyrylo Tkachov PR target/68143 * config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of offset from dstbase and use it appropriately in adjust_automodify_address. (arm_block_set_aligned_vect): Likewise. 2015-11-06 Kyrylo Tkachov PR target/68143 * gcc.target/arm/pr68143_1.c: New test.
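The scheduling hazard being fixed can be modeled with simple interval arithmetic. The 28-byte and 16-byte figures come from the description above; the offset 24 for the later scalar store is an illustrative assumption. With 28 bytes zeroed by two V16QI stores, the second store really covers bytes 12..27, but recording it at offset 0 via adjust_automodify_address made it look disjoint from a later store into the tail of the object, so the scheduler felt free to move that store above the zeroing.

```c
#include <assert.h>

/* Two stores conflict iff their [offset, offset + size) byte ranges
   intersect; alias analysis reasons about exactly such ranges, so a
   store recorded at the wrong offset can falsely appear independent.  */
static int
stores_overlap (int off_a, int size_a, int off_b, int size_b)
{
  return off_a < off_b + size_b && off_b < off_a + size_a;
}
```

With the corrected offset, the second vector store (bytes 12..27) visibly overlaps an assumed 4-byte store at offset 24 and cannot be reordered past it; with the wrong offset 0, the ranges 0..15 and 24..27 look independent, which is the wrong-code scenario the patch closes.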
Re: [PATCH v2] libitm: Support sized delete.
On 11/09/2015 01:24 AM, Torvald Riegel wrote: This patch supports the sized variants of operator delete. Some change compare to v1. Tested on x86_64-linux. Looks good. r~
[gomp4, committed] Cleanup formatting of pass_expand_omp_ssa::execute
Hi,

this patch makes the formatting of pass_expand_omp_ssa::execute identical to the one in trunk.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Cleanup formatting of pass_expand_omp_ssa::execute

2015-11-09  Tom de Vries

	* omp-low.c (pass_expand_omp_ssa::execute): Cleanup formatting.

---
 gcc/omp-low.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 57ac2aa..b78a8d6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13657,10 +13657,7 @@ public:
   { return !(fun->curr_properties & PROP_gimple_eomp); }
-  virtual unsigned int execute (function *)
-{
-  return execute_expand_omp ();
-}
+  virtual unsigned int execute (function *) { return execute_expand_omp (); }
   opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 }; // class pass_expand_omp_ssa
-- 
1.9.1
[gomp4, committed] Remove TODO_remove_unused_locals from todo_flags_finish of pass_data_expand_omp_ssa
Hi,

this patch removes TODO_remove_unused_locals from todo_flags_finish of pass_data_expand_omp_ssa. I can't reproduce the problem TODO_remove_unused_locals is supposed to fix with the current state of gomp-4_0-branch.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Remove TODO_remove_unused_locals from todo_flags_finish of pass_data_expand_omp_ssa

2015-11-09  Tom de Vries

	* omp-low.c (pass_data_expand_omp_ssa): Remove
	TODO_remove_unused_locals from todo_flags_finish.

---
 gcc/omp-low.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b78a8d6..4d6dc99 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13641,8 +13641,7 @@ const pass_data pass_data_expand_omp_ssa =
   PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_rebuild_alias
-  | TODO_remove_unused_locals, /* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
 };
 
 class pass_expand_omp_ssa : public gimple_opt_pass
-- 
1.9.1
[gomp4, committed] Provide PROP_gimple_eomp unconditionally in pass_expand_omp
Hi,

now that we don't postpone expanding the kernels region anymore, we can set PROP_gimple_eomp unconditionally.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Provide PROP_gimple_eomp unconditionally in pass_expand_omp

2015-11-09  Tom de Vries

	* omp-low.c (pass_data_expand_omp): Add PROP_gimple_eomp to
	properties_provided.
	(pass_expand_omp::execute): Don't set PROP_gimple_eomp.

---
 gcc/omp-low.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 4d6dc99..cd9c9e6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13589,7 +13589,7 @@ const pass_data pass_data_expand_omp =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
-  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
+  PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
@@ -13603,14 +13603,12 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual unsigned int execute (function *fun)
+  virtual unsigned int execute (function *)
   {
     bool gate = ((flag_cilkplus != 0 || flag_openacc != 0 || flag_openmp != 0
		  || flag_openmp_simd != 0) && !seen_error ());
 
-    fun->curr_properties |= PROP_gimple_eomp;
-
     /* This pass always runs, to provide PROP_gimple_eomp.
        But often, there is nothing to do.  */
     if (!gate)
-- 
1.9.1
[gomp4, committed] Remove ssa support in expand_omp_target
Hi, now that we don't postpone expanding the kernels region anymore, we don't need ssa support in expand_omp_target. Committed to gomp-4_0-branch. Thanks, - Tom Remove ssa support in expand_omp_target 2015-11-09 Tom de Vries * omp-low.c (release_first_vuse_in_edge_dest): Remove (expand_omp_target): Remove ssa support. --- gcc/omp-low.c | 74 --- 1 file changed, 5 insertions(+), 69 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index cd9c9e6..ed7640a 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -6973,38 +6973,6 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from, } } -/* Release the first vuse in bb E->dest, either normal or phi arg for - edge E. */ - -static void -release_first_vuse_in_edge_dest (edge e) -{ - gimple_stmt_iterator i; - basic_block bb = e->dest; - - for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i)) -{ - gimple *phi = gsi_stmt (i); - tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); - - if (!virtual_operand_p (arg)) - continue; - - mark_virtual_operand_for_renaming (arg); - return; -} - - for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i)) -{ - gimple *stmt = gsi_stmt (i); - if (gimple_vuse (stmt) == NULL_TREE) - continue; - - mark_virtual_operand_for_renaming (gimple_vuse (stmt)); - return; -} -} - /* Expand the OpenMP parallel or task directive starting at REGION. */ static void @@ -12727,6 +12695,7 @@ expand_omp_target (struct omp_region *region) /* Supported by expand_omp_taskreg, but not here. 
*/ if (child_cfun != NULL) gcc_checking_assert (!child_cfun->cfg); + gcc_checking_assert (!gimple_in_ssa_p (cfun)); entry_bb = region->entry; exit_bb = region->exit; @@ -12735,7 +12704,7 @@ expand_omp_target (struct omp_region *region) mark_loops_in_oacc_kernels_region (region->entry, region->exit); basic_block entry_succ_bb = single_succ (entry_bb); - if (offloaded && !gimple_in_ssa_p (cfun)) + if (offloaded) { gsi = gsi_last_bb (entry_succ_bb); if (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_ENTRY_END) @@ -12793,25 +12762,8 @@ expand_omp_target (struct omp_region *region) gcc_assert (tgtcopy_stmt != NULL); arg = DECL_ARGUMENTS (child_fn); - if (!gimple_in_ssa_p (cfun)) - { - gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg); - gsi_remove (&gsi, true); - } - else - { - tree lhs = gimple_assign_lhs (tgtcopy_stmt); - gcc_assert (SSA_NAME_VAR (lhs) == arg); - /* We'd like to set the rhs to the default def in the child_fn, - but it's too early to create ssa names in the child_fn. - Instead, we set the rhs to the parm. In - move_sese_region_to_fn, we introduce a default def for the - parm, map the parm to it's default def, and once we encounter - this stmt, replace the parm with the default def. */ - gimple_assign_set_rhs1 (tgtcopy_stmt, arg); - gcc_assert (ssa_default_def (cfun, arg) == NULL); - update_stmt (tgtcopy_stmt); - } + gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg); + gsi_remove (&gsi, true); } /* Declare local variables needed in CHILD_CFUN. */ @@ -12854,23 +12806,11 @@ expand_omp_target (struct omp_region *region) stmt = gimple_build_return (NULL); gsi_insert_after (&gsi, stmt, GSI_SAME_STMT); gsi_remove (&gsi, true); - - /* A vuse in single_succ (exit_bb) may use a vdef from the region - which is about to be split off. Mark the vdef for renaming. */ - release_first_vuse_in_edge_dest (single_succ_edge (exit_bb)); } /* Move the offloading region into CHILD_CFUN. 
*/ - if (gimple_in_ssa_p (cfun)) - { - init_tree_ssa (child_cfun); - init_ssa_operands (child_cfun); - child_cfun->gimple_df->in_ssa_p = true; - block = NULL_TREE; - } - else - block = gimple_block (entry_stmt); + block = gimple_block (entry_stmt); new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block); if (exit_bb) @@ -12937,8 +12877,6 @@ expand_omp_target (struct omp_region *region) if (changed) cleanup_tree_cfg (); } - if (gimple_in_ssa_p (cfun)) - update_ssa (TODO_update_ssa); if (flag_checking && !loops_state_satisfies_p (LOOPS_NEED_FIXUP)) verify_loop_structure (); pop_cfun (); @@ -13257,8 +13195,6 @@ expand_omp_target (struct omp_region *region) gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN); gsi_remove (&gsi, true); } - if (gimple_in_ssa_p (cfun)) -update_ssa (TODO_update_ssa_only_virtuals); } /* Expand the parallel region tree rooted at REGION. Expansion -- 1.9.1
Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
On Mon, Nov 09, 2015 at 08:52:13AM +0100, Uros Bizjak wrote: > On Sun, Nov 8, 2015 at 9:58 PM, Segher Boessenkool > wrote: > > On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote: > >> This patch stops combine from generating widening muls of anything else > >> but registers (immediates, memory, ...). This probably is a reasonable > >> tradeoff for all targets, even those (if any) that have such insns. > >> > >> > >I'll let you put it through it's paces on your setup :) > >> > >> > I'll let Segher give the final yes/no on this, but it generally looks > >> > good to me. > >> > >> It looks okay to me too. Testing now, combine patches have the tendency > >> to do unforeseen things on other targets ;-) > > > > Testing shows it makes a difference only very rarely. For many targets > > it makes no difference, for a few it is a small win. For 32-bit x86 it > > creates slightly bigger code. > > > > I think it looks good, but let's wait to hear Uros' opinion. > > >From the original patch submission, it looks that this patch would > also benefit x86_32. Yes, that is what I thought too. > Regarding the above code size increase - do you perhaps have a > testcase, to see what causes the difference? I could extract some. It happens quite rarely on usual code. > It isn't necessary due to > the patch, but perhaps some loads are moved to the insn and aren't > CSE'd anymore. Segher
Re: [hsa 4/12] OpenMP lowering/expansion changes (gridification)
Hi, On Thu, Nov 05, 2015 at 10:57:33PM +0100, Martin Jambor wrote: > ... > > For convenience of anybody reviewing the code, I'm attaching a very > simple testcase with selection of dumps that illustrate the whole > process. > My apologies, I have forgotten to attach the file, so let me quickly correct that now. The tar file consists of the source and a selection of dumps generated by a compilation with "-fopenmp -O -S -fdump-tree-all -fdump-tree-omplower-details" flags. Thanks, Martin plusone.tgz Description: application/compressed-tar
Re: Add a combined_fn enum
On 11/07/2015 01:22 PM, Richard Sandiford wrote: I'm working on a patch series that needs to be able to treat built-in functions and internal functions in a similar way. This patch adds a new enum, combined_fn, that combines the two together. It also adds utility functions for seeing which combined_fn (if any) is called by a given CALL_EXPR or gcall. Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. OK to install? I see it's already acked, but have you considered just doing away with the builtin/internal function distinction? Bernd
Re: Add a combined_fn enum
Bernd Schmidt writes: > On 11/07/2015 01:22 PM, Richard Sandiford wrote: >> I'm working on a patch series that needs to be able to treat built-in >> functions and internal functions in a similar way. This patch adds a >> new enum, combined_fn, that combines the two together. It also adds >> utility functions for seeing which combined_fn (if any) is called by >> a given CALL_EXPR or gcall. >> >> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. >> OK to install? > > I see it's already acked, but have you considered just doing away with > the builtin/internal function distinction? I think they're too different to be done away with entirely. built-in functions map directly to a specific C-level callable function and must have an fndecl, whereas no internal function should have an fndecl. Whether a built-in function is available depends on the selected language and what declarations the front-end has seen, while whether an internal function is available depends entirely on GCC internal information. We always have the option of not handling a built-in function specially and simply calling its function in the normal way, whereas for internal functions we always need to opencode the function in GCC. Thanks, Richard
[gomp4] Merge trunk r229831 (2015-11-06) into gomp-4_0-branch
Hi!

Committed to gomp-4_0-branch in r229996:

commit 16dbf91c2738167a38a4870c79015dfd619ef903
Merge: 38d123c 917fa02
Author: tschwinge
Date:   Mon Nov 9 09:07:26 2015 +

    svn merge -r 229816:229831 svn+ssh://gcc.gnu.org/svn/gcc/trunk

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229996 138bc75d-0d04-0410-961f-82ee72b054a4

Grüße
Thomas
Fix PR middle-end/68253
This apparently doesn't have any effect in practice, but valgrind rightfully complains, as reported by Martin L. Tested on x86_64-suse-linux, applied on the mainline as obvious.

2015-11-09  Eric Botcazou

	PR middle-end/68253
	* fold-const.c (fold_truth_andor_1): Initialize new variables to 0.

-- Eric Botcazou

Index: fold-const.c
===
--- fold-const.c (revision 229993)
+++ fold-const.c (working copy)
@@ -5561,6 +5561,7 @@ fold_truth_andor_1 (location_t loc, enum
       || (rcode != EQ_EXPR && rcode != NE_EXPR))
     return 0;
 
+  ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
   volatilep = 0;
   ll_inner = decode_field_reference (loc, ll_arg, &ll_bitsize, &ll_bitpos,
				     &ll_mode,
Re: Add a combined_fn enum
On 11/09/2015 11:24 AM, Richard Sandiford wrote: Bernd Schmidt writes: I see it's already acked, but have you considered just doing away with the builtin/internal function distinction? I think they're too different to be done away with entirely. built-in functions map directly to a specific C-level callable function and must have an fndecl, whereas no internal function should have an fndecl. Whether a built-in function is available depends on the selected language and what declarations the front-end has seen, while whether an internal function is available depends entirely on GCC internal information. Yes... but aren't these things fixable relatively easily (compared with what your patches are doing)? I also have the problem that I can't quite see where your patch series is going. Let's take "Add internal bitcount functions", it adds new internal functions but no users AFAICS. What is the end goal here (there doesn't seem to be a [0/N] description in my inbox)? Bernd
Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants
On 03/11/15 11:16, James Greenhalgh wrote: On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote: In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Hi, I have a small style comment below. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in the file, this convention points to the number of operands a pattern requires (for example add3). I think elsewhere in the file we use: "*aarch64_mul3_elt" "*aarch64_mul3_elt_" "*aarch64_mul3_elt_to_128df" "*aarch64_mul3_elt_to_64v2df" Is there a reason not to follow that pattern? Thanks, James Hi, I've made the changes you've requested - the pattern names have been changed to follow better the naming conventions elsewhere in the file. 
Thanks, Bilyan diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 269e00237bb1153ebf42505906ec5b760b04aafe..5ff19094b2fb10b332d186a6de02752b31ed4141 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2880,6 +2880,79 @@ [(set_attr "type" "neon_fp_mul_")] ) +;; fmulxq_lane_f32, and fmulx_laneq_f32 + +(define_insn "*aarch64_mulx_elt_" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (unspec:VDQSF + [(match_operand:VDQSF 1 "register_operand" "w") + (vec_duplicate:VDQSF + (vec_select: + (match_operand: 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, + INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul__scalar")] +) + +;; fmulxq_laneq_f32, fmulxq_laneq_f64, fmulx_lane_f32 + +(define_insn "*aarch64_mulx_elt" + [(set (match_operand:VDQF 0 "register_operand" "=w") + (unspec:VDQF + [(match_operand:VDQF 1 "register_operand" "w") + (vec_duplicate:VDQF + (vec_select: + (match_operand:VDQF 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul_")] +) + +;; fmulxq_lane_f64 + +(define_insn "*aarch64_mulx_elt_to_64v2df" + [(set (match_operand:V2DF 0 "register_operand" "=w") + (unspec:V2DF + [(match_operand:V2DF 1 "register_operand" "w") + (vec_duplicate:V2DF + (match_operand:DF 2 "register_operand" "w"))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +return "fmulx\t%0.2d, %1.2d, %2.d[0]"; + } + [(set_attr "type" "neon_fp_mul_d_scalar_q")] +) + +;; fmulxs_lane_f32, fmulxs_laneq_f32, fmulxd_lane_f64 == fmulx_lane_f64, +;; fmulxd_laneq_f64 == fmulx_laneq_f64 + +(define_insn "*aarch64_vgetfmulx" + [(set (match_operand: 0 
"register_operand" "=w") + (unspec: + [(match_operand: 1 "register_operand" "w") + (vec_select: + (match_operand:VDQF_DF 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")]))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "fmul")] +) ;; q (define_insn "aarch64_" diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 265c26679089f5a1b30a41360687fb1e3c2a8e50..eb65c009a430ad1ad5e0eabd50b4968617379b5e 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -8509,32 +8509,6 @@ vmulq_n_u32 (uint32x4_t a, uint32_t b) return result; } -#define vmulxq_lane_f32(a, b, c)\ - __extension__ \ -({ \ -
Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants + Changelog
On 09/11/15 11:03, Bilyan Borisov wrote: On 03/11/15 11:16, James Greenhalgh wrote: On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote: In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Hi, I have a small style comment below. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in the file, this convention points to the number of operands a pattern requires (for example add3). I think elsewhere in the file we use: "*aarch64_mul3_elt" "*aarch64_mul3_elt_" "*aarch64_mul3_elt_to_128df" "*aarch64_mul3_elt_to_64v2df" Is there a reason not to follow that pattern? Thanks, James Hi, I've made the changes you've requested - the pattern names have been changed to follow better the naming conventions elsewhere in the file. Thanks, Bilyan Hi, You can find the new updated Changelog for this patch below. Thanks, Bilyan --- In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. 
Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): Likewise. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Likewise. (vmulx_laneq_f32): New. (vmulx_laneq_f64): Likewise. (vmulxq_laneq_f32): Likewise. (vmulxq_laneq_f64): Likewise. (vmulxs_lane_f32): Likewise. (vmulxs_laneq_f32): Likewise. (vmulxd_lane_f64): Likewise. (vmulxd_laneq_f64): Likewise. * config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_, VDQSF): New pattern. (*aarch64_mulx_elt, VDQF): Likewise. (*aarch64_mulx_elt_to_64v2df): Likewise. (*aarch64_vgetfmulx, VDQF_DF): Likewise. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/simd/vmulx_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxd_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: New.
[patch] Fix PR middle-end/68251
Hi, I don't understand how this didn't show up sooner given the amount of testing, but there is a nasty overloading for the new REF_REVERSE_STORAGE_ORDER flag: #define REF_REVERSE_STORAGE_ORDER(NODE) \ (TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.u.bits.saturating_flag) saturating_flag is part of the 'bits' structure, which is itself part of the 'u' union, which also contains the dependence_info structure, and: #define MR_DEPENDENCE_CLIQUE(NODE) \ (TREE_CHECK2 (NODE, MEM_REF, TARGET_MEM_REF)->base.u.dependence_info.clique) #define MR_DEPENDENCE_BASE(NODE) \ (TREE_CHECK2 (NODE, MEM_REF, TARGET_MEM_REF)->base.u.dependence_info.base) So there is a conflict on MEM_REF nodes. Therefore the attached patch changes REF_REVERSE_STORAGE_ORDER to use the default_def_flag instead. Tested on x86_64-suse-linux, OK for the mainline? I'll install the Fortran testcase once it is reduced because it takes a while to compile ATM. 2015-11-09 Eric Botcazou PR middle-end/68251 * tree-core.h (REF_REVERSE_STORAGE_ORDER): Move around. * tree.h (REF_REVERSE_STORAGE_ORDER): Change to default_def_flag. * tree-streamer-in.c (unpack_ts_base_value_fields): Adjust. * tree-streamer-out.c (pack_ts_base_value_fields): Likewise. 
-- Eric BotcazouIndex: tree-core.h === --- tree-core.h (revision 229993) +++ tree-core.h (working copy) @@ -1157,9 +1157,6 @@ struct GTY(()) tree_base { TYPE_SATURATING in other types - REF_REVERSE_STORAGE_ORDER in - BIT_FIELD_REF, MEM_REF - VAR_DECL_IS_VIRTUAL_OPERAND in VAR_DECL @@ -1174,6 +1171,9 @@ struct GTY(()) tree_base { default_def_flag: + TYPE_FINAL_P in + RECORD_TYPE, UNION_TYPE and QUAL_UNION_TYPE + TYPE_VECTOR_OPAQUE in VECTOR_TYPE @@ -1183,8 +1183,8 @@ struct GTY(()) tree_base { DECL_NONLOCAL_FRAME in VAR_DECL - TYPE_FINAL_P in - RECORD_TYPE, UNION_TYPE and QUAL_UNION_TYPE + REF_REVERSE_STORAGE_ORDER in + BIT_FIELD_REF, MEM_REF */ struct GTY(()) tree_typed { Index: tree.h === --- tree.h (revision 229993) +++ tree.h (working copy) @@ -927,7 +927,7 @@ extern void omp_clause_range_check_faile But, of course, the storage order must be preserved when the accesses themselves are rewritten or transformed. */ #define REF_REVERSE_STORAGE_ORDER(NODE) \ - (TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.u.bits.saturating_flag) + (TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.default_def_flag) /* These flags are available for each language front end to use internally. 
*/ #define TREE_LANG_FLAG_0(NODE) \ Index: tree-streamer-in.c === --- tree-streamer-in.c (revision 229993) +++ tree-streamer-in.c (working copy) @@ -143,7 +143,10 @@ unpack_ts_base_value_fields (struct bitp TYPE_ADDR_SPACE (expr) = (unsigned) bp_unpack_value (bp, 8); } else if (TREE_CODE (expr) == BIT_FIELD_REF || TREE_CODE (expr) == MEM_REF) -REF_REVERSE_STORAGE_ORDER (expr) = (unsigned) bp_unpack_value (bp, 1); +{ + REF_REVERSE_STORAGE_ORDER (expr) = (unsigned) bp_unpack_value (bp, 1); + bp_unpack_value (bp, 8); +} else if (TREE_CODE (expr) == SSA_NAME) { SSA_NAME_IS_DEFAULT_DEF (expr) = (unsigned) bp_unpack_value (bp, 1); Index: tree-streamer-out.c === --- tree-streamer-out.c (revision 229993) +++ tree-streamer-out.c (working copy) @@ -117,7 +117,10 @@ pack_ts_base_value_fields (struct bitpac bp_pack_value (bp, TYPE_ADDR_SPACE (expr), 8); } else if (TREE_CODE (expr) == BIT_FIELD_REF || TREE_CODE (expr) == MEM_REF) -bp_pack_value (bp, REF_REVERSE_STORAGE_ORDER (expr), 1); +{ + bp_pack_value (bp, REF_REVERSE_STORAGE_ORDER (expr), 1); + bp_pack_value (bp, 0, 8); +} else if (TREE_CODE (expr) == SSA_NAME) { bp_pack_value (bp, SSA_NAME_IS_DEFAULT_DEF (expr), 1);
[PATCH] Fix memory leaks and use a pool_allocator
Hi. This is follow-up of changes that Richi started on Friday. Patch can bootstrap on x86_64-linux-pc and regression tests are running. Ready for trunk? Thanks, Martin >From ece1ad32be127c1e00dd18c8d0358bf842fad2bd Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 9 Nov 2015 10:49:14 +0100 Subject: [PATCH] Fix memory leaks and use a pool_allocator gcc/ChangeLog: 2015-11-09 Martin Liska * gcc.c (record_temp_file): Release name string. * ifcvt.c (noce_convert_multiple_sets): Release temporaries vector. * lra-lives.c (free_live_range_list): Utilize lra_live_range_pool for allocation and deallocation. (create_live_range): Likewise. (copy_live_range): Likewise. (lra_merge_live_ranges): Likewise. (remove_some_program_points_and_update_live_ranges): Likewise. (lra_create_live_ranges_1): Release point_freq_vec that can be not freed from previous iteration of the function. * tree-eh.c (lower_try_finally_switch): Release a vector. * tree-sra.c (sra_deinitialize): Release all vectors in base_access_vec. * tree-ssa-dom.c (free_edge_info): Make the function extern. * tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Release edge_info for a removed edge. (thread_through_all_blocks): Free region vector. * tree-ssa.h (free_edge_info): Declare function extern. --- gcc/gcc.c | 5 - gcc/ifcvt.c | 3 +++ gcc/lra-lives.c | 11 ++- gcc/tree-eh.c | 2 ++ gcc/tree-sra.c | 4 gcc/tree-ssa-dom.c | 2 +- gcc/tree-ssa-threadupdate.c | 6 +- gcc/tree-ssa.h | 1 + 8 files changed, 26 insertions(+), 8 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index bbc9b23..8bbf5be 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -2345,7 +2345,10 @@ record_temp_file (const char *filename, int always_delete, int fail_delete) struct temp_file *temp; for (temp = always_delete_queue; temp; temp = temp->next) if (! 
filename_cmp (name, temp->name)) - goto already1; + { + free (name); + goto already1; + } temp = XNEW (struct temp_file); temp->next = always_delete_queue; diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index fff62de..eb6b7df 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) set_used_flags (targets[i]); } + temporaries.release (); + set_used_flags (cond); set_used_flags (x); set_used_flags (y); @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) } num_updated_if_blocks++; + targets.release (); return TRUE; } diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c index 1655c47..87003be 100644 --- a/gcc/lra-lives.c +++ b/gcc/lra-lives.c @@ -103,7 +103,7 @@ free_live_range_list (lra_live_range_t lr) while (lr != NULL) { next = lr->next; - delete lr; + lra_live_range_pool.remove (lr); lr = next; } } @@ -112,7 +112,7 @@ free_live_range_list (lra_live_range_t lr) static lra_live_range_t create_live_range (int regno, int start, int finish, lra_live_range_t next) { - lra_live_range_t p = new lra_live_range; + lra_live_range_t p = lra_live_range_pool.allocate (); p->regno = regno; p->start = start; p->finish = finish; @@ -124,7 +124,7 @@ create_live_range (int regno, int start, int finish, lra_live_range_t next) static lra_live_range_t copy_live_range (lra_live_range_t r) { - return new lra_live_range (*r); + return new (lra_live_range_pool) lra_live_range (*r); } /* Copy live range list given by its head R and return the result. 
*/ @@ -167,7 +167,7 @@ lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2) r1->start = r2->start; lra_live_range_t temp = r2; r2 = r2->next; - delete temp; + lra_live_range_pool.remove (temp); } else { @@ -1081,7 +1081,7 @@ remove_some_program_points_and_update_live_ranges (void) } prev_r->start = r->start; prev_r->next = next_r; - delete r; + lra_live_range_pool.remove (r); } } } @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) dead_set = sparseset_alloc (max_regno); unused_set = sparseset_alloc (max_regno); curr_point = 0; + point_freq_vec.release (); point_freq_vec.create (get_max_uid () * 2); lra_point_freq = point_freq_vec.address (); int *post_order_rev_cfg = XNEWVEC (int, last_basic_block_for_fn (cfun)); diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c index 617d657..be140ee 100644 --- a/gcc/tree-eh.c +++ b/gcc/tree-eh.c @@ -1557,6 +1557,8 @@ lower_try_finally_switch (struct leh_state *state, struct leh_tf_state *tf) due to not wanting to process the same goto stmts twice. */ gimple_seq_add_stmt (&tf->top_p_seq, switch_stmt); gimple_seq_add_seq (&tf->top_p_seq, switch_body); + + case_label_vec.release (); } /* Decide whether or not we are going to duplicate the finally block. diff --git a/gcc
Re: [PATCH/RFC/RFA] Machine modes for address printing (all targets)
On Thu, 5 Nov 2015 11:22:04 +0100 Bernd Schmidt wrote:
> >  static void
> > -mcore_print_operand_address (FILE * stream, rtx x)
> > +mcore_print_operand_address (FILE * stream, machine_mode mode
> > ATTRIBUTE_UNUSED,
> > +rtx x)
>
> So apparently we're settling on writing the unused arg as just
> "machine_mode" without a name. Please change everywhere.
>
> > @@ -1754,7 +1754,7 @@ mmix_print_operand_punct_valid_p (unsign
> > /* TARGET_PRINT_OPERAND_ADDRESS. */
> >
> > static void
> > -mmix_print_operand_address (FILE *stream, rtx x)
> > +mmix_print_operand_address (FILE *stream, machine_mode mode, rtx x)
> > {
> >   if (REG_P (x))
> >     {
>
> The arg appears to be unused - I'd expect to see a warning here.

I've fixed those two, and a handful of other bits I missed.

> Other than that it looks OK. I'm not going to require that you test
> every target, but it would be good to have the full set built to cc1
> before and after, and please be on the lookout for fallout.

Thanks! I used the attached "build-all.sh" to test all the targets affected by the patch with "make all-gcc": those now all succeed (I'm sure I reinvented a wheel here, but perhaps the target list is useful to someone else).

Julian

ChangeLog

gcc/
	* final.c (output_asm_insn): Pass VOIDmode to output_address.
	(output_address): Add MODE argument.  Pass to print_operand_address
	hook.
	* targhooks.c (default_print_operand_address): Add MODE argument.
	* targhooks.h (default_print_operand_address): Update prototype.
	* output.h (output_address): Update prototype.
	* target.def (print_operand_address): Add MODE argument.
	* config/vax/vax.c (print_operand_address): Pass VOIDmode to
	output_address.
	(print_operand): Pass access mode to output_address.
	* config/mcore/mcore.c (mcore_print_operand_address): Add MODE
	argument.
	(mcore_print_operand): Update calls to mcore_print_operand_address.
	* config/fr30/fr30.c (fr30_print_operand): Pass VOIDmode to
	output_address.
* config/lm32/lm32.c (lm32_print_operand): Pass mode in calls to output_address. * config/tilegx/tilegx.c (output_memory_reference_mode): Remove global. (tilegx_print_operand): Don't set above global. Update calls to output_address. (tilegx_print_operand_address): Add MODE argument. Use instead of output_memory_reference_mode global. * config/frv/frv.c (frv_print_operand_address): Add MODE argument. (frv_print_operand): Pass mode to frv_print_operand_address calls. * config/mn10300/mn10300.c (mn10300_print_operand): Pass mode to output_address. * config/cris/cris.c (cris_print_operand_address): Add MODE argument. (cris_print_operand): Pass mode to output_address calls. * config/spu/spu.c (print_operand): Pass mode to output_address calls. * config/aarch64/aarch64.h (aarch64_print_operand) (aarch64_print_operand_address): Remove prototypes. * config/aarch64/aarch64.c (aarch64_memory_reference_mode): Delete global. (aarch64_print_operand): Make static. Update calls to output_address. (aarch64_print_operand_address): Add MODE argument. Use instead of aarch64_memory_reference_mode global. (TARGET_PRINT_OPERAND, TARGET_PRINT_OPERAND_ADDRESS): Define target hooks. * config/aarch64/aarch64.h (PRINT_OPERAND, PRINT_OPERAND_ADDRESS): Delete macro definitions. * config/pa/pa.c (pa_print_operand): Pass mode in output_address calls. * config/xtensa/xtensa.c (print_operand): Pass mode in output_address calls. * config/h8300/h8300.c (h8300_print_operand_address): Add MODE argument. (h83000_print_operand): Update calls to h8300_print_operand_address and output_address. * config/ia64/ia64.c (ia64_print_operand_address): Add MODE argument. * config/tilepro/tilepro.c (output_memory_reference_mode): Delete global. (tilepro_print_operand): Pass mode to output_address. (tilepro_print_operand_address): Add MODE argument. Use instead of output_memory_reference_mode. 
* config/nvptx/nvptx.c (output_decl_chunk, nvptx_assemble_integer) (nvptx_output_call_insn, nvptx_print_address_operand): Pass VOIDmode to output_address calls. (nvptx_print_operand_address): Add MODE argument. * config/alpha/alpha.c (print_operand): Pass mode argument in output_address calls. * config/m68k/m68k.c (print_operand): Pass mode argument in output_address call. * config/avr/avr.c (avr_print_operand_address): Add MODE argument. (avr_print_operand): Update calls to avr_print_operand_address. * config/sparc/sparc.c (sparc_print_operand_address): Add MODE argument. Update calls to output_address. (sparc_print_operand): Pass mode to output_address. * config/iq2000/iq2000.c (iq2000_print_operand_address): Add MODE argument. (iq2000_print_operand): Pass mode in output_address calls. * config/stormy16/stormy
[gomp4] Re: [openacc] tile, independent, default, private and firstprivate support in c/++
Hi! On Thu, 5 Nov 2015 18:10:49 -0800, Cesar Philippidis wrote: > I've applied this patch to trunk. It also includes the fortran and > template changes. [...] > Also, because of these reduction problems, I decided not to merge > combined_loops.f90 with combined-directives.f90 yet because the latter > relies on scanning which would fail with the errors detected during > gimplification. I'm planning on adding a couple more test cases [...] ACK. Merging Cesar's trunk r229832 into gomp-4_0-branch, I tried to replicate this part of his commit: > --- a/gcc/fortran/trans-openmp.c > +++ b/gcc/fortran/trans-openmp.c > @@ -3449,16 +3478,33 @@ gfc_trans_oacc_combined_directive (gfc_code *code) > [...] > + loop_clauses.lists[OMP_LIST_PRIVATE] > + = construct_clauses.lists[OMP_LIST_PRIVATE]; > + loop_clauses.lists[OMP_LIST_REDUCTION] > + = construct_clauses.lists[OMP_LIST_REDUCTION]; > [...] > + construct_clauses.lists[OMP_LIST_PRIVATE] = NULL; > [...] ... in the gcc/fortran/trans-openmp.c:gfc_filter_oacc_combined_clauses function that we're using on gomp-4_0-branch, but ran into ICEs that looked like "double free" (or similar; maybe related to the slightly different structure of this code due to device_type support), so I applied the following hack and XFAILs as part of the merge, which needs to be resolved later (on gomp-4_0-branch). (A few related TODOs also remain to be addressed in gcc/testsuite/c-c++-common/goacc/combined-directives.c.) 
--- gcc/fortran/trans-openmp.c +++ gcc/fortran/trans-openmp.c @@ -3691,12 +3691,14 @@ gfc_filter_oacc_combined_clauses (gfc_omp_clauses **orig_clauses, (*orig_clauses)->tile = false; (*loop_clauses)->tile_list = (*orig_clauses)->tile_list; (*orig_clauses)->tile_list = NULL; {+#if 0 /* TODO */+} (*loop_clauses)->lists[OMP_LIST_PRIVATE] = (*orig_clauses)->lists[OMP_LIST_PRIVATE]; (*orig_clauses)->lists[OMP_LIST_PRIVATE] = NULL; (*loop_clauses)->lists[OMP_LIST_REDUCTION] = (*orig_clauses)->lists[OMP_LIST_REDUCTION]; /* Don't reset (*orig_clauses)->lists[OMP_LIST_REDUCTION]. */ {+#endif+} (*loop_clauses)->device_types = (*orig_clauses)->device_types; --- gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 +++ gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 @@ -4,6 +4,9 @@ ! { dg-options "-fopenacc -fdump-tree-gimple" } ! TODO {+! Fix OMP_LIST_PRIVATE and OMP_LIST_REDUCTION splitting in+} {+! gcc/fortran/trans-openmp.c:gfc_filter_oacc_combined_clauses, and remove tree+} {+! scanning XFAILs.+} ! Enable and update tree scanning for reduction clauses. ! Enable/add/update device_type clauses and tree scanning. @@ -154,12 +157,12 @@ subroutine test ! !$acc end kernels loop end subroutine test ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" {+{ xfail *-*-* }+} } } ! 
{ dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" {+{ xfail *-*-* }+} } } ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } } With that, committed to gomp-4_0-branch in r230012: commit 80ebc1e747e3422e0cc57c4c11387ec26bbf8814 Merge: 175e08b ef014f9 Author: tschwinge Date: Mon Nov 9 11:10:12 2015 + svn merge -r 229831:229832 svn+ssh://gcc.gnu.org/svn/gcc/trunk git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230012 138bc75d-0d04-0410-961f-82ee72b054a4 Grüße Thomas
[PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
Hi all, In this PR we hit an ICE in simplify-rtx.c: #if TARGET_SUPPORTS_WIDE_INT == 0 /* This assert keeps the simplification from producing a result that cannot be represented in a CONST_DOUBLE but a lot of upstream callers expect that this function never fails to simplify something and so you if you added this to the test above the code would die later anyway. If this assert happens, you just need to make the port support wide int. */ gcc_assert (width <= HOST_BITS_PER_DOUBLE_INT); #endif The aarch64 port does not define TARGET_SUPPORTS_WIDE_INT. From what I understand, in order to define it we need to remove the instances in the backend where we use CONST_DOUBLE rtxes with VOIDmode (i.e. for storing CONST_INTs). From what I can see we don't create such instances, so this patch defines TARGET_SUPPORTS_WIDE_INT and cleans up a couple of places in the backend that try to hypothetically handle CONST_DOUBLEs as CONST_INTs if they are passed down as such from other places in the compiler. Bootstrapped and tested on aarch64-none-linux-gnu on trunk and GCC 5. No codegen differences on SPEC2006. Ok for trunk and GCC 5? Thanks, Kyrill 2015-11-09 Kyrylo Tkachov PR target/68129 * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): Delete VOIDmode case. Assert that mode is not VOIDmode. * config/aarch64/predicates.md (const0_operand): Remove const_double match. 2015-11-09 Kyrylo Tkachov PR target/68129 * gcc.target/aarch64/pr68129_1.c: New test. 
commit 10562c44766a57e4762e926f876f5457f9899e33 Author: Kyrylo Tkachov Date: Wed Oct 28 10:49:44 2015 + [AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 47ccc32..389bfc0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4401,11 +4401,10 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case CONST_DOUBLE: - /* CONST_DOUBLE can represent a double-width integer. - In this case, the mode of x is VOIDmode. */ - if (GET_MODE (x) == VOIDmode) - ; /* Do Nothing. */ - else if (aarch64_float_const_zero_rtx_p (x)) + /* Since we define TARGET_SUPPORTS_WIDE_INT we shouldn't ever + be getting CONST_DOUBLEs holding integers. */ + gcc_assert (GET_MODE (x) != VOIDmode); + if (aarch64_float_const_zero_rtx_p (x)) { fputc ('0', f); break; diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index b041a1e..0fac0a7 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -863,6 +863,8 @@ extern enum aarch64_code_model aarch64_cmodel; (aarch64_cmodel == AARCH64_CMODEL_TINY \ || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC) +#define TARGET_SUPPORTS_WIDE_INT 1 + /* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register. */ #define AARCH64_VALID_SIMD_DREG_MODE(MODE) \ ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \ diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 1bcbf62..8775460 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -32,7 +32,7 @@ (define_predicate "aarch64_call_insn_operand" ;; Return true if OP a (const_int 0) operand. 
(define_predicate "const0_operand" - (and (match_code "const_int, const_double") + (and (match_code "const_int") (match_test "op == CONST0_RTX (mode)"))) (define_predicate "aarch64_ccmp_immediate" diff --git a/gcc/testsuite/gcc.target/aarch64/pr68129_1.c b/gcc/testsuite/gcc.target/aarch64/pr68129_1.c new file mode 100644 index 000..112331e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr68129_1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fno-split-wide-types" } */ + +typedef int V __attribute__ ((vector_size (8 * sizeof (int)))); + +void +foo (V *p, V *q) +{ + *p = (*p == *q); +}
Re: Unreviewed patch
Jeff Law writes: > On 11/06/2015 06:29 AM, Rainer Orth wrote: >> The following patch has remained unreviewed for a month: >> >> [build] Support init priority on Solaris >> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00716.html >> >> It needs build and ia64 maintainers and someone familiar with the init >> priority support to review. > This is OK. Thanks for pinging -- I didn't have it in my queue of things > to look at, so it'd definitely fallen through the cracks. Thanks, and no worries: I'd been away for 3 weeks, so I didn't ping before. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH][AArch64][cleanup] Remove uses of CONST_DOUBLE_HIGH, CONST_DOUBLE_LOW
Hi all, From what I understand, we shouldn't encounter the case where CONST_DOUBLE is used to hold a large integer value, accessed through CONST_DOUBLE_LOW and CONST_DOUBLE_HIGH. The aarch64 backend doesn't generate any such rtxes and the midend shouldn't be passing any such rtxes either. This allows for a bit of cleanup in aarch64_simd_valid_immediate. Bootstrapped and tested on aarch64. Ok for trunk? Thanks, Kyrill P.S. I propose this patch separate from my other patch that defines TARGET_SUPPORTS_WIDE_INT for aarch64 (https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00934.html) as this is not needed to fix the ICE in that case and just removes some dead code and is thus not appropriate for backporting in my opinion. 2015-11-09 Kyrylo Tkachov * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Remove integer CONST_DOUBLE handling. It should never occur. commit 4b6c35c0f171a4841cb219e86b9d9ea752e2e849 Author: Kyrylo Tkachov Date: Thu Oct 29 10:28:12 2015 + [AArch64][cleanup] Remove uses of CONST_DOUBLE_HIGH, CONST_DOUBLE_LOW diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 389bfc0..cbdff44 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -10080,32 +10080,16 @@ aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse, it must be laid out in the vector register in reverse order. */ rtx el = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? 
(n_elts - 1 - i) : i); unsigned HOST_WIDE_INT elpart; - unsigned int part, parts; - if (CONST_INT_P (el)) -{ - elpart = INTVAL (el); - parts = 1; -} - else if (GET_CODE (el) == CONST_DOUBLE) -{ - elpart = CONST_DOUBLE_LOW (el); - parts = 2; -} - else -gcc_unreachable (); + gcc_assert (CONST_INT_P (el)); + elpart = INTVAL (el); + + for (unsigned int byte = 0; byte < innersize; byte++) + { + bytes[idx++] = (elpart & 0xff) ^ invmask; + elpart >>= BITS_PER_UNIT; + } - for (part = 0; part < parts; part++) -{ - unsigned int byte; - for (byte = 0; byte < innersize; byte++) -{ - bytes[idx++] = (elpart & 0xff) ^ invmask; - elpart >>= BITS_PER_UNIT; -} - if (GET_CODE (el) == CONST_DOUBLE) -elpart = CONST_DOUBLE_HIGH (el); -} } /* Sanity check. */
[PATCH][ARM] Use snprintf rather than sprintf where possible
Hi all, Judging by the thread at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01912.html I looked at replacing calls to sprintf with calls to snprintf in the arm backend. We use them a lot to print assembly mnemonics into static char buffers. This patch replaces the calls with snprintf and adds a size argument equal to the size of the buffer used. This way, if any of the format strings changes/increases past the size of the allocated buffer, snprintf will truncate it (and the assembler will catch it) rather than trying to write past the end of the buffer with unexpected results. I managed to replace all uses of sprintf in the arm backend except the one in aout.h: #define ASM_GENERATE_INTERNAL_LABEL(STRING, PREFIX, NUM) \ sprintf (STRING, "*%s%s%u", LOCAL_LABEL_PREFIX, PREFIX, (unsigned int)(NUM)) Here, ASM_GENERATE_INTERNAL_LABEL is used in various places in the midend to print labels to static buffers. I've seen those buffers have sizes ranging from 12 chars to 256 chars. The size of the buffer that ASM_GENERATE_INTERNAL_LABEL can depend on is not mandated in the documentation or passed down to the macro, so I think this is a bit dangerous. In practice, however, I don't think we print labels so long that it would cause an issue. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Kyrill 2015-11-09 Kyrylo Tkachov * config/arm/arm.c (arm_set_fixed_optab_libfunc): Use snprintf rather than sprintf. (arm_set_fixed_conv_libfunc): Likewise. (arm_option_override): Likewise. (neon_output_logic_immediate): Likewise. (neon_output_shift_immediate): Likewise. (arm_output_multireg_pop): Likewise. (vfp_output_vstmd): Likewise. (output_move_vfp): Likewise. (output_move_neon): Likewise. (output_return_instruction): Likewise. (arm_elf_asm_cdtor): Likewise. (arm_output_shift): Likewise. (arm_output_iwmmxt_shift_immediate): Likewise. (arm_output_iwmmxt_tinsr): Likewise. * config/arm/neon.md (*neon_mov, VDX): Likewise. (*neon_mov, VQXMOV): Likewise. 
(neon_vc_insn): Likewise. (neon_vc_insn_unspec): Likewise. commit bed313cb485dac52a5bb9984b55c9985af07ab35 Author: Kyrylo Tkachov Date: Wed Oct 21 14:40:07 2015 +0100 [ARM] Use snprintf rather than sprintf diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f86ebd9..fa34a7b 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2303,9 +2303,10 @@ arm_set_fixed_optab_libfunc (optab optable, machine_mode mode, char buffer[50]; if (num_suffix == 0) -sprintf (buffer, "__gnu_%s%s", funcname, modename); +snprintf (buffer, sizeof (buffer), "__gnu_%s%s", funcname, modename); else -sprintf (buffer, "__gnu_%s%s%d", funcname, modename, num_suffix); +snprintf (buffer, sizeof (buffer), "__gnu_%s%s%d", funcname, + modename, num_suffix); set_optab_libfunc (optable, mode, buffer); } @@ -2324,8 +2325,8 @@ arm_set_fixed_conv_libfunc (convert_optab optable, machine_mode to, && ALL_FRACT_MODE_P (from) == ALL_FRACT_MODE_P (to)) maybe_suffix_2 = "2"; - sprintf (buffer, "__gnu_%s%s%s%s", funcname, fromname, toname, - maybe_suffix_2); + snprintf (buffer, sizeof (buffer), "__gnu_%s%s%s%s", funcname, + fromname, toname, maybe_suffix_2); set_conv_libfunc (optable, to, from, buffer); } @@ -3036,7 +3037,8 @@ arm_option_override (void) if (!arm_selected_tune) arm_selected_tune = &all_cores[arm_selected_cpu->core]; - sprintf (arm_arch_name, "__ARM_ARCH_%s__", arm_selected_cpu->arch); + snprintf (arm_arch_name, sizeof (arm_arch_name), + "__ARM_ARCH_%s__", arm_selected_cpu->arch); insn_flags = arm_selected_cpu->flags; arm_base_arch = arm_selected_cpu->base_arch; @@ -12631,9 +12633,11 @@ neon_output_logic_immediate (const char *mnem, rtx *op2, machine_mode mode, gcc_assert (is_valid != 0); if (quad) -sprintf (templ, "%s.i%d\t%%q0, %%2", mnem, width); +snprintf (templ, sizeof (templ), "%s.i%d\t%%q0, %%2", + mnem, width); else -sprintf (templ, "%s.i%d\t%%P0, %%2", mnem, width); +snprintf (templ, sizeof (templ), "%s.i%d\t%%P0, %%2", + mnem, width); return templ; } @@ -12653,9 
+12657,11 @@ neon_output_shift_immediate (const char *mnem, char sign, rtx *op2, gcc_assert (is_valid != 0); if (quad) -sprintf (templ, "%s.%c%d\t%%q0, %%q1, %%2", mnem, sign, width); +snprintf (templ, sizeof (templ), + "%s.%c%d\t%%q0, %%q1, %%2", mnem, sign, width); else -sprintf (templ, "%s.%c%d\t%%P0, %%P1, %%2", mnem, sign, width); +snprintf (templ, sizeof (templ), + "%s.%c%d\t%%P0, %%P1, %%2", mnem, sign, width); return templ; } @@ -17672,9 +17678,7 @@ arm_output_multireg_pop (rtx *operands, bool return_pc, rtx cond, bool reverse, conditional = reverse ? "%?%D0" : "%?%d0"; if ((regno_base == SP_REGNUM) && update) -{ - sprintf (pattern, "pop%s\t{", conditional); -} +snprintf (patter
[AARCH64] Adding constant folding for __builtin_fmulx* with scalar 32 and 64 bit arguments
This patch adds an extension to aarch64_gimple_fold_builtin () that does constant folding on __builtin_fmulx* calls for 32 and 64 bit floating point scalar modes. We fold when both arguments are constant, as well as when only one is. The special cases of 0*inf, -0*inf, 0*-inf, and -0*-inf are also handled. The case for vector constant arguments will be dealt with in a future patch since the tests for that would be obscure and would unnecessarily complicate this patch. Added tests to check for proper handling of constant folding. Tested on targets aarch64-none-elf and aarch64_be-none-elf. --- gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin): Added constant folding. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/simd/vmulx.x: New. * gcc.target/aarch64/simd/vmulx_f64_2.c: Likewise. * gcc.target/aarch64/simd/vmulxd_f64_2.c: Likewise. * gcc.target/aarch64/simd/vmulxs_f32_2.c: Likewise. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index a1998ed550ac801e4d80baae122bf58e394a563f..339054d344900c942d5ce7c047479de3bbb4e61b 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -1362,7 +1362,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) if (fndecl) { int fcode = DECL_FUNCTION_CODE (fndecl); - int nargs = gimple_call_num_args (stmt); + unsigned nargs = gimple_call_num_args (stmt); tree *args = (nargs > 0 ? 
gimple_call_arg_ptr (stmt, 0) : &error_mark_node); @@ -1386,7 +1386,54 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) new_stmt = gimple_build_assign (gimple_call_lhs (stmt), REDUC_MIN_EXPR, args[0]); break; - + BUILTIN_GPF (BINOP, fmulx, 0) + { + gcc_assert (nargs == 2); + bool a0_cst_p = TREE_CODE (args[0]) == REAL_CST; + bool a1_cst_p = TREE_CODE (args[1]) == REAL_CST; + if (a0_cst_p || a1_cst_p) + { + if (a0_cst_p && a1_cst_p) + { + tree t0 = TREE_TYPE (args[0]); + real_value a0 = (TREE_REAL_CST (args[0])); + real_value a1 = (TREE_REAL_CST (args[1])); + if (real_equal (&a1, &dconst0)) + std::swap (a0, a1); + /* According to real_equal (), +0 equals -0. */ + if (real_equal (&a0, &dconst0) && real_isinf (&a1)) + { + real_value res = dconst2; + res.sign = a0.sign ^ a1.sign; + new_stmt = +gimple_build_assign (gimple_call_lhs (stmt), + REAL_CST, + build_real (t0, res)); + } + else + new_stmt = + gimple_build_assign (gimple_call_lhs (stmt), + MULT_EXPR, + args[0], args[1]); + } + else /* a0_cst_p ^ a1_cst_p. */ + { + real_value const_part = a0_cst_p + ? TREE_REAL_CST (args[0]) : TREE_REAL_CST (args[1]); + if (!real_equal (&const_part, &dconst0) + && !real_isinf (&const_part)) + new_stmt = + gimple_build_assign (gimple_call_lhs (stmt), + MULT_EXPR, args[0], args[1]); + } + } + if (new_stmt) + { + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + gimple_set_vdef (new_stmt, gimple_vdef (stmt)); + } + break; + } default: break; } diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x b/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x new file mode 100644 index ..8968a64a95cb40a466dd77fea4e9f9f63ad707dc --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x @@ -0,0 +1,46 @@ +#define PASS_ARRAY(...) 
{__VA_ARGS__} + +#define SETUP_TEST_CASE_VEC(I, INTRINSIC, BASE_TYPE, TYPE1, TYPE2, \ + VALS1, VALS2, EXPS, LEN, FM, Q_LD, Q_ST, \ + V1, V2) \ + do \ +{ \ + int i##I;\ + BASE_TYPE vec##I##_1_data[] = VALS1; \ + BASE_TYPE vec##I##_2_data[] = VALS2; \ + V1 TYPE1 vec##I##_1 = vld1##Q_LD##_##FM (vec##I##_1_data); \ + V2 TYPE2 vec##I##_2 = vld1##Q_LD##_##FM (vec##I##_2_data); \ + TYPE1 actual##I##_v = INTRINSIC (vec##I##_1, vec##I##_2); \ + volatile BASE_TYPE expected##I[] = EXPS;\ + BASE_TYPE actual##I[LEN]; \ + vst1##Q_ST##_##FM (actual##I, actual##I##_v);\ + for (i##I = 0; i##I < LEN; ++i##I) \ +if (actual##I[i##I] != expected##I[i##I])\ + abort ();\ +} \ + while (0)\ + +#define SETUP_TEST_CASE_SCALAR(I, INTRINSIC, TYPE, VAL1, VAL2, EXP) \ + do \ +{ \ + TYPE vec_##I##_1 = VAL1; \ + TYPE vec_##I##_2 = VAL2; \ + TYPE expected_##I = EXP; \ + volatile TYPE actual_##I = INTRINSIC (vec_##I##_1, vec_##I##_2); \ + if (actual_##I != expected_##I) \ +abort ();\ +} \ + while (0)\ + +/* Functions used to return values that won't be optimised away. */ +float32_t __attribute__ ((noinline)) +foo32 () +{ + return 1.0; +} + +float64_t __attribute__
Re: [patch] Fix PR middle-end/68251
On Mon, Nov 9, 2015 at 12:13 PM, Eric Botcazou wrote: > Hi, > > I don't understand how this didn't show up sooner given the amount of testing, > but there is a nasty overloading for the new REF_REVERSE_STORAGE_ORDER flag: > > #define REF_REVERSE_STORAGE_ORDER(NODE) \ >(TREE_CHECK2 (NODE, BIT_FIELD_REF, MEM_REF)->base.u.bits.saturating_flag) > > saturating_flag is part of the 'bits' structure, which is itself part of the > 'u' union, which also contains the dependence_info structure, and: > > #define MR_DEPENDENCE_CLIQUE(NODE) \ > (TREE_CHECK2 (NODE, MEM_REF, TARGET_MEM_REF)->base.u.dependence_info.clique) > #define MR_DEPENDENCE_BASE(NODE) \ > (TREE_CHECK2 (NODE, MEM_REF, TARGET_MEM_REF)->base.u.dependence_info.base) > > So there is a conflict on MEM_REF nodes. Therefore the attached patch changes > REF_REVERSE_STORAGE_ORDER to use the default_def_flag instead. > > Tested on x86_64-suse-linux, OK for the mainline? I'll install the Fortran > testcase once it is reduced because it takes a while to compile ATM. Looks good to me. I wonder where you store the info on TARGET_MEM_REFs though? Or is IVOPTs prohibited from messing with such refs (pessimizing them)? Thanks, Richard. > > 2015-11-09 Eric Botcazou > > PR middle-end/68251 > * tree-core.h (REF_REVERSE_STORAGE_ORDER): Move around. > * tree.h (REF_REVERSE_STORAGE_ORDER): Change to default_def_flag. > * tree-streamer-in.c (unpack_ts_base_value_fields): Adjust. > * tree-streamer-out.c (pack_ts_base_value_fields): Likewise. > > -- > Eric Botcazou
Re: [vec-cmp, patch 3/6] Vectorize comparison
On 26 Oct 16:09, Richard Biener wrote: > On Wed, Oct 14, 2015 at 6:12 PM, Ilya Enkovich wrote: > > + > > + ops.release (); > > + vec_defs.release (); > > No need to release auto_vec<>s at the end of scope explicitely. Fixed > > > + vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2); > > + new_stmt = gimple_build_assign (mask, vec_compare); > > + new_temp = make_ssa_name (mask, new_stmt); > > + gimple_assign_set_lhs (new_stmt, new_temp); > > new_temp = make_ssa_name (mask); > gimple_build_assign (new_temp, code, vec_rhs1, vec_rhs2); > > for the 4 stmts above. Fixed > > > + > > + vec_oprnds0.release (); > > + vec_oprnds1.release (); > > Please use auto_vec<>s. These are used to hold vecs returned by vect_get_slp_defs. Thus can't use auto_vec. > > Ok with those changes. > > RIchard. > gcc/ 2015-11-09 Ilya Enkovich * tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var. (vect_create_destination_var): Likewise. * tree-vect-stmts.c (vectorizable_comparison): New. (vect_analyze_stmt): Add vectorizable_comparison. (vect_transform_stmt): Likewise. * tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var. (enum stmt_vec_info_type): Add comparison_vec_info_type. (vectorizable_comparison): New. diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 11bce79..926752b 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -3790,6 +3790,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind var_kind, const char *name) case vect_scalar_var: prefix = "stmp"; break; + case vect_mask_var: +prefix = "mask"; +break; case vect_pointer_var: prefix = "vectp"; break; @@ -4379,7 +4382,11 @@ vect_create_destination_var (tree scalar_dest, tree vectype) tree type; enum vect_var_kind kind; - kind = vectype ? vect_simple_var : vect_scalar_var; + kind = vectype +? VECTOR_BOOLEAN_TYPE_P (vectype) +? vect_mask_var +: vect_simple_var +: vect_scalar_var; type = vectype ? 
vectype : TREE_TYPE (scalar_dest); gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME); diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index f1216c8..ee549f4 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -7416,6 +7416,185 @@ vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi, return true; } +/* vectorizable_comparison. + + Check if STMT is comparison expression that can be vectorized. + If VEC_STMT is also passed, vectorize the STMT: create a vectorized + comparison, put it in VEC_STMT, and insert it at GSI. + + Return FALSE if not a vectorizable STMT, TRUE otherwise. */ + +bool +vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, +gimple **vec_stmt, tree reduc_def, +slp_tree slp_node) +{ + tree lhs, rhs1, rhs2; + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + tree vectype1 = NULL_TREE, vectype2 = NULL_TREE; + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE; + tree new_temp; + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type}; + unsigned nunits; + int ncopies; + enum tree_code code; + stmt_vec_info prev_stmt_info = NULL; + int i, j; + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); + vec vec_oprnds0 = vNULL; + vec vec_oprnds1 = vNULL; + gimple *def_stmt; + tree mask_type; + tree mask; + + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) +return false; + + mask_type = vectype; + nunits = TYPE_VECTOR_SUBPARTS (vectype); + + if (slp_node || PURE_SLP_STMT (stmt_info)) +ncopies = 1; + else +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; + + gcc_assert (ncopies >= 1); + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) +return false; + + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def + && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle + && reduc_def)) +return false; + + if (STMT_VINFO_LIVE_P (stmt_info)) +{ + if (dump_enabled_p ()) + dump_printf_loc 
(MSG_MISSED_OPTIMIZATION, vect_location, +"value used after loop.\n"); + return false; +} + + if (!is_gimple_assign (stmt)) +return false; + + code = gimple_assign_rhs_code (stmt); + + if (TREE_CODE_CLASS (code) != tcc_comparison) +return false; + + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + + if (!vect_is_simple_use (rhs1, stmt_info->vinfo, &def_stmt, + &dts[0], &vectype1)) +return false; + + if (!vect_is_simple_use (rhs2, stmt_info->vinfo, &def_stmt, + &dts[1], &vectype2)) +return false; + + if (vectype1 && vectype2 + && TYPE_VECTOR_SUBPARTS (vectype1) !
Re: [patch] Fix PR middle-end/68251
> Looks good to me. Thanks. > I wonder where you store the info on TARGET_MEM_REFs though? Or > is IVOPTs prohibited from messing with such refs (pessimizing them)? Yes, the latter: * tree-ssa-loop-ivopts.c (may_be_nonaddressable_p) : New. Return true if reverse storage order. : Likewise. : Likewise. : Likewise. : Likewise. (split_address_cost): Likewise. Bail out if reverse storage order. -- Eric Botcazou
Re: [vec-cmp, patch 4/6] Support vector mask invariants
On 26 Oct 16:21, Richard Biener wrote: > On Wed, Oct 14, 2015 at 6:13 PM, Ilya Enkovich wrote: > > - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); > > + { > > + /* Can't use VIEW_CONVERT_EXPR for booleans because > > +of possibly different sizes of scalar value and > > +vector element. */ > > + if (VECTOR_BOOLEAN_TYPE_P (type)) > > + { > > + if (integer_zerop (val)) > > + val = build_int_cst (TREE_TYPE (type), 0); > > + else if (integer_onep (val)) > > + val = build_int_cst (TREE_TYPE (type), 1); > > + else > > + gcc_unreachable (); > > + } > > + else > > + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); > > I think the existing code is fine with using fold_convert () here > which should also work > for the boolean types. So does just > > val = fold_convert (TREE_TYPE (type), val); > > work? It seems to work OK. > > > @@ -7428,13 +7459,13 @@ vectorizable_condition (gimple *stmt, > > gimple_stmt_iterator *gsi, > > gimple *gtemp; > > vec_cond_lhs = > > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), > > - stmt, NULL); > > + stmt, NULL, comp_vectype); > > vect_is_simple_use (TREE_OPERAND (cond_expr, 0), stmt, > > loop_vinfo, >emp, &def, &dts[0]); > > > > vec_cond_rhs = > > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), > > - stmt, NULL); > > + stmt, NULL, comp_vectype); > > vect_is_simple_use (TREE_OPERAND (cond_expr, 1), stmt, > > loop_vinfo, >emp, &def, &dts[1]); > > I still don't like this very much but I guess without some major > refactoring of all > the functions there isn't a better way to do it for now. > > Thus, ok with trying the change suggested above. > > Thanks, > Richard. > Here is an updated version. Thanks, Ilya -- gcc/ 2015-11-09 Ilya Enkovich * expr.c (const_vector_mask_from_tree): New. (const_vector_from_tree): Use const_vector_mask_from_tree for boolean vectors. * tree-vect-stmts.c (vect_init_vector): Support boolean vector invariants. (vect_get_vec_def_for_operand): Add VECTYPE arg. 
(vectorizable_condition): Directly provide vectype for invariants used in comparison. * tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE arg. diff --git a/gcc/expr.c b/gcc/expr.c index 2b2174f..03936ee 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -11423,6 +11423,40 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range, return 1; } +/* Return a CONST_VECTOR rtx representing vector mask for + a VECTOR_CST of booleans. */ +static rtx +const_vector_mask_from_tree (tree exp) +{ + rtvec v; + unsigned i; + int units; + tree elt; + machine_mode inner, mode; + + mode = TYPE_MODE (TREE_TYPE (exp)); + units = GET_MODE_NUNITS (mode); + inner = GET_MODE_INNER (mode); + + v = rtvec_alloc (units); + + for (i = 0; i < VECTOR_CST_NELTS (exp); ++i) +{ + elt = VECTOR_CST_ELT (exp, i); + + gcc_assert (TREE_CODE (elt) == INTEGER_CST); + if (integer_zerop (elt)) + RTVEC_ELT (v, i) = CONST0_RTX (inner); + else if (integer_onep (elt) + || integer_minus_onep (elt)) + RTVEC_ELT (v, i) = CONSTM1_RTX (inner); + else + gcc_unreachable (); +} + + return gen_rtx_CONST_VECTOR (mode, v); +} + /* Return a CONST_VECTOR rtx for a VECTOR_CST tree. 
*/ static rtx const_vector_from_tree (tree exp) @@ -11438,6 +11472,9 @@ const_vector_from_tree (tree exp) if (initializer_zerop (exp)) return CONST0_RTX (mode); + if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) +return const_vector_mask_from_tree (exp); + units = GET_MODE_NUNITS (mode); inner = GET_MODE_INNER (mode); diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index ee549f4..af203ab 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -1300,7 +1300,7 @@ vect_init_vector (gimple *stmt, tree val, tree type, gimple_stmt_iterator *gsi) if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val))) { if (CONSTANT_CLASS_P (val)) - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); + val = fold_convert (TREE_TYPE (type), val); else { new_temp = make_ssa_name (TREE_TYPE (type)); @@ -1328,16 +1328,18 @@ vect_init_vector (gimple *stmt, tree val, tree type, gimple_stmt_iterator *gsi) STMT_VINFO_VEC_STMT of the defining st
Re: [PATCH] Fix memory leaks and use a pool_allocator
On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote: > Hi. > > This is follow-up of changes that Richi started on Friday. > > Patch can bootstrap on x86_64-linux-pc and regression tests are running. > > Ready for trunk? * tree-ssa-dom.c (free_edge_info): Make the function extern. ... * tree-ssa.h (free_edge_info): Declare function extern. declare this in tree-ssa-threadupdate.h instead and renaming it to sth less "public", like free_dom_edge_info. diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index fff62de..eb6b7df 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) set_used_flags (targets[i]); } + temporaries.release (); + set_used_flags (cond); set_used_flags (x); set_used_flags (y); @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) } num_updated_if_blocks++; + targets.release (); return TRUE; suspiciously look like candidates for an auto_vec<> (didn't check). @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) dead_set = sparseset_alloc (max_regno); unused_set = sparseset_alloc (max_regno); curr_point = 0; + point_freq_vec.release (); point_freq_vec.create (get_max_uid () * 2); a truncate (0) instead of a release () should be cheaper, avoiding the re-allocation. @@ -674,6 +674,10 @@ sra_deinitialize (void) assign_link_pool.release (); obstack_free (&name_obstack, NULL); + for (hash_map >::iterator it = + base_access_vec->begin (); it != base_access_vec->end (); ++it) +(*it).second.release (); + delete base_access_vec; I wonder if the better fix is to provide a proper free method for the hash_map? A hash_map with 'auto_vec' looks suspicous - eventually a proper release was intented here via default_hash_map_traits <>? Anyway, most of the things above can be improved as followup of course. Thanks, Richard. > Thanks, > Martin
[PATCH][optabs][ifcvt][1/3] Define negcc, notcc optabs
Hi all, This is a rebase of the patch I posted at: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00154.html The patch has been ok'd by Jeff but I wanted to hold off committing it until my fixes for the ifcvt regressions on sparc and x86_64 were fixed. The rebase conflicts were due to Richard's optabs splitting patch. I've also noticed that in my original patch I had a comparison of branch cost with the magic number '2'. I removed it from this version as it's not really meaningful. The transformation this patch enables is, at the moment, only supported for arm and aarch64 where it is always beneficial. If/when we have a proper ifcvt costing model (perhaps for GCC 7?) we'll update this accordingly if needed. Jeff, sorry for taking so long to commit this, I just wanted to fix the other ifcvt fallout before proceeding with more new functionality. I have also uncovered a bug in the arm implementation of these optabs (patch 3/3 in the series), so I'll post an updated version of that patch as well soon. Ok to commit this updated version instead? Bootstrapped and tested on arm, aarch64 and x86_64. It has been sitting in my tree for a couple of months now with no issues. Thanks, Kyrill 2015-11-09 Kyrylo Tkachov * ifcvt.c (noce_try_inverse_constants): New function. (noce_process_if_block): Call it. * optabs.h (emit_conditional_neg_or_complement): Declare prototype. * optabs.def (negcc_optab, notcc_optab): Declare. * optabs.c (emit_conditional_neg_or_complement): New function. * doc/tm.texi (Standard Names): Document negcc, notcc names. commit 93cd987e9ab02ac68b44b2470bb5c4c6345efeca Author: Kyrylo Tkachov Date: Thu Aug 13 18:14:52 2015 +0100 [optabs][ifcvt][1/3] Define negcc, notcc optabs diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 619259f..c4e43f3 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5791,6 +5791,21 @@ move operand 2 or (operands 2 + operand 3) into operand 0 according to the comparison in operand 1. 
If the comparison is false, operand 2 is moved into operand 0, otherwise (operand 2 + operand 3) is moved. +@cindex @code{neg@var{mode}cc} instruction pattern +@item @samp{neg@var{mode}cc} +Similar to @samp{mov@var{mode}cc} but for conditional negation. Conditionally +move the negation of operand 2 or the unchanged operand 3 into operand 0 +according to the comparison in operand 1. If the comparison is true, the negation +of operand 2 is moved into operand 0, otherwise operand 3 is moved. + +@cindex @code{not@var{mode}cc} instruction pattern +@item @samp{not@var{mode}cc} +Similar to @samp{neg@var{mode}cc} but for conditional complement. +Conditionally move the bitwise complement of operand 2 or the unchanged +operand 3 into operand 0 according to the comparison in operand 1. +If the comparison is true, the complement of operand 2 is moved into +operand 0, otherwise operand 3 is moved. + @cindex @code{cstore@var{mode}4} instruction pattern @item @samp{cstore@var{mode}4} Store zero or nonzero in operand 0 according to whether a comparison diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 157a716..1e773d8 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1179,6 +1179,83 @@ noce_try_store_flag (struct noce_if_info *if_info) } } + +/* Convert "if (test) x = -A; else x = A" into + x = A; if (test) x = -x if the machine can do the + conditional negate form of this cheaply. + Try this before noce_try_cmove that will just load the + immediates into two registers and do a conditional select + between them. If the target has a conditional negate or + conditional invert operation we can save a potentially + expensive constant synthesis. 
*/ + +static bool +noce_try_inverse_constants (struct noce_if_info *if_info) +{ + if (!noce_simple_bbs (if_info)) +return false; + + if (!CONST_INT_P (if_info->a) + || !CONST_INT_P (if_info->b) + || !REG_P (if_info->x)) +return false; + + machine_mode mode = GET_MODE (if_info->x); + + HOST_WIDE_INT val_a = INTVAL (if_info->a); + HOST_WIDE_INT val_b = INTVAL (if_info->b); + + rtx cond = if_info->cond; + + rtx x = if_info->x; + rtx target; + + start_sequence (); + + rtx_code code; + if (val_b != HOST_WIDE_INT_MIN && val_a == -val_b) +code = NEG; + else if (val_a == ~val_b) +code = NOT; + else +{ + end_sequence (); + return false; +} + + rtx tmp = gen_reg_rtx (mode); + noce_emit_move_insn (tmp, if_info->a); + + target = emit_conditional_neg_or_complement (x, code, mode, cond, tmp, tmp); + + if (target) +{ + rtx_insn *seq = get_insns (); + + if (!seq) + { + end_sequence (); + return false; + } + + if (target != if_info->x) + noce_emit_move_insn (if_info->x, target); + + seq = end_ifcvt_sequence (if_info); + + if (!seq) + return false; + + emit_insn_before_setloc (seq, if_info->jump, + INSN_LOCATION (if_info->insn_a)); + return true; +} + + end_sequence (); + return false; +} + + /* Convert "if (test) x =
GCC 6 Status Report (2015-11-09)
Status
======

We've pushed back the switch to Stage 3 to the end of Saturday,
Nov 14th.  This is to allow smooth draining of review queues.

Quality Data
============

Priority        #   Change from last report
--------      ---   -----------------------
P1              4   +  2
P2             84
P3            130   + 10
P4             83   -  5
P5             32
--------      ---   -----------------------
Total P1-P3   218   + 12
Total         333   +  7

Previous Report
===============

https://gcc.gnu.org/ml/gcc/2015-10/msg00113.html
Re: [PATCH] Fix memory leaks and use a pool_allocator
On Mon, Nov 09, 2015 at 01:11:48PM +0100, Richard Biener wrote: > On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote: > > Hi. > > > > This is follow-up of changes that Richi started on Friday. > > > > Patch can bootstrap on x86_64-linux-pc and regression tests are running. > > > > Ready for trunk? > > * tree-ssa-dom.c (free_edge_info): Make the function extern. > ... > * tree-ssa.h (free_edge_info): Declare function extern. > > declare this in tree-ssa-threadupdate.h instead and renaming it to > sth less "public", like free_dom_edge_info. > > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c > index fff62de..eb6b7df 100644 > --- a/gcc/ifcvt.c > +++ b/gcc/ifcvt.c > @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) >set_used_flags (targets[i]); > } > > + temporaries.release (); > + >set_used_flags (cond); >set_used_flags (x); >set_used_flags (y); > @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) > } > >num_updated_if_blocks++; > + targets.release (); >return TRUE; > > suspiciously look like candidates for an auto_vec<> (didn't check). I was about to say the same thing after a little checking (maybe the region one in tree-ssa-threadupdate.c to, but didn't check that) > > @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) >dead_set = sparseset_alloc (max_regno); >unused_set = sparseset_alloc (max_regno); >curr_point = 0; > + point_freq_vec.release (); >point_freq_vec.create (get_max_uid () * 2); > > a truncate (0) instead of a release () should be cheaper, avoiding the > re-allocation. yeah, or even change it to just grow the array, afaict it doesn't expect the array to be cleared? 
> @@ -674,6 +674,10 @@ sra_deinitialize (void) >assign_link_pool.release (); >obstack_free (&name_obstack, NULL); > > + for (hash_map >::iterator it = > + base_access_vec->begin (); it != base_access_vec->end (); ++it) > +(*it).second.release (); > + >delete base_access_vec; > > I wonder if the better fix is to provide a proper free method for the > hash_map? > A hash_map with 'auto_vec' looks suspicous - eventually a proper release > was intented here via default_hash_map_traits <>? in fact I would expect that already works, but apparently not, so I'd say that's the bug. Trev > > Anyway, most of the things above can be improved as followup of course. > > Thanks, > Richard. > > > Thanks, > > Martin
Re: [PATCH] S/390: Fix warning in "*movstr" pattern.
On 11/04/2015 02:39 AM, Dominik Vogt wrote: > On Tue, Nov 03, 2015 at 06:47:28PM +0100, Ulrich Weigand wrote: >> Dominik Vogt wrote: >> >>> @@ -2936,7 +2936,7 @@ >>> (set (mem:BLK (match_operand:P 1 "register_operand" "0")) >>> (mem:BLK (match_operand:P 3 "register_operand" "2"))) >>> (set (match_operand:P 0 "register_operand" "=d") >>> - (unspec [(mem:BLK (match_dup 1)) >>> + (unspec:P [(mem:BLK (match_dup 1)) >>> (mem:BLK (match_dup 3)) >>> (reg:SI 0)] UNSPEC_MVST)) >>> (clobber (reg:CC CC_REGNUM))] >> >> Don't you have to change the expander too? Otherwise the >> pattern will no longer match ... > > Yes, you're right. This turned out to be a bit tricky to do > because the "movstr" expander doesn't allow variants with > different modes. :-/ > > New patch attached, including a test case that works on 31-bit and > 64-bit. Could you please check that the generated code doesn't change with a larger code base (e.g. speccpu)? It should not affect it but I really think we omitted the mode here for a reason (although I don't remember why). -Andreas-
[patch] backport PIE support for FreeBSD to gcc-49
Hi, any objections that I apply this patch to gcc-4.9? It is FreeBSD only. TIA, Andreas 2015-11-09 Andreas Tobler Backport from mainline 2015-05-18 Andreas Tobler * config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build pie executables. (FBSD_ENDFILE_SPEC): Likewise. * config/i386/freebsd.h (STARTFILE_SPEC): Remove and use the one from config/freebsd-spec.h. (ENDFILE_SPEC): Likewise. 2015-11-02 Andreas Tobler * config/rs6000/freebsd64.h (ASM_SPEC32): Adust spec to handle PIE executables. Index: gcc/config/freebsd-spec.h === --- gcc/config/freebsd-spec.h (revision 230016) +++ gcc/config/freebsd-spec.h (working copy) @@ -66,8 +66,9 @@ "%{!shared: \ %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ %{!p:%{profile:gcrt1.o%s} \ -%{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s}" +%{!profile: \ +%{pie: Scrt1.o%s;:crt1.o%s} \ + crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}" /* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on the magical crtend.o file (see crtstuff.c) which provides part of @@ -76,7 +77,7 @@ `crtn.o'. */ #define FBSD_ENDFILE_SPEC \ - "%{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s" + "%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s" /* Provide a LIB_SPEC appropriate for FreeBSD as configured and as required by the user-land thread model. Before __FreeBSD_version Index: gcc/config/i386/freebsd.h === --- gcc/config/i386/freebsd.h (revision 230016) +++ gcc/config/i386/freebsd.h (working copy) @@ -59,29 +59,16 @@ #define SUBTARGET_EXTRA_SPECS \ { "fbsd_dynamic_linker", FBSD_DYNAMIC_LINKER } -/* Provide a STARTFILE_SPEC appropriate for FreeBSD. Here we add - the magical crtbegin.o file (see crtstuff.c) which provides part - of the support for getting C++ file-scope static object constructed - before entering `main'. 
*/ - -#undef STARTFILE_SPEC -#define STARTFILE_SPEC \ - "%{!shared: \ - %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ - %{!p:%{profile:gcrt1.o%s} \ -%{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s}" +/* Use the STARTFILE_SPEC from config/freebsd-spec.h. */ -/* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on - the magical crtend.o file (see crtstuff.c) which provides part of - the support for getting C++ file-scope static object constructed - before entering `main', followed by a normal "finalizer" file, - `crtn.o'. */ +#undef STARTFILE_SPEC +#define STARTFILE_SPEC FBSD_STARTFILE_SPEC -#undef ENDFILE_SPEC -#define ENDFILE_SPEC \ - "%{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s" +/* Use the ENDFILE_SPEC from config/freebsd-spec.h. */ +#undef ENDFILE_SPEC +#define ENDFILE_SPEC FBSD_ENDFILE_SPEC + /* Provide a LINK_SPEC appropriate for FreeBSD. Here we provide support for the special GCC options -static and -shared, which allow us to link things in one of these three modes by applying the appropriate Index: gcc/config/rs6000/freebsd64.h === --- gcc/config/rs6000/freebsd64.h (revision 230016) +++ gcc/config/rs6000/freebsd64.h (working copy) @@ -130,7 +130,7 @@ #defineLINK_OS_FREEBSD_SPEC "%{m32:%(link_os_freebsd_spec32)}%{!m32:%(link_os_freebsd_spec64)}" #define ASM_SPEC32 "-a32 \ -%{mrelocatable} %{mrelocatable-lib} %{fpic:-K PIC} %{fPIC:-K PIC} \ +%{mrelocatable} %{mrelocatable-lib} %{fpic|fpie|fPIC|fPIE:-K PIC} \ %{memb} %{!memb: %{msdata=eabi: -memb}} \ %{!mlittle: %{!mlittle-endian: %{!mbig: %{!mbig-endian: \ %{mcall-freebsd: -mbig} \
[patch] backport remove of soft float for FreeBSD powerpc for gcc-4.9
Hi all, any objections when I apply the below patch to gcc-4.9? TIA, Andreas 2015-11-09 Andreas Tobler Backport from mainline 2015-03-04 Andreas Tobler * config/rs6000/t-freebsd64: Remove 32-bit soft-float multilibs. Index: gcc/config/rs6000/t-freebsd64 === --- gcc/config/rs6000/t-freebsd64 (revision 230016) +++ gcc/config/rs6000/t-freebsd64 (working copy) @@ -21,11 +21,9 @@ # On FreeBSD the 32-bit libraries are found under /usr/lib32. # Set MULTILIB_OSDIRNAMES according to this. -MULTILIB_OPTIONS= m32 msoft-float -MULTILIB_DIRNAMES = 32 nof +MULTILIB_OPTIONS= m32 +MULTILIB_DIRNAMES = 32 MULTILIB_EXTRA_OPTS = fPIC mstrict-align MULTILIB_EXCEPTIONS = -MULTILIB_EXCLUSIONS = !m32/msoft-float MULTILIB_OSDIRNAMES= ../lib32 -#MULTILIB_MATCHES= $(MULTILIB_MATCHES_FLOAT)
[PATCH] Improve BB vectorization dependence analysis
Currently BB vectorization computes all dependences inside a BB region and fails all vectorization if it cannot handle some of them. This is obviously not needed - BB vectorization can restrict the dependence tests to those that are needed to apply the load/store motion effectively performed by the vectorization (sinking all participating loads/stores to the place of the last one). With restructuring it that way it's also easy to not give up completely but only for the SLP instance we cannot vectorize (this gives a slight bump in my SPEC CPU 2006 testing to 756 vectorized basic block regions). But first and foremost this patch is to reduce the dependence analysis cost and somewhat mitigate the compile-time effects of the first patch. For fixing PR56118 only a cost model issue remains. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-11-09 Richard Biener PR tree-optimization/56118 * tree-vectorizer.h (vect_find_last_scalar_stmt_in_slp): Declare. * tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Export. * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): New function. (vect_slp_analyze_data_ref_dependences): Instead of computing all dependences of the region DRs just analyze the code motions SLP vectorization will perform. Remove SLP instances that cannot have their store/load motions applied. (vect_analyze_data_refs): Allow DRs without a vectype in BB vectorization. * gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Adjust. Index: gcc/tree-vectorizer.h === *** gcc/tree-vectorizer.h.orig 2015-11-09 11:01:55.688175321 +0100 --- gcc/tree-vectorizer.h 2015-11-09 11:02:18.987432840 +0100 *** extern void vect_detect_hybrid_slp (loop *** 1075,1080 --- 1075,1081 extern void vect_get_slp_defs (vec , slp_tree, vec > *, int); extern bool vect_slp_bb (basic_block); + extern gimple *vect_find_last_scalar_stmt_in_slp (slp_tree); /* In tree-vect-patterns.c. */ /* Pattern recognition functions. 
Index: gcc/tree-vect-data-refs.c === *** gcc/tree-vect-data-refs.c.orig 2015-11-09 10:22:33.140125722 +0100 --- gcc/tree-vect-data-refs.c 2015-11-09 11:33:05.503874719 +0100 *** vect_slp_analyze_data_ref_dependence (st *** 581,586 --- 581,629 } + /* Analyze dependences involved in the transform of SLP NODE. */ + + static bool + vect_slp_analyze_node_dependences (slp_instance instance, slp_tree node) + { + /* This walks over all stmts involved in the SLP load/store done + in NODE verifying we can sink them up to the last stmt in the + group. */ + gimple *last_access = vect_find_last_scalar_stmt_in_slp (node); + for (unsigned k = 0; k < SLP_INSTANCE_GROUP_SIZE (instance); ++k) + { + gimple *access = SLP_TREE_SCALAR_STMTS (node)[k]; + if (access == last_access) + continue; + stmt_vec_info access_stmt_info = vinfo_for_stmt (access); + gimple_stmt_iterator gsi = gsi_for_stmt (access); + gsi_next (&gsi); + for (; gsi_stmt (gsi) != last_access; gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + if (!STMT_VINFO_DATA_REF (stmt_info) + || (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)) + && DR_IS_READ (STMT_VINFO_DATA_REF (access_stmt_info + continue; + + ddr_p ddr = initialize_data_dependence_relation + (STMT_VINFO_DATA_REF (access_stmt_info), + STMT_VINFO_DATA_REF (stmt_info), vNULL); + if (vect_slp_analyze_data_ref_dependence (ddr)) + { + /* ??? If the dependence analysis failed we can resort to the +alias oracle which can handle more kinds of stmts. */ + free_dependence_relation (ddr); + return false; + } + free_dependence_relation (ddr); + } + } + return true; + } + + /* Function vect_analyze_data_ref_dependences. 
Examine all the data references in the basic-block, and make sure there *** vect_slp_analyze_data_ref_dependence (st *** 590,610 bool vect_slp_analyze_data_ref_dependences (bb_vec_info bb_vinfo) { - struct data_dependence_relation *ddr; - unsigned int i; - if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "=== vect_slp_analyze_data_ref_dependences ===\n"); ! if (!compute_all_dependences (BB_VINFO_DATAREFS (bb_vinfo), ! &BB_VINFO_DDRS (bb_vinfo), ! vNULL, true)) ! return false; ! FOR_EACH_VEC_ELT (BB_VINFO_DDRS (bb_vinfo), i, ddr) ! if (vect_slp_analyze_data_ref
Re: [PATCH 1/2] s/390: Implement "target" attribute.
On 11/02/2015 09:44 AM, Dominik Vogt wrote: > (@Uli: I'd like to hear your opinion on this issue. > Original message: > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03403.html). > > On Fri, Oct 30, 2015 at 03:09:39PM +0100, Andreas Krebbel wrote: >> Why do we need x_s390_arch_specified and x_s390_tune_specified? You >> should be able to use opts_set->x_s390_arch and opts_set->x_s390_tune >> instead? (patch attached, your tests keep working with that change). > > The idea was that -mtune on the command line is *not* overridden > by the "arch" target attribute. This would allow to change the > architecture for a specific function and keep the -mtune= option > from the command line. But as a matter of fact, the current patch > doesn't do it either (bug?). Your testcases even seem to check for this behavior so it looked intentional to me. But I agree that being able to keep the -mtune cmdline value for a function while only changing the used instruction set would be good. Could you please elaborate why implementing this requires the new flags? -Andreas-
Re: [PATCH] Fix memory leaks and use a pool_allocator
On 11/09/2015 01:11 PM, Richard Biener wrote: > On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote: >> Hi. >> >> This is follow-up of changes that Richi started on Friday. >> >> Patch can bootstrap on x86_64-linux-pc and regression tests are running. >> >> Ready for trunk? > > * tree-ssa-dom.c (free_edge_info): Make the function extern. > ... > * tree-ssa.h (free_edge_info): Declare function extern. > > declare this in tree-ssa-threadupdate.h instead and renaming it to > sth less "public", like free_dom_edge_info. > > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c > index fff62de..eb6b7df 100644 > --- a/gcc/ifcvt.c > +++ b/gcc/ifcvt.c > @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) >set_used_flags (targets[i]); > } > > + temporaries.release (); > + >set_used_flags (cond); >set_used_flags (x); >set_used_flags (y); > @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) > } > >num_updated_if_blocks++; > + targets.release (); >return TRUE; > > suspiciously look like candidates for an auto_vec<> (didn't check). > > @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) >dead_set = sparseset_alloc (max_regno); >unused_set = sparseset_alloc (max_regno); >curr_point = 0; > + point_freq_vec.release (); >point_freq_vec.create (get_max_uid () * 2); > > a truncate (0) instead of a release () should be cheaper, avoiding the > re-allocation. > > @@ -674,6 +674,10 @@ sra_deinitialize (void) >assign_link_pool.release (); >obstack_free (&name_obstack, NULL); > > + for (hash_map >::iterator it = > + base_access_vec->begin (); it != base_access_vec->end (); ++it) > +(*it).second.release (); > + >delete base_access_vec; > > I wonder if the better fix is to provide a proper free method for the > hash_map? > A hash_map with 'auto_vec' looks suspicous - eventually a proper release > was intented here via default_hash_map_traits <>? > > Anyway, most of the things above can be improved as followup of course. 
> > Thanks, > Richard. > >> Thanks, >> Martin Hi. All suggested changes were applied, sending v2 and waiting for bootstrap and regression tests. Thanks, Martin >From c97270f2daadcca1efe6201adf1eb0df469ca91e Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 9 Nov 2015 10:49:14 +0100 Subject: [PATCH 1/2] Fix memory leaks and use a pool_allocator gcc/ChangeLog: 2015-11-09 Martin Liska * gcc.c (record_temp_file): Release name string. * ifcvt.c (noce_convert_multiple_sets): Use auto_vec instead of vec. * lra-lives.c (free_live_range_list): Utilize lra_live_range_pool for allocation and deallocation. (create_live_range): Likewise. (copy_live_range): Likewise. (lra_merge_live_ranges): Likewise. (remove_some_program_points_and_update_live_ranges): Likewise. (lra_create_live_ranges_1): Release point_freq_vec that can be not freed from previous iteration of the function. * tree-eh.c (lower_try_finally_switch): Use auto_vec instead of vec. * tree-sra.c (sra_deinitialize): Release all vectors in base_access_vec. * tree-ssa-dom.c (free_dom_edge_info): Make the function extern. * tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Release edge_info for a removed edge. (thread_through_all_blocks): Free region vector. * tree-ssa.h (free_dom_edge_info): Declare function extern. --- gcc/gcc.c | 5 - gcc/ifcvt.c | 8 +--- gcc/lra-lives.c | 14 -- gcc/tree-eh.c | 2 +- gcc/tree-sra.c | 6 ++ gcc/tree-ssa-dom.c | 8 gcc/tree-ssa-threadupdate.c | 6 +- gcc/tree-ssa-threadupdate.h | 1 + 8 files changed, 34 insertions(+), 16 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index bbc9b23..8bbf5be 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -2345,7 +2345,10 @@ record_temp_file (const char *filename, int always_delete, int fail_delete) struct temp_file *temp; for (temp = always_delete_queue; temp; temp = temp->next) if (! 
filename_cmp (name, temp->name)) - goto already1; + { + free (name); + goto already1; + } temp = XNEW (struct temp_file); temp->next = always_delete_queue; diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index fff62de..3401faa 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -3076,12 +3076,12 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) rtx_code cond_code = GET_CODE (cond); /* The true targets for a conditional move. */ - vec targets = vNULL; + auto_vec targets; /* The temporaries introduced to allow us to not consider register overlap. */ - vec temporaries = vNULL; + auto_vec temporaries; /* The insns we've emitted. */ - vec unmodified_insns = vNULL; + auto_vec unmodified_insns; int count = 0; FOR_BB_INSNS (then_bb, insn) @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_i
[PATCH] 02/N Fix memory leaks in IPA
Hi. Following changes were consulted with Martin Jambor to properly release memory in IPA. It fixes leaks which popped up in tramp3d with -O2. Bootstrap and regression tests have been running. Ready after it finishes? Thanks, Martin >From 85b63f738030dd7a901c228ba76e24f820d31c5d Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 9 Nov 2015 12:38:27 +0100 Subject: [PATCH 2/2] Fix memory leaks in IPA. gcc/ChangeLog: 2015-11-09 Martin Liska * ipa-inline-analysis.c (estimate_function_body_sizes): Call body_info release function. * ipa-prop.c (ipa_release_body_info): New function. (ipa_analyze_node): Call the function. (ipa_node_params::~ipa_node_params): Release known_csts. * ipa-prop.h (ipa_release_body_info): Declare. --- gcc/ipa-inline-analysis.c | 2 +- gcc/ipa-prop.c| 20 +++- gcc/ipa-prop.h| 2 +- 3 files changed, 17 insertions(+), 7 deletions(-) diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index c07b0da..8c8b8e3 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -2853,7 +2853,7 @@ estimate_function_body_sizes (struct cgraph_node *node, bool early) inline_summaries->get (node)->self_time = time; inline_summaries->get (node)->self_size = size; nonconstant_names.release (); - fbi.bb_infos.release (); + ipa_release_body_info (&fbi); if (opt_for_fn (node->decl, optimize)) { if (!early) diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index d15f0eb..f379ea7 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -2258,6 +2258,19 @@ analysis_dom_walker::before_dom_children (basic_block bb) ipa_compute_jump_functions_for_bb (m_fbi, bb); } +/* Release body info FBI. 
*/ + +void +ipa_release_body_info (struct ipa_func_body_info *fbi) +{ + int i; + struct ipa_bb_info *bi; + + FOR_EACH_VEC_ELT (fbi->bb_infos, i, bi) +free_ipa_bb_info (bi); + fbi->bb_infos.release (); +} + /* Initialize the array describing properties of formal parameters of NODE, analyze their uses and compute jump functions associated with actual arguments of calls from within NODE. */ @@ -2313,11 +2326,7 @@ ipa_analyze_node (struct cgraph_node *node) analysis_dom_walker (&fbi).walk (ENTRY_BLOCK_PTR_FOR_FN (cfun)); - int i; - struct ipa_bb_info *bi; - FOR_EACH_VEC_ELT (fbi.bb_infos, i, bi) -free_ipa_bb_info (bi); - fbi.bb_infos.release (); + ipa_release_body_info (&fbi); free_dominance_info (CDI_DOMINATORS); pop_cfun (); } @@ -3306,6 +3315,7 @@ ipa_node_params::~ipa_node_params () free (lattices); /* Lattice values and their sources are deallocated with their alocation pool. */ + known_csts.release (); known_contexts.release (); lattices = NULL; diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h index b69ee8a..2fe824d 100644 --- a/gcc/ipa-prop.h +++ b/gcc/ipa-prop.h @@ -775,7 +775,7 @@ bool ipa_modify_expr (tree *, bool, ipa_parm_adjustment_vec); ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *, ipa_parm_adjustment_vec, bool); - +void ipa_release_body_info (struct ipa_func_body_info *); /* From tree-sra.c: */ tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree, -- 2.6.2
Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
On 30 October 2015 at 16:52, Matthew Wahab wrote: > On 30/10/15 12:51, Christophe Lyon wrote: >> >> On 23 October 2015 at 14:26, Matthew Wahab >> wrote: >>> >>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions, >>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and >>> vqrdmlsh for these instructions. The new intrinsics are of the form >>> vqrdml{as}h[q]_. >>> >>> Tested the series for aarch64-none-linux-gnu with native bootstrap and >>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with >>> cross-compiled check-gcc on an ARMv8.1 emulator. >>> >> >> Is there a publicly available simulator for v8.1? QEMU or Foundation >> Model? >> > > Sorry, I don't know. > Matthew > So, what will happen to the testsuite once this is committed? Are we going to see FAILs when using QEMU? Thanks, Christophe.
RE: [RFC][PATCH] Preferred rename register in regrename pass
Hi Bernd,

Sorry for the late reply. The updated patch was bootstrapped on
x86_64-unknown-linux-gnu and cross tested on mips-img-linux-gnu using
r229786. The results below were generated for the CSiBE benchmark; the
numbers in the columns express bytes in the format 'net (gain/loss)' to
show the difference with and without the patch when the -frename-registers
switch is used.

I looked at the gains, especially for MIPS and 'teem', and it appears that
renaming registers affects the rtl_dce pass, i.e. makes it less effective.
However, on average the patch appears to reduce the code size slightly and
moves are genuinely removed. I haven't tested the performance extensively,
but the SPEC benchmarks showed almost the same results, which could be
just noise.

               | MIPS n64 -Os    | MIPS o32 -Os    | x86_64 -Os        |
---------------+-----------------+-----------------+-------------------+
bzip2-1.0.2    | -32   (0/-32)   | -24   (0/-24)   | -34    (1/-35)    |
cg_compiler    | -172  (0/-172)  | -156  (0/-156)  | -46    (0/-46)    |
compiler       | -36   (0/-36)   | -24   (0/-24)   | -6     (0/-6)     |
flex-2.5.31    | -68   (0/-68)   | -80   (0/-80)   | -98    (7/-105)   |
jikespg-1.3    | -284  (0/-284)  | -204  (0/-204)  | -127   (9/-136)   |
jpeg-6b        | -52   (8/-60)   | -20   (0/-20)   | -80    (11/-91)   |
libmspack      | -136  (0/-136)  | -28   (0/-28)   | -33    (23/-56)   |
libpng-1.2.5   | -72   (0/-72)   | -64   (0/-64)   | -176   (14/-190)  |
linux-2.4.23   | -700  (20/-720) | -384  (0/-384)  | -691   (44/-735)  |
lwip-0.5.3     | -4    (0/-4)    | -4    (0/-4)    | +4     (13/-9)    |
mpeg2dec-0.3.1 | -16   (0/-16)   |                 | -142   (6/-148)   |
mpgcut-1.1     | -24   (0/-24)   | -12   (4/-16)   | -2     (0/-2)     |
OpenTCP-1.0.4  | -28   (0/-28)   | -12   (0/-12)   | -1     (0/-1)     |
replaypc-0.4.0 | -32   (0/-32)   | -12   (0/-12)   | -4     (2/-6)     |
teem-1.6.0     | -88   (480/-568)| +108  (564/-456)| -1272  (117/-1389)|
ttt-0.10.1     | -24   (0/-24)   | -20   (0/-20)   | -16    (0/-16)    |
unrarlib-0.4.0 | -20   (0/-20)   | -8    (0/-8)    | -59    (9/-68)    |
zlib-1.1.4     | -12   (0/-12)   | -4    (0/-4)    | -23    (8/-31)    |

               | MIPS n64 -O2    | MIPS o32 -O2    | x86_64 -O2        |
---------------+-----------------+-----------------+-------------------+
bzip2-1.0.2    | -104  (0/-104)  | -48   (0/-48)   | -55    (0/-55)    |
cg_compiler    | -184  (4/-188)  | -232  (0/-232)  | -31    (5/-36)    |
compiler       | -32   (0/-32)   | -12   (0/-12)   | -4     (1/-5)     |
flex-2.5.31    | -96   (0/-96)   | -112  (0/-112)  | -12    (34/-46)   |
jikespg-1.3    | -540  (20/-560) | -476  (4/-480)  | -154   (30/-184)  |
jpeg-6b        | -112  (16/-128) | -60   (0/-60)   | -136   (84/-220)  |
libmspack      | -164  (0/-164)  | -40   (0/-40)   | -87    (32/-119)  |
libpng-1.2.5   | -120  (8/-128)  | -92   (4/-96)   | -140   (53/-193)  |
linux-2.4.23   | -596  (12/-608) | -320  (8/-328)  | -794   (285/-1079)|
lwip-0.5.3     | -8    (0/-8)    | -8    (0/-8)    | +2     (4/-2)     |
mpeg2dec-0.3.1 | -44   (0/-44)   | -4    (0/-4)    | -122   (8/-130)   |
mpgcut-1.1     | -8    (0/-8)    | -8    (0/-8)    | +28    (32/-4)    |
OpenTCP-1.0.4  | -4    (0/-4)    | -4    (0/-4)    | -2     (0/-2)     |
replaypc-0.4.0 | -20   (0/-20)   | -24   (0/-24)   | -13    (0/-13)    |
teem-1.6.0     | +100  (740/-640)| +84   (736/-652)| -1998  (168/-2166)|
ttt-0.10.1     | -16   (0/-16)   |                 |                   |
unrarlib-0.4.0 | -16   (0/-16)   | -8    (0/-8)    | +19    (37/-18)   |
zlib-1.1.4     | -12   (0/-12)   | -4    (0/-4)    | -15    (1/-16)    |

Regards,
Robert

> Hi Robert,
>
> > gcc/
> > * regrename.c (create_new_chain): Initialize terminated_dead,
> > renamed and tied_chain.
> > (find_best_rename_reg): Pick and check register from the tied chain.
> > (regrename_do_replace): Mark head as renamed.
> > (scan_rtx_reg): Tie chains in move insns. Set terminate_dead flag.
> > * regrename.h (struct du_head): Add tied_chain, renamed and
> > terminated_dead members.
>
> Thanks - this looks a lot better already. You didn't say how it was
> bootstrapped and tested; please include this information for future
> submissions. For a patch like this, some data on the improvement you got
> would also be appreciated.
>
> I'd still like to investigate the possibility of further simplification:
>
> > +    {
> > +      /* Find the input chain.  */
> > +      for (i = c->id - 1; id_to_chain.iterate (i, &head); i--)
> > +        if (head->last && head->last->insn == insn
> > +            && head->terminated_dead)
> > +          {
> > +            gcc_assert (head->regno == REGNO (recog_data.operand[1]));
> > +            c->tied_chain = head;
> > +            head->tied_chain = c;
> > +
> > +            if (dump_file)
> > +              fprintf (dump_file, "Tying chain %s (%d) with %s (%d)\n",
> > +                       reg_names[c->regno], c->id,
> > +                       reg_names[head->regno], head->
Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers
On 08/11/15 00:26, charles.bay...@linaro.org wrote:
> From: Charles Baylis
>
> Charles Baylis
>
> * config/arm/neon.md (neon_vld1_lane): Remove error for invalid
> lane number.
> (neon_vst1_lane): Likewise.
> (neon_vld2_lane): Likewise.
> (neon_vst2_lane): Likewise.
> (neon_vld3_lane): Likewise.
> (neon_vst3_lane): Likewise.
> (neon_vld4_lane): Likewise.
> (neon_vst4_lane): Likewise.

The only way we can get here is through the intrinsics - we do a check for
lane numbers earlier. If things go horribly wrong - the assembler will
complain, so it's ok to elide this internal_error here, thus OK.

regards
Ramana

> Change-Id: Id7b4b6fa7320157e62e5bae574b4c4688d921774
> ---
> gcc/config/arm/neon.md | 48
> 1 file changed, 8 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index e8db020..6574e6e 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -4264,8 +4264,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    operands[3] = GEN_INT (lane);
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    if (max == 1)
>      return "vld1.\t%P0, %A1";
>    else
> @@ -4286,9 +4284,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    operands[3] = GEN_INT (lane);
>    int regno = REGNO (operands[0]);
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4372,8 +4368,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    operands[2] = GEN_INT (lane);
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    if (max == 1)
>      return "vst1.\t{%P1}, %A0";
>    else
> @@ -4393,9 +4387,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4464,8 +4456,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[0]);
>    rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    ops[0] = gen_rtx_REG (DImode, regno);
>    ops[1] = gen_rtx_REG (DImode, regno + 2);
>    ops[2] = operands[1];
> @@ -4489,9 +4479,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[0]);
>    rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4579,8 +4567,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
>    rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    ops[0] = operands[0];
>    ops[1] = gen_rtx_REG (DImode, regno);
>    ops[2] = gen_rtx_REG (DImode, regno + 2);
> @@ -4604,9 +4590,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
>    rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4723,8 +4707,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[0]);
>    rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    ops[0] = gen_rtx_REG (DImode, regno);
>    ops[1] = gen_rtx_REG (DImode, regno + 2);
>    ops[2] = gen_rtx_REG (DImode, regno + 4);
> @@ -4750,9 +4732,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[0]);
>    rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>      {
>        lane -= max / 2;
>        regno += 2;
> @@ -4895,8 +4875,6 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
>    rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -    error ("lane out of range");
>    ops[0] = operands[0];
>    ops[1] = gen_rtx_REG (DImode, regno);
>    ops[2] = gen_rtx_REG (DImode, regno + 2);
> @@ -4922,9 +4900,7 @@ if (BYTES_BIG_ENDIAN)
>    HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>    int regno = REGNO (operands[1]);
>    rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -err
Re: OpenACC Firstprivate
On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
> Index: gcc/gimplify.c
> ===
> --- gcc/gimplify.c	(revision 229892)
> +++ gcc/gimplify.c	(working copy)
> @@ -108,9 +108,15 @@ enum omp_region_type
>    /* Data region with offloading.  */
>    ORT_TARGET = 32,
>    ORT_COMBINED_TARGET = 33,
> +
> +  ORT_ACC = 0x40,  /* An OpenACC region.  */
> +  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA,  /* Data construct.  */
> +  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
> +  ORT_ACC_KERNELS = ORT_ACC | ORT_TARGET | 0x80,  /* Kernels construct.  */
> +
>    /* Dummy OpenMP region, used to disable expansion of
>       DECL_VALUE_EXPRs in taskloop pre body.  */
> -  ORT_NONE = 64
> +  ORT_NONE = 0x100
> };

If you want to switch to hexadecimal, you should change all values in the
enum to hexadecimal for consistency.

>
> /* Gimplify hashtable helper.  */
> @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re
>    else
>      c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>
> +  c->combined_loop = false;
> +  c->distribute = false;
> +  c->target_map_scalars_firstprivate = false;
> +  c->target_map_pointers_as_0len_arrays = false;
> +  c->target_firstprivatize_array_bases = false;

Why this?  c is XCNEW allocated, so zero initialized.

> @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct
>    /* We shouldn't be re-adding the decl with the same data
>       sharing class.  */
>    gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
> -  /* The only combination of data sharing classes we should see is
> -     FIRSTPRIVATE and LASTPRIVATE.  */
>    nflags = n->value | flags;
> -  gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
> -              == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)
> +  /* The only combination of data sharing classes we should see is
> +     FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
> +     reduction variables to be used in data sharing clauses.  */
> +  gcc_assert ((ctx->region_type & ORT_ACC) != 0
> +              || ((nflags & GOVD_DATA_SHARE_CLASS)
> +                  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
>                || (flags & GOVD_DATA_SHARE_CLASS) == 0);

Are you sure you want to give up on any kind of consistency checks for
OpenACC?  If only reduction is special on OpenACC, perhaps you could tweak
the assert for that instead?  Something that can be done incrementally of
course.

> +
> +      /* OpenMP doesn't look in outer contexts to find an
> +         enclosing data clause.  */

I'm puzzled by the comment.  OpenMP does look in outer context for clauses
that need that (pretty much all clauses but private), that is the do_outer:
recursion in omp_notice_variable.  Say for firstprivate in order to copy
(or copy construct) the private variable one needs the access to the outer
context's var etc.).  So perhaps it would help to document what you are
doing here for OpenACC and why.

> +      struct gimplify_omp_ctx *octx = ctx->outer_context;
> +      if ((ctx->region_type & ORT_ACC) && octx)
> +        {
> +          omp_notice_variable (octx, decl, in_code);
> +
> +          for (; octx; octx = octx->outer_context)
> +            {
> +              if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
> +                break;
> +              splay_tree_node n2
> +                = splay_tree_lookup (octx->variables,
> +                                     (splay_tree_key) decl);
> +              if (n2)
> +                {
> +                  nflags |= GOVD_MAP;
> +                  goto found_outer;
> +                }
> +            }
> +        }
> -  else if (nflags == flags)
> -    nflags |= GOVD_MAP;
> +

The main issue I have is with the omp-low.c changes.  I see:
"2.5.9 private clause
The private clause is allowed on the parallel construct; it declares that
a copy of each item on the list will be created for each parallel gang.
2.5.10 firstprivate clause
The firstprivate clause is allowed on the parallel construct; it declares
that a copy of each item on the list will be created for each parallel
gang, and that the copy will be initialized with the value of that item on
the host when the parallel construct is encountered."
but looking at what you actually emit looks like a standard present_copyin
clause I think, with a private variable defined in the region where the
value of the present_copyin mapped variable is assigned to the private
one.  This I'm afraid often performs two copies rather than just one (one
to copy the host value to the present_copyin mapped value, another one in
the region), but more importantly, if the var is already mapped, you could
initialize the private var with old data.  Say

  int arr[64];
  // initialize arr
  #pragma acc data copyin (arr)
  {
    // modify arr on the host
  # pragma acc parallel firstprivate (arr)
Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
On 09/11/15 13:31, Christophe Lyon wrote:
> On 30 October 2015 at 16:52, Matthew Wahab wrote:
>> On 30/10/15 12:51, Christophe Lyon wrote:
>>> On 23 October 2015 at 14:26, Matthew Wahab wrote:
>>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>>> sqrdmlah and sqrdmlsh.  This patch adds the NEON intrinsics vqrdmlah
>>>> and vqrdmlsh for these instructions.  The new intrinsics are of the
>>>> form vqrdml{as}h[q]_.
>>>>
>>>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>>>> make check on an ARMv8 architecture.  Also tested aarch64-none-elf
>>>> with cross-compiled check-gcc on an ARMv8.1 emulator.
>>>
>>> Is there a publicly available simulator for v8.1?  QEMU or Foundation
>>> Model?
>>
>> Sorry, I don't know.
>> Matthew
>
> So, what will happen to the testsuite once this is committed?  Are we
> going to see FAILs when using QEMU?

No, the check at the top of the test files

+/* { dg-require-effective-target arm_v8_1a_neon_hw } */

should make this test UNSUPPORTED if the HW/simulator can't execute it.
(Support for this check is added in patch #5 in this series.)

Note that the aarch64-none-linux make check was run on ARMv8 HW, which
can't execute the test and correctly reported it as unsupported.

Matthew
Re: [vec-cmp, patch 2/6] Vectorization factor computation
2015-10-20 16:45 GMT+03:00 Richard Biener :
> On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich wrote:
>> 2015-10-13 16:37 GMT+03:00 Richard Biener :
>>> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich wrote:
>>>> Hi,
>>>>
>>>> This patch handles statements with boolean result in vectorization
>>>> factor computation.  For comparison its operands type is used instead
>>>> of result type to compute VF.  Other boolean statements are ignored
>>>> for VF.
>>>>
>>>> Vectype for comparison is computed using type of compared values.
>>>> Computed type is propagated into other boolean operations.
>>>
>>> This feels rather ad-hoc, mixing up the existing way of computing
>>> vector type and VF.  I'd rather have turned the whole vector type
>>> computation around to the scheme working on the operands rather than
>>> on the lhs and then searching for smaller/larger types on the rhs'.
>>>
>>> I know this is a tricky function (heh, but you make it even worse...).
>>> And it needs a helper with knowledge about operations so one can
>>> compute the result vector type for an operation on its operands.  The
>>> seeds should be PHIs (handled like now) and loads, and yes, externals
>>> need special handling.
>>>
>>> Ideally we'd do things in two stages, first compute vector types in a
>>> less constrained manner (not forcing a single vector size) and then in
>>> a 2nd run promote to a common size also computing the VF to do that.
>>
>> This sounds like a refactoring, not a functional change, right?  Also I
>> don't see a reason to analyze DF to compute vectypes if we promote it
>> to a single vector size anyway.  For booleans we have to do it because
>> boolean vectors of the same size may have different number of elements.
>> What is the reason to do it for other types?
>
> For conversions and operators which support different sized operands

That's what we handle in vector patterns and use some helper functions to
determine vectypes there.  Looks like this refactoring would affect
patterns significantly.  Probably compute vectypes before searching for
patterns?

>
>> Shouldn't it be a patch independent from comparison vectorization
>> series?
>
> As you like.

I'd like to move on with vector comparison and consider VF computation
refactoring when it's stabilized.  This patch is the last one (except
target ones) not approved in all vector comparison related series.  Would
it be OK to go on with it in a current shape?

Thanks,
Ilya
Re: OpenACC Firstprivate
On 11/09/15 08:46, Jakub Jelinek wrote:
> On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
>> Index: gcc/gimplify.c
>> ===
>
> If you want to switch to hexadecimal, you should change all values in
> the enum to hexadecimal for consistency.

ok.

>> /* Gimplify hashtable helper.  */
>> @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re
>>    else
>>      c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>>
>> +  c->combined_loop = false;
>> +  c->distribute = false;
>> +  c->target_map_scalars_firstprivate = false;
>> +  c->target_map_pointers_as_0len_arrays = false;
>> +  c->target_firstprivatize_array_bases = false;
>
> Why this?  c is XCNEW allocated, so zero initialized.

I presumed it necessary, as it was on the branch.  will remove.

>> @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct
>>    /* We shouldn't be re-adding the decl with the same data
>>       sharing class.  */
>>    gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
>> -  /* The only combination of data sharing classes we should see is
>> -     FIRSTPRIVATE and LASTPRIVATE.  */
>>    nflags = n->value | flags;
>> +  /* The only combination of data sharing classes we should see is
>> +     FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
>> +     reduction variables to be used in data sharing clauses.  */
>> +  gcc_assert ((ctx->region_type & ORT_ACC) != 0
>> +              || ((nflags & GOVD_DATA_SHARE_CLASS)
>> +                  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
>>                || (flags & GOVD_DATA_SHARE_CLASS) == 0);
>
> Are you sure you want to give up on any kind of consistency checks for
> OpenACC?  If only reduction is special on OpenACC, perhaps you could
> tweak the assert for that instead?  Something that can be done
> incrementally of course.

Will investigate (later)

>> +
>> +      /* OpenMP doesn't look in outer contexts to find an
>> +         enclosing data clause.  */
>
> I'm puzzled by the comment.  OpenMP does look in outer context for
> clauses that need that (pretty much all clauses but private), that is
> the do_outer: recursion in omp_notice_variable.  Say for firstprivate in
> order to copy (or copy construct) the private variable one needs the
> access to the outer context's var etc.).  So perhaps it would help to
> document what you are doing here for OpenACC and why.

Ok.  It seemed (and it may become clearer with default handling added)
that OpenACC and OpenMP scanned scopes in opposite orders.  I remember
trying to get the ACC code to scan in the same order, but came up blank.
Anyway, you're right, it should say what OpenACC is trying.

> The main issue I have is with the omp-low.c changes.  I see:
> "2.5.9 private clause
> The private clause is allowed on the parallel construct; it declares
> that a copy of each item on the list will be created for each parallel
> gang.
> 2.5.10 firstprivate clause
> The firstprivate clause is allowed on the parallel construct; it
> declares that a copy of each item on the list will be created for each
> parallel gang, and that the copy will be initialized with the value of
> that item on the host when the parallel construct is encountered."
> but looking at what you actually emit looks like a standard
> present_copyin clause I think, with a private variable defined in the
> region where the value of the present_copyin mapped variable is assigned
> to the private one.  This I'm afraid often performs two copies rather
> than just one (one to copy the host value to the present_copyin mapped
> value, another one in the region),

I don't think that can be avoided.  The host doesn't have control over
when the CTAs (a gang) start -- they may even be serialized onto the same
physical HW.  So each gang has to initialize its own instance.  Or did you
mean something else?

> but more importantly, if the var is already mapped, you could initialize
> the private var with old data.  Say
>
>   int arr[64];
>   // initialize arr
>   #pragma acc data copyin (arr)
>   {
>     // modify arr on the host
>   # pragma acc parallel firstprivate (arr)
>   {
>   ...
>   }
>   }

Hm, I suspect that is either ill formed or the std does not contemplate
it.

> Is that really what you want?  If not, any reason not to implement
> GOMP_MAP_FIRSTPRIVATE and GOMP_MAP_FIRSTPRIVATE_INT on the libgomp
> oacc-* side and just use the OpenMP firstprivate handling in omp-low.c?

I would have to investigate ...

nathan
Re: [PATCH] Use signed boolean type for boolean vectors
On 03 Nov 14:42, Richard Biener wrote:
> On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovich wrote:
> > 2015-10-28 18:21 GMT+03:00 Richard Biener :
> >> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich
> >> wrote:
> >>> Hi,
> >>>
> >>> Testing boolean vector conversions I found several runtime regressions
> >>> and investigation showed it's due to incorrect conversion caused by
> >>> unsigned boolean type.  When boolean vector is represented as an
> >>> integer vector on target it's a signed integer actually.  Unsigned
> >>> boolean type was chosen due to possible single bit values, but for
> >>> multiple bit values it causes wrong casting.  The easiest way to fix
> >>> it is to use signed boolean value.  The following patch does this and
> >>> fixes my problems with conversion.  Bootstrapped and tested on
> >>> x86_64-unknown-linux-gnu.  Is it OK?
> >>
> >> Hmm.  Actually formally the "boolean" vectors were always 0 or -1
> >> (all bits set).  That is also true for a signed boolean with precision 1
> >> but with higher precision what makes sure to sign-extend 'true'?
> >>
> >> So it's far from an obvious change, esp as you don't change the
> >> precision == 1 case.  [I still think we should have precision == 1
> >> for all boolean types]
> >>
> >> Richard.
> >>
> >
> > For 1 bit precision signed type value 1 is out of range, right?  This
> > might break in many places due to using 1 as the true value.
>
> For vectors -1 is true.  Did you try whether it breaks many places?
> build_int_cst (type, 1) should still work fine.
>
> Richard.
>

I tried it and didn't find any new failures.  So it looks like I was wrong
in assuming it should cause many failures.  Testing is not complete
because many SPEC benchmarks are failing to compile at -O3 for AVX-512 on
trunk.  But I think we may proceed with signed type and fix constant
generation issues if any are revealed.  This patch was bootstrapped and
regtested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-09  Ilya Enkovich

	* optabs.c (expand_vec_cond_expr): Always get sign from type.
	* tree.c (wide_int_to_tree): Support negative values for boolean.
	(build_nonstandard_boolean_type): Use signed type for booleans.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index fdcdc6a..44971ad 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5365,7 +5365,6 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
       op0a = TREE_OPERAND (op0, 0);
       op0b = TREE_OPERAND (op0, 1);
       tcode = TREE_CODE (op0);
-      unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
     }
   else
     {
@@ -5374,9 +5373,9 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
       op0a = op0;
       op0b = build_zero_cst (TREE_TYPE (op0));
       tcode = LT_EXPR;
-      unsignedp = false;
     }
   cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));


   gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode)
diff --git a/gcc/tree.c b/gcc/tree.c
index 18d6544..6fb4c09 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -1437,7 +1437,7 @@ wide_int_to_tree (tree type, const wide_int_ref &pcst)
     case BOOLEAN_TYPE:
       /* Cache false or true.  */
       limit = 2;
-      if (hwi < 2)
+      if (IN_RANGE (hwi, 0, 1))
	ix = hwi;
       break;

@@ -8069,7 +8069,7 @@ build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision)

   type = make_node (BOOLEAN_TYPE);
   TYPE_PRECISION (type) = precision;
-  fixup_unsigned_type (type);
+  fixup_signed_type (type);

   if (precision <= MAX_INT_CACHED_PREC)
     nonstandard_boolean_type_cache[precision] = type;
Re: OpenACC Firstprivate
On 11/09/15 08:59, Nathan Sidwell wrote:
> On 11/09/15 08:46, Jakub Jelinek wrote:
>> On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
>> Say
>>
>>   int arr[64];
>>   // initialize arr
>>   #pragma acc data copyin (arr)
>>   {
>>     // modify arr on the host
>>   # pragma acc parallel firstprivate (arr)
>>   {
>>   ...
>>   }
>>   }
>
> Hm, I suspect that is either ill formed or the std does not contemplate
> it.

just realized, there are two ways to consider the above.

1) it's ill formed.  Once you've transferred data to the device, modifying
it on the host is unspecified.  I'm having trouble finding words in the
std that actually say that though :(

2) on a system with shared physical global memory, the host modification
would be visible on the device (possibly at an arbitrary point due to lack
of a synchronization primitive?)

I don't think this changes the 'why not use OpenMP's ...' question,
because IIUC you think that can be made to DTRT anyway?

nathan
Re: RFC: C++ delayed folding merge
On 11/09/2015 04:24 AM, Eric Botcazou wrote:
>> One question: The branch changes 'convert' to not fold its result, and
>> it's not clear to me whether that's part of the expected behavior of a
>> front end 'convert' function or not.
>
> I don't think that you should change the behavior for front-ends that
> have an internal representation distinct from the GENERIC trees and thus
> do a global translation to GENERIC at the end; e.g. in the Ada compiler
> we'd rather have *more* folding than less during this translation.

Right, the change is just to the C++ front end 'convert'.

Jason
Re: OpenACC Firstprivate
On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote:
> > This I'm afraid performs often two copies rather than just one (one to
> > copy the host value to the present_copyin mapped value, another one in
> > the region),
>
> I don't think that can be avoided.  The host doesn't have control over
> when the CTAs (a gang) start -- they may even be serialized onto the
> same physical HW.  So each gang has to initialize its own instance.  Or
> did you mean something else?

So, what is the scope of the private and firstprivate vars in OpenACC?
In OpenMP, if a variable is private or firstprivate on the target
construct, unless further privatized in inner constructs, it is really
shared among all the threads in all the teams (or one var per all
CTAs/workers in PTX terms).  Is that the case for OpenACC too, or are the
vars e.g. private to each CTA already, or to each thread in each CTA, or
something different?  If they are shared by all CTAs, then you should
hopefully be able to use GOMP_MAP_FIRSTPRIVATE{,_INT}; if not, then I'd
say you should at least use those to provide you the initializer data to
initialize your private vars from, as a cheaper alternative to mapping.

	Jakub
Re: [Patch AArch64] Switch constant pools to separate rodata sections.
On 08/11/15 11:42, Andreas Schwab wrote:
> This is causing a bootstrap comparison failure in gcc/go/gogo.o.

I'm looking into this - this is now PR68256.

regards
Ramana

>
> Andreas.
>
Re: [PATCH] Fix memory leaks and use a pool_allocator
On Mon, Nov 9, 2015 at 2:26 PM, Martin Liška wrote:
> On 11/09/2015 01:11 PM, Richard Biener wrote:
>> On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote:
>>> Hi.
>>>
>>> This is follow-up of changes that Richi started on Friday.
>>>
>>> Patch can bootstrap on x86_64-linux-pc and regression tests are
>>> running.
>>>
>>> Ready for trunk?
>>
>> * tree-ssa-dom.c (free_edge_info): Make the function extern.
>> ...
>> * tree-ssa.h (free_edge_info): Declare function extern.
>>
>> declare this in tree-ssa-threadupdate.h instead and renaming it to
>> sth less "public", like free_dom_edge_info.
>>
>> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
>> index fff62de..eb6b7df 100644
>> --- a/gcc/ifcvt.c
>> +++ b/gcc/ifcvt.c
>> @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>>        set_used_flags (targets[i]);
>>      }
>>
>> +  temporaries.release ();
>> +
>>    set_used_flags (cond);
>>    set_used_flags (x);
>>    set_used_flags (y);
>> @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>>      }
>>
>>    num_updated_if_blocks++;
>> +  targets.release ();
>>    return TRUE;
>>
>> suspiciously look like candidates for an auto_vec<> (didn't check).
>>
>> @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
>>    dead_set = sparseset_alloc (max_regno);
>>    unused_set = sparseset_alloc (max_regno);
>>    curr_point = 0;
>> +  point_freq_vec.release ();
>>    point_freq_vec.create (get_max_uid () * 2);
>>
>> a truncate (0) instead of a release () should be cheaper, avoiding the
>> re-allocation.
>>
>> @@ -674,6 +674,10 @@ sra_deinitialize (void)
>>    assign_link_pool.release ();
>>    obstack_free (&name_obstack, NULL);
>>
>> +  for (hash_map >::iterator it =
>> +       base_access_vec->begin (); it != base_access_vec->end (); ++it)
>> +    (*it).second.release ();
>> +
>>    delete base_access_vec;
>>
>> I wonder if the better fix is to provide a proper free method for the
>> hash_map?
>> A hash_map with 'auto_vec' looks suspicious - eventually a proper
>> release was intended here via default_hash_map_traits <>?
>>
>> Anyway, most of the things above can be improved as followup of course.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Martin
>
> Hi.
>
> All suggested changes were applied, sending v2 and waiting for bootstrap
> and regression tests.

Ok.

Thanks,
Richard.

> Thanks,
> Martin
Re: [PATCH] 02/N Fix memory leaks in IPA
On Mon, Nov 9, 2015 at 2:29 PM, Martin Liška wrote:
> Hi.
>
> Following changes were consulted with Martin Jambor to properly release
> memory in IPA.  It fixes leaks which popped up in tramp3d with -O2.
>
> Bootstrap and regression tests have been running.
>
> Ready after it finishes?

Ok.

Richard.

> Thanks,
> Martin
Re: [PATCH] Use signed boolean type for boolean vectors
On Mon, Nov 9, 2015 at 3:03 PM, Ilya Enkovich wrote:
> On 03 Nov 14:42, Richard Biener wrote:
>> On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovich wrote:
>> > 2015-10-28 18:21 GMT+03:00 Richard Biener :
>> >> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich
>> >> wrote:
>> >>> Hi,
>> >>>
>> >>> Testing boolean vector conversions I found several runtime regressions
>> >>> and investigation showed it's due to incorrect conversion caused by
>> >>> unsigned boolean type.  When boolean vector is represented as an
>> >>> integer vector on target it's a signed integer actually.  Unsigned
>> >>> boolean type was chosen due to possible single bit values, but for
>> >>> multiple bit values it causes wrong casting.  The easiest way to fix
>> >>> it is to use signed boolean value.  The following patch does this and
>> >>> fixes my problems with conversion.  Bootstrapped and tested on
>> >>> x86_64-unknown-linux-gnu.  Is it OK?
>> >>
>> >> Hmm.  Actually formally the "boolean" vectors were always 0 or -1
>> >> (all bits set).  That is also true for a signed boolean with precision 1
>> >> but with higher precision what makes sure to sign-extend 'true'?
>> >>
>> >> So it's far from an obvious change, esp as you don't change the
>> >> precision == 1 case.  [I still think we should have precision == 1
>> >> for all boolean types]
>> >>
>> >> Richard.
>> >>
>> >
>> > For 1 bit precision signed type value 1 is out of range, right?  This
>> > might break in many places due to using 1 as the true value.
>>
>> For vectors -1 is true.  Did you try whether it breaks many places?
>> build_int_cst (type, 1) should still work fine.
>>
>> Richard.
>>
>
> I tried it and didn't find any new failures.  So it looks like I was
> wrong in assuming it should cause many failures.  Testing is not
> complete because many SPEC benchmarks are failing to compile at -O3 for
> AVX-512 on trunk.  But I think we may proceed with signed type and fix
> constant generation issues if any are revealed.  This patch was
> bootstrapped and regtested on x86_64-unknown-linux-gnu.  OK for trunk?

Ok.

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-09  Ilya Enkovich
>
>         * optabs.c (expand_vec_cond_expr): Always get sign from type.
>         * tree.c (wide_int_to_tree): Support negative values for boolean.
>         (build_nonstandard_boolean_type): Use signed type for booleans.
>
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index fdcdc6a..44971ad 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -5365,7 +5365,6 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>        op0a = TREE_OPERAND (op0, 0);
>        op0b = TREE_OPERAND (op0, 1);
>        tcode = TREE_CODE (op0);
> -      unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>      }
>    else
>      {
> @@ -5374,9 +5373,9 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>        op0a = op0;
>        op0b = build_zero_cst (TREE_TYPE (op0));
>        tcode = LT_EXPR;
> -      unsignedp = false;
>      }
>    cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
> +  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>
>
>    gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode)
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 18d6544..6fb4c09 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -1437,7 +1437,7 @@ wide_int_to_tree (tree type, const wide_int_ref &pcst)
>      case BOOLEAN_TYPE:
>        /* Cache false or true.  */
>        limit = 2;
> -      if (hwi < 2)
> +      if (IN_RANGE (hwi, 0, 1))
>          ix = hwi;
>        break;
>
> @@ -8069,7 +8069,7 @@ build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision)
>
>    type = make_node (BOOLEAN_TYPE);
>    TYPE_PRECISION (type) = precision;
> -  fixup_unsigned_type (type);
> +  fixup_signed_type (type);
>
>    if (precision <= MAX_INT_CACHED_PREC)
>      nonstandard_boolean_type_cache[precision] = type;
Re: [vec-cmp, patch 2/6] Vectorization factor computation
On Mon, Nov 9, 2015 at 2:54 PM, Ilya Enkovich wrote: > 2015-10-20 16:45 GMT+03:00 Richard Biener : >> On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich >> wrote: >>> 2015-10-13 16:37 GMT+03:00 Richard Biener : On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich wrote: > Hi, > > This patch handles statements with boolean result in vectorization factor > computation. For comparisons, the operands' type is used instead of the result > type to compute VF. Other boolean statements are ignored for VF. > > Vectype for comparison is computed using type of compared values. > Computed type is propagated into other boolean operations. This feels rather ad-hoc, mixing up the existing way of computing vector type and VF. I'd rather have turned the whole vector type computation around to the scheme working on the operands rather than on the lhs and then searching for smaller/larger types on the rhs'. I know this is a tricky function (heh, but you make it even worse...). And it needs a helper with knowledge about operations so one can compute the result vector type for an operation on its operands. The seeds should be PHIs (handled like now) and loads, and yes, externals need special handling. Ideally we'd do things in two stages, first compute vector types in a less constrained manner (not forcing a single vector size) and then in a 2nd run promote to a common size also computing the VF to do that. >>> >>> This sounds like a refactoring, not a functional change, right? Also I >>> don't see a reason to analyze DF to compute vectypes if we promote it >>> to a single vector size anyway. For booleans we have to do it because >>> boolean vectors of the same size may have different number of >>> elements. What is the reason to do it for other types? >> >> For conversions and operators which support different sized operands > > That's what we handle in vector patterns and use some helper functions > to determine vectypes there. Looks like this refactoring would affect > patterns significantly. 
Probably compute vectypes before searching for > patterns? > >> >>> Shouldn't it be a patch independent from comparison vectorization series? >> >> As you like. > > I'd like to move on with vector comparison and consider VF computation > refactoring when it's stabilized. This patch is the last one (except > target ones) not approved in all vector comparison related series. > Would it be OK to go on with it in a current shape? Yes. Thanks, Richard. > Thanks, > Ilya
Re: [vec-cmp, patch 3/6] Vectorize comparison
On Mon, Nov 9, 2015 at 1:07 PM, Ilya Enkovich wrote: > On 26 Oct 16:09, Richard Biener wrote: >> On Wed, Oct 14, 2015 at 6:12 PM, Ilya Enkovich >> wrote: >> > + >> > + ops.release (); >> > + vec_defs.release (); >> >> No need to release auto_vec<>s at the end of scope explicitely. > > Fixed > >> >> > + vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2); >> > + new_stmt = gimple_build_assign (mask, vec_compare); >> > + new_temp = make_ssa_name (mask, new_stmt); >> > + gimple_assign_set_lhs (new_stmt, new_temp); >> >> new_temp = make_ssa_name (mask); >> gimple_build_assign (new_temp, code, vec_rhs1, vec_rhs2); >> >> for the 4 stmts above. > > Fixed > >> >> > + >> > + vec_oprnds0.release (); >> > + vec_oprnds1.release (); >> >> Please use auto_vec<>s. > > These are used to hold vecs returned by vect_get_slp_defs. Thus can't > use auto_vec. Ok. Richard. >> >> Ok with those changes. >> >> RIchard. >> > > > gcc/ > > 2015-11-09 Ilya Enkovich > > * tree-vect-data-refs.c (vect_get_new_vect_var): Support > vect_mask_var. > (vect_create_destination_var): Likewise. > * tree-vect-stmts.c (vectorizable_comparison): New. > (vect_analyze_stmt): Add vectorizable_comparison. > (vect_transform_stmt): Likewise. > * tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var. > (enum stmt_vec_info_type): Add comparison_vec_info_type. > (vectorizable_comparison): New. > > > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > index 11bce79..926752b 100644 > --- a/gcc/tree-vect-data-refs.c > +++ b/gcc/tree-vect-data-refs.c > @@ -3790,6 +3790,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind > var_kind, const char *name) >case vect_scalar_var: > prefix = "stmp"; > break; > + case vect_mask_var: > +prefix = "mask"; > +break; >case vect_pointer_var: > prefix = "vectp"; > break; > @@ -4379,7 +4382,11 @@ vect_create_destination_var (tree scalar_dest, tree > vectype) >tree type; >enum vect_var_kind kind; > > - kind = vectype ? 
vect_simple_var : vect_scalar_var; > + kind = vectype > +? VECTOR_BOOLEAN_TYPE_P (vectype) > +? vect_mask_var > +: vect_simple_var > +: vect_scalar_var; >type = vectype ? vectype : TREE_TYPE (scalar_dest); > >gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME); > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index f1216c8..ee549f4 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -7416,6 +7416,185 @@ vectorizable_condition (gimple *stmt, > gimple_stmt_iterator *gsi, >return true; > } > > +/* vectorizable_comparison. > + > + Check if STMT is comparison expression that can be vectorized. > + If VEC_STMT is also passed, vectorize the STMT: create a vectorized > + comparison, put it in VEC_STMT, and insert it at GSI. > + > + Return FALSE if not a vectorizable STMT, TRUE otherwise. */ > + > +bool > +vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, > +gimple **vec_stmt, tree reduc_def, > +slp_tree slp_node) > +{ > + tree lhs, rhs1, rhs2; > + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > + tree vectype1 = NULL_TREE, vectype2 = NULL_TREE; > + tree vectype = STMT_VINFO_VECTYPE (stmt_info); > + tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE; > + tree new_temp; > + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > + enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type}; > + unsigned nunits; > + int ncopies; > + enum tree_code code; > + stmt_vec_info prev_stmt_info = NULL; > + int i, j; > + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); > + vec vec_oprnds0 = vNULL; > + vec vec_oprnds1 = vNULL; > + gimple *def_stmt; > + tree mask_type; > + tree mask; > + > + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > +return false; > + > + mask_type = vectype; > + nunits = TYPE_VECTOR_SUBPARTS (vectype); > + > + if (slp_node || PURE_SLP_STMT (stmt_info)) > +ncopies = 1; > + else > +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; > + > + gcc_assert (ncopies >= 1); > + if 
(!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > +return false; > + > + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def > + && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle > + && reduc_def)) > +return false; > + > + if (STMT_VINFO_LIVE_P (stmt_info)) > +{ > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > +"value used after loop.\n"); > + return false; > +} > + > + if (!is_gimple_assign (stmt)) > +return false; > + > + code = gimple_assign_rhs_code (stmt); > + > + if (TREE_CODE_CLASS (code) != tcc_comparison) > +return false; > + > + rhs1 = gimple_assign_rhs1 (stmt); > + rhs2 = gimple_assign_rhs
Re: [vec-cmp, patch 4/6] Support vector mask invariants
On Mon, Nov 9, 2015 at 1:11 PM, Ilya Enkovich wrote: > On 26 Oct 16:21, Richard Biener wrote: >> On Wed, Oct 14, 2015 at 6:13 PM, Ilya Enkovich >> wrote: >> > - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); >> > + { >> > + /* Can't use VIEW_CONVERT_EXPR for booleans because >> > +of possibly different sizes of scalar value and >> > +vector element. */ >> > + if (VECTOR_BOOLEAN_TYPE_P (type)) >> > + { >> > + if (integer_zerop (val)) >> > + val = build_int_cst (TREE_TYPE (type), 0); >> > + else if (integer_onep (val)) >> > + val = build_int_cst (TREE_TYPE (type), 1); >> > + else >> > + gcc_unreachable (); >> > + } >> > + else >> > + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), >> > val); >> >> I think the existing code is fine with using fold_convert () here >> which should also work >> for the boolean types. So does just >> >> val = fold_convert (TREE_TYPE (type), val); >> >> work? > > It seems to work OK. > >> >> > @@ -7428,13 +7459,13 @@ vectorizable_condition (gimple *stmt, >> > gimple_stmt_iterator *gsi, >> > gimple *gtemp; >> > vec_cond_lhs = >> > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), >> > - stmt, NULL); >> > + stmt, NULL, comp_vectype); >> > vect_is_simple_use (TREE_OPERAND (cond_expr, 0), stmt, >> > loop_vinfo, >emp, &def, &dts[0]); >> > >> > vec_cond_rhs = >> > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), >> > - stmt, NULL); >> > + stmt, NULL, comp_vectype); >> > vect_is_simple_use (TREE_OPERAND (cond_expr, 1), stmt, >> > loop_vinfo, >emp, &def, &dts[1]); >> >> I still don't like this very much but I guess without some major >> refactoring of all >> the functions there isn't a better way to do it for now. >> >> Thus, ok with trying the change suggested above. >> >> Thanks, >> Richard. >> > > Here is an updated version. Ok. Thanks, Richard. > Thanks, > Ilya > -- > gcc/ > > 2015-11-09 Ilya Enkovich > > * expr.c (const_vector_mask_from_tree): New. 
> (const_vector_from_tree): Use const_vector_mask_from_tree > for boolean vectors. > * tree-vect-stmts.c (vect_init_vector): Support boolean vector > invariants. > (vect_get_vec_def_for_operand): Add VECTYPE arg. > (vectorizable_condition): Directly provide vectype for invariants > used in comparison. > * tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE > arg. > > > diff --git a/gcc/expr.c b/gcc/expr.c > index 2b2174f..03936ee 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -11423,6 +11423,40 @@ try_tablejump (tree index_type, tree index_expr, > tree minval, tree range, >return 1; > } > > +/* Return a CONST_VECTOR rtx representing vector mask for > + a VECTOR_CST of booleans. */ > +static rtx > +const_vector_mask_from_tree (tree exp) > +{ > + rtvec v; > + unsigned i; > + int units; > + tree elt; > + machine_mode inner, mode; > + > + mode = TYPE_MODE (TREE_TYPE (exp)); > + units = GET_MODE_NUNITS (mode); > + inner = GET_MODE_INNER (mode); > + > + v = rtvec_alloc (units); > + > + for (i = 0; i < VECTOR_CST_NELTS (exp); ++i) > +{ > + elt = VECTOR_CST_ELT (exp, i); > + > + gcc_assert (TREE_CODE (elt) == INTEGER_CST); > + if (integer_zerop (elt)) > + RTVEC_ELT (v, i) = CONST0_RTX (inner); > + else if (integer_onep (elt) > + || integer_minus_onep (elt)) > + RTVEC_ELT (v, i) = CONSTM1_RTX (inner); > + else > + gcc_unreachable (); > +} > + > + return gen_rtx_CONST_VECTOR (mode, v); > +} > + > /* Return a CONST_VECTOR rtx for a VECTOR_CST tree. 
*/ > static rtx > const_vector_from_tree (tree exp) > @@ -11438,6 +11472,9 @@ const_vector_from_tree (tree exp) >if (initializer_zerop (exp)) > return CONST0_RTX (mode); > > + if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) > +return const_vector_mask_from_tree (exp); > + >units = GET_MODE_NUNITS (mode); >inner = GET_MODE_INNER (mode); > > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index ee549f4..af203ab 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -1300,7 +1300,7 @@ vect_init_vector (gimple *stmt, tree val, tree type, > gimple_stmt_iterator *gsi) >if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val))) > { > if (CONSTANT_CLASS_P (val)) > - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type
Re: Extend tree-call-cdce to calls whose result is used
Hi, On Sat, 7 Nov 2015, Richard Sandiford wrote: > For -fmath-errno, builtins.c currently expands calls to sqrt to: > > y = sqrt_optab (x); > if (y != y) > [ sqrt (x); or errno = EDOM; ] > > - the call to sqrt is protected by the result of the optab rather > than the input. It would be better to check !(x >= 0), like > tree-call-cdce.c does. It depends. With fast-math (and hence without NaNs) you can trivially optimize away a (y != y) test. You can't do so with !(x>=0) at all. > - the branch isn't exposed at the gimple level and so gets little > high-level optimisation. > > - we do this for log too, but for log a zero input produces > -inf rather than a NaN, and sets errno to ERANGE rather than EDOM. > > This patch moves the code to tree-call-cdce.c instead, This somehow feels wrong. Dead-code elimination doesn't have anything to do with the transformation you want; it rather is rewriting all feasible calls into something else, like fold_builtins does. Also cdce currently doesn't seem to do any checks on the fast-math flags, so I wonder if some of the conditions that you now also insert for calls whose results are used stay until final code. > Previously the pass was only enabled by default at -O2 or above, but the > old builtins.c code was enabled at -O. The patch therefore enables the > pass at -O as well. The pass is somewhat expensive in that it removes dominator info and schedules a full ssa update. The transformation is trivial enough that dominators and SSA form can be updated on the fly; I think without that it's not feasible for -O. But as said I think this transformation should better be moved into builtin folding (or other call folding), at which point also the fast-math flags can be checked. The infrastructure routines of tree-call-cdce can be used there of course. If so moved the cdce pass would be subsumed by that btw. (because the dead call result will be trivially exposed), and that would be a good thing. Ciao, Michael.
[PATCH] Fix PR68248
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-11-09 Richard Biener PR tree-optimization/68248 * tree-vect-generic.c (expand_vector_operations_1): Handle scalar rhs2. * gcc.dg/torture/pr68248.c: New testcase. Index: gcc/tree-vect-generic.c === *** gcc/tree-vect-generic.c (revision 230003) --- gcc/tree-vect-generic.c (working copy) *** expand_vector_operations_1 (gimple_stmt_ *** 1527,1532 --- 1528,1535 tree srhs1, srhs2 = NULL_TREE; if ((srhs1 = ssa_uniform_vector_p (rhs1)) != NULL_TREE && (rhs2 == NULL_TREE + || (! VECTOR_TYPE_P (TREE_TYPE (rhs2)) + && (srhs2 = rhs2)) || (srhs2 = ssa_uniform_vector_p (rhs2)) != NULL_TREE) /* As we query direct optabs restrict to non-convert operations. */ && TYPE_MODE (TREE_TYPE (type)) == TYPE_MODE (TREE_TYPE (srhs1))) Index: gcc/testsuite/gcc.dg/torture/pr68248.c === *** gcc/testsuite/gcc.dg/torture/pr68248.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr68248.c (working copy) *** *** 0 --- 1,20 + /* { dg-do compile } */ + + int a, b, c, d; + + int + fn1 (int p1) + { + return a > 0 ? p1 : p1 >> a; + } + + void + fn2 () + { + char e; + for (; c; c++) + { + e = fn1 (!d ^ 2); + b ^= e; + } + }
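In GNU C terms, the case the fix covers can be sketched as follows (illustrative code, not the internal representation): the second operand (rhs2) of the operation is a scalar rather than a vector, the situation expand_vector_operations_1 now handles when lowering.

```c
/* Sketch: a vector shifted by a scalar count -- rhs2 here is a scalar.
   Valid GNU C; the vector type name is illustrative.  */
typedef int v4si __attribute__ ((vector_size (16)));

v4si
shift_by_scalar (v4si v, int amount)
{
  return v >> amount;  /* the scalar count applies to every lane */
}
```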
Re: OpenACC Firstprivate
On 11/09/15 09:10, Jakub Jelinek wrote: On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote: This I'm afraid performs often two copies rather than just one (one to copy the host value to the present_copyin mapped value, another one in the region), I don't think that can be avoided. The host doesn't have control over when the CTAs (a gang) start -- they may even be serialized onto the same physical HW. So each gang has to initialize its own instance. Or did you mean something else? So, what is the scope of the private and firstprivate vars in OpenACC? In OpenMP if a variable is private or firstprivate on the target construct, unless further privatized in inner constructs it is really shared among all the threads in all the teams (or one var per all CTAs/workers in PTX terms). Is that the case for OpenACC too, or are the vars e.g. private to each CTA already or to each thread in each CTA, something different? If they are shared by all CTAs, then you should hopefully be able to use the GOMP_MAP_FIRSTPRIVATE{,_INT}; if not, then I'd say you should at least use those to provide you the initializer data to initialize your private vars from as a cheaper alternative to mapping. I'm going to try and get clarification, but I think the intent is to initialize with the value seen on the device. Consider:

int foo = 0;
#pragma acc data copyin(foo)
{
  #pragma acc parallel present(foo)
  {
    foo = 2;
  }
  if (expr)
    {
      #pragma acc update host(foo)
    }
  #pragma acc parallel firstprivate(foo)
  {
    // which initialization value?
  }
}

Here we copy data to the device, then set it a distinct value there. We conditionally update the host's instance from the device. My thinking is that the intent of the firstprivate is to initialize with the value known on the device (and behave as-if copyin, if it's not there). 
Not the value most recently seen on the host -- the update clause could change that, and may well be used as a debugging aid, so it seems bizarre that it can change program semantics in such a way.
Re: [PATCH v2 11/13] Test case for conversion from __seg_tls:0
On Tue, Oct 20, 2015 at 11:27 PM, Richard Henderson wrote: > --- > gcc/testsuite/gcc.target/i386/addr-space-3.c | 10 ++ > 1 file changed, 10 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/addr-space-3.c > > diff --git a/gcc/testsuite/gcc.target/i386/addr-space-3.c > b/gcc/testsuite/gcc.target/i386/addr-space-3.c > new file mode 100644 > index 000..63f1f03 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O" } */ > +/* { dg-final { scan-assembler "[fg]s:0" } } */ Causes ERROR: (DejaGnu) proc "fg" does not exist. The error code is NONE The info on the error is: close: spawn id exp6 not open while executing "close -i exp6" invoked from within "catch "close -i $spawn_id"" > + > +void test(int *y) > +{ > + int *x = (int __seg_tls *)0; > + if (x == y) > +asm(""); > +} > -- > 2.4.3 >
Re: Add null identifiers to genmatch
On Mon, Nov 9, 2015 at 12:17 AM, Jeff Law wrote: > On 11/07/2015 07:31 AM, Pedro Alves wrote: >> >> Hi Richard, >> >> Passerby comment below. >> >> On 11/07/2015 01:21 PM, Richard Sandiford wrote: >>> >>> -/* Lookup the identifier ID. */ >>> +/* Lookup the identifier ID. Allow "null" if ALLOW_NULL. */ >>> >>> id_base * >>> -get_operator (const char *id) >>> +get_operator (const char *id, bool allow_null = false) >>> { >>> + if (allow_null && strcmp (id, "null") == 0) >>> +return null_id; >>> + >>> id_base tem (id_base::CODE, id); >> >> >> Boolean params are best avoided if possible, IMO. In this case, >> it seems this could instead be a new wrapper function, like: > > This hasn't been something we've required for GCC. I've come across this > recommendation a few times over the last several months as I continue to > look at refactoring and best practices for codebases such as GCC. > > By encoding the boolean in the function's signature, it (IMHO) does make the > code a bit easier to read, primarily because you don't have to go look up the > tense of the boolean. The problem is when the boolean is telling us some > property of an argument, but there's more than one argument and other similar > situations. > > I wonder if the real benefit is in the refactoring necessary to do things in > this way without a ton of code duplication. I think the patch is ok as-is. Thus ok. Thanks, Richard. > Jeff
Re: Add a combined_fn enum
Bernd Schmidt writes: > On 11/09/2015 11:24 AM, Richard Sandiford wrote: >> Bernd Schmidt writes: >>> I see it's already acked, but have you considered just doing away with >>> the builtin/internal function distinction? >> >> I think they're too different to be done away with entirely. built-in >> functions map directly to a specific C-level callable function and >> must have an fndecl, whereas no internal function should have an fndecl. >> Whether a built-in function is available depends on the selected >> language and what declarations the front-end has seen, while whether >> an internal function is available depends entirely on GCC internal >> information. > > Yes... but aren't these things fixable relatively easily (compared with > what your patches are doing)? I'm not sure what you mean by "fix" though. I don't think we can change any of the constraints above. > I also have the problem that I can't quite see where your patch series > is going. Let's take "Add internal bitcount functions", it adds new > internal functions but no users AFAICS. What is the end goal here (there > doesn't seem to be a [0/N] description in my inbox)? The main goal is to allow functions to be vectorised simply by defining the associated optab. At the moment you can get a scalar square root instruction simply by defining something like sqrtdf2. But if you want to have vectorised sqrt, you need to have a target-specific C-level built-in function for the vector form of sqrt, implement TARGET_BUILTIN_VECTORIZED_FUNCTION, and expand the sqrt in the same way that the target expands other directly-called built-in functions. That seems unnecessarily indirect, especially when in practice those target-specific functions tend to use patterns like sqrtv2df2 anyway. It seems better to have GCC use the vector optabs directly. This was prompted by the patch Dave Sherwood posted to support scalar and vector fmin() and fmax() even with -fno-fast-math on aarch64. 
As things stood we needed the same approach: use an optab for the scalar version and TARGET_BUILTIN_VECTORIZED_FUNCTION for the vector version. The problem is that at present there's no aarch64 built-in function that does what we want, so we'd have to define a new one. And that seems silly when GCC really ought to be able to use the vector version of the optab without any more help from the target. I'm hoping to post those patches later today. But the series has other side-benefits, like:

- allowing genmatch to fold calls to internal functions
- splitting the computational part of math functions from the fallback errno handling at an earlier point, so that they get more optimisation
- clearly separating out the call folds that we're prepared to do on gimple, rather than calling into builtins.c.

Thanks, Richard
Re: [PATCH] PR/67682, break SLP groups up if only some elements match
On 06/11/15 12:55, Richard Biener wrote: > >> + /* GROUP_GAP of the first group now has to skip over the second group >> too. */ >> + GROUP_GAP (first_vinfo) += group2_size; > > Please add a MSG_NOTE debug printf stating that we split the group and > at which element. Done. > I think you want to add && STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (stmt)) > otherwise this could be SLP reductions where there is no way the split > group would enable vectorization. Ah, I had thought that the (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) check sufficed for that, as per e.g. the section above /* Create a node (a root of the SLP tree) for the packed grouped stores. */ But done. > Note that BB vectorization now also very aggressively falls back to > considering > non-matches being "external". > > Not sure why that doesn't trigger for your testcases ;) I tested against trunk r229944, on which all of my scan-tree-dump's were failing > I'm comfortable with the i < group_size half of the patch. For the other > piece > I'd like to see more compile-time / success data from, say, building > SPEC CPU 2006. Well, as a couple of quick data points, a compile of SPEC2000 on aarch64-none-linux-gnu (-Ofast -fomit-frame-pointer -mcpu=cortex-a57), I have: 3080 successes without patch; +79 successes from the "i < vectorization_factor" part of the patch (on its own) +90 successes from the (i>=vectorization_factor) && "i < group_size" part (on its own) +(79 from first) +(90 from second) + (an additional 62) from both parts together. And for SPEC2006, aarch64-linux-gnu (-O3 -fomit-frame-pointer -mcpu=cortex-a57): 11979 successes without patch; + 499 from the "i < vectorization_factor" part + 264 from the (i >= vectorization factor) && (i < group_size)" part + extra 336 if both parts combined. I haven't done any significant measurements of compile-time yet. (snipping this bit out-of-order) > Hmm. 
This is of course pretty bad for compile-time for the non-matching > case as that effectively will always split into N pieces if we feed it > garbage (that is, without being sure that at least one piece _will_ vectorize). > > OTOH with the current way of computing "matches" we'd restrict ourselves > to cases where the first stmt we look at (and match to) happens to be > the operation that in the end will form a matching group. ... > Eventually we'd want to improve the "matches" return > to include the affected stmts (ok, that'll be not very easy) so we can do > a cheap "if we split here will it fix that particular mismatch" check. Yes, I think there are a bunch of things we can do here, that would be more powerful than the simple approach I used here. The biggest limiting factor will probably be (lack of) permutation, i.e. if we only SLP stores to consecutive addresses. > So, please split the patch and I suggest to leave the controversial part > for next stage1 together with some improvement on the SLP build process > itself? Here's a reduced version with just the second case, bootstrapped+check-gcc/g++ on x86_64. gcc/ChangeLog: * tree-vect-slp.c (vect_split_slp_store_group): New. (vect_analyze_slp_instance): During basic block SLP, recurse on subgroups if vect_build_slp_tree fails after 1st vector. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic. * gcc.dg/vect/bb-slp-subgroups-1.c: New. * gcc.dg/vect/bb-slp-subgroups-2.c: New. * gcc.dg/vect/bb-slp-subgroups-3.c: New. * gcc.dg/vect/bb-slp-subgroups-4.c: New. 
--- gcc/testsuite/gcc.dg/vect/bb-slp-7.c | 10 ++-- gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 42 +++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 ++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c | 41 ++ gcc/tree-vect-slp.c| 74 +- 6 files changed, 246 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c index ab54a48..b8bef8c 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c @@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y) unsigned int *pout = &out[0]; unsigned int a0, a1, a2, a3; - /* Non isomorphic. */ + /* Non isomorphic, even 64-bit subgroups. */ a0 = *pin++ + 23; - a1 = *pin++ + 142; + a1 = *pin++ * 142; a2 = *pin++ + 2; a3 = *pin++ * 31; - + *pout++ = a0 * x; *pout++ = a1 * y; *pout++ = a2 * x; @@ -29,7 +29,7 @@ main1 (unsigned int x, unsigned int y) /* Check results. */ if (out[0] != (in[0] +
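As context for the testcase changes, a hedged sketch of the shape of code PR67682 targets (the constants are illustrative, loosely modeled on bb-slp-7.c): the four grouped stores are not all isomorphic, but the first pair is, so splitting the group salvages a two-lane SLP opportunity.

```c
int out[4];

/* Illustrative only: stores 0-1 share the '+ then *' shape while
   stores 2-3 use '* then *', so the full group of four fails SLP
   matching but each split subgroup of two can still vectorize.  */
void
store_group (const int *pin, int x, int y)
{
  out[0] = (pin[0] + 23) * x;
  out[1] = (pin[1] + 142) * y;
  out[2] = (pin[2] * 2) * x;
  out[3] = (pin[3] * 31) * y;
}
```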
Re: [PATCH] Minor refactoring in tree-ssanames.c & freelists verifier
Hi, On Mon, 9 Nov 2015, Jeff Law wrote: +verify_ssaname_freelists (struct function *fun) +{ + /* Do nothing if we are in RTL format. */ + basic_block bb; + FOR_EACH_BB_FN (bb, fun) +{ + if (bb->flags & BB_RTL) + return; +} Couldn't this simply check gimple_in_ssa_p (fun)? + /* Then note the operands of each statement. */ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + ssa_op_iter iter; + gimple *stmt = gsi_stmt (gsi); + FOR_EACH_SSA_TREE_OPERAND (t, stmt, iter, SSA_OP_ALL_OPERANDS) + if (TREE_CODE (t) == SSA_NAME) + bitmap_set_bit (names_in_il, SSA_NAME_VERSION (t)); + } t will always be an SSA_NAME here. Ciao, Michael.
Re: [Patch] Change to argument promotion in fixed conversion library calls
On Fri, Nov 6, 2015 at 8:14 PM, Bernd Schmidt wrote: > On 11/06/2015 08:04 PM, Steve Ellcey wrote: >> >> When I made this change I had one regression in the GCC testsuite >> (gcc.dg/fixed-point/convert-sat.c). I tracked this down to the >> fact that emit_library_call_value_1 does not do any argument promotion >> because it does not have the original tree type information for library >> calls. It only knows about modes. I can't change >> emit_library_call_value_1 >> to do the promotion because it does not know whether to do a signed or >> unsigned promotion, but expand_fixed_convert could do the conversion >> before calling emit_library_call_value_1 and that is what this patch does. > > > Hmm, difficult. I can see how there would be a problem, but considering how > many calls to emit_library_call_* we have, I'm slightly worried whether this > is really is a good approach. > > On the other hand, there seems to be precedent in this file: > > if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION > (SImode)) > from = convert_to_mode (SImode, from, unsignedp); > >> The 'real' long term fix for this problem is to have tree types for >> builtin >> functions so the proper promotions can always be done but that is a fairly >> large change that I am not willing to tackle right now and it could >> probably >> not be done in time for GCC 6.0 anyway. > > > Yeah, but I agree that this is the real fix. We should aim to get rid of the > emit_library_call functions. Indeed. In the "great plan" of simplifying RTL expansion by moving stuff up to the GIMPLE level this could be done in a lowering stage lowering all operations that we need to do via libcalls to GIMPLE calls. Now, we'd either need proper function declarations for all libcalls of optabs for this or have the optab internal function stuff from Richard also provide the libcall fallback. 
In the expansion code for the as-libcall path we can then simply use the type of the incoming argument (as we could if emit_library_call_value_1 would in addition to the RTX operands also receive the original tree ones). Richard. >> + if (SCALAR_INT_MODE_P (from_mode)) >> +{ >> + /* If we need to promote the integer function argument we need to >> do > > > Extra space at the start of the comment. > >> + it here instead of inside emit_library_call_value because here >> we >> + know if we should be doing a signed or unsigned promotion. */ >> + >> + machine_mode arg_mode; >> + int unsigned_p = 0; >> + >> + arg_mode = promote_function_mode (NULL_TREE, from_mode, >> + &unsigned_p, NULL_TREE, 0); >> + if (arg_mode != from_mode) >> + { >> + from = convert_to_mode (arg_mode, from, uintp); >> + from_mode = arg_mode; >> + } >> +} > > > Move this into a separate function (prepare_libcall_arg)? I'll think about > it over the weekend and let others chime in if they want, but I think I'll > probably end up approving it with that change. > > > Bernd
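A hedged, target-neutral sketch of the promotion pitfall being fixed (pure C with hypothetical function names): when a narrow argument is widened for a call, the signedness of the source type decides between sign- and zero-extension, which is exactly the information emit_library_call_value_1 lacks when it only sees modes.

```c
/* Illustrative only: promoting a narrow value before a call.  The
   caller must pick sign- or zero-extension based on the *source*
   type; choosing wrongly produces a different promoted value.  */
int
promote_signed (signed char narrow)
{
  return (int) narrow;                  /* sign-extends: -5 stays -5 */
}

int
promote_unsigned (signed char narrow)
{
  return (int) (unsigned char) narrow;  /* zero-extends: -5 becomes 251 */
}
```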
Re: Extend tree-call-cdce to calls whose result is used
Michael Matz writes: > On Sat, 7 Nov 2015, Richard Sandiford wrote: >> For -fmath-errno, builtins.c currently expands calls to sqrt to: >> >> y = sqrt_optab (x); >> if (y != y) >> [ sqrt (x); or errno = EDOM; ] >> >> - the call to sqrt is protected by the result of the optab rather >> than the input. It would be better to check !(x >= 0), like >> tree-call-cdce.c does. > > It depends. With fast-math (and hence without NaNs) you can trivially > optimize away a (y != y) test. You can't do so with !(x>=0) at all. -ffast-math would already cause us to treat the function as not setting errno, so the code wouldn't be used. >> - the branch isn't exposed at the gimple level and so gets little >> high-level optimisation. >> >> - we do this for log too, but for log a zero input produces >> -inf rather than a NaN, and sets errno to ERANGE rather than EDOM. >> >> This patch moves the code to tree-call-cdce.c instead, > > This somehow feels wrong. Dead-code elimination doesn't have anything to > do with the transformation you want, it rather is rewriting all feasible > calls into something else, like fold_builtins does. Also cdce currently > doesn't seem to do any checks on the fast-math flags, so I wonder if some > of the conditions that you now also insert for calls whose results are > used stay until final code. Isn't that just a question of what you call the current pass though? Arguably "conditional DCE" was always a confusing name, since the pass only _adds_ code :-) Obviously the point is that it eliminates the external call at runtime in most cases, but so does the transformation I'm adding. I'd be happy to rename the file to something else if that would help. >> Previously the pass was only enabled by default at -O2 or above, but the >> old builtins.c code was enabled at -O. The patch therefore enables the >> pass at -O as well. > > The pass is somewhat expensive in that it removes dominator info and > schedules a full ssa update. 
> The transformation is trivial enough that > dominators and SSA form can be updated on the fly, I think without that > it's not feasible for -O. r229916 fixed that for the non-EH case. I posted a patch to update the vops for the non-EH case as well: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html If this aspect is going to be a blocker, maybe we can reconsider that, since the EH case should be rare. > But as said I think this transformation should better be moved into > builtin folding (or other call folding), at which point also the fast-math > flags can be checked. The infrastructure routines of tree-call-cdce can > be used there of course. If so moved the cdce pass would be subsumed by > that btw. (because the dead call result will be trivially exposed), and > that would be a good thing. Yeah, that was why I added the code to the cdce pass. I don't think it makes sense to have one pass that adds range checks for calls whose lhs is unused and a separate pass for calls whose lhs isn't used. But by "builtin folding", do you mean fold_builtin_n etc.? I thought the point was that we were trying to move away from relying on folding at that level. Or did you mean something else? Thanks, Richard
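At the source level, the guard being debated looks roughly like the sketch below (a hypothetical illustration with an invented `checked_sqrt` name, not GCC's actual expansion): the fast inline square root runs unconditionally, and a test decides whether the library call must be replayed so that errno gets set. The point of the thread is that guarding on the input with `!(x >= 0)` is robust, whereas the result-based test `y != y` is trivially folded away once NaNs are assumed absent.

```cpp
#include <cmath>
#include <cerrno>

// Hypothetical sketch of the -fmath-errno guard discussed above
// (invented helper name, not GCC's real expansion).
static double
checked_sqrt (double x)
{
  double y = __builtin_sqrt (x);  // fast inline path, no errno side effect
  if (!(x >= 0.0))                // input-based guard, unlike (y != y)
    return std::sqrt (x);         // slow path: library call sets errno
  return y;
}
```

For in-domain inputs the guard is false and only the inline instruction executes; the external call is reached only for out-of-domain arguments.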
Re: [PATCH PR52272]Be smart when adding iv candidates
On 11/08/2015 10:11 AM, Richard Biener wrote: On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng" wrote: +inline bool +iv_common_cand_hasher::equal (const iv_common_cand *ccand1, + const iv_common_cand *ccand2) +{ + return ccand1->hash == ccand2->hash +&& operand_equal_p (ccand1->base, ccand2->base, 0) +&& operand_equal_p (ccand1->step, ccand2->step, 0) +&& TYPE_PRECISION (TREE_TYPE (ccand1->base)) + == TYPE_PRECISION (TREE_TYPE (ccand2->base)); Yes. Patch is OK then. Doesn't follow the formatting rules though in the quoted piece. Bernd
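The pattern in the quoted hasher — cache the hash in the record and compare the cached hashes before doing the expensive structural comparison — can be sketched generically (invented stand-in types; the real code compares tree operands with `operand_equal_p` and checks `TYPE_PRECISION`):

```cpp
#include <cstddef>
#include <string>
#include <functional>

// Generic sketch of the iv_common_cand_hasher::equal pattern: the cheap
// cached-hash test short-circuits before the structural comparisons.
struct iv_cand_key
{
  std::size_t hash;     // computed once when the candidate is recorded
  std::string base;     // stand-in for the base tree operand
  std::string step;     // stand-in for the step tree operand
  unsigned precision;   // stand-in for TYPE_PRECISION of the base type
};

static iv_cand_key
make_key (const std::string &base, const std::string &step, unsigned prec)
{
  std::size_t h = std::hash<std::string>{} (base)
                  ^ (std::hash<std::string>{} (step) << 1);
  return { h, base, step, prec };
}

static bool
iv_cand_equal (const iv_cand_key &a, const iv_cand_key &b)
{
  return a.hash == b.hash          // cheap reject first
         && a.base == b.base       // ~ operand_equal_p on base
         && a.step == b.step       // ~ operand_equal_p on step
         && a.precision == b.precision;
}
```

Comparing hashes first is safe because equal keys always have equal cached hashes; the structural tests only run for (rare) hash matches.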
Re: Add a combined_fn enum
On 11/09/2015 03:57 PM, Richard Sandiford wrote: I'm not sure what you mean by "fix" though. I don't think we can change any of the constraints above. I don't see why they couldn't be relaxed. What's stopping us from defining some builtins independently of the frontends, and don't all languages except Fortran use the normal builtins.def anyway? Also - if you can have an internal function without a fndecl, why wouldn't it work to allow that for builtins as well? But this is probably moot since I can now see a little more clearly what you want to do. The main goal is to allow functions to be vectorised simply by defining the associated optab. Ok. So these internal functions would support arbitrary modes like V[1248][SDX]F etc., unlike the builtins? That wasn't entirely clear from the patches I looked at, it seemed like you were just duplicating every builtin as an internal function (something to which I would have objected). Bernd
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 9 November 2015 at 11:32, Kyrill Tkachov wrote: > 2015-11-09 Kyrylo Tkachov > > PR target/68129 > * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. > * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): > Delete VOIDmode case. Assert that mode is not VOIDmode. > * config/aarch64/predicates.md (const0_operand): Remove const_double > match. > > 2015-11-09 Kyrylo Tkachov > > PR target/68129 > * gcc.target/aarch64/pr68129_1.c: New test. Hi, This test isn't aarch64 specific, does it need to be in gcc.target/aarch64 ? Cheers /Marcus
[PATCH series, 16] Use parloops to parallelize oacc kernels regions
Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2 Make create_parallel_loop return void 3 Ignore reduction clause on kernels directive 4 Implement -foffload-alias 5 Add in_oacc_kernels_region in struct loop 6 Add pass_oacc_kernels 7 Add pass_dominator_oacc_kernels 8 Add pass_ch_oacc_kernels 9 Add pass_parallelize_loops_oacc_kernels 10 Add pass_oacc_kernels pass group in passes.def 11 Update testcases after adding kernels pass group 12 Handle acc loop directive 13 Add c-c++-common/goacc/kernels-*.c 14 Add gfortran.dg/goacc/kernels-*.f95 15 Add libgomp.oacc-c-c++-common/kernels-*.c 16 Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. Thanks, - Tom --- 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2015-06-30 Tom de Vries * tree-parloops.c (transform_to_exit_first_loop_alt): Insert new exit block only when needed. --- 2 Make create_parallel_loop return void 2015-11-09 Tom de Vries * tree-parloops.c (create_parallel_loop): Return void. --- 3 Ignore reduction clause on kernels directive 2015-11-08 Tom de Vries * c-omp.c (c_oacc_split_loop_clauses): Don't copy OMP_CLAUSE_REDUCTION, classify as loop clause. --- 4 Implement -foffload-alias 2015-11-03 Tom de Vries * common.opt (foffload-alias): New option. * flag-types.h (enum offload_alias): New enum. * omp-low.c (install_var_field): Handle flag_offload_alias. 
* doc/invoke.texi (@item Code Generation Options): Add -foffload-alias. (@item -foffload-alias): New item. * c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test. * c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test. --- 5 Add in_oacc_kernels_region in struct loop 2015-11-09 Tom de Vries * cfgloop.h (struct loop): Add in_oacc_kernels_region field. * omp-low.c (mark_loops_in_oacc_kernels_region): New function. (expand_omp_target): Call mark_loops_in_oacc_kernels_region. --- 6 Add pass_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_oacc_kernels): Declare. * tree-ssa-loop.c (gate_oacc_kernels): New static function. (pass_data_oacc_kernels): New pass_data. (class pass_oacc_kernels): New pass. (make_pass_oacc_kernels): New function. --- 7 Add pass_dominator_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_dominator_oacc_kernels): Declare. * tree-ssa-dom.c (class dominator_base): New class. Factor out of ... (class pass_dominator): ... here. (dominator_base::may_peel_loop_headers_p) (pass_dominator::may_peel_loop_headers_p): New function. (pass_dominator_oacc_kernels): New pass. (make_pass_dominator_oacc_kernels): New function. (dominator_base::execute): Use may_peel_loop_headers_p. --- 8 Add pass_ch_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_ch_oacc_kernels): Declare. * tree-ssa-loop-ch.c (pass_ch::pass_ch (pass_data, gcc::context)): New constructor. (pass_data_ch_oacc_kernels): New pass_data. (class pass_ch_oacc_kernels): New pass. (pass_ch_oacc_kernels::process_loop_p): New function. (make_pass_ch_oacc_kernels): New function. --- 9 Add pass_parallelize_loops_oacc_kernels 2015-11-09 Tom de Vries * omp-low.c (set_oacc_fn_attrib): Make extern. * omp-low.c (expand_omp_atomic_fetch_op): Release defs of update stmt. * omp-low.h (set_oacc_fn_attrib): Declare. * tree-parloops.c (struct reduction_info): Add reduc_addr field. (create_call_for_reduction_1): Handle case that reduc_addr is non-NULL. 
(create_parallel_loop, gen_parallel_loop, try_create_reduction_list): Add and handle function parameter oacc_kernels_p. (get_omp_data_i_param): New function. (ref_conflicts_with_region, oacc_entry_exit_ok_1) (oacc_entry_exit_single_gang, oacc_entry_exit_ok): New function. (parallelize_loops): Add and handle function parameter oacc_kernels_p. Calculate dominance info. Skip loops that are not in a kernels region in oacc_kernels_p mode. Skip inner loops of parallelized loops. (
Re: [PATCH v3 1/2] [PR debug/67192] Fix C loops' back-jump location
On 11/04/2015 05:17 PM, Andreas Arnez wrote: > Since r223098 ("Implement -Wmisleading-indentation") the backward-jump > generated for a C while- or for-loop can get the wrong line number. > This is because the check for misleading indentation peeks ahead one > token, advancing input_location to after the loop, and then > c_finish_loop() creates the back-jump and calls add_stmt(), which > assigns input_location to the statement by default. > > This patch swaps the check for misleading indentation with the finishing > of the loop, such that input_location still has the right value at the > time of any invocations of add_stmt(). Note that this does not fully > cover all cases where the back-jump gets the wrong location. > > gcc/c/ChangeLog: > > PR debug/67192 > * c-parser.c (c_parser_while_statement): Finish the loop before > parsing ahead for misleading indentation. > (c_parser_for_statement): Likewise. > > gcc/testsuite/ChangeLog: > > PR debug/67192 > * gcc.dg/guality/pr67192.c: New test. Applied. Thanks! -Andreas-
Re: [PATCH v3 2/2] [PR debug/67192] Further fix C loops' back-jump location
On 11/04/2015 05:18 PM, Andreas Arnez wrote: > After parsing an unconditional "while"- or "for"-loop, the C front-end > generates a backward-goto statement and implicitly sets its location to > the current input_location. But in some cases the parser peeks ahead > first, such that input_location already points to the line after the > loop and the generated backward-goto gets the wrong line number. > > One way this can occur is with a loop body consisting of an "if" > statement, because then the parser peeks for an optional "else" before > finishing the loop. > > Another way occurred after r223098 ("Implement > -Wmisleading-indentation"), even with a loop body enclosed in braces. > This was because the check for misleading indentation always peeks ahead > one token as well. > > This patch avoids the use of input_location and sets the location of the > backward-goto to the start of the loop body instead, or, if there is no > loop body, to the start of the loop. > > gcc/c/ChangeLog: > > PR debug/67192 > * c-typeck.c (c_finish_loop): For unconditional loops, set the > location of the backward-goto to the start of the loop body. > > gcc/testsuite/ChangeLog: > > PR debug/67192 > * gcc.dg/guality/pr67192.c (f3, f4): New functions. > (main): Invoke them. Applied. Thanks! -Andreas-
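For concreteness, the two problem shapes described above look like this (illustrative functions, not the actual gcc.dg/guality/pr67192.c testcase). In both, the parser has consumed a token past the loop before the backward-goto is generated, so an input_location-based line number would point past the loop:

```cpp
// Illustrative loop shapes for the back-jump location issue.
static int
f3 (int n)
{
  while (n > 3)
    if (n & 1)     // body is an if: parser peeks ahead for an "else"
      n -= 1;
    else
      n -= 2;
  return n;
}

static int
f4 (int n)
{
  for (; n > 3;)
    {              // braced body: -Wmisleading-indentation still peeks
      n -= 2;      // one token past the closing brace
    }
  return n;
}
```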
[PATCH, 1/16] Insert new exit block only when needed in transform_to_exit_first_loop_alt
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2 Make create_parallel_loop return void 3 Ignore reduction clause on kernels directive 4 Implement -foffload-alias 5 Add in_oacc_kernels_region in struct loop 6 Add pass_oacc_kernels 7 Add pass_dominator_oacc_kernels 8 Add pass_ch_oacc_kernels 9 Add pass_parallelize_loops_oacc_kernels 10 Add pass_oacc_kernels pass group in passes.def 11 Update testcases after adding kernels pass group 12 Handle acc loop directive 13 Add c-c++-common/goacc/kernels-*.c 14 Add gfortran.dg/goacc/kernels-*.f95 15 Add libgomp.oacc-c-c++-common/kernels-*.c 16 Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. In transform_to_exit_first_loop_alt we insert a new exit block in between the new loop header and the old exit block. Currently, we also do this if this is not necessary. This patch figures out when we need to insert a new exit block, and only then inserts it. Thanks, - Tom Insert new exit block only when needed in transform_to_exit_first_loop_alt 2015-06-30 Tom de Vries * tree-parloops.c (transform_to_exit_first_loop_alt): Insert new exit block only when needed.
--- gcc/tree-parloops.c | 42 -- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 3d41275..6a49aa9 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1695,10 +1695,15 @@ transform_to_exit_first_loop_alt (struct loop *loop, /* Set the latch arguments of the new phis to ivtmp/sum_b. */ flush_pending_stmts (post_inc_edge); - /* Create a new empty exit block, inbetween the new loop header and the old - exit block. The function separate_decls_in_region needs this block to - insert code that is active on loop exit, but not any other path. */ - basic_block new_exit_block = split_edge (exit); + + basic_block new_exit_block = NULL; + if (!single_pred_p (exit->dest)) +{ + /* Create a new empty exit block, inbetween the new loop header and the + old exit block. The function separate_decls_in_region needs this block + to insert code that is active on loop exit, but not any other path. */ + new_exit_block = split_edge (exit); +} /* Insert and register the reduction exit phis. */ for (gphi_iterator gsi = gsi_start_phis (exit_block); @@ -1706,17 +1711,24 @@ transform_to_exit_first_loop_alt (struct loop *loop, gsi_next (&gsi)) { gphi *phi = gsi.phi (); + gphi *nphi = NULL; tree res_z = PHI_RESULT (phi); + tree res_c; - /* Now that we have a new exit block, duplicate the phi of the old exit - block in the new exit block to preserve loop-closed ssa. 
*/ - edge succ_new_exit_block = single_succ_edge (new_exit_block); - edge pred_new_exit_block = single_pred_edge (new_exit_block); - tree res_y = copy_ssa_name (res_z, phi); - gphi *nphi = create_phi_node (res_y, new_exit_block); - tree res_c = PHI_ARG_DEF_FROM_EDGE (phi, succ_new_exit_block); - add_phi_arg (nphi, res_c, pred_new_exit_block, UNKNOWN_LOCATION); - add_phi_arg (phi, res_y, succ_new_exit_block, UNKNOWN_LOCATION); + if (new_exit_block != NULL) + { + /* Now that we have a new exit block, duplicate the phi of the old + exit block in the new exit block to preserve loop-closed ssa. */ + edge succ_new_exit_block = single_succ_edge (new_exit_block); + edge pred_new_exit_block = single_pred_edge (new_exit_block); + tree res_y = copy_ssa_name (res_z, phi); + nphi = create_phi_node (res_y, new_exit_block); + res_c = PHI_ARG_DEF_FROM_EDGE (phi, succ_new_exit_block); + add_phi_arg (nphi, res_c, pred_new_exit_block, UNKNOWN_LOCATION); + add_phi_arg (phi, res_y, succ_new_exit_block, UNKNOWN_LOCATION); + } + else + res_c = PHI_ARG_DEF_FROM_EDGE (phi, exit); if (virtual_operand_p (res_z)) continue; @@ -1724,7 +1736,9 @@ transform_to_exit_first_loop_alt (struct loop *loop, gimple *reduc_phi = SSA_NAME_DEF_STMT (res_c); struct reduction_info *red = reduction_phi (reduction_list, reduc_phi); if (red != NULL) - red->keep_res = nphi; + red->keep_res = (nphi != NULL + ? nph
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 09/11/15 15:34, Marcus Shawcroft wrote: On 9 November 2015 at 11:32, Kyrill Tkachov wrote: 2015-11-09 Kyrylo Tkachov PR target/68129 * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): Delete VOIDmode case. Assert that mode is not VOIDmode. * config/aarch64/predicates.md (const0_operand): Remove const_double match. 2015-11-09 Kyrylo Tkachov PR target/68129 * gcc.target/aarch64/pr68129_1.c: New test. Hi, This test isn't aarch64 specific, does it need to be in gcc.target/aarch64 ? Not really, here is the patch with the test in gcc.dg/ if that's preferred. Thanks, Kyrill 2015-11-09 Kyrylo Tkachov PR target/68129 * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): Delete VOIDmode case. Assert that mode is not VOIDmode. * config/aarch64/predicates.md (const0_operand): Remove const_double match. 2015-11-09 Kyrylo Tkachov PR target/68129 * gcc.dg/pr68129_1.c: New test. Cheers /Marcus commit 623ffaa527b17ad01179c30c1d4a9911243f818a Author: Kyrylo Tkachov Date: Wed Oct 28 10:49:44 2015 + [AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ce155dc..927b72a 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4403,11 +4403,10 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case CONST_DOUBLE: - /* CONST_DOUBLE can represent a double-width integer. - In this case, the mode of x is VOIDmode. */ - if (GET_MODE (x) == VOIDmode) - ; /* Do Nothing. */ - else if (aarch64_float_const_zero_rtx_p (x)) + /* Since we define TARGET_SUPPORTS_WIDE_INT we shouldn't ever + be getting CONST_DOUBLEs holding integers. 
*/ + gcc_assert (GET_MODE (x) != VOIDmode); + if (aarch64_float_const_zero_rtx_p (x)) { fputc ('0', f); break; diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index b041a1e..0fac0a7 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -863,6 +863,8 @@ extern enum aarch64_code_model aarch64_cmodel; (aarch64_cmodel == AARCH64_CMODEL_TINY \ || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC) +#define TARGET_SUPPORTS_WIDE_INT 1 + /* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register. */ #define AARCH64_VALID_SIMD_DREG_MODE(MODE) \ ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \ diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 1bcbf62..8775460 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -32,7 +32,7 @@ (define_predicate "aarch64_call_insn_operand" ;; Return true if OP a (const_int 0) operand. (define_predicate "const0_operand" - (and (match_code "const_int, const_double") + (and (match_code "const_int") (match_test "op == CONST0_RTX (mode)"))) (define_predicate "aarch64_ccmp_immediate" diff --git a/gcc/testsuite/gcc.dg/pr68129_1.c b/gcc/testsuite/gcc.dg/pr68129_1.c new file mode 100644 index 000..112331e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr68129_1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fno-split-wide-types" } */ + +typedef int V __attribute__ ((vector_size (8 * sizeof (int)))); + +void +foo (V *p, V *q) +{ + *p = (*p == *q); +}
[PATCH, 2/16] Make create_parallel_loop return void
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2 Make create_parallel_loop return void 3 Ignore reduction clause on kernels directive 4 Implement -foffload-alias 5 Add in_oacc_kernels_region in struct loop 6 Add pass_oacc_kernels 7 Add pass_dominator_oacc_kernels 8 Add pass_ch_oacc_kernels 9 Add pass_parallelize_loops_oacc_kernels 10 Add pass_oacc_kernels pass group in passes.def 11 Update testcases after adding kernels pass group 12 Handle acc loop directive 13 Add c-c++-common/goacc/kernels-*.c 14 Add gfortran.dg/goacc/kernels-*.f95 15 Add libgomp.oacc-c-c++-common/kernels-*.c 16 Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. This patch makes create_parallel_loop return void. The result is currently unused. Thanks, - Tom Make create_parallel_loop return void 2015-11-09 Tom de Vries * tree-parloops.c (create_parallel_loop): Return void. --- gcc/tree-parloops.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 6a49aa9..17415a8 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1986,10 +1986,9 @@ transform_to_exit_first_loop (struct loop *loop, /* Create the parallel constructs for LOOP as described in gen_parallel_loop. LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
NEW_DATA is the variable that should be initialized from the argument - of LOOP_FN. N_THREADS is the requested number of threads. Returns the - basic block containing GIMPLE_OMP_PARALLEL tree. */ + of LOOP_FN. N_THREADS is the requested number of threads. */ -static basic_block +static void create_parallel_loop (struct loop *loop, tree loop_fn, tree data, tree new_data, unsigned n_threads, location_t loc) { @@ -2162,8 +2161,6 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, /* After the above dom info is hosed. Re-compute it. */ free_dominance_info (CDI_DOMINATORS); calculate_dominance_info (CDI_DOMINATORS); - - return paral_bb; } /* Generates code to execute the iterations of LOOP in N_THREADS -- 1.9.1
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 9 November 2015 at 15:45, Kyrill Tkachov wrote: > > On 09/11/15 15:34, Marcus Shawcroft wrote: >> >> On 9 November 2015 at 11:32, Kyrill Tkachov >> wrote: >> >>> 2015-11-09 Kyrylo Tkachov >>> >>> PR target/68129 >>> * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. >>> * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): >>> Delete VOIDmode case. Assert that mode is not VOIDmode. >>> * config/aarch64/predicates.md (const0_operand): Remove const_double >>> match. >>> >>> 2015-11-09 Kyrylo Tkachov >>> >>> PR target/68129 >>> * gcc.target/aarch64/pr68129_1.c: New test. >> >> Hi, This test isn't aarch64 specific, does it need to be in >> gcc.target/aarch64 ? > > > Not really, here is the patch with the test in gcc.dg/ if that's preferred. OK /Marcus
Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
Hi, On Sun, Nov 08, 2015 at 07:36:16PM -0500, Michael Meissner wrote: > [gcc/testsuite] > * lib/target-supports.exp (check_p9vector_hw_available): Add > checks for power9 availability. > (check_effective_target_powerpc_p9vector_ok): Likewise. It's probably better not to use this for modulo; it is confusing and if you'll later need to untangle it it is much more work. > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ Lose this line? If Darwin cannot support modulo, the next line will catch that. +/* { dg-require-effective-target powerpc_p9vector_ok } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { > "-mcpu=power9" } } */ > +/* { dg-options "-mcpu=power9 -O3" } */ Is -O3 needed? Why won't -O2 work? > +proc check_p9vector_hw_available { } { > +return [check_cached_effective_target p9vector_hw_available { > + # Some simulators are known to not support VSX/power8 instructions. > + # For now, disable on Darwin > + if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || > [istarget *-*-darwin*]} { Long line. > Index: gcc/config/rs6000/rs6000.md > === > --- gcc/config/rs6000/rs6000.md (revision 229972) > +++ gcc/config/rs6000/rs6000.md (working copy) > @@ -2885,9 +2885,9 @@ (define_insn_and_split "*div3_sra_ > (set_attr "cell_micro" "not")]) > > (define_expand "mod3" > - [(use (match_operand:GPR 0 "gpc_reg_operand" "")) > - (use (match_operand:GPR 1 "gpc_reg_operand" "")) > - (use (match_operand:GPR 2 "reg_or_cint_operand" ""))] > + [(set (match_operand:GPR 0 "gpc_reg_operand" "") > + (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "") > + (match_operand:GPR 2 "reg_or_cint_operand" "")))] You could delete the empty constraint strings while you're at it. > +;; On machines with modulo support, do a combined div/mod the old fashioned > +;; method, since the multiply/subtract is faster than doing the mod > instruction > +;; after a divide. 
You can instead have a "divmod" insn that is split to either of div, mod, or div+mul+sub depending on which of the outputs is unused. Peepholes do not get all cases. This can be a later improvement of course. Segher
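The "old fashioned method" referred to above is the classic identity a % b == a - (a / b) * b: when both the quotient and the remainder are live, one divide plus a multiply and subtract beats issuing a separate mod instruction after the divide. A sketch with an invented helper name:

```cpp
// Combined div/mod via the old-fashioned method: one hardware divide,
// then multiply-subtract recovers the remainder (a % b == a - (a/b)*b).
static void
divmod (long a, long b, long &q, long &r)
{
  q = a / b;      // single divide instruction
  r = a - q * b;  // mul + sub instead of a second mod instruction
}
```

This also matches C's truncating division, so the remainder carries the sign of the dividend.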
Re: [PATCH][AArch64][cleanup] Remove uses of CONST_DOUBLE_HIGH, CONST_DOUBLE_LOW
On 9 November 2015 at 11:34, Kyrill Tkachov wrote: > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): > Remove integer CONST_DOUBLE handling. It should never occur. OK /Marcus
[PATCH, 3/16] Ignore reduction clause on kernels directive
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1 Insert new exit block only when needed in transform_to_exit_first_loop_alt 2 Make create_parallel_loop return void 3 Ignore reduction clause on kernels directive 4 Implement -foffload-alias 5 Add in_oacc_kernels_region in struct loop 6 Add pass_oacc_kernels 7 Add pass_dominator_oacc_kernels 8 Add pass_ch_oacc_kernels 9 Add pass_parallelize_loops_oacc_kernels 10 Add pass_oacc_kernels pass group in passes.def 11 Update testcases after adding kernels pass group 12 Handle acc loop directive 13 Add c-c++-common/goacc/kernels-*.c 14 Add gfortran.dg/goacc/kernels-*.f95 15 Add libgomp.oacc-c-c++-common/kernels-*.c 16 Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. As discussed here ( https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00785.html ), the kernels directive does not allow the reduction clause. This patch fixes that. Thanks, - Tom Ignore reduction clause on kernels directive 2015-11-08 Tom de Vries * c-omp.c (c_oacc_split_loop_clauses): Don't copy OMP_CLAUSE_REDUCTION, classify as loop clause.
--- gcc/c-family/c-omp.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index 3e93b59..a3b99b2 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -867,7 +867,7 @@ c_omp_check_loop_iv_exprs (location_t stmt_loc, tree declv, tree decl, tree c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses) { - tree next, loop_clauses, t; + tree next, loop_clauses; loop_clauses = *not_loop_clauses = NULL_TREE; for (; clauses ; clauses = next) @@ -886,16 +886,11 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses) case OMP_CLAUSE_SEQ: case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_PRIVATE: + case OMP_CLAUSE_REDUCTION: OMP_CLAUSE_CHAIN (clauses) = loop_clauses; loop_clauses = clauses; break; - /* Reductions belong in both constructs. */ - case OMP_CLAUSE_REDUCTION: - t = copy_node (clauses); - OMP_CLAUSE_CHAIN (t) = loop_clauses; - loop_clauses = t; - /* Parallel/kernels clauses. */ default: OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses; -- 1.9.1
Re: Add a combined_fn enum
Bernd Schmidt writes: > On 11/09/2015 03:57 PM, Richard Sandiford wrote: > >> I'm not sure what you mean by "fix" though. I don't think we can change >> any of the constraints above. > > I don't see why they couldn't be relaxed. What's stopping us from > defining some builtins independently of the frontends, and don't all > languages except Fortran use the normal builtins.def anyway? I think the problem isn't so much with the functions that we define ourselves as those that simply mirror a standard library function. We can't assume that all the libm functions exist. It's also valid for the implementation to add target-specific attributes on top of what the standard says (e.g. "nomips16" on MIPS). So if a call starts out as a call to sinf and we treat it as a call to __builtin_sinf for optimisation purposes, there is always conceptually an underlying "physical" sinf function. We also can't convert calls to __builtin_sin to calls to __builtin_sinf without proof that a sinf definition exists (not guaranteed for C90). > Also - if you can have an internal function without a fndecl, why > wouldn't it work to allow that for builtins as well? But this is > probably moot since I can now see a little more clearly what you want > to do. I think that's the defining characteristic of builtins vs. internal functions though. Built-in functions are directly callable by source code and have a defined function signature. Internal functions are not directly callable by source code and have a flexible function signature. There are cases where this difference doesn't matter, which is why the combined_fn enum seemed like a good idea. But I don't think we can do away with the distinction entirely. >> The main goal is to allow functions to be vectorised simply by defining >> the associated optab. > > Ok. So these internal functions would support arbitrary modes like > V[1248][SDX]F etc., unlike the builtins? Yeah. They also never set errno.
> That wasn't entirely clear from the patches I looked at, it seemed > like you were just duplicating every builtin as an internal function > (something to which I would have objected). Yeah, sorry about that. I'll post the patches shortly. Thanks, Richard
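The combined_fn idea discussed in this thread can be pictured as one code space covering both kinds of function, with the internal-function codes offset past the last built-in code. The miniature below is a simplified sketch with invented enumerators, not GCC's actual declarations:

```cpp
// Miniature sketch of a combined function-code space: built-in codes
// come first, internal-function codes are offset past END_BUILTINS.
enum built_in_fn { BUILT_IN_SQRT, BUILT_IN_SIN, END_BUILTINS };
enum internal_fn { IFN_SQRT, IFN_SIN };

enum combined_fn
{
  CFN_SQRT     = BUILT_IN_SQRT,
  CFN_SIN      = BUILT_IN_SIN,
  CFN_IFN_SQRT = END_BUILTINS + IFN_SQRT,
  CFN_IFN_SIN  = END_BUILTINS + IFN_SIN
};

// Map either representation into the shared code space, so optimizers
// can ask "is this a sqrt?" without caring how the call is represented.
static combined_fn
as_combined_fn (built_in_fn f)
{
  return static_cast<combined_fn> (static_cast<int> (f));
}

static combined_fn
as_combined_fn (internal_fn f)
{
  return static_cast<combined_fn> (static_cast<int> (END_BUILTINS)
                                   + static_cast<int> (f));
}
```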
Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
On Sun, Nov 08, 2015 at 07:37:53PM -0500, Michael Meissner wrote: > This patch adds support for scalar count trailing zeros instruction that is > being added to ISA 3.0 (power9). I bet you should change CTZ_DEFINED_VALUE_AT_ZERO as well. > +(define_insn "ctz2_hw" > + [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > + (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))] > + "TARGET_CTZ" > + "cnttz %0,%1" > + [(set_attr "type" "cntlz")]) We should probably rename this attr value now. "cntz" maybe? Could be later of course. Segher
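The CTZ_DEFINED_VALUE_AT_ZERO remark matters because __builtin_ctz(0) is undefined at the source level, while a hardware count-trailing-zeros instruction typically returns the operand width for a zero input; the macro is how a backend tells the middle end it may rely on that value. A sketch of the semantics (hypothetical helper, zero-input value assumed to be the operand width):

```cpp
// What a cnttz-style instruction computes, including the zero-input
// case that CTZ_DEFINED_VALUE_AT_ZERO describes.  At the source level
// __builtin_ctz(0) is undefined, so the zero case is guarded explicitly.
static int
ctz32 (unsigned x)
{
  if (x == 0)
    return 32;                 // typical hardware result for zero input
  return __builtin_ctz (x);    // well-defined for nonzero x
}
```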
Re: [OpenACC] declare directive
Jakub, This is an update of the patch from: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00893.html. There was an unused variable in c/c-parser.c that was removed. I've included the ChangeLog as a convenience, but nothing was changed in the file. Thanks! Jim 2015-XX-XX James Norris Joseph Myers gcc/c-family/ * c-pragma.c (oacc_pragmas): Add entry for declare directive. * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DECLARE. (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OACC_CLAUSE_LINK. gcc/c/ * c-parser.c (c_parser_pragma): Handle PRAGMA_OACC_DECLARE. (c_parser_omp_clause_name): Handle 'device_resident' clause. (c_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OMP_CLAUSE_LINK. (c_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OACC_CLAUSE_LINK. (OACC_DECLARE_CLAUSE_MASK): New definition. (c_parser_oacc_declare): New function. gcc/cp/ * parser.c (cp_parser_omp_clause_name): Handle 'device_resident' clause. (cp_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OMP_CLAUSE_LINK. (cp_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and PRAGMA_OMP_CLAUSE_LINK. (OACC_DECLARE_CLAUSE_MASK): New definition. (cp_parser_oacc_declare): New function. (cp_parser_pragma): Handle PRAGMA_OACC_DECLARE. * pt.c (tsubst_expr): Handle OACC_DECLARE. gcc/ * gimple-pretty-print.c (dump_gimple_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE. * gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DECLARE. (is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_DECLARE. * gimplify.c (gimplify_bind_expr): Prepend 'exit' stmt to cleanup. * omp-builtins.def (BUILT_IN_GOACC_DECLARE): New builtin. * omp-low.c (expand_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE and BUILT_IN_GOACC_DECLARE. (build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
(lower_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE, GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK. (make_gimple_omp_edges): Handle GF_OMP_TARGET_KIND_OACC_DECLARE. gcc/testsuite * c-c++-common/goacc/declare-1.c: New test. * c-c++-common/goacc/declare-2.c: Likewise. include/ * gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK. libgomp/ * libgomp.map (GOACC_2.0.1): Export GOACC_declare. * oacc-parallel.c (GOACC_declare): New function. * testsuite/libgomp.oacc-c-c++-common/declare-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/declare-5.c: Likewise. diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c index ac11838..cd0cc27 100644 --- a/gcc/c-family/c-pragma.c +++ b/gcc/c-family/c-pragma.c @@ -1207,6 +1207,7 @@ static const struct omp_pragma_def oacc_pragmas[] = { { "atomic", PRAGMA_OACC_ATOMIC }, { "cache", PRAGMA_OACC_CACHE }, { "data", PRAGMA_OACC_DATA }, + { "declare", PRAGMA_OACC_DECLARE }, { "enter", PRAGMA_OACC_ENTER_DATA }, { "exit", PRAGMA_OACC_EXIT_DATA }, { "kernels", PRAGMA_OACC_KERNELS }, diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h index 953c4e3..c6a2981 100644 --- a/gcc/c-family/c-pragma.h +++ b/gcc/c-family/c-pragma.h @@ -30,6 +30,7 @@ enum pragma_kind { PRAGMA_OACC_ATOMIC, PRAGMA_OACC_CACHE, PRAGMA_OACC_DATA, + PRAGMA_OACC_DECLARE, PRAGMA_OACC_ENTER_DATA, PRAGMA_OACC_EXIT_DATA, PRAGMA_OACC_KERNELS, @@ -151,6 +152,7 @@ enum pragma_omp_clause { PRAGMA_OACC_CLAUSE_CREATE, PRAGMA_OACC_CLAUSE_DELETE, PRAGMA_OACC_CLAUSE_DEVICEPTR, + PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT, PRAGMA_OACC_CLAUSE_GANG, PRAGMA_OACC_CLAUSE_HOST, PRAGMA_OACC_CLAUSE_INDEPENDENT, @@ -175,7 +177,8 @@ enum pragma_omp_clause { PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE, PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF, PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE, - PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION + PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION, + 
PRAGMA_OACC_CLAUSE_LINK = PRAGMA_OMP_CLAUSE_LINK }; extern struct cpp_reader* parse_in; diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 23d0107..8edf745 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1231,6 +1231,7 @@ static vec *c_parser_expr_list (c_parser *, bool, bool, vec **, location_t *, tree *, vec *, unsigned int * = NULL); +static void c_parser_oacc_declare (c_parser *); static void c_parser_oacc_enter_exit_data (c_parser *, bool); static void c_parser_oacc_update (c_parser *); static void c_parser_omp_construct (c_parser *); @@ -9697,6 +
[PATCH, 4/16] Implement -foffload-alias
On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

 1. Insert new exit block only when needed in transform_to_exit_first_loop_alt
 2. Make create_parallel_loop return void
 3. Ignore reduction clause on kernels directive
 4. Implement -foffload-alias
 5. Add in_oacc_kernels_region in struct loop
 6. Add pass_oacc_kernels
 7. Add pass_dominator_oacc_kernels
 8. Add pass_ch_oacc_kernels
 9. Add pass_parallelize_loops_oacc_kernels
10. Add pass_oacc_kernels pass group in passes.def
11. Update testcases after adding kernels pass group
12. Handle acc loop directive
13. Add c-c++-common/goacc/kernels-*.c
14. Add gfortran.dg/goacc/kernels-*.f95
15. Add libgomp.oacc-c-c++-common/kernels-*.c
16. Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a patch
that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.

This patch addresses the problem that once the offloading region has been
split off from the original function, alias analysis can no longer use
information available in the original function that would allow it to do a
more precise analysis for the offloading function.

[ At some point we could use fipa-pta for that, as discussed in PR46032,
but that's not feasible now. ]

The basic idea behind the patch is that for typical usage, the base
pointers used in an offloaded region are non-aliasing.  The patch works by
adding restrict to the types of the fields used to pass data to an
offloading region.

The patch implements a new option -foffload-alias=.
The option -foffload-alias=none instructs the compiler to assume that
object references and pointer dereferences in an offload region do not
alias.

The option -foffload-alias=pointer instructs the compiler to assume that
object references in an offload region do not alias.

The option -foffload-alias=all instructs the compiler to make no
assumptions about aliasing in offload regions.

The default value is -foffload-alias=none.

Thanks,
- Tom

Implement -foffload-alias

2015-11-03  Tom de Vries

	* common.opt (foffload-alias): New option.
	* flag-types.h (enum offload_alias): New enum.
	* omp-low.c (install_var_field): Handle flag_offload_alias.
	* doc/invoke.texi (@item Code Generation Options): Add
	-foffload-alias.
	(@item -foffload-alias): New item.

	* c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test.
	* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test.
---
 gcc/common.opt                                     | 16 ++
 gcc/doc/invoke.texi                                | 11 
 gcc/flag-types.h                                   |  7 +++
 gcc/omp-low.c                                      | 28 +-
 .../goacc/kernels-loop-offload-alias-none.c        | 61 ++
 .../goacc/kernels-loop-offload-alias-ptr.c         | 44 
 6 files changed, 165 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 961a1b6..7135b1a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1735,6 +1735,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-alias=
+Common Joined RejectNegative Enum(offload_alias) Var(flag_offload_alias) Init(OFFLOAD_ALIAS_NONE)
+-foffload-alias=[all|pointer|none]	Assume non-aliasing in an offload region
+
+Enum
+Name(offload_alias) Type(enum offload_alias) UnknownError(unknown offload aliasing %qs)
+
+EnumValue
+Enum(offload_alias) String(all) Value(OFFLOAD_ALIAS_ALL)
+
+EnumValue
+Enum(offload_alias) String(pointer) Value(OFFLOAD_ALIAS_POINTER)
+
+EnumValue
+Enum(offload_alias) String(none) Value(OFFLOAD_ALIAS_NONE)
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2e5953b..6928efd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1143,6 +1143,7 @@ See S/390 and zSeries Options.
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
 -fno-common  -fno-ident @gol
+-foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]
Re: [PATCH], Add power9 support to GCC, patch #1 (revised)
On Sun, Nov 8, 2015 at 4:33 PM, Michael Meissner wrote:
> This is patch #1 that I revised.  I changed -mfusion-toc to -mtoc-fusion.
> I changed the references to ISA 2.08 to 3.0.  I added two new debug
> switches for code in future patches that is undergoing development and is
> not ready to be on by default.
>
> I have done a bootstrap build on a little endian power8 system and there
> were no regressions in this patch.  Is it ok to install in the trunk?
>
> 2015-11-08  Michael Meissner
>
>	* config/rs6000/rs6000.opt (-mpower9-fusion): Add new switches for
>	ISA 3.0 (power9).
>	(-mpower9-vector): Likewise.
>	(-mpower9-dform): Likewise.
>	(-mpower9-minmax): Likewise.
>	(-mtoc-fusion): Likewise.
>	(-mmodulo): Likewise.
>	(-mfloat128-hardware): Likewise.
>
>	* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add option
>	mask for ISA 3.0 (power9).
>	(POWERPC_MASKS): Add new ISA 3.0 switches.
>	(power9 cpu): Add power9 cpu.
>
>	* config/rs6000/rs6000.h (ASM_CPU_POWER9_SPEC): Add support for
>	power9.
>	(ASM_CPU_SPEC): Likewise.
>	(EXTRA_SPECS): Likewise.
>
>	* config/rs6000/rs6000-opts.h (enum processor_type): Add
>	PROCESSOR_POWER9.
>
>	* config/rs6000/rs6000.c (power9_cost): Initial cost setup for
>	power9.
>	(rs6000_debug_reg_global): Add support for power9 fusion.
>	(rs6000_setup_reg_addr_masks): Cache mode size.
>	(rs6000_option_override_internal): Until real power9 tuning is
>	added, use -mtune=power8 for -mcpu=power9.
>	(rs6000_setup_reg_addr_masks): Do not allow pre-increment,
>	pre-decrement, or pre-modify on SFmode/DFmode if we allow the use
>	of Altivec registers.
>	(rs6000_option_override_internal): Add support for ISA 3.0
>	switches.
>	(rs6000_loop_align): Add support for power9 cpu.
>	(rs6000_file_start): Likewise.
>	(rs6000_adjust_cost): Likewise.
>	(rs6000_issue_rate): Likewise.
>	(insn_must_be_first_in_group): Likewise.
>	(insn_must_be_last_in_group): Likewise.
>	(force_new_group): Likewise.
>	(rs6000_register_move_cost): Likewise.
>	(rs6000_opt_masks): Likewise.
>
>	* config/rs6000/rs6000.md (cpu attribute): Add power9.
>	* config/rs6000/rs6000-tables.opt: Regenerate.
>
>	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
>	_ARCH_PWR9 if power9 support is available.
>
>	* config/rs6000/aix61.h (ASM_CPU_SPEC): Add power9.
>	* config/rs6000/aix53.h (ASM_CPU_SPEC): Likewise.
>
>	* configure.ac: Determine if the assembler supports the ISA 3.0
>	instructions.
>	* config.in (HAVE_AS_POWER9): Likewise.
>	* configure: Regenerate.
>
>	* doc/invoke.texi (RS/6000 and PowerPC Options): Document ISA 3.0
>	switches.

Okay.

Thanks, David
Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
On Sun, Nov 8, 2015 at 4:36 PM, Michael Meissner wrote:
> This is patch #2.  It adds support for the new modulus instructions that
> are being added in ISA 3.0 (power9):
>
> I have built this patch (along with patches #3 and #4) with a bootstrap
> build on a power8 little endian system.  There were no regressions in the
> test suite.  Is this patch ok to install in the trunk once patch #1 has
> been installed?
>
> [gcc]
> 2015-11-08  Michael Meissner
>
>	* config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
>	modulus instructions if we have hardware support.
>
>	* config/rs6000/rs6000.md (mod<mode>3): Add support for ISA 3.0
>	modulus instructions.
>	(umod<mode>3): Likewise.
>	(divmod peephole): Likewise.
>	(udivmod peephole): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner
>
>	* lib/target-supports.exp (check_p9vector_hw_available): Add
>	checks for power9 availability.
>	(check_effective_target_powerpc_p9vector_ok): Likewise.
>	(check_vect_support_and_set_flags): Likewise.
>
>	* gcc.target/powerpc/mod-1.c: New test.
>	* gcc.target/powerpc/mod-2.c: Likewise.

This is okay, but let's wait for revised #3 since you tested 2, 3, 4
together.

Thanks, David
[PATCH 0/6] Automatically use vector optabs
The main goal of this series is to allow functions to be vectorised simply by defining the associated optab. At the moment you can get a scalar square root instruction by defining an md pattern like sqrtdf2. But if you want to have vectorised sqrt, you need to have a target- specific C-level built-in function for the vector form of sqrt, implement TARGET_BUILTIN_VECTORIZED_FUNCTION, and expand the sqrt in the same way that the target expands other directly-called built-in functions. That seems unnecessarily indirect, especially when in practice those target-specific functions tend to use patterns like sqrtv2df2 anyway. It also means GCC has less idea what the vector function is doing. The series uses the combined_fn enum and internal functions from the patches I posted on Saturday to make the vectoriser pick up vector optabs automatically and represent the result as a vector internal function call. For example, we can convert BUILT_IN_SQRT{F,,L} or a scalar IFN_SQRT into a vector IFN_SQRT if the appropriate sqrtM2 optab is defined. Tested on x86_64-linux-gnu, aarch64-linux-gnu, arm-linux-gnu and powerpc64-linux-gnu. Thanks, Richard
Re: [OpenACC] declare directive
On Mon, Nov 09, 2015 at 10:01:32AM -0600, James Norris wrote:
> +  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))

Here you only look up "omp declare target", not "omp declare target link".
So, what happens if you mix that (once in some copy clause, once in link),
or mention twice in link, etc.?  Needs testsuite coverage and clear rules.

> +	  DECL_ATTRIBUTES (decl) =
> +	    tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (decl));

Incorrect formatting, = goes already on the following line, no whitespace
at end of line, and next line is indented below CL from DECL.

> +	      t = build_omp_clause (OMP_CLAUSE_LOCATION (c) , OMP_CLAUSE_MAP);

Wrong formatting, no space before ,.

> +  if (ret_clauses)
> +    {
> +      tree fndecl = current_function_decl;
> +      tree attrs = lookup_attribute ("oacc declare returns",
> +				     DECL_ATTRIBUTES (fndecl));

Why do you use an attribute for this?  I think adding the automatic vars
to hash_map during gimplification of the OACC_DECLARE is best.

> +      tree id = get_identifier ("oacc declare returns");
> +      DECL_ATTRIBUTES (fndecl) =
> +	tree_cons (id, ret_clauses, DECL_ATTRIBUTES (fndecl));

Formatting error.
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1065,6 +1065,7 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
>    gimple_seq body, cleanup;
>    gcall *stack_save;
>    location_t start_locus = 0, end_locus = 0;
> +  tree ret_clauses = NULL;
>
>    tree temp = voidify_wrapper_expr (bind_expr, NULL);
>
> @@ -1166,9 +1167,56 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
> 	      clobber_stmt = gimple_build_assign (t, clobber);
> 	      gimple_set_location (clobber_stmt, end_locus);
> 	      gimplify_seq_add_stmt (&cleanup, clobber_stmt);
> +
> +	      if (flag_openacc)
> +		{
> +		  tree attrs = lookup_attribute ("oacc declare returns",
> +				     DECL_ATTRIBUTES (current_function_decl));
> +		  tree clauses, c, c_next = NULL, c_prev = NULL;
> +
> +		  if (!attrs)
> +		    break;
> +
> +		  clauses = TREE_VALUE (attrs);
> +
> +		  for (c = clauses; c; c_prev = c, c = c_next)
> +		    {
> +		      c_next = OMP_CLAUSE_CHAIN (c);
> +
> +		      if (t == OMP_CLAUSE_DECL (c))
> +			{
> +			  if (ret_clauses)
> +			    OMP_CLAUSE_CHAIN (c) = ret_clauses;
> +
> +			  ret_clauses = c;
> +
> +			  if (c_prev == NULL)
> +			    clauses = c_next;
> +			  else
> +			    OMP_CLAUSE_CHAIN (c_prev) = c_next;
> +			}
> +		    }

This doesn't really scale.  Consider 1 clauses on various oacc declare
constructs in a single function, and 100 automatic variables in such a
function.  So, what I'm suggesting is during gimplification of
OACC_DECLARE, if you find a clause on an automatic variable in the current
function that you want to unmap afterwards, have a

static hash_map<tree, tree> *oacc_declare_returns;

and you just add into the hash map the VAR_DECL -> the clause you want,
then in this spot you check

if (oacc_declare_returns)
  {
    clause = lookup in hash_map (t);
    if (clause)
      {
	...
      }
  }

> +
> +	      if (clauses == NULL)
> +		{
> +		  DECL_ATTRIBUTES (current_function_decl) =
> +		    remove_attribute ("oacc declare returns",
> +				      DECL_ATTRIBUTES (current_function_decl));

Wrong formatting.

	Jakub
[PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
In practice all targets that can vectorise sqrt define the appropriate
sqrt<mode>2 optab.  The only case where this isn't immediately obvious is
the libmass support in rs6000.c, but Mike Meissner said that it shouldn't
be exercised for sqrt.

This patch therefore uses the internal function interface instead of going
via the target hook.

gcc/
	* tree-vect-patterns.c: Include internal-fn.h.
	(vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*.

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index bab9a4f..a803e8c 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
 #include "builtins.h"
+#include "internal-fn.h"
 #include "case-cfn-macros.h"
 
 /* Pattern recognition functions  */
@@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec *stmts, tree *type_in,
   if (TREE_CODE (exp) == REAL_CST
       && real_equal (&TREE_REAL_CST (exp), &dconsthalf))
     {
-      tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT);
       *type_in = get_vectype_for_scalar_type (TREE_TYPE (base));
-      if (*type_in)
+      if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in))
	{
-	  gcall *stmt = gimple_build_call (newfn, 1, base);
-	  if (vectorizable_function (stmt, *type_in, *type_in)
-	      != NULL_TREE)
-	    {
-	      var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
-	      gimple_call_set_lhs (stmt, var);
-	      return stmt;
-	    }
+	  gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base);
+	  var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
+	  gimple_call_set_lhs (stmt, var);
+	  return stmt;
	}
     }