Re: [PATCH] Fix profiledbootstrap ada checking failure (PR debug/79255)
On Thu, Mar 23, 2017 at 05:24:31PM -0400, Jason Merrill wrote: > On Thu, Mar 23, 2017 at 4:44 PM, Jakub Jelinek wrote: > > The following C testcase shows how profiledbootstrap fails with checking > > compiler. We have a (nested) FUNCTION_DECL inside of BLOCK_VARS of an > > inline function, when it gets inlined, it is moved into > > BLOCK_NONLOCALIZED_VARS. And, decls_for_scope calls process_scope_var > > with NULL decl and non-NULL origin for all BLOCK_NONLOCALIZED_VARS. > > That is fine for variables, but for FUNCTION_DECLs it can actually > > try to dwarf2out_abstract_function that FUNCTION_DECL, which should be > > really done only when it is inlined (i.e. BLOCK_ABSTRACT_ORIGIN of > > some BLOCK). > > And when it's cloned. > > But does it make sense for gen_decl_die to call > dwarf2out_abstract_function when decl is null? That seems wrong. Before r144529 we had just: if (DECL_ORIGIN (decl) != decl) dwarf2out_abstract_function (DECL_ABSTRACT_ORIGIN (decl)); and that is indeed to handle clones etc. r144529 changed that to: if (origin || DECL_ORIGIN (decl) != decl) dwarf2out_abstract_function (DECL_ABSTRACT_ORIGIN (decl)); All of the decl is NULL introduced in r144529 implies decl_or_origin is abstract. Removing that origin || wouldn't really work, we'd have to rewrite most of gen_decl_die FUNCTION_DECL handling to use decl_or_origin instead of decl etc. But it doesn't look right to me, conceptually a FUNCTION_DECL in BLOCK_NONLOCALIZED_VARS is nothing that represents a clone or something abstract in itself. We can't keep it in BLOCK_VARS just because those are chained through DECL_CHAIN and a single FUNCTION_DECL can't be put into multiple BLOCK_VARS chains. It is still the same FUNCTION_DECL, not a sign that we want to have in each inline function a separate function declaration with abstract origin of the original FUNCTION_DECL. Jakub
Re: [PATCH] fwprop: Prevent infinite looping (PR79405)
On Thu, Mar 23, 2017 at 4:45 PM, Segher Boessenkool wrote: > On Thu, Mar 23, 2017 at 04:16:56PM +0100, Richard Biener wrote: >> On Thu, Mar 23, 2017 at 3:58 PM, Segher Boessenkool >> wrote: >> > The algorithm fwprop uses never reconsiders a possible propagation, >> > although it could succeed if the def (in the def-use to propagate) >> > has changed. This causes fwprop to do infinite propagations, like >> > in the scenario in the PR, where we end up with effectively >> > B = A >> > A = B >> > D = A >> > where only propagations into the last statement are still tried, and >> > that loops (it becomes D = B, then back to D = A, etc.) >> > >> > Fixing this properly isn't easy; this patch instead limits the number >> > of propagations performed to the number of uses we originally had, >> > which is the maximum number of propagations that can be done if there >> > are no such infinite loops. >> > >> > Bootstrapped and regression checked on powerpc64-linux {-m64,-m32}; >> > is this okay for trunk? >> >> https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01485.html >> >> ? > > What about it? That suggestion would make fwprop do *less* useful work, > while in principle the problem is that it *already* does not enough! Ok, not knowing a lot about fwprop I take your word for it (but GIMPLE level forwprop works that way and certainly you can't substitute a def into a use that is before the def). > If fwprop actually tried all propagations (and never tried substituting > X with X, which it currently does), there would be no problem. To me it looked like fwprop tries to iterate over all propagations but it iterates over a changing data structure which is why it "oscillates". But I didn't actually verify that theory (in fact, I just very much disliked Berns patch give its compile-time cost). > This patch is a workaround; it makes no difference on pretty much any > code, except this single testcase (which has undefined behaviour, > uninitialised variables). Yeah, your fix looks similar to the other hack I suggested. Richard. > > Segher
Re: [PATCH] Fix PR80158
On Thu, Mar 23, 2017 at 7:20 PM, Bill Schmidt wrote: > Hi, > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80158 reports an ICE in > SLSR while building 416.gamess on x86_64. This is a latent (but > previously harmless) bug that was exposed by the fix for PR80054. > When replacing any statement with a strength reduction, SLSR updates > the candidate statement S associated with the associated candidate > record in the candidate table. However, S may actually be associated > with two candidate records when an expression may be treated as > either a CAND_ADD or a CAND_MULT. In this case, the second one can > be reached via the next_interp field. Hmm, the following code in SLSR suggests there may be more than one such alternate candidate: while (arg_cand->kind != CAND_ADD && arg_cand->kind != CAND_PHI) { if (!arg_cand->next_interp) return; arg_cand = lookup_cand (arg_cand->next_interp); } > In the excerpt from 416.gamess, the first candidate record was > updated when its statement was replaced, but the next-interp record > was not. Later, that record was used as a basis for another tree > of strength-reduction candidates, but since it now pointed to an > orphaned statement without a basic block, the ICE occurred when > checking dominance against the statement's block. > > The fix is simply to ensure that the next_interp record is always > updated when present. > > I've bootstrapped and tested this on powerpc64le-unknown-linux-gnu > with no regressions. I tested that the standalone test works on > x86_64 with this fix, but I have been unable to perform an x86_64 > regstrap because the machine I have access to has insufficient > disk space. :/ I would appreciate it if you could do this for me. > > Assuming no problems with x86_64 regstrap, is this ok for trunk? > > Note: I will be unreachable from now until next Tuesday (i.e., > returning the 28th). If needed, please feel free to commit the > patch on my behalf. Otherwise I will attend to it when I return. I'll do a regstrap and gamess build and will commit if that succeeds. The above can be addressed as followup. Thanks, Richard. > Thanks! > Bill > > > [gcc] > > 2017-03-23 Bill Schmidt > > PR tree-optimization/80158 > * gimple-ssa-strength-reduction.c (replace_mult_candidate): When > replacing a candidate statement, also replace it for the > candidate's alternate interpretation. > (replace_rhs_if_not_dup): Likewise. > (replace_one_candidate): Likewise. > > [gcc/testsuite] > > 2017-03-23 Bill Schmidt > > PR tree-optimization/80158 > * gfortran.fortran-torture/compile/pr80158.f: New file. > > > Index: gcc/gimple-ssa-strength-reduction.c > === > --- gcc/gimple-ssa-strength-reduction.c (revision 246420) > +++ gcc/gimple-ssa-strength-reduction.c (working copy) > @@ -2089,6 +2089,8 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > gimple_set_location (copy_stmt, gimple_location (c->cand_stmt)); > gsi_replace (&gsi, copy_stmt, false); > c->cand_stmt = copy_stmt; > + if (c->next_interp) > + lookup_cand (c->next_interp)->cand_stmt = copy_stmt; > if (dump_file && (dump_flags & TDF_DETAILS)) > stmt_to_print = copy_stmt; > } > @@ -2118,6 +2120,8 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > basis_name, bump_tree); > update_stmt (gsi_stmt (gsi)); >c->cand_stmt = gsi_stmt (gsi); > + if (c->next_interp) > + lookup_cand (c->next_interp)->cand_stmt = gsi_stmt (gsi); > if (dump_file && (dump_flags & TDF_DETAILS)) > stmt_to_print = gsi_stmt (gsi); > } > @@ -3405,6 +3409,8 @@ replace_rhs_if_not_dup (enum tree_code new_code, t >gimple_assign_set_rhs_with_ops (&gsi, new_code, new_rhs1, new_rhs2); >update_stmt (gsi_stmt (gsi)); >c->cand_stmt = gsi_stmt (gsi); > + if (c->next_interp) > + lookup_cand (c->next_interp)->cand_stmt = gsi_stmt (gsi); > >if (dump_file && (dump_flags & TDF_DETAILS)) > return gsi_stmt (gsi); > @@ -3511,6 +3517,8 @@ replace_one_candidate (slsr_cand_t c, unsigned i, > gimple_assign_set_rhs_with_ops (&gsi, MINUS_EXPR, basis_name, rhs2); > update_stmt (gsi_stmt (gsi)); >c->cand_stmt = gsi_stmt (gsi); > + if (c->next_interp) > + lookup_cand (c->next_interp)->cand_stmt = gsi_stmt (gsi); > > if (dump_file && (dump_flags & TDF_DETAILS)) > stmt_to_print = gsi_stmt (gsi); > @@ -3532,6 +3540,8 @@ replace_one_candidate (slsr_cand_t c, unsigned i, > gimple_set_location (copy_stmt, gimple_location (c->cand_stmt)); > gsi_replace (&gsi, copy_stmt, false); > c->cand_s
Re: [PATCH] Fix profiledbootstrap ada checking failure (PR debug/79255)
On Fri, Mar 24, 2017 at 8:46 AM, Jakub Jelinek wrote: > On Thu, Mar 23, 2017 at 05:24:31PM -0400, Jason Merrill wrote: >> On Thu, Mar 23, 2017 at 4:44 PM, Jakub Jelinek wrote: >> > The following C testcase shows how profiledbootstrap fails with checking >> > compiler. We have a (nested) FUNCTION_DECL inside of BLOCK_VARS of an >> > inline function, when it gets inlined, it is moved into >> > BLOCK_NONLOCALIZED_VARS. And, decls_for_scope calls process_scope_var >> > with NULL decl and non-NULL origin for all BLOCK_NONLOCALIZED_VARS. >> > That is fine for variables, but for FUNCTION_DECLs it can actually >> > try to dwarf2out_abstract_function that FUNCTION_DECL, which should be >> > really done only when it is inlined (i.e. BLOCK_ABSTRACT_ORIGIN of >> > some BLOCK). >> >> And when it's cloned. >> >> But does it make sense for gen_decl_die to call >> dwarf2out_abstract_function when decl is null? That seems wrong. > > Before r144529 we had just: > if (DECL_ORIGIN (decl) != decl) > dwarf2out_abstract_function (DECL_ABSTRACT_ORIGIN (decl)); > and that is indeed to handle clones etc. > > r144529 changed that to: > if (origin || DECL_ORIGIN (decl) != decl) > dwarf2out_abstract_function (DECL_ABSTRACT_ORIGIN (decl)); > All of the decl is NULL introduced in r144529 implies decl_or_origin > is abstract. > > Removing that origin || wouldn't really work, we'd have to rewrite most of > gen_decl_die FUNCTION_DECL handling to use decl_or_origin instead of > decl etc. > > But it doesn't look right to me, conceptually a FUNCTION_DECL in > BLOCK_NONLOCALIZED_VARS is nothing that represents a clone or something > abstract in itself. We can't keep it in BLOCK_VARS just because those > are chained through DECL_CHAIN and a single FUNCTION_DECL can't be > put into multiple BLOCK_VARS chains. It is still the same FUNCTION_DECL, > not a sign that we want to have in each inline function a separate > function declaration with abstract origin of the original FUNCTION_DECL. Yeah, the thing BLOCK_NONLOCALIZED_VARS wants to do is optimize generated dwarf by adding a DW_AT_abstract_origin (just to refer to the subprogram DIE) but it doesn't actually care about refering to an abstract instance (a concrete one would work just fine). So I think in the context of scope vars calling dwarf2out_abstract_function is _always_ wrong. But it would require quite some refactoring to do this in a clean way. Richard. > Jakub
Re: [PATCH] Fix profiledbootstrap ada checking failure (PR debug/79255)
On Fri, Mar 24, 2017 at 09:29:00AM +0100, Richard Biener wrote: > Yeah, the thing BLOCK_NONLOCALIZED_VARS wants to do is optimize generated > dwarf by adding a DW_AT_abstract_origin (just to refer to the > subprogram DIE) but Well, for FUNCTION_DECLs in BLOCK_VARS/BLOCK_NONLOCALIZED_VARS we actually don't emit any further DIE and so there is no DW_AT_abstract_origin. E.g. gen_subprogram_die has: /* Detect and ignore this case, where we are trying to output something we have already output. */ if (get_AT (old_die, DW_AT_low_pc) || get_AT (old_die, DW_AT_ranges)) return; That is why the posted testcase doesn't ICE without -fno-toplevel-reorder, normally the body is emitted earlier and so we don't do anything at all. Otherwise we just want to make sure we have a DIE and, if it is inline/clone, have also DW_AT_inline set, and if the DIE is without parent that we put it into proper place in the DIE tree. And when we actually see the body of the function we fill locations and all other details. Jakub
[PATCH] Fix calls.c for a _complex type (PR ipa/80104).
Hello. Briefly described in the PR, running ICF (without any optimization level) can create a thunk call that does not use an SSA_NAME which is a default def of an argument of the caller: c (complex float b) { complex float arg.1; float retval.0; [100.00%]: arg.1_2 = b; retval.0_4 = a (arg.1_2); [tail call] return retval.0_4; } The "arg" variable creation was introduced by Jason in r207301. As complex type is passed as invisible reference, we need to find address of the argument. Thus I'm suggesting to find it alternatively via SSA_NAME_DEF_STMT. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. And fixes the ICE on s390x cross compiler. Ready to be installed? Martin >From 6624f7b292d174b5c0c1a276c1ed919c9930dca4 Mon Sep 17 00:00:00 2001 From: marxin Date: Thu, 23 Mar 2017 16:03:04 +0100 Subject: [PATCH] Fix calls.c for a _complex type (PR ipa/80104). gcc/ChangeLog: 2017-03-23 Martin Liska PR ipa/80104 * calls.c (initialize_argument_information): Find real argument of a gimple call (of thunk) in order to find address of the argument. gcc/testsuite/ChangeLog: 2017-03-23 Martin Liska * gcc.dg/ipa/pr80104.c: New test. --- gcc/calls.c| 11 +-- gcc/testsuite/gcc.dg/ipa/pr80104.c | 15 +++ 2 files changed, 24 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr80104.c diff --git a/gcc/calls.c b/gcc/calls.c index 61caf4ca752..0ee21d9e951 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1745,8 +1745,15 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, address. */ if (TREE_CODE (args[i].tree_value) == SSA_NAME) { - gcc_assert (SSA_NAME_IS_DEFAULT_DEF (args[i].tree_value)); - args[i].tree_value = SSA_NAME_VAR (args[i].tree_value); + if (SSA_NAME_IS_DEFAULT_DEF (args[i].tree_value)) + args[i].tree_value = SSA_NAME_VAR (args[i].tree_value); + else + { + gimple *g = SSA_NAME_DEF_STMT (args[i].tree_value); + if (gimple_assign_single_p (g)) + args[i].tree_value = gimple_assign_rhs1 (g); + } + gcc_assert (TREE_CODE (args[i].tree_value) == PARM_DECL); } /* Argument setup code may have copied the value to register. We diff --git a/gcc/testsuite/gcc.dg/ipa/pr80104.c b/gcc/testsuite/gcc.dg/ipa/pr80104.c new file mode 100644 index 000..7e75c9907e7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr80104.c @@ -0,0 +1,15 @@ +/* PR ipa/80104 */ +/* { dg-do compile } */ +/* { dg-options "-fipa-icf" } */ + +float +a (_Complex float b) +{ + return *&b; +} + +float +c (_Complex float b) +{ + return (&b)[0]; +} -- 2.12.0
Re: [PATCH 0/3] Introduce internal_error_cont and exclude it from pot files
I would like to ping that. I'm not sure what's agreement after I read discussion in: https://gcc.gnu.org/ml/gcc/2017-03/msg00070.html Martin Sebor may know, CC'ing him. Thanks, Martin
Re: [PATCH] fwprop: Prevent infinite looping (PR79405)
On Fri, Mar 24, 2017 at 09:13:59AM +0100, Richard Biener wrote: > >> https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01485.html > >> > >> ? > > > > What about it? That suggestion would make fwprop do *less* useful work, > > while in principle the problem is that it *already* does not enough! > > Ok, not knowing a lot about fwprop I take your word for it (but GIMPLE level > forwprop works that way and certainly you can't substitute a def into a use > that is before the def). When fwprop has done a propagation it makes fresh new uses for all the sources of the resulting insn, which it adds at the end of the table. It doesn't reuse the original uses. > > If fwprop actually tried all propagations (and never tried substituting > > X with X, which it currently does), there would be no problem. > > To me it looked like fwprop tries to iterate over all propagations but > it iterates over a changing data structure which is why it "oscillates". > But I didn't actually verify that theory (in fact, I just very much disliked > Berns patch give its compile-time cost). We have, in order, in one bb: B = A|Z A = B D = A where each of those is the only set of its dest. Now the first thing tried is propagating the first into the second. This fails. Then, Z is found to be zero, so we get B = A A = B D = A If the first was now propagated into the second again, all is fine (all three vars are set to A). But this is not tried again. The second is propagated into the third, giving B = A A = B D = B and then the first is propagated into the third, and we repeat forever. > > This patch is a workaround; it makes no difference on pretty much any > > code, except this single testcase (which has undefined behaviour, > > uninitialised variables). > > Yeah, your fix looks similar to the other hack I suggested. I have implemented the "retry things" as well fwiw, but a) it is too big and invasive for stage 4, and b) it kind of sucks, needs more work, even more invasive. The workaround is cheap and solves the immediate problem. Segher
[PATCH][ARM] PR target/79871: Clean up lane/constant bounds checking errors for translation
Hi all, This PR complains that the bounds checking error strings contain a string placeholder for "constant" or "lane" which makes it hard for translators (who may want to move words around in a different language for syntactical reasons). This patch cleans that up. The bunching up of common functionality between neon_lane_bounds and arm_const_bounds was a bit dubious in any case as arm_const_bounds never passed down a non-NULL tree anyway, so one of the paths of bonds_check was always used in only the neon_lane_bounds case anyway. I also add some function comments and use IN_RANGE for range checking. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? (I believe such translation improvements fall under the documentation rule at this stage). Thanks, Kyrill 2017-03-24 Kyrylo Tkachov PR target/79871 * config/arm/arm.c (bounds_check): Delete. (neon_lane_bounds): Adjust. Make sure error string template doesn't use string placeholder. (arm_const_bounds): Likewise. commit 102b86a782297c725c4796c4dd36d33fdf024ee7 Author: Kyrylo Tkachov Date: Thu Mar 23 10:50:32 2017 + PR target/79871: Clean up lane/constant bounds checking errors for translation diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7b2d3d5..98059da 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -12186,13 +12186,16 @@ neon_expand_vector_init (rtx target, rtx vals) emit_move_insn (target, mem); } -/* Ensure OPERAND lies between LOW (inclusive) and HIGH (exclusive). Raise - ERR if it doesn't. EXP indicates the source location, which includes the - inlining history for intrinsics. */ +/* Bounds-check lanes. + Ensure OPERAND lies between LOW (inclusive) and HIGH (exclusive). Emit an + an error if it doesn't. EXP indicates the source location, which includes + the inlining history for intrinsics. + We don't unify this and arm_const_bounds because the error string needs to + explicity contain "constant" or "lane" for translation purposes. */ -static void -bounds_check (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high, - const_tree exp, const char *desc) +void +neon_lane_bounds (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high, + const_tree exp) { HOST_WIDE_INT lane; @@ -12200,31 +12203,34 @@ bounds_check (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high, lane = INTVAL (operand); - if (lane < low || lane >= high) + if (!IN_RANGE (lane, low, high - 1)) { if (exp) - error ("%K%s %wd out of range %wd - %wd", - exp, desc, lane, low, high - 1); + error ("%Klane %wd out of range %wd - %wd", + exp, lane, low, high - 1); else - error ("%s %wd out of range %wd - %wd", desc, lane, low, high - 1); + error ("lane %wd out of range %wd - %wd", + lane, low, high - 1); } } -/* Bounds-check lanes. */ +/* Bounds-check constants. + Ensure OPERAND lies between LOW (inclusive) and HIGH (exclusive). Emit an + an error if it doesn't. See neon_lane_bounds comment for + translation comment. */ void -neon_lane_bounds (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high, - const_tree exp) +arm_const_bounds (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high) { - bounds_check (operand, low, high, exp, "lane"); -} + HOST_WIDE_INT constant; -/* Bounds-check constants. */ + gcc_assert (CONST_INT_P (operand)); -void -arm_const_bounds (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high) -{ - bounds_check (operand, low, high, NULL_TREE, "constant"); + constant = INTVAL (operand); + + if (!IN_RANGE (constant, low, high - 1)) +error ("constant %wd out of range %wd - %wd", + constant, low, high - 1); } HOST_WIDE_INT
Re: [PATCH] Fix profiledbootstrap ada checking failure (PR debug/79255)
On Fri, Mar 24, 2017 at 9:43 AM, Jakub Jelinek wrote: > On Fri, Mar 24, 2017 at 09:29:00AM +0100, Richard Biener wrote: >> Yeah, the thing BLOCK_NONLOCALIZED_VARS wants to do is optimize generated >> dwarf by adding a DW_AT_abstract_origin (just to refer to the >> subprogram DIE) but > > Well, for FUNCTION_DECLs in BLOCK_VARS/BLOCK_NONLOCALIZED_VARS we actually > don't > emit any further DIE and so there is no DW_AT_abstract_origin. > E.g. gen_subprogram_die has: > /* Detect and ignore this case, where we are trying to output > something we have already output. */ > if (get_AT (old_die, DW_AT_low_pc) > || get_AT (old_die, DW_AT_ranges)) > return; Hmm, but we do want to put the function in scope? THus void foo () {} void bar () { int foo; { void foo(); foo(); } } should have a DIE for foo in bar (possibly refering to the concrete instance for optimization). Richard. > That is why the posted testcase doesn't ICE without -fno-toplevel-reorder, > normally the body is emitted earlier and so we don't do anything at all. > Otherwise we just want to make sure we have a DIE and, if it is > inline/clone, have also DW_AT_inline set, and if the DIE is without parent > that we put it into proper place in the DIE tree. And when we actually > see the body of the function we fill locations and all other details. > > Jakub
Re: [PATCH] Fix calls.c for a _complex type (PR ipa/80104).
On Fri, Mar 24, 2017 at 10:25 AM, Martin Liška wrote: > Hello. > > Briefly described in the PR, running ICF (without any optimization level) can > create a thunk call > that does not use an SSA_NAME which is a default def of an argument of the > caller: > > c (complex float b) > { > complex float arg.1; > float retval.0; > >[100.00%]: > arg.1_2 = b; > retval.0_4 = a (arg.1_2); [tail call] > > return retval.0_4; > > } > > The "arg" variable creation was introduced by Jason in r207301. > As complex type is passed as invisible reference, we need to find address of > the argument. > Thus I'm suggesting to find it alternatively via SSA_NAME_DEF_STMT. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > And fixes the ICE > on s390x cross compiler. > > Ready to be installed? Ick, that looks like fragile code ... looks like the call_from_thunk_p case expects the default def or the param-decl. So why not create it that way? Or simply not do this "optimization"? Richard. > Martin
Re: [PATCH] fwprop: Prevent infinite looping (PR79405)
On Fri, Mar 24, 2017 at 10:39 AM, Segher Boessenkool wrote: > On Fri, Mar 24, 2017 at 09:13:59AM +0100, Richard Biener wrote: >> >> https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01485.html >> >> >> >> ? >> > >> > What about it? That suggestion would make fwprop do *less* useful work, >> > while in principle the problem is that it *already* does not enough! >> >> Ok, not knowing a lot about fwprop I take your word for it (but GIMPLE level >> forwprop works that way and certainly you can't substitute a def into a use >> that is before the def). > > When fwprop has done a propagation it makes fresh new uses for all the > sources of the resulting insn, which it adds at the end of the table. > It doesn't reuse the original uses. Yes, that's what I suspected. >> > If fwprop actually tried all propagations (and never tried substituting >> > X with X, which it currently does), there would be no problem. >> >> To me it looked like fwprop tries to iterate over all propagations but >> it iterates over a changing data structure which is why it "oscillates". >> But I didn't actually verify that theory (in fact, I just very much disliked >> Berns patch give its compile-time cost). > > We have, in order, in one bb: > > B = A|Z > A = B > D = A > > where each of those is the only set of its dest. Now the first thing > tried is propagating the first into the second. This fails. Then, > Z is found to be zero, so we get > > B = A > A = B > D = A > > If the first was now propagated into the second again, all is fine > (all three vars are set to A). But this is not tried again. The > second is propagated into the third, giving > > B = A > A = B > D = B > > and then the first is propagated into the third, and we repeat > forever. > >> > This patch is a workaround; it makes no difference on pretty much any >> > code, except this single testcase (which has undefined behaviour, >> > uninitialised variables). >> >> Yeah, your fix looks similar to the other hack I suggested. > > I have implemented the "retry things" as well fwiw, but a) it is too > big and invasive for stage 4, and b) it kind of sucks, needs more > work, even more invasive. The workaround is cheap and solves the > immediate problem. Agreed, still iterating over the DF uses in the first place looks like the bug (given this "all uses" data structure changes during propagation!). I'd have done for (BBs in RPO order) for (insn in BB) repeat: for (use in insn) if (propagate_into (use)) goto repeat; Richard. > > Segher
[PATCH] Fix PR80167
Seems we have simply forgotten about the case of replacing with a default-def. Bootstrapped / tested on x86_64-unknown-linux-gnu, applied. Richard. 2017-03-24 Richard Biener PR tree-optimization/80167 * graphite-isl-ast-to-gimple.c (translate_isl_ast_to_gimple::is_valid_rename): Handle default-defs properly. (translate_isl_ast_to_gimple::get_rename): Likewise. * gcc.dg/graphite/pr80167.c: New testcase. Index: gcc/graphite-isl-ast-to-gimple.c === *** gcc/graphite-isl-ast-to-gimple.c(revision 246437) --- gcc/graphite-isl-ast-to-gimple.c(working copy) *** bool translate_isl_ast_to_gimple:: *** 1123,1128 --- 1123,1131 is_valid_rename (tree rename, basic_block def_bb, basic_block use_bb, phi_node_kind phi_kind, tree old_name, basic_block old_bb) const { + if (SSA_NAME_IS_DEFAULT_DEF (rename)) + return true; + /* The def of the rename must either dominate the uses or come from a back-edge. Also the def must respect the loop closed ssa form. */ if (!is_loop_closed_ssa_use (use_bb, rename)) *** get_rename (basic_block new_bb, tree old *** 1178,1183 --- 1181,1187 basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (rename)); if (is_valid_rename (rename, bb, new_bb, phi_kind, old_name, old_bb) && (phi_kind == close_phi + || ! bb || flow_bb_inside_loop_p (bb->loop_father, new_bb))) return rename; return NULL_TREE; Index: gcc/testsuite/gcc.dg/graphite/pr80167.c === *** gcc/testsuite/gcc.dg/graphite/pr80167.c (nonexistent) --- gcc/testsuite/gcc.dg/graphite/pr80167.c (working copy) *** *** 0 --- 1,24 + /* { dg-do compile } */ + /* { dg-options "-O2 -floop-nest-optimize" } */ + + typedef struct + { + short a; + short b; + short c; + } d; + extern d e[]; + int f[8]; + void + g (d *i) + { + int h = 0; + for (; h < 28; h++) + e[h].a = e[h].b = i[h].a; + h = 0; + for (; h < 8; h++) + f[h] = i[h].b + i[h].c; + h = 0; + for (; h < 8; h++) + f[h] = i[h].b; + }
Re: [PATCH] Fix doloop ICE (PR rtl-optimization/80112)
On 03/21/2017 09:00 AM, Jakub Jelinek wrote: Hi! doloop_condition_get computes cmp in several places, and in one of them wants to fail if the condition inside of it isn't NE against const0_rtx. The problem with that is that nothing checked what cmp is yet, /* Check for (set (pc) (if_then_else (condition) (label_ref (label)) (pc))). */ if (GET_CODE (cmp) != SET || SET_DEST (cmp) != pc_rtx || GET_CODE (SET_SRC (cmp)) != IF_THEN_ELSE || GET_CODE (XEXP (SET_SRC (cmp), 1)) != LABEL_REF || XEXP (SET_SRC (cmp), 2) != pc_rtx) return 0; is checked only a couple of lines later. To fix this, either we can use the following patch which guards the early check with the necessary subset of the later check, or we could for if (GET_CODE (cmp) != SET || GET_CODE (SET_SRC (cmp)) != IF_THEN_ELSE) return 0; before cond = XEXP (SET_SRC (cmp), 0);, or set some bool flag and only verify this requirement if the bool flag is true after the later check. Any preferences? The following has been successfully bootstrapped/regtested on powerpc64le-linux. 2017-03-21 Jakub Jelinek PR rtl-optimization/80112 * loop-doloop.c (doloop_condition_get): Don't check condition if cmp isn't SET with IF_THEN_ELSE src. * gcc.dg/pr80112.c: New test. This code is a bit of a mess. I could argue for at least 3 approaches to fix the bug, none of which I particularly like. Yours isn't any worse than the other approaches, so let's just go with it. OK. jeff
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On 03/21/2017 02:21 PM, Jakub Jelinek wrote: On Tue, Mar 21, 2017 at 06:53:34PM +0100, Jakub Jelinek wrote: On Tue, Mar 21, 2017 at 08:41:43AM +0100, Jakub Jelinek wrote: On Tue, Mar 21, 2017 at 08:38:20AM +1000, Richard Henderson wrote: On 03/21/2017 07:15 AM, Jakub Jelinek wrote: Not really sure what we should do if both i1 and i2 are frame related, shall we check for each of the CFA reg notes if they are available and equal? Or punt if either of the insns is frame related? I would punt if either is frame related. Ok, I'll test then the following patch and gather some statistic on how often we trigger this. The statistics I've gathered unfortunately shows that at least on powerpc64le-linux it is very important to not give up if both i1 and i2 are frame related and have rtx_equal_p notes. I've set on unpatched old_insns_match_p flags when returning non-dir_none and checked those flags in the various callers of these when about to successfully perform cross-jumping, head-merging etc. With /f vs. non-/f, the only 3 hits were on the new pr80102.C testcase during powerpc64le-linux bootstrap/regtest, but /f vs. /f there were 167601 hits. Here is updated patch which allows cross-jumping of RTX_FRAME_RELATED_P insns, as long as they have the same CFA related notes on them (same order if more than one). Bootstrapped/regtested on powerpc64le-linux, ok for trunk? 2017-03-21 Jakub Jelinek PR target/80102 * cfgcleanup.c (old_insns_match_p): Don't cross-jump in between /f and non-/f instructions. If both i1 and i2 are frame related, verify all CFA notes, their order and content. * g++.dg/opt/pr80102.C: New test. Presumably this didn't ICE at some point in the past, so it's a regression? (it's not marked as such in the BZ). --- gcc/cfgcleanup.c.jj 2017-03-21 07:56:55.711233924 +0100 +++ gcc/cfgcleanup.c2017-03-21 19:20:40.517746664 +0100 @@ -1149,6 +1149,11 @@ old_insns_match_p (int mode ATTRIBUTE_UN else if (p1 || p2) return dir_none; + /* Do not allow cross-jumping between frame related insns and other + insns. */ + if (RTX_FRAME_RELATED_P (i1) != RTX_FRAME_RELATED_P (i2)) +return dir_none; + p1 = PATTERN (i1); p2 = PATTERN (i2); @@ -1207,6 +1212,58 @@ old_insns_match_p (int mode ATTRIBUTE_UN } } + /* If both i1 and i2 are frame related, verify all the CFA notes + in the same order and with the same content. */ + if (RTX_FRAME_RELATED_P (i1)) +{ + static enum reg_note cfa_note_kinds[] = { + REG_FRAME_RELATED_EXPR, REG_CFA_DEF_CFA, REG_CFA_ADJUST_CFA, + REG_CFA_OFFSET, REG_CFA_REGISTER, REG_CFA_EXPRESSION, + REG_CFA_VAL_EXPRESSION, REG_CFA_RESTORE, REG_CFA_SET_VDRAP, + REG_CFA_TOGGLE_RA_MANGLE, REG_CFA_WINDOW_SAVE, REG_CFA_FLUSH_QUEUE + }; ISTM this could get out of date very easily. Is there a clean way to generate the array of cfa notes as we build up the notes from reg-notes.def? Jeff
Re: [testsuite] Add missing dg-require-effective-target alloca to gcc testsuite
Hi Tom, > On 23/03/17 18:25, Mike Stump wrote: >> On Mar 23, 2017, at 8:46 AM, Tom de Vries wrote: >>> >>> I've run the gcc testsuite for target nvptx-none and ran into "test for >>> excess errors" FAILs due to: >>> ... >>> sorry, unimplemented: target cannot support alloca. >> >> We'd encourage ports to support alloca. :-) >> >>> OK for trunk for stage1? >> >> Ok. Ok for release branches and trunk as well, if you want. I'd >> recommend trunk, if your port is meant to work and test out nicely in gcc >> 7. >> > > Committed to trunk. seems you didn't properly test this patchset. It caused +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++11 (test for warnings, line 24) +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++11 (test for warnings, line 34) +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++11 (test for excess err ors) +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++14 (test for warnings, line 24) +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++14 (test for warnings, line 34) +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++14 (test for excess err ors) +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++98 (test for warnings, line 24) +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++98 (test for warnings, line 34) +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++98 (test for excess err ors) everywhere. Adding that dg-require-effective-target line requires adjusting dg-warning etc. line numbers. Fixed as follows, installed on mainline after checking with the appropriate runtest invocations (for c and c++) on x86_64-pc-linux-gnu. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2017-03-24 Rainer Orth * c-c++-common/Wimplicit-fallthrough-7.c: Adjust dg-warning line numbers. # HG changeset patch # Parent 288df338f490e514591c89d11734d7de56f62460 Adjust c-c++-common/Wimplicit-fallthrough-7.c line numbers diff --git a/gcc/testsuite/c-c++-common/Wimplicit-fallthrough-7.c b/gcc/testsuite/c-c++-common/Wimplicit-fallthrough-7.c --- a/gcc/testsuite/c-c++-common/Wimplicit-fallthrough-7.c +++ b/gcc/testsuite/c-c++-common/Wimplicit-fallthrough-7.c @@ -21,8 +21,8 @@ f (int i) switch (i) { case 1: - { /* { dg-warning "statement may fall through" "" { target c } 23 } */ - int a[i]; /* { dg-warning "statement may fall through" "" { target c++ } 24 } */ + { /* { dg-warning "statement may fall through" "" { target c } 24 } */ + int a[i]; /* { dg-warning "statement may fall through" "" { target c++ } 25 } */ } case 2: bar (99); @@ -31,8 +31,8 @@ f (int i) switch (i) { case 1: - for (int j = 0; j < 10; j++) /* { dg-warning "statement may fall through" "" { target c } 33 } */ - map[j] = j; /* { dg-warning "statement may fall through" "" { target c++ } 34 } */ + for (int j = 0; j < 10; j++) /* { dg-warning "statement may fall through" "" { target c } 34 } */ + map[j] = j; /* { dg-warning "statement may fall through" "" { target c++ } 35 } */ case 2: bar (99); }
Re: [testsuite] Add missing dg-require-effective-target alloca to gcc testsuite
On Fri, Mar 24, 2017 at 1:38 PM, Rainer Orth wrote: > Hi Tom, > >> On 23/03/17 18:25, Mike Stump wrote: >>> On Mar 23, 2017, at 8:46 AM, Tom de Vries wrote: I've run the gcc testsuite for target nvptx-none and ran into "test for excess errors" FAILs due to: ... sorry, unimplemented: target cannot support alloca. >>> >>> We'd encourage ports to support alloca. :-) >>> OK for trunk for stage1? >>> >>> Ok. Ok for release branches and trunk as well, if you want. I'd >>> recommend trunk, if your port is meant to work and test out nicely in gcc >>> 7. >>> >> >> Committed to trunk. > > seems you didn't properly test this patchset. It caused > > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++11 (test for > warnings, > line 24) > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++11 (test for > warnings, > line 34) > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++11 (test for excess > err > ors) > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++14 (test for > warnings, > line 24) > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++14 (test for > warnings, > line 34) > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++14 (test for excess > err > ors) > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++98 (test for > warnings, > line 24) > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++98 (test for > warnings, > line 34) > +FAIL: c-c++-common/Wimplicit-fallthrough-7.c -std=gnu++98 (test for excess > err > ors) > > everywhere. Adding that dg-require-effective-target line requires > adjusting dg-warning etc. line numbers. > > Fixed as follows, installed on mainline after checking with the > appropriate runtest invocations (for c and c++) on x86_64-pc-linux-gnu. Similar -m64 FAIL: gcc.dg/Walloca-2.c note (test for warnings, line 38) FAIL: gcc.dg/Wvla-larger-than-2.c note (test for warnings, line 25) -m32 FAIL: gcc.dg/Walloca-1.c (test for warnings, line 26) FAIL: gcc.dg/Walloca-1.c (test for excess errors) FAIL: gcc.dg/Walloca-2.c (test for warnings, line 38) FAIL: gcc.dg/Walloca-2.c (test for excess errors) FAIL: gcc.dg/Wvla-larger-than-2.c note (test for warnings, line 25) > Rainer > > -- > - > Rainer Orth, Center for Biotechnology, Bielefeld University > > > 2017-03-24 Rainer Orth > > * c-c++-common/Wimplicit-fallthrough-7.c: Adjust dg-warning line > numbers. >
Re: [testsuite] Add missing dg-require-effective-target alloca to gcc testsuite
Hi Richard, > Similar > > -m64 > FAIL: gcc.dg/Walloca-2.c note (test for warnings, line 38) > FAIL: gcc.dg/Wvla-larger-than-2.c note (test for warnings, line 25) > > -m32 > FAIL: gcc.dg/Walloca-1.c (test for warnings, line 26) > FAIL: gcc.dg/Walloca-1.c (test for excess errors) > FAIL: gcc.dg/Walloca-2.c (test for warnings, line 38) > FAIL: gcc.dg/Walloca-2.c (test for excess errors) > FAIL: gcc.dg/Wvla-larger-than-2.c note (test for warnings, line 25) right, just noticed that myself. Fix in progress... Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Fix profiledbootstrap ada checking failure (PR debug/79255)
On Fri, Mar 24, 2017 at 12:45:28PM +0100, Richard Biener wrote: > On Fri, Mar 24, 2017 at 9:43 AM, Jakub Jelinek wrote: > > On Fri, Mar 24, 2017 at 09:29:00AM +0100, Richard Biener wrote: > >> Yeah, the thing BLOCK_NONLOCALIZED_VARS wants to do is optimize generated > >> dwarf by adding a DW_AT_abstract_origin (just to refer to the > >> subprogram DIE) but > > > > Well, for FUNCTION_DECLs in BLOCK_VARS/BLOCK_NONLOCALIZED_VARS we actually > > don't > > emit any further DIE and so there is no DW_AT_abstract_origin. > > E.g. gen_subprogram_die has: > > /* Detect and ignore this case, where we are trying to output > > something we have already output. */ > > if (get_AT (old_die, DW_AT_low_pc) > > || get_AT (old_die, DW_AT_ranges)) > > return; > > Hmm, but we do want to put the function in scope? THus > > void foo () {} > void bar () > { > int foo; > { > void foo(); > foo(); > } > } > > should have a DIE for foo in bar (possibly refering to the concrete instance > for optimization). We actually do that. If I change your testcase so that it actually triggers the changed part of the code (so there is inlining etc.), so: volatile int v; __attribute__((noinline)) void foo () { v++; } static inline void bar () { int foo; { void foo(); foo(); } } void baz (void) { bar (); bar (); } then at -gdwarf-3 -dA -O2 the difference between vanilla GCC and my patched GCC is following (used -gdwarf-3 so that there are no DW_FORM_flag_present that result in DIE offset changes): .uleb128 0xd# (DIE (0x117) DW_TAG_subprogram) .ascii "bar\0" # DW_AT_name .byte 0x1 # DW_AT_decl_file (prQQ.c) .byte 0x3 # DW_AT_decl_line .byte 0x3 # DW_AT_inline .long 0x13c # DW_AT_sibling .uleb128 0xe# (DIE (0x123) DW_TAG_variable) .ascii "foo\0" # DW_AT_name .byte 0x1 # DW_AT_decl_file (prQQ.c) .byte 0x5 # DW_AT_decl_line .long 0x41# DW_AT_type .uleb128 0xf# (DIE (0x12e) DW_TAG_lexical_block) .uleb128 0x10 # (DIE (0x12f) DW_TAG_subprogram) .byte 0x1 # DW_AT_external .ascii "foo\0" # DW_AT_name .byte 0x1 # DW_AT_decl_file (prQQ.c) .byte 0x7 # DW_AT_decl_line - .byte 0 # DW_AT_inline + .byte 0x1 # DW_AT_declaration .uleb128 0x11 # (DIE (0x138) DW_TAG_unspecified_parameters) .byte 0 # end of children of DIE 0x12f .byte 0 # end of children of DIE 0x12e .byte 0 # end of children of DIE 0x117 .uleb128 0x12 # (DIE (0x13c) DW_TAG_subprogram) .byte 0x1 # DW_AT_external .ascii "foo\0" # DW_AT_name .byte 0x1 # DW_AT_decl_file (prQQ.c) .byte 0x2 # DW_AT_decl_line .quad .LFB0 # DW_AT_low_pc .quad .LFE0 # DW_AT_high_pc .byte 0x1 # DW_AT_frame_base .byte 0x9c# DW_OP_call_frame_cfa .byte 0x1 # DW_AT_GNU_all_call_sites .byte 0 # end of children of DIE 0xb and corresponding change in .debug_abbrev. That seems to be the right change to me, inside of bar (the DW_AT_inline one) we have a declaration of foo, and we properly use DW_AT_declaration for it, then at the toplevel we actually have the full definition die for foo, and then we have in two spots inside baz .uleb128 0x6# (DIE (0x6d) DW_TAG_inlined_subroutine) .long 0x117 # DW_AT_abstract_origin without the need to duplicate the foo declaration in there, that is inherited through DW_AT_abstract_origin. Jakub
Re: [testsuite] Add missing dg-require-effective-target alloca to gcc testsuite
Hi Richard, >> Similar >> >> -m64 >> FAIL: gcc.dg/Walloca-2.c note (test for warnings, line 38) >> FAIL: gcc.dg/Wvla-larger-than-2.c note (test for warnings, line 25) >> >> -m32 >> FAIL: gcc.dg/Walloca-1.c (test for warnings, line 26) >> FAIL: gcc.dg/Walloca-1.c (test for excess errors) >> FAIL: gcc.dg/Walloca-2.c (test for warnings, line 38) >> FAIL: gcc.dg/Walloca-2.c (test for excess errors) >> FAIL: gcc.dg/Wvla-larger-than-2.c note (test for warnings, line 25) > > right, just noticed that myself. Fix in progress... here's what I committed after similar testing. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2017-03-24 Rainer Orth * c-c++-common/Wimplicit-fallthrough-7.c: Adjust dg-warning etc. line numbers. * gcc.dg/Walloca-1.c: Likewise. * gcc.dg/Walloca-2.c: Likewise. * gcc.dg/Wvla-larger-than-2.c: Likewise. # HG changeset patch # Parent 288df338f490e514591c89d11734d7de56f62460 Adjust c-c++-common/Wimplicit-fallthrough-7.c line numbers diff --git a/gcc/testsuite/c-c++-common/Wimplicit-fallthrough-7.c b/gcc/testsuite/c-c++-common/Wimplicit-fallthrough-7.c --- a/gcc/testsuite/c-c++-common/Wimplicit-fallthrough-7.c +++ b/gcc/testsuite/c-c++-common/Wimplicit-fallthrough-7.c @@ -21,8 +21,8 @@ f (int i) switch (i) { case 1: - { /* { dg-warning "statement may fall through" "" { target c } 23 } */ - int a[i]; /* { dg-warning "statement may fall through" "" { target c++ } 24 } */ + { /* { dg-warning "statement may fall through" "" { target c } 24 } */ + int a[i]; /* { dg-warning "statement may fall through" "" { target c++ } 25 } */ } case 2: bar (99); @@ -31,8 +31,8 @@ f (int i) switch (i) { case 1: - for (int j = 0; j < 10; j++) /* { dg-warning "statement may fall through" "" { target c } 33 } */ - map[j] = j; /* { dg-warning "statement may fall through" "" { target c++ } 34 } */ + for (int j = 0; j < 10; j++) /* { dg-warning "statement may fall through" "" { target c } 34 } */ + map[j] = j; /* { dg-warning "statement may fall through" "" { target c++ } 35 } */ case 2: bar (99); } diff --git a/gcc/testsuite/gcc.dg/Walloca-1.c b/gcc/testsuite/gcc.dg/Walloca-1.c --- a/gcc/testsuite/gcc.dg/Walloca-1.c +++ b/gcc/testsuite/gcc.dg/Walloca-1.c @@ -25,7 +25,7 @@ void foo1 (size_t len, size_t len2, size useit (s); // OK, constant argument to alloca s = alloca (num); // { dg-warning "large due to conversion" "" { target lp64 } } - // { dg-warning "unbounded use of 'alloca'" "" { target { ! lp64 } } 26 } + // { dg-warning "unbounded use of 'alloca'" "" { target { ! lp64 } } 27 } useit (s); s = alloca (3); /* { dg-warning "is too large" } */ diff --git a/gcc/testsuite/gcc.dg/Walloca-2.c b/gcc/testsuite/gcc.dg/Walloca-2.c --- a/gcc/testsuite/gcc.dg/Walloca-2.c +++ b/gcc/testsuite/gcc.dg/Walloca-2.c @@ -37,8 +37,8 @@ g3 (int n) if (n > 0 && n < 3000) { p = __builtin_alloca (n); // { dg-warning "'alloca' may be too large" "" { target lp64} } - // { dg-message "note:.*argument may be as large as 2999" "note" { target lp64 } 38 } - // { dg-warning "unbounded use of 'alloca'" "" { target { ! lp64 } } 38 } + // { dg-message "note:.*argument may be as large as 2999" "note" { target lp64 } 39 } + // { dg-warning "unbounded use of 'alloca'" "" { target { ! lp64 } } 39 } } else p = __builtin_malloc (n); diff --git a/gcc/testsuite/gcc.dg/Wvla-larger-than-2.c b/gcc/testsuite/gcc.dg/Wvla-larger-than-2.c --- a/gcc/testsuite/gcc.dg/Wvla-larger-than-2.c +++ b/gcc/testsuite/gcc.dg/Wvla-larger-than-2.c @@ -24,7 +24,7 @@ f2 (__SIZE_TYPE__ a) { // 11 * 4 bytes = 44: Not OK. uint32_t x[a]; // { dg-warning "array may be too large" } - // { dg-message "note:.*argument may be as large as 44" "note" { target *-*-* } 25 } + // { dg-message "note:.*argument may be as large as 44" "note" { target *-*-* } 26 } f0 (x); } }
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On Fri, Mar 24, 2017 at 06:37:10AM -0600, Jeff Law wrote: > > 2017-03-21 Jakub Jelinek > > > > PR target/80102 > > * cfgcleanup.c (old_insns_match_p): Don't cross-jump in between /f > > and non-/f instructions. If both i1 and i2 are frame related, > > verify all CFA notes, their order and content. > > > > * g++.dg/opt/pr80102.C: New test. > Presumably this didn't ICE at some point in the past, so it's a regression? > (it's not marked as such in the BZ). It doesn't ICE for me with r238210 and ICEs with current trunk, I don't have too many ppc64le compilers around though. > > + /* If both i1 and i2 are frame related, verify all the CFA notes > > + in the same order and with the same content. */ > > + if (RTX_FRAME_RELATED_P (i1)) > > +{ > > + static enum reg_note cfa_note_kinds[] = { > > + REG_FRAME_RELATED_EXPR, REG_CFA_DEF_CFA, REG_CFA_ADJUST_CFA, > > + REG_CFA_OFFSET, REG_CFA_REGISTER, REG_CFA_EXPRESSION, > > + REG_CFA_VAL_EXPRESSION, REG_CFA_RESTORE, REG_CFA_SET_VDRAP, > > + REG_CFA_TOGGLE_RA_MANGLE, REG_CFA_WINDOW_SAVE, REG_CFA_FLUSH_QUEUE > > + }; > ISTM this could get out of date very easily. Is there a clean way to > generate the array of cfa notes as we build up the notes from reg-notes.def? We could e.g. #ifndef REG_CFA_NOTE # define REG_CFA_NOTE(NAME) REG_NOTE(NAME) #endif and then REG_CFA_NOTE (FRAME_RELATED_EXPR) etc. in reg-notes.def (and document that REG_CFA_NOTE should be used for notes related to CFA). Then in cfgcleanups.c we could just #undef REG_CFA_NOTE #define DEF_REG_NOTE(NAME) #define REG_CFA_NOTE(NAME) REG_##NAME, #include "reg-notes.def" #undef DEF_REG_NOTE #undef REG_CFA_NOTE to populate the cfa_note_kinds array. Jakub
Re: [PATCH] Fix profiledbootstrap ada checking failure (PR debug/79255)
On Fri, Mar 24, 2017 at 3:46 AM, Jakub Jelinek wrote: > On Thu, Mar 23, 2017 at 05:24:31PM -0400, Jason Merrill wrote: >> On Thu, Mar 23, 2017 at 4:44 PM, Jakub Jelinek wrote: >> > The following C testcase shows how profiledbootstrap fails with checking >> > compiler. We have a (nested) FUNCTION_DECL inside of BLOCK_VARS of an >> > inline function, when it gets inlined, it is moved into >> > BLOCK_NONLOCALIZED_VARS. And, decls_for_scope calls process_scope_var >> > with NULL decl and non-NULL origin for all BLOCK_NONLOCALIZED_VARS. >> > That is fine for variables, but for FUNCTION_DECLs it can actually >> > try to dwarf2out_abstract_function that FUNCTION_DECL, which should be >> > really done only when it is inlined (i.e. BLOCK_ABSTRACT_ORIGIN of >> > some BLOCK). >> >> And when it's cloned. >> >> But does it make sense for gen_decl_die to call >> dwarf2out_abstract_function when decl is null? That seems wrong. > > Before r144529 we had just: > if (DECL_ORIGIN (decl) != decl) > dwarf2out_abstract_function (DECL_ABSTRACT_ORIGIN (decl)); > and that is indeed to handle clones etc. > > r144529 changed that to: > if (origin || DECL_ORIGIN (decl) != decl) > dwarf2out_abstract_function (DECL_ABSTRACT_ORIGIN (decl)); > All of the decl is NULL introduced in r144529 implies decl_or_origin > is abstract. > > Removing that origin || wouldn't really work, we'd have to rewrite most of > gen_decl_die FUNCTION_DECL handling to use decl_or_origin instead of > decl etc. I was thinking to change it to if (decl && (origin || DECL_ORIGIN (decl) != decl)) Jason
Re: [PATCH] Fix profiledbootstrap ada checking failure (PR debug/79255)
On Fri, Mar 24, 2017 at 09:07:54AM -0400, Jason Merrill wrote: > >> And when it's cloned. > >> > >> But does it make sense for gen_decl_die to call > >> dwarf2out_abstract_function when decl is null? That seems wrong. > > > > Before r144529 we had just: > > if (DECL_ORIGIN (decl) != decl) > > dwarf2out_abstract_function (DECL_ABSTRACT_ORIGIN (decl)); > > and that is indeed to handle clones etc. > > > > r144529 changed that to: > > if (origin || DECL_ORIGIN (decl) != decl) > > dwarf2out_abstract_function (DECL_ABSTRACT_ORIGIN (decl)); > > All of the decl is NULL introduced in r144529 implies decl_or_origin > > is abstract. > > > > Removing that origin || wouldn't really work, we'd have to rewrite most of > > gen_decl_die FUNCTION_DECL handling to use decl_or_origin instead of > > decl etc. > > I was thinking to change it to > > if (decl && (origin || DECL_ORIGIN (decl) != decl)) But then you segfault immediately on the next: else if (cgraph_function_possibly_inlined_p (decl) && ! DECL_ABSTRACT_P (decl) because decl is NULL. So, it would be far easier to do: if (decl == NULL_TREE) { decl = origin; origin = NULL_TREE; } before the if (origin || DECL_ORIGIN (decl) != decl) with a comment, rather than to rewrite a couple of dozen decl_or_origin, and change assumptions that non-NULL origin actually means anything for FUNCTION_DECL. That is IMO still uglier than the decls_for_scope change though, but if you prefer that, I can test that. Jakub
Re: [PATCH] Fix calls.c for a _complex type (PR ipa/80104).
On 03/24/2017 12:49 PM, Richard Biener wrote: > On Fri, Mar 24, 2017 at 10:25 AM, Martin Liška wrote: >> Hello. >> >> Briefly described in the PR, running ICF (without any optimization level) >> can create a thunk call >> that does not use an SSA_NAME which is a default def of an argument of the >> caller: >> >> c (complex float b) >> { >> complex float arg.1; >> float retval.0; >> >>[100.00%]: >> arg.1_2 = b; >> retval.0_4 = a (arg.1_2); [tail call] >> >> return retval.0_4; >> >> } >> >> The "arg" variable creation was introduced by Jason in r207301. >> As complex type is passed as invisible reference, we need to find address of >> the argument. >> Thus I'm suggesting to find it alternatively via SSA_NAME_DEF_STMT. >> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> And fixes the ICE >> on s390x cross compiler. >> >> Ready to be installed? > > Ick, that looks like fragile code ... looks like the > call_from_thunk_p case expects > the default def or the param-decl. So why not create it that way? Or Not creating the "arg" and directly passing the PARM_DECL will break verifier: /home/marxin/Programming/testcases/pr80104.c:10:1: error: invalid argument to gimple call } ^ b # .MEM_2 = VDEF <.MEM_1(D)> retval.0_3 = a (b); [tail call] /home/marxin/Programming/testcases/pr80104.c:10:1: internal compiler error: verify_gimple failed > simply not do > this "optimization"? You mean understanding when an argument of a wrapper (thunk) will need invisible reference and if so, then ICF should not do the optimization? Martin > > Richard. > >> Martin
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On Fri, Mar 24, 2017 at 02:04:42PM +0100, Jakub Jelinek wrote: > On Fri, Mar 24, 2017 at 06:37:10AM -0600, Jeff Law wrote: > > > 2017-03-21 Jakub Jelinek > > > > > > PR target/80102 > > > * cfgcleanup.c (old_insns_match_p): Don't cross-jump in between /f > > > and non-/f instructions. If both i1 and i2 are frame related, > > > verify all CFA notes, their order and content. > > > > > > * g++.dg/opt/pr80102.C: New test. > > Presumably this didn't ICE at some point in the past, so it's a regression? > > (it's not marked as such in the BZ). > > It doesn't ICE for me with r238210 and ICEs with current trunk, I don't have > too many ppc64le compilers around though. GCC 4.8 doesn't ICE either. Jakub
Re: [RFC] VEC_SELECT sanity checking in genrecog
On Mon, Mar 06, 2017 at 12:53:38PM +0100, Richard Biener wrote: > I think these are all bugs and should be fixed and thus this checking > is good. > > Of course we'd better not break (too many) targets at this point... I've tested it today and it passed on all targets I've tried make s-recog on, i.e. i386, arm, aarch64, alpha, avr, cris, hppa, ia64, m32c, m68k, mips, nvptx, rs6000, sparc, sh, s390x. Ok for trunk then? > > 2017-03-03 Jakub Jelinek > > > > * genrecog.c (validate_pattern): Add VEC_SELECT validation. > > * genmodes.c (emit_min_insn_modes_c): Call emit_mode_nunits > > and emit_mode_inner. Jakub
[Committed] S/390: PR79893: Add diagnostics vec_load_bndry builtin.
The boundary argument of the vec_load_bndry builtin needs to be rewritten. At that point it must be constant already. The current diagnostics in s390_expand_builtins is too late for this. The patch adds an additional check for that builtin which will be triggered already during preprocessing. Regression tested on s390x. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel PR target/79893 * gcc.target/s390/zvector/pr79893.c: New test. gcc/ChangeLog: 2017-03-24 Andreas Krebbel PR target/79893 * config/s390/s390-c.c (s390_adjust_builtin_arglist): Issue an error if the boundary argument is not constant. --- gcc/ChangeLog | 6 ++ gcc/config/s390/s390-c.c| 12 ++-- gcc/testsuite/ChangeLog | 5 + gcc/testsuite/gcc.target/s390/zvector/pr79893.c | 9 + 4 files changed, 30 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/zvector/pr79893.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8d4adfb..9cb2271 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2017-03-24 Andreas Krebbel + + PR target/79893 + * config/s390/s390-c.c (s390_adjust_builtin_arglist): Issue an + error if the boundary argument is not constant. + 2017-03-24 Bill Schmidt PR tree-optimization/80158 diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c index 8cdac7e..0521e1e 100644 --- a/gcc/config/s390/s390-c.c +++ b/gcc/config/s390/s390-c.c @@ -673,10 +673,18 @@ s390_adjust_builtin_arglist (unsigned int ob_fcode, tree decl, case S390_OVERLOADED_BUILTIN_s390_vec_load_bndry: { int code; - if (dest_arg_index == 1) { - switch (tree_to_uhwi ((**arglist)[src_arg_index])) + tree arg = (**arglist)[src_arg_index]; + + if (TREE_CODE (arg) != INTEGER_CST) + { + error ("constant value required for builtin %qF argument %d", + decl, src_arg_index + 1); + return; + } + + switch (tree_to_uhwi (arg)) { case 64: code = 0; break; case 128: code = 1; break; diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 759d2b3..f3908d9 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2017-03-24 Andreas Krebbel + + PR target/79893 + * gcc.target/s390/zvector/pr79893.c: New test. + 2017-03-24 Bill Schmidt PR tree-optimization/80158 diff --git a/gcc/testsuite/gcc.target/s390/zvector/pr79893.c b/gcc/testsuite/gcc.target/s390/zvector/pr79893.c new file mode 100644 index 000..ad6ca30 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/zvector/pr79893.c @@ -0,0 +1,9 @@ +/* { dg-do compile { target { s390*-*-* } } } */ +/* { dg-options "-march=z13 -mzarch -mzvector" } */ + +#include + +void +foo(signed char *p, int i) { + vec_load_bndry(p, i); /* { dg-error "constant value required for builtin.*2" } */ +} -- 2.9.1
[Committed] S/390: PR79904: Disallow reg + sym_ref literal pool addresses.
We accept reg + sym_ref as valid address if sym_ref is a literal pool reference knowing that it will be rewritten as r13 + reg + offset. However, annotate_constant_pool_refs was never able to handle that. With the patch only single sym_refs are accepted. Regression tested on s390x. 2017-03-24 Andreas Krebbel PR target/79904 * config/s390/s390.c (s390_decompose_address): Reject reg + sym_ref literal pool references. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.dg/ubsan/pr79904-2.c: New test. --- gcc/config/s390/s390.c | 11 --- gcc/testsuite/ChangeLog| 4 gcc/testsuite/gcc.dg/ubsan/pr79904-2.c | 11 +++ 3 files changed, 19 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ubsan/pr79904-2.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index e7ab128..27640ad 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -2842,13 +2842,10 @@ s390_decompose_address (rtx addr, struct s390_address *out) displacements by basing them off the base register. */ if (disp && GET_CODE (disp) == SYMBOL_REF && CONSTANT_POOL_ADDRESS_P (disp)) { - /* Either base or index must be free to hold the base register. */ - if (!base) -base = fake_pool_base, literal_pool = true; - else if (!indx) -indx = fake_pool_base, literal_pool = true; - else -return false; + if (base || indx) + return false; + + base = fake_pool_base, literal_pool = true; /* Mark up the displacement. */ disp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, disp), diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index f3908d9..29318bb 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,9 @@ 2017-03-24 Andreas Krebbel + * gcc.dg/ubsan/pr79904-2.c: New test. + +2017-03-24 Andreas Krebbel + PR target/79893 * gcc.target/s390/zvector/pr79893.c: New test. diff --git a/gcc/testsuite/gcc.dg/ubsan/pr79904-2.c b/gcc/testsuite/gcc.dg/ubsan/pr79904-2.c new file mode 100644 index 000..c99c43d --- /dev/null +++ b/gcc/testsuite/gcc.dg/ubsan/pr79904-2.c @@ -0,0 +1,11 @@ +/* PR sanitizer/79904 */ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=signed-integer-overflow" } */ + +typedef signed char V __attribute__((vector_size (8))); + +void +foo (V *a) +{ + *a = *a * (V) { 3, 4, 5, 6, 7, 8, 9, 10 }; +} -- 2.9.1
Re: [RFC] VEC_SELECT sanity checking in genrecog
On Fri, 24 Mar 2017, Jakub Jelinek wrote: > On Mon, Mar 06, 2017 at 12:53:38PM +0100, Richard Biener wrote: > > I think these are all bugs and should be fixed and thus this checking > > is good. > > > > Of course we'd better not break (too many) targets at this point... > > I've tested it today and it passed on all targets I've tried make s-recog > on, i.e. i386, arm, aarch64, alpha, avr, cris, hppa, ia64, m32c, m68k, > mips, nvptx, rs6000, sparc, sh, s390x. > Ok for trunk then? Yes. Richard.
[Committed] S/390: Add arch12 support
This patch series adds support for a new architecture level of S/390. The most important feature of the new instruction set is the support of single and extended precision floating point vector operations. Binutils support is part of the 2.28 release: https://sourceware.org/ml/binutils/2017-02/msg00301.html Note: arch12 is NOT the official name of the new CPU. It just continues the series of archXX options supported as alternate names. The archXX terminology refers to the edition number of the Principle of Operations manual. The official CPU name will be added later while keeping support of the arch12 for backwards compatibility. Andreas Krebbel (16): S/390: Rename cpu facility vec to vx. S/390: Improve support of 128 bit vectors in GPRs S/390: vec_init improvements S/390: movsf/sd pattern fixes. S/390: movdf improvements S/390: Move and rename vector check. S/390: Use wfc for scalar vector compares S/390: Rearrange fixuns_trunc pattern definitions. S/390: arch12: Add arch12 option. S/390: arch12: Add support for new vector bit operations. S/390: arch12: New vector popcount variants S/390: arch12: Add vllezlf instruction. S/390: arch12: Add indirect branch pattern S/390: arch12: Support the mul/add/subtract instructions. S/390: arch12: Support new vector floating point modes. S/390: arch12: New builtins. gcc/ChangeLog | 223 ++ gcc/common/config/s390/s390-common.c |5 +- gcc/config.gcc |2 +- gcc/config/s390/2964.md|8 +- gcc/config/s390/constraints.md | 10 +- gcc/config/s390/driver-native.c|3 + gcc/config/s390/s390-builtin-types.def | 129 +- gcc/config/s390/s390-builtins.def | 3504 +++- gcc/config/s390/s390-builtins.h|2 + gcc/config/s390/s390-c.c | 41 +- gcc/config/s390/s390-opts.h|1 + gcc/config/s390/s390.c | 206 +- gcc/config/s390/s390.h | 25 +- gcc/config/s390/s390.md| 663 ++-- gcc/config/s390/s390.opt |3 + gcc/config/s390/vecintrin.h| 125 +- gcc/config/s390/vector.md | 522 ++- gcc/config/s390/vx-builtins.md | 547 +-- gcc/testsuite/ChangeLog| 62 + gcc/testsuite/gcc.target/s390/arch12/aghsghmgh-1.c | 23 + gcc/testsuite/gcc.target/s390/arch12/mul-1.c | 30 + gcc/testsuite/gcc.target/s390/arch12/mul-2.c | 16 + gcc/testsuite/gcc.target/s390/htm-builtins-z13-1.c |2 +- gcc/testsuite/gcc.target/s390/s390.exp | 22 +- .../gcc.target/s390/target-attribute/tattr-3.c |3 +- .../gcc.target/s390/target-attribute/tattr-4.c |6 +- .../s390/target-attribute/tpragma-struct-vx-1.c|2 +- .../s390/target-attribute/tpragma-struct-vx-2.c|2 +- gcc/testsuite/gcc.target/s390/vector/stpcpy-1.c|2 +- .../gcc.target/s390/vector/vec-abi-vararg-1.c |2 +- .../gcc.target/s390/vector/vec-clobber-1.c |2 +- .../gcc.target/s390/vector/vec-genbytemask-1.c |2 +- .../gcc.target/s390/vector/vec-genmask-1.c |2 +- gcc/testsuite/gcc.target/s390/vector/vec-init-2.c | 48 + .../gcc.target/s390/vector/vec-nopeel-1.c |2 +- .../gcc.target/s390/vector/vec-scalar-cmp-1.c | 31 +- gcc/testsuite/gcc.target/s390/vector/vec-vrepi-1.c |2 +- gcc/testsuite/gcc.target/s390/vxe/bitops-1.c | 52 + gcc/testsuite/gcc.target/s390/vxe/negfma-1.c | 49 + gcc/testsuite/gcc.target/s390/vxe/popcount-1.c | 88 + gcc/testsuite/gcc.target/s390/vxe/vllezlf-1.c | 30 + gcc/testsuite/lib/target-supports.exp | 35 + 42 files changed, 4083 insertions(+), 2451 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/arch12/aghsghmgh-1.c create mode 100644 gcc/testsuite/gcc.target/s390/arch12/mul-1.c create mode 100644 gcc/testsuite/gcc.target/s390/arch12/mul-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-init-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vxe/bitops-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vxe/negfma-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vxe/popcount-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vxe/vllezlf-1.c -- 2.9.1
[PATCH 02/16] S/390: Improve support of 128 bit vectors in GPRs
This patch improves the handling of 128 bit vectors residing in GPRs by adding more alternatives to the move pattern. Regression tested on s390x. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/constraints.md: Add comments. (jKK): Reject element sizes > 8 bytes. * config/s390/s390.c (s390_split_ok_p): Enable splitting also for s_operands. * config/s390/s390.md: Add the s_operand checks formerly in s390_split_ok_p to various splitters where they are still required. * config/s390/vector.md ("mov" V_128): Add GPR alternatives for 128 bit vectors. Plus two splitters. --- gcc/ChangeLog | 12 ++ gcc/config/s390/constraints.md | 10 +++-- gcc/config/s390/s390.c | 4 gcc/config/s390/s390.md| 16 + gcc/config/s390/vector.md | 51 ++ 5 files changed, 83 insertions(+), 10 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 7a83d1b..292e946 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,17 @@ 2017-03-24 Andreas Krebbel + * config/s390/constraints.md: Add comments. + (jKK): Reject element sizes > 8 bytes. + * config/s390/s390.c (s390_split_ok_p): Enable splitting also for + s_operands. + * config/s390/s390.md: Add the s_operand checks formerly in + s390_split_ok_p to various splitters where they are still + required. + * config/s390/vector.md ("mov" V_128): Add GPR alternatives + for 128 bit vectors. Plus two splitters. + +2017-03-24 Andreas Krebbel + * config/s390/s390.md: Rename the cpu facilty vec to vx throughout the file. diff --git a/gcc/config/s390/constraints.md b/gcc/config/s390/constraints.md index 536f485..95c6a8f 100644 --- a/gcc/config/s390/constraints.md +++ b/gcc/config/s390/constraints.md @@ -410,20 +410,26 @@ "All one bit scalar or vector constant" (match_test "op == CONSTM1_RTX (GET_MODE (op))")) +; vector generate mask operand - support for up to 64 bit elements (define_constraint "jxx" "@internal" (and (match_code "const_vector") (match_test "s390_contiguous_bitmask_vector_p (op, NULL, NULL)"))) +; vector generate byte mask operand - this is only supposed to deal +; with real vectors 128 bit values of being either 0 or -1 are handled +; with j00 and jm1 (define_constraint "jyy" "@internal" (and (match_code "const_vector") (match_test "s390_bytemask_vector_p (op, NULL)"))) +; vector replicate immediate operand - support for up to 64 bit elements (define_constraint "jKK" "@internal" - (and (and (match_code "const_vector") - (match_test "const_vec_duplicate_p (op)")) + (and (and (and (match_code "const_vector") +(match_test "const_vec_duplicate_p (op)")) + (match_test "GET_MODE_UNIT_SIZE (GET_MODE (op)) <= 8")) (match_test "satisfies_constraint_K (XVECEXP (op, 0, 0))"))) (define_constraint "jm6" diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 27640ad..f3cebd6 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -2494,10 +2494,6 @@ s390_split_ok_p (rtx dst, rtx src, machine_mode mode, int first_subword) if (FP_REG_P (src) || FP_REG_P (dst) || VECTOR_REG_P (src) || VECTOR_REG_P (dst)) return false; - /* We don't need to split if operands are directly accessible. */ - if (s_operand (src, mode) || s_operand (dst, mode)) -return false; - /* Non-offsettable memory references cannot be split. */ if ((GET_CODE (src) == MEM && !offsettable_memref_p (src)) || (GET_CODE (dst) == MEM && !offsettable_memref_p (dst))) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 660b5f9..555a779 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -1490,6 +1490,8 @@ [(set (match_operand:TI 0 "nonimmediate_operand" "") (match_operand:TI 1 "general_operand" ""))] "TARGET_ZARCH && reload_completed + && !s_operand (operands[0], TImode) + && !s_operand (operands[1], TImode) && s390_split_ok_p (operands[0], operands[1], TImode, 0)" [(set (match_dup 2) (match_dup 4)) (set (match_dup 3) (match_dup 5))] @@ -1504,6 +1506,8 @@ [(set (match_operand:TI 0 "nonimmediate_operand" "") (match_operand:TI 1 "general_operand" ""))] "TARGET_ZARCH && reload_completed + && !s_operand (operands[0], TImode) + && !s_operand (operands[1], TImode) && s390_split_ok_p (operands[0], operands[1], TImode, 1)" [(set (match_dup 2) (match_dup 4)) (set (match_dup 3) (match_dup 5))] @@ -1824,6 +1828,8 @@ [(set (match_operand:DI 0 "nonimmediate_operand" "") (match_operand:DI 1 "general_operand" ""))] "!TARGET_ZARCH && reload_completed + && !s_operand (operands[0], DImode) + && !s_operand (operands[1], DImode) && s390_split_ok_p (operands[0], operands[1], DImode, 0)" [(set (match_dup 2) (match_
[PATCH 01/16] S/390: Rename cpu facility vec to vx.
gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/s390.md: Rename the cpu facilty vec to vx throughout the file. --- gcc/ChangeLog | 5 + gcc/config/s390/s390.md | 46 +++--- 2 files changed, 28 insertions(+), 23 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 9cb2271..7a83d1b 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,10 @@ 2017-03-24 Andreas Krebbel + * config/s390/s390.md: Rename the cpu facilty vec to vx throughout + the file. + +2017-03-24 Andreas Krebbel + PR target/79893 * config/s390/s390-c.c (s390_adjust_builtin_arglist): Issue an error if the boundary argument is not constant. diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 19daf31..660b5f9 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -482,7 +482,7 @@ (const (symbol_ref "s390_tune_attr"))) (define_attr "cpu_facility" - "standard,ieee,zarch,cpu_zarch,longdisp,extimm,dfp,z10,z196,zEC12,vec,z13" + "standard,ieee,zarch,cpu_zarch,longdisp,extimm,dfp,z10,z196,zEC12,vx,z13" (const_string "standard")) (define_attr "enabled" "" @@ -525,7 +525,7 @@ (match_test "TARGET_ZEC12")) (const_int 1) - (and (eq_attr "cpu_facility" "vec") + (and (eq_attr "cpu_facility" "vx") (match_test "TARGET_VX")) (const_int 1) @@ -1484,7 +1484,7 @@ #" [(set_attr "op_type" "RSY,RSY,VRR,VRI,VRI,VRR,*,VRX,VRX,*,*") (set_attr "type" "lm,stm,*,*,*,*,*,*,*,*,*") - (set_attr "cpu_facility" "*,*,vec,vec,vec,vec,vec,vec,vec,*,*")]) + (set_attr "cpu_facility" "*,*,vx,vx,vx,vx,vx,vx,vx,*,*")]) (define_split [(set (match_operand:TI 0 "nonimmediate_operand" "") @@ -1720,7 +1720,7 @@ *,*,*,*,*,*,*") (set_attr "cpu_facility" "*,*,*,*,*,extimm,extimm,extimm,dfp,dfp,longdisp, z10,*,*,*,*,*,longdisp,*,longdisp, - z10,z10,*,*,*,*,vec,vec,vec,vec,vec,vec") + z10,z10,*,*,*,*,vx,vx,vx,vx,vx,vx") (set_attr "z10prop" "z10_fwd_A1, z10_fwd_E1, z10_fwd_E1, @@ -2001,7 +2001,7 @@ *, *,*,*,*,*,*,*") (set_attr "cpu_facility" "*,*,*,extimm,longdisp,z10,*,*,longdisp,*,longdisp, - vec,*,vec,*,longdisp,*,longdisp,*,*,*,z10,z10,*,vec,vec,vec,vec,vec,vec") + vx,*,vx,*,longdisp,*,longdisp,*,*,*,z10,z10,*,vx,vx,vx,vx,vx,vx") (set_attr "z10prop" "z10_fwd_A1, z10_fwd_E1, z10_fwd_E1, @@ -2049,7 +2049,7 @@ (set_attr "type" "*,lr,load,store,floadsf,floadsf,floadsf,floadsf,fstoresf,*,*,*,*") (set_attr "z10prop" "z10_fwd_A1,z10_fr_E1,z10_fwd_A3,z10_rec,*,*,*,*,*,z10_super_E1, z10_super,*,*") - (set_attr "cpu_facility" "*,*,*,*,vec,*,vec,*,*,*,*,*,*") + (set_attr "cpu_facility" "*,*,*,*,vx,*,vx,*,*,*,*,*,*") ]) (define_peephole2 @@ -2183,7 +2183,7 @@ vsteh\t%v1,%0,0" [(set_attr "op_type" "RR,RI,RX,RXY,RIL,RX,RXY,RIL,SIL,VRI,VRR,VRS,VRS,VRX,VRX") (set_attr "type" "lr,*,*,*,larl,store,store,store,*,*,*,*,*,*,*") - (set_attr "cpu_facility" "*,*,*,longdisp,z10,*,longdisp,z10,z10,vec,vec,vec,vec,vec,vec") + (set_attr "cpu_facility" "*,*,*,longdisp,z10,*,longdisp,z10,z10,vx,vx,vx,vx,vx,vx") (set_attr "z10prop" "z10_fr_E1, z10_fwd_A1, z10_super_E1, @@ -2248,7 +2248,7 @@ vsteb\t%v1,%0,0" [(set_attr "op_type" "RR,RI,RX,RXY,RX,RXY,SI,SIY,SS,VRI,VRR,VRS,VRS,VRX,VRX") (set_attr "type" "lr,*,*,*,store,store,store,store,*,*,*,*,*,*,*") - (set_attr "cpu_facility" "*,*,*,longdisp,*,longdisp,*,longdisp,*,vec,vec,vec,vec,vec,vec") + (set_attr "cpu_facility" "*,*,*,longdisp,*,longdisp,*,longdisp,*,vx,vx,vx,vx,vx,vx") (set_attr "z10prop" "z10_fr_E1, z10_fwd_A1, z10_super_E1, @@ -2476,7 +2476,7 @@ (set_attr "type" "fsimpdf,floaddf,floaddf,floaddf,floaddf,floaddf, fstoredf,fstoredf,*,lr,load,load,store,store,*,*,*,load,store") (set_attr "z10prop" "*,*,*,*,*,*,*,*,z10_fwd_A1,z10_fr_E1,z10_fwd_A3,z10_fwd_A3,z10_rec,z10_rec,*,*,*,*,*") - (set_attr "cpu_facility" "z196,*,*,*,*,longdisp,*,longdisp,*,*,z10,*,z10,*,vec,vec,vec,vec,vec")]) + (set_attr "cpu_facility" "z196,*,*,*,*,longdisp,*,longdisp,*,*,z10,*,z10,*,vx,vx,vx,vx,vx")]) (define_insn "*mov_64" [(set (match_operand:DD_DF 0 "nonimmediate_operand" "=f,f,f,f,R,T,d,d,d,d,b,T,v,v,R") @@ -2502,7 +2502,7 @@ (set_attr "type""fsimpdf,fload,fload,fload, fstore,fstore,*,lr,load,load,store,store,*,load,store") (set_attr "z10prop" "*,*,*,*,*,*,z10_fwd_A1,z10_fr_E1,z10_fwd_A3,z10_fwd_A3,z10_rec,z10_rec,*,*,*")
[PATCH 05/16] S/390: movdf improvements
This patch add the vector load element from immediate instruction to the movdf/dd pattern for loading a FP zero and it removes the vector instructions from the mov_64 pattern. These were pointless in there because z13 support implies DFP support so these instructions will always be matched in the mov_64dfp pattern instead. Regression tested on s390x gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/s390.md ("mov_64dfp" DD_DF): Use vleig for loading a FP zero. ("*mov_64" DD_DF): Remove the vector instructions. These will anyway by matched by mov_64dfp. --- gcc/ChangeLog | 7 +++ gcc/config/s390/s390.md | 30 ++ 2 files changed, 21 insertions(+), 16 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index d29c06b..4dd2be6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,12 @@ 2017-03-24 Andreas Krebbel + * config/s390/s390.md ("mov_64dfp" DD_DF): Use vleig for loading a + FP zero. + ("*mov_64" DD_DF): Remove the vector instructions. These + will anyway by matched by mov_64dfp. + +2017-03-24 Andreas Krebbel + * config/s390/s390.c (s390_expand_vec_init): Enable vector load pair for all vector types with 64 bit elements. * config/s390/vx-builtins.md (V_HW_64): Move mode iterator to ... diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 75b15df..554fb37 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -2460,9 +2460,9 @@ (define_insn "*mov_64dfp" [(set (match_operand:DD_DF 0 "nonimmediate_operand" - "=f,f,f,d,f,f,R,T,d,d,d,d,b,T,v,v,d,v,R") + "=f,f,f,d,f,f,R,T,d,d,d,d,b,T,v,v,v,d,v,R") (match_operand:DD_DF 1 "general_operand" - " G,f,d,f,R,T,f,f,G,d,b,T,d,d,v,d,v,R,v"))] + " G,f,d,f,R,T,f,f,G,d,b,T,d,d,v,G,d,v,R,v"))] "TARGET_DFP" "@ lzdr\t%0 @@ -2480,19 +2480,20 @@ stgrl\t%1,%0 stg\t%1,%0 vlr\t%v0,%v1 + vleig\t%v0,0,0 vlvgg\t%v0,%1,0 vlgvg\t%0,%v1,0 vleg\t%0,%1,0 vsteg\t%1,%0,0" - [(set_attr "op_type" "RRE,RR,RRE,RRE,RX,RXY,RX,RXY,RI,RRE,RIL,RXY,RIL,RXY,VRR,VRS,VRS,VRX,VRX") + [(set_attr "op_type" "RRE,RR,RRE,RRE,RX,RXY,RX,RXY,RI,RRE,RIL,RXY,RIL,RXY,VRR,VRI,VRS,VRS,VRX,VRX") (set_attr "type" "fsimpdf,floaddf,floaddf,floaddf,floaddf,floaddf, - fstoredf,fstoredf,*,lr,load,load,store,store,*,*,*,load,store") - (set_attr "z10prop" "*,*,*,*,*,*,*,*,z10_fwd_A1,z10_fr_E1,z10_fwd_A3,z10_fwd_A3,z10_rec,z10_rec,*,*,*,*,*") - (set_attr "cpu_facility" "z196,*,*,*,*,longdisp,*,longdisp,*,*,z10,*,z10,*,vx,vx,vx,vx,vx")]) + fstoredf,fstoredf,*,lr,load,load,store,store,*,*,*,*,load,store") + (set_attr "z10prop" "*,*,*,*,*,*,*,*,z10_fwd_A1,z10_fr_E1,z10_fwd_A3,z10_fwd_A3,z10_rec,z10_rec,*,*,*,*,*,*") + (set_attr "cpu_facility" "z196,*,*,*,*,longdisp,*,longdisp,*,*,z10,*,z10,*,vx,vx,vx,vx,vx,vx")]) (define_insn "*mov_64" - [(set (match_operand:DD_DF 0 "nonimmediate_operand" "=f,f,f,f,R,T,d,d,d,d,b,T,v,v,R") -(match_operand:DD_DF 1 "general_operand" " G,f,R,T,f,f,G,d,b,T,d,d,v,R,v"))] + [(set (match_operand:DD_DF 0 "nonimmediate_operand" "=f,f,f,f,R,T,d,d,d,d,b,T") +(match_operand:DD_DF 1 "general_operand" " G,f,R,T,f,f,G,d,b,T,d,d"))] "TARGET_ZARCH" "@ lzdr\t%0 @@ -2506,15 +2507,12 @@ lgrl\t%0,%1 lg\t%0,%1 stgrl\t%1,%0 - stg\t%1,%0 - vlr\t%v0,%v1 - vleg\t%v0,%1,0 - vsteg\t%v1,%0,0" - [(set_attr "op_type" "RRE,RR,RX,RXY,RX,RXY,RI,RRE,RIL,RXY,RIL,RXY,VRR,VRX,VRX") + stg\t%1,%0" + [(set_attr "op_type" "RRE,RR,RX,RXY,RX,RXY,RI,RRE,RIL,RXY,RIL,RXY") (set_attr "type""fsimpdf,fload,fload,fload, - fstore,fstore,*,lr,load,load,store,store,*,load,store") - (set_attr "z10prop" "*,*,*,*,*,*,z10_fwd_A1,z10_fr_E1,z10_fwd_A3,z10_fwd_A3,z10_rec,z10_rec,*,*,*") - (set_attr "cpu_facility" "z196,*,*,longdisp,*,longdisp,*,*,z10,*,z10,*,vx,vx,vx")]) +fstore,fstore,*,lr,load,load,store,store") + (set_attr "z10prop" "*,*,*,*,*,*,z10_fwd_A1,z10_fr_E1,z10_fwd_A3,z10_fwd_A3,z10_rec,z10_rec") + (set_attr "cpu_facility" "z196,*,*,longdisp,*,longdisp,*,*,z10,*,z10,*")]) (define_insn "*mov_31" [(set (match_operand:DD_DF 0 "nonimmediate_operand" -- 2.9.1
[PATCH 04/16] S/390: movsf/sd pattern fixes.
The SD/SFmode move pattern used a wrong mnemonic for vector load element. On the vector load element instruction was an operand missing. Regression tested on s390x. 2017-03-24 Andreas Krebbel * config/s390/s390.md ("mov" SD_SF): Change vleg/vsteg to vlef/vstef. Add missing operand to vleif. --- gcc/config/s390/s390.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 555a779..75b15df 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -2613,11 +2613,11 @@ st\t%1,%0 sty\t%1,%0 vlr\t%v0,%v1 - vleif\t%v0,0 + vleif\t%v0,0,0 vlvgf\t%v0,%1,0 vlgvf\t%0,%v1,0 - vleg\t%0,%1,0 - vsteg\t%1,%0,0" + vlef\t%0,%1,0 + vstef\t%1,%0,0" [(set_attr "op_type" "RRE,RR,RR,RXE,RX,RXY,RX,RXY,RI,RR,RIL,RX,RXY,RIL,RX,RXY,VRR,VRI,VRS,VRS,VRX,VRX") (set_attr "type" "fsimpsf,fsimpsf,fload,fload,fload,fload, fstore,fstore,*,lr,load,load,load,store,store,store,*,*,*,*,load,store") -- 2.9.1
[PATCH 06/16] S/390: Move and rename vector check.
Move the target support routine for the vector facility to the common code file. This is required to enable the generic vectorization tests on S/390. While doing this the too generic name for the check (vector) is changed to s390_vx. The renaming required to modify all the testcases currently using that check. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/s390.exp (check_effective_target_vector): Include target-supports.exp and move target_vector check routine ... * lib/target-supports.exp (check_effective_target_s390_vx): ... to here and rename it. * gcc.target/s390/htm-builtins-z13-1.c: Rename effective target check from vector to s390_vx. * gcc.target/s390/target-attribute/tpragma-struct-vx-1.c: Likewise. * gcc.target/s390/target-attribute/tpragma-struct-vx-2.c: Likewise. * gcc.target/s390/vector/stpcpy-1.c: Likewise. * gcc.target/s390/vector/vec-abi-vararg-1.c: Likewise. * gcc.target/s390/vector/vec-clobber-1.c: Likewise. * gcc.target/s390/vector/vec-genbytemask-1.c: Likewise. * gcc.target/s390/vector/vec-genmask-1.c: Likewise. * gcc.target/s390/vector/vec-nopeel-1.c: Likewise. * gcc.target/s390/vector/vec-vrepi-1.c: Likewise. --- gcc/testsuite/ChangeLog | 19 +++ gcc/testsuite/gcc.target/s390/htm-builtins-z13-1.c| 2 +- gcc/testsuite/gcc.target/s390/s390.exp| 16 +--- .../s390/target-attribute/tpragma-struct-vx-1.c | 2 +- .../s390/target-attribute/tpragma-struct-vx-2.c | 2 +- gcc/testsuite/gcc.target/s390/vector/stpcpy-1.c | 2 +- .../gcc.target/s390/vector/vec-abi-vararg-1.c | 2 +- gcc/testsuite/gcc.target/s390/vector/vec-clobber-1.c | 2 +- .../gcc.target/s390/vector/vec-genbytemask-1.c| 2 +- gcc/testsuite/gcc.target/s390/vector/vec-genmask-1.c | 2 +- gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c | 2 +- gcc/testsuite/gcc.target/s390/vector/vec-vrepi-1.c| 2 +- gcc/testsuite/lib/target-supports.exp | 18 ++ 13 files changed, 48 insertions(+), 25 deletions(-) diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 727bb45..0f0877c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,24 @@ 2017-03-24 Andreas Krebbel + * gcc.target/s390/s390.exp (check_effective_target_vector): + Include target-supports.exp and move target_vector check routine + ... + * lib/target-supports.exp (check_effective_target_s390_vx): ... to + here and rename it. + * gcc.target/s390/htm-builtins-z13-1.c: Rename effective target + check from vector to s390_vx. + * gcc.target/s390/target-attribute/tpragma-struct-vx-1.c: Likewise. + * gcc.target/s390/target-attribute/tpragma-struct-vx-2.c: Likewise. + * gcc.target/s390/vector/stpcpy-1.c: Likewise. + * gcc.target/s390/vector/vec-abi-vararg-1.c: Likewise. + * gcc.target/s390/vector/vec-clobber-1.c: Likewise. + * gcc.target/s390/vector/vec-genbytemask-1.c: Likewise. + * gcc.target/s390/vector/vec-genmask-1.c: Likewise. + * gcc.target/s390/vector/vec-nopeel-1.c: Likewise. + * gcc.target/s390/vector/vec-vrepi-1.c: Likewise. + +2017-03-24 Andreas Krebbel + * gcc.target/s390/vector/vec-init-2.c: New test. 2017-03-24 Andreas Krebbel diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-z13-1.c b/gcc/testsuite/gcc.target/s390/htm-builtins-z13-1.c index 7879c36..aaca1f4 100644 --- a/gcc/testsuite/gcc.target/s390/htm-builtins-z13-1.c +++ b/gcc/testsuite/gcc.target/s390/htm-builtins-z13-1.c @@ -1,7 +1,7 @@ /* Verify if VRs are saved and restored. */ /* { dg-do run } */ -/* { dg-require-effective-target vector } */ +/* { dg-require-effective-target s390_vx } */ /* { dg-options "-O3 -march=z13 -mzarch" } */ typedef int __attribute__((vector_size(16))) v4si; diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp index cab68e8..d7a61f4 100644 --- a/gcc/testsuite/gcc.target/s390/s390.exp +++ b/gcc/testsuite/gcc.target/s390/s390.exp @@ -26,6 +26,7 @@ if ![istarget s390*-*-*] then { # Load support procs. load_lib gcc-dg.exp +load_lib target-supports.exp # Return 1 if the the assembler understands .machine and .machinemode. The # target attribute needs that feature to work. @@ -55,21 +56,6 @@ proc check_effective_target_htm { } { }] "-march=zEC12 -mzarch" ] } { return 0 } else { return 1 } } -# Return 1 if vector (va - vector add) instructions are understood by -# the assembler and can be executed. This also covers checking for -# the VX kernel feature. A kernel without that feature does not -# enable the vector facility and the following check will die with a -# signal. -proc check_effective_target_vector { } { -if { ![check_runtime s390_chec
[PATCH 07/16] S/390: Use wfc for scalar vector compares
The z13 vector support used the vector style comparison instructions also for the scalar compares in vector registers. However, it is much more convenient to just use the compare scalar instruction for that purpose. The advantage is that this instruction generates a CC result as our compares usually do. So this results in quite some code to be removed from the backend. Regression tested on s390x. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/2964.md: Remove the single element vector compare instructions which are no longer used. * config/s390/s390.c (s390_select_ccmode): Remove handling of vector CCmodes. (s390_canonicalize_comparison): Remove handling of DFmode compares. (s390_expand_vec_compare_scalar): Remove function. (s390_emit_compare): Don't call s390_expand_vec_compare_scalar. * config/s390/s390.md ("*vec_cmpdf_cconly"): Remove pattern. ("*cmp_ccs"): Add wfcdb instruction. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/vector/vec-scalar-cmp-1.c: Adjust for the comparison instructions used from now on. --- gcc/ChangeLog | 14 +++ gcc/config/s390/2964.md| 8 +- gcc/config/s390/s390.c | 102 + gcc/config/s390/s390.md| 26 ++ gcc/testsuite/ChangeLog| 5 + .../gcc.target/s390/vector/vec-scalar-cmp-1.c | 31 +-- 6 files changed, 57 insertions(+), 129 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 4dd2be6..fef571c 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,19 @@ 2017-03-24 Andreas Krebbel + * config/s390/2964.md: Remove the single element vector compare + instructions which are no longer used. + * config/s390/s390.c (s390_select_ccmode): Remove handling of + vector CCmodes. + (s390_canonicalize_comparison): Remove handling of DFmode + compares. + (s390_expand_vec_compare_scalar): Remove function. + (s390_emit_compare): Don't call s390_expand_vec_compare_scalar. + * config/s390/s390.md ("*vec_cmpdf_cconly"): Remove + pattern. + ("*cmp_ccs"): Add wfcdb instruction. + +2017-03-24 Andreas Krebbel + * config/s390/s390.md ("mov_64dfp" DD_DF): Use vleig for loading a FP zero. ("*mov_64" DD_DF): Remove the vector instructions. These diff --git a/gcc/config/s390/2964.md b/gcc/config/s390/2964.md index 374e2e3..d9b6729 100644 --- a/gcc/config/s390/2964.md +++ b/gcc/config/s390/2964.md @@ -88,7 +88,7 @@ vsh,vsl,vsq,lxebr,cdtr,fiebr,vupllb,vupllf,vupllh,vmrhb,madbr,vtm,vmrhf,\ vmrhg,vmrhh,axtr,fiebra,vleb,cxtr,vlef,vleg,vleh,vpkf,vpkg,vpkh,vmlob,vmlof,\ vmloh,lxdb,ldeb,mdtr,vceqfs,adb,wflndb,lxeb,vn,vo,vchlb,vx,mxtr,vchlf,vchlg,\ vchlh,vfcedbs,vfcedb,vceqgs,cxbr,msdbr,vcdgb,debr,vceqhs,meeb,lcxbr,vavglb,\ -vavglf,vavglg,vavglh,wfcedbs,vmrlb,vmrlf,vmrlg,vmrlh,wfchedbs,vmxb,tcdb,\ +vavglf,vavglg,vavglh,vmrlb,vmrlf,vmrlg,vmrlh,vmxb,tcdb,\ vmahh,vsrlb,wcgdb,lcdbr,vistrbs,vrepb,wfmdb,vrepf,vrepg,vreph,ler,wcdlgb,\ ley,vistrb,vistrf,vistrh,tceb,wfsqdb,sqeb,vsumqf,vsumqg,vesrlb,vfeezbs,\ maebr,vesrlf,vesrlg,vesrlh,vmeb,vmef,vmeh,meebr,vflcdb,wfmadb,vperm,sxtr,\ @@ -96,7 +96,7 @@ vclzf,vgm,vgmb,vgmf,vgmg,vgmh,tdcxt,vzero,msebr,veslb,veslf,veslg,vfenezb,\ vfenezf,vfenezh,vistrfs,vchf,vchg,vchh,vmhb,vmhf,vmhh,cdb,veslvb,ledbr,\ veslvf,veslvg,veslvh,wclgdb,vfmdb,vmnlb,vmnlf,vmnlg,vmnlh,vclzb,vfeezfs,\ vclzg,vclzh,mdb,vmxlb,vmxlf,vmxlg,vmxlh,ltdtr,vsbcbiq,ceb,wfddb,sebr,vistrhs,\ -lxdtr,lcebr,vab,vaf,vag,vah,ltxtr,vlpf,vlpg,vsegb,vaq,vsegf,vsegh,wfchdbs,\ +lxdtr,lcebr,vab,vaf,vag,vah,ltxtr,vlpf,vlpg,vsegb,vaq,vsegf,vsegh,\ sdtr,cdbr,vfeezhs,le,wldeb,vfmadb,vchlbs,vacccq,vmaleb,vsel,vmalef,vmaleh,\ vflndb,mdbr,vmlb,wflpdb,ldetr,vpksfs,vpksf,vpksg,vpksh,sqdb,mxbr,sqdbr,\ vmaeb,veslh,vmaef,vpklsf,vpklsg,vpklsh,verllb,vchb,ddtr,verllf,verllg,verllh,\ @@ -164,7 +164,7 @@ vsl,vsq,lxebr,cdtr,fiebr,vupllb,vupllf,vupllh,vmrhb,madbr,vtm,vmrhf,vmrhg,\ vmrhh,axtr,fiebra,vleb,cxtr,vlef,vleg,vleh,vpkf,vpkg,vpkh,vmlob,vmlof,vmloh,\ lxdb,ldeb,vceqfs,adb,wflndb,lxeb,vn,vo,vchlb,vx,vchlf,vchlg,vchlh,vfcedbs,\ vfcedb,vceqgs,cxbr,msdbr,vcdgb,vceqhs,meeb,lcxbr,vavglb,vavglf,vavglg,vavglh,\ -wfcedbs,vmrlb,vmrlf,vmrlg,vmrlh,wfchedbs,vmxb,tcdb,vmahh,vsrlb,wcgdb,lcdbr,\ +vmrlb,vmrlf,vmrlg,vmrlh,vmxb,tcdb,vmahh,vsrlb,wcgdb,lcdbr,\ vistrbs,vrepb,wfmdb,vrepf,vrepg,vreph,ler,wcdlgb,ley,vistrb,vistrf,vistrh,\ tceb,vsumqf,vsumqg,vesrlb,vfeezbs,maebr,vesrlf,vesrlg,vesrlh,vmeb,vmef,\ vmeh,meebr,vflcdb,wfmadb,vperm,sxtr,vclzf,vgm,vgmb,vgmf,vgmg,vgmh,tdcxt,\ @@ -172,7 +172,7 @@ vzero,msebr,veslb,veslf,veslg,vfenezb,vfenezf,vfenezh,vistrfs,vchf,vchg,\ vchh,vmhb,vmhf,vmhh,cdb,veslvb,ledbr,veslvf,veslvg,veslvh,wclgdb,vfmdb,\ vmnlb,vmnlf,vmnlg,vmnlh,vclzb,vfeezfs,vclzg,vclzh,mdb,
[PATCH 03/16] S/390: vec_init improvements
This enables the vec_init pattern also for V4SF, V1TI, and V1TF. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/vector/vec-init-2.c: New test. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/s390.c (s390_expand_vec_init): Enable vector load pair for all vector types with 64 bit elements. * config/s390/vx-builtins.md (V_HW_64): Move mode iterator to ... * config/s390/vector.md (V_HW_64): ... here. (V_128_NOSINGLE): New mode iterator. ("vec_init"): Use V_128 as mode iterator. ("*vec_splat"): Use V_128_NOSINGLE mode iterator. ("*vec_tf_to_v1tf", "*vec_ti_to_v1ti"): New pattern definitions. ("*vec_load_pairv2di"): Change to ... ("*vec_load_pair"): ... this one. --- gcc/ChangeLog | 13 gcc/config/s390/s390.c| 5 +- gcc/config/s390/vector.md | 72 +-- gcc/config/s390/vx-builtins.md| 1 - gcc/testsuite/ChangeLog | 4 ++ gcc/testsuite/gcc.target/s390/vector/vec-init-2.c | 48 +++ 6 files changed, 122 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-init-2.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 292e946..d29c06b 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,18 @@ 2017-03-24 Andreas Krebbel + * config/s390/s390.c (s390_expand_vec_init): Enable vector load + pair for all vector types with 64 bit elements. + * config/s390/vx-builtins.md (V_HW_64): Move mode iterator to ... + * config/s390/vector.md (V_HW_64): ... here. + (V_128_NOSINGLE): New mode iterator. + ("vec_init"): Use V_128 as mode iterator. + ("*vec_splat"): Use V_128_NOSINGLE mode iterator. + ("*vec_tf_to_v1tf", "*vec_ti_to_v1ti"): New pattern definitions. + ("*vec_load_pairv2di"): Change to ... + ("*vec_load_pair"): ... this one. + +2017-03-24 Andreas Krebbel + * config/s390/constraints.md: Add comments. (jKK): Reject element sizes > 8 bytes. * config/s390/s390.c (s390_split_ok_p): Enable splitting also for diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index f3cebd6..65a7546 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -6617,7 +6617,10 @@ s390_expand_vec_init (rtx target, rtx vals) return; } - if (all_regs && REG_P (target) && n_elts == 2 && inner_mode == DImode) + if (all_regs + && REG_P (target) + && n_elts == 2 + && GET_MODE_SIZE (inner_mode) == 8) { /* Use vector load pair. */ emit_insn (gen_rtx_SET (target, diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index 38905e8..7ddeb9a 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -31,6 +31,9 @@ ; independently e.g. vcond (define_mode_iterator V_HW [V16QI V8HI V4SI V2DI V2DF]) (define_mode_iterator V_HW2 [V16QI V8HI V4SI V2DI V2DF]) + +(define_mode_iterator V_HW_64 [V2DI V2DF]) + ; Including TI for instructions that support it (va, vn, ...) (define_mode_iterator VT_HW [V16QI V8HI V4SI V2DI V2DF V1TI TI]) @@ -53,6 +56,8 @@ (define_mode_iterator V_64 [V8QI V4HI V2SI V2SF V1DI V1DF]) (define_mode_iterator V_128 [V16QI V8HI V4SI V4SF V2DI V2DF V1TI V1TF]) +(define_mode_iterator V_128_NOSINGLE [V16QI V8HI V4SI V4SF V2DI V2DF]) + ; A blank for vector modes and a * for TImode. This is used to hide ; the TImode expander name in case it is defined already. See addti3 ; for an example. @@ -437,9 +442,9 @@ "vlgv\t%0,%v1,%Y3(%2)" [(set_attr "op_type" "VRS")]) -(define_expand "vec_init" - [(match_operand:V_HW 0 "register_operand" "") - (match_operand:V_HW 1 "nonmemory_operand" "")] +(define_expand "vec_init" + [(match_operand:V_128 0 "register_operand" "") + (match_operand:V_128 1 "nonmemory_operand" "")] "TARGET_VX" { s390_expand_vec_init (operands[0], operands[1]); @@ -449,20 +454,20 @@ ; Replicate from vector element ; vrepb, vreph, vrepf, vrepg (define_insn "*vec_splat" - [(set (match_operand:V_HW 0 "register_operand" "=v") - (vec_duplicate:V_HW + [(set (match_operand:V_128_NOSINGLE 0 "register_operand" "=v") + (vec_duplicate:V_128_NOSINGLE (vec_select: - (match_operand:V_HW 1 "register_operand" "v") + (match_operand:V_128_NOSINGLE 1 "register_operand" "v") (parallel [(match_operand:QI 2 "const_mask_operand" "C")]] - "TARGET_VX && UINTVAL (operands[2]) < GET_MODE_NUNITS (mode)" + "TARGET_VX && UINTVAL (operands[2]) < GET_MODE_NUNITS (mode)" "vrep\t%v0,%v1,%2" [(set_attr "op_type" "VRI")]) ; vlrepb, vlreph, vlrepf, vlrepg, vrepib, vrepih, vrepif, vrepig, vrepb, vreph, vrepf, vrepg (define_insn "*vec_splats" - [(set (match_operand:V_HW 0 "register_operand" "=v,v,v,v") - (vec
[PATCH 13/16] S/390: arch12: Add indirect branch pattern
This adds support for the branch indirect instruction. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/s390.md ("indirect_jump"): Turn insn definition into expander. ("*indirect_jump", "*indirect2_jump"): New pattern definitions. --- gcc/ChangeLog | 6 ++ gcc/config/s390/s390.md | 48 +--- 2 files changed, 43 insertions(+), 11 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index a48b743..b3f3d95 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,11 @@ 2017-03-24 Andreas Krebbel + * config/s390/s390.md ("indirect_jump"): Turn insn definition into + expander. + ("*indirect_jump", "*indirect2_jump"): New pattern definitions. + +2017-03-24 Andreas Krebbel + * config/s390/s390.c (s390_expand_vec_init): Use vllezl instruction if possible. * config/s390/vector.md (vec_halfnumelts): New mode diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 53c8fed..32753ef 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -9509,20 +9509,46 @@ ; indirect-jump instruction pattern(s). ; -(define_insn "indirect_jump" - [(set (pc) (match_operand 0 "address_operand" "ZR"))] +(define_expand "indirect_jump" + [(set (pc) (match_operand 0 "nonimmediate_operand" ""))] "" { - if (get_attr_op_type (insn) == OP_TYPE_RR) -return "br\t%0"; + if (address_operand (operands[0], GET_MODE (operands[0]))) +; + else if (TARGET_ARCH12 + && GET_MODE (operands[0]) == Pmode + && memory_operand (operands[0], Pmode)) +; else -return "b\t%a0"; -} - [(set (attr "op_type") -(if_then_else (match_operand 0 "register_operand" "") - (const_string "RR") (const_string "RX"))) - (set_attr "type" "branch") - (set_attr "atype" "agen")]) +operands[0] = force_reg (Pmode, operands[0]); +}) + +(define_insn "*indirect_jump" + [(set (pc) + (match_operand 0 "address_operand" "a,ZR"))] + "" + "@ + br\t%0 + b\t%a0" + [(set_attr "op_type" "RR,RX") + (set_attr "type" "branch") + (set_attr "atype" "agen") + (set_attr "cpu_facility" "*")]) + +; FIXME: LRA does not appear to be able to deal with MEMs being +; checked against address constraints like ZR above. So make this a +; separate pattern for now. +(define_insn "*indirect2_jump" + [(set (pc) + (match_operand 0 "nonimmediate_operand" "a,T"))] + "" + "@ + br\t%0 + bi\t%0" + [(set_attr "op_type" "RR,RXY") + (set_attr "type" "branch") + (set_attr "atype" "agen") + (set_attr "cpu_facility" "*,arch12")]) ; ; casesi instruction pattern(s). -- 2.9.1
[PATCH 10/16] S/390: arch12: Add support for new vector bit operations.
This patch adds support for the new bit operations introduced with arch12. The patch also renames the one complement pattern to the proper RTL standard name. 2017-03-24 Andreas Krebbel * config/s390/s390.c (s390_rtx_costs): Return low costs for the canonical form of ~AND to make sure the new instruction will be used. * config/s390/vector.md ("notand3", "ior_not3") ("notxor3"): Add new pattern definitions. ("*not"): Rename to ... ("one_cmpl2"): ... this. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/vxe/bitops-1.c: New test. --- gcc/config/s390/s390.c | 15 gcc/config/s390/vector.md| 31 +++-- gcc/testsuite/ChangeLog | 4 +++ gcc/testsuite/gcc.target/s390/vxe/bitops-1.c | 52 4 files changed, 100 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vxe/bitops-1.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index c94edcc..416a15e 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -3373,6 +3373,21 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = COSTS_N_INSNS (2); return true; } + + /* ~AND on a 128 bit mode. This can be done using a vector +instruction. */ + if (TARGET_VXE + && GET_CODE (XEXP (x, 0)) == NOT + && GET_CODE (XEXP (x, 1)) == NOT + && REG_P (XEXP (XEXP (x, 0), 0)) + && REG_P (XEXP (XEXP (x, 1), 0)) + && GET_MODE_SIZE (GET_MODE (XEXP (XEXP (x, 0), 0))) == 16 + && s390_hard_regno_mode_ok (VR0_REGNUM, + GET_MODE (XEXP (XEXP (x, 0), 0 + { + *total = COSTS_N_INSNS (1); + return true; + } /* fallthrough */ case ASHIFT: case ASHIFTRT: diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index 7ddeb9a..68a8ed0 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -655,6 +655,15 @@ "vn\t%v0,%v1,%v2" [(set_attr "op_type" "VRR")]) +; Vector not and + +(define_insn "notand3" + [(set (match_operand:VT 0 "register_operand" "=v") + (ior:VT (not:VT (match_operand:VT 1 "register_operand" "%v")) + (not:VT (match_operand:VT 2 "register_operand" "v"] + "TARGET_VXE" + "vnn\t%v0,%v1,%v2" + [(set_attr "op_type" "VRR")]) ; Vector or @@ -666,6 +675,15 @@ "vo\t%v0,%v1,%v2" [(set_attr "op_type" "VRR")]) +; Vector or with complement + +(define_insn "ior_not3" + [(set (match_operand:VT 0 "register_operand" "=v") + (ior:VT (not:VT (match_operand:VT 2 "register_operand" "v")) + (match_operand:VT 1 "register_operand" "%v")))] + "TARGET_VXE" + "voc\t%v0,%v1,%v2" + [(set_attr "op_type" "VRR")]) ; Vector xor @@ -677,9 +695,18 @@ "vx\t%v0,%v1,%v2" [(set_attr "op_type" "VRR")]) +; Vector not xor + +(define_insn "notxor3" + [(set (match_operand:VT 0 "register_operand" "=v") + (not:VT (xor:VT (match_operand:VT 1 "register_operand" "%v") + (match_operand:VT 2 "register_operand" "v"] + "TARGET_VXE" + "vnx\t%v0,%v1,%v2" + [(set_attr "op_type" "VRR")]) -; Bitwise inversion of a vector - used for vec_cmpne -(define_insn "*not" +; Bitwise inversion of a vector +(define_insn "one_cmpl2" [(set (match_operand:VT 0 "register_operand" "=v") (not:VT (match_operand:VT 1 "register_operand" "v")))] "TARGET_VX" diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 9ca13ab..bbdd3c8 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,9 @@ 2017-03-24 Andreas Krebbel + * gcc.target/s390/vxe/bitops-1.c: New test. + +2017-03-24 Andreas Krebbel + * gcc.target/s390/s390.exp: Run tests in arch12 and vxe dirs. * lib/target-supports.exp: Add effective target check s390_vxe. diff --git a/gcc/testsuite/gcc.target/s390/vxe/bitops-1.c b/gcc/testsuite/gcc.target/s390/vxe/bitops-1.c new file mode 100644 index 000..bdf7457 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vxe/bitops-1.c @@ -0,0 +1,52 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mzarch -march=arch12 --save-temps" } */ +/* { dg-require-effective-target s390_vxe } */ + +typedef unsigned int uv4si __attribute__((vector_size(16))); + +uv4si __attribute__((noinline)) +not_xor (uv4si a, uv4si b) +{ + return ~(a ^ b); +} +/* { dg-final { scan-assembler-times "vnx\t%v24,%v24,%v26" 1 } } */ + +uv4si __attribute__((noinline)) +not_and (uv4si a, uv4si b) +{ + return ~(a & b); +} +/* { dg-final { scan-assembler-times "vnn\t%v24,%v24,%v26" 1 } } */ + +uv4si __attribute__((noinline)) +or_not (uv4si a, uv4si b) +{ + return a | ~b; +} +/* { dg-final { scan-assembler-times "voc\t%v24,%v24,%v26" 1 } } */ + + +int +main () +{ +
[PATCH 11/16] S/390: arch12: New vector popcount variants
arch12 provides pop count vector instructions for bigger elements than just chars. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/vxe/popcount-1.c: New test. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/vector.md ("popcountv16qi2", "popcountv8hi2") ("popcountv4si2", "popcountv2di2"): Rename to ... ("popcount2", "popcountv8hi2_vx", "popcountv4si2_vx") ("popcountv2di2_vx"): ... these and add !TARGET_VXE to the condition. ("popcount2_vxe"): New pattern. --- gcc/ChangeLog | 9 +++ gcc/config/s390/vector.md | 38 --- gcc/testsuite/ChangeLog| 4 ++ gcc/testsuite/gcc.target/s390/vxe/popcount-1.c | 88 ++ 4 files changed, 131 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vxe/popcount-1.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 89e7906..d516b4d 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,14 @@ 2017-03-24 Andreas Krebbel + * config/s390/vector.md ("popcountv16qi2", "popcountv8hi2") + ("popcountv4si2", "popcountv2di2"): Rename to ... + ("popcount2", "popcountv8hi2_vx", "popcountv4si2_vx") + ("popcountv2di2_vx"): ... these and add !TARGET_VXE to the + condition. + ("popcount2_vxe"): New pattern. + +2017-03-24 Andreas Krebbel + * common/config/s390/s390-common.c (processor_flags_table): Add arch12. * config.gcc: Add arch12. diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index 68a8ed0..d4c0e95 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -715,11 +715,33 @@ ; Vector population count -(define_insn "popcountv16qi2" +(define_expand "popcount2" + [(set (match_operand:VI_HW0 "register_operand" "=v") + (unspec:VI_HW [(match_operand:VI_HW 1 "register_operand" "v")] + UNSPEC_POPCNT))] + "TARGET_VX" +{ + if (TARGET_VXE) +emit_insn (gen_popcount2_vxe (operands[0], operands[1])); + else +emit_insn (gen_popcount2_vx (operands[0], operands[1])); + DONE; +}) + +; vpopctb, vpopcth, vpopctf, vpopctg +(define_insn "popcount2_vxe" + [(set (match_operand:VI_HW0 "register_operand" "=v") + (unspec:VI_HW [(match_operand:VI_HW 1 "register_operand" "v")] + UNSPEC_POPCNT))] + "TARGET_VXE" + "vpopct\t%v0,%v1" + [(set_attr "op_type" "VRR")]) + +(define_insn "popcountv16qi2_vx" [(set (match_operand:V16QI0 "register_operand" "=v") (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")] UNSPEC_POPCNT))] - "TARGET_VX" + "TARGET_VX && !TARGET_VXE" "vpopct\t%v0,%v1,0" [(set_attr "op_type" "VRR")]) @@ -729,7 +751,7 @@ ; of the result, add it to the result and extend it to halfword ; element size (unpack). -(define_expand "popcountv8hi2" +(define_expand "popcountv8hi2_vx" [(set (match_dup 2) (unspec:V16QI [(subreg:V16QI (match_operand:V8HI 1 "register_operand" "v") 0)] UNSPEC_POPCNT)) @@ -761,7 +783,7 @@ (and:V8HI (subreg:V8HI (match_dup 2) 0) (subreg:V8HI (match_dup 3) 0))) ] - "TARGET_VX" + "TARGET_VX && !TARGET_VXE" { operands[2] = gen_reg_rtx (V16QImode); operands[3] = gen_reg_rtx (V16QImode); @@ -769,20 +791,20 @@ operands[5] = CONST0_RTX (V16QImode); }) -(define_expand "popcountv4si2" +(define_expand "popcountv4si2_vx" [(set (match_dup 2) (unspec:V16QI [(subreg:V16QI (match_operand:V4SI 1 "register_operand" "v") 0)] UNSPEC_POPCNT)) (set (match_operand:V4SI 0 "register_operand" "=v") (unspec:V4SI [(match_dup 2) (match_dup 3)] UNSPEC_VEC_VSUM))] - "TARGET_VX" + "TARGET_VX && !TARGET_VXE" { operands[2] = gen_reg_rtx (V16QImode); operands[3] = force_reg (V16QImode, CONST0_RTX (V16QImode)); }) -(define_expand "popcountv2di2" +(define_expand "popcountv2di2_vx" [(set (match_dup 2) (unspec:V16QI [(subreg:V16QI (match_operand:V2DI 1 "register_operand" "v") 0)] UNSPEC_POPCNT)) @@ -792,7 +814,7 @@ (set (match_operand:V2DI 0 "register_operand" "=v") (unspec:V2DI [(match_dup 3) (match_dup 5)] UNSPEC_VEC_VSUMG))] - "TARGET_VX" + "TARGET_VX && !TARGET_VXE" { operands[2] = gen_reg_rtx (V16QImode); operands[3] = gen_reg_rtx (V4SImode); diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index bbdd3c8..6d178c5 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,9 @@ 2017-03-24 Andreas Krebbel + * gcc.target/s390/vxe/popcount-1.c: New test. + +2017-03-24 Andreas Krebbel + * gcc.target/s390/vxe/bitops-1.c: New test. 2017-03-24 Andreas Krebbel diff --git a/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c b/gcc/testsuite/
[PATCH 08/16] S/390: Rearrange fixuns_trunc pattern definitions.
This reworks the fixuns_trunc* patterns a bit which got quite confusing after adding z13 support. Now we just have a single RTL standard name expander definition ("fixuns_trunc2") which then multiplexes to either the emulation variants *_emu or the hardware implementations. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/s390.md ("fixuns_truncdddi2", "fixuns_trunctddi2") ("fixuns_trunc2"): Merge into ... ("fixuns_trunc2"): New expander. ("fixuns_trunc2", "fixuns_truncsi2"): Rename expanders to ... ("fixuns_trunc2_emu") ("fixuns_truncdddi2_emu"): ... these. ("fixuns_truncsi2_emu"): New expander. ("*fixuns_truncdfdi2_z13"): Rename to ... ("*fixuns_truncdfdi2_vx"): ... this. --- gcc/ChangeLog | 18 gcc/config/s390/s390.md | 253 +++- 2 files changed, 161 insertions(+), 110 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index fef571c..779101e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,23 @@ 2017-03-24 Andreas Krebbel + * config/s390/s390.md + ("fixuns_truncdddi2", "fixuns_trunctddi2") + ("fixuns_trunc2"): Merge into ... + ("fixuns_trunc2"): New expander. + + ("fixuns_trunc2", "fixuns_truncsi2"): + Rename expanders to ... + + ("fixuns_trunc2_emu") + ("fixuns_truncdddi2_emu"): ... these. + + ("fixuns_truncsi2_emu"): New expander. + + ("*fixuns_truncdfdi2_z13"): Rename to ... + ("*fixuns_truncdfdi2_vx"): ... this. + +2017-03-24 Andreas Krebbel + * config/s390/2964.md: Remove the single element vector compare instructions which are no longer used. * config/s390/s390.c (s390_select_ccmode): Remove handling of diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index e72d5be..d4d3781 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -4735,152 +4735,185 @@ "operands[2] = gen_lowpart (QImode, operands[0]);") ; -; fixuns_trunc(dd|td)di2 instruction pattern(s). +; fixuns_trunc(dd|td|sf|df|tf)(si|di)2 expander ; -(define_expand "fixuns_truncdddi2" +; This is the only entry point for fixuns_trunc. It multiplexes the +; expansion to either the *_emu expanders below for pre z196 machines +; or emits the default pattern otherwise. +(define_expand "fixuns_trunc2" [(parallel -[(set (match_operand:DI 0 "register_operand" "") - (unsigned_fix:DI (match_operand:DD 1 "register_operand" ""))) - (unspec:DI [(const_int DFP_RND_TOWARD_0)] UNSPEC_ROUND) +[(set (match_operand:GPR 0 "register_operand" "") + (unsigned_fix:GPR (match_operand:FP 1 "register_operand" ""))) + (unspec:GPR [(match_dup 2)] UNSPEC_ROUND) (clobber (reg:CC CC_REGNUM))])] - - "TARGET_HARD_DFP" + "TARGET_HARD_FLOAT" { if (!TARGET_Z196) { - rtx_code_label *label1 = gen_label_rtx (); - rtx_code_label *label2 = gen_label_rtx (); - rtx temp = gen_reg_rtx (TDmode); - REAL_VALUE_TYPE cmp, sub; - - decimal_real_from_string (&cmp, "9223372036854775808.0"); /* 2^63 */ - decimal_real_from_string (&sub, "18446744073709551616.0"); /* 2^64 */ - - /* 2^63 can't be represented as 64bit DFP number with full precision. The - solution is doing the check and the subtraction in TD mode and using a - TD -> DI convert afterwards. */ - emit_insn (gen_extendddtd2 (temp, operands[1])); - temp = force_reg (TDmode, temp); - emit_cmp_and_jump_insns (temp, - const_double_from_real_value (cmp, TDmode), - LT, NULL_RTX, VOIDmode, 0, label1); - emit_insn (gen_subtd3 (temp, temp, - const_double_from_real_value (sub, TDmode))); - emit_insn (gen_fix_trunctddi2_dfp (operands[0], temp, -GEN_INT (DFP_RND_TOWARD_MINF))); - emit_jump (label2); - - emit_label (label1); - emit_insn (gen_fix_truncdddi2_dfp (operands[0], operands[1], -GEN_INT (DFP_RND_TOWARD_0))); - emit_label (label2); + /* We don't provide emulation for TD|DD->SI. */ + if (GET_MODE_CLASS (mode) == MODE_DECIMAL_FLOAT + && mode == SImode) + FAIL; + emit_insn (gen_fixuns_trunc2_emu (operands[0], + operands[1])); DONE; } + + if (GET_MODE_CLASS (mode) == MODE_DECIMAL_FLOAT) +operands[2] = GEN_INT (DFP_RND_TOWARD_0); + else +operands[2] = GEN_INT (BFP_RND_TOWARD_0); +}) + +; (sf|df|tf)->unsigned (si|di) + +; Emulate the unsigned conversion with the signed version for pre z196 +; machines. +(define_expand "fixuns_trunc2_emu" + [(parallel +[(set (match_operand:GPR 0 "register_operand" "") + (unsigned_fix:GPR (match_operand:BFP 1 "register_operand" ""))) + (unspec:GPR [(const_int BFP_RND_TOWARD_0)] UNSPEC_ROUND) + (clobber (reg:CC CC_REG
[PATCH 14/16] S/390: arch12: Support the mul/add/subtract instructions.
gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/s390.md ("*adddi3_sign", "*subdi3_sign", "mulditi3") ("mulditi3_2", "*muldi3_sign"): New patterns. ("muldi3", "*muldi3", "mulsi3", "*mulsi3"): Add an expander and rename the pattern definition. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/arch12/aghsghmgh-1.c: New test. * gcc.target/s390/arch12/mul-1.c: New test. * gcc.target/s390/arch12/mul-2.c: New test. --- gcc/ChangeLog | 7 ++ gcc/config/s390/s390.md| 98 +++--- gcc/testsuite/ChangeLog| 6 ++ gcc/testsuite/gcc.target/s390/arch12/aghsghmgh-1.c | 23 + gcc/testsuite/gcc.target/s390/arch12/mul-1.c | 30 +++ gcc/testsuite/gcc.target/s390/arch12/mul-2.c | 16 6 files changed, 167 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/arch12/aghsghmgh-1.c create mode 100644 gcc/testsuite/gcc.target/s390/arch12/mul-1.c create mode 100644 gcc/testsuite/gcc.target/s390/arch12/mul-2.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index b3f3d95..3753ad6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,12 @@ 2017-03-24 Andreas Krebbel + * config/s390/s390.md ("*adddi3_sign", "*subdi3_sign", "mulditi3") + ("mulditi3_2", "*muldi3_sign"): New patterns. + ("muldi3", "*muldi3", "mulsi3", "*mulsi3"): Add an expander and + rename the pattern definition. + +2017-03-24 Andreas Krebbel + * config/s390/s390.md ("indirect_jump"): Turn insn definition into expander. ("*indirect_jump", "*indirect2_jump"): New pattern definitions. diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 32753ef..93a0bc6 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -5795,6 +5795,15 @@ (set_attr "cpu_facility" "*,z196,extimm,z10") (set_attr "z10prop" "z10_super_E1,*,z10_super_E1,z10_super_E1")]) +(define_insn "*adddi3_sign" + [(set (match_operand:DI 0 "register_operand" "=d") +(plus:DI (sign_extend:DI (match_operand:HI 2 "memory_operand""T")) +(match_operand:DI 1 "register_operand" "0"))) + (clobber (reg:CC CC_REGNUM))] + "TARGET_ARCH12" + "agh\t%0,%2" + [(set_attr "op_type" "RXY")]) + ; ; add(tf|df|sf|td|dd)3 instruction pattern(s). ; @@ -6226,6 +6235,15 @@ (set_attr "cpu_facility" "*,z196,*,longdisp") (set_attr "z10prop" "z10_super_c_E1,*,z10_super_E1,z10_super_E1")]) +(define_insn "*subdi3_sign" + [(set (match_operand:DI 0 "register_operand" "=d") +(minus:DI (match_operand:DI 1 "register_operand" "0") + (sign_extend:DI (match_operand:HI 2 "memory_operand" "T" + (clobber (reg:CC CC_REGNUM))] + "TARGET_ARCH12" + "sgh\t%0,%2" + [(set_attr "op_type" "RXY")]) + ; ; sub(tf|df|sf|td|dd)3 instruction pattern(s). @@ -6565,6 +6583,14 @@ ; muldi3 instruction pattern(s). ; +(define_expand "muldi3" + [(parallel +[(set (match_operand:DI 0 "register_operand") + (mult:DI (match_operand:DI 1 "nonimmediate_operand") + (match_operand:DI 2 "general_operand"))) + (clobber (reg:CC CC_REGNUM))])] + "TARGET_ZARCH") + (define_insn "*muldi3_sign" [(set (match_operand:DI 0 "register_operand" "=d,d") (mult:DI (sign_extend:DI (match_operand:SI 2 "general_operand" "d,T")) @@ -6576,24 +6602,68 @@ [(set_attr "op_type" "RRE,RXY") (set_attr "type" "imuldi")]) -(define_insn "muldi3" - [(set (match_operand:DI 0 "register_operand" "=d,d,d,d") -(mult:DI (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,0") - (match_operand:DI 2 "general_operand" "d,K,T,Os")))] +(define_insn "*muldi3" + [(set (match_operand:DI 0 "register_operand" "=d,d,d,d,d") + (mult:DI (match_operand:DI 1 "nonimmediate_operand" "%0,d,0,0,0") +(match_operand:DI 2 "general_operand" "d,d,K,T,Os"))) + (clobber (match_scratch:CC 3"=X,c,X,X,X"))] "TARGET_ZARCH" "@ msgr\t%0,%2 + msgrkc\t%0,%1,%2 mghi\t%0,%h2 msg\t%0,%2 msgfi\t%0,%2" - [(set_attr "op_type" "RRE,RI,RXY,RIL") + [(set_attr "op_type" "RRE,RRF,RI,RXY,RIL") (set_attr "type" "imuldi") - (set_attr "cpu_facility" "*,*,*,z10")]) + (set_attr "cpu_facility" "*,arch12,*,*,z10")]) + +(define_insn "mulditi3" + [(set (match_operand:TI 0 "register_operand" "=d,d") +(mult:TI (sign_extend:TI + (match_operand:DI 1 "register_operand" "%d,0")) +(sign_extend:TI + (match_operand:DI 2 "nonimmediate_operand" " d,T"] + "TARGET_ARCH12" + "@ + mgrk\t%0,%1,%2 + mg\t%0,%2" + [(set_attr "op_type" "RRF,RXY")]) + +; Combine likes op
[PATCH 09/16] S/390: arch12: Add arch12 option.
This patch covers the mechanical work of making the new architecture option arch12 available wherever it will be needed later. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/s390.exp: Run tests in arch12 and vxe dirs. * lib/target-supports.exp: Add effective target check s390_vxe. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * common/config/s390/s390-common.c (processor_flags_table): Add arch12. * config.gcc: Add arch12. * config/s390/driver-native.c (s390_host_detect_local_cpu): Default to arch12 for unknown CPU model numbers. * config/s390/s390-builtins.def: Add B_VXE builtin flag. * config/s390/s390-c.c (s390_cpu_cpp_builtins_internal): Adjust PROCESSOR_max sanity check. * config/s390/s390-opts.h (enum processor_type): Add PROCESSOR_ARCH12. * config/s390/s390.c (processor_table): Add arch12. (s390_expand_builtin): Add check for B_VXE flag. (s390_issue_rate): Add PROCESSOR_ARCH12. (s390_get_sched_attrmask): Likewise. (s390_get_unit_mask): Likewise. (s390_sched_score): Enable z13 scheduling for arch12. (s390_sched_reorder): Likewise. (s390_sched_variable_issue): Likewise. * config/s390/s390.h (enum processor_flags): Add PF_ARCH12 and PF_VXE. (s390_tune_attr): Use z13 scheduling also for arch12. (TARGET_CPU_ARCH12, TARGET_CPU_ARCH12_P, TARGET_CPU_VXE) (TARGET_CPU_VXE_P, TARGET_ARCH12, TARGET_ARCH12_P, TARGET_VXE) (TARGET_VXE_P): New macros. * config/s390/s390.md: Add arch12 to cpu attribute. Add arch12 and vxe to cpu_facility. Add arch12 and vxe to enabled attribute. * config/s390/s390.opt: Add arch12 as processor_type. --- gcc/ChangeLog | 30 ++ gcc/common/config/s390/s390-common.c | 5 - gcc/config.gcc | 2 +- gcc/config/s390/driver-native.c| 3 +++ gcc/config/s390/s390-builtins.def | 3 ++- gcc/config/s390/s390-c.c | 2 +- gcc/config/s390/s390-opts.h| 1 + gcc/config/s390/s390.c | 22 -- gcc/config/s390/s390.h | 25 + gcc/config/s390/s390.md| 12 ++-- gcc/config/s390/s390.opt | 3 +++ gcc/testsuite/ChangeLog| 5 + gcc/testsuite/gcc.target/s390/s390.exp | 6 ++ gcc/testsuite/lib/target-supports.exp | 17 + 14 files changed, 120 insertions(+), 16 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 779101e..89e7906 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,35 @@ 2017-03-24 Andreas Krebbel + * common/config/s390/s390-common.c (processor_flags_table): Add + arch12. + * config.gcc: Add arch12. + * config/s390/driver-native.c (s390_host_detect_local_cpu): + Default to arch12 for unknown CPU model numbers. + * config/s390/s390-builtins.def: Add B_VXE builtin flag. + * config/s390/s390-c.c (s390_cpu_cpp_builtins_internal): Adjust + PROCESSOR_max sanity check. + * config/s390/s390-opts.h (enum processor_type): Add + PROCESSOR_ARCH12. + * config/s390/s390.c (processor_table): Add arch12. + (s390_expand_builtin): Add check for B_VXE flag. + (s390_issue_rate): Add PROCESSOR_ARCH12. + (s390_get_sched_attrmask): Likewise. + (s390_get_unit_mask): Likewise. + (s390_sched_score): Enable z13 scheduling for arch12. + (s390_sched_reorder): Likewise. + (s390_sched_variable_issue): Likewise. + * config/s390/s390.h (enum processor_flags): Add PF_ARCH12 and + PF_VXE. + (s390_tune_attr): Use z13 scheduling also for arch12. + (TARGET_CPU_ARCH12, TARGET_CPU_ARCH12_P, TARGET_CPU_VXE) + (TARGET_CPU_VXE_P, TARGET_ARCH12, TARGET_ARCH12_P, TARGET_VXE) + (TARGET_VXE_P): New macros. + * config/s390/s390.md: Add arch12 to cpu attribute. Add arch12 + and vxe to cpu_facility. Add arch12 and vxe to enabled attribute. + * config/s390/s390.opt: Add arch12 as processor_type. + +2017-03-24 Andreas Krebbel + * config/s390/s390.md ("fixuns_truncdddi2", "fixuns_trunctddi2") ("fixuns_trunc2"): Merge into ... diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 47f13e1..10418a3 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -45,7 +45,10 @@ EXPORTED_CONST int processor_flags_table[] = | PF_EXTIMM | PF_DFP | PF_Z10 | PF_Z196 | PF_ZEC12 | PF_TX, /* z13 */PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT | PF_EXTIMM | PF_DFP | PF_Z10 | PF_Z196 | PF_ZEC12 | PF_TX - | PF_Z13 | PF_VX + | PF_Z13 | PF_VX, +/* arch12 */ PF_IEEE_FL
[PATCH 12/16] S/390: arch12: Add vllezlf instruction.
This adds support for the vector load element and zero instruction and makes sure it is used when initializing vectors with elements while setting the rest to 0. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/s390.c (s390_expand_vec_init): Use vllezl instruction if possible. * config/s390/vector.md (vec_halfnumelts): New mode attribute. ("*vec_vllezlf"): New pattern. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/vxe/vllezlf-1.c: New test. --- gcc/ChangeLog | 8 +++ gcc/config/s390/s390.c| 28 + gcc/config/s390/vector.md | 17 +++ gcc/testsuite/ChangeLog | 4 gcc/testsuite/gcc.target/s390/vxe/vllezlf-1.c | 30 +++ 5 files changed, 87 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/vxe/vllezlf-1.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index d516b4d..a48b743 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,13 @@ 2017-03-24 Andreas Krebbel + * config/s390/s390.c (s390_expand_vec_init): Use vllezl + instruction if possible. + * config/s390/vector.md (vec_halfnumelts): New mode + attribute. + ("*vec_vllezlf"): New pattern. + +2017-03-24 Andreas Krebbel + * config/s390/vector.md ("popcountv16qi2", "popcountv8hi2") ("popcountv4si2", "popcountv2di2"): Rename to ... ("popcount2", "popcountv8hi2_vx", "popcountv4si2_vx") diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 416a15e..e800323 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -6552,6 +6552,34 @@ s390_expand_vec_init (rtx target, rtx vals) return; } + /* Use vector load logical element and zero. */ + if (TARGET_VXE && (mode == V4SImode || mode == V4SFmode)) +{ + bool found = true; + + x = XVECEXP (vals, 0, 0); + if (memory_operand (x, inner_mode)) + { + for (i = 1; i < n_elts; ++i) + found = found && XVECEXP (vals, 0, i) == const0_rtx; + + if (found) + { + machine_mode half_mode = (inner_mode == SFmode + ? V2SFmode : V2SImode); + emit_insn (gen_rtx_SET (target, + gen_rtx_VEC_CONCAT (mode, + gen_rtx_VEC_CONCAT (half_mode, + x, + const0_rtx), + gen_rtx_VEC_CONCAT (half_mode, + const0_rtx, + const0_rtx; + return; + } + } +} + /* We are about to set the vector elements one by one. Zero out the full register first in order to help the data flow framework to detect it as full VR set. */ diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index d4c0e95..6a726a3 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -44,6 +44,7 @@ (define_mode_iterator VI_HW_HSD [V8HI V4SI V2DI]) (define_mode_iterator VI_HW_HS [V8HI V4SI]) (define_mode_iterator VI_HW_QH [V16QI V8HI]) +(define_mode_iterator VI_HW_4 [V4SI V4SF]) ; All integer vector modes supported in a vector register + TImode (define_mode_iterator VIT [V1QI V2QI V4QI V8QI V16QI V1HI V2HI V4HI V8HI V1SI V2SI V4SI V1DI V2DI V1TI TI]) @@ -127,6 +128,9 @@ (V2DI "V2SI") (V2DF "V2SF")]) +(define_mode_attr vec_halfnumelts + [(V4SF "V2SF") (V4SI "V2SI")]) + ; The comparisons not setting CC iterate over the rtx code. (define_code_iterator VFCMP_HW_OP [eq gt ge]) (define_code_attr asm_fcmp_op [(eq "e") (gt "h") (ge "he")]) @@ -451,6 +455,19 @@ DONE; }) +(define_insn "*vec_vllezlf" + [(set (match_operand:VI_HW_4 0 "register_operand" "=v") + (vec_concat:VI_HW_4 +(vec_concat: + (match_operand: 1 "memory_operand""R") + (const_int 0)) +(vec_concat: + (const_int 0) + (const_int 0] + "TARGET_VXE" + "vllezlf\t%v0,%1" + [(set_attr "op_type" "VRX")]) + ; Replicate from vector element ; vrepb, vreph, vrepf, vrepg (define_insn "*vec_splat" diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 6d178c5..4efc391 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,9 @@ 2017-03-24 Andreas Krebbel + * gcc.target/s390/vxe/vllezlf-1.c: New test. + +2017-03-24 Andreas Krebbel + * gcc.target/s390/vxe/popcount-1.c: New test. 2017-03-24 Andreas Krebbel diff --git a/gcc/testsuite/gcc.target/s390/vxe/vllezlf-1.c b/gcc/testsuite/gcc.target/s390/vxe/vllezlf-1.c new file mo
[PATCH 15/16] S/390: arch12: Support new vector floating point modes.
This patch adds support for the new floating point vector elements (SF and TF) introduced with arch12. gcc/ChangeLog: 2017-03-24 Andreas Krebbel * config/s390/s390.c (s390_expand_vec_compare): Support other vector floating point modes than just V2DF. (s390_expand_vcond): Likewise. (s390_hard_regno_mode_ok): Allow SFmode values in VRs. (s390_cannot_change_mode_class): Prevent mode changes between TF and V1TF in vector registers. * config/s390/s390.md (DF, SF): New mode attributes. ("*cmp_ccs", "add3", "sub3", "mul3") ("fma4", "fms4", "div3", "*neg2"): Add SFmode support for VRs. * config/s390/vector.md (V_HW, V_HW2, VT_HW, ti*, nonvec): Add new vector fp modes. (VFT, VF_HW): New mode iterators. (vw, sdx): New mode attributes. ("addv2df3", "subv2df3", "mulv2df3", "divv2df3", "sqrtv2df2") ("fmav2df4","fmsv2df4", "negv2df2", "absv2df2", "*negabsv2df2") ("smaxv2df3", "sminv2df3", "*vec_cmpv2df_nocc") ("vec_cmpuneqv2df", "vec_cmpltgtv2df", "vec_orderedv2df") ("vec_unorderedv2df"): Adjust the v2df only patterns to support also the new vector floating point modes. Renaming to ... ("add3", "sub3", "mul3", "div3") ("sqrt2", "fma4", "fms4", "neg2") ("abs2", "negabs2", "smax3") ("smin3", "*vec_cmp_nocc") ("vec_cmpuneq", "vec_cmpltgt", "vec_ordered") ("vec_unordered"): ... these. ("neg_fma4", "neg_fms4", "*smax3_vxe") ("*smin3_vxe", "*sminv2df3_vx", "*vec_extendv4sf") ("*vec_extendv2df"): New insn definitions. gcc/testsuite/ChangeLog: 2017-03-24 Andreas Krebbel * gcc.target/s390/vxe/negfma-1.c: New test. --- gcc/ChangeLog| 34 +++ gcc/config/s390/s390-builtins.def| 2 +- gcc/config/s390/s390.c | 13 +- gcc/config/s390/s390.md | 144 ++-- gcc/config/s390/vector.md| 313 ++- gcc/testsuite/ChangeLog | 4 + gcc/testsuite/gcc.target/s390/vxe/negfma-1.c | 49 + 7 files changed, 388 insertions(+), 171 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vxe/negfma-1.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3753ad6..bd60982 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,39 @@ 2017-03-24 Andreas Krebbel + * config/s390/s390.c (s390_expand_vec_compare): Support other + vector floating point modes than just V2DF. + (s390_expand_vcond): Likewise. + (s390_hard_regno_mode_ok): Allow SFmode values in VRs. + (s390_cannot_change_mode_class): Prevent mode changes between TF + and V1TF in vector registers. + * config/s390/s390.md (DF, SF): New mode attributes. + ("*cmp_ccs", "add3", "sub3", "mul3") + ("fma4", "fms4", "div3", "*neg2"): Add + SFmode support for VRs. + * config/s390/vector.md (V_HW, V_HW2, VT_HW, ti*, nonvec): Add new + vector fp modes. + (VFT, VF_HW): New mode iterators. + (vw, sdx): New mode attributes. + ("addv2df3", "subv2df3", "mulv2df3", "divv2df3", "sqrtv2df2") + ("fmav2df4","fmsv2df4", "negv2df2", "absv2df2", "*negabsv2df2") + ("smaxv2df3", "sminv2df3", "*vec_cmpv2df_nocc") + ("vec_cmpuneqv2df", "vec_cmpltgtv2df", "vec_orderedv2df") + ("vec_unorderedv2df"): Adjust the v2df only patterns to support + also the new vector floating point modes. Renaming to ... + + ("add3", "sub3", "mul3", "div3") + ("sqrt2", "fma4", "fms4", "neg2") + ("abs2", "negabs2", "smax3") + ("smin3", "*vec_cmp_nocc") + ("vec_cmpuneq", "vec_cmpltgt", "vec_ordered") + ("vec_unordered"): ... these. + + ("neg_fma4", "neg_fms4", "*smax3_vxe") + ("*smin3_vxe", "*sminv2df3_vx", "*vec_extendv4sf") + ("*vec_extendv2df"): New insn definitions. + +2017-03-24 Andreas Krebbel + * config/s390/s390.md ("*adddi3_sign", "*subdi3_sign", "mulditi3") ("mulditi3_2", "*muldi3_sign"): New patterns. ("muldi3", "*muldi3", "mulsi3", "*mulsi3"): Add an expander and diff --git a/gcc/config/s390/s390-builtins.def b/gcc/config/s390/s390-builtins.def index 27cb6a8..bb2d743 100644 --- a/gcc/config/s390/s390-builtins.def +++ b/gcc/config/s390/s390-builtins.def @@ -2496,7 +2496,7 @@ B_DEF (s390_vec_ctsl, vec_ctsl, 0, B_DEF (s390_vec_ctul, vec_ctul, 0, B_VX, O2_U3, BT_FN_UV2DI_V2DF_INT) /* vclgdb */ B_DEF (s390_vcgdb, vec_df_to_di_s64, 0, B_VX, O2_U3, BT_FN_V2DI_V2DF_INT) /* vcgdb */ B_DEF (s390_vclgdb,vec_df_to_di_u64, 0, B_VX, O2_U3, BT_FN_UV2DI_V
C++ PATCH for c++/77339, ICE with invalid use of alias template
Here, we were seeing that A:: is a template specialization, so we tried to treat it as entering the scope of A. But since A is an alias template, that doesn't make sense. Fixed by not considering alias templates for this treatment. Tested x86_64-pc-linux-gnu, applying to trunk. commit 7fde9bb020c7be786c791e485261e28e7b0d3ccf Author: Jason Merrill Date: Thu Mar 23 16:21:34 2017 -0400 PR c++/77339 - ICE with invalid use of alias template. * pt.c (lookup_template_class_1): Don't try to enter the scope of an alias template. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 5259dad..cbe8082 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -8682,9 +8682,9 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context, || !PRIMARY_TEMPLATE_P (gen_tmpl) || currently_open_class (template_type)) { - tree tinfo = TYPE_TEMPLATE_INFO_MAYBE_ALIAS (template_type); + tree tinfo = TYPE_TEMPLATE_INFO (template_type); - if (comp_template_args (TI_ARGS (tinfo), arglist)) + if (tinfo && comp_template_args (TI_ARGS (tinfo), arglist)) return template_type; } diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-58.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-58.C new file mode 100644 index 000..0ae1c49 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-58.C @@ -0,0 +1,7 @@ +// PR c++/77339 +// { dg-do compile { target c++11 } } + +template < typename > using A = int; + +//OK: template < typename X > A < X > a; +template < typename X > A < X >::a; // { dg-error "" }
Re: [PATCH 0/3] Introduce internal_error_cont and exclude it from pot files
On 03/24/2017 03:29 AM, Martin Liška wrote: I would like to ping that. I'm not sure what's agreement after I read discussion in: https://gcc.gnu.org/ml/gcc/2017-03/msg00070.html Martin Sebor may know, CC'ing him. I haven't seen any followup posts after the one above. The other (and I think more important point, one with a bearing on the whole patch series) to consider is the one made by translator Frederic Marchal in comment #2 in bug 80055. Martin
Re: C++ PATCH to fix ICE in replace_placeholders_r (PR c++/79937)
On Thu, Mar 23, 2017 at 05:09:58PM -0400, Jason Merrill wrote: > On Thu, Mar 23, 2017 at 4:34 PM, Marek Polacek wrote: > > On Tue, Mar 14, 2017 at 02:34:30PM -0400, Jason Merrill wrote: > >> On Tue, Mar 14, 2017 at 2:33 PM, Jason Merrill wrote: > >> > On Tue, Mar 7, 2017 at 12:10 PM, Marek Polacek > >> > wrote: > >> >> In this testcase we have > >> >> C c = bar (X{1}); > >> >> which store_init_value sees as > >> >> c = TARGET_EXPR >> >> .n=(&)->i}>)> > >> >> i.e. we're initializing "c" with a TARGET_EXPR. We call > >> >> replace_placeholders > >> >> that walks the whole tree to substitute the placeholders. Eventually > >> >> we find > >> >> the nested but that's for another object, > >> >> so we > >> >> crash. Seems that we shouldn't have stepped into the second > >> >> TARGET_EXPR at > >> >> all; it has nothing to with "c", it's bar's argument. > >> >> > >> >> It occurred to me that we shouldn't step into CALL_EXPRs and leave the > >> >> placeholders in function arguments to cp_gimplify_init_expr which calls > >> >> replace_placeholders for constructors. Not sure if it's enough to > >> >> handle > >> >> CALL_EXPRs like this, anything else? > >> > > >> > Hmm, we might have a DMI containing a call with an argument referring > >> > to *this, i.e. > >> > > >> > struct A > >> > { > >> > int i; > >> > int j = frob (this->i); > >> > }; > >> > > >> > The TARGET_EXPR seems like a more likely barrier, but even there we > >> > could have something like > >> > > >> > struct A { int i; }; > >> > struct B > >> > { > >> > int i; > >> > A a = A{this->i}; > >> > }; > >> > > >> > I think we need replace_placeholders to keep a stack of objects, so > >> > that when we see a TARGET_EXPR we add it to the stack and therefore > >> > can properly replace a PLACEHOLDER_EXPR of its type. > >> > >> Or actually, avoid replacing such a PLACEHOLDER_EXPR, but rather leave > >> it for later when we lower the TARGET_EXPR. > > > > Sorry, I don't really follow. I have a patch that puts TARGET_EXPRs on > > a stack, but I don't know how that helps. E.g. with nsdmi-aggr3.C > > we have > > B b = TARGET_EXPR > &>}> > > so when we get to that PLACEHOLDER_EXPR, on the stack there's > > TARGET_EXPR with type struct A > > TARGET_EXPR with type struct B > > so the type of the PLACEHOLDER_EXPR doesn't match the type of the current > > TARGET_EXPR, but we still want to replace it in this case. > > > > So -- could you expand a bit on what you had in mind, please? > > So then when we see a placeholder, we walk the stack to find the > object of the matching type. > > But if the object we find was collected from walking through a > TARGET_EXPR, we should leave the PLACEHOLDER_EXPR alone, so that it > can be replaced later with the actual target of the initialization. Unfortunately, I still don't understand; guess I'll have to drop this PR. With this we put TARGET_EXPRs on a stack, and then when we find a PLACEHOLDER_EXPR we walk the stack to find a TARGET_EXPR of the same type as the PLACEHOLDER_EXPR. There are three simplified examples I've been playing with: B b = T_E >}> - here we should replace the P_E; on the stack there are two TARGET_EXPRs of types B and A C c = T_E >)> - here we shouldn't replace the P_E; on the stack there are two TARGET_EXPRs of types X and C B b = T_E }}> - here we should replace the P_E; on the stack there's one TARGET_EXPR of type B In each case we find a TARGET_EXPR of the type of the PLACEHOLDER_EXPR, but I don't see how to decide which PLACEHOLDER_EXPR we should let slide. Sorry for being dense... diff --git gcc/cp/tree.c gcc/cp/tree.c index 2757af6..2439a00 100644 --- gcc/cp/tree.c +++ gcc/cp/tree.c @@ -2741,8 +2741,12 @@ build_ctor_subob_ref (tree index, tree type, tree obj) struct replace_placeholders_t { - tree obj;/* The object to be substituted for a PLACEHOLDER_EXPR. */ - bool seen; /* Whether we've encountered a PLACEHOLDER_EXPR. */ + /* The object to be substituted for a PLACEHOLDER_EXPR. */ + tree obj; + /* Whether we've encountered a PLACEHOLDER_EXPR. */ + bool seen; + /* A stack of TARGET_EXPRs we've found ourselves in. */ + vec target_expr_stack; }; /* Like substitute_placeholder_in_expr, but handle C++ tree codes and @@ -2762,14 +2766,35 @@ replace_placeholders_r (tree* t, int* walk_subtrees, void* data_) switch (TREE_CODE (*t)) { +case TARGET_EXPR: + d->target_expr_stack.safe_push (*t); + cp_walk_tree (&TARGET_EXPR_INITIAL (*t), replace_placeholders_r, data_, + NULL); + d->target_expr_stack.pop (); + *walk_subtrees = 0; + break; + case PLACEHOLDER_EXPR: { - tree x = obj; - for (; !(same_type_ignoring_top_level_qualifiers_p -(TREE_TYPE (*t), TREE_TYPE (x))); -x = TREE_OPERAND (x, 0)) - gcc_assert (TREE_CODE (x) == COMPONENT_REF); - *t = x; + bool skip_it = false; + unsigned ix; +
Re: [testsuite] Add missing dg-require-effective-target alloca to gcc testsuite
On Mar 24, 2017, at 5:58 AM, Rainer Orth wrote: > - { /* { dg-warning "statement may fall through" "" { target c } 23 } */ > - int a[i]; /* { dg-warning "statement may fall through" "" { target c++ > } 24 } */ > + { /* { dg-warning "statement may fall through" "" { target c } 24 } */ > + int a[i]; /* { dg-warning "statement may fall through" "" { target c++ > } 25 } */ Any reason to not use relative line numbers? See testsuite/gcc.dg/dg-test-1.c for a template for how to do it. I think they should work and be better. Indeed, all line numbers should be relative, generally speaking.
Re: [PATCH] fwprop: Prevent infinite looping (PR79405)
On 03/24/2017 05:54 AM, Richard Biener wrote: On Fri, Mar 24, 2017 at 10:39 AM, Segher Boessenkool wrote: On Fri, Mar 24, 2017 at 09:13:59AM +0100, Richard Biener wrote: https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01485.html ? What about it? That suggestion would make fwprop do *less* useful work, while in principle the problem is that it *already* does not enough! Ok, not knowing a lot about fwprop I take your word for it (but GIMPLE level forwprop works that way and certainly you can't substitute a def into a use that is before the def). When fwprop has done a propagation it makes fresh new uses for all the sources of the resulting insn, which it adds at the end of the table. It doesn't reuse the original uses. Yes, that's what I suspected. If fwprop actually tried all propagations (and never tried substituting X with X, which it currently does), there would be no problem. To me it looked like fwprop tries to iterate over all propagations but it iterates over a changing data structure which is why it "oscillates". But I didn't actually verify that theory (in fact, I just very much disliked Berns patch give its compile-time cost). We have, in order, in one bb: B = A|Z A = B D = A where each of those is the only set of its dest. Now the first thing tried is propagating the first into the second. This fails. Then, Z is found to be zero, so we get B = A A = B D = A If the first was now propagated into the second again, all is fine (all three vars are set to A). But this is not tried again. The second is propagated into the third, giving B = A A = B D = B and then the first is propagated into the third, and we repeat forever. This patch is a workaround; it makes no difference on pretty much any code, except this single testcase (which has undefined behaviour, uninitialised variables). Yeah, your fix looks similar to the other hack I suggested. I have implemented the "retry things" as well fwiw, but a) it is too big and invasive for stage 4, and b) it kind of sucks, needs more work, even more invasive. The workaround is cheap and solves the immediate problem. Agreed, still iterating over the DF uses in the first place looks like the bug (given this "all uses" data structure changes during propagation!). I'd have done for (BBs in RPO order) for (insn in BB) repeat: for (use in insn) if (propagate_into (use)) goto repeat; I wonder why it wasn't implemented like this in the first place. ISTM that with the exception of loops that it ought to give the same final result. For loops, I wouldn't expect opportunities along backedges to be all that common. It also seems to me that we ought to be able to verify if the different order changes things. Just to be clear, I can live with either approach. I would even be option to Segher's as a stopgap and yours for gcc-8. jeff
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On 03/24/2017 07:35 AM, Jakub Jelinek wrote: On Fri, Mar 24, 2017 at 02:04:42PM +0100, Jakub Jelinek wrote: On Fri, Mar 24, 2017 at 06:37:10AM -0600, Jeff Law wrote: 2017-03-21 Jakub Jelinek PR target/80102 * cfgcleanup.c (old_insns_match_p): Don't cross-jump in between /f and non-/f instructions. If both i1 and i2 are frame related, verify all CFA notes, their order and content. * g++.dg/opt/pr80102.C: New test. Presumably this didn't ICE at some point in the past, so it's a regression? (it's not marked as such in the BZ). It doesn't ICE for me with r238210 and ICEs with current trunk, I don't have too many ppc64le compilers around though. GCC 4.8 doesn't ICE either. ISTM this isn't a regression (without doing deeper analysis). As a release manager, ISTM you can grant an exception if you want to push this forward for gcc-7. Jeff
Re: [PATCH 0/3] Introduce internal_error_cont and exclude it from pot files
On 03/24/2017 09:37 AM, Martin Sebor wrote: On 03/24/2017 03:29 AM, Martin Liška wrote: I would like to ping that. I'm not sure what's agreement after I read discussion in: https://gcc.gnu.org/ml/gcc/2017-03/msg00070.html Martin Sebor may know, CC'ing him. I haven't seen any followup posts after the one above. The other (and I think more important point, one with a bearing on the whole patch series) to consider is the one made by translator Frederic Marchal in comment #2 in bug 80055. It's on the list of things to evaluate. The non-gcc stuff on my plate is just about wrapped up, at which point I can give more attention to gcc stuff. Jeff
Re: [PATCH 1/5] [ARC] Save/restore blink when in ISR.
* Claudiu Zissulescu [2017-03-20 12:43:26 +0100]: > BLIBK register needs to be saved/restored in a interrupt. Fix this issue. > > gcc/ > 2016-09-21 Claudiu Zissulescu > > * config/arc/arc.c (arc_epilogue_uses): BLINK should be also > restored when in interrupt. > * config/arc/arc.md (simple_return): ARCv2 rtie instruction > doesn't have delay slot. Looks good thanks, Andrew > > gcc/testsuite/ > 2016-09-21 Claudiu Zissulescu > > * gcc.target/arc/interrupt-4.c: New file. > --- > gcc/config/arc/arc.c | 10 ++ > gcc/config/arc/arc.md | 7 ++- > gcc/testsuite/gcc.target/arc/interrupt-4.c | 15 +++ > 3 files changed, 27 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/arc/interrupt-4.c > > diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c > index 556b587..50bfa11 100644 > --- a/gcc/config/arc/arc.c > +++ b/gcc/config/arc/arc.c > @@ -9513,9 +9513,10 @@ arc_can_follow_jump (const rtx_insn *follower, const > rtx_insn *followee) > Return true if REGNO should be added to the deemed uses of the epilogue. > > We use the return address > - arc_return_address_regs[arc_compute_function_type (cfun)] . > - But also, we have to make sure all the register restore instructions > - are known to be live in interrupt functions. */ > + arc_return_address_regs[arc_compute_function_type (cfun)]. But > + also, we have to make sure all the register restore instructions > + are known to be live in interrupt functions, plus the blink > + register if it is clobbered by the isr. */ > > bool > arc_epilogue_uses (int regno) > @@ -9528,7 +9529,8 @@ arc_epilogue_uses (int regno) > { > if (!fixed_regs[regno]) > return true; > - return regno == arc_return_address_regs[cfun->machine->fn_type]; > + return ((regno == arc_return_address_regs[cfun->machine->fn_type]) > + || (regno == RETURN_ADDR_REGNUM)); > } >else > return regno == RETURN_ADDR_REGNUM; > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md > index a06c2ed..b912bd4 100644 > --- a/gcc/config/arc/arc.md > +++ b/gcc/config/arc/arc.md > @@ -4748,7 +4748,12 @@ >output_asm_insn (\"j%!%* [%0]%&\", ®); >return \"\"; > } > - [(set_attr "type" "return") > + [(set (attr "type") > + (cond [(and (eq (symbol_ref "arc_compute_function_type (cfun)") > + (symbol_ref "ARC_FUNCTION_ILINK1")) > + (match_test "TARGET_V2")) > +(const_string "brcc_no_delay_slot")] > + (const_string "return"))) > ; predicable won't help here since the canonical rtl looks different > ; for branches. > (set (attr "cond") > diff --git a/gcc/testsuite/gcc.target/arc/interrupt-4.c > b/gcc/testsuite/gcc.target/arc/interrupt-4.c > new file mode 100644 > index 000..ea6596e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arc/interrupt-4.c > @@ -0,0 +1,15 @@ > +#if defined (__ARCHS__) || defined (__ARCEM__) > +#define RILINK "ilink" > +#else > +#define RILINK "ilink1" > +#endif > + > +extern int gpio_int; > +extern int read_reg (int); > + > +void __attribute__ ((interrupt(RILINK))) > +isr_handler (void) > +{ > + gpio_int = read_reg (1); > +} > +/* { dg-final { scan-assembler-times "blink" 2 } } */ > -- > 1.9.1 >
Re: [PATCH 2/5] [ARC] Fix detection of long immediate for load/store operands.
* Claudiu Zissulescu [2017-03-20 12:43:27 +0100]: > ARC can use scaled offsets when loading (i.e. ld.as rA,[base, > offset]). Where base and offset can be a register or an immediate > operand. The scaling only applies on the offset part of the > instruction. The compiler can accept an address like this: > > (plus:SI (mult:SI (reg:SI 2 r2 [orig:596 _2129 ] [596]) > (const_int 4 [0x4])) >(const_int 60 [0x3c])) > > Hence, to emit this instruction we place the (const_int 60) into base > and the register into offset to take advantage of the scaled offset > facility of the load instruction. As a result the length of the load > instruction is 8 bytes. However, the long_immediate_loadstore_operand > predicate used for calculating the length attribute doesn't recognize > this address and returns a wrong decision leading to a wrong length > computation for a load instruction using the above address. > > gcc/ > 2016-09-21 Claudiu Zissulescu > > * config/arc/predicates.md (long_immediate_loadstore_operand): > Consider scaled addresses cases. Looks good thanks, Andrew > --- > gcc/config/arc/predicates.md | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/gcc/config/arc/predicates.md b/gcc/config/arc/predicates.md > index 0dec736..8dd8d55 100644 > --- a/gcc/config/arc/predicates.md > +++ b/gcc/config/arc/predicates.md > @@ -148,6 +148,11 @@ >{ > rtx x = XEXP (op, 1); > > + if ((GET_CODE (XEXP (op, 0)) == MULT) > + && REG_P (XEXP (XEXP (op, 0), 0)) > + && CONSTANT_P (x)) > + return 1; > + > if (GET_CODE (x) == CONST) > { > x = XEXP (x, 0); > -- > 1.9.1 >
Re: [PATCH 3/5] [ARC] Disable TP register when building for bare metal.
* Claudiu Zissulescu [2017-03-20 12:43:28 +0100]: > No need for thread pointer in bare metal toolchain. Use TP register normally. > > gcc/ > 2016-09-29 Claudiu Zissulescu > > * config/arc/elf.h (ARGET_ARC_TP_REGNO_DEFAULT): Define. > * config/arc/linux.h (ARGET_ARC_TP_REGNO_DEFAULT): Likewise. > * config/arc/arc.opt (mtp-regno): Use > * ARGET_ARC_TP_REGNO_DEFAULT. Looks fine. I've suggested some alternative phrasing for the comments to make them read a little easier. Thanks, Andrew > --- > gcc/config/arc/arc.opt | 2 +- > gcc/config/arc/elf.h | 5 + > gcc/config/arc/linux.h | 4 > 3 files changed, 10 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt > index 17af736..6060ded 100644 > --- a/gcc/config/arc/arc.opt > +++ b/gcc/config/arc/arc.opt > @@ -469,7 +469,7 @@ EnumValue > Enum(arc_fpu) String(fpud_all) Value(FPU_FPUD_ALL) > > mtp-regno= > -Target RejectNegative Joined UInteger Var(arc_tp_regno) Init(25) > +Target RejectNegative Joined UInteger Var(arc_tp_regno) > Init(TARGET_ARC_TP_REGNO_DEFAULT) > Specify thread pointer register number. > > mtp-regno=none > diff --git a/gcc/config/arc/elf.h b/gcc/config/arc/elf.h > index d2106c5..a6d6c05 100644 > --- a/gcc/config/arc/elf.h > +++ b/gcc/config/arc/elf.h > @@ -53,3 +53,8 @@ along with GCC; see the file COPYING3. If not see > # define MULTILIB_DEFAULTS { "mcpu=" ARC_MULTILIB_CPU_DEFAULT } > # endif > #endif > + > +/* Bare-metal toolchains are not having a thread pointer register > + set. */ /* Bare-metal toolchains do not need a thread pointer register. */ > +#undef TARGET_ARC_TP_REGNO_DEFAULT > +#define TARGET_ARC_TP_REGNO_DEFAULT -1 > diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h > index 10c291c..1d10357 100644 > --- a/gcc/config/arc/linux.h > +++ b/gcc/config/arc/linux.h > @@ -74,3 +74,7 @@ along with GCC; see the file COPYING3. If not see > /* We do not have any MULTILIB_OPTIONS specified, so there are no > MULTILIB_DEFAULTS. */ > #undef MULTILIB_DEFAULTS > + > +/* Linux toolchains are using r25 as TLS register. */ /* Linux toolchains use r25 as the thread pointer register. */ > +#undef TARGET_ARC_TP_REGNO_DEFAULT > +#define TARGET_ARC_TP_REGNO_DEFAULT 25 > -- > 1.9.1 >
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On 03/24/2017 07:04 AM, Jakub Jelinek wrote: On Fri, Mar 24, 2017 at 06:37:10AM -0600, Jeff Law wrote: 2017-03-21 Jakub Jelinek PR target/80102 * cfgcleanup.c (old_insns_match_p): Don't cross-jump in between /f and non-/f instructions. If both i1 and i2 are frame related, verify all CFA notes, their order and content. * g++.dg/opt/pr80102.C: New test. Presumably this didn't ICE at some point in the past, so it's a regression? (it's not marked as such in the BZ). It doesn't ICE for me with r238210 and ICEs with current trunk, I don't have too many ppc64le compilers around though. + /* If both i1 and i2 are frame related, verify all the CFA notes + in the same order and with the same content. */ + if (RTX_FRAME_RELATED_P (i1)) +{ + static enum reg_note cfa_note_kinds[] = { + REG_FRAME_RELATED_EXPR, REG_CFA_DEF_CFA, REG_CFA_ADJUST_CFA, + REG_CFA_OFFSET, REG_CFA_REGISTER, REG_CFA_EXPRESSION, + REG_CFA_VAL_EXPRESSION, REG_CFA_RESTORE, REG_CFA_SET_VDRAP, + REG_CFA_TOGGLE_RA_MANGLE, REG_CFA_WINDOW_SAVE, REG_CFA_FLUSH_QUEUE + }; ISTM this could get out of date very easily. Is there a clean way to generate the array of cfa notes as we build up the notes from reg-notes.def? We could e.g. #ifndef REG_CFA_NOTE # define REG_CFA_NOTE(NAME) REG_NOTE(NAME) #endif and then REG_CFA_NOTE (FRAME_RELATED_EXPR) etc. in reg-notes.def (and document that REG_CFA_NOTE should be used for notes related to CFA). Then in cfgcleanups.c we could just #undef REG_CFA_NOTE #define DEF_REG_NOTE(NAME) #define REG_CFA_NOTE(NAME) REG_##NAME, #include "reg-notes.def" #undef DEF_REG_NOTE #undef REG_CFA_NOTE to populate the cfa_note_kinds array. Something like that seems preferable -- I think we're a lot more likely to catch the need to use REG_CFA_NOTE when defining the notes in reg-notes.def than we are to remember to update an array in a different file. jeff
[New file] Add testcase to ensure that #pragma GCC diagnostic push/pop works with -Wtraditional.
The attached test case failed with gcc 4.9 and older, but started compiling successfully with only the 1 expected warning with gcc 5. Adding it to the test suite would ensure that this behavior doesn't regress. Note that I have only tested it by compiling it manually, and not by actually running it as part of the entire test suite, so please let me know if I got any of the dejagnu directives wrong. Thanks, Eric Gallager gcc/testsuite/ChangeLog: 2017-03-24 Eric Gallager * gcc.dg/pragma-diag-7.c: New test. /* { dg-do compile } */ unsigned long ok = 0UL; #pragma GCC diagnostic push #pragma GCC diagnostic warning "-Wtraditional" unsigned long bad = 1UL; /* { dg-warning "suffix" } */ /* Note the extra space before the pragma on this next line: */ #pragma GCC diagnostic pop unsigned long ok_again = 2UL; /* { dg-bogus "suffix" } */
Re: [PATCH 4/5] [ARC] Fix divdf3 emulation for arcem.
* Claudiu Zissulescu [2017-03-20 12:43:29 +0100]: > Missing case for ARCEM cpus. Add it. > > libgcc/ > 2016-09-29 Claudiu Zissulescu > > * config/arc/ieee-754/divdf3.S (__divdf3): Use __ARCEM__. Looks good, thanks, Andrew > --- > libgcc/config/arc/ieee-754/divdf3.S | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/libgcc/config/arc/ieee-754/divdf3.S > b/libgcc/config/arc/ieee-754/divdf3.S > index 4d6aae2..b8085a6 100644 > --- a/libgcc/config/arc/ieee-754/divdf3.S > +++ b/libgcc/config/arc/ieee-754/divdf3.S > @@ -189,13 +189,13 @@ __divdf3: > asl r8,DBL1H,12 > lsr r12,DBL1L,20 > lsr r4,r8,26 > -#ifdef __HS__ > +#if defined (__ARCHS__) || defined (__ARCEM__) > add3 r10,pcl,60 ; (.Ldivtab-.) >> 3 > #else > add3 r10,pcl,59 ; (.Ldivtab-.) >> 3 > #endif > ld.as r4,[r10,r4] > -#ifdef __HS__ > +#if defined (__ARCHS__) || defined (__ARCEM__) > ld.as r9,[pcl,182]; [pcl,(-((.-.L7ff0) >> 2))] ; 0x7ff0 > #else > ld.as r9,[pcl,180]; [pcl,(-((.-.L7ff0) >> 2))] ; 0x7ff0 > @@ -299,14 +299,14 @@ __divdf3: > rsub r7,r6,5 > asr r10,r12,28 > bmsk r4,r12,27 > -#ifdef __HS__ > +#if defined (__ARCHS__) || defined (__ARCEM__) > min r7, r7, 31 > asr DBL0L, r4, r7 > #else > asrs DBL0L,r4,r7 > #endif > add DBL1H,r11,r10 > -#ifdef __HS__ > +#if defined (__ARCHS__) || defined (__ARCEM__) > abs.f r10, r4 > sub.mi r10, r10, 1 > #endif > -- > 1.9.1 >
Re: [PATCH 5/5] [ARC] Fix move_double_src_operand predicate.
* Claudiu Zissulescu [2017-03-20 12:43:30 +0100]: > Durring compilation process, (subreg (mem ...) ...) can occur. Hence, > we need to check if the address of mem is a valid one. This patch is > fixing this check by directly calling the address_operand, instead of > calling move_double_src_operand, as the latter is always checking > against the original mode, thus, returning false when the inner and > outer modes are different. > > gcc/ > 2016-10-07 Claudiu Zissulescu > > * config/arc/predicates.md (move_double_src_operand): Replace the > call to move_double_src_operand with a call to > address_operand. Sounds good, thanks, Andrew > --- > gcc/config/arc/predicates.md | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/arc/predicates.md b/gcc/config/arc/predicates.md > index 8dd8d55..9e60cb7 100644 > --- a/gcc/config/arc/predicates.md > +++ b/gcc/config/arc/predicates.md > @@ -318,7 +318,7 @@ >/* (subreg (mem ...) ...) can occur here if the inner part was once a >pseudo-reg and is now a stack slot. */ >if (GET_CODE (SUBREG_REG (op)) == MEM) > - return move_double_src_operand (SUBREG_REG (op), mode); > + return address_operand (XEXP (SUBREG_REG (op), 0), mode); >else > return register_operand (op, mode); > case MEM : > -- > 1.9.1 >
patch to fix PR80148
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80148 The test file is too big so there is no a testcase. The patch was successfully bootstrapped and tested on x86-64. Committed as rev. 246467. Index: ChangeLog === --- ChangeLog (revision 246466) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2017-03-24 Vladimir Makarov + + PR target/80148 + * lra-assigns.c (assign_by_spills): Add spilled non-reload pseudos + to consider in curr_insn_transform. + 2017-03-24 Jakub Jelinek * genrecog.c (validate_pattern): Add VEC_SELECT validation. Index: lra-assigns.c === --- lra-assigns.c (revision 246466) +++ lra-assigns.c (working copy) @@ -1507,6 +1507,14 @@ assign_by_spills (void) sorted_pseudos[nfails++] = conflict_regno; former_reload_pseudo_spill_p = true; } + else + /* It is better to do reloads before spilling as after the + spill-subpass we will reload memory instead of pseudos + and this will make reusing reload pseudos more + complicated. Going directly to the spill pass in such + case might result in worse code performance or even LRA + cycling if we have few registers. */ + bitmap_set_bit (&all_spilled_pseudos, conflict_regno); if (lra_dump_file != NULL) fprintf (lra_dump_file, " Spill %s r%d(hr=%d, freq=%d)\n", pseudo_prefix_title (conflict_regno), conflict_regno,
[C/C++ PATCH] Fix ICEs with vector subscripting of register variables (PR middle-end/80162)
Hi! c*_mark_addressable doesn't look through VIEW_CONVERT_EXPRs that vector subscripts are turned into, which means we don't diagnose taking address of e.g. a vector element in a hard register. On the other side, I think we want to support just normal vector subscripting of vectors in hard registers, that can be expanded into e.g. vector shift etc. So, this patch handles specially c*_mark_addressable calls from array_ref building, otherwise diagnoses taking addresses of vector elements in hard registers. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-03-24 Jakub Jelinek PR middle-end/80162 c-common/ * c-common.c (c_common_mark_addressable_vec): Don't set TREE_ADDRESSABLE on DECL_HARD_REGISTER. c/ * c-tree.h (c_mark_addressable): Add array_ref_p argument. * c-typeck.c (c_mark_addressable): Likewise. Look through VIEW_CONVERT_EXPR unless array_ref_p and VCE is from VECTOR_TYPE to ARRAY_TYPE. (build_array_ref): Pass true as array_ref_p to c_mark_addressable. cp/ * cp-tree.h (cxx_mark_addressable): Add array_ref_p argument. * typeck.c (cxx_mark_addressable): Likewise. Look through VIEW_CONVERT_EXPR unless array_ref_p and VCE is from VECTOR_TYPE to ARRAY_TYPE. (cp_build_array_ref): Pass true as array_ref_p to cxx_mark_addressable. testsuite/ * c-c++-common/pr80162-1.c: New test. * c-c++-common/pr80162-2.c: New test. * c-c++-common/pr80162-3.c: New test. --- gcc/c-family/c-common.c.jj 2017-03-22 19:31:40.0 +0100 +++ gcc/c-family/c-common.c 2017-03-24 10:20:13.340088654 +0100 @@ -6542,7 +6542,8 @@ c_common_mark_addressable_vec (tree t) && TREE_CODE (t) != PARM_DECL && TREE_CODE (t) != COMPOUND_LITERAL_EXPR) return; - TREE_ADDRESSABLE (t) = 1; + if (!VAR_P (t) || !DECL_HARD_REGISTER (t)) +TREE_ADDRESSABLE (t) = 1; } --- gcc/c/c-tree.h.jj 2017-03-23 15:49:55.0 +0100 +++ gcc/c/c-tree.h 2017-03-24 11:12:19.877403779 +0100 @@ -615,7 +615,7 @@ extern int same_translation_unit_p (cons extern int comptypes (tree, tree); extern int comptypes_check_different_types (tree, tree, bool *); extern bool c_vla_type_p (const_tree); -extern bool c_mark_addressable (tree); +extern bool c_mark_addressable (tree, bool = false); extern void c_incomplete_type_error (location_t, const_tree, const_tree); extern tree c_type_promotes_to (tree); extern struct c_expr default_function_array_conversion (location_t, --- gcc/c/c-typeck.c.jj 2017-03-21 07:55:51.0 +0100 +++ gcc/c/c-typeck.c2017-03-24 11:16:16.725391687 +0100 @@ -2654,7 +2654,7 @@ build_array_ref (location_t loc, tree ar || (COMPLETE_TYPE_P (TREE_TYPE (TREE_TYPE (array))) && TREE_CODE (TYPE_SIZE (TREE_TYPE (TREE_TYPE (array != INTEGER_CST)) { - if (!c_mark_addressable (array)) + if (!c_mark_addressable (array, true)) return error_mark_node; } /* An array that is indexed by a constant value which is not within @@ -4755,16 +4755,26 @@ lvalue_or_else (location_t loc, const_tr /* Mark EXP saying that we need to be able to take the address of it; it should not be allocated in a register. - Returns true if successful. */ + Returns true if successful. ARRAY_REF_P is true if this + is for ARRAY_REF construction - in that case we don't want + to look through VIEW_CONVERT_EXPR from VECTOR_TYPE to ARRAY_TYPE, + it is fine to use ARRAY_REFs for vector subscripts on vector + register variables. */ bool -c_mark_addressable (tree exp) +c_mark_addressable (tree exp, bool array_ref_p) { tree x = exp; while (1) switch (TREE_CODE (x)) { + case VIEW_CONVERT_EXPR: + if (array_ref_p + && TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE + && TREE_CODE (TREE_TYPE (TREE_OPERAND (x, 0))) == VECTOR_TYPE) + return true; + /* FALLTHRU */ case COMPONENT_REF: case ADDR_EXPR: case ARRAY_REF: --- gcc/cp/cp-tree.h.jj 2017-03-23 15:49:56.0 +0100 +++ gcc/cp/cp-tree.h2017-03-24 11:23:23.828878855 +0100 @@ -6715,7 +6715,7 @@ extern void cxx_print_error_function (d struct diagnostic_info *); /* in typeck.c */ -extern bool cxx_mark_addressable (tree); +extern bool cxx_mark_addressable (tree, bool = false); extern int string_conv_p (const_tree, const_tree, int); extern tree cp_truthvalue_conversion (tree); extern tree condition_conversion (tree); --- gcc/cp/typeck.c.jj 2017-03-19 11:57:14.0 +0100 +++ gcc/cp/typeck.c 2017-03-24 11:25:51.997956658 +0100 @@ -3217,7 +3217,7 @@ cp_build_array_ref (location_t loc, tree && (TREE_CODE (TYPE_SIZE (TREE_TYPE (TREE_TYPE (array != INTEGER_CST))) { - i
[PATCH] Fix expansion of initializer extensions (PR c/80163)
Hi! As can be seen on the following testcase, when expanding an extension in EXPAND_INITIALIZER context, we emit wrong extension operation (one depending on the signedness of the result type rather than on the signedness of the argument type, so e.g. extension of unsigned int to long long int is done using SIGN_EXTEND instead of ZERO_EXTEND, and extension of int to unsigned long long int using ZERO_EXTEND instead of SIGN_EXTEND. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-03-24 Jakub Jelinek PR c/80163 * expr.c : For EXPAND_INITIALIZER determine SIGN_EXTEND vs. ZERO_EXTEND based on signedness of treeop0's type rather than signedness of the result type. * gcc.dg/torture/pr80163.c: New test. --- gcc/expr.c.jj 2017-03-07 09:04:04.0 +0100 +++ gcc/expr.c 2017-03-24 12:09:57.729854079 +0100 @@ -8333,7 +8333,8 @@ expand_expr_real_2 (sepops ops, rtx targ } else if (modifier == EXPAND_INITIALIZER) - op0 = gen_rtx_fmt_e (unsignedp ? ZERO_EXTEND : SIGN_EXTEND, mode, op0); + op0 = gen_rtx_fmt_e (TYPE_UNSIGNED (TREE_TYPE (treeop0)) +? ZERO_EXTEND : SIGN_EXTEND, mode, op0); else if (target == 0) op0 = convert_to_mode (mode, op0, --- gcc/testsuite/gcc.dg/torture/pr80163.c.jj 2017-03-24 12:19:21.363603066 +0100 +++ gcc/testsuite/gcc.dg/torture/pr80163.c 2017-03-24 12:19:09.0 +0100 @@ -0,0 +1,35 @@ +/* PR c/80163 */ +/* { dg-do compile { target int128 } } */ + +volatile int v; + +__attribute__((noinline, noclone)) void +bar (void) +{ + v++; + asm volatile ("" : : : "memory"); +} + +__attribute__((noinline, noclone)) __int128_t * +foo (unsigned long **p) +{ +a: + bar (); +b: + bar (); + static __int128_t d = (unsigned long) &&a - (unsigned long) &&b; + static unsigned long e = (unsigned long) &&a - (unsigned long) &&b; + *p = &e; + return &d; +} + +int +main () +{ + __int128_t *p; + unsigned long *q; + p = foo (&q); + if (*p != *q) +__builtin_abort (); + return 0; +} Jakub
[patch] have "would be stringified in traditional C" warning in libcpp/macro.c be controlled by -Wtraditional
It seemed odd to me that gcc was issuing a warning about compatibility with traditional C that I couldn't turn off by pushing/popping -Wtraditional over the problem area, so I made the attached (minor) patch to fix it. Survives bootstrap, but the only testing I've done with it has been compiling the one file that was giving me issues previously, which I'd need to reduce further to turn it into a proper test case. Thanks, Eric Gallager libcpp/ChangeLog: 2017-03-24 Eric Gallager * macro.c (check_trad_stringification): Have warning be controlled by -Wtraditional. libcpp/macro.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libcpp/macro.c b/libcpp/macro.c index de18c2210cf..5c0de2e4bab 100644 --- a/libcpp/macro.c +++ b/libcpp/macro.c @@ -3316,7 +3316,7 @@ check_trad_stringification (cpp_reader *pfile, const cpp_macro *macro, if (NODE_LEN (node) == len && !memcmp (p, NODE_NAME (node), len)) { - cpp_error (pfile, CPP_DL_WARNING, + cpp_warning (pfile, CPP_W_TRADITIONAL, "macro argument \"%s\" would be stringified in traditional C", NODE_NAME (node)); break;
Re: [New file] Add testcase to ensure that #pragma GCC diagnostic push/pop works with -Wtraditional.
On Fri, 2017-03-24 at 14:10 -0400, Eric Gallager wrote: > The attached test case failed with gcc 4.9 and older, but started > compiling successfully with only the 1 expected warning with gcc 5. > Adding it to the test suite would ensure that this behavior doesn't > regress. Thanks for posting this. What's the significance of the leading space in the: #pragma GCC diagnostic pop line? Is *that* the bug? (did we have a bug # for this, I wonder?) > Note that I have only tested it by compiling it manually, and > not by actually running it as part of the entire test suite, so > please > let me know if I got any of the dejagnu directives wrong. When I started contributing to gcc, it took me a while to figure out how to run just one case in the testsuite, so in case it's helpful I'll post the recipe here: 1) Find the pertinent Tcl script that runs the test: a .exp script in the same directory, or one of the ancestors directories. For this case it's gcc.dg/dg.exp. The significant part is the filename: dg.exp 2) Figure out the appropriate "make" target, normally based on the source language for the test. For this case it's "check-gcc" 3) Run make in your BUILDDIR/gcc, passing in a suitable value for RUNTESTFLAGS based on the filename found in step 1 above. For this case, giving it a couple of "-v" flags for verbosity (so that we can see the command-line of the compiler invocation) it would be: $ make -jN && make check-gcc RUNTESTFLAGS="-v -v dg.exp=pragma-diag -7.c" (for some N; I like the "make && make check-FOO" construction to ensure that the compiler is rebuilt before running the tests). ...which leads to a summary of: # of expected passes3 which looks good. You can also use wildcards e.g.: make -j64 && make check-gcc RUNTESTFLAGS="-v -v dg.exp=pragma-diag-*.c" (and can use -jN on the "make check-FOO" invocation if there are a lot of tests; I tend not to use it for a small number of tests, to avoid interleaving of output in the logs). Thanks, > Eric Gallager > > gcc/testsuite/ChangeLog: > > 2017-03-24 Eric Gallager > > * gcc.dg/pragma-diag-7.c: New test. I tested your new test case via the above approach and it looks good to me. Although we're meant to only be accepting regression fixes and documentation fixes right now (stage 4 of gcc 7 development) I feel that extra test coverage like this also ought to be acceptable. I don't know if the test case is sufficiently small to be exempt from the FSF's paperwork requirements here: https://gcc.gnu.org/contribute.html (do you have that paperwork in place?) Thanks Dave >
[PATCH] Reject > word sign extensions in initializers (PR middle-end/80163)
Hi! I'm not aware of any target that would support sign extension of something that can't be folded in the compiler into some type larger than word/pointer. Zero extension is doable and assemble_variable is able to deal with it by emitting low/high subregs of it (where the low one will contain some expression that needs to be computed by assembler or even have relocation and upper part will be all zeros), but for sign extension we'd need assembler support that would for some asm expression shift it arithmetically right. So, this patch just rejects it in initializers (in C++ handles through dynamic initialization) in that case. clang also rejects it, ICC silently miscompiles (performs zero extension). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-03-24 Jakub Jelinek PR middle-end/80163 * varasm.c (initializer_constant_valid_p_1): Disallow sign-extending conversions to integer types wider than word and pointer. * gcc.dg/pr80163.c: New test. --- gcc/varasm.c.jj 2017-01-01 12:45:37.0 +0100 +++ gcc/varasm.c2017-03-24 13:03:37.489529361 +0100 @@ -4472,8 +4472,15 @@ initializer_constant_valid_p_1 (tree val return initializer_constant_valid_p_1 (src, endtype, cache); /* Allow conversions between other integer types only if - explicit value. */ - if (INTEGRAL_TYPE_P (dest_type) && INTEGRAL_TYPE_P (src_type)) + explicit value. Don't allow sign-extension to a type larger + than word and pointer, there aren't relocations that would + allow to sign extend it to a wider type. */ + if (INTEGRAL_TYPE_P (dest_type) + && INTEGRAL_TYPE_P (src_type) + && (TYPE_UNSIGNED (src_type) + || TYPE_PRECISION (dest_type) <= TYPE_PRECISION (src_type) + || TYPE_PRECISION (dest_type) <= BITS_PER_WORD + || TYPE_PRECISION (dest_type) <= POINTER_SIZE)) { tree inner = initializer_constant_valid_p_1 (src, endtype, cache); if (inner == null_pointer_node) --- gcc/testsuite/gcc.dg/pr80163.c.jj 2017-03-24 13:02:37.827295587 +0100 +++ gcc/testsuite/gcc.dg/pr80163.c 2017-03-24 13:05:12.465309614 +0100 @@ -0,0 +1,22 @@ +/* PR middle-end/80163 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O0" } */ + +void bar (void); + +__int128_t * +foo (void) +{ +a: + bar (); +b:; + static __int128_t d = (long) &&a - (long) &&b; /* { dg-error "initializer element is not computable at load time" } */ + return &d; +} + +__int128_t * +baz (void) +{ + static __int128_t d = (long) (3 * 4); + return &d; +} Jakub
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On Fri, Mar 24, 2017 at 11:47:35AM -0600, Jeff Law wrote: > On 03/24/2017 07:35 AM, Jakub Jelinek wrote: > > On Fri, Mar 24, 2017 at 02:04:42PM +0100, Jakub Jelinek wrote: > > > On Fri, Mar 24, 2017 at 06:37:10AM -0600, Jeff Law wrote: > > > > > 2017-03-21 Jakub Jelinek > > > > > > > > > > PR target/80102 > > > > > * cfgcleanup.c (old_insns_match_p): Don't cross-jump in between > > > > > /f > > > > > and non-/f instructions. If both i1 and i2 are frame related, > > > > > verify all CFA notes, their order and content. > > > > > > > > > > * g++.dg/opt/pr80102.C: New test. > > > > Presumably this didn't ICE at some point in the past, so it's a > > > > regression? > > > > (it's not marked as such in the BZ). > > > > > > It doesn't ICE for me with r238210 and ICEs with current trunk, I don't > > > have > > > too many ppc64le compilers around though. > > > > GCC 4.8 doesn't ICE either. > ISTM this isn't a regression (without doing deeper analysis). As a release > manager, ISTM you can grant an exception if you want to push this forward > for gcc-7. It is a [7 Regression], started with r239866, so it is a P1. Jakub
[PATCH] Fix asan/ubsan bitfield handling in VL structures (PR sanitizer/80168)
Hi! We ICE on the following testcase, because we attempt to use DECL_BIT_FIELD_REPRESENTATIVE instead of original FIELD_DECL in a COMPONENT_REF in a VL structure, but DECL_BIT_FIELD_REPRESENTATIVE's DECL_FIELD_OFFSET is not really gimplified and even if it was, it wouldn't be current. From the expr.c and stor-layout.c comments, seems DECL_BIT_FIELD_REPRESENTATIVE's DECL_FIELD_OFFSET is guaranteed to be the same as the corresponding field's by construction if it is not constant, all the differences if any are in DECL_FIELD_BIT_OFFSET. Therefore, it should be safe to reuse 3rd COMPONENT_REF operand. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-03-24 Jakub Jelinek PR sanitizer/80168 * asan.c (instrument_derefs): Copy over last operand from original COMPONENT_REF to the new COMPONENT_REF with DECL_BIT_FIELD_REPRESENTATIVE. * ubsan.c (instrument_object_size): Likewise. * gcc.dg/asan/pr80168.c: New test. --- gcc/asan.c.jj 2017-03-21 07:57:00.0 +0100 +++ gcc/asan.c 2017-03-24 17:02:35.451865004 +0100 @@ -1868,7 +1868,8 @@ instrument_derefs (gimple_stmt_iterator tree repr = DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1)); instrument_derefs (iter, build3 (COMPONENT_REF, TREE_TYPE (repr), TREE_OPERAND (t, 0), repr, - NULL_TREE), location, is_store); + TREE_OPERAND (t, 2)), +location, is_store); return; } --- gcc/ubsan.c.jj 2017-03-07 07:10:00.0 +0100 +++ gcc/ubsan.c 2017-03-24 17:02:58.439568314 +0100 @@ -1772,7 +1772,7 @@ instrument_object_size (gimple_stmt_iter { tree repr = DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1)); t = build3 (COMPONENT_REF, TREE_TYPE (repr), TREE_OPERAND (t, 0), - repr, NULL_TREE); + repr, TREE_OPERAND (t, 2)); } break; case ARRAY_REF: --- gcc/testsuite/gcc.dg/asan/pr80168.c.jj 2017-03-24 17:08:14.440489868 +0100 +++ gcc/testsuite/gcc.dg/asan/pr80168.c 2017-03-24 17:09:08.567791277 +0100 @@ -0,0 +1,12 @@ +/* PR sanitizer/80168 */ +/* { dg-do compile } */ + +int a; + +int +foo (void) +{ + struct S { int c[a]; int q : 8; int e : 4; } f; + f.e = 4; + return f.e; +} Jakub
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On Fri, Mar 24, 2017 at 11:56:05AM -0600, Jeff Law wrote: > > We could e.g. > > #ifndef REG_CFA_NOTE > > # define REG_CFA_NOTE(NAME) REG_NOTE(NAME) > > #endif > > and then > > REG_CFA_NOTE (FRAME_RELATED_EXPR) > > etc. in reg-notes.def (and document that REG_CFA_NOTE should be used for > > notes related to CFA). > > Then in cfgcleanups.c we could just > > #undef REG_CFA_NOTE > > #define DEF_REG_NOTE(NAME) > > #define REG_CFA_NOTE(NAME) REG_##NAME, > > #include "reg-notes.def" > > #undef DEF_REG_NOTE > > #undef REG_CFA_NOTE > > to populate the cfa_note_kinds array. > Something like that seems preferable -- I think we're a lot more likely to > catch the need to use REG_CFA_NOTE when defining the notes in reg-notes.def > than we are to remember to update an array in a different file. So like this (if it passes bootstrap/regtest on x86_64, i686 and powerpc64le)? 2017-03-24 Jakub Jelinek PR target/80102 * reg-notes.def (REG_CFA_NOTE): Define. Use it for CFA related notes. * cfgcleanup.c (old_insns_match_p): Don't cross-jump in between /f and non-/f instructions. If both i1 and i2 are frame related, verify all CFA notes, their order and content. * g++.dg/opt/pr80102.C: New test. --- gcc/reg-notes.def.jj2017-01-21 02:26:33.0 +0100 +++ gcc/reg-notes.def 2017-03-24 19:14:23.413483807 +0100 @@ -20,10 +20,16 @@ along with GCC; see the file COPYING3. /* This file defines all the codes that may appear on individual EXPR_LIST, INSN_LIST and INT_LIST rtxes in the REG_NOTES chain of an insn. The codes are stored in the mode field of the rtx. Source files - define DEF_REG_NOTE appropriately before including this file. */ + define DEF_REG_NOTE appropriately before including this file. + + CFA related notes meant for RTX_FRAME_RELATED_P instructions + should be declared with REG_CFA_NOTE macro instead of REG_NOTE. */ /* Shorthand. */ #define REG_NOTE(NAME) DEF_REG_NOTE (REG_##NAME) +#ifndef REG_CFA_NOTE +# define REG_CFA_NOTE(NAME) REG_NOTE (NAME) +#endif /* REG_DEP_TRUE is used in scheduler dependencies lists to represent a read-after-write dependency (i.e. a true data dependency). This is @@ -112,7 +118,7 @@ REG_NOTE (BR_PRED) /* Attached to insns that are RTX_FRAME_RELATED_P, but are too complex for DWARF to interpret what they imply. The attached rtx is used instead of intuition. */ -REG_NOTE (FRAME_RELATED_EXPR) +REG_CFA_NOTE (FRAME_RELATED_EXPR) /* Attached to insns that are RTX_FRAME_RELATED_P, but are too complex for FRAME_RELATED_EXPR intuition. The insn's first pattern must be @@ -122,7 +128,7 @@ REG_NOTE (FRAME_RELATED_EXPR) with a base register and a constant offset. In the most complicated cases, this will result in a DW_CFA_def_cfa_expression with the rtx expression rendered in a dwarf location expression. */ -REG_NOTE (CFA_DEF_CFA) +REG_CFA_NOTE (CFA_DEF_CFA) /* Attached to insns that are RTX_FRAME_RELATED_P, but are too complex for FRAME_RELATED_EXPR intuition. This note adjusts the expression @@ -130,57 +136,57 @@ REG_NOTE (CFA_DEF_CFA) expression, relative to the old CFA expression. This rtx must be of the form (SET new-cfa-reg (PLUS old-cfa-reg const_int)). If the note rtx is NULL, we use the first SET of the insn. */ -REG_NOTE (CFA_ADJUST_CFA) +REG_CFA_NOTE (CFA_ADJUST_CFA) /* Similar to FRAME_RELATED_EXPR, with the additional information that this is a save to memory, i.e. will result in DW_CFA_offset or the like. The pattern or the insn should be a simple store relative to the CFA. */ -REG_NOTE (CFA_OFFSET) +REG_CFA_NOTE (CFA_OFFSET) /* Similar to FRAME_RELATED_EXPR, with the additional information that this is a save to a register, i.e. will result in DW_CFA_register. The insn or the pattern should be simple reg-reg move. */ -REG_NOTE (CFA_REGISTER) +REG_CFA_NOTE (CFA_REGISTER) /* Attached to insns that are RTX_FRAME_RELATED_P, but are too complex for FRAME_RELATED_EXPR intuition. This is a save to memory, i.e. will result in a DW_CFA_expression. The pattern or the insn should be a store of a register to an arbitrary (non-validated) memory address. */ -REG_NOTE (CFA_EXPRESSION) +REG_CFA_NOTE (CFA_EXPRESSION) /* Attached to insns that are RTX_FRAME_RELATED_P, but are too complex for FRAME_RELATED_EXPR intuition. The DWARF expression computes the value of the given register. */ -REG_NOTE (CFA_VAL_EXPRESSION) +REG_CFA_NOTE (CFA_VAL_EXPRESSION) /* Attached to insns that are RTX_FRAME_RELATED_P, with the information that this is a restore operation, i.e. will result in DW_CFA_restore or the like. Either the attached rtx, or the destination of the insn's first pattern is the register to be restored. */ -REG_NOTE (CFA_RESTORE) +REG_CFA_NOTE (CFA_RESTORE) /* Attached to insns that are RTX_FRAME_RELATED_P, marks insn that sets vDRAP from DRAP. If vDRAP is
Re: [New file] Add testcase to ensure that #pragma GCC diagnostic push/pop works with -Wtraditional.
On 3/24/17, David Malcolm wrote: > On Fri, 2017-03-24 at 14:10 -0400, Eric Gallager wrote: >> The attached test case failed with gcc 4.9 and older, but started >> compiling successfully with only the 1 expected warning with gcc 5. >> Adding it to the test suite would ensure that this behavior doesn't >> regress. > > Thanks for posting this. > > What's the significance of the leading space in the: > #pragma GCC diagnostic pop > line? Is *that* the bug? (did we have a bug # for this, I wonder?) > It prints a warning without it, which would be entirely correct of it to do: /Users/ericgallager/gcc-git/gcc/testsuite/gcc.dg/pragma-diag-7.c:8:2: warning: suggest hiding #pragma from traditional C with an indented # [-Wtraditional] #pragma GCC diagnostic pop ^ I only wanted the test case to be testing for the warnings about suffixes; another warning about the pragma would just be noise, albeit correct noise. > >> Note that I have only tested it by compiling it manually, and >> not by actually running it as part of the entire test suite, so >> please >> let me know if I got any of the dejagnu directives wrong. > > When I started contributing to gcc, it took me a while to figure out > how to run just one case in the testsuite, so in case it's helpful I'll > post the recipe here: > > 1) Find the pertinent Tcl script that runs the test: a .exp script in > the same directory, or one of the ancestors directories. For this case > it's gcc.dg/dg.exp. The significant part is the filename: dg.exp > > 2) Figure out the appropriate "make" target, normally based on the > source language for the test. For this case it's "check-gcc" > > 3) Run make in your BUILDDIR/gcc, passing in a suitable value for > RUNTESTFLAGS based on the filename found in step 1 above. > For this case, giving it a couple of "-v" flags for verbosity (so that > we can see the command-line of the compiler invocation) it would be: > > $ make -jN && make check-gcc RUNTESTFLAGS="-v -v dg.exp=pragma-diag > -7.c" > > (for some N; I like the "make && make check-FOO" construction to ensure > that the compiler is rebuilt before running the tests). > > ...which leads to a summary of: > > # of expected passes 3 > > which looks good. Okay, I tried this, and I also got: # of expected passes3 too, so that's good. > > You can also use wildcards e.g.: > > make -j64 && make check-gcc RUNTESTFLAGS="-v -v dg.exp=pragma-diag-*.c" > > (and can use -jN on the "make check-FOO" invocation if there are a lot of > tests; I tend not to use it for a small number of tests, to avoid > interleaving of output in the logs). > > Thanks, >> Eric Gallager >> >> gcc/testsuite/ChangeLog: >> >> 2017-03-24 Eric Gallager >> >> * gcc.dg/pragma-diag-7.c: New test. > > I tested your new test case via the above approach and it looks good to > me. > > Although we're meant to only be accepting regression fixes and > documentation fixes right now (stage 4 of gcc 7 development) I feel > that extra test coverage like this also ought to be acceptable. It's okay to save it for next stage 1, I'm already submitting it later than I intended to, so extra waiting won't hurt. > > I don't know if the test case is sufficiently small to be exempt from > the FSF's paperwork requirements here: > https://gcc.gnu.org/contribute.html > (do you have that paperwork in place?) > > Thanks > Dave Yes, I dropped off my copyright assignment at the FSF in December, but I don't have commit access yet though. Thanks, Eric
Re: [PATCH] fwprop: Prevent infinite looping (PR79405)
On Fri, Mar 24, 2017 at 12:54:35PM +0100, Richard Biener wrote: > > I have implemented the "retry things" as well fwiw, but a) it is too > > big and invasive for stage 4, and b) it kind of sucks, needs more > > work, even more invasive. The workaround is cheap and solves the > > immediate problem. > > Agreed, still iterating over the DF uses in the first place looks like > the bug (given this "all uses" data structure changes during propagation!). > > I'd have done > > for (BBs in RPO order) > for (insn in BB) > repeat: > for (use in insn) > if (propagate_into (use)) >goto repeat; That is more or less how the def-use links for fwprop are built now, but not the order that propagations are tried in. I think originally it used all single definitions, not just those dominating their uses, in which case something like your proposed algorithm does not work (there can be def-use loops, even irreducible loops). I agree it would be good to try something like you suggest in next stage 1. Segher
Re: [Committed] S/390: PR79904: Disallow reg + sym_ref literal pool addresses.
On Fri, Mar 24, 2017 at 03:05:21PM +0100, Andreas Krebbel wrote: > 2017-03-24 Andreas Krebbel > > PR target/79904 > * config/s390/s390.c (s390_decompose_address): Reject reg + > sym_ref literal pool references. > > gcc/testsuite/ChangeLog: > > 2017-03-24 Andreas Krebbel > > * gcc.dg/ubsan/pr79904-2.c: New test. Similarly to the other tests, we need -Wno-psabi, otherwise it fails on i686-linux. Fixed thusly, committed as obvious: 2017-03-24 Jakub Jelinek PR sanitizer/79904 * gcc.dg/ubsan/pr79904-2.c: Add -Wno-psabi to dg-options. --- gcc/testsuite/gcc.dg/ubsan/pr79904-2.c.jj 2017-03-24 15:08:52.0 +0100 +++ gcc/testsuite/gcc.dg/ubsan/pr79904-2.c 2017-03-24 19:43:52.283599715 +0100 @@ -1,6 +1,6 @@ /* PR sanitizer/79904 */ /* { dg-do compile } */ -/* { dg-options "-fsanitize=signed-integer-overflow" } */ +/* { dg-options "-fsanitize=signed-integer-overflow -Wno-psabi" } */ typedef signed char V __attribute__((vector_size (8))); Jakub
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On Fri, Mar 24, 2017 at 08:36:16PM +0100, Jakub Jelinek wrote: > On Fri, Mar 24, 2017 at 11:56:05AM -0600, Jeff Law wrote: > > > We could e.g. > > > #ifndef REG_CFA_NOTE > > > # define REG_CFA_NOTE(NAME) REG_NOTE(NAME) > > > #endif > > > and then > > > REG_CFA_NOTE (FRAME_RELATED_EXPR) > > > etc. in reg-notes.def (and document that REG_CFA_NOTE should be used for > > > notes related to CFA). > > > Then in cfgcleanups.c we could just > > > #undef REG_CFA_NOTE > > > #define DEF_REG_NOTE(NAME) > > > #define REG_CFA_NOTE(NAME) REG_##NAME, > > > #include "reg-notes.def" > > > #undef DEF_REG_NOTE > > > #undef REG_CFA_NOTE > > > to populate the cfa_note_kinds array. > > Something like that seems preferable -- I think we're a lot more likely to > > catch the need to use REG_CFA_NOTE when defining the notes in reg-notes.def > > than we are to remember to update an array in a different file. > > So like this (if it passes bootstrap/regtest on x86_64, i686 and > powerpc64le)? > > 2017-03-24 Jakub Jelinek > > PR target/80102 > * reg-notes.def (REG_CFA_NOTE): Define. Use it for CFA related > notes. > * cfgcleanup.c (old_insns_match_p): Don't cross-jump in between /f > and non-/f instructions. If both i1 and i2 are frame related, > verify all CFA notes, their order and content. > > * g++.dg/opt/pr80102.C: New test. Successfully bootstrapped/regtested on {x86_64,i686,powerpc64le}-linux, ok for trunk? Jakub
[PATCH,rs6000] PR80103: Fix ICE with cross compiler
PR 80103 provides a test case which results in an internal compiler error when invoked with -mno-direct-move -mpower9-dform- vector target options. The internal compiler error results because these two target options are incompatible with each other. The enclosed patch simply disables this particular combination of target options, terminating gcc with an error message instead of producing an internal compiler error. Additionally, this patch includes new comments to address omissions from a patch committed on 2017/03/23 which deals with conflicts between the -mno-power9-vector and -mcpu=power9 target options. This patch has been bootstrapped and tested with no regressions on both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2017-03-24 Kelvin Nilsen PR target/80103 * gcc.target/powerpc/pr80103-1.c: New test. gcc/ChangeLog: 2017-03-24 Kelvin Nilsen PR target/80103 * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Edit and add comments. * config/rs6000/rs6000.c (rs6000_option_override_internal): Add special handling for target option conflicts between dform options (-mpower9-dform, -mpower9-dform-vector, -mpower9-dform-scalar) and -mno-direct-move. Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 246406) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -429,6 +429,12 @@ rs6000_target_modify_macros (bool define_p, HOST_W if ((flags & OPTION_MASK_POPCNTD) != 0) rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR7"); /* Note that the OPTION_MASK_DIRECT_MOVE flag is automatically + turned on in the following condition: + 1. TARGET_P9_DFORM_SCALAR or TARGET_P9_DFORM_VECTOR are enabled +and OPTION_MASK_DIRECT_MOVE is not explicitly disabled. +Hereafter, the OPTION_MASK_DIRECT_MOVE flag is considered to +have been turned on explicitly. + Note that the OPTION_MASK_DIRECT_MOVE flag is automatically turned off in any of the following conditions: 1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly disabled and OPTION_MASK_DIRECT_MOVE was not explicitly @@ -473,8 +479,13 @@ rs6000_target_modify_macros (bool define_p, HOST_W if (!flag_iso) rs6000_define_or_undefine_macro (define_p, "__APPLE_ALTIVEC__"); } - /* Note that the OPTION_MASK_VSX flag is automatically turned off in + /* Note that the OPTION_MASK_VSX flag is automatically turned on in the following conditions: + 1. TARGET_P8_VECTOR is explicitly turned on and the OPTION_MASK_VSX +was not explicitly turned off. Hereafter, the OPTION_MASK_VSX +flag is considered to have been explicitly turned on. + Note that the OPTION_MASK_VSX flag is automatically turned off in + the following conditions: 1. The operating system does not support saving of AltiVec registers (OS_MISSING_ALTIVEC). 2. If any of the options TARGET_HARD_FLOAT, TARGET_FPRS, @@ -507,6 +518,12 @@ rs6000_target_modify_macros (bool define_p, HOST_W rs6000_define_or_undefine_macro (define_p, "__TM_FENCE__"); } /* Note that the OPTION_MASK_P8_VECTOR flag is automatically turned + on in the following conditions: + 1. TARGET_P9_VECTOR is explicitly turned on and +OPTION_MASK_P8_VECTOR is not explicitly turned off. +Hereafter, the OPTION_MASK_P8_VECTOR flag is considered to +have been turned off explicitly. + Note that the OPTION_MASK_P8_VECTOR flag is automatically turned off in the following conditions: 1. If any of TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX were turned off explicitly and OPTION_MASK_P8_VECTOR flag was @@ -514,15 +531,24 @@ rs6000_target_modify_macros (bool define_p, HOST_W 2. If TARGET_ALTIVEC is turned off. Hereafter, the OPTION_MASK_P8_VECTOR flag is considered to have been turned off explicitly. - 3. If TARGET_VSX is turned off. Hereafter, the OPTION_MASK_P8_VECTOR - flag is considered to have been turned off explicitly. */ + 3. If TARGET_VSX is turned off and OPTION_MASK_P8_VECTOR was not +explicitly enabled. If TARGET_VSX is explicitly enabled, the +OPTION_MASK_P8_VECTOR flag is hereafter also considered to + have been turned off explicitly. */ if ((flags & OPTION_MASK_P8_VECTOR) != 0) rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__"); /* Note that the OPTION_MASK_P9_VECTOR flag is automatically turned off in the following conditions: - 1. If TARGET_P8_VECTOR is turned off. Hereafter, the - OPTION_MASK_P9_VECTOR flag is considered to have been turned off - explicitly. */ + 1. If TARGET_P8_VECTOR is turned off and OPTION_MASK_P9_VECTOR is +not turned on expli
[wwwdocs] Update five citeseer.ist.psu.edu references in news/profiledriven.html
Update four citeseer.ist.psu.edu references with doi.org and one with an updated one. Applied. Gerald Index: news/profiledriven.html === RCS file: /cvs/gcc/wwwdocs/htdocs/news/profiledriven.html,v retrieving revision 1.10 diff -u -r1.10 profiledriven.html --- news/profiledriven.html 28 Jun 2014 10:34:16 - 1.10 +++ news/profiledriven.html 24 Mar 2017 22:02:33 - @@ -260,11 +260,11 @@ [1] -http://citeseer.ist.psu.edu/ball93branch.html";>Branch Prediction for Free; +https://doi.org/10.1145/155090.155119";>Branch Prediction for Free; Ball and Larus; PLDI '93. [2] -http://citeseer.ist.psu.edu/wu94static.html";>Static Branch +https://doi.org/10.1145/192724.192725";>Static Branch Frequency and Program Profile Analysis; Wu and Larus; MICRO-27. [3] @@ -276,7 +276,8 @@ [5] -http://citeseer.ist.psu.edu/young97nearoptimal.html";>Near-optimal Intraprocedural Branch Alignment; +https://doi.org/10.1145/258916.258932";>Near-optimal + Intraprocedural Branch Alignment; Cliff Young, David S. Johnson, David R. Karger, Michael D. Smith, ACM 1997 [6] @@ -284,11 +285,13 @@ International Conference on Supercomputing, 1999 [7] -http://citeseer.ist.psu.edu/chang91using.html";>Using Profile Information to Assist Classic Code Optimizations; Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu, 1991 +https://doi.org/10.1002/spe.4380211204";>Using Profile + Information to Assist Classic Code Optimizations; + Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu, 1991 [8] http://citeseer.ist.psu.edu/august96hyperblock.html";>Hyperblock +href="http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.1922";>Hyperblock Performance Optimizations For ILP Processors; David Isaac August, 1996 [9]
Re: [PATCH,rs6000] PR80103: Fix ICE with cross compiler
On Fri, Mar 24, 2017 at 04:04:33PM -0600, Kelvin Nilsen wrote: > PR 80103 provides a test case which results in an internal > compiler error when invoked with -mno-direct-move -mpower9-dform- > vector target options. The internal compiler error results because > these two target options are incompatible with each other. > > The enclosed patch simply disables this particular combination of > target options, terminating gcc with an error message instead of > producing an internal compiler error. Additionally, this patch > includes new comments to address omissions from a patch committed > on 2017/03/23 which deals with conflicts between the > -mno-power9-vector and -mcpu=power9 target options. > > This patch has been bootstrapped and tested with no regressions on > both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu. > Is this ok for the trunk? This looks good, please apply. Thanks, Segher
[PATCH #2, rs6000][GCC6] Fix PR78543, ICE in push_reload on powerpc64le-linux-gnu
This patch reworks the original patch I attached to the bug report, to try and make it less hacky. It separates the bswap insns where there is hardware support into separate read, write, and register swap instructions. This is because the register allocators will try to push the bswap value in a register to the stack and do the load based swap with reverse bytes. Reload fumbles in certain conditions. LRA generates working code, but the store and the load with byte reverse from the same location, can slow things down compared to the operation on registers. I only did this optimization where we had the hardware support (i.e. bswap for HImode all of the time, bswap for SImode all of the time, and bswap for DImode if we are executing 64-bit instructions and the machine has LDBRX/STDBRX -- power7 and newer/cell ppc). I have done bootstrap builds on a little endian power8 system, on a big endian power8 system, and a big endian power7 system (both 32/64-bit support on this last system). There were no regressions. I applied the patches to GCC 6, and the compiler builds and has no regressions on a little endian power8 system. I built spec 2006 benchmarks with the GCC 7 compiler. There are 12 benchmarks that generate one or more load/store with byte swap instructions (perlbench, gcc, gamess, milc, zeusmp, calculix, h264ref, tonto, omnetpp, wrf, sphinx3, xalancbmk). I compared the instructions generated. 10 of the benchmarks generated the same instructions. Milc generated 1 less load with byte swap instruction and 1 more store with byte swap instruction. Sphinx3 generated 6 less load with byte swap instructions and 6 more store with byte swap instructions. So I count this as the same level of byte swapping is being generated. Can I apply the patch to the GCC 7 trunk? Since the patch shows up as an abort in GCC 6 and not in GCC 7 (unless you disable LRA), I would like to apply the patch as soon as possible to the GCC 6 branch. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 246413) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -370,6 +370,11 @@ (define_mode_iterator P [(SI "TARGET_32B ; PTImode is GPR only) (define_mode_iterator TI2 [TI PTI]) +; Types that have hardware load/store with byte reverse +(define_mode_iterator BSWAP [HI +SI +(DI "TARGET_POWERPC64 && TARGET_LDBRX")]) + ; Any hardware-supported floating-point mode (define_mode_iterator FP [ (SF "TARGET_HARD_FLOAT @@ -2350,12 +2355,12 @@ (define_insn "cmpb3" ;; Since the hardware zeros the upper part of the register, save generating the ;; AND immediate if we are converting to unsigned -(define_insn "*bswaphi2_extenddi" +(define_insn "*bswap2_extenddi" [(set (match_operand:DI 0 "gpc_reg_operand" "=r") (zero_extend:DI -(bswap:HI (match_operand:HI 1 "memory_operand" "Z"] +(bswap:HSI (match_operand:HSI 1 "memory_operand" "Z"] "TARGET_POWERPC64" - "lhbrx %0,%y1" + "lbrx %0,%y1" [(set_attr "length" "4") (set_attr "type" "load")]) @@ -2368,34 +2373,55 @@ (define_insn "*bswaphi2_extendsi" [(set_attr "length" "4") (set_attr "type" "load")]) -(define_expand "bswaphi2" - [(parallel [(set (match_operand:HI 0 "reg_or_mem_operand" "") - (bswap:HI - (match_operand:HI 1 "reg_or_mem_operand" ""))) - (clobber (match_scratch:SI 2 ""))])] +;; Separate the bswap patterns into load, store, and gpr<-gpr. This prevents +;; the register allocator from converting a gpr<-gpr swap into a store and then +;; load with byte swap, which can be slower than doing it in the registers. It +;; also prevents certain failures with the RELOAD register allocator. + +(define_expand "bswap2" + [(use (match_operand:HSI 0 "reg_or_mem_operand")) + (use (match_operand:HSI 1 "reg_or_mem_operand"))] "" { - if (!REG_P (operands[0]) && !REG_P (operands[1])) -operands[1] = force_reg (HImode, operands[1]); + rtx dest = operands[0]; + rtx src = operands[1]; + + if (!REG_P (dest) && !REG_P (src)) +src = force_reg (mode, src); + + if (MEM_P (src)) +emit_insn (gen_bswap2_load (dest, src)); + else +{ + if (MEM_P (dest)) + emit_insn (gen_bswap2_store (dest, src)); + else + emit_insn (gen_bswap2_reg (dest, src)); +} + DONE; }) -(define_insn "bswaphi2_internal" - [(set (match_operand:HI 0 "reg_or_mem_operand" "=r,Z,&r") - (bswap:HI -(match_operand:HI 1 "reg_or_mem_operand" "Z,r,r"))) - (clobber (match_scratch:SI 2 "=X,X,&r"))] +(define_insn "bswap2_load" + [(set (match_operand:BSWAP 0 "gpc_reg_operand" "=r") + (bswap:BSWAP (match_operand:BSWAP 1 "memory_operand" "Z")))] + "" + "lbrx %0,%y1" + [(
Re: [PATCH #2, rs6000][GCC6] Fix PR78543, ICE in push_reload on powerpc64le-linux-gnu
Well instead of attaching the ChangeLog, I attached the patch without ChangeLog. Here is the ChangeLog entry for the patch: [gcc] 2017-03-24 Michael Meissner PR target/78543 * config/rs6000/rs6000.md (BSWAP): New mode iterator for modes with hardware byte swap load/store intstructions. (bswaphi2_extenddi): Combine bswap HImode and SImode with zero extend to DImode to one insn. (bswap2_extenddi): Likewise. (bswapsi2_extenddi): Likewise. (bswaphi2_extendsi): Likewise. (bswaphi2): Combine bswap HImode and SImode into one insn. Separate memory insns from swapping register. (bswapsi2): Likewise. (bswap2): Likewise. (bswaphi2_internal): Delete, no longer used. (bswapsi2_internal): Likewise. (bswap2_load): Split bswap HImode/SImode into separate load, store, and gpr<-gpr swap insns. (bswap2_store): Likewise. (bswaphi2_reg): Register only splitter, combine with the splitter. (bswaphi2 splitter): Likewise. (bswapsi2_reg): Likewise. (bswapsi2 splitter): Likewise. (bswapdi2): If we have the LDBRX and STDBRX instructions, split the insns into load, store, and register/register insns. (bswapdi2_ldbrx): Likewise. (bswapdi2_load): Likewise. (bswapdi2_store): Likewise. (bswapdi2_reg): Likewise. [gcc/testsuite] 2017-03-24 Michael Meissner PR target/78543 * gcc.target/powerpc/pr78543.c: New test. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
Hi! On Fri, Mar 24, 2017 at 08:36:16PM +0100, Jakub Jelinek wrote: > + /* Skip over reg notes not related to CFI information. */ > + while (n1) > + { > + for (i = 0; i < ARRAY_SIZE (cfa_note_kinds) - 1; i++) > + if (REG_NOTE_KIND (n1) == cfa_note_kinds[i]) > + break; > + if (i != ARRAY_SIZE (cfa_note_kinds)) > + break; > + n1 = XEXP (n1, 1); > + } Maybe factor out reg_note_is_cfa_note and/or insns_have_identical_cfa_notes functions? The patch looks fine to me btw. Thanks for working on this! Segher
Re: [PATCH #2, rs6000][GCC6] Fix PR78543, ICE in push_reload on powerpc64le-linux-gnu
Hi Mike, On Fri, Mar 24, 2017 at 07:23:02PM -0400, Michael Meissner wrote: > Reload fumbles in certain conditions. Yeah. And it does not need bswap to get totally lost with this, so this patch is a workaround, not a fix. It does make things nicer though :-) > LRA generates working code, but the > store and the load with byte reverse from the same location, can slow things > down compared to the operation on registers. How so? You mean (say) lwbrx after doing a stw to that same address? We have that problem in general :-/ Thanks for the extensive testing. > Can I apply the patch to the GCC 7 trunk? Since the patch shows up as an > abort > in GCC 6 and not in GCC 7 (unless you disable LRA), I would like to apply the > patch as soon as possible to the GCC 6 branch. A few remarks: > +; Types that have hardware load/store with byte reverse > +(define_mode_iterator BSWAP [HI > + SI > + (DI "TARGET_POWERPC64 && TARGET_LDBRX")]) The patch would be much easier to read if you had done this step (with HSI as well) separately. Oh well. > +(define_expand "bswap2" > + if (MEM_P (src)) > +emit_insn (gen_bswap2_load (dest, src)); > + else > +{ > + if (MEM_P (dest)) > + emit_insn (gen_bswap2_store (dest, src)); > + else > + emit_insn (gen_bswap2_reg (dest, src)); > +} > + DONE; Please avoid the extra nesting, i.e. do + if (MEM_P (src)) +emit_insn (gen_bswap2_load (dest, src)); + else if (MEM_P (dest)) +emit_insn (gen_bswap2_store (dest, src)); + else +emit_insn (gen_bswap2_reg (dest, src)); + DONE; (also for bswapdi2). > + [(set_attr "length" "12") > + (set_attr "type" "*")]) Lose that last line? You don't need to explicitly set things to their default value ;-) OTOH, you might want to make it type "three" instead? > + [(set_attr "length" "36") > + (set_attr "type" "*")]) Same. This patch is okay for trunk. Also for 6, but what is the hurry there? Thanks! Segher
LRA fix for 80160
This fixes two PRs; we shouldn't try to avoid spilling a reg if it has an alternate class it can use. Bootstrapped and tested on x86_64-linux, approved by Vlad, committed. Bernd Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 246472) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2017-03-25 Bernd Schmidt + + PR rtl-optimization/80160 + PR rtl-optimization/80159 + * lra-assigns.c (must_not_spill_p): Tighten new test to also take + reg_alternate_class into account. + 2017-03-24 Vladimir Makarov PR target/80148 Index: gcc/lra-assigns.c === --- gcc/lra-assigns.c (revision 246472) +++ gcc/lra-assigns.c (working copy) @@ -908,7 +908,8 @@ must_not_spill_p (unsigned spill_regno) does not solve the general case where existing reloads fully cover a limited register class. */ if (!bitmap_bit_p (&non_reload_pseudos, spill_regno) - && reg_class_size [reg_preferred_class (spill_regno)] == 1) + && reg_class_size [reg_preferred_class (spill_regno)] == 1 + && reg_alternate_class (spill_regno) == NO_REGS) return true; return false; } Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 246472) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2017-03-25 Bernd Schmidt + + PR rtl-optimization/80160 + PR rtl-optimization/80159 + + * gcc.target/i386/pr80160.c: New test. + 2017-03-24 Jakub Jelinek PR sanitizer/79904 Index: gcc/testsuite/gcc.target/i386/pr80160.c === --- gcc/testsuite/gcc.target/i386/pr80160.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr80160.c (working copy) @@ -0,0 +1,45 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-omit-frame-pointer -w" } */ +/* { dg-additional-options "-march=pentium-mmx" { target ia32 } } */ + +typedef struct { long long a; } a_t; +int *a, b; +a_t *e, c; +long long f; +void fn (int); +void fn2 (void); +int fn3 (a_t); +void fn4 (a_t); +a_t foo (long long val) { return (a_t){val}; } +static void +bar (int ka) +{ + unsigned i; + for (i = 0; i < 512; i++) { +long d; +c = (a_t){d}; +fn2 (); + } + fn (ka); +} +void +test (void) +{ + a_t g; + a_t *h, j; + h = e; + j = *h; + if (e == (a_t *) 1) { +a_t k = {fn3 (j)}; +fn4 (j); +long l; +g = foo((long long)b << 2 | l); +k = g; +if (j.a != k.a) { + a_t m = g; + int n = m.a, o = m.a >> 32; + asm ("# %0 %1 %2 %3" : "=m"(*a), "+A"(f) : "b"(n), "c"(o)); +} + } + bar ((int) h); +}
Re: [PATCH] Don't cross-jump in between frame related and non-frame related insns (PR target/80102, take 3)
On Fri, Mar 24, 2017 at 06:37:46PM -0500, Segher Boessenkool wrote: > On Fri, Mar 24, 2017 at 08:36:16PM +0100, Jakub Jelinek wrote: > > + /* Skip over reg notes not related to CFI information. */ > > + while (n1) > > + { > > + for (i = 0; i < ARRAY_SIZE (cfa_note_kinds) - 1; i++) > > + if (REG_NOTE_KIND (n1) == cfa_note_kinds[i]) > > + break; > > + if (i != ARRAY_SIZE (cfa_note_kinds)) > > + break; > > + n1 = XEXP (n1, 1); > > + } > > Maybe factor out reg_note_is_cfa_note and/or insns_have_identical_cfa_notes > functions? Well, for the reg_note_is_cfa_note, actually it might be better to just have a const bool array indexed by the note enum instead of the loop, I'll implement it later. And yes, I can outline insns_have_identical_cfa_notes. Jakub