[PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases
Richard, I've tried to handle more LSHIFT_EXPR cases with a shift range in tree-vrp. Currently we handle cases like this: - non-negative shifting out zeros [5, 6] << [1, 2] == [10, 24] This patch adds these cases: - unsigned shifting out ones [0xff00, 0x] << [1, 2] == [0xfc00, 0xfffe] - negative numbers [-1, 1] << [1, 2] == [-4, 4] My previous patch (for PR53986) contained a bug (test-case vrp82.c) which makes vrp evaluate: - [minint, 1] << [1, 2] == [0, 4] It should conservatively have checked for vr0.min >= 0, for the signed negative case. This patch fixes that bug as well. Bootstrapped and regtested (ada inclusive) on x86_64. OK for trunk? Thanks, - Tom 2012-09-10 Tom de Vries * tree-vrp.c (extract_range_from_binary_expr_1): Fix bug in handling of LSHIFT_EXPR with shift range. Handle more LSHIFT_EXPR cases with shift range. * gcc.dg/tree-ssa/vrp81.c: New test. * gcc.dg/tree-ssa/vrp81-2.c: Same. * gcc.dg/tree-ssa/vrp82.c: Same. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 191089) +++ gcc/tree-vrp.c (working copy) @@ -2766,20 +2766,63 @@ else if (code == LSHIFT_EXPR && range_int_cst_p (&vr0)) { - int overflow_pos = TYPE_PRECISION (expr_type); + int prec = TYPE_PRECISION (expr_type); + int overflow_pos = prec; int bound_shift; - double_int bound; + double_int bound, complement, low_bound, high_bound; + bool uns = TYPE_UNSIGNED (expr_type); + bool in_bounds = false; - if (!TYPE_UNSIGNED (expr_type)) + if (!uns) overflow_pos -= 1; bound_shift = overflow_pos - TREE_INT_CST_LOW (vr1.max); - bound = double_int_one.llshift (bound_shift, - TYPE_PRECISION (expr_type)); - if (tree_to_double_int (vr0.max).ult (bound)) + /* If bound_shift == HOST_BITS_PER_DOUBLE_INT, the llshift can + overflow. However, for that to happen, vr1.max needs to be + zero, which means vr1 is a singleton range of zero, which + means it should be handled by the previous LSHIFT_EXPR + if-clause. 
*/ + bound = double_int_one.llshift (bound_shift, prec); + complement = ~(bound - double_int_one); + + if (uns) { - /* In the absense of overflow, (a << b) is equivalent - to (a * 2^b). */ + low_bound = bound; + high_bound = complement.zext (prec); + if (tree_to_double_int (vr0.max).ult (low_bound)) + { + /* [5, 6] << [1, 2] == [10, 24]. */ + /* We're shifting out only zeroes, the value increases + monotomically. */ + in_bounds = true; + } + else if (high_bound.ult (tree_to_double_int (vr0.min))) + { + /* [0xff00, 0x] << [1, 2] + == [0xfc00, 0xfffe]. */ + /* We're shifting out only ones, the value decreases + monotomically. */ + in_bounds = true; + } + } + else + { + /* [-1, 1] << [1, 2] == [-4, 4]. */ + low_bound = complement.sext (prec); + high_bound = bound; + if (tree_to_double_int (vr0.max).slt (high_bound) + && low_bound.slt (tree_to_double_int (vr0.min))) + { + /* For non-negative numbers, we're shifting out only + zeroes, the value increases monotomically. + For negative numbers, we're shifting out only ones, the + value decreases monotomically. */ + in_bounds = true; + } + } + + if (in_bounds) + { extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1); return; } Index: gcc/testsuite/gcc.dg/tree-ssa/vrp81-2.c === --- gcc/testsuite/gcc.dg/tree-ssa/vrp81-2.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/vrp81-2.c (revision 0) @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-vrp1" } */ + +extern void vrp_keep (void); + +void +f2 (int c, int b) +{ + int s = 0; + if (c == 0) +s += 1; + else if (c < 1) +s -= 1; + /* s in [-1, 1]. */ + b = (b & 1) + 1; + /* b in range [1, 2]. */ + b = s << b; + /* b in range [-4, 4]. */ + if (b == -4) +vrp_keep (); + if (b == 4) +vrp_keep (); +} + +void +f3 (int s, int b) +{ + if (s >> 3 == -2) +{ + /* s in range [-16, -9]. */ + b = (b & 1) + 1; + /* b in range [1, 2]. */ + b = s << b; + /* b in range [bmin << smax, bmax << smin], +== [-16 << 2, -9 << 1] +== [-64, -18]. 
*/ + if (b == -64) + vrp_keep (); + if (b == -18) + vrp_keep (); +} +} + +void +f4 (unsigned int s, unsigned int b) +{ + s |= ~(0xffU); + /* s in [0xff00, 0x]. */ + b = (b & 1) + 1; + /* b in [1, 2]. */ + b = s << b; + /* s in [0xfc00, 0xfffe]. */ + if (b == ~0x3ffU) +vrp_keep (); + if (b == ~0x1U) +vrp_keep (); +} + +/* { dg-final { scan-tree-dump-times "vrp_keep \\(" 6 "vrp1"} }
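For readers following the range check, here is a rough sketch of the in_bounds test from the patch, written with plain 64-bit integers for a 32-bit type instead of double_int. It is not the tree-vrp.c code: the helper name shift_range_in_bounds, its parameters and the fixed 32-bit precision are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code.  MIN and MAX bound the shifted
   operand, MAX_SHIFT is the upper bound of the shift count, UNS says
   whether the (32-bit) type is unsigned.  Returns true when the shift
   can be treated as a multiplication by 2^b over the whole range.  */
static bool
shift_range_in_bounds (int64_t min, int64_t max, int max_shift, bool uns)
{
  const int prec = 32;
  int overflow_pos = uns ? prec : prec - 1;

  /* A shift count of zero is assumed to be handled elsewhere, as the
     comment in the patch says.  */
  assert (max_shift > 0 && max_shift < overflow_pos);

  int64_t bound = (int64_t) 1 << (overflow_pos - max_shift);

  if (uns)
    {
      /* ~(bound - 1), zero-extended to PREC bits.  */
      int64_t high_bound
	= (int64_t) (~(uint64_t) (bound - 1) & 0xffffffffu);

      /* Either only zeroes are shifted out (max below bound), or only
	 ones are shifted out (min above high_bound).  */
      return max < bound || min > high_bound;
    }

  /* ~(bound - 1), sign-extended to PREC bits, is simply -bound.  */
  int64_t low_bound = -bound;

  /* Both ends must stay strictly inside (low_bound, bound).  */
  return max < bound && min > low_bound;
}
```

With these assumptions, the signed range [-1, 1] shifted by at most 2 passes the check, while [INT32_MIN, 1] does not, which is the conservative behaviour the bug fix is after.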
Re: [PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases
On Fri, Sep 14, 2012 at 09:27:27AM +0200, Tom de Vries wrote:
> * gcc.dg/tree-ssa/vrp81.c: New test.
> * gcc.dg/tree-ssa/vrp81-2.c: Same.
> * gcc.dg/tree-ssa/vrp82.c: Same.

Why not vrp82.c, vrp83.c and vrp84.c (and rename the recently added
vrp80-2.c test to vrp81.c)?

	Jakub
Re: [PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases
On 14/09/12 09:38, Jakub Jelinek wrote:
> On Fri, Sep 14, 2012 at 09:27:27AM +0200, Tom de Vries wrote:
>> * gcc.dg/tree-ssa/vrp81.c: New test.
>> * gcc.dg/tree-ssa/vrp81-2.c: Same.
>> * gcc.dg/tree-ssa/vrp82.c: Same.
>
> Why not vrp82.c, vrp83.c and vrp84.c (and rename the recently added
> vrp80-2.c test to vrp81.c)?
>

My thinking behind this was the following: vrp80.c and vrp80-2.c are 2
versions of more or less the same code. In one version, we test whether
the inclusive bounds of the range are folded. In the other version we
test whether the exclusive bounds of the range are not folded.

Given that rationale, should I leave the names like this or rename them?

Thanks,
- Tom

> Jakub
>
Re: [PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases
On Fri, Sep 14, 2012 at 09:51:48AM +0200, Tom de Vries wrote:
> On 14/09/12 09:38, Jakub Jelinek wrote:
> > On Fri, Sep 14, 2012 at 09:27:27AM +0200, Tom de Vries wrote:
> >> * gcc.dg/tree-ssa/vrp81.c: New test.
> >> * gcc.dg/tree-ssa/vrp81-2.c: Same.
> >> * gcc.dg/tree-ssa/vrp82.c: Same.
> >
> > Why not vrp82.c, vrp83.c and vrp84.c (and rename the recently added
> > vrp80-2.c test to vrp81.c)?
> >
>
> My thinking behind this was the following: vrp80.c and vrp80-2.c are 2
> versions of more or less the same code. In one version, we test whether
> the inclusive bounds of the range are folded. In the other version we
> test whether the exclusive bounds of the range are not folded.

IMHO it is enough to give them consecutive numbers; there are many cases
where multiple vrpNN.c tests have been added for more or less the same
code. But I don't care that much, and will leave that decision to Richard
as the probable reviewer.

	Jakub
Re: [PATCH] Changes in mode switching
Additionally, you can find the patch history at
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01590.html.

I need these changes for my implementation of vzeroupper placement: for
some statements I do not need to do a real insertion.

I tested the changes with a bootstrap, using the configuration

../gcc/configure --prefix=/export/users/vbyakovl/workspaces/vzu/install-middle --enable-languages=c,c++,fortran

2012/9/14 Vladimir Yakovlev:
> Hello,
>
> I reproduced the failure and found reason of it. I understood haw it
> resolve and now I need small changes only - additional argument of
> EMIT_MODE_SET. Is it good fo trunk?
>
> Thank you,
> Vladimir
>
> 2012-09-14 Vladimir Yakovlev
>
> * (optimize_mode_switching): Added an argument EMIT_MODE_SET calls.
>
> * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.
>
> * config/i386/i386.h (EMIT_MODE_SET): Added an argument.
>
> * config/sh/sh.h (EMIT_MODE_SET): Added an argument.
>
>
> 2012/8/29 Vladimir Yakovlev:
>> I built using last configure.
>>
>> Thank you,
>> Vladimir
>>
>> 2012/8/29 Kaz Kojima:
>>>> I tryed
>>>>
>>>> ../gcc/configure --host=i686-pc-linux-gnu --target=sh4-unknown-linux-gnu --enable-build-with-cxx --enable-lto --enable-shared --enable-threads=posix --enable-clocale=gnu --enable-libitm --enable-libgcj --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu --with-mpc=/opt2/i686-pc-linux-gnu --with-libelf=/opt2/i686-pc-linux-gnu --with-ppl=no --enable-languages=c,c++,fortran,java,lto,objc --prefix=/export/users/mstester/stability/work/trunk/64/install_sh4
>>>>
>>>> and have got build error. make.log attached. Could you take a look?
>>>
>>> make.log says
>>> make[2]: i686-pc-linux-gnu-ar: Command not found
>>>
>>> It looks your build system is x86_64-unknown-linux-gnu.
>>> Perhaps with specifying --host=x86_64-unknown-linux-gnu instead
>>> of --host=i686-pc-linux-gnu in your configuration, that error
>>> could be resolved, though
>>>
>>> --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu --with-mpc=/opt2/i686-pc-linux-gnu --with-libelf=/opt2/i686-pc-linux-gnu
>>>
>>> are strongly specific to my environment. Maybe
>>>
>>> ../gcc/configure --host=x86_64-unknown-linux-gnu
>>> --target=sh4-unknown-linux-gnu --enable-languages=c
>>>
>>> and
>>>
>>> make all-gcc
>>>
>>> is enough to get cc1 for sh4-unknown-linux-gnu.
>>>
>>> Best Regards,
>>> kaz
Re: Backtrace library [3/3]
A few quick comments,

1) Although mmap is not guaranteed to be async-signal-safe, in
practice it should be as you mentioned previously. However I see that
when using mmap, the implementation uses pthread mutexes. These are
not guaranteed to be async-signal-safe either, but I guess in practice
as long as you don't try to use the same mutex in "normal" code and
the signal handler (in this case, the same state struct), it should be
OK. Is this so?

2) In backtrace_print(), in lieu of a standard (de-facto or otherwise)
for the backtrace format, why not follow gdb? I.e. a format string
like "#%d 0x%lx in %s at %s:%d\n" where the first argument is a frame
number, otherwise as currently done.

-- 
Janne Blomqvist
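To make the suggested format concrete, here is a small sketch of how one frame could be printed with that format string. The helper name format_frame, its parameters and the "??" fallback for missing names are assumptions for this example, not part of the proposed library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format one frame in the gdb-like layout
   "#N 0xPC in FUNCTION at FILE:LINE".  Returns what snprintf returns,
   i.e. the length the full line needs, excluding the trailing NUL.  */
static int
format_frame (char *buf, size_t size, int frame, uintptr_t pc,
	      const char *function, const char *filename, int lineno)
{
  assert (buf != NULL && size > 0);

  /* %lx expects unsigned long, so the PC is cast explicitly.  On a
     platform where unsigned long is narrower than a pointer this
     would truncate.  */
  return snprintf (buf, size, "#%d 0x%lx in %s at %s:%d\n",
		   frame, (unsigned long) pc,
		   function ? function : "??",
		   filename ? filename : "??",
		   lineno);
}
```

A call such as format_frame (buf, sizeof buf, 0, 0x4005d0, "main", "test.c", 12) would fill buf with a single line in that layout.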
Re: [PATCH] Combine location with block using block_locations
> I think it's going to make GCC harder to maintain if we drop the -g0
> vs. -g no-code-difference requirement for just some optimization
> levels.

Seconded, this is surely going to open yet another can of worms.

-- 
Eric Botcazou
Re: Backtrace library [3/3]
On Fri, Sep 14, 2012 at 11:50:31AM +0300, Janne Blomqvist wrote:
> A few quick comments,
>
> 1) Although mmap is not guaranteed to be async-signal-safe, in
> practice it should be as you mentioned previously. However I see that
> when using mmap, the implementation uses pthread mutexes. These are
> not guaranteed to be async-signal-safe either, but I guess in practice

Not just not guaranteed; the locking is simply not async-signal-safe.

> as long as you don't try to use the same mutex in "normal" code and
> the signal handler (in this case, the same state struct), it should be
> OK. Is this so?

At least if you get interrupted by an async signal while in libbacktrace
(trying to acquire or release some lock, or while holding it), you'd
definitely be out of luck. File locking could be used as async-signal-safe
locking, but then there is still the question of what to do when
backtrace* is invoked while the "lock" is held.

If the locking is just about initialization of the DWARF reader, then you
might want to prepare everything and just atomically change some global
pointer.

	Jakub
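The "prepare everything, then atomically change a global pointer" idea could look roughly like the sketch below, using C11 atomics. Everything here is hypothetical: struct debug_data, global_data and get_debug_data are invented names, and malloc is used only to keep the example short (it is itself not async-signal-safe, so a real implementation would need a different allocator).

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fully initialized reader state.  */
struct debug_data
{
  int ready;
};

/* The only shared state: NULL until some caller publishes a pointer.  */
static _Atomic (struct debug_data *) global_data = NULL;

/* Build the structure privately, then publish it with a single
   compare-and-swap.  No lock is held at any point, so a caller that is
   interrupted here cannot leave a lock behind.  */
static struct debug_data *
get_debug_data (void)
{
  struct debug_data *cur
    = atomic_load_explicit (&global_data, memory_order_acquire);
  if (cur != NULL)
    return cur;

  /* Assumption for brevity: malloc stands in for whatever allocator
     the real code would use.  */
  struct debug_data *fresh = malloc (sizeof *fresh);
  if (fresh == NULL)
    return NULL;
  fresh->ready = 1;

  struct debug_data *expected = NULL;
  if (atomic_compare_exchange_strong_explicit (&global_data, &expected,
					       fresh,
					       memory_order_acq_rel,
					       memory_order_acquire))
    return fresh;

  /* Somebody else published first; drop our copy and use theirs.  */
  free (fresh);
  return expected;
}
```

The cost of this scheme is that two callers may both do the preparation work and one result is thrown away, which is the price for never blocking.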
Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT
On Thu, 13 Sep 2012, Steven Bosscher wrote: > On Wed, Sep 12, 2012 at 4:52 PM, Steven Bosscher wrote: > > On Wed, Sep 12, 2012 at 4:02 PM, Richard Guenther wrote: > >> for a followup (and I bet sth else than PRE blows up at -O2 as well). > > > > Actually, the only thing that really blows up is that enemy of scalability, > > VRP. > > FWIW, this appears to be due to the well-known problem with the equiv > set, but also due to the liveness computations that tree-vrp performs, > since your commit in r139263. > > Any reason why you didn't just re-use the tree-ssa-live machinery? Probably I didn't know about it or didn't want to keep the full life problem life (it tries to free things as soon as possible). > And any reason why you don't let a DEF kill a live SSA name? AFAICT > you're exposing all SSA names up without ever killing a USE :-) Eh ;) We also should traverse blocks backward I suppose. Also the RPO traversal suffers from the same issue I noticed in PRE and for what I invented my_rev_post_order_compute ... (pre_and_rev_post_order_compute doesn't compute an optimal reverse post order). Patch fixing the liveness below, untested sofar, apart from on tree-ssa.exp where it seems to regress gcc.dg/tree-ssa/pr21086.c :/ As for the equiv sets - yes, that's known. I wanted to investigate at some point what happens if we instead record the SSA name we registered the assert for (thus look up a chain of lattice values instead of recording all relevant entries in a bitmap). ISTR there were some correctness issues, but if we restrict the chaining maybe we cat get a good compromise here. Richard. 
Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 191289) +++ gcc/tree-vrp.c (working copy) @@ -5439,7 +5439,6 @@ find_assert_locations_1 (basic_block bb, { gimple_stmt_iterator si; gimple last; - gimple phi; bool need_assert; need_assert = false; @@ -5462,7 +5461,7 @@ find_assert_locations_1 (basic_block bb, /* Traverse all the statements in BB marking used names and looking for statements that may infer assertions for their used operands. */ - for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) + for (si = gsi_last_bb (bb); !gsi_end_p (si); gsi_prev (&si)) { gimple stmt; tree op; @@ -5531,6 +5530,9 @@ find_assert_locations_1 (basic_block bb, } } } + + FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF) + RESET_BIT (live, SSA_NAME_VERSION (op)); } /* Traverse all PHI nodes in BB marking used operands. */ @@ -5538,7 +5540,11 @@ find_assert_locations_1 (basic_block bb, { use_operand_p arg_p; ssa_op_iter i; - phi = gsi_stmt (si); + gimple phi = gsi_stmt (si); + tree res = gimple_phi_result (phi); + + if (virtual_operand_p (res)) + continue; FOR_EACH_PHI_ARG (arg_p, phi, i, SSA_OP_USE) { @@ -5546,6 +5552,8 @@ find_assert_locations_1 (basic_block bb, if (TREE_CODE (arg) == SSA_NAME) SET_BIT (live, SSA_NAME_VERSION (arg)); } + + RESET_BIT (live, SSA_NAME_VERSION (res)); } return need_assert;
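As background for the backward walk in this patch, here is a textbook sketch of per-block liveness: walking the statements from last to first, a definition kills a name and a use makes it live. It is not the tree-vrp.c code; struct stmt, NO_NAME, MAX_NAMES and compute_live_in are invented for the example, and the order (kill the definition, then add the uses) is the standard formulation rather than a transcription of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NAMES 64
#define NO_NAME (-1)

/* Hypothetical three-address statement: DEF = USE[0] op USE[1].
   Unused slots hold NO_NAME.  */
struct stmt
{
  int def;
  int use[2];
};

/* LIVE holds the live-out set on entry and the live-in set on return.
   Statements are visited from last to first.  */
static void
compute_live_in (const struct stmt *stmts, size_t n, bool live[MAX_NAMES])
{
  for (size_t i = n; i-- > 0; )
    {
      const struct stmt *s = &stmts[i];

      /* A definition kills the name above this statement.  */
      if (s->def != NO_NAME)
	live[s->def] = false;

      /* Every used name is live immediately before the statement.  */
      for (int k = 0; k < 2; k++)
	if (s->use[k] != NO_NAME)
	  live[s->use[k]] = true;
    }
}
```

For a block that computes n2 = n0 op n1, then n3 = op n2, then uses n3 and n0, only n0 and n1 are live on entry; n2 and n3 are killed by their definitions inside the block.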
Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable
On Thu, Sep 13, 2012 at 07:46:52PM -0700, H.J. Lu wrote:
> There is no reason why --eh-frame-hdr can't be used with static
> executable on Linux. This patch enables --eh-frame-hdr for static

Well, there is. For more than 2 years after the addition of
--eh-frame-hdr support, dl_iterate_phdr in libc.a would simply always
fail, and you aren't adding any kind of check that an old glibc
(2001-2003ish) isn't used. Even in newer glibcs, it relies on AT_* aux
vector values provided by the kernel; if they are not provided for
whatever reason, it would fail.

	Jakub
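For context, the dependency on dl_iterate_phdr can be illustrated with a short sketch that walks the loaded objects and looks for a PT_GNU_EH_FRAME program header. This assumes a glibc-style <link.h> on Linux; struct scan_result, scan_object and scan_loaded_objects are names made up for the example, and the code is not taken from the unwinder.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* dl_iterate_phdr is a GNU extension.  */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical result record for this example.  */
struct scan_result
{
  int objects;			/* Objects reported by the C library.  */
  int with_eh_frame_hdr;	/* Of those, how many have PT_GNU_EH_FRAME.  */
};

/* Callback for dl_iterate_phdr: count the object and note whether it
   carries a PT_GNU_EH_FRAME segment.  Returning 0 continues the walk.  */
static int
scan_object (struct dl_phdr_info *info, size_t size, void *data)
{
  struct scan_result *res = data;

  assert (res != NULL);
  (void) size;

  res->objects++;
  for (int i = 0; i < info->dlpi_phnum; i++)
    if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME)
      {
	res->with_eh_frame_hdr++;
	break;
      }
  return 0;
}

/* Walk everything the C library reports.  If dl_iterate_phdr cannot
   find the program headers, the counts simply stay at zero.  */
static struct scan_result
scan_loaded_objects (void)
{
  struct scan_result res = { 0, 0 };

  dl_iterate_phdr (scan_object, &res);
  return res;
}
```

If the walk reports no objects, a lookup that relies on the .eh_frame_hdr table has nothing to search, which is the failure mode described above for old static glibc.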
Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT
On Fri, 14 Sep 2012, Richard Guenther wrote: > On Thu, 13 Sep 2012, Steven Bosscher wrote: > > > On Wed, Sep 12, 2012 at 4:52 PM, Steven Bosscher wrote: > > > On Wed, Sep 12, 2012 at 4:02 PM, Richard Guenther wrote: > > >> for a followup (and I bet sth else than PRE blows up at -O2 as well). > > > > > > Actually, the only thing that really blows up is that enemy of > > > scalability, VRP. > > > > FWIW, this appears to be due to the well-known problem with the equiv > > set, but also due to the liveness computations that tree-vrp performs, > > since your commit in r139263. > > > > Any reason why you didn't just re-use the tree-ssa-live machinery? > > Probably I didn't know about it or didn't want to keep the full life > problem life (it tries to free things as soon as possible). > > > And any reason why you don't let a DEF kill a live SSA name? AFAICT > > you're exposing all SSA names up without ever killing a USE :-) > > Eh ;) We also should traverse blocks backward I suppose. Also > the RPO traversal suffers from the same issue I noticed in PRE > and for what I invented my_rev_post_order_compute ... > (pre_and_rev_post_order_compute doesn't compute an optimal > reverse post order). > > Patch fixing the liveness below, untested sofar, apart from on > tree-ssa.exp where it seems to regress gcc.dg/tree-ssa/pr21086.c :/ The following not. Queued for testing (though I doubt it will help memory usage due to the use of sbitmaps). I think jump threading special-casing asserts and the equiv bitmaps are the real problem of VRP. What testcase did you notice the live issue on? Thanks, Richard. 2012-09-14 Richard Guenther * tree-vrp.c (register_new_assert_for): Simplify for backward walk. (find_assert_locations_1): Walk the basic-block backwards, properly add/prune from live. Use live for asserts derived from stmts. 
Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 191289) +++ gcc/tree-vrp.c (working copy) @@ -4384,24 +4384,7 @@ register_new_assert_for (tree name, tree && (loc->expr == expr || operand_equal_p (loc->expr, expr, 0))) { - /* If the assertion NAME COMP_CODE VAL has already been -registered at a basic block that dominates DEST_BB, then -we don't need to insert the same assertion again. Note -that we don't check strict dominance here to avoid -replicating the same assertion inside the same basic -block more than once (e.g., when a pointer is -dereferenced several times inside a block). - -An exception to this rule are edge insertions. If the -new assertion is to be inserted on edge E, then it will -dominate all the other insertions that we may want to -insert in DEST_BB. So, if we are doing an edge -insertion, don't do this dominance check. */ - if (e == NULL - && dominated_by_p (CDI_DOMINATORS, dest_bb, loc->bb)) - return; - - /* Otherwise, if E is not a critical edge and DEST_BB + /* If E is not a critical edge and DEST_BB dominates the existing location for the assertion, move the assertion up in the dominance tree by updating its location information. */ @@ -5439,7 +5422,6 @@ find_assert_locations_1 (basic_block bb, { gimple_stmt_iterator si; gimple last; - gimple phi; bool need_assert; need_assert = false; @@ -5462,7 +5444,7 @@ find_assert_locations_1 (basic_block bb, /* Traverse all the statements in BB marking used names and looking for statements that may infer assertions for their used operands. */ - for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) + for (si = gsi_last_bb (bb); !gsi_end_p (si); gsi_prev (&si)) { gimple stmt; tree op; @@ -5479,8 +5461,10 @@ find_assert_locations_1 (basic_block bb, tree value; enum tree_code comp_code; - /* Mark OP in our live bitmap. */ - SET_BIT (live, SSA_NAME_VERSION (op)); + /* If op is not live beyond this stmt, do not bother to insert +asserts for it. 
*/ + if (!TEST_BIT (live, SSA_NAME_VERSION (op))) + continue; /* If OP is used in such a way that we can infer a value range for it, and we don't find a previous assertion for @@ -5520,25 +5504,28 @@ find_assert_locations_1 (basic_block bb, } } - /* If OP is used only once, namely in this STMT, don't -bother creating an ASSERT_EXPR for it. Such an -ASSERT_EXPR would do nothing but increase compile time. */ - if (!has_single_use (op)) - { - register_new_assert_for (op, op, comp_code, value, -
Re: PR libstdc++/54576: random_device isn't protected by _GLIBCXX_USE_C99_STDINT_TR1
Hi, On 09/14/2012 03:16 AM, H.J. Lu wrote: Hi, include/random has #ifdef _GLIBCXX_USE_C99_STDINT_TR1 #include // For uint_fast32_t, uint_fast64_t, uint_least32_t #include #include #endif // _GLIBCXX_USE_C99_STDINT_TR1 random_device is defined in . But src/c++11/random.cc has #include ... void random_device::_M_init(const std::string& token) { It doesn't check if _GLIBCXX_USE_C99_STDINT_TR1 is defined. This patch checks it. OK to install? I thought this was already history, because it's a Dup. Anyway, the obvious patch is Ok, thanks, but please put a blank line right after "#ifdef _GLIBCXX_USE_C99_STDINT_TR1"- Thanks! Paolo.
Re: minor cleanup in forwprop: use get_prop_source_stmt more
On Thu, Sep 13, 2012 at 8:05 PM, Marc Glisse wrote: > Hello, > > this patch is a minor cleanup of my previous forwprop patches for vectors. I > have known about get_prop_source_stmt from the beginning, but for some > reason I always used SSA_NAME_DEF_STMT. This makes the source code slightly > shorter, and together with PR 54565 it should help get some optimizations to > apply as early as forwprop1 instead of forwprop2. > > There is one line I had badly indented. I am not sure what the policy is for > that. Silently bundling it with this patch as I am doing is probably not so > good. I should probably just fix it in svn without asking the list, but I > was wondering if I should add a ChangeLog entry and post the committed patch > to the list afterwards? (that's what I would do by default, at worst it is a > bit of spam) That's ok, no changelog needed for that. Ok. Thanks, Richard. > passes bootstrap+testsuite > > 2012-09-14 Marc Glisse > > * tree-ssa-forwprop.c (simplify_bitfield_ref): Call > get_prop_source_stmt. > (simplify_permutation): Likewise. > (simplify_vector_constructor): Likewise. 
> > -- > Marc Glisse > Index: tree-ssa-forwprop.c > === > --- tree-ssa-forwprop.c (revision 191247) > +++ tree-ssa-forwprop.c (working copy) > @@ -2599,23 +2599,22 @@ simplify_bitfield_ref (gimple_stmt_itera >elem_type = TREE_TYPE (TREE_TYPE (op0)); >if (TREE_TYPE (op) != elem_type) > return false; > >size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); >op1 = TREE_OPERAND (op, 1); >n = TREE_INT_CST_LOW (op1) / size; >if (n != 1) > return false; > > - def_stmt = SSA_NAME_DEF_STMT (op0); > - if (!def_stmt || !is_gimple_assign (def_stmt) > - || !can_propagate_from (def_stmt)) > + def_stmt = get_prop_source_stmt (op0, false, NULL); > + if (!def_stmt || !can_propagate_from (def_stmt)) > return false; > >op2 = TREE_OPERAND (op, 2); >idx = TREE_INT_CST_LOW (op2) / size; > >code = gimple_assign_rhs_code (def_stmt); > >if (code == VEC_PERM_EXPR) > { >tree p, m, index, tem; > @@ -2630,21 +2629,21 @@ simplify_bitfield_ref (gimple_stmt_itera > { > p = gimple_assign_rhs1 (def_stmt); > } >else > { > p = gimple_assign_rhs2 (def_stmt); > idx -= nelts; > } >index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size); >tem = build3 (BIT_FIELD_REF, TREE_TYPE (op), > -unshare_expr (p), op1, index); > + unshare_expr (p), op1, index); >gimple_assign_set_rhs1 (stmt, tem); >fold_stmt (gsi); >update_stmt (gsi_stmt (*gsi)); >return true; > } > >return false; > } > > /* Determine whether applying the 2 permutations (mask1 then mask2) > @@ -2682,40 +2681,40 @@ is_combined_permutation_identity (tree m > /* Combine a shuffle with its arguments. Returns 1 if there were any > changes made, 2 if cfg-cleanup needs to run. Else it returns 0. 
*/ > > static int > simplify_permutation (gimple_stmt_iterator *gsi) > { >gimple stmt = gsi_stmt (*gsi); >gimple def_stmt; >tree op0, op1, op2, op3, arg0, arg1; >enum tree_code code; > + bool single_use_op0 = false; > >gcc_checking_assert (gimple_assign_rhs_code (stmt) == VEC_PERM_EXPR); > >op0 = gimple_assign_rhs1 (stmt); >op1 = gimple_assign_rhs2 (stmt); >op2 = gimple_assign_rhs3 (stmt); > >if (TREE_CODE (op2) != VECTOR_CST) > return 0; > >if (TREE_CODE (op0) == VECTOR_CST) > { >code = VECTOR_CST; >arg0 = op0; > } >else if (TREE_CODE (op0) == SSA_NAME) > { > - def_stmt = SSA_NAME_DEF_STMT (op0); > - if (!def_stmt || !is_gimple_assign (def_stmt) > - || !can_propagate_from (def_stmt)) > + def_stmt = get_prop_source_stmt (op0, false, &single_use_op0); > + if (!def_stmt || !can_propagate_from (def_stmt)) > return 0; > >code = gimple_assign_rhs_code (def_stmt); >arg0 = gimple_assign_rhs1 (def_stmt); > } >else > return 0; > >/* Two consecutive shuffles. */ >if (code == VEC_PERM_EXPR) > @@ -2740,35 +2739,31 @@ simplify_permutation (gimple_stmt_iterat >return remove_prop_source_from_use (op0) ? 2 : 1; > } > >/* Shuffle of a constructor. */ >else if (code == CONSTRUCTOR || code == VECTOR_CST) > { >tree opt; >bool ret = false; >if (op0 != op1) > { > - if (TREE_CODE (op0) == SSA_NAME && !has_single_use (op0)) > + if (TREE_CODE (op0) == SSA_NAME && !single_use_op0) > return 0; > > if (TREE_CODE (op1) == VECTOR_CST) > arg1 = op1; > else if (TREE_CODE (op1) == SSA_NAME) > { > enum tree_code code2; > > - if (!has_single_use (op1)) > - return 0; > - > -
Re: [PATCH] Changes in mode switching
Vladimir Yakovlev wrote:
> I reproduced the failure and found reason of it. I understood haw it
> resolve and now I need small changes only - additional argument of
> EMIT_MODE_SET. Is it good fo trunk?
>
> Thank you,
> Vladimir
>
> 2012-09-14 Vladimir Yakovlev
>
> * (optimize_mode_switching): Added an argument EMIT_MODE_SET calls.
>
> * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.
>
> * config/i386/i386.h (EMIT_MODE_SET): Added an argument.
>
> * config/sh/sh.h (EMIT_MODE_SET): Added an argument.

No new failures on sh4-unknown-linux-gnu with your patch. Looks OK to me,
though I have no authority to approve anything except the SH-specific
part.

BTW, I guess that the active voice is usual in gcc/ChangeLog. Also,
perhaps due to a mailer issue, a tab should be used for indentation
instead of 8 spaces, and the empty line isn't required between items.
Maybe something like

	* mode-switching.c (optimize_mode_switching): Add an argument EMIT_MODE_SET calls.
	* config/epiphany/epiphany.h (EMIT_MODE_SET): Add an argument.
	* config/i386/i386.h (EMIT_MODE_SET): Likewise.
	* config/sh/sh.h (EMIT_MODE_SET): Likewise.

is the usual form.

Regards,
	kaz
Re: [PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases
On Fri, Sep 14, 2012 at 9:59 AM, Jakub Jelinek wrote:
> On Fri, Sep 14, 2012 at 09:51:48AM +0200, Tom de Vries wrote:
>> On 14/09/12 09:38, Jakub Jelinek wrote:
>> > On Fri, Sep 14, 2012 at 09:27:27AM +0200, Tom de Vries wrote:
>> >> * gcc.dg/tree-ssa/vrp81.c: New test.
>> >> * gcc.dg/tree-ssa/vrp81-2.c: Same.
>> >> * gcc.dg/tree-ssa/vrp82.c: Same.
>> >
>> > Why not vrp82.c, vrp83.c and vrp84.c (and rename the recently added
>> > vrp80-2.c test to vrp81.c)?
>> >
>>
>> My thinking behind this was the following: vrp80.c and vrp80-2.c are 2
>> versions of more or less the same code. In one version, we test whether
>> the inclusive bounds of the range are folded. In the other version we
>> test whether the exclusive bounds of the range are not folded.
>
> IMHO it is enough to give them consecutive numbers, there are many cases
> where multiple vrpNN.c tests have been added for more or less the same code,
> but I don't care that much, will leave that decision to Richard as the
> probable reviewer.

I agree with Jakub - the patch is ok with adjusting the testcase names.

Thanks,
Richard.

> Jakub
Re: [Patch ARM] Update the test case to differ movs and lsrs for ARM mode and non-ARM mode
Terry Guo wrote: > -/* { dg-final { scan-assembler "movs\tr\[0-9\]" } } */ > +/* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target arm_thumb2_ok } } > } */ > +/* { dg-final { scan-assembler "movs\tr\[0-9\]" { target { ! arm_thumb2_ok > } } } } */ This causes the arm.exp testcase to fail with a tcl error for me: ERROR: tcl error sourcing /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/arm.exp. ERROR: unmatched open brace in list while executing "foreach op $tmp { verbose "Processing option: $op" 3 set status [catch "$op" errmsg] if { $status != 0 } { if { 0 && [info exists errorInfo] }..." (procedure "saved-dg-test" line 75) invoked from within "saved-dg-test /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/combine-movs.c {} { -ansi -pedantic-errors}" ("eval" body line 1) invoked from within "eval saved-dg-test $args " (procedure "dg-test" line 10) invoked from within "dg-test $testcase $flags ${default-extra-flags}" (procedure "dg-runtest" line 10) invoked from within "dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ "" $DEFAULT_CFLAGS" (file "/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/arm.exp" line 37) invoked from within "source /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/arm.exp" ("uplevel" body line 1) invoked from within "uplevel #0 source /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/arm.exp" invoked from within "catch "uplevel #0 source $test_file_name"" which seems to be caused by a missing space between two closing braces. Fixed by the following patch. Committed to mainline as obvious. Bye, Ulrich ChangeLog: * gcc.target/arm/combine-movs.c: Add missing space. Index: gcc/testsuite/gcc.target/arm/combine-movs.c === *** gcc/testsuite/gcc.target/arm/combine-movs.c (revision 191254) --- gcc/testsuite/gcc.target/arm/combine-movs.c (working copy) *** void foo (unsigned long r[], unsigned in *** 9,13 r[i] = 0; } ! 
/* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target arm_thumb2_ok } }} */ /* { dg-final { scan-assembler "movs\tr\[0-9\]" { target { ! arm_thumb2_ok} } } } */ --- 9,13 r[i] = 0; } ! /* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target arm_thumb2_ok } } } */ /* { dg-final { scan-assembler "movs\tr\[0-9\]" { target { ! arm_thumb2_ok} } } } */ -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [PATCH 0/4] slim-lto-bootstrap.mk build-target support
On 2012.09.13 at 14:51 -0700, Andi Kleen wrote:
> Markus Trippelsdorf writes:
>
> > Because there is no enthusiastic support for a full libtool update,
> > here is a minimal version that adds a new slim-lto-bootstrap
> > build-config.
>
> Can you split the two patches? libtool and ltmain? Thanks for extracting
> those out.

I've split them into four.

> Looks good to me, but eventually this should be just the default for
> lto-bootstrap

Yes, but let's test the new build-config a little bit first.

BTW, I was wondering if I should add:

AR_FOR_TARGET = gcc-ar
NM_FOR_TARGET = gcc-nm
RANLIB_FOR_TARGET = gcc-ranlib

to the build target.

If the patches look OK, it would be nice if someone could commit them,
because I have no access.

Thanks.

-- 
Markus
Re: [PATCH 1/4] Add slim-lto support to libtool.m4
Credits go to Ralf Wildenhues. 2012-09-13 Markus Trippelsdorf * libtool.m4 : Handle slim-lto objects diff --git a/libtool.m4 b/libtool.m4 index a7f99ac..5754fb1 100644 --- a/libtool.m4 +++ b/libtool.m4 @@ -3434,6 +3434,7 @@ for ac_symprfx in "" "_"; do else lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[[ ]]\($symcode$symcode*\)[[ ]][[ ]]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'" fi + lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | sed '/ __gnu_lto/d'" # Check to see that the pipe works correctly. pipe_works=no @@ -4451,7 +4452,7 @@ _LT_EOF if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null \ && test "$tmp_diet" = no then - tmp_addflag= + tmp_addflag=' $pic_flag' tmp_sharedflag='-shared' case $cc_basename,$host_cpu in pgcc*) # Portland Group C compiler @@ -5517,8 +5518,8 @@ if test "$_lt_caught_CXX_error" != yes; then # Check if GNU C++ uses GNU ld as the underlying linker, since the # archiving commands below assume that GNU ld is being used. if test "$with_gnu_ld" = yes; then -_LT_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib' -_LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' +_LT_TAGVAR(archive_cmds, $1)='$CC $pic_flag -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib' +_LT_TAGVAR(archive_expsym_cmds, $1)='$CC $pic_flag -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' _LT_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic' @@ -6495,6 +6496,13 @@ public class foo { }; _LT_EOF ]) + +_lt_libdeps_save_CFLAGS=$CFLAGS +case "$CC 
$CFLAGS " in #( +*\ -flto*\ *) CFLAGS="$CFLAGS -fno-lto" ;; +*\ -fwhopr*\ *) CFLAGS="$CFLAGS -fno-whopr" ;; +esac + dnl Parse the compiler output and extract the necessary dnl objects, libraries and library flags. if AC_TRY_EVAL(ac_compile); then @@ -6543,6 +6551,7 @@ if AC_TRY_EVAL(ac_compile); then fi ;; +*.lto.$objext) ;; # Ignore GCC LTO objects *.$objext) # This assumes that the test object file only shows up # once in the compiler output. @@ -6578,6 +6587,7 @@ else fi $RM -f confest.$objext +CFLAGS=$_lt_libdeps_save_CFLAGS # PORTME: override above test o
Re: [PATCH 2/4] Add slim-lto support to ltmain.sh
Credits go to Ralf Wildenhues. 2012-09-13 Markus Trippelsdorf * ltmain.sh: Handle lto options diff --git a/ltmain.sh b/ltmain.sh index a03433f..2e09101 100644 --- a/ltmain.sh +++ b/ltmain.sh @@ -4980,7 +4980,8 @@ func_mode_link () # @file GCC response files # -tp=* Portland pgcc target processor selection -64|-mips[0-9]|-r[0-9][0-9]*|-xarch=*|-xtarget=*|+DA*|+DD*|-q*|-m*| \ - -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*) + -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*| \ + -O*|-flto*|-fwhopr|-fuse-linker-plugin) func_quote_for_eval "$arg" arg="$func_quote_for_eval_result" func_append com
Re: [PATCH 3/4] Pass CFLAGS to fixincludes
2012-09-13 Markus Trippelsdorf * Makefile.in (configure-(build-)fixincludes): Pass CFLAGS diff --git a/Makefile.in b/Makefile.in index 0108162..891168d 100644 --- a/Makefile.in +++ b/Makefile.in @@ -2835,6 +2835,7 @@ configure-build-fixincludes: test ! -f $(BUILD_SUBDIR)/fixincludes/Makefile || exit 0; \ $(SHELL) $(srcdir)/mkinstalldirs $(BUILD_SUBDIR)/fixincludes ; \ $(BUILD_EXPORTS) \ + CFLAGS="$(STAGE_CFLAGS)"; export CFLAGS; \ echo Configuring in $(BUILD_SUBDIR)/fixincludes; \ cd "$(BUILD_SUBDIR)/fixincludes" || exit 1; \ case $(srcdir) in \ @@ -2870,6 +2871,7 @@ all-build-fixincludes: configure-build-fixincludes $(BUILD_EXPORTS) \ (cd $(BUILD_SUBDIR)/fixincludes && \ $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_BUILD_FLAGS) \ + CFLAGS="$(STAGE_CFLAGS)" \ $(TARGET-build-fixincludes)) @endif build-fixincludes @@ -7745,6 +7747,7 @@ configure-fixincludes: test ! -f $(HOST_SUBDIR)/fixincludes/Makefile || exit 0; \ $(SHELL) $(srcdir)/mkinstalldirs $(HOST_SUBDIR)/fixincludes ; \ $(HOST_EXPORTS) \ + CFLAGS="$(STAGE_CFLAGS)"; export CFLAGS; \ echo Configuring in $(HOST_SUBDIR)/fixincludes; \ cd "$(HOST_SUBDIR)/fixincludes" || exit 1; \ case $(srcdir) in \ @@ -7779,6 +7782,7 @@ all-fixincludes: configure-fixincludes $(HOST_EXPORTS) \ (cd $(HOST_SUBDIR)/fixincludes && \ $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) \ + CFLAGS="$(STAGE_CFLAGS)" \ $(TARGET-fixincludes)) @endif fixincludes -- Markus
Re: [PATCH 4/4] Add new slim-lto-bootstrap.mk build-target
2012-09-13 Markus Trippelsdorf * config/slim-lto-bootstrap.mk: new build-config diff --git a/config/slim-lto-bootstrap.mk b/config/slim-lto-bootstrap.mk new file mode 100644 index 000..11d1252 --- /dev/null +++ b/config/slim-lto-bootstrap.mk @@ -0,0 +1,9 @@ +# This option enables slim LTO for stage2 and stage3. + +STAGE2_CFLAGS += -flto=jobserver -fno-fat-lto-objects -frandom-seed=1 +STAGE3_CFLAGS += -flto=jobserver -fno-fat-lto-objects -frandom-seed=1 +STAGE_CFLAGS += -fuse-linker-plugin +STAGEprofile_CFLAGS += -fno-lto +AR = gcc-ar +NM = gcc-nm +RANLIB = gcc-ranlib -- Markus
Re: [PATCH] Combine location with block using block_locations
On 2012-09-14 04:59, Eric Botcazou wrote:
>> I think it's going to make GCC harder to maintain if we drop the -g0 vs. -g no-code-difference requirement for just some optimization levels.
> Seconded, this is surely going to open yet another can of worms.

Agreed.

Diego.
Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT
On Fri, Sep 14, 2012 at 11:43 AM, Richard Guenther wrote:

>> > Any reason why you didn't just re-use the tree-ssa-live machinery? >> >> Probably I didn't know about it or didn't want to keep the full life >> problem life (it tries to free things as soon as possible).

I think it'd be good to use the tree-ssa-live stuff instead of a local liveness dataflow solver, even if that may allocate a bit of extra memory. tree-ssa-live is very smart about how liveness is computed. I'll experiment with using it in tree-vrp.

>> > And any reason why you don't let a DEF kill a live SSA name? AFAICT >> > you're exposing all SSA names up without ever killing a USE :-) >> >> Eh ;) We also should traverse blocks backward I suppose. Also >> the RPO traversal suffers from the same issue I noticed in PRE >> and for what I invented my_rev_post_order_compute ... >> (pre_and_rev_post_order_compute doesn't compute an optimal >> reverse post order).

Eh, what do you mean with "optimal" here?

Yikes, I didn't know about my_rev_post_order_compute. How horrible! That function doesn't compute reverse post-order of the CFG, but a post-order of the reverse CFG!

> The following not. Queued for testing (though I doubt it will > help memory usage due to the use of sbitmaps). I think > jump threading special-casing asserts and the equiv bitmaps are > the real problem of VRP. What testcase did you notice the live > issue on?

I don't remember exactly what test case it was, but it had a large switch in it. Maybe Brad Lucier's scheme interpreter, but I'm not sure.

Ciao! Steven
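The remark about letting a DEF kill a live SSA name can be illustrated with a backward scan over a single block. This is only a minimal sketch, not GCC code: the `stmt` struct, the bitmask representation of names, and `live_on_entry` are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statement: each SSA name is one bit in a 64-bit mask.  */
struct stmt
{
  uint64_t def;   /* bit of the name defined here, or 0 */
  uint64_t uses;  /* bits of the names read here */
};

/* Walk the block backward.  A definition kills liveness of its name,
   a use makes the used names live above the statement.  */
static uint64_t
live_on_entry (const struct stmt *stmts, size_t n, uint64_t live_out)
{
  uint64_t live = live_out;
  for (size_t i = n; i-- > 0; )
    {
      live &= ~stmts[i].def;
      live |= stmts[i].uses;
    }
  return live;
}
```

Without the `live &= ~stmts[i].def` step every name that is live anywhere below would stay live all the way up, which appears to be the situation the question describes.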
Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT
On Fri, 14 Sep 2012, Steven Bosscher wrote:

> On Fri, Sep 14, 2012 at 11:43 AM, Richard Guenther wrote: > >> > Any reason why you didn't just re-use the tree-ssa-live machinery? > >> > >> Probably I didn't know about it or didn't want to keep the full life > >> problem life (it tries to free things as soon as possible). > > I think it'd be good to use the tree-ssa-live stuff instead of a local > liveness dataflow solver, even if that may allocate a bit of extra > memory. tree-ssa-live is very smart about how liveness is computed. > I'll experiment with using it in tree-vrp.

Ok. Note that VRP likes to know liveness at a given stmt (but actually doesn't use it - it would, with my patch) here:

    /* If OP is used only once, namely in this STMT, don't bother creating an ASSERT_EXPR for it. Such an ASSERT_EXPR would do nothing but increase compile time. */
    if (!has_single_use (op))

that really wants to know if anything would consume the assert, thus if OP is live below stmt.

> >> > And any reason why you don't let a DEF kill a live SSA name? AFAICT > >> > you're exposing all SSA names up without ever killing a USE :-) > >> > >> Eh ;) We also should traverse blocks backward I suppose. Also > >> the RPO traversal suffers from the same issue I noticed in PRE > >> and for what I invented my_rev_post_order_compute ... > >> (pre_and_rev_post_order_compute doesn't compute an optimal > >> reverse post order). > > Eh, what do you mean with "optimal" here? > > Yikes, I didn't know about my_rev_post_order_compute. How horrible! > That function doesn't compute reverse post-order of the CFG, but a > post-order of the reverse CFG!

Ok, well - then that's what we need for compute_antic to have minimal number of iterations and it is what VRP needs. Visit all successors before BB if possible. ;)

> > > The following not. Queued for testing (though I doubt it will > > help memory usage due to the use of sbitmaps). I think > > jump threading special-casing asserts and the equiv bitmaps are > > the real problem of VRP. What testcase did you notice the live > > issue on? > > I don't remember exactly what test case it was, but it had a large > switch in it. Maybe Brad Lucier's scheme interpreter, but I 'm not > sure.

I see.

Richard.
[PATCH] Fix PR54565
This, as suggested in the PR arranges update_address_taken to be run before forwprop. The patch simply moves it to after CCP instead of before alias computation. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2012-09-14 Richard Guenther PR tree-optimization/54565 * passes.c (init_optimization_passes): Adjust comments. (execute_function_todo): Do not execute execute_update_addresses_taken before processing TODO_rebuild_alias. * tree-ssa-ccp.c (do_ssa_ccp): Schedule TODO_update_address_taken. * gcc.dg/tree-ssa/ssa-ccp-17.c: Adjust. * gcc.dg/tree-ssa/forwprop-6.c: Likewise. Remove XFAIL. Index: gcc/passes.c === --- gcc/passes.c(revision 191289) +++ gcc/passes.c(working copy) @@ -1297,11 +1297,11 @@ init_optimization_passes (void) NEXT_PASS (pass_remove_cgraph_callee_edges); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_ccp); + /* After CCP we rewrite no longer addressed locals into SSA +form if possible. */ NEXT_PASS (pass_forwprop); /* pass_build_ealias is a dummy pass that ensures that we -execute TODO_rebuild_alias at this point. Re-building -alias information also rewrites no longer addressed -locals into SSA form if possible. */ +execute TODO_rebuild_alias at this point. */ NEXT_PASS (pass_build_ealias); NEXT_PASS (pass_sra_early); NEXT_PASS (pass_fre); @@ -1371,11 +1371,11 @@ init_optimization_passes (void) NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_complete_unrolli); NEXT_PASS (pass_ccp); + /* After CCP we rewrite no longer addressed locals into SSA +form if possible. */ NEXT_PASS (pass_forwprop); /* pass_build_alias is a dummy pass that ensures that we -execute TODO_rebuild_alias at this point. Re-building -alias information also rewrites no longer addressed -locals into SSA form if possible. */ +execute TODO_rebuild_alias at this point. 
*/ NEXT_PASS (pass_build_alias); NEXT_PASS (pass_return_slot); NEXT_PASS (pass_phiprop); @@ -1414,6 +1414,8 @@ init_optimization_passes (void) NEXT_PASS (pass_object_sizes); NEXT_PASS (pass_strlen); NEXT_PASS (pass_ccp); + /* After CCP we rewrite no longer addressed locals into SSA +form if possible. */ NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_cse_sincos); NEXT_PASS (pass_optimize_bswap); @@ -1773,13 +1775,10 @@ execute_function_todo (void *data) cfun->last_verified &= ~TODO_verify_ssa; } - if (flags & TODO_rebuild_alias) -{ - execute_update_addresses_taken (); - if (flag_tree_pta) - compute_may_aliases (); -} - else if (optimize && (flags & TODO_update_address_taken)) + if (flag_tree_pta && (flags & TODO_rebuild_alias)) +compute_may_aliases (); + + if (optimize && (flags & TODO_update_address_taken)) execute_update_addresses_taken (); if (flags & TODO_remove_unused_locals) Index: gcc/tree-ssa-ccp.c === --- gcc/tree-ssa-ccp.c (revision 191289) +++ gcc/tree-ssa-ccp.c (working copy) @@ -2105,7 +2105,8 @@ do_ssa_ccp (void) ccp_initialize (); ssa_propagate (ccp_visit_stmt, ccp_visit_phi_node); if (ccp_finalize ()) -todo = (TODO_cleanup_cfg | TODO_update_ssa | TODO_remove_unused_locals); +todo = (TODO_cleanup_cfg | TODO_update_ssa | TODO_remove_unused_locals + | TODO_update_address_taken); free_dominance_info (CDI_DOMINATORS); return todo; } Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-17.c === --- gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-17.c (revision 191289) +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-17.c (working copy) @@ -26,7 +26,7 @@ int foobar(void) return ((const struct Foo *)p)->i; } -/* { dg-final { scan-tree-dump "= i;" "ccp1" } } */ +/* { dg-final { scan-tree-dump "= i_.;" "ccp1" } } */ /* { dg-final { scan-tree-dump "= f.i;" "ccp1" } } */ /* { dg-final { scan-tree-dump "= g.i;" "ccp1" } } */ /* { dg-final { cleanup-tree-dump "ccp1" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-6.c === --- gcc/testsuite/gcc.dg/tree-ssa/forwprop-6.c (revision 191289) 
+++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-6.c (working copy) @@ -22,6 +22,7 @@ void f(void) particular situation before doing this transformation we have to assure that a is killed by a dominating store via type float for it to be valid. Then we might as well handle the situation by - value-numbering, removing the load alltogether. */ -/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 1 "forwprop1" { xf
Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT
On Fri, 14 Sep 2012, Richard Guenther wrote: > As for the equiv sets - yes, that's known. I wanted to investigate > at some point what happens if we instead record the SSA name we > registered the assert for (thus look up a chain of lattice values > instead of recording all relevant entries in a bitmap). ISTR there > were some correctness issues, but if we restrict the chaining maybe > we can get a good compromise here. Actually the only users are compare_name_with_value and compare_names. The idea with a simple chain, thus struct value_range_d { ... /* SSA name whose value range is equivalent to this one. */ tree equiv; }; is only non-trivial for intersections (thus PHI nodes). Still there the equivalent range is the nearest dominating one from the chains of all PHI arguments. I can only think of the way we do iteration that would mess up the chains (but not sure if we need to require them to be in dom order). Probably best to try implement the above alongside the bitmaps and compare the results in compare_name_with_value / compare_names. Richard.
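The chain idea can be sketched as a walk over `equiv` links instead of a bitmap test. The code below is a toy model under stated assumptions: `toy_range` stands in for `value_range_d`, names are plain integers rather than trees, the chain is assumed to be acyclic, and `chain_contains` is a hypothetical helper, not an existing GCC function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for value_range_d; only the proposed chain link is modeled.  */
struct toy_range
{
  int name;                       /* hypothetical SSA name id */
  const struct toy_range *equiv;  /* range this one is equivalent to, or NULL */
};

/* Return true if NAME appears on the equivalence chain starting at VR.
   Assumes the chain is acyclic.  */
static bool
chain_contains (const struct toy_range *vr, int name)
{
  for (; vr != NULL; vr = vr->equiv)
    if (vr->name == name)
      return true;
  return false;
}
```

A lookup costs one step per chain link, whereas a bitmap answers membership directly but has to be stored per range, which is roughly the trade-off the message weighs.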
Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable
On Fri, Sep 14, 2012 at 2:41 AM, Jakub Jelinek wrote: > On Thu, Sep 13, 2012 at 07:46:52PM -0700, H.J. Lu wrote: >> There is no reason why --eh-frame-hdr can't be used with static >> executable on Linux. This patch enables --eh-frame-hdr for static > > Well, there is. For more than 2 years after the addition of --eh-frame-hdr > support dl_iterate_phdr in libc.a would simply always fail, you aren't > adding any kind of check that old glibc (2001-2003ish) isn't used. > Even in newer glibcs, it relies on AT_* aux vector values provided by the > kernel, if they are not provided for whatever reason, it would fail. > > Jakub It was implemented in http://sourceware.org/ml/libc-alpha/2003-10/msg00098.html for glibc 2.3.0 and we can check AT_PHDR: 0x400040 AT_PHNUM:10 with LD_SHOW_AUXV. -- H.J.
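The aux vector entries mentioned above can also be read from inside a program. The following is a Linux-specific sketch, assuming a libc that provides `getauxval` (in glibc that means 2.16 or later, newer than the versions discussed in this thread) and assuming each program header entry has the size of `ElfW(Phdr)`. `find_eh_frame_hdr` is a hypothetical helper written for this illustration.

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Return the PT_GNU_EH_FRAME program header of the running executable,
   or NULL if the kernel passed no AT_PHDR or no such segment exists.  */
static const ElfW(Phdr) *
find_eh_frame_hdr (void)
{
  const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) getauxval (AT_PHDR);
  unsigned long phnum = getauxval (AT_PHNUM);

  if (phdr == NULL)
    return NULL;
  for (unsigned long i = 0; i < phnum; i++)
    if (phdr[i].p_type == PT_GNU_EH_FRAME)
      return &phdr[i];
  return NULL;
}
```

If a loader does not pass the aux vector down, `getauxval (AT_PHDR)` returns 0 and the search fails, which is the kind of failure raised as a concern in the quoted text.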
Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)
On Thu, Sep 13, 2012 at 10:42 AM, Uros Bizjak wrote: > On Thu, Sep 13, 2012 at 5:52 PM, Jakub Jelinek wrote: > >> The fma-*.c testcase show that these intrinsics probably mean to preserve >> the high elements (other than the lowest) of the first argument of the >> fmaintrin.h *_s{s,d} intrinsics in the destination (the HW insn preserve >> there the destination register, but that varies - for 132 and 213 it is the >> first one (but the negation performed for _mm_fnm*_s[sd] breaks it anyway), >> for 231 it is the last one). What the expander did was to put there >> an uninitialized pseudo, so we ended up with pretty random content, before >> H.J's http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190492 it happened >> to work by accident, but when things changed slightly and reload chose >> different alternative, this broke. >> >> The following patch fixes it, by tweaking the header so that the first >> argument is not negated (we negate the second one instead), as we don't want >> to negate the high elements if e.g. for whatever reason combiner doesn't >> match it. It fixes the expander to use a dup of the X operand as the high >> element provider for the pattern, removes the 231 alternatives (because >> those provide different destination high elements) and removes commutative >> marker (again, that would mean different high elements). > > Can we introduce additional "*fmai_fmadd__1" pattern (and > others) that would cover missing 231 alternative? > >> 2012-09-13 Jakub Jelinek >> >> PR target/54564 >> * config/i386/sse.md (fmai_vmfmadd_): Use (match_dup 1) >> instead of (match_dup 0) as second argument to vec_merge. >> (*fmai_fmadd_, *fmai_fmsub_): Likewise. >> Remove third alternative. >> (*fmai_fnmadd_, *fmai_fnmsub_): Likewise. Negate >> operand 2 instead of operand 1, but put it as first argument >> of fma. >> >> * config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss, >> _mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead >> of the first. 
> > OK, but header change should be also reviewed by H.J. > It looks OK to me. Thanks. -- H.J.
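The effect of negating the second argument instead of the first can be shown with a plain C model of the scalar intrinsics. This is a sketch only: it models the semantics described in the message (low element computed, upper elements taken from the first argument), uses a separate multiply and add rather than a fused operation, and all names (`v4`, `model_fmadd_ss`, and so on) are made up for the example.

```c
/* Scalar model of a 4 x float vector; element 0 is the low element.  */
struct v4 { float e[4]; };

/* Negate every element, as a vector negation would.  */
static struct v4
negate (struct v4 x)
{
  for (int i = 0; i < 4; i++)
    x.e[i] = -x.e[i];
  return x;
}

/* Model of the *_ss fmadd: low element is a*b+c, upper elements
   are copied from the first argument.  */
static struct v4
model_fmadd_ss (struct v4 a, struct v4 b, struct v4 c)
{
  struct v4 r = a;
  r.e[0] = a.e[0] * b.e[0] + c.e[0];
  return r;
}

/* fnmadd with the first argument negated: the upper elements
   of the result end up negated as well.  */
static struct v4
model_fnmadd_ss_negate_first (struct v4 a, struct v4 b, struct v4 c)
{
  return model_fmadd_ss (negate (a), b, c);
}

/* fnmadd with the second argument negated: the upper elements
   of the first argument pass through unchanged.  */
static struct v4
model_fnmadd_ss_negate_second (struct v4 a, struct v4 b, struct v4 c)
{
  return model_fmadd_ss (a, negate (b), c);
}
```

Both variants give the same low element, so the difference only shows up in the upper three elements of the result.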
Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT
On Fri, Sep 14, 2012 at 12:50 PM, Richard Guenther wrote: >> Yikes, I didn't know about my_rev_post_order_compute. How horrible! >> That function doesn't compute reverse post-order of the CFG, but a >> post-order of the reverse CFG! > > Ok, well - then that's what we need for compute_antic to have > minimal number of iterations and it is what VRP needs. Visit > all successors before BB if possible. Right, visit all successors of BB before BB itself, aka visiting in topological order of the reverse CFG. But your my_rev_post_order_compute doesn't actually compute a post-order of the reverse CFG. The first block pushed into the array is EXIT_BLOCK, iff include_entry_exit==true. Fortunately, the function is only ever called with include_entry_exit==false. Ciao! Steven
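The difference between a reverse post-order of the CFG and a post-order of the reverse CFG can be seen on a four-block diamond. The sketch below is not GCC code; the adjacency matrix, block numbering, and function names are assumptions chosen for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_BLOCKS 4

/* Hypothetical diamond CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
   edge[a][b] is true when there is an edge a -> b.  */
static const bool edge[N_BLOCKS][N_BLOCKS] = {
  { false, true,  true,  false },
  { false, false, false, true  },
  { false, false, false, true  },
  { false, false, false, false },
};

/* Depth-first walk from BB; BB is appended to ORDER after everything
   reached from it.  With REVERSE set, edges are followed backward,
   that is, the walk runs on the reverse CFG.  */
static void
dfs_post_order (int bb, bool reverse, bool *visited, int *order, size_t *n)
{
  visited[bb] = true;
  for (int next = 0; next < N_BLOCKS; next++)
    {
      bool has_edge = reverse ? edge[next][bb] : edge[bb][next];
      if (has_edge && !visited[next])
        dfs_post_order (next, reverse, visited, order, n);
    }
  order[(*n)++] = bb;
}

/* Reverse post-order of the CFG, starting at the entry block 0.  */
static void
cfg_rpo (int *order)
{
  bool visited[N_BLOCKS] = { false };
  int post[N_BLOCKS];
  size_t n = 0;
  dfs_post_order (0, false, visited, post, &n);
  for (size_t i = 0; i < n; i++)
    order[i] = post[n - 1 - i];
}

/* Post-order of the reverse CFG, starting at the exit block 3.  */
static void
reverse_cfg_post_order (int *order)
{
  bool visited[N_BLOCKS] = { false };
  size_t n = 0;
  dfs_post_order (N_BLOCKS - 1, true, visited, order, &n);
}
```

On this acyclic example the two orders differ in how they rank blocks 1 and 2, and reading the second array from the end visits every block after its successors.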
Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable
On Fri, Sep 14, 2012 at 05:12:19AM -0700, H.J. Lu wrote: > On Fri, Sep 14, 2012 at 2:41 AM, Jakub Jelinek wrote: > > Well, there is. For more than 2 years after the addition of --eh-frame-hdr > > support dl_iterate_phdr in libc.a would simply always fail, you aren't > > adding any kind of check that old glibc (2001-2003ish) isn't used. > > Even in newer glibcs, it relies on AT_* aux vector values provided by the > > kernel, if they are not provided for whatever reason, it would fail. > It was implemented in > > http://sourceware.org/ml/libc-alpha/2003-10/msg00098.html > > for glibc 2.3.0 and we can check Yeah, I know, but that is still later than 2001 when it was implemented for dynamically linked executables. USE_PT_GNU_EH_FRAME is defined even for glibc 2.2.something (if DT_CONFIG macro is defined in headers). > AT_PHDR: 0x400040 > AT_PHNUM:10 > > with LD_SHOW_AUXV. I was worried about some loaders that wouldn't pass the aux vector down. E.g. valgrind's loader does, but perhaps others wouldn't need to. Anyway, IMHO statically linked binaries aren't something one should spend too much time on, they shouldn't be used (with very few exceptions) at all. Jakub
Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT
On Fri, 14 Sep 2012, Steven Bosscher wrote: > On Fri, Sep 14, 2012 at 12:50 PM, Richard Guenther wrote: > >> Yikes, I didn't know about my_rev_post_order_compute. How horrible! > >> That function doesn't compute reverse post-order of the CFG, but a > >> post-order of the reverse CFG! > > > > Ok, well - then that's what we need for compute_antic to have > > minimal number of iterations and it is what VRP needs. Visit > > all successors before BB if possible. > > Right, visit all successors of BB before BB itself, aka visiting in > topological order of the reverse CFG. But your > my_rev_post_order_compute doesn't actually compute a post-order of the > reverse CFG. The first block pushed into the array is EXIT_BLOCK, iff > include_entry_exit==true. Fortunately, the function is only ever > called with include_entry_exit==false. Oops. If you can figure out a better name for the function we should probably move it to cfganal.c Richard.
[C++ Patch] PR 54575
Hi, here we crash because strip_typedefs while processing _RequireInputIter calls make_typename_type which returns error_mark_node (# line 3281). Thus I'm simply checking for result == error_mark_node right after the switch (in the switch finish_decltype_type could also return error_mark_node, for example) and before handling the alignment (where we are crashing now). Issue seems rather straightforward. Tested x86_64-linux. Thanks, Paolo. PS: I'm also attaching a patchlet for a couple of hard errors in make_typename_type not protected by (complain & tf_error) spotted in make_typename_type. / /cp 2012-09-14 Paolo Carlini PR c++/54575 * tree.c (strip_typedefs): Check result for error_mark_node. * pt.c (canonicalize_type_argument): Check strip_typedefs return value for error_mark_node. /testsuite 2012-09-14 Paolo Carlini PR c++/54575 * g++.dg/cpp0x/pr54575.C: New. Index: testsuite/g++.dg/cpp0x/pr54575.C === --- testsuite/g++.dg/cpp0x/pr54575.C(revision 0) +++ testsuite/g++.dg/cpp0x/pr54575.C(revision 0) @@ -0,0 +1,27 @@ +// PR c++/54575 +// { dg-do compile { target c++11 } } + +template +struct is_convertible { static const bool value = true; }; + +template struct enable_if { }; +template<> struct enable_if { typedef int type; }; + +template +using _RequireInputIter += typename enable_if::value>::type; + +template +struct X +{ + template> +void insert(_InputIterator) {} +}; + +template +void foo() +{ + X subdomain_indices; + subdomain_indices.insert(0); +} Index: cp/tree.c === --- cp/tree.c (revision 191290) +++ cp/tree.c (working copy) @@ -1210,8 +1210,10 @@ strip_typedefs (tree t) break; } + if (result == error_mark_node) +return error_mark_node; if (!result) - result = TYPE_MAIN_VARIANT (t); +result = TYPE_MAIN_VARIANT (t); if (TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (result) || TYPE_ALIGN (t) != TYPE_ALIGN (result)) { Index: cp/pt.c === --- cp/pt.c (revision 191290) +++ cp/pt.c (working copy) @@ -6114,6 +6114,8 @@ canonicalize_type_argument (tree arg, tsubst_flags 
return arg; mv = TYPE_MAIN_VARIANT (arg); arg = strip_typedefs (arg); + if (arg == error_mark_node) +return error_mark_node; if (TYPE_ALIGN (arg) != TYPE_ALIGN (mv) || TYPE_ATTRIBUTES (arg) != TYPE_ATTRIBUTES (mv)) { 2012-09-14 Paolo Carlini * decl.c (make_typename_type): Only error out if tf_error is set in complain. Index: decl.c === --- decl.c (revision 191290) +++ decl.c (working copy) @@ -3235,13 +3235,15 @@ make_typename_type (tree context, tree name, enum name = TREE_OPERAND (fullname, 0) = DECL_NAME (name); else if (TREE_CODE (name) == OVERLOAD) { - error ("%qD is not a type", name); + if (complain & tf_error) + error ("%qD is not a type", name); return error_mark_node; } } if (TREE_CODE (name) == TEMPLATE_DECL) { - error ("%qD used without template parameters", name); + if (complain & tf_error) + error ("%qD used without template parameters", name); return error_mark_node; } gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
Re: vector comparisons in C++
On 09/13/2012 07:37 PM, Marc Glisse wrote: Looks like a latent bug in fold_unary. The following seems to work in this case. Looks good. Jason
Re: vector comparisons in C++
Here is the patch I just tested. Changes compared to the previous patch include: * same_type_ignoring_top_level_qualifiers_p * build_vector_type: don't use an opaque vector for the return type of operator< (not sure what the point was of making it opaque?) * Disable BIT_AND -> TRUTH_AND optimization for vectors * Disable (type)(a (a a<=b, use a vector type, not boolean * 2 more testcases (through which I discovered the issues) I am sure there are other optimizations that do weird things, but I should stop increasing the size of this patch... I'll update the doc in a later patch. Ok? 2012-09-14 Marc Glisse PR c++/54427 gcc/ChangeLog * fold-const.c (fold_unary_loc): Disable for VECTOR_TYPE. (fold_binary_loc): Likewise. * gimple-fold.c (and_comparisons_1): Handle VECTOR_TYPE. (or_comparisons_1): Likewise. gcc/cp/ChangeLog * typeck.c (cp_build_binary_op) [LSHIFT_EXPR, RSHIFT_EXPR, EQ_EXPR, NE_EXPR, LE_EXPR, GE_EXPR, LT_EXPR, GT_EXPR]: Handle VECTOR_TYPE. gcc/testsuite/ChangeLog * g++.dg/other/vector-compare.C: New testcase. * gcc/testsuite/c-c++-common/vector-compare-3.c: New testcase. * gcc.dg/vector-shift.c: Move ... * c-c++-common/vector-shift.c: ... here. * gcc.dg/vector-shift1.c: Move ... * c-c++-common/vector-shift1.c: ... here. * gcc.dg/vector-shift3.c: Move ... * c-c++-common/vector-shift3.c: ... here. * gcc.dg/vector-compare-1.c: Move ... * c-c++-common/vector-compare-1.c: ... here. * gcc.dg/vector-compare-2.c: Move ... * c-c++-common/vector-compare-2.c: ... here. * gcc.c-torture/execute/vector-compare-1.c: Move ... * c-c++-common/torture/vector-compare-1.c: ... here. * gcc.c-torture/execute/vector-compare-2.x: Delete. * gcc.c-torture/execute/vector-compare-2.c: Move ... * c-c++-common/torture/vector-compare-2.c: ... here. * gcc.c-torture/execute/vector-shift.c: Move ... * c-c++-common/torture/vector-shift.c: ... here. * gcc.c-torture/execute/vector-shift2.c: Move ... * c-c++-common/torture/vector-shift2.c: ... here. 
* gcc.c-torture/execute/vector-subscript-1.c: Move ... * c-c++-common/torture/vector-subscript-1.c: ... here. * gcc.c-torture/execute/vector-subscript-2.c: Move ... * c-c++-common/torture/vector-subscript-2.c: ... here. * gcc.c-torture/execute/vector-subscript-3.c: Move ... * c-c++-common/torture/vector-subscript-3.c: ... here. -- Marc GlisseIndex: gcc/testsuite/g++.dg/other/vector-compare.C === --- gcc/testsuite/g++.dg/other/vector-compare.C (revision 0) +++ gcc/testsuite/g++.dg/other/vector-compare.C (revision 0) @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-std=gnu++11 -Wall" } */ + +// Check that we can compare vector types that really are the same through +// typedefs. + +typedef float v4f __attribute__((vector_size(4*sizeof(float; + +template void eat (T&&) {} + +template +struct Vec +{ + typedef T type __attribute__((vector_size(4*sizeof(T; + + template + static void fun (type const& t, U& u) { eat (t > u); } +}; + +long long +f (v4f *x, v4f const *y) +{ + return ((*x < *y) | (*x <= *y))[2]; +} + +int main () +{ + v4f x = {0,1,2,3}; + typedef decltype (x < x) v4i; + v4i y = {4,5,6,7}; // v4i is not opaque + Vec::type f = {-1,5,2,3.1}; + v4i c = (x == f) == y; + eat (c); + Vec::fun (f, x); + Vec::fun (x, f); + Vec::fun (f, f); + Vec::fun (x, x); +} Property changes on: gcc/testsuite/g++.dg/other/vector-compare.C ___ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: gcc/testsuite/c-c++-common/vector-compare-3.c === --- gcc/testsuite/c-c++-common/vector-compare-3.c (revision 0) +++ gcc/testsuite/c-c++-common/vector-compare-3.c (revision 0) @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +typedef int v4i __attribute__((vector_size(4*sizeof(int; + +// fold should not turn (vec_other)(x *y; +} + Property changes on: gcc/testsuite/c-c++-common/vector-compare-3.c ___ Added: svn:eol-style + native Added: svn:keywords + Author Date Id Revision URL Index: 
gcc/testsuite/c-c++-common/vector-shift.c === --- gcc/testsuite/c-c++-common/vector-shift.c (revision 190834) +++ gcc/testsuite/c-c++-common/vector-shift.c (working copy) @@ -1,11 +1,12 @@ /* { dg-do compile } */ +/* { dg-prune-output "in evaluation of" } */ #define vector(elcount, type) \ __attribute__((vector_size((elcount)*sizeof(type type int main (int argc, char *argv[]) { vector(4,char)
Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable
On Fri, Sep 14, 2012 at 5:26 AM, Jakub Jelinek wrote: > On Fri, Sep 14, 2012 at 05:12:19AM -0700, H.J. Lu wrote: >> On Fri, Sep 14, 2012 at 2:41 AM, Jakub Jelinek wrote: >> > Well, there is. For more than 2 years after the addition of --eh-frame-hdr >> > support dl_iterate_phdr in libc.a would simply always fail, you aren't >> > adding any kind of check that old glibc (2001-2003ish) isn't used. >> > Even in newer glibcs, it relies on AT_* aux vector values provided by the >> > kernel, if they are not provided for whatever reason, it would fail. > >> It was implemented in >> >> http://sourceware.org/ml/libc-alpha/2003-10/msg00098.html >> >> for glibc 2.3.0 and we can check > > Yeah, I know, but that is still later than 2001 when it was implemented for > dynamically linked executables. > USE_PT_GNU_EH_FRAME is defined even for glibc 2.2.something (if DT_CONFIG > macro is defined in headers). Here is a patch to add an option to use --eh-frame-hdr on static. It won't enable it for uclibc since it is > glibc 2.2. >> AT_PHDR: 0x400040 >> AT_PHNUM:10 >> >> with LD_SHOW_AUXV. > > I was worried about some loaders that wouldn't pass the aux vector down. > E.g. valgrind's loader does, but perhaps others wouldn't need to. Those loaders are broken for binaries, static or dynamic, which use AUXV, independent of this change, and they should be fixed. It shouldn't block using --eh-frame-hdr for -static. > Anyway, IMHO statically linked binaries aren't something one should spend > too much time on, they shouldn't be used (with very few exceptions) at all. > Android doesn't need those legacy stuff. We have to keep it since GCC doesn't pass --eh-frame-hdr for -static. At minimum, I'd like a configure option to use --eh-frame-hdr for -static, even if it is off by default. OK to install? Thanks. -- H.J. --- gcc/ 2012-09-14 H.J. Lu PR debug/54568 * configure.ac: Add --enable-eh-frame-hdr-for-static. Set USE_EH_FRAME_HDR_FOR_STATIC if PT_GNU_EH_FRAME is supported for static executable. 
* config.in: Regenerated. * configure: Likewise. * config/gnu-user.h (LINK_EH_SPEC): Defined as "--eh-frame-hdr " if USE_EH_FRAME_HDR_FOR_STATIC is defined. * config/sol2.h (LINK_EH_SPEC): Likewise. * config/openbsd.h (LINK_EH_SPEC): Likewise. * config/alpha/elf.h (LINK_EH_SPEC): Likewise. * config/freebsd.h (LINK_EH_SPEC): Likewise. * config/rs6000/sysv4.h (LINK_EH_SPEC): Likewise. gcc/testsuite/ 2012-09-13 H.J. Lu PR debug/54568 * g++.dg/eh/spec3-static.C: New test. libgcc/ 2012-09-14 H.J. Lu PR debug/54568 * crtstuff.c (USE_PT_GNU_EH_FRAME): Check CRTSTUFFT_O together with USE_EH_FRAME_HDR_FOR_STATIC. gcc-pr54568.patch Description: Binary data
Re: [PATCH 0/4] slim-lto-bootstrap.mk build-target support
Markus Trippelsdorf writes: > > If the patches look Ok, it would be nice if someone could commit them, > because I have no access. I cannot approve them, so someone needs to do that first. But I can commit once that's done. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: vector comparisons in C++
On 09/14/2012 09:59 AM, Marc Glisse wrote: * build_vector_type: don't use an opaque vector for the return type of operator< (not sure what the point was of making it opaque?) I think the point was to allow conversion of the result to a different vector type. Why do you want it not to be opaque? * Disable (type)(a (a Right. It certainly doesn't help for integer vectors. + error_at (location, "comparing vectors with different " + "element types"); Let's print the vector types in these errors. Jason
Re: [C++ Patch] PR 54575
On 09/14/2012 09:05 AM, Paolo Carlini wrote: here we crash because strip_typedefs while processing _RequireInputIter calls make_typename_type which returns error_mark_node (# line 3281). strip_typedefs should not return error_mark_node unless it gets error_mark_node as an argument; if a typedef-using type is valid, removing the typedefs should still produce a valid type. I ran into a similar bug when I was working on the instantiation-dependent stuff, but I don't remember which bit fixed that one. PS: I'm also attaching a patchlet for a couple of hard errors in make_typename_type not protected by (complain & tf_error) spotted in make_typename_type. This patchlet is OK. Jason
[PATCH][ARM] Require that one of the operands be a register in thumb2_movdf_vfp
Hi,

There are some Thumb2 patterns in vfp.md which are duplicated with only minor changes from their ARM equivalents. This patch adds a requirement to the "*thumb2_movdf_vfp" pattern that one of the operands should be a register, as in the ARM "*movdf_vfp" pattern.

There is one functional change: the ARM "*movdf_vfp" pattern disallows the [mem]=const store case while the Thumb2 one does not. The ARM version is more desirable, because [mem]=const would be split anyway and would add a temporary vfp register.

Example:

    double delta;
    void foo () { delta = 0.0; }

Compiler options: g++ -mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mthumb -O2

Generated code (without patch):

    movs    r0, #0
    movs    r1, #0
    fmdrr   d16, r0, r1
    fstd    d16, [r3, #0]

Generated code (with patch):

    movs    r0, #0
    movs    r1, #0
    strd    r0, [r3]

Regtested with QEMU-ARM.

Code size: SPEC2K INT and FP sizes both decrease by ~0.2% (up to 2.64% on 252.eon).

Performance: 252.eon +4.3%, other tests almost unaffected; SPEC2K FP grows by 0.16% (up to 1.8% on 183.equake). (Detailed data below).

Ok for trunk?
Benchmarks     Base Size   Peak Size    Diff    Diff %
-----------    ---------   ---------    ----    ------
164.gzip           33223       33223       0    0,000%
175.vpr           108947      108947       0    0,000%
176.gcc           936220      936220       0    0,000%
181.mcf             7768        7768       0    0,000%
186.crafty        158714      158714       0    0,000%
197.parser         66371       66371       0    0,000%
252.eon           239982      233646    6336    2,640%
253.perlbmk       367652      367628      24    0,007%
254.gap           311193      311193       0    0,000%
255.vortex        349276      349268       8    0,002%
256.bzip2          23044       23044       0    0,000%
300.twolf         144094      143990     104    0,072%
SPECint2000      2746484     2740012    6472    0,236%

Benchmarks     Base Size   Peak Size    Diff    Diff %
-----------    ---------   ---------    ----    ------
168.wupwise        17541       17541       0    0,000%
171.swim            8662        8686     -24   -0,277%
172.mgrid          10072       10088     -16   -0,159%
173.applu          43193       42585     608    1,408%
177.mesa          337863      337735     128    0,038%
178.galgel        123270      122854     416    0,337%
179.art            11762       11686      76    0,646%
183.equake         14836       14620     216    1,456%
187.facerec        44554       44554       0    0,000%
188.ammp           85940       85612     328    0,382%
189.lucas          31764       31700      64    0,201%
191.fma3d         662349      662215     134    0,020%
200.sixtrack      713980      712324    1656    0,232%
301.apsi           74277       73997     280    0,377%
SPECfp2000       2153860     2149970    3890    0,181%

Benchmarks          Base        Peak    Diff    Diff %
-----------    ---------   ---------  ------    ------
252.eon              0.7        0.67     0.3    4.286%
SPECint2000                                     4.286%
168.wupwise        18,16       18,04    0,12    0,661%
171.swim             1,5        1,52   -0,02   -1,333%
172.mgrid          40,33       40,39   -0,06   -0,149%
173.applu           0,54        0,55   -0,01   -1,852%
177.mesa            2,56        2,57   -0,01   -0,391%
178.galgel          22,1          22     0,1    0,452%
179.art            12,39       12,29     0,1    0,807%
183.equake          2,25        2,21    0,04    1,778%
187.facerec        10,25       10,21    0,04    0,390%
188.ammp           22,21        22,1    0,11    0,495%
189.lucas           6,26        6,25    0,01    0,160%
191.fma3d          0,645       0,648  -0,003   -0,465%
200.sixtrack       16,19        16,2   -0,01   -0,062%
301.apsi           10,84       10,92   -0,08   -0,738%
SPECfp2000       146.565     146.338   0.227    0.155%

--
Best Regards, Ruben.

thumb2_movdf_vfp.diff Description: Binary data
Re: vector comparisons in C++
On Fri, 14 Sep 2012, Jason Merrill wrote:

> On 09/14/2012 09:59 AM, Marc Glisse wrote:
>> * build_vector_type: don't use an opaque vector for the return type of
>> operator< (not sure what the point was of making it opaque?)
>
> I think the point was to allow conversion of the result to a different
> vector type.

Ah, I see. I'll change it back to opaque and remove that use from the testcase. I noticed that for the fold patch, I am using once opaque and once not-opaque, I'll make them both opaque, although it probably doesn't matter once we are out of the front-end.

> Why do you want it not to be opaque?

I wanted to use decltype(xsize as x, and then actually be able to use it. Being opaque, it refuses to be initialized (cp/decl.c:5550). Maybe decltype (and others?) could return non-opaque types?

>> + error_at (location, "comparing vectors with different "
>> + "element types");
>
> Let's print the vector types in these errors.

Type is %qT right? I see a number of %q#T but can't remember where the doc is. Well, I'll try both and see what happens.

Thanks,

--
Marc Glisse
GCC 4.7.2 Status Report (2012-09-14), branch frozen
The GCC 4.7 branch is now frozen for creating a first release candidate of the GCC 4.7.2 release. All changes need explicit release manager approval until the final release of GCC 4.7.2 which should happen roughly one week after the release candidate if no issues show up with it. Previous Report === http://gcc.gnu.org/ml/gcc/2012-06/msg00195.html
Re: [C++ Patch] PR 54575
Hi,

On 09/14/2012 04:36 PM, Jason Merrill wrote:
> On 09/14/2012 09:05 AM, Paolo Carlini wrote:
>> here we crash because strip_typedefs while processing _RequireInputIter
>> calls make_typename_type which returns error_mark_node (# line 3281).
>
> strip_typedefs should not return error_mark_node unless it gets
> error_mark_node as an argument; if a typedef-using type is valid,
> removing the typedefs should still produce a valid type.

I see, makes sense. What I'm seeing is that make_typename_type, called for this context (the enable_if):

align 8 symtab 0 alias set -1 canonical type 0x7694ac78 context full-name "struct enable_if::value>" no-binfo use_template=1 interface-unknown chain >

wants to return error_mark_node, because here:

  if (!dependent_scope_p (context))
    /* We should only set WANT_TYPE when we're a nested typename type.
       Then we can give better diagnostics if we find a non-type. */
    t = lookup_field (context, name, 2, /*want_type=*/true);
  else
    t = NULL_TREE;

  if ((!t || TREE_CODE (t) == TREE_LIST) && dependent_type_p (context))
    return build_typename_type (context, name, fullname, tag_type);

  want_template = TREE_CODE (fullname) == TEMPLATE_ID_EXPR;

  if (!t)
    {
      if (complain & tf_error)
        error (want_template
               ? G_("no class template named %q#T in %q#T")
               : G_("no type named %q#T in %q#T"),
               name, context);
      return error_mark_node;
    }

lookup_field returns NULL_TREE and dependent_type_p (context) is false. It seems to me that the return value of lookup_field is right. Thus either dependent_type_p shouldn't be false or something is wrong in the logic or strip_typedefs should not call make_typename_type at all?!?

Paolo.
Re: [PATCH] Set correct source location for deallocator calls
ping... Thanks, Dehao On Sun, Sep 9, 2012 at 5:42 AM, Dehao Chen wrote: > Hi, > > I've added a libjava unittest which verifies that this patch will not > break Java debug info. I've also incorporated Richard's review in the > previous mail. Attached is the new patch, which passed bootstrap and > all gcc/libjava testsuites on x86. > > Is it ok for trunk? > > Thanks, > Dehao > > gcc/ChangeLog: > 2012-09-08 Dehao Chen > > * tree-eh.c (goto_queue_node): New field. > (record_in_goto_queue): New parameter. > (record_in_goto_queue_label): New parameter. > (lower_try_finally_dup_block): New parameter. > (maybe_record_in_goto_queue): Update source location. > (lower_try_finally_copy): Likewise. > (honor_protect_cleanup_actions): Likewise. > * gimplify.c (gimplify_expr): Reset the location to unknown. > > gcc/testsuite/ChangeLog: > 2012-09-08 Dehao Chen > > * g++.dg/debug/dwarf2/deallocator.C: New test. > > libjava/ChangeLog: > 2012-09-08 Dehao Chen > > * testsuite/libjava.lang/sourcelocation.java: New cases. > * testsuite/libjava.lang/sourcelocation.out: New cases.
Re: [PATCH] Set correct source location for deallocator calls
On 09/08/2012 10:42 PM, Dehao Chen wrote: > I've added a libjava unittest which verifies that this patch will not > break Java debug info. I've also incorporated Richard's review in the > previous mail. Attached is the new patch, which passed bootstrap and > all gcc/libjava testsuites on x86. > > Is it ok for trunk? Yes, thanks. Andrew.
Re: [Patch ARM] big-endian support for Neon vext tests
On 13 September 2012 19:07, Mike Stump wrote:
> On Sep 13, 2012, at 2:45 AM, Christophe Lyon wrote:
>> Ping?
>> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00330.html
>
> So, two things I thought I'd ask about:
>
>> +/* __attribute__ ((noinline)) is currently required, otherwise the
>> + generated code computes wrong results in big-endian. */
>
> and:
>
>> +#ifdef __ARMEL__
>> + uint64x2_t __mask1 = {1, 0};
>> +#else
>>uint64x2_t __mask1 = {1, 0};
>> +#endif
>
>>> * In the case of the test which is executed, I had to force the
>>> noinline attribute on the helper functions, otherwise the computed
>>> results are wrong in big-endian. It is probably an overkill workaround
>>> but it works :-)
>>> I am going to file a bugzilla for this problem.
>
> I think that for developing the patches noinline was fine, we are confident
> there aren't any more bugs,
> but, for checkin, I think it is better to leave the test case as is, and let
> it fail until the PR you filed is fixed.
> We usually don't put hack arounds for code-gen compiler bugs into the
> testsuite just to make them
> pass… :-)

OK. So I will remove the noinline stuff, and update the bugzilla entry with the name of this testcase. Should I really leave the test as FAILED rather than XFAIL?

> The second (occurs more than once) just looks odd. I thought I'd mention it,
> not sure what my preference is.

Well... it was just for consistency with all the other tests, as in general the mask vectors are different in big and little endian, and to expose the fact that I hadn't forgotten to write the big-endian variant :-) I can remove these occurrences and add a comment instead.

Thanks for your review,

Christophe.
[PATCH, AArch64] Implement ffs standard pattern
I've implemented the standard pattern ffs, which leads to __builtin_ffs being generated with 4 instructions instead of 5 instructions.

Regression tests and my new test pass.

OK to commit?

Cheers,
Ian

2012-09-14  Ian Bolton

gcc/
	* config/aarch64/aarch64.md (csinc3): Make it into a named pattern.
	* config/aarch64/aarch64.md (ffs2): New pattern.

testsuite/
	* gcc.target/aarch64/ffs.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5278957..dfdba42 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2021,7 +2021,7 @@
   [(set_attr "v8type" "csel")
    (set_attr "mode" "")])
 
-(define_insn "*csinc3_insn"
+(define_insn "csinc3_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
         (if_then_else:GPI
 	  (match_operator:GPI 1 "aarch64_comparison_operator"
@@ -2157,6 +2157,21 @@
   }
 )
 
+(define_expand "ffs2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    rtx ccreg = aarch64_gen_compare_reg (EQ, operands[1], const0_rtx);
+    rtx x = gen_rtx_NE (VOIDmode, ccreg, const0_rtx);
+
+    emit_insn (gen_rbit2 (operands[0], operands[1]));
+    emit_insn (gen_clz2 (operands[0], operands[0]));
+    emit_insn (gen_csinc3_insn (operands[0], x, ccreg, operands[0], const0_rtx));
+    DONE;
+  }
+)
+
 (define_insn "*and3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
diff --git a/gcc/testsuite/gcc.target/aarch64/ffs.c b/gcc/testsuite/gcc.target/aarch64/ffs.c
new file mode 100644
index 000..a344761
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ffs.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_ffs(x);
+}
+
+/* { dg-final { scan-assembler "cmp\tw" } } */
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
+/* { dg-final { scan-assembler "csinc\tw" } } */
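For readers following the expansion in the ffs pattern, the sequence is: reverse the bits (rbit), count leading zeros (clz), then conditionally increment (csinc) so that a zero input yields 0. Here is a minimal portable C sketch of that arithmetic. The helper names rbit32, clz32 and ffs_via_rbit_clz are invented for this note and are not part of the patch; the loops merely stand in for the single AArch64 instructions.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value (stand-in for RBIT). */
static uint32_t rbit32(uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}

/* Count leading zero bits; returns 32 for an input of 0 (stand-in for CLZ). */
static int clz32(uint32_t x)
{
  int n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    n++;
  return n;
}

/* ffs(x): 1-based index of the least significant set bit, or 0 if x == 0.
   The final select mirrors the conditional increment: when x != 0 the
   result is clz(rbit(x)) + 1, otherwise it is 0. */
static int ffs_via_rbit_clz(uint32_t x)
{
  int t = clz32(rbit32(x));
  return x != 0 ? t + 1 : 0;
}
```

Reversing the bits turns "trailing zeros" into "leading zeros", which is why a count-leading-zeros instruction is enough.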
Re: [C++ Patch] PR 54575
On 09/14/2012 05:14 PM, Paolo Carlini wrote:
> lookup_field returns NULL_TREE and dependent_type_p (context) is false.
> It seems to me that the return value of lookup_field is right.

No, I don't think it makes sense for lookup_field to return NULL_TREE. For our testcase we should find the nested type (or we should not call lookup_field at all here).

Paolo.
[PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC
Add __builtin_ppc_get_timebase and __builtin_ppc_mftb to read the Time Base Register on PowerPC. They are required by applications that measure time at high frequencies with high precision that can't afford a syscall.

__builtin_ppc_get_timebase returns the 64 bits of the Time Base Register, while __builtin_ppc_mftb generates only 1 instruction and returns the least significant word on 32-bit environments and the whole Time Base value on 64-bit.

[gcc]
2012-09-14  Tulio Magno Quites Machado Filho

	* config/rs6000/rs6000-builtin.def: Add __builtin_ppc_get_timebase
	and __builtin_ppc_mftb.
	* config/rs6000/rs6000.c (rs6000_expand_zeroop_builtin): New
	function to expand an expression that calls a built-in without
	arguments.
	(rs6000_expand_builtin): Add __builtin_ppc_get_timebase and
	__builtin_ppc_mftb.
	(rs6000_init_builtins): Likewise.
	* config/rs6000/rs6000.md: Likewise.
	* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions): Document
	__builtin_ppc_get_timebase and __builtin_ppc_mftb.

[gcc/testsuite]
2012-09-14  Tulio Magno Quites Machado Filho

	* gcc.target/powerpc/ppc-get-timebase.c: New file.
	* gcc.target/powerpc/ppc-mftb.c: New file.
--- gcc/config/rs6000/rs6000-builtin.def |6 ++ gcc/config/rs6000/rs6000.c | 46 +++ gcc/config/rs6000/rs6000.md| 80 gcc/doc/extend.texi| 10 +++ .../gcc.target/powerpc/ppc-get-timebase.c | 20 + gcc/testsuite/gcc.target/powerpc/ppc-mftb.c| 18 + 6 files changed, 180 insertions(+), 0 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/ppc-get-timebase.c create mode 100644 gcc/testsuite/gcc.target/powerpc/ppc-mftb.c diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index c8f8f86..9fa3a0f 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -1429,6 +1429,12 @@ BU_SPECIAL_X (RS6000_BUILTIN_RSQRT, "__builtin_rsqrt", RS6000_BTM_FRSQRTE, BU_SPECIAL_X (RS6000_BUILTIN_RSQRTF, "__builtin_rsqrtf", RS6000_BTM_FRSQRTES, RS6000_BTC_FP) +BU_SPECIAL_X (RS6000_BUILTIN_GET_TB, "__builtin_ppc_get_timebase", +RS6000_BTM_ALWAYS, RS6000_BTC_MISC) + +BU_SPECIAL_X (RS6000_BUILTIN_MFTB, "__builtin_ppc_mftb", +RS6000_BTM_ALWAYS, RS6000_BTC_MISC) + /* Darwin CfString builtin. */ BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", RS6000_BTM_ALWAYS, RS6000_BTC_MISC) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index a5a3848..c3bece1 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -9748,6 +9748,30 @@ rs6000_overloaded_builtin_p (enum rs6000_builtins fncode) return (rs6000_builtin_info[(int)fncode].attr & RS6000_BTC_OVERLOADED) != 0; } +/* Expand an expression EXP that calls a builtin without arguments. */ +static rtx +rs6000_expand_zeroop_builtin (enum insn_code icode, rtx target) +{ + rtx pat; + enum machine_mode tmode = insn_data[icode].operand[0].mode; + + if (icode == CODE_FOR_nothing) +/* Builtin not supported on this processor. */ +return 0; + + if (target == 0 + || GET_MODE (target) != tmode + || ! (*insn_data[icode].operand[0].predicate) (target, tmode)) +target = gen_reg_rtx (tmode); + + pat = GEN_FCN (icode) (target); + if (! 
pat) +return 0; + emit_insn (pat); + + return target; +} + static rtx rs6000_expand_unop_builtin (enum insn_code icode, tree exp, rtx target) @@ -11337,6 +11361,16 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED, ? CODE_FOR_bpermd_di : CODE_FOR_bpermd_si), exp, target); +case RS6000_BUILTIN_GET_TB: + return rs6000_expand_zeroop_builtin (CODE_FOR_rs6000_get_timebase, + target); + +case RS6000_BUILTIN_MFTB: + return rs6000_expand_zeroop_builtin (((TARGET_64BIT) + ? CODE_FOR_rs6000_mftb_di + : CODE_FOR_rs6000_mftb_si), + target); + case ALTIVEC_BUILTIN_MASK_FOR_LOAD: case ALTIVEC_BUILTIN_MASK_FOR_STORE: { @@ -11621,6 +11655,18 @@ rs6000_init_builtins (void) POWER7_BUILTIN_BPERMD, "__builtin_bpermd"); def_builtin ("__builtin_bpermd", ftype, POWER7_BUILTIN_BPERMD); + ftype = build_function_type_list (unsigned_intDI_type_node, + NULL_TREE); + def_builtin ("__builtin_ppc_get_timebase", ftype, RS6000_BUILTIN_GET_TB); + + if (TARGET_64BIT) +ftype = build_function_type_list (unsigned_intDI_type_node, + NULL_TREE); + else +fty
Re: vector comparisons in C++
On 09/14/2012 11:03 AM, Marc Glisse wrote:
> I wanted to use decltype(x

That sounds like the right answer.

> Type is %qT right? I see a number of %q#T but can't remember where the doc
> is. Well, I'll try both and see what happens.

Either one works; the # asks for more verbose output.

Jason
Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable
On Fri, 14 Sep 2012, H.J. Lu wrote: > +# Only support for glibc 2.3.0 or higher with AT_PHDR/AT_PHNUM from > +# Linux kernel. > + [[if test x$host = x$build -a x$host = x$target && > + ldd --version 2>&1 >/dev/null && Could we please stop adding this sort of native-only test? There is various existing code that examines headers to determine glibc features, which is more cross-compile friendly. This should probably be factored out into an autoconf macro to determine values from target headers, used once in configure.ac to get the glibc version for tools with glibc targets, and the --enable-gnu-unique-object test should be changed to use the version from the headers like other tests do. -- Joseph S. Myers jos...@codesourcery.com
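A minimal sketch of the header-based approach suggested above, assuming a glibc target whose headers define __GLIBC__ and __GLIBC_MINOR__ (glibc's <features.h> does, and it is pulled in by the standard headers). The function name glibc_at_least is hypothetical. A real configure check would only preprocess a fragment like this against the target headers, so nothing has to run on the build machine.

```c
#include <limits.h> /* any libc header; on glibc this pulls in <features.h> */

/* Return 1 if the C library headers identify glibc MAJOR.MINOR or newer,
   0 otherwise (including when the headers are not glibc at all). */
static int glibc_at_least(int major, int minor)
{
#if defined(__GLIBC__) && defined(__GLIBC_MINOR__)
  return __GLIBC__ > major
         || (__GLIBC__ == major && __GLIBC_MINOR__ >= minor);
#else
  (void) major;
  (void) minor;
  return 0;
#endif
}
```

For the case in this thread the question would be glibc_at_least (2, 3), answered from the headers rather than from the output of ldd --version.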
Re: vector comparisons in C++
On Fri, 14 Sep 2012, Jason Merrill wrote:

> On 09/14/2012 11:03 AM, Marc Glisse wrote:
>> I wanted to use decltype(x
>
> That sounds like the right answer.

Ok, I'll open a bugzilla to remember to try that later.

>> Type is %qT right? I see a number of %q#T but can't remember where the doc
>> is. Well, I'll try both and see what happens.
>
> Either one works; the # asks for more verbose output.

The %qT looked good enough to me (it prints the alias and the type). I added the types in an "inform" because the line was already getting long, I hope that's ok.

The attached just finished the testsuite (changelog unchanged).

--
Marc Glisse

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c (revision 191291)
+++ gcc/gimple-fold.c (working copy)
@@ -23,20 +23,21 @@ along with GCC; see the file COPYING3. #include "coretypes.h" #include "tm.h" #include "tree.h" #include "flags.h" #include "function.h" #include "dumpfile.h" #include "tree-flow.h" #include "tree-ssa-propagate.h" #include "target.h" #include "gimple-fold.h" +#include "langhooks.h" /* Return true when DECL can be referenced from current unit. FROM_DECL (if non-null) specify constructor of variable DECL was taken from. We can get declarations that are not possible to reference for various reasons: 1) When analyzing C++ virtual tables. C++ virtual tables do have known constructors even when they are keyed to other compilation unit. Those tables can contain pointers to methods and vars @@ -1685,41 +1686,51 @@ and_var_with_comparison_1 (gimple stmt, (OP1A CODE1 OP1B) and (OP2A CODE2 OP2B), respectively. If this can be done without constructing an intermediate value, return the resulting tree; otherwise NULL_TREE is returned. This function is deliberately asymmetric as it recurses on SSA_DEFs in the first comparison but not the second. 
*/ static tree and_comparisons_1 (enum tree_code code1, tree op1a, tree op1b, enum tree_code code2, tree op2a, tree op2b) { + tree truth_type = boolean_type_node; + if (TREE_CODE (TREE_TYPE (op1a)) == VECTOR_TYPE) +{ + tree vec_type = TREE_TYPE (op1a); + tree elem = lang_hooks.types.type_for_size + (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vec_type))), 0); + truth_type = build_opaque_vector_type (elem, +TYPE_VECTOR_SUBPARTS (vec_type)); +} + /* First check for ((x CODE1 y) AND (x CODE2 y)). */ if (operand_equal_p (op1a, op2a, 0) && operand_equal_p (op1b, op2b, 0)) { /* Result will be either NULL_TREE, or a combined comparison. */ tree t = combine_comparisons (UNKNOWN_LOCATION, TRUTH_ANDIF_EXPR, code1, code2, - boolean_type_node, op1a, op1b); + truth_type, op1a, op1b); if (t) return t; } /* Likewise the swapped case of the above. */ if (operand_equal_p (op1a, op2b, 0) && operand_equal_p (op1b, op2a, 0)) { /* Result will be either NULL_TREE, or a combined comparison. */ tree t = combine_comparisons (UNKNOWN_LOCATION, TRUTH_ANDIF_EXPR, code1, swap_tree_comparison (code2), - boolean_type_node, op1a, op1b); + truth_type, op1a, op1b); if (t) return t; } /* If both comparisons are of the same value against constants, we might be able to merge them. */ if (operand_equal_p (op1a, op2a, 0) && TREE_CODE (op1b) == INTEGER_CST && TREE_CODE (op2b) == INTEGER_CST) { @@ -2147,41 +2158,51 @@ or_var_with_comparison_1 (gimple stmt, (OP1A CODE1 OP1B) and (OP2A CODE2 OP2B), respectively. If this can be done without constructing an intermediate value, return the resulting tree; otherwise NULL_TREE is returned. This function is deliberately asymmetric as it recurses on SSA_DEFs in the first comparison but not the second. 
*/ static tree or_comparisons_1 (enum tree_code code1, tree op1a, tree op1b, enum tree_code code2, tree op2a, tree op2b) { + tree truth_type = boolean_type_node; + if (TREE_CODE (TREE_TYPE (op1a)) == VECTOR_TYPE) +{ + tree vec_type = TREE_TYPE (op1a); + tree elem = lang_hooks.types.type_for_size + (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vec_type))), 0); + truth_type = build_opaque_vector_type (elem, +TYPE_VECTOR_SUBPARTS (vec_type)); +} + /* First check for ((x CODE1 y) OR (x CODE2 y)). */ if (operand_equal_p (op1a, op2a, 0) && operand_equal_p (op1b, op2b, 0)) { /* Result will be either NULL_TREE, or a combined comparison. */ tree t = combine_comparisons (UNKNOWN_LOC
Re: [PATCH] Prevent cselib substitution of FP, SP, SFP
Hi Jakub I have run it on 4.6, it passes the following testing: x86-64 bootstrap x86-64 regression test regression test on arm qemu Is it OK for gcc4.6? Ahmad, is it OK for google/gcc-4_6/ and google/gcc-4_6-mobile ? thanks Carrot On Wed, Sep 12, 2012 at 2:01 PM, Carrot Wei wrote: > Hi Jakub > > The same problem also affects gcc4.6, > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54398. Could this be > ported to 4.6 branch? > > thanks > Carrot > > On Mon, Feb 13, 2012 at 11:54 AM, Jakub Jelinek wrote: >> >> On Wed, Jan 04, 2012 at 05:21:38PM +, Marcus Shawcroft wrote: >> > Alias analysis by DSE based on CSELIB expansion assumes that >> > references to the stack frame from different base registers (ie FP, SP) >> > never alias. >> > >> > The comment block in cselib explains that cselib does not allow >> > substitution of FP, SP or SFP specifically in order not to break DSE. >> >> Looks reasonable, appart from coding style (no spaces around -> and >> no {} around return p->loc;), I just wonder if having a separate >> loop in expand_loc just for this isn't too expensive. On sane targets >> IMHO hard frame pointer in the prologue should be initialized from sp, not >> the other way around, thus hard frame pointer based VALUEs should have >> hard frame pointer earlier in the locs list (when there is >> hfp = sp (+ optionally some const) >> insn, we first cselib_lookup_from_insn the rhs and add to locs >> of the new VALUE (plus (VALUE of sp) (const_int)), then process the >> lhs and add it to locs, moving the plus to locs->next). >> So I think the following patch could be enough (bootstrapped/regtested >> on x86_64-linux and i686-linux). >> There is AVR though, which has really weirdo prologue - PR50063, >> but I think it should just use UNSPEC for that or something similar, >> setting sp from hfp seems unnecessary and especially for values with long >> locs chains could make cselib more expensive. >> >> Richard, what do you think about this? 
>> >> 2012-02-13 Jakub Jelinek >> >> * cselib.c (expand_loc): Return sp, fp, hfp or cfa base reg right >> away if seen. >> >> --- gcc/cselib.c.jj 2012-02-13 11:07:15.0 +0100 >> +++ gcc/cselib.c2012-02-13 18:15:17.531776145 +0100 >> @@ -1372,8 +1372,18 @@ expand_loc (struct elt_loc_list *p, stru >>unsigned int regno = UINT_MAX; >>struct elt_loc_list *p_in = p; >> >> - for (; p; p = p -> next) >> + for (; p; p = p->next) >> { >> + /* Return these right away to avoid returning stack pointer based >> +expressions for frame pointer and vice versa, which is something >> +that would confuse DSE. See the comment in >> cselib_expand_value_rtx_1 >> +for more details. */ >> + if (REG_P (p->loc) >> + && (REGNO (p->loc) == STACK_POINTER_REGNUM >> + || REGNO (p->loc) == FRAME_POINTER_REGNUM >> + || REGNO (p->loc) == HARD_FRAME_POINTER_REGNUM >> + || REGNO (p->loc) == cfa_base_preserved_regno)) >> + return p->loc; >>/* Avoid infinite recursion trying to expand a reg into a >> the same reg. */ >>if ((REG_P (p->loc)) >> >> >> Jakub
Re: [PATCH] Prevent cselib substitution of FP, SP, SFP
On Fri, Sep 14, 2012 at 09:39:43AM -0700, Carrot Wei wrote: > I have run it on 4.6, it passes the following testing: > > x86-64 bootstrap > x86-64 regression test > regression test on arm qemu > > Is it OK for gcc4.6? Yes. Jakub
Re: vector comparisons in C++
On 09/14/2012 12:33 PM, Marc Glisse wrote:
> Ok, I'll open a bugzilla to remember to try that later.

OK.

> The attached just finished the testsuite (changelog unchanged).

This patch is OK.

Jason
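For context on what this thread is enabling in the C++ front end, here is a small sketch of element-wise vector comparison written in C. It assumes a compiler that implements GCC's generic vector extensions, including comparison operators and subscripting on vector types. The function names are invented for this note. Each result lane is -1 where the comparison holds and 0 where it does not, which is why the truth type for a vector comparison is a signed integer vector of matching shape rather than a scalar boolean.

```c
/* Four 32-bit signed integers in one 16-byte vector (GCC vector extension). */
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane-wise a < b: each lane is -1 (all bits set) if true, 0 if false. */
static v4si vec_less (v4si a, v4si b)
{
  return a < b;
}

/* Two comparisons combined with bitwise AND, the vector analogue of the
   scalar (x >= lo && x <= hi) that the and_comparisons_1 folding targets. */
static v4si vec_in_range (v4si x, v4si lo, v4si hi)
{
  return (x >= lo) & (x <= hi);
}
```

Because the lanes are all-ones or all-zeros masks, combining comparisons uses bitwise & and | instead of the short-circuit operators.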
Re: [PATCH][ARM] Require that one of the operands be a register in thumb2_movdf_vfp
On 14/09/12 15:58, Рубен Бучацкий wrote:
> Hi,
>
> There are some Thumb2 patterns in vfp.md which are duplicated with only
> minor changes from their ARM equivalents. This patch adds requirement
> in "*thumb2_movdf_vfp" pattern, that one of the operands should be register,
> like in ARM "*movdf_vfp" pattern.
>
> There is one functional change: the ARM "*movdf_vfp" pattern disallows the
> [mem]=const store case while the Thumb2 one does not. The ARM version is
> more desirable, because [mem]=const would be split anyway and add temporary
> vfp register.
>
> Example:
>
> double delta;
>
> void foo ()
> {
>   delta = 0.0;
> }
>
> Generated code (without patch):
> Compiler options: g++ -mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon
> -mfloat-abi=softfp -mthumb -O2
>
>         movs    r0, #0
>         movs    r1, #0
>         fmdrr   d16, r0, r1
>         fstd    d16, [r3, #0]
>
> Generated code (with patch):
>
>         movs    r0, #0
>         movs    r1, #0
>         strd    r0, [r3]
>
> Regtested with QEMU-ARM.
>
> Code size: SPEC2K INT and FP sizes both decrease by ~0.2% (up to 2.64%
> on 252.eon).
> Performance: 252.eon +4.3%, other tests almost unaffected;
> SPEC2K FP grows by 0.16% (up to 1.8% on 183.equake).
> (Detailed data below).
>
> Ok for trunk?
>

OK.

R.
Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC
Hi Tulio, @@ -14103,6 +14105,84 @@ "" "") +(define_expand "rs6000_get_timebase" + [(use (match_operand:DI 0 "gpc_reg_operand" ""))] + "" + " +{ + if (TARGET_POWERPC64) +emit_insn (gen_rs6000_get_timebase_ppc64 (operands[0])); + else +emit_insn (gen_rs6000_get_timebase_ppc32 (operands[0])); + DONE; +}") Please don't put quotes around the C block. +(define_insn "rs6000_get_timebase_ppc32" + [(set (match_operand:DI 0 "gpc_reg_operand" "=r") +(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB)) + (clobber (match_scratch:SI 1 "=r")) + (clobber (match_scratch:CC 2 "=x"))] You can do "y" instead, to allow all CRn fields. + "!TARGET_POWERPC64" +{ + if (WORDS_BIG_ENDIAN) +if (TARGET_MFCRF) + { +return "mfspr %0, 269\;" + "mfspr %L0, 268\;" + "mfspr %1, 269\;" + "cmpw %0,%1\;" + "bne- $-16"; + } +else + { +return "mftbu %0\;" + "mftb %L0\;" + "mftbu %1\;" + "cmpw %0,%1\;" + "bne- $-16"; + } + else +if (TARGET_MFCRF) + { +return "mfspr %L0, 269\;" + "mfspr %0, 268\;" + "mfspr %1, 269\;" + "cmpw %L0,%1\;" + "bne- $-16"; + } +else + { +return "mftbu %L0\;" + "mftb %0\;" + "mftbu %1\;" + "cmpw %L0,%1\;" + "bne- $-16"; + } +}) I don't think TARGET_MFCRF is correct. For example, if you use -mcpu=powerpc64 (which doesn't set this flag) you will get code that does not run on the newer machines. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index e850266..b3fc236 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -13660,6 +13660,8 @@ float __builtin_rsqrtf (float); double __builtin_recipdiv (double, double); double __builtin_rsqrt (double); long __builtin_bpermd (long, long); +uint64_t __builtin_ppc_get_timebase (); +unsigned long __builtin_ppc_mftb (); @end smallexample This is section "PowerPC AltiVec/VSX Built-in Functions"; there should be a preceding separate "PowerPC Built-in Functions" section. Some of the current documentation should go there, too. 
@@ -13671,6 +13673,14 @@ The @code{__builtin_recipdiv}, and @code {__builtin_recipdivf} functions generate multiple instructions to implement division using the reciprocal estimate instructions. +The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb} +functions generate instructions to read the Time Base Register. The +@code{__builtin_ppc_get_timebase} function may generate multiple +instructions and always return the 64 bits of the Time Base Register. The s/return/returns/ +@code{__builtin_ppc_mftb} function always generate one instruction and generates +returns the Time Base Register value as an unsigned long, throwing away +the most significant word on 32-bit environments. Looks good other than those nits, and the MFCRF thing. Oh, and you didn't say how you tested it (what targets). Segher
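The 32-bit sequence quoted in the review (read upper, read lower, read upper again, compare, branch back) is a retry loop that guards against a carry from the lower word into the upper word between the two reads. A minimal C sketch of that idea follows. The accessor type, the fake_timebase structure and all function names are hypothetical stand-ins so the logic can run anywhere; they do not correspond to anything in the patch, where the reads are the mftbu and mftb instructions.

```c
#include <stdint.h>

/* Hypothetical accessor standing in for one "read half of the Time Base"
   instruction. */
typedef uint32_t (*read_half_fn) (void *ctx);

/* Read a 64-bit counter through two 32-bit reads. If the upper half changed
   while the lower half was being read, the pair may be torn, so retry. */
static uint64_t read_timebase_32bit (read_half_fn read_upper,
                                     read_half_fn read_lower, void *ctx)
{
  uint32_t hi, lo, hi_again;
  do
    {
      hi = read_upper (ctx);
      lo = read_lower (ctx);
      hi_again = read_upper (ctx);
    }
  while (hi != hi_again);
  return ((uint64_t) hi << 32) | lo;
}

/* Simulated counter for experimentation: advances by STEP on every read. */
struct fake_timebase
{
  uint64_t ticks;
  uint64_t step;
};

static uint32_t fake_upper (void *ctx)
{
  struct fake_timebase *tb = ctx;
  tb->ticks += tb->step;
  return (uint32_t) (tb->ticks >> 32);
}

static uint32_t fake_lower (void *ctx)
{
  struct fake_timebase *tb = ctx;
  tb->ticks += tb->step;
  return (uint32_t) tb->ticks;
}
```

Starting the simulated counter just below a 32-bit boundary makes the first attempt observe a changed upper half, so the loop runs a second time instead of returning a torn value.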
Fix C ICEs on truthvalue conversions of some expressions with integer constant operands (PR c/54103)
Bug 54103 is a C front-end regression involving ICEs in certain cases of expressions such as 0 / 0 being converted to truthvalues. c_common_truthvalue_conversion, or c_objc_common_truthvalue_conversion, received 0 / 0 directly as a TRUNC_DIV_EXPR, without any wrapping C_MAYBE_CONST_EXPR to indicate that the expression has integer constant operands, so did not know that it had to produce a result valid for an expression with integer constant operands (that is, one satisfying EXPR_INT_CONST_OPERANDS - meaning either an INTEGER_CST, or a C_MAYBE_CONST_EXPR with C_MAYBE_CONST_EXPR_INT_OPERANDS set, but not anything with C_MAYBE_CONST_EXPR deeper in the expression). As a result it returned an expression with a C_MAYBE_CONST_EXPR not at top level - but the callers assumed this could not occur, so created their own wrapping C_MAYBE_CONST_EXPR (and C_MAYBE_CONST_EXPR should never be nested).

This is fixed by ensuring that a properly marked expression gets passed to c_objc_common_truthvalue_conversion; ensuring c_objc_common_truthvalue_conversion handles such expressions directly so c_common_truthvalue_conversion doesn't have to; and avoiding direct calls to c_common_truthvalue_conversion in relevant places.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu. Applied to mainline. Will apply to 4.7 (when not frozen) and 4.6 branches subject to testing there.

c:
2012-09-14  Joseph Myers

	PR c/54103
	* c-typeck.c (build_unary_op): Pass original argument of
	TRUTH_NOT_EXPR to c_objc_common_truthvalue_conversion, then remove
	any C_MAYBE_CONST_EXPR, if it has integer operands.
	(build_binary_op): Pass original arguments of TRUTH_ANDIF_EXPR,
	TRUTH_ORIF_EXPR, TRUTH_AND_EXPR, TRUTH_OR_EXPR and TRUTH_XOR_EXPR
	to c_objc_common_truthvalue_conversion, then remove any
	C_MAYBE_CONST_EXPR, if they have integer operands. Use
	c_objc_common_truthvalue_conversion not
	c_common_truthvalue_conversion.
(c_objc_common_truthvalue_conversion): Build NE_EXPR directly and call note_integer_operands for arguments with integer operands that are not integer constants. testsuite: 2012-09-14 Joseph Myers PR c/54103 * gcc.c-torture/compile/pr54103-1.c, gcc.c-torture/compile/pr54103-2.c, gcc.c-torture/compile/pr54103-3.c, gcc.c-torture/compile/pr54103-4.c, gcc.c-torture/compile/pr54103-5.c, gcc.c-torture/compile/pr54103-6.c: New tests. * gcc.dg/c90-const-expr-8.c: Update expected column number. Index: c/c-typeck.c === --- c/c-typeck.c(revision 191291) +++ c/c-typeck.c(working copy) @@ -3553,7 +3553,13 @@ build_unary_op (location_t location, "wrong type argument to unary exclamation mark"); return error_mark_node; } - arg = c_objc_common_truthvalue_conversion (location, arg); + if (int_operands) + { + arg = c_objc_common_truthvalue_conversion (location, xarg); + arg = remove_c_maybe_const_expr (arg); + } + else + arg = c_objc_common_truthvalue_conversion (location, arg); ret = invert_truthvalue_loc (location, arg); /* If the TRUTH_NOT_EXPR has been folded, reset the location. */ if (EXPR_P (ret) && EXPR_HAS_LOCATION (ret)) @@ -9807,8 +9813,20 @@ build_binary_op (location_t location, enum tree_co but that does not mean the operands should be converted to ints! 
*/ result_type = integer_type_node; - op0 = c_common_truthvalue_conversion (location, op0); - op1 = c_common_truthvalue_conversion (location, op1); + if (op0_int_operands) + { + op0 = c_objc_common_truthvalue_conversion (location, orig_op0); + op0 = remove_c_maybe_const_expr (op0); + } + else + op0 = c_objc_common_truthvalue_conversion (location, op0); + if (op1_int_operands) + { + op1 = c_objc_common_truthvalue_conversion (location, orig_op1); + op1 = remove_c_maybe_const_expr (op1); + } + else + op1 = c_objc_common_truthvalue_conversion (location, op1); converted = 1; boolean_op = true; } @@ -10520,13 +10538,18 @@ c_objc_common_truthvalue_conversion (location_t lo int_const = (TREE_CODE (expr) == INTEGER_CST && !TREE_OVERFLOW (expr)); int_operands = EXPR_INT_CONST_OPERANDS (expr); - if (int_operands) -expr = remove_c_maybe_const_expr (expr); + if (int_operands && TREE_CODE (expr) != INTEGER_CST) +{ + expr = remove_c_maybe_const_expr (expr); + expr = build2 (NE_EXPR, integer_type_node, expr, +convert (TREE_TYPE (expr), integer_zero_node)); + expr = note_integer_operands (expr); +} + else +/* ??? Should we also give an error for vectors rather tha
[PATCH, AArch64] Implement fnma, fms and fnms standard patterns
The following standard pattern names were implemented by simply renaming some existing patterns: * fnma * fms * fnms I have added an extra pattern for when we don't care about signed zero, so we can do "-fma (a,b,c)" more efficiently. Regression testing all passed. OK to commit? Cheers, Ian 2012-09-14 Ian Bolton gcc/ * config/aarch64/aarch64.md (fmsub4): Renamed to fnma4. * config/aarch64/aarch64.md (fnmsub4): Renamed to fms4. * config/aarch64/aarch64.md (fnmadd4): Renamed to fnms4. * config/aarch64/aarch64.md (*fnmadd4): New pattern. testsuite/ * gcc.target/aarch64/fmadd.c: Added extra tests. * gcc.target/aarch64/fnmadd-fastmath.c: New test. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 3fbebf7..33815ff 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2506,7 +2506,7 @@ (set_attr "mode" "")] ) -(define_insn "*fmsub4" +(define_insn "fnma4" [(set (match_operand:GPF 0 "register_operand" "=w") (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w")) (match_operand:GPF 2 "register_operand" "w") @@ -2517,7 +2517,7 @@ (set_attr "mode" "")] ) -(define_insn "*fnmsub4" +(define_insn "fms4" [(set (match_operand:GPF 0 "register_operand" "=w") (fma:GPF (match_operand:GPF 1 "register_operand" "w") (match_operand:GPF 2 "register_operand" "w") @@ -2528,7 +2528,7 @@ (set_attr "mode" "")] ) -(define_insn "*fnmadd4" +(define_insn "fnms4" [(set (match_operand:GPF 0 "register_operand" "=w") (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w")) (match_operand:GPF 2 "register_operand" "w") @@ -2539,6 +2539,18 @@ (set_attr "mode" "")] ) +;; If signed zeros are ignored, -(a * b + c) = -a * b - c. 
+(define_insn "*fnmadd4" + [(set (match_operand:GPF 0 "register_operand") + (neg:GPF (fma:GPF (match_operand:GPF 1 "register_operand") + (match_operand:GPF 2 "register_operand") + (match_operand:GPF 3 "register_operand"] + "!HONOR_SIGNED_ZEROS (mode) && TARGET_FLOAT" + "fnmadd\\t%0, %1, %2, %3" + [(set_attr "v8type" "fmadd") + (set_attr "mode" "")] +) + ;; --- ;; Floating-point conversions ;; --- diff --git a/gcc/testsuite/gcc.target/aarch64/fmadd.c b/gcc/testsuite/gcc.target/aarch64/fmadd.c index 3b4..39975db 100644 --- a/gcc/testsuite/gcc.target/aarch64/fmadd.c +++ b/gcc/testsuite/gcc.target/aarch64/fmadd.c @@ -4,15 +4,52 @@ extern double fma (double, double, double); extern float fmaf (float, float, float); -double test1 (double x, double y, double z) +double test_fma1 (double x, double y, double z) { return fma (x, y, z); } -float test2 (float x, float y, float z) +float test_fma2 (float x, float y, float z) { return fmaf (x, y, z); } +double test_fnma1 (double x, double y, double z) +{ + return fma (-x, y, z); +} + +float test_fnma2 (float x, float y, float z) +{ + return fmaf (-x, y, z); +} + +double test_fms1 (double x, double y, double z) +{ + return fma (x, y, -z); +} + +float test_fms2 (float x, float y, float z) +{ + return fmaf (x, y, -z); +} + +double test_fnms1 (double x, double y, double z) +{ + return fma (-x, y, -z); +} + +float test_fnms2 (float x, float y, float z) +{ + return fmaf (-x, y, -z); +} + /* { dg-final { scan-assembler-times "fmadd\td\[0-9\]" 1 } } */ /* { dg-final { scan-assembler-times "fmadd\ts\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fmsub\td\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fmsub\ts\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmsub\td\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmsub\ts\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */ + diff --git 
a/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c new file mode 100644 index 000..9c115df --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math" } */ + +extern double fma (double, double, double); +extern float fmaf (float, float, float); + +double test_fma1 (double x, double y, double z) +{ + return - fma (x, y, z); +} + +float test_fma2 (float x, float y, float z) +{ + return - fmaf (x, y, z); +} + +/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */ +
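As an illustration of the comment on the new *fnmadd4 pattern ("If signed zeros are ignored, -(a * b + c) = -a * b - c"), here is a minimal C sketch of why the rewrite is only safe when signed zeros do not matter. It uses plain, unfused arithmetic as a stand-in for fma, and every function name in it is invented for this note; none of it is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* -(a * b + c): the value that "- fma (a, b, c)" computes, ignoring
   the single rounding of a real fused multiply-add.  */
double
neg_of_madd_sketch (double a, double b, double c)
{
  return -(a * b + c);
}

/* (-a) * b - c: the shape of fma (-a, b, -c).  */
double
nmadd_sketch (double a, double b, double c)
{
  return (-a) * b - c;
}

/* True if X is negative zero.  Under IEEE 754, 1.0 / -0.0 is
   -infinity while 1.0 / +0.0 is +infinity.  */
bool
is_negative_zero_sketch (double x)
{
  return x == 0.0 && 1.0 / x < 0.0;
}

/* Self-check: the two forms agree except for the sign of a zero
   result.  */
void
check_fnmadd_identity_sketch (void)
{
  assert (neg_of_madd_sketch (2.0, 3.0, 4.0) == nmadd_sketch (2.0, 3.0, 4.0));
  assert (is_negative_zero_sketch (neg_of_madd_sketch (1.0, 1.0, -1.0)));
  assert (!is_negative_zero_sketch (nmadd_sketch (1.0, 1.0, -1.0)));
}
```

With a = 1, b = 1, c = -1 the sum a * b + c is +0, so negating it gives -0, whereas (-a) * b - c gives +0. The two results compare equal but differ in sign, which is presumably why the new pattern is guarded by !HONOR_SIGNED_ZEROS.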
Re: [PATCH, AArch64] Implement fnma, fms and fnms standard patterns
On 14/09/12 18:05, Ian Bolton wrote: > The following standard pattern names were implemented by simply > renaming some existing patterns: > > * fnma > * fms > * fnms > > I have added an extra pattern for when we don't care about > signed zero, so we can do "-fma (a,b,c)" more efficiently. > > Regression testing all passed. > > OK to commit? > > Cheers, > Ian > > > 2012-09-14 Ian Bolton > > gcc/ > * config/aarch64/aarch64.md (fmsub4): Renamed > to fnma4. > * config/aarch64/aarch64.md (fnmsub4): Renamed > to fms4. > * config/aarch64/aarch64.md (fnmadd4): Renamed > to fnms4. > * config/aarch64/aarch64.md (*fnmadd4): New pattern. > > testsuite/ > * gcc.target/aarch64/fmadd.c: Added extra tests. > * gcc.target/aarch64/fnmadd-fastmath.c: New test. > OK. R.
Re: [Patch ARM] big-endian support for Neon vext tests
On Sep 14, 2012, at 8:23 AM, Christophe Lyon wrote:
> OK. So I will remove the noinline stuff, and update the bugzilla entry
> with the name of this testcase.
> Should I really leave the test as FAILED rather than XFAIL?

No. Best to either add the test case as that bug is fixed or xfail it.
Re: [PATCH] fix bootstrap on darwin to adapt to VEC changes
On Sep 11, 2012, at 7:55 AM, Jack Howarth wrote: > The attached patch fixes the bootstrap on darwin to cope with the > VEC changes to remove unnecessary VEC function overloads. Tested on > x86_64-apple-darwin12. Okay for gcc trunk. Ok.
Re: vector comparisons in C++
On Fri, 14 Sep 2012, Jason Merrill wrote: [decltype of opaque vector] I think a simple TYPE_MAIN_VARIANT will do, I just need to find where to add it, and how much to constrain that change. Type deduction in templates and auto already seem to remove opacity :-) This patch is OK. Committed. Thank you for all the quick comments, -- Marc Glisse
Re: [PATCH, AArch64] Implement ffs standard pattern
On 14/09/12 16:26, Ian Bolton wrote: > I've implemented the standard pattern ffs, which leads to > __builtin_ffs being generated with 4 instructions instead > of 5 instructions. > > Regression tests and my new test pass. > > OK to commit? > > > Cheers, > Ian > > > > 2012-09-14 Ian Bolton > > gcc/ > * config/aarch64/aarch64.md (csinc3): Make it into a > named pattern. > * config/aarch64/aarch64.md (ffs2): New pattern. > > testsuite/ > > * gcc.target/aarch64/ffs.c: New test. > OK. R.
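The message does not show the instruction sequence, so the following is only a sketch of the semantics that an ffs pattern has to provide: one plus the index of the least significant set bit, or zero for a zero operand. The function name is invented for this note, and the loop is a portable reference, not what the AArch64 back end emits.

```c
#include <assert.h>

/* One plus the index of the least significant set bit of X, or 0 if
   X is zero.  Portable reference only; ffs_sketch is a name made up
   for this note.  */
int
ffs_sketch (unsigned int x)
{
  int pos = 1;

  if (x == 0)
    return 0;

  /* Shift right until the lowest set bit reaches bit 0.  */
  while ((x & 1u) == 0)
    {
      x >>= 1;
      pos++;
    }
  return pos;
}
```

The results can be compared against __builtin_ffs on the same inputs, for example ffs_sketch (12) and __builtin_ffs (12) should both give 3.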
Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC
Segher Boessenkool writes: Hi Tulio, Hi Segher, +(define_expand "rs6000_get_timebase" + [(use (match_operand:DI 0 "gpc_reg_operand" ""))] + "" + " +{ + if (TARGET_POWERPC64) +emit_insn (gen_rs6000_get_timebase_ppc64 (operands[0])); + else +emit_insn (gen_rs6000_get_timebase_ppc32 (operands[0])); + DONE; +}") Please don't put quotes around the C block. Fixed. +(define_insn "rs6000_get_timebase_ppc32" + [(set (match_operand:DI 0 "gpc_reg_operand" "=r") +(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB)) + (clobber (match_scratch:SI 1 "=r")) + (clobber (match_scratch:CC 2 "=x"))] You can do "y" instead, to allow all CRn fields. Fixed. + "!TARGET_POWERPC64" +{ + if (WORDS_BIG_ENDIAN) +if (TARGET_MFCRF) + { +return "mfspr %0, 269\;" + "mfspr %L0, 268\;" + "mfspr %1, 269\;" + "cmpw %0,%1\;" + "bne- $-16"; + } +else + { +return "mftbu %0\;" + "mftb %L0\;" + "mftbu %1\;" + "cmpw %0,%1\;" + "bne- $-16"; + } + else +if (TARGET_MFCRF) + { +return "mfspr %L0, 269\;" + "mfspr %0, 268\;" + "mfspr %1, 269\;" + "cmpw %L0,%1\;" + "bne- $-16"; + } +else + { +return "mftbu %L0\;" + "mftb %0\;" + "mftbu %1\;" + "cmpw %L0,%1\;" + "bne- $-16"; + } +}) I don't think TARGET_MFCRF is correct. For example, if you use -mcpu=powerpc64 (which doesn't set this flag) you will get code that does not run on the newer machines. Sorry, but it seems to be working here... I explain how I tested this in the end of the email. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index e850266..b3fc236 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -13660,6 +13660,8 @@ float __builtin_rsqrtf (float); double __builtin_recipdiv (double, double); double __builtin_rsqrt (double); long __builtin_bpermd (long, long); +uint64_t __builtin_ppc_get_timebase (); +unsigned long __builtin_ppc_mftb (); @end smallexample This is section "PowerPC AltiVec/VSX Built-in Functions"; there should be a preceding separate "PowerPC Built-in Functions" section. 
Some of the current documentation should go there, too.

Agreed. I'm changing this.

@@ -13671,6 +13673,14 @@ The @code{__builtin_recipdiv}, and @code{__builtin_recipdivf} functions generate multiple instructions to implement division using the reciprocal estimate instructions.
+The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
+functions generate instructions to read the Time Base Register. The
+@code{__builtin_ppc_get_timebase} function may generate multiple
+instructions and always return the 64 bits of the Time Base Register. The

s/return/returns/

Fixed.

+@code{__builtin_ppc_mftb} function always generate one instruction and

generates

Fixed.

+returns the Time Base Register value as an unsigned long, throwing away
+the most significant word on 32-bit environments.

Looks good other than those nits, and the MFCRF thing. Oh, and you didn't say how you tested it (what targets).

Sorry, I ran the test suite on a Power7 running a 64-bit environment. Then, I manually built and ran a modified copy of the 2 included tests that only prints the TBR values 3 times, to make sure they were increasing. I did this for both 32 and 64 bits on a Power3 and a Power7, making sure to use the correct -mcpu. In the end I checked that the generated assembly was right for Power3, Power7 and some random choices I made.

Here are some snippets of code for __builtin_ppc_get_timebase:

- -m64 -mcpu=power7

.L.main:
    std 31,-8(1)
    stdu 1,-80(1)
    mr 31,1
    mfspr 9, 268
    std 9,56(31)
    li 9,0
    stw 9,48(31)
    b .L2

- -m32 -mcpu=power3

main:
    stwu 1,-48(1)
    stw 31,44(1)
    mr 31,1
    mftbu 8
    mftb 9
    mftbu 10
    cmpw 8,10
    bne- $-16
    stw 8,24(31)
    stw 9,28(31)
    lfd 0,24(31)
    stfd 0,16(31)
    li 9,0
    stw 9,8(31)
    b .L2

- -m64 -mcpu=power3

.L.main:
    std 31,-8(1)
    stdu 1,-80(1)
    mr 31,1
    mftb 9
    std 9,56(31)
    li 9,0
    stw 9,48(31)
    b .L2

Thanks,
-- Tulio Magno
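The 32-bit sequence above (mftbu, mftb, mftbu, cmpw, bne-) reads the upper half twice and retries if it changed, so that a carry from the lower half cannot produce a torn 64-bit value. A minimal C sketch of that loop follows. The counter is simulated: fake_timebase and the fake_read_* helpers are hypothetical stand-ins for the real registers, invented for this note so that the logic can be exercised on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated time base.  It advances by one tick on every read, which
   is an assumption made only so the carry case can be reproduced.  */
static uint64_t fake_timebase;

void
fake_timebase_set (uint64_t value)
{
  fake_timebase = value;
}

/* Stand-in for "mftbu": return the upper 32 bits.  */
uint32_t
fake_read_upper (void)
{
  uint32_t v = (uint32_t) (fake_timebase >> 32);
  fake_timebase++;
  return v;
}

/* Stand-in for "mftb" on a 32-bit target: return the lower 32 bits.  */
uint32_t
fake_read_lower (void)
{
  uint32_t v = (uint32_t) fake_timebase;
  fake_timebase++;
  return v;
}

/* Read upper, lower, upper again; retry while the two upper reads
   differ, mirroring the cmpw / bne- pair in the assembly.  */
uint64_t
read_timebase_sketch (void)
{
  uint32_t hi, lo, hi_again;

  do
    {
      hi = fake_read_upper ();
      lo = fake_read_lower ();
      hi_again = fake_read_upper ();
    }
  while (hi != hi_again);

  return ((uint64_t) hi << 32) | lo;
}
```

Starting the simulated counter at 0xFFFFFFFF makes the first attempt see upper halves 0 and 1, so the loop goes around once more and returns a consistent value instead of pairing the old upper half with the new lower half.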
[PATCH, ARM] Prefer vld1.64/vst1.64 over vldm/vstm
Hello, this patch changes the ARM back-end to use vld1.64/vst1.64 instructions instead of vldm/vstm -where possible- to implement double-word moves. The main benefit of this is that it allows the compiler to provide appropriate alignment hints, which may improve performance. The patch is based on an earlier version by Ramana. This version has now successfully passed regression testing and benchmarking (no performance regressions found, improvements of up to 2.5% on certain benchmarks). Tested on arm-linux-gnueabi. OK for mainline? Bye, Ulrich 2012-09-14 Ramana Radhakrishnan Ulrich Weigand * config/arm/arm.c (output_move_neon): Update comment. Use vld1.64/vst1.64 instead of vldm/vstm where possible. (neon_vector_mem_operand): Support double-word modes. * config/arm/neon.md (*neon_mov VD): Call output_move_neon instead of output_move_vfp. Change constraint from Uv to Un. Index: gcc-head/gcc/config/arm/arm.c === --- gcc-head.orig/gcc/config/arm/arm.c 2012-09-14 19:38:20.0 +0200 +++ gcc-head/gcc/config/arm/arm.c 2012-09-14 19:40:51.0 +0200 @@ -9629,7 +9629,11 @@ neon_vector_mem_operand (rtx op, int typ && REG_MODE_OK_FOR_BASE_P (XEXP (ind, 0), VOIDmode) && CONST_INT_P (XEXP (ind, 1)) && INTVAL (XEXP (ind, 1)) > -1024 - && INTVAL (XEXP (ind, 1)) < 1016 + /* For quad modes, we restrict the constant offset to be slightly less +than what the instruction format permits. We have no such constraint +on double mode offsets. (This must match arm_legitimate_index_p.) */ + && (INTVAL (XEXP (ind, 1)) + < (VALID_NEON_QREG_MODE (GET_MODE (op))? 1016 : 1024)) && (INTVAL (XEXP (ind, 1)) & 3) == 0) return TRUE; @@ -14573,15 +14577,16 @@ output_move_vfp (rtx *operands) return ""; } -/* Output a Neon quad-word load or store, or a load or store for - larger structure modes. +/* Output a Neon double-word or quad-word load or store, or a load + or store for larger structure modes. WARNING: The ordering of elements is weird in big-endian mode, - because we use VSTM, as required by the EABI. 
GCC RTL defines - element ordering based on in-memory order. This can be differ - from the architectural ordering of elements within a NEON register. - The intrinsics defined in arm_neon.h use the NEON register element - ordering, not the GCC RTL element ordering. + because the EABI requires that vectors stored in memory appear + as though they were stored by a VSTM, as required by the EABI. + GCC RTL defines element ordering based on in-memory order. + This can be different from the architectural ordering of elements + within a NEON register. The intrinsics defined in arm_neon.h use the + NEON register element ordering, not the GCC RTL element ordering. For example, the in-memory ordering of a big-endian a quadword vector with 16-bit elements when stored from register pair {d0,d1} @@ -14595,7 +14600,22 @@ output_move_vfp (rtx *operands) dN -> (rN+1, rN), dN+1 -> (rN+3, rN+2) So that STM/LDM can be used on vectors in ARM registers, and the - same memory layout will result as if VSTM/VLDM were used. */ + same memory layout will result as if VSTM/VLDM were used. + + Instead of VSTM/VLDM we prefer to use VST1.64/VLD1.64 where + possible, which allows use of appropriate alignment tags. + Note that the choice of "64" is independent of the actual vector + element size; this size simply ensures that the behavior is + equivalent to VSTM/VLDM in both little-endian and big-endian mode. + + Due to limitations of those instructions, use of VST1.64/VLD1.64 + is not possible if: +- the address contains PRE_DEC, or +- the mode refers to more than 4 double-word registers + + In those cases, it would be possible to replace VSTM/VLDM by a + sequence of instructions; this is not currently implemented since + this is not certain to actually improve performance. 
*/ const char * output_move_neon (rtx *operands) @@ -14629,13 +14649,23 @@ output_move_neon (rtx *operands) switch (GET_CODE (addr)) { case POST_INC: - templ = "v%smia%%?\t%%0!, %%h1"; - ops[0] = XEXP (addr, 0); + /* We have to use vldm / vstm for too-large modes. */ + if (ARM_NUM_REGS (mode) / 2 > 4) + { + templ = "v%smia%%?\t%%0!, %%h1"; + ops[0] = XEXP (addr, 0); + } + else + { + templ = "v%s1.64\t%%h1, %%A0"; + ops[0] = mem; + } ops[1] = reg; break; case PRE_DEC: - /* FIXME: We should be using vld1/vst1 here in BE mode? */ + /* We have to use vldm / vstm in this case, since there is no +pre-decrement form of the vld1 / vst1 instructions. */ templ = "v%smdb%%?\t%%0!, %%h1"; ops[0] = XEXP (addr, 0); ops[1] = reg; @@ -14679,7
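As an illustration of the neon_vector_mem_operand hunk in this patch, the check on a constant offset in a register-plus-offset address can be sketched in C as below. The function name and the quad_mode parameter are invented for this note; quad_mode stands in for VALID_NEON_QREG_MODE (GET_MODE (op)), and the numeric bounds are copied from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if OFFSET is acceptable as the constant part of a
   reg + offset address.  Quad-word modes stop short of 1016, while
   double-word modes may go up to (but not including) 1024.  The
   offset must also be a multiple of 4; the patch tests this with
   (offset & 3) == 0, which gives the same answer as the remainder
   test used here.  */
bool
neon_offset_ok_sketch (long offset, bool quad_mode)
{
  long upper = quad_mode ? 1016 : 1024;

  return offset > -1024 && offset < upper && offset % 4 == 0;
}
```

Under this sketch, offsets 1016 and 1020 are accepted for double-word modes but rejected for quad-word modes, which is the only behavioral difference that the hunk introduces.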
[PATCH, ARM] Use vld1/vst1 to implement vec_set/vec_extract
Hello,

following up on the prior patch, this patch exploits more opportunities to generate the vld1 / vst1 family of instructions, this time to implement the vec_set and vec_extract patterns with memory scalar operands.

Without the patch, vec_set/vec_extract only support register operands for the scalar, possibly requiring extra moves. In some cases we'd still get a vst1 instruction as a result, since combine would match a neon_vst1_lane pattern. However, that pattern seems to be actually incorrect for big-endian systems (due to a lane number ordering mismatch).

The patch therefore also changes neon_vst1_lane to use an UNSPEC instead of vec_select to model its operation, just like all the other NEON intrinsic patterns depending on lane numbers already do.

Benchmarking showed only marginal improvements, but no regression. It seems useful to support this anyway ...

Tested on arm-linux-gnueabi.

OK for mainline?

Bye,
Ulrich

2012-09-14 Ulrich Weigand

* config/arm/arm.c (arm_rtx_costs_1): Handle vec_extract and vec_set patterns.
* config/arm/arm.md ("vec_set_internal"): Support memory source operands, implemented via vld1 instruction.
("vec_extract"): Support memory destination operands, implemented via vst1 instruction.
("neon_vst1_lane"): Use UNSPEC_VST1_LANE instead of vec_select.
* config/arm/predicates.md ("neon_lane_number"): Remove.

Index: gcc-head/gcc/config/arm/arm.c
===
--- gcc-head.orig/gcc/config/arm/arm.c 2012-09-14 19:40:51.0 +0200
+++ gcc-head/gcc/config/arm/arm.c 2012-09-14 19:41:14.0 +0200
@@ -7666,6 +7666,28 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou
 return true;
 case SET:
+ /* The vec_extract patterns accept memory operands that require an
+address reload. Account for the cost of that reload to give the
+auto-inc-dec pass an incentive to try to replace them.
*/ + if (TARGET_NEON && MEM_P (SET_DEST (x)) + && GET_CODE (SET_SRC (x)) == VEC_SELECT) + { + *total = rtx_cost (SET_DEST (x), code, 0, speed); + if (!neon_vector_mem_operand (SET_DEST (x), 2)) + *total += COSTS_N_INSNS (1); + return true; + } + /* Likewise for the vec_set patterns. */ + if (TARGET_NEON && GET_CODE (SET_SRC (x)) == VEC_MERGE + && GET_CODE (XEXP (SET_SRC (x), 0)) == VEC_DUPLICATE + && MEM_P (XEXP (XEXP (SET_SRC (x), 0), 0))) + { + rtx mem = XEXP (XEXP (SET_SRC (x), 0), 0); + *total = rtx_cost (mem, code, 0, speed); + if (!neon_vector_mem_operand (mem, 2)) + *total += COSTS_N_INSNS (1); + return true; + } return false; case UNSPEC: Index: gcc-head/gcc/config/arm/neon.md === --- gcc-head.orig/gcc/config/arm/neon.md2012-09-14 19:40:51.0 +0200 +++ gcc-head/gcc/config/arm/neon.md 2012-09-14 19:41:14.0 +0200 @@ -416,30 +416,33 @@ [(set_attr "neon_type" "neon_vld1_1_2_regs")]) (define_insn "vec_set_internal" - [(set (match_operand:VD 0 "s_register_operand" "=w") + [(set (match_operand:VD 0 "s_register_operand" "=w,w") (vec_merge:VD (vec_duplicate:VD -(match_operand: 1 "s_register_operand" "r")) - (match_operand:VD 3 "s_register_operand" "0") - (match_operand:SI 2 "immediate_operand" "i")))] +(match_operand: 1 "nonimmediate_operand" "Um,r")) + (match_operand:VD 3 "s_register_operand" "0,0") + (match_operand:SI 2 "immediate_operand" "i,i")))] "TARGET_NEON" { int elt = ffs ((int) INTVAL (operands[2])) - 1; if (BYTES_BIG_ENDIAN) elt = GET_MODE_NUNITS (mode) - 1 - elt; operands[2] = GEN_INT (elt); - - return "vmov.\t%P0[%c2], %1"; + + if (which_alternative == 0) +return "vld1.\t{%P0[%c2]}, %A1"; + else +return "vmov.\t%P0[%c2], %1"; } - [(set_attr "neon_type" "neon_mcr")]) + [(set_attr "neon_type" "neon_vld1_vld2_lane,neon_mcr")]) (define_insn "vec_set_internal" - [(set (match_operand:VQ 0 "s_register_operand" "=w") + [(set (match_operand:VQ 0 "s_register_operand" "=w,w") (vec_merge:VQ (vec_duplicate:VQ -(match_operand: 1 "s_register_operand" "r")) - 
(match_operand:VQ 3 "s_register_operand" "0") - (match_operand:SI 2 "immediate_operand" "i")))] +(match_operand: 1 "nonimmediate_operand" "Um,r")) + (match_operand:VQ 3 "s_register_operand" "0,0") + (match_operand:SI 2 "immediate_operand" "i,i")))] "TARGET_NEON" { HOST_WIDE_INT elem = ffs ((int) INTVAL (operands[2])) - 1; @@ -454,18 +457,21 @@ operands[0] = gen_rtx_REG (mode, regno + hi); operands[2] = GEN_INT (elt); - return "vmov.\t%P0[%c2], %1"; + if (which_alternative == 0) +return "vld1.\t{%P0[%c2]}, %A1"; + else +
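The double-word vec_set_internal pattern above turns the one-hot vec_merge mask in operand 2 into a lane index and mirrors that index on big-endian targets. A minimal C sketch of that computation follows. The function name and parameters are invented for this note: nunits stands in for GET_MODE_NUNITS (mode), big_endian for BYTES_BIG_ENDIAN, and __builtin_ffs is used in place of the host ffs call. The extra hi/lo register split of the quad-word variant is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Map a one-hot merge MASK to a lane index for a vector of NUNITS
   elements, reversing the index when BIG_ENDIAN is set.  */
int
lane_from_merge_mask_sketch (unsigned int mask, int nunits, bool big_endian)
{
  int elt;

  /* Exactly one bit must be set.  */
  assert (mask != 0 && (mask & (mask - 1)) == 0);

  elt = __builtin_ffs ((int) mask) - 1;
  assert (elt < nunits);

  if (big_endian)
    elt = nunits - 1 - elt;
  return elt;
}
```

For a four-element vector and mask 0x4, this gives lane 2 in little-endian mode and lane 1 in big-endian mode.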
[doc] vector extensions
A fairly trivial follow-up to the patch with the code. I added a line for PR 53024 while I was there... 2012-09-14 Marc Glisse PR c/53024 PR c++/54427 * doc/extend.texi (Vector Extensions): C++ improvements. Power of 2 size requirement. -- Marc GlisseIndex: doc/extend.texi === --- doc/extend.texi (revision 191308) +++ doc/extend.texi (working copy) @@ -6813,21 +6813,22 @@ typedef int v4si __attribute__ ((vector_ The @code{int} type specifies the base type, while the attribute specifies the vector size for the variable, measured in bytes. For example, the declaration above causes the compiler to set the mode for the @code{v4si} type to be 16 bytes wide and divided into @code{int} sized units. For a 32-bit @code{int} this means a vector of 4 units of 4 bytes, and the corresponding mode of @code{foo} will be @acronym{V4SI}. The @code{vector_size} attribute is only applicable to integral and float scalars, although arrays, pointers, and function return values -are allowed in conjunction with this construct. +are allowed in conjunction with this construct. Only power of two +sizes are currently allowed. All the basic integer types can be used as base types, both as signed and as unsigned: @code{char}, @code{short}, @code{int}, @code{long}, @code{long long}. In addition, @code{float} and @code{double} can be used to build floating-point vector types. Specifying a combination that is not valid for the current architecture will cause GCC to synthesize the instructions using a narrower mode. For example, if you specify a variable of type @code{V4SI} and your architecture does not allow for this specific SIMD type, GCC will @@ -6850,21 +6851,21 @@ v4si a, b, c; c = a + b; @end smallexample Subtraction, multiplication, division, and the logical operations operate in a similar manner. 
Likewise, the result of using the unary minus or complement operators on a vector type is a vector whose elements are the negative or complemented values of the corresponding elements in the operand. -In C it is possible to use shifting operators @code{<<}, @code{>>} on +It is possible to use shifting operators @code{<<}, @code{>>} on integer-type vectors. The operation is defined as following: @code{@{a0, a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1, @dots{}, an >> bn@}}@. Vector operands must have the same number of elements. For the convenience in C it is allowed to use a binary vector operation where one operand is a scalar. In that case the compiler will transform the scalar operand into a vector where each element is the scalar from the operation. The transformation will happen only if the scalar could be safely converted to the vector-element type. @@ -6881,21 +6882,21 @@ a = 2 * b;/* a = @{2,2,2,2@} * b; */ a = l + a;/* Error, cannot convert long to int. */ @end smallexample Vectors can be subscripted as if the vector were an array with the same number of elements and base type. Out of bound accesses invoke undefined behavior at runtime. Warnings for out of bound accesses for vector subscription can be enabled with @option{-Warray-bounds}. -In GNU C vector comparison is supported within standard comparison +Vector comparison is supported with standard comparison operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be vector expressions of integer-type or real-type. Comparison between integer-type vectors and real-type vectors are not supported. The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type. Vectors are compared element-wise producing 0 when comparison is false and -1 (constant of the appropriate type where all bits are set) otherwise. Consider the following example.
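The example that follows this paragraph in extend.texi is cut off in this archive. As a separate illustration, and not the text from the manual, here is a small sketch of the element-wise comparison and shift behavior described above. It assumes a compiler that implements the GNU vector extensions and a 32-bit int, so that vector_size (16) yields four elements; the function names are invented for this note.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise a < b: each result element is -1 (all bits set) where
   the comparison is true and 0 where it is false.  */
v4si
less_than_sketch (v4si a, v4si b)
{
  return a < b;
}

/* Element-wise shift: element i of A is shifted left by element i of
   COUNTS.  */
v4si
shift_left_sketch (v4si a, v4si counts)
{
  return a << counts;
}

/* Vectors can be subscripted like arrays.  */
int
element_sketch (v4si v, int i)
{
  assert (i >= 0 && i < 4);
  return v[i];
}
```

Comparing {1, 5, 3, 7} with {2, 4, 3, 8} under this sketch gives {-1, 0, 0, -1}, and shifting {1, 5, 3, 7} left by {1, 2, 3, 0} gives {2, 20, 24, 7}.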
Re: [C++ Patch] PR 54575
Jason, H.J. figured out that this changed when case SCOPE_REF of value_dependent_expression_p started always returning true, part of the instantiation_dependent_p work... I'm unassigning myself, I guess you will immediately see which further changes are needed. Thanks! Paolo.
Tighten forwprop1 testing
Hello, recent patches have let optimizations move from forwprop2 to forwprop1. The attached checks that this remains the case. (copyprop1 is the first pass after forwprop1 that does a dce-like cleanup) Only manually tested for now, will check better if it is accepted. 2012-09-15 Marc Glisse * gcc.dg/tree-ssa/forwprop-19.c: Check in forwprop1. * gcc.dg/tree-ssa/forwprop-20.c: Check in forwprop1. * gcc.dg/tree-ssa/forwprop-21.c: Check in copyprop1. * gcc.dg/tree-ssa/forwprop-22.c: Check in copyprop1. -- Marc GlisseIndex: testsuite/gcc.dg/tree-ssa/forwprop-19.c === --- testsuite/gcc.dg/tree-ssa/forwprop-19.c (revision 191308) +++ testsuite/gcc.dg/tree-ssa/forwprop-19.c (working copy) @@ -1,15 +1,15 @@ /* { dg-do compile } */ -/* { dg-options "-O -fdump-tree-forwprop2" } */ +/* { dg-options "-O -fdump-tree-forwprop1" } */ typedef int vec __attribute__((vector_size (4 * sizeof (int; void f (vec *x1, vec *x2) { vec m = { 1, 2, 3, 0 }; vec n = { 3, 0, 1, 2 }; vec y = __builtin_shuffle (*x1, *x2, n); vec z = __builtin_shuffle (y, m); *x1 = z; } -/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop2" } } */ -/* { dg-final { cleanup-tree-dump "forwprop2" } } */ +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop1" } } */ +/* { dg-final { cleanup-tree-dump "forwprop1" } } */ Index: testsuite/gcc.dg/tree-ssa/forwprop-20.c === --- testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 191308) +++ testsuite/gcc.dg/tree-ssa/forwprop-20.c (working copy) @@ -1,13 +1,13 @@ /* { dg-do compile } */ /* { dg-require-effective-target double64 } */ -/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-options "-O -fdump-tree-forwprop1" } */ #include /* All of these optimizations happen for unsupported vector modes as a consequence of the lowering pass. We need to test with a vector mode that is supported by default on at least some architectures, or make the test target specific so we can pass a flag like -mavx. 
*/ typedef double vecf __attribute__ ((vector_size (2 * sizeof (double; typedef int64_t veci __attribute__ ((vector_size (2 * sizeof (int64_t; @@ -59,12 +59,12 @@ void k (vecf* r) } void l (double d, vecf* r) { vecf x = { -d, 5 }; vecf y = { d, 4 }; veci m = { 2, 0 }; *r = __builtin_shuffle (x, y, m); // { d, -d } } -/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */ -/* { dg-final { cleanup-tree-dump "optimized" } } */ +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop1" } } */ +/* { dg-final { cleanup-tree-dump "forwprop1" } } */ Index: testsuite/gcc.dg/tree-ssa/forwprop-21.c === --- testsuite/gcc.dg/tree-ssa/forwprop-21.c (revision 191308) +++ testsuite/gcc.dg/tree-ssa/forwprop-21.c (working copy) @@ -1,13 +1,16 @@ /* { dg-do compile } */ -/* { dg-options "-O -fdump-tree-optimized" } */ +/* { dg-options "-O -fdump-tree-copyprop1" } */ typedef int v4si __attribute__ ((vector_size (4 * sizeof(int; int test (v4si *x, v4si *y) { v4si m = { 2, 3, 6, 5 }; v4si z = __builtin_shuffle (*x, *y, m); return z[2]; } -/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */ -/* { dg-final { cleanup-tree-dump "optimized" } } */ + +/* Optimization in forwprop1, cleanup in copyprop1. 
*/ + +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "copyprop1" } } */ +/* { dg-final { cleanup-tree-dump "copyprop1" } } */ Index: testsuite/gcc.dg/tree-ssa/forwprop-22.c === --- testsuite/gcc.dg/tree-ssa/forwprop-22.c (revision 191308) +++ testsuite/gcc.dg/tree-ssa/forwprop-22.c (working copy) @@ -1,18 +1,20 @@ /* { dg-do compile } */ /* { dg-require-effective-target vect_double } */ /* { dg-require-effective-target vect_perm } */ -/* { dg-options "-O -fdump-tree-optimized" } */ +/* { dg-options "-O -fdump-tree-copyprop1" } */ typedef double vec __attribute__((vector_size (2 * sizeof (double; void f (vec *px, vec *y, vec *z) { vec x = *px; vec t1 = { x[1], x[0] }; vec t2 = { x[0], x[1] }; *y = t1; *z = t2; } -/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "optimized" } } */ -/* { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } */ -/* { dg-final { cleanup-tree-dump "optimized" } } */ +/* Optimization in forwprop1, cleanup in copyprop1. */ + +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "copyprop1" } } */ +/* { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "copyprop1" } } */ +/* { dg-final { cleanup-tree-dump "copyprop1" } } */
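One way to read forwprop-19.c, which expects no VEC_PERM_EXPR to remain, is that its two shuffle masks compose to the identity permutation of the first operand. The message does not spell this out, so the following C sketch is an interpretation; it works on plain arrays rather than __builtin_shuffle, and the function names are invented for this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compose two shuffles of LEN elements: y = shuffle (x1, x2, FIRST)
   followed by z = shuffle (y, SECOND).  Element i of z is element
   FIRST[SECOND[i]] of the concatenation of x1 and x2.  Indices of the
   one-operand shuffle are taken modulo LEN.  */
void
compose_masks_sketch (const int *first, const int *second,
                      int *out, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      assert (second[i] >= 0);
      out[i] = first[(size_t) second[i] % len];
    }
}

/* True if MASK selects elements 0 .. LEN-1 in order, that is, the
   first operand unchanged.  */
bool
is_identity_mask_sketch (const int *mask, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (mask[i] != (int) i)
      return false;
  return true;
}
```

With the masks from the test, n = { 3, 0, 1, 2 } and m = { 1, 2, 3, 0 }, the composed mask is { 0, 1, 2, 3 }, so z is simply *x1 and no permutation is needed.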
PR 44436 Associative containers emplace/emplace_hint
Hi

Here is a patch to add emplace/emplace_hint on associative containers in C++11 mode. I did some refactoring to share as much code as possible between the insert and emplace methods. I have also changed map::operator[] to use the emplace logic in C++11, so that we only need the value type to be default constructible.

The C++11 status table was not signaling that those methods were missing, so I didn't have to update it.

2012-09-14 François Dumont

PR libstdc++/44436
* include/bits/stl_tree.h (_Rb_tree<>::_M_insert_): Take _Base_ptr rather than _Const_Base_ptr.
(_Rb_tree<>::_M_insert_node): New.
(_Rb_tree<>::_M_get_insert_unique_pos): New, search code of _M_insert_unique method.
(_Rb_tree<>::_M_insert_unique): Use latter.
(_Rb_tree<>::_M_emplace_unique): New, likewise.
(_Rb_tree<>::_M_get_insert_equal_pos): New, search code of _M_insert_equal method.
(_Rb_tree<>::_M_insert_equal): Use latter.
(_Rb_tree<>::_M_emplace_equal): New, likewise.
(_Rb_tree<>::_M_get_insert_hint_unique_pos): New, search code of _M_insert_unique_ method.
(_Rb_tree<>::_M_insert_unique_): Use latter.
(_Rb_tree<>::_M_emplace_hint_unique): New, likewise.
(_Rb_tree<>::_M_get_insert_hint_equal_pos): New, search code of _M_insert_equal_ method.
(_Rb_tree<>::_M_insert_equal_): Use latter.
(_Rb_tree<>::_M_emplace_hint_equal): New, likewise.
(_Rb_tree<>::_M_insert_lower): Remove first _Base_ptr parameter, useless as always null.
* include/bits/stl_map.h: Include in C++11.
(map<>::operator[](const key_type&)): Use _Rb_tree<>::_M_emplace_hint_unique in C++11.
(map<>::operator[](key_type&&)): Likewise.
(map<>::emplace): New.
(map<>::emplace_hint): New.
* include/bits/stl_multimap.h (multimap<>::emplace): New.
(multimap<>::emplace_hint): New.
* include/bits/stl_set.h (set<>::emplace): New.
(set<>::emplace_hint): New.
* include/bits/stl_multiset.h (multiset<>::emplace): New.
(multiset<>::emplace_hint): New.
* include/debug/map.h (std::__debug::map<>::emplace): New.
(std::__debug::map<>::emplace_hint): New.
* include/debug/multimap.h (std::__debug::multimap<>::emplace): New. (std::__debug::multimap<>::emplace_hint): New. * include/debug/set.h (std::__debug::set<>::emplace): New. (std::__debug::set<>::emplace_hint): New. * include/debug/multiset.h (std::__debug::multiset<>::emplace): New. (std::__debug::multiset<>::emplace_hint): New. * testsuite/util/testsuite_container_traits.h: Signal that emplace and emplace_hint are available on std::map, std::multimap, std::set and std::multiset in C++11. * testsuite/23_containers/map/modifiers/emplate/1.cc: New. * testsuite/23_containers/multimap/modifiers/emplate/1.cc: New. * testsuite/23_containers/set/modifiers/emplate/1.cc: New. * testsuite/23_containers/multiset/modifiers/emplate/1.cc: New.Tested under linux x86_64. Tested x86_64 linux, normal/debug modes, C++98/C++11 modes. Ok to commit ? François Index: include/bits/stl_tree.h === --- include/bits/stl_tree.h (revision 191279) +++ include/bits/stl_tree.h (working copy) @@ -570,27 +570,50 @@ typedef std::reverse_iterator const_reverse_iterator; private: + pair<_Base_ptr, _Base_ptr> + _M_get_insert_unique_pos(const key_type& __k); + + pair<_Base_ptr, _Base_ptr> + _M_get_insert_equal_pos(const key_type& __k); + + pair<_Base_ptr, _Base_ptr> + _M_get_insert_hint_unique_pos(const_iterator __pos, +const key_type& __k); + + pair<_Base_ptr, _Base_ptr> + _M_get_insert_hint_equal_pos(const_iterator __pos, + const key_type& __k); + #ifdef __GXX_EXPERIMENTAL_CXX0X__ template iterator -_M_insert_(_Const_Base_ptr __x, _Const_Base_ptr __y, _Arg&& __v); +_M_insert_(_Base_ptr __x, _Base_ptr __y, _Arg&& __v); + iterator + _M_insert_node(_Base_ptr __x, _Base_ptr __y, _Link_type __z); + template iterator -_M_insert_lower(_Base_ptr __x, _Base_ptr __y, _Arg&& __v); +_M_insert_lower(_Base_ptr __y, _Arg&& __v); template iterator _M_insert_equal_lower(_Arg&& __x); + + iterator + _M_insert_lower_node(_Base_ptr __p, _Link_type __z); + + iterator + _M_insert_equal_lower_node(_Link_type __z); #else 
iterator - _M_insert_(_Const_Base_ptr __x, _Const_Base_ptr __y, + _M_insert_(_Base_ptr __x, _Base_ptr __y, const value_type& __v); // _GLIBCXX_RESOLVE_LIB_DEFECTS // 233. Insertion hints in associative containers. iterator - _M_insert_lower(_Base_ptr __x, _Base_ptr __y, const value_type& __v); + _M_insert_lower(_Base_ptr __y, const value_type& __v); iterator _M_insert_equal_lower(const value_type& __x); @@ -726,6 +749,22 @@
[google/main] Backport counter histogram in fdo summary from trunk (issue6513045)
Backport from trunk r190952 to add counter histogram to gcov program summary, and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238). Tested on x86_64-unknown-linux-gnu. Ok for google branches? 2012-09-14 Teresa Johnson * libgcc/libgcov.c (gcov_histogram_insert): New function. (gcov_compute_histogram): Ditto. (sort_by_reverse_gcov_value): Remove function. (gcov_compute_cutoff_values): Ditto. (gcov_merge_gcda_file): Merge histogram while merging summary. (gcov_gcda_file_size): Include histogram in summary size computation. (gcov_write_gcda_file): Remove assert that is no longer valid. (gcov_exit_init): Invoke gcov_compute_histogram. * gcc/gcov-io.c (gcov_write_summary): Write out non-zero histogram entries to function summary along with an occupancy bit vector. (gcov_read_summary): Read in the histogram entries. (gcov_histo_index): New function. (gcov_histogram_merge): Ditto. * gcc/gcov-io.h (gcov_type_unsigned): New type. (struct gcov_bucket_type): Ditto. (struct gcov_ctr_summary): Include histogram. (GCOV_TAG_SUMMARY_LENGTH): Update to include histogram entries. (GCOV_HISTOGRAM_SIZE): New macro. (GCOV_HISTOGRAM_BITVECTOR_SIZE): Ditto. (gcov_gcda_file_size): New parameter. * gcc/profile.c (NUM_GCOV_WORKING_SETS): Ditto. (gcov_working_sets): New global variable. (compute_working_sets): New function. (find_working_set): Ditto. (get_exec_counts): Invoke compute_working_sets. * gcc/loop-unroll.c (code_size_limit_factor): Call new function find_working_set to obtain working set information. * gcc/coverage.c (read_counts_file): Merge histograms, and fix bug with accessing summary info for non-summable counters. * gcc/basic-block.h (gcov_type_unsigned): New type. (struct gcov_working_set_info): Ditto. (find_working_set): Declare. * gcc/gcov-dump.c (tag_summary): Dump out histogram. * gcc/configure.ac (HOST_HAS_F_SETLKW): Set based on compile test using F_SETLKW with fcntl. * gcc/configure, gcc/config.in: Regenerate. 
Index: libgcc/libgcov.c === --- libgcc/libgcov.c(revision 191302) +++ libgcc/libgcov.c(working copy) @@ -585,6 +585,76 @@ gcov_dump_module_info (void) __gcov_finalize_dyn_callgraph (); } +/* Insert counter VALUE into HISTOGRAM. */ + +static void +gcov_histogram_insert(gcov_bucket_type *histogram, gcov_type value) +{ + unsigned i; + + i = gcov_histo_index(value); + histogram[i].num_counters++; + histogram[i].cum_value += value; + if (value < histogram[i].min_value) +histogram[i].min_value = value; +} + +/* Computes a histogram of the arc counters to place in the summary SUM. */ + +static void +gcov_compute_histogram (struct gcov_summary *sum) +{ + struct gcov_info *gi_ptr; + const struct gcov_fn_info *gfi_ptr; + const struct gcov_ctr_info *ci_ptr; + struct gcov_ctr_summary *cs_ptr; + unsigned t_ix, f_ix, ctr_info_ix, ix; + int h_ix; + + /* This currently only applies to arc counters. */ + t_ix = GCOV_COUNTER_ARCS; + + /* First check if there are any counts recorded for this counter. */ + cs_ptr = &(sum->ctrs[t_ix]); + if (!cs_ptr->num) +return; + + for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++) +{ + cs_ptr->histogram[h_ix].num_counters = 0; + cs_ptr->histogram[h_ix].min_value = cs_ptr->run_max; + cs_ptr->histogram[h_ix].cum_value = 0; +} + + /* Walk through all the per-object structures and record each of + the count values in histogram. */ + for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next) +{ + if (!gi_ptr->merge[t_ix]) +continue; + + /* Find the appropriate index into the gcov_ctr_info array + for the counter we are currently working on based on the + existence of the merge function pointer for this object. 
*/ + for (ix = 0, ctr_info_ix = 0; ix < t_ix; ix++) +{ + if (gi_ptr->merge[ix]) +ctr_info_ix++; +} + for (f_ix = 0; f_ix != gi_ptr->n_functions; f_ix++) +{ + gfi_ptr = gi_ptr->functions[f_ix]; + + if (!gfi_ptr || gfi_ptr->key != gi_ptr) +continue; + + ci_ptr = &gfi_ptr->ctrs[ctr_info_ix]; + for (ix = 0; ix < ci_ptr->num; ix++) +gcov_histogram_insert (cs_ptr->histogram, ci_ptr->values[ix]); +} +} +} + /* Dump the coverage counts. We merge with existing counts when possible, to avoid growing the .da files ad infinitum. We use this program's checksum to make sure we only accumulate whole program @@ -758,118 +828,6 @@ gcov_sort_topn_counter_arrays (const struct gcov_i } } -/* Used by qsort to sort gcov values in descending order. */ - -static int -sort_by_reverse_gcov_va
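Editorial illustration (not part of the patch): the bucket update performed by gcov_histogram_insert above can be sketched as standalone C. The bucket-selection function here, toy_histo_index, is a hypothetical stand-in that picks the position of the highest set bit; the real gcov_histo_index lives in gcov-io.c, is not shown in this excerpt, and may well compute its index differently. The toy_ names, TOY_HISTOGRAM_SIZE, and the struct layout are likewise assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size: one bucket per bit position of a 64-bit counter.  */
#define TOY_HISTOGRAM_SIZE 64

struct toy_bucket
{
  unsigned num_counters;  /* How many counters landed in this bucket.  */
  uint64_t min_value;     /* Smallest counter value seen in this bucket.  */
  uint64_t cum_value;     /* Sum of all counter values in this bucket.  */
};

/* Hypothetical stand-in for gcov_histo_index: return the position of
   the highest set bit of VALUE (0 and 1 both map to bucket 0).  */
unsigned
toy_histo_index (uint64_t value)
{
  unsigned i = 0;
  while (value >>= 1)
    i++;
  return i;
}

/* Reset every bucket; min_value starts at RUN_MAX so that the first
   inserted value can only lower it, as in gcov_compute_histogram.  */
void
toy_histogram_init (struct toy_bucket *histogram, uint64_t run_max)
{
  for (unsigned i = 0; i < TOY_HISTOGRAM_SIZE; i++)
    {
      histogram[i].num_counters = 0;
      histogram[i].min_value = run_max;
      histogram[i].cum_value = 0;
    }
}

/* Insert counter VALUE into HISTOGRAM, mirroring the three updates
   made by gcov_histogram_insert in the patch.  */
void
toy_histogram_insert (struct toy_bucket *histogram, uint64_t value)
{
  unsigned i = toy_histo_index (value);
  assert (i < TOY_HISTOGRAM_SIZE);
  histogram[i].num_counters++;
  histogram[i].cum_value += value;
  if (value < histogram[i].min_value)
    histogram[i].min_value = value;
}
```

With this toy index, the values 4, 5 and 7 all land in bucket 2, so after inserting them that bucket records three counters, a cumulative value of 16 and a minimum of 4.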
Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)
On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson wrote: > Backport from trunk r190952 to add counter histogram to gcov program summary, > and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238). Why don't we just get this via the trunk -> google/main merges? Diego.
Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)
On Fri, Sep 14, 2012 at 1:10 PM, Diego Novillo wrote:
> On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson wrote:
>> Backport from trunk r190952 to add counter histogram to gcov program summary,
>> and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).
>
> Why don't we just get this via the trunk -> google/main merges?
>
> Diego.

Should I just put it onto google/4_7 and 4_6 directly then?

Teresa

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)
On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:
> Should I just put it onto google/4_7 and 4_6 directly then?

Yeah. Not sure it's really needed in 4_6, though.

Diego.
Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)
Yes. The google/main update will happen next quarter.

David

On Fri, Sep 14, 2012 at 1:17 PM, Teresa Johnson wrote:
> On Fri, Sep 14, 2012 at 1:10 PM, Diego Novillo wrote:
>> On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson wrote:
>>> Backport from trunk r190952 to add counter histogram to gcov program summary,
>>> and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).
>>
>> Why don't we just get this via the trunk -> google/main merges?
>>
>> Diego.
>
> Should I just put it onto google/4_7 and 4_6 directly then?
>
> Teresa
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)
On Fri, Sep 14, 2012 at 1:19 PM, Diego Novillo wrote:
> On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:
>
>> Should I just put it onto google/4_7 and 4_6 directly then?
>
> Yeah. Not sure it's really needed in 4_6, though.

Ok. There are only trivial differences between the patch I uploaded for google/main and the google/4_7 patch that I have also already created and tested. Ok for google/4_7?

Thanks,
Teresa

> Diego.

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)
yes.

thanks,

David

On Fri, Sep 14, 2012 at 1:20 PM, Teresa Johnson wrote:
> On Fri, Sep 14, 2012 at 1:19 PM, Diego Novillo wrote:
>> On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:
>>
>>> Should I just put it onto google/4_7 and 4_6 directly then?
>>
>> Yeah. Not sure it's really needed in 4_6, though.
>
> Ok. There are only trivial differences between the patch I uploaded
> for google/main and the google/4_7 patch that I have also already
> created and tested. Ok for google/4_7?
>
> Thanks,
> Teresa
>
>> Diego.
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH,ARM] Suppress the dynamic linker commands for statically linked programs
Hi, Recently we found out that the .interp section starts to show up in ARM executables compiled with "-shared -static" and the gold linker from binutils 2.22. We tracked down the origin of the dynamic linker commands and they are always explicitly specified in config/arm/linux-elf.h. We tested the following simple patch to suppress the dynamic linker options for statically linked programs on Android's AOSP tree. Everything builds fine and the unneeded .interp section is gone. Thanks, -Ben == gcc/ChangeLog 2012-09-14 Ben Cheng * config/arm/linux-elf.h: Suppress the dynamic linker commands for statically linked programs. Index: config/arm/linux-elf.h === --- config/arm/linux-elf.h (revision 191198) +++ config/arm/linux-elf.h (working copy) @@ -65,8 +65,9 @@ %{static:-Bstatic} \ %{shared:-shared} \ %{symbolic:-Bsymbolic} \ - %{rdynamic:-export-dynamic} \ - -dynamic-linker " GNU_USER_DYNAMIC_LINKER " \ + %{!static: \ + %{rdynamic:-export-dynamic} \ + -dynamic-linker " GNU_USER_DYNAMIC_LINKER "} \ -X \ %{mbig-endian:-EB} %{mlittle-endian:-EL}" \ SUBTARGET_EXTRA_LINK_SPEC
Re: PR 44436 Associative containers emplace/emplace_hint
On 09/14/2012 10:07 PM, François Dumont wrote:
> Hi
>
> Here is a patch to add emplace/emplace_hint on associative containers in C++11 mode.

Ah, excellent! I will review the patch over the next couple of days.

> I did some refactoring to use as much of the same code as possible between the insert and emplace methods. I have also changed map::operator[] to now use the emplace logic in C++11 so that we only need the value type to be default constructible.

About this, can I ask you to likewise add completely similar testcases too?

Thanks!
Paolo.
Re: [TILE-Gx, committed] support -mcmodel=MODEL
On 9/1/2012 7:33 AM, Gerald Pfeifer wrote: On Tue, 28 Aug 2012, Walter Lee wrote: This patch adds support for the -mcmodel=MODEL flag on TILE-Gx. At which point I cannot help asking for an update to the release notes at http://gcc.gnu.org/gcc-4.8/changes.html. ;-) Let me know if you need help with that. How does this look: --- changes.html6 Sep 2012 03:42:45 - 1.28 +++ changes.html14 Sep 2012 20:40:39 - @@ -298,6 +298,13 @@ by this change. Added optimized instruction scheduling for Niagara4. +TILE-Gx + + +Added support for the -mcmodel=MODEL command-line option. The +models supported are small and large. + + XStormy16 @emph{TILEPro Options} @gccoptlist{-mcpu=CPU -m32} Why are only -mcpu and -m32 listed here? Those are the only options supported on TILEPro. -m32 is basically a no-op as tilepro does not support other models. I've fixed the remaining grammar/spelling issues you pointed out. See patch below. Thanks, Walter * doc/invoke.texi (Option Summary): fix typesetting for -mcpu option for TILEPro and TILE-Gx. (TILE-Gx Options): Fix grammar and spellings in documentation for -mcmodel. --- gcc/doc/invoke.texi(revision 191306) +++ gcc/doc/invoke.texi(working copy) @@ -932,10 +932,10 @@ See RS/6000 and PowerPC Options. @gccoptlist{-Qy -Qn -YP,@var{paths} -Ym,@var{dir}} @emph{TILE-Gx Options} -@gccoptlist{-mcpu=CPU -m32 -m64 -mcmodel=@var{code-model}} +@gccoptlist{-mcpu=@var{cpu} -m32 -m64 -mcmodel=@var{code-model}} @emph{TILEPro Options} -@gccoptlist{-mcpu=CPU -m32} +@gccoptlist{-mcpu=@var{cpu} -m32} @emph{V850 Options} @gccoptlist{-mlong-calls -mno-long-calls -mep -mno-ep @gol @@ -19003,13 +19003,13 @@ These @samp{-m} options are supported on @table @gcctabopt @item -mcmodel=small @opindex mcmodel=small -Generate code for the small model. Distance for direct calls is +Generate code for the small model. The distance for direct calls is limited to 500M in either direction. PC-relative addresses are 32 bits. Absolute addresses support the full address range. 
@item -mcmodel=large @opindex mcmodel=large -Generate code for the large model. There is no limiation on call +Generate code for the large model. There is no limitation on call distance, pc-relative addresses, or absolute addresses. @item -mcpu=@var{name}
Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC
>> I don't think TARGET_MFCRF is correct. For example, if you use
>> -mcpu=powerpc64 (which doesn't set this flag) you will get code
>> that does not run on the newer machines.
>
> Sorry, but it seems to be working here...
> I explain how I tested this in the end of the email.

David tells me all current CPUs actually do support the MFTB insns just fine, so that there is no problem.

Cheers,

Segher
Fix C ICE with casts to pointers to VLAs (PR c/54552)
Bug 54552 is a C front-end regression involving ICEs when an expression involving C_MAYBE_CONST_EXPR, such as a compound literal of variably modified type, is cast to a variably modified type. This patch fixes it by doing the appropriate folding before creating the outer C_MAYBE_CONST_EXPR. Bootstrapped with no regressions on x86_64-unknown-linux-gnu. Applied to mainline. Will apply to 4.7 (when not frozen) and 4.6 branches subject to testing there. c: 2012-09-14 Joseph Myers PR c/54552 * c-typeck.c (c_cast_expr): When casting to a type requiring C_MAYBE_CONST_EXPR to be created, pass the inner expression to c_fully_fold first. testsuite: 2012-09-14 Joseph Myers PR c/54552 * gcc.c-torture/compile/pr54552-1.c: New test. Index: c/c-typeck.c === --- c/c-typeck.c(revision 191305) +++ c/c-typeck.c(working copy) @@ -4779,8 +4779,11 @@ c_cast_expr (location_t loc, struct c_type_name *t ret = build_c_cast (loc, type, expr); if (type_expr) { + bool inner_expr_const = true; + ret = c_fully_fold (ret, require_constant_value, &inner_expr_const); ret = build2 (C_MAYBE_CONST_EXPR, TREE_TYPE (ret), type_expr, ret); - C_MAYBE_CONST_EXPR_NON_CONST (ret) = !type_expr_const; + C_MAYBE_CONST_EXPR_NON_CONST (ret) = !(type_expr_const +&& inner_expr_const); SET_EXPR_LOCATION (ret, loc); } Index: testsuite/gcc.c-torture/compile/pr54552-1.c === --- testsuite/gcc.c-torture/compile/pr54552-1.c (revision 0) +++ testsuite/gcc.c-torture/compile/pr54552-1.c (revision 0) @@ -0,0 +1,8 @@ +void +f (void) +{ + unsigned n = 10; + + typedef double T[n]; + (double (*)[n])((unsigned char (*)[sizeof (T)]){ 0 }); +} -- Joseph S. Myers jos...@codesourcery.com
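Editorial illustration (not part of the patch or its testcase): for readers unfamiliar with casts to variably modified types, the following is a minimal sketch of what such a cast means at run time. The function name row_bytes and its parameters are hypothetical; the sketch only assumes a compiler with C99 variable-length array support, and it does not reproduce the ICE, which needed the compound-literal operand shown in pr54552-1.c.

```c
#include <assert.h>
#include <stddef.h>

/* View BUF as an array of rows of N doubles and return the size of one
   row.  The cast target type, double (*)[n], is variably modified: the
   size of the pointed-to array depends on the run-time value of N.  */
size_t
row_bytes (unsigned n, void *buf)
{
  assert (n > 0);
  double (*rows)[n] = (double (*)[n]) buf;
  /* sizeof applied to a variable-length array type is computed at run
     time, so this yields n * sizeof (double).  */
  return sizeof (*rows);
}
```

Because the type itself carries a run-time size, the front end has to keep track of whether both the type expression and the operand of the cast are constant, which is the bookkeeping the fix above adjusts.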
Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC
On Fri, Sep 14, 2012 at 4:52 PM, Segher Boessenkool wrote:
>>> I don't think TARGET_MFCRF is correct. For example, if you use
>>> -mcpu=powerpc64 (which doesn't set this flag) you will get code
>>> that does not run on the newer machines.
>>
>> Sorry, but it seems to be working here...
>> I explain how I tested this in the end of the email.
>
> David tells me all current CPUs actually do support the MFTB insns
> just fine, so that there is no problem.

There is no ideal solution and this method will work on all PowerPC targets with a combination of GCC and Binutils support, so I think this is the best we can do.

Thanks, David
Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC
Segher Boessenkool writes:

>>> I don't think TARGET_MFCRF is correct. For example, if you use
>>> -mcpu=powerpc64 (which doesn't set this flag) you will get code
>>> that does not run on the newer machines.
>>
>> Sorry, but it seems to be working here...
>> I explain how I tested this in the end of the email.
>
> David tells me all current CPUs actually do support the MFTB insns
> just fine, so that there is no problem.

I don't understand the problem.
Let me paste again this snippet of code here:

It was generated using the __builtin_ppc_get_timebase test case with the extra parameters -S -m64 -mcpu=power7. The rest is the same used by the test suite.

.L.main:
	std 31,-8(1)
	stdu 1,-80(1)
	mr 31,1
	mfspr 9, 268
	std 9,56(31)
	li 9,0
	stw 9,48(31)
	b .L2

I've just done this same test for all -mcpu values from power3 through power7 and both -m32 and -m64. The only value that outputs mftb is power3 in both environments.

What am I missing?

--
Tulio Magno
[PATCH, docs] Fix some obsolete info in tm.texi
While trying to revive an ancient 4.1-based port, I found that tm.texi still documented things like current_function_pretend_args_size which were deleted in 2008. I've updated the text to reflect the idioms used in current code. Checked in as obvious after building and inspecting the manual. -Sandra 2012-09-14 Sandra Loosemore gcc/ * doc/tm.texi.in (Stack Arguments): Update obsolete references to current_function_outgoing_args_size. (Function Entry): Likewise for current_function_pops_args, current_function_pretend_args_size, current_function_outgoing_args_size, and current_function_epilogue_delay_list. (Misc): Fix garbled sentence referencing nonexistent current_function_leaf_function. * doc/tm.texi: Regenerated. Index: gcc/doc/tm.texi.in === --- gcc/doc/tm.texi.in (revision 191332) +++ gcc/doc/tm.texi.in (working copy) @@ -3863,11 +3863,12 @@ alignment. Then the definition should b If the value of this macro has a type, it should be an unsigned type. @end defmac -@findex current_function_outgoing_args_size +@findex outgoing_args_size +@findex crtl->outgoing_args_size @defmac ACCUMULATE_OUTGOING_ARGS A C expression. If nonzero, the maximum amount of space required for outgoing arguments -will be computed and placed into the variable -@code{current_function_outgoing_args_size}. No space will be pushed +will be computed and placed into +@code{crtl->outgoing_args_size}. No space will be pushed onto the stack for each call; instead, the function prologue should increase the stack frame size by this amount. @@ -3901,7 +3902,7 @@ if the function called is a library func If @code{ACCUMULATE_OUTGOING_ARGS} is defined, this macro controls whether the space for these arguments counts in the value of -@code{current_function_outgoing_args_size}. +@code{crtl->outgoing_args_size}. @end defmac @defmac STACK_PARMS_IN_REG_PARM_AREA @@ -4700,7 +4701,8 @@ others leave that for the caller to do. given @option{-mrtd} pops arguments in functions that take a fixed number of arguments. 
-@findex current_function_pops_args +@findex pops_args +@findex crtl->args.pops_args Your definition of the macro @code{RETURN_POPS_ARGS} decides which functions pop their own arguments. @code{TARGET_ASM_FUNCTION_EPILOGUE} needs to know what was decided. The number of bytes of the current @@ -4710,8 +4712,9 @@ function's arguments that this function @itemize @bullet @item -@findex current_function_pretend_args_size -A region of @code{current_function_pretend_args_size} bytes of +@findex pretend_args_size +@findex crtl->args.pretend_args_size +A region of @code{crtl->args.pretend_args_size} bytes of uninitialized space just underneath the first argument arriving on the stack. (This may not be at the very start of the allocated stack region if the calling sequence has pushed anything else since pushing the stack @@ -4738,7 +4741,7 @@ save area closer to the top of the stack @item @cindex @code{ACCUMULATE_OUTGOING_ARGS} and stack frames Optionally, when @code{ACCUMULATE_OUTGOING_ARGS} is defined, a region of -@code{current_function_outgoing_args_size} bytes to be used for outgoing +@code{crtl->outgoing_args_size} bytes to be used for outgoing argument lists of the function. @xref{Stack Arguments}. @end itemize @@ -4787,11 +4790,12 @@ may be reconsidered for a subsequent del (at least in principle) be considered for the so far unfilled delay slot. -@findex current_function_epilogue_delay_list +@findex epilogue_delay_list +@findex crtl->epilogue_delay_list @findex final_scan_insn The insns accepted to fill the epilogue delay slots are put in an RTL -list made with @code{insn_list} objects, stored in the variable -@code{current_function_epilogue_delay_list}. The insn for the first +list made with @code{insn_list} objects, stored in +@code{crtl->epilogue_delay_list}. The insn for the first delay slot comes first in the list. 
Your definition of the macro @code{TARGET_ASM_FUNCTION_EPILOGUE} should fill the delay slots by outputting the insns in this list, usually by calling @@ -10831,8 +10835,8 @@ the hard register itself, if it is known @code{MEM}. If you are returning a @code{MEM}, this is only a hint for the allocator; it might decide to use another register anyways. -You may use @code{current_function_leaf_function} in the hook, functions -that use @code{REG_N_SETS}, to determine if the hard +You may use @code{current_function_is_leaf} or +@code{REG_N_SETS} in the hook to determine if the hard register in question will not be clobbered. The default value of this hook is @code{NULL}, which disables any special allocation.
Re: [PATCH,mmix] convert to constraints.md
On Wed, 12 Sep 2012, Hans-Peter Nilsson wrote: > On Wed, 12 Sep 2012, Nathan Froyd wrote: > > > - Keeping old layout of "mmix_reg_or_8bit_operand". That looked like > > > a spurious change and I prefer the ior construct to the > > > if_then_else. > > > > ISTR without this change, there were lots of assembly changes like: > I'll try with your original patch and see if I can spot > something. Nope, I see no differences in the generated code before/after the patch-patch below (applied to your original patch, except edited as if using --no-prefix, to fit with my other patches). Case closed: I don't think gen* mishandled either construct. --- patch.nathanorig.adjusted 2012-09-12 12:33:34.0 +0200 +++ patch3 2012-09-14 14:42:31.0 +0200 @@ -364,7 +364,7 @@ diff --git a/gcc/config/mmix/predicates. index b5773b8..7fa3bf1 100644 --- gcc/config/mmix/predicates.md +++ gcc/config/mmix/predicates.md -@@ -149,7 +149,13 @@ +@@ -149,7 +149,14 @@ ;; True if this is a register or an int 0..255. (define_predicate "mmix_reg_or_8bit_operand" @@ -372,9 +372,10 @@ index b5773b8..7fa3bf1 100644 - (match_operand 0 "register_operand") - (and (match_code "const_int") - (match_test "CONST_OK_FOR_LETTER_P (INTVAL (op), 'I')" -+ (if_then_else (match_code "const_int") -+(match_test "satisfies_constraint_I (op)") -+(match_operand 0 "register_operand"))) ++ (ior ++ (match_operand 0 "register_operand") ++ (and (match_code "const_int") ++ (match_test "satisfies_constraint_I (op)" + +;; True if this is a memory address, possibly strictly. + brgds, H-P
Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC
On Fri, Sep 14, 2012 at 8:34 PM, Tulio Magno Quites Machado Filho wrote:
> Segher Boessenkool writes:

> I don't understand the problem.

There is no problem. Segher is concerned that -mcpu=powerpc64, which is supposed to generate "generic" PPC64 code, produces mftb instead of mfspr. However, there is no right answer and it is unclear if it is better for -mcpu=powerpc64 to support the most processors now or to offer only forward compatibility for the future. For the moment, working on all PPC64 systems seems like the better option. We can revisit this in the future when POWER3 is even farther in the past.

- david
Re: [PATCH] Set correct source location for deallocator calls
On Sat, Sep 8, 2012 at 2:42 PM, Dehao Chen wrote: > Hi, > > I've added a libjava unittest which verifies that this patch will not > break Java debug info. I've also incorporated Richard's review in the > previous mail. Attached is the new patch, which passed bootstrap and > all gcc/libjava testsuites on x86. > > Is it ok for trunk? > > Thanks, > Dehao > > gcc/ChangeLog: > 2012-09-08 Dehao Chen > > * tree-eh.c (goto_queue_node): New field. > (record_in_goto_queue): New parameter. > (record_in_goto_queue_label): New parameter. > (lower_try_finally_dup_block): New parameter. > (maybe_record_in_goto_queue): Update source location. > (lower_try_finally_copy): Likewise. > (honor_protect_cleanup_actions): Likewise. > * gimplify.c (gimplify_expr): Reset the location to unknown. > > gcc/testsuite/ChangeLog: > 2012-09-08 Dehao Chen > > * g++.dg/debug/dwarf2/deallocator.C: New test. > > libjava/ChangeLog: > 2012-09-08 Dehao Chen > > * testsuite/libjava.lang/sourcelocation.java: New cases. > * testsuite/libjava.lang/sourcelocation.out: New cases. On Linux/x86, I got FAIL: sourcelocation -O3 -findirect-dispatch output - source compiled test FAIL: sourcelocation -O3 output - source compiled test FAIL: sourcelocation -findirect-dispatch output - source compiled test FAIL: sourcelocation output - source compiled test spawn [open ...]^M -1 -1 -1 PASS: sourcelocation -findirect-dispatch execution - source compiled test FAIL: sourcelocation -findirect-dispatch output - source compiled test -- H.J.
Re: [PATCH] Set correct source location for deallocator calls
On Fri, Sep 14, 2012 at 9:25 PM, H.J. Lu wrote: > On Sat, Sep 8, 2012 at 2:42 PM, Dehao Chen wrote: >> Hi, >> >> I've added a libjava unittest which verifies that this patch will not >> break Java debug info. I've also incorporated Richard's review in the >> previous mail. Attached is the new patch, which passed bootstrap and >> all gcc/libjava testsuites on x86. >> >> Is it ok for trunk? >> >> Thanks, >> Dehao >> >> gcc/ChangeLog: >> 2012-09-08 Dehao Chen >> >> * tree-eh.c (goto_queue_node): New field. >> (record_in_goto_queue): New parameter. >> (record_in_goto_queue_label): New parameter. >> (lower_try_finally_dup_block): New parameter. >> (maybe_record_in_goto_queue): Update source location. >> (lower_try_finally_copy): Likewise. >> (honor_protect_cleanup_actions): Likewise. >> * gimplify.c (gimplify_expr): Reset the location to unknown. >> >> gcc/testsuite/ChangeLog: >> 2012-09-08 Dehao Chen >> >> * g++.dg/debug/dwarf2/deallocator.C: New test. >> >> libjava/ChangeLog: >> 2012-09-08 Dehao Chen >> >> * testsuite/libjava.lang/sourcelocation.java: New cases. >> * testsuite/libjava.lang/sourcelocation.out: New cases. > > On Linux/x86, I got > > FAIL: sourcelocation -O3 -findirect-dispatch output - source compiled test > FAIL: sourcelocation -O3 output - source compiled test > FAIL: sourcelocation -findirect-dispatch output - source compiled test > FAIL: sourcelocation output - source compiled test > > spawn [open ...]^M > -1 > -1 > -1 > PASS: sourcelocation -findirect-dispatch execution - source compiled test > FAIL: sourcelocation -findirect-dispatch output - source compiled test > > I am using binutils mainline 20120914 from CVS. -- H.J.
Re: [PATCH] Set correct source location for deallocator calls
On Fri, Sep 14, 2012 at 9:25 PM, H.J. Lu wrote: > On Sat, Sep 8, 2012 at 2:42 PM, Dehao Chen wrote: >> Hi, >> >> I've added a libjava unittest which verifies that this patch will not >> break Java debug info. I've also incorporated Richard's review in the >> previous mail. Attached is the new patch, which passed bootstrap and >> all gcc/libjava testsuites on x86. >> >> Is it ok for trunk? >> >> Thanks, >> Dehao >> >> gcc/ChangeLog: >> 2012-09-08 Dehao Chen >> >> * tree-eh.c (goto_queue_node): New field. >> (record_in_goto_queue): New parameter. >> (record_in_goto_queue_label): New parameter. >> (lower_try_finally_dup_block): New parameter. >> (maybe_record_in_goto_queue): Update source location. >> (lower_try_finally_copy): Likewise. >> (honor_protect_cleanup_actions): Likewise. >> * gimplify.c (gimplify_expr): Reset the location to unknown. >> >> gcc/testsuite/ChangeLog: >> 2012-09-08 Dehao Chen >> >> * g++.dg/debug/dwarf2/deallocator.C: New test. >> >> libjava/ChangeLog: >> 2012-09-08 Dehao Chen >> >> * testsuite/libjava.lang/sourcelocation.java: New cases. >> * testsuite/libjava.lang/sourcelocation.out: New cases. > > On Linux/x86, I got > > FAIL: sourcelocation -O3 -findirect-dispatch output - source compiled test > FAIL: sourcelocation -O3 output - source compiled test > FAIL: sourcelocation -findirect-dispatch output - source compiled test > FAIL: sourcelocation output - source compiled test > > spawn [open ...]^M > -1 > -1 > -1 > PASS: sourcelocation -findirect-dispatch execution - source compiled test > FAIL: sourcelocation -findirect-dispatch output - source compiled test I bet you have an older addr2line installed. Thanks, Andrew Pinski
Re: [Patch,avr] ad PR54222: Support saturated +, -, ABS
2012/9/14 Georg-Johann Lay : > This patch adds more fixed-point support, namely saturated operations: > > SS_PLUS, SS_MINUS, SS_NEG, SS_ABS, > US_PLUS, US_MINUS, US_NEG > > for all supported fixed-point modes: > > [U]QQ, > [U]HQ, [U]HA, > [U]SQ, [U]SA, > [U]DQ, [U]DA, [U]TA. > > Depending on their complexity, the functions are implemented in libgcc > or are natively supported by avr-gcc. > > The bulk of code is in the avr_out_plus_1 routine which has been generalized > to perform saturation. > > avr_out_plus has been rewritten and is now generic enough to handle all the > cases that were formerly treated by: > avr_out_plus > avr_out_plus_noclobber > avr_out_minus > avr_out_plus64 > avr_out_minus64 > > The latter 4 functions are removed and the md files are cleaned up to use > avr_out_plus. > > There are no new regressions. > > However, all new tests with "-Os -flto" fail because they trigger a > segmentation fault in lto1 at >gcc/tree-streamer-in.c:unpack_ts_fixed_cst_value_fields() > while that function tries to deserialize TREE_FIXED_CST. > > Thus, these FAILs are because of an LTO issue. > > Except the "-Os -flto" cases, all other new tests PASS. > > Ok for trunk? > Ok. Please apply. Denis.
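Editorial illustration (not part of the patch): the saturated operations listed above clamp the result to the range of the mode instead of wrapping around. The following is a minimal sketch of that behaviour in portable C on 8-bit integers. It is not the avr-gcc or libgcc implementation, it ignores the fixed-point scaling of the QQ/HQ/SQ modes, and the function names sat_add_s8 and sat_sub_u8 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturating add (the behaviour of SS_PLUS on an 8-bit value):
   the result is clamped to [INT8_MIN, INT8_MAX] instead of wrapping.  */
int8_t
sat_add_s8 (int8_t a, int8_t b)
{
  int sum = (int) a + (int) b;  /* Widen so the sum itself cannot overflow.  */
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}

/* Unsigned saturating subtract (the behaviour of US_MINUS on an 8-bit
   value): the result is clamped at zero instead of wrapping to 255.  */
uint8_t
sat_sub_u8 (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}
```

For example, sat_add_s8 (100, 100) yields 127 where a wrapping add would yield -56, and sat_sub_u8 (3, 5) yields 0 where a wrapping subtract would yield 254.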