Re: [PATCH fortran/diagnostics] Move gfc_error (buffered) to common diagnostics (try 2)
Manuel López-Ibáñez writes: > New version using XNEW. Bootstrapped & tested on x86_64-linux-gnu. > > OK? The diagnostics infrastructure changes are OK for me. Thanks! Cheers, -- Dodji
Re: [RFC] diagnostics.c: For terminals, restrict messages to terminal width?
Tobias Burnus writes:

> 2014-12-06  Tobias Burnus
>             Manuel López-Ibáñez
>
> gcc/
> * diagnostic.c (get_terminal_width): Renamed from getenv_columns,
> removed static, and additionally use ioctl to get width.
> (diagnostic_set_caret_max_width): Update call.
> * diagnostic.h (get_terminal_width): Add prototype.
> * opts.c (print_specific_help): Use it for x_help_columns.
> * doc/invoke.texi (fdiagnostics-show-caret): Document how the
> width is set.
>
> gcc/fortran/
> * error.c (gfc_get_terminal_width): Renamed from
> get_terminal_width and use same-named common function.
> (gfc_error_init_1): Update call.

The diagnostics infrastructure changes are OK for me.

Thanks!

Cheers,

-- 
Dodji
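For readers unfamiliar with the mechanism the ChangeLog refers to, here is a minimal sketch of how a terminal width can be obtained from the COLUMNS environment variable with an ioctl fallback. This is not the committed GCC code: the function name terminal_width_sketch, the lookup order (environment first, then ioctl), the use of the stderr file descriptor, and the INT_MAX fallback are all assumptions made for illustration.

```cpp
#include <climits>
#include <cstdlib>
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical helper, not GCC's get_terminal_width.
// Returns a positive column count, or INT_MAX when no width is known.
int terminal_width_sketch()
{
  // Assumption: an explicit COLUMNS setting takes precedence.
  if (const char *s = std::getenv("COLUMNS"))
    {
      int n = std::atoi(s);
      if (n > 0)
        return n;
    }

#ifdef TIOCGWINSZ
  // Ask the terminal driver for the window size. Diagnostics usually go
  // to stderr, so that descriptor is queried here (an assumption).
  struct winsize ws;
  if (ioctl(STDERR_FILENO, TIOCGWINSZ, &ws) == 0 && ws.ws_col > 0)
    return ws.ws_col;
#endif

  // Not a terminal, or the width is unavailable: treat it as unlimited.
  return INT_MAX;
}
```

With COLUMNS=123 in the environment the sketch returns 123; when output is redirected to a file and COLUMNS is unset, it falls through to INT_MAX.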
Re: [PATCH] TYPE_OVERFLOW_* cleanup
On Wed, Dec 10, 2014 at 08:11:02PM +0100, Marc Glisse wrote: > >+inline tree > >+any_integral_type_check (tree __t, const char *__f, int __l, const char > >*__g) > >+{ > >+ if (!(INTEGRAL_TYPE_P (__t) > >+|| ((TREE_CODE (__t) == COMPLEX_TYPE > >+ || VECTOR_TYPE_P (__t)) > >+&& INTEGRAL_TYPE_P (TREE_TYPE (__t) > >+tree_check_failed (__t, __f, __l, __g, BOOLEAN_TYPE, ENUMERAL_TYPE, > >+ INTEGER_TYPE, 0); > >+ return __t; > >+} > > Is there a particular reason why you are avoiding ANY_INTEGRAL_TYPE_P in > any_integral_type_check? No, I'm just blind ;). Changed in the following, thanks for looking into this! Bootstrapped/regtested on x86_64-linux, ok for trunk? 2014-12-11 Marek Polacek * fold-const.c (fold_negate_expr): Add ANY_INTEGRAL_TYPE_P check. (extract_muldiv_1): Likewise. (maybe_canonicalize_comparison_1): Likewise. (fold_comparison): Likewise. (tree_binary_nonnegative_warnv_p): Likewise. (tree_binary_nonzero_warnv_p): Likewise. * gimple-ssa-strength-reduction.c (legal_cast_p_1): Likewise. * tree-scalar-evolution.c (simple_iv): Likewise. (scev_const_prop): Likewise. * tree-ssa-loop-niter.c (expand_simple_operations): Likewise. * tree-vect-generic.c (expand_vector_operation): Likewise. * tree.h (ANY_INTEGRAL_TYPE_CHECK): Define. (ANY_INTEGRAL_TYPE_P): Define. (TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED, TYPE_OVERFLOW_TRAPS): Add ANY_INTEGRAL_TYPE_CHECK. (any_integral_type_check): New function. 
diff --git gcc/fold-const.c gcc/fold-const.c index 0d947ae..7b68bea 100644 --- gcc/fold-const.c +++ gcc/fold-const.c @@ -558,7 +558,8 @@ fold_negate_expr (location_t loc, tree t) case INTEGER_CST: tem = fold_negate_const (t, type); if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t) - || (!TYPE_OVERFLOW_TRAPS (type) + || (ANY_INTEGRAL_TYPE_P (type) + && !TYPE_OVERFLOW_TRAPS (type) && TYPE_OVERFLOW_WRAPS (type)) || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0) return tem; @@ -5951,7 +5952,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type, || EXPRESSION_CLASS_P (op0)) /* ... and has wrapping overflow, and its type is smaller than ctype, then we cannot pass through as widening. */ - && ((TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)) + && (((ANY_INTEGRAL_TYPE_P (TREE_TYPE (op0)) + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0))) && (TYPE_PRECISION (ctype) > TYPE_PRECISION (TREE_TYPE (op0 /* ... or this is a truncation (t is narrower than op0), @@ -5966,7 +5968,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type, /* ... or has undefined overflow while the converted to type has not, we cannot do the operation in the inner type as that would introduce undefined overflow. */ - || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0)) + || ((ANY_INTEGRAL_TYPE_P (TREE_TYPE (op0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))) && !TYPE_OVERFLOW_UNDEFINED (type break; @@ -8497,7 +8500,8 @@ maybe_canonicalize_comparison_1 (location_t loc, enum tree_code code, tree type, /* Match A +- CST code arg1 and CST code arg1. We can change the first form only if overflow is undefined. */ - if (!((TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) + if (!(((ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))) /* In principle pointers also have undefined overflow behavior, but that causes problems elsewhere. 
*/ && !POINTER_TYPE_P (TREE_TYPE (arg0)) @@ -8712,7 +8716,9 @@ fold_comparison (location_t loc, enum tree_code code, tree type, /* Transform comparisons of the form X +- C1 CMP C2 to X CMP C2 -+ C1. */ if ((TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR) - && (equality_code || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))) + && (equality_code + || (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0 && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST && !TREE_OVERFLOW (TREE_OPERAND (arg0, 1)) && TREE_CODE (arg1) == INTEGER_CST @@ -9031,7 +9037,8 @@ fold_comparison (location_t loc, enum tree_code code, tree type, X CMP Y +- C2 +- C1 for signed X, Y. This is valid if the resulting offset is smaller in absolute value than the original one and has the same sign. */ - if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) + if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) && (TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR) && (TREE_CODE (TREE_O
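The pattern in the hunks above (test ANY_INTEGRAL_TYPE_P first, then query a TYPE_OVERFLOW_* property that now asserts on the type) can be illustrated outside GCC with a small stand-alone model. Everything below is hypothetical: Code, Type, any_integral_p and overflow_wraps are simplified stand-ins for the real tree machinery, and an exception takes the place of tree_check_failed.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for GCC tree codes and type nodes.
enum class Code { Integer, Boolean, Enumeral, Real, Pointer, Complex, Vector };

struct Type
{
  Code code;
  const Type *elem;  // element type for Complex/Vector, otherwise nullptr
  bool wraps;        // models the "overflow wraps" property
};

// Models INTEGRAL_TYPE_P.
bool integral_p(const Type &t)
{
  return t.code == Code::Integer || t.code == Code::Boolean
         || t.code == Code::Enumeral;
}

// Models ANY_INTEGRAL_TYPE_P: integral, or complex/vector of integral.
bool any_integral_p(const Type &t)
{
  return integral_p(t)
         || ((t.code == Code::Complex || t.code == Code::Vector)
             && t.elem != nullptr && integral_p(*t.elem));
}

// Models a checked accessor such as TYPE_OVERFLOW_WRAPS after the patch:
// asking the question about a non-integral type is a hard error.
bool overflow_wraps(const Type &t)
{
  if (!any_integral_p(t))
    throw std::logic_error("overflow query on non-integral type");
  return t.wraps;
}

// Caller-side pattern used throughout the patch: guard, then query.
bool guarded_overflow_wraps(const Type &t)
{
  return any_integral_p(t) && overflow_wraps(t);
}
```

The guard short-circuits, so callers that may see floating-point or pointer types never reach the checked accessor.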
[Patch, libstdc++/64239] Fix regex_iterator copying
As discussed in Bugzilla. Bootstrapped and tested. Is it Ok to backport it to 4.9 branch, with _M_in_iterator kept unused? Thanks! :) -- Regards, Tim Shen commit 18c4399589b414c79c6e85ab91f7a95f2fcad829 Author: timshen Date: Wed Dec 10 21:30:13 2014 -0800 PR libstdc++/64239 * include/bits/regex.h (match_results<>::match_results, match_results<>::operator=, match_results<>::position, match_results<>::swap): Remove match_results::_M_in_iterator. Fix ctor/assign/swap. * include/bits/regex.tcc: (__regex_algo_impl<>, regex_iterator<>::operator++): Set match_results::_M_begin as "start position". * testsuite/28_regex/iterators/regex_iterator/char/ string_position_01.cc: Test cases. diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h index cb6bc93..3afec37 100644 --- a/libstdc++-v3/include/bits/regex.h +++ b/libstdc++-v3/include/bits/regex.h @@ -1563,42 +1563,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION */ explicit match_results(const _Alloc& __a = _Alloc()) - : _Base_type(__a), _M_in_iterator(false) + : _Base_type(__a) { } /** * @brief Copy constructs a %match_results. */ - match_results(const match_results& __rhs) - : _Base_type(__rhs), _M_in_iterator(false) - { } + match_results(const match_results& __rhs) = default; /** * @brief Move constructs a %match_results. */ - match_results(match_results&& __rhs) noexcept - : _Base_type(std::move(__rhs)), _M_in_iterator(false) - { } + match_results(match_results&& __rhs) noexcept = default; /** * @brief Assigns rhs to *this. */ match_results& - operator=(const match_results& __rhs) - { - match_results(__rhs).swap(*this); - return *this; - } + operator=(const match_results& __rhs) = default; /** * @brief Move-assigns rhs to *this. */ match_results& - operator=(match_results&& __rhs) - { - match_results(std::move(__rhs)).swap(*this); - return *this; - } + operator=(match_results&& __rhs) = default; /** * @brief Destroys a %match_results object. 
@@ -1685,13 +1673,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION difference_type position(size_type __sub = 0) const { - // [28.12.1.4.5] - if (_M_in_iterator) - return __sub < size() ? std::distance(_M_begin, - (*this)[__sub].first) : -1; - else - return __sub < size() ? std::distance(this->prefix().first, - (*this)[__sub].first) : -1; + return __sub < size() ? std::distance(_M_begin, + (*this)[__sub].first) : -1; } /** @@ -1876,7 +1859,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION */ void swap(match_results& __that) - { _Base_type::swap(__that); } + { + _Base_type::swap(__that); + swap(_M_begin, __that._M_begin); + } //@} private: @@ -1894,7 +1880,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION regex_constants::match_flag_type); _Bi_iter _M_begin; - bool _M_in_iterator; }; typedef match_results cmatch; diff --git a/libstdc++-v3/include/bits/regex.tcc b/libstdc++-v3/include/bits/regex.tcc index 9692402..b676428 100644 --- a/libstdc++-v3/include/bits/regex.tcc +++ b/libstdc++-v3/include/bits/regex.tcc @@ -62,6 +62,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return false; typename match_results<_BiIter, _Alloc>::_Base_type& __res = __m; + __m._M_begin = __s; __res.resize(__re._M_automaton->_M_sub_count() + 2); for (auto& __it : __res) __it.matched = false; @@ -572,7 +573,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION auto& __prefix = _M_match.at(_M_match.size()); __prefix.first = __prefix_first; __prefix.matched = __prefix.first != __prefix.second; - _M_match._M_in_iterator = true; + // [28.12.1.4.5] _M_match._M_begin = _M_begin; return *this; } @@ -587,7 +588,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION auto& __prefix = _M_match.at(_M_match.size()); __prefix.first = __prefix_first; __prefix.matched = __prefix.first != __prefix.second; - _M_match._M_in_iterator = true; + // [28.12.1.4.5] _M_match._M_begin = _M_begin; } else diff --git a/libstdc++-v3/testsuite/28_regex/iterators/regex_iterator/char/string_position_01.cc b/libstdc++-v3/testsuite/28_regex/iterators/regex_iterator/char/string_position_01.cc index 
5fa4ea7..91aa061 100644 ---
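As a rough illustration of the behaviour the PR is about, the following sketch copies and assigns the match_results obtained from a regex_iterator and records position() on the copy. It is not the test case from the patch; the function name copied_positions is made up for this note. On a library with the fix, the copies should report positions relative to the start of the searched string, the same as the iterator's own match.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect position() from *copies* of each match.
std::vector<std::ptrdiff_t>
copied_positions(const std::string &text, const std::regex &re)
{
  std::vector<std::ptrdiff_t> out;
  for (std::sregex_iterator it(text.begin(), text.end(), re), end;
       it != end; ++it)
    {
      std::smatch copy = *it;   // copy construction
      std::smatch assigned;
      assigned = copy;          // copy assignment
      out.push_back(assigned.position());
    }
  return out;
}
```

For the input "abc 123 def 456" and the pattern [0-9]+, the two matches start at offsets 4 and 12 of the original string.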
Re: [PATCH fortran/diagnostics] Move gfc_error (buffered) to common diagnostics (try 2)
Dodji Seketeli wrote: Manuel López-Ibáñez writes: New version using XNEW. Bootstrapped & tested on x86_64-linux-gnu. OK? The diagnostics infrastructure changes are OK for me. Thanks! And the Fortran part was already approved before. (Otherwise, take this as another rubber stamp.) Thanks also from my side! BTW: The terminal-width patch was committed as Rev. 218619. Tobias
Re: [PATCH 2/3] Extended if-conversion
On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev wrote: > Richard, > > Thanks for your reply! > > I didn't understand your point: > > Well, I don't mind splitting all critical edges unconditionally > > but you do it unconditionally in proposed patch. I don't mind means I am fine with it. > Also I assume that > call of split_critical_edges() can break ssa. For example, we can > split headers of loops, loop exit blocks etc. How does that "break SSA"? You mean loop-closed SSA? I'd be surprised if so but that may be possible. > I prefer to do something > more loop-specialized, e.g. call edge_split() for critical edges > outgoing from bb ending with GIMPLE_COND stmt (assuming that edge > destination bb belongs to loop). That works for me as well but it is more complicated to implement. Ideally you'd only split one edge if you find a block with only critical predecessors (where we'd currently give up). But note that this requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it will change loop->num_nodes so we have to be more careful in constructing the loop calling if_convertible_bb_p. Richard. > > 2014-12-10 17:31 GMT+03:00 Richard Biener : >> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev wrote: >>> Richard, >>> >>> Sorry that I forgot to delete debug dump from my fix. >>> I have few questions about your comments. >>> >>> 1. You wrote : You also still have two functions for PHI predication. And the new extended variant doesn't commonize the 2-args and general path >>> Did you mean that I must combine predicate_scalar_phi and >>> predicate_extended scalar phi to one function? >>> Please note that if additional flag was not set up (i.e. >>> aggressive_if_conv is false) extended predication is required more >>> compile time since it builds hash_map. >> >> It's compile-time complexity is reasonable enough even for >> non-aggressive if-conversion. >> >>> 2. About critical edge splitting. 
>>> >>> Did you mean that we should perform it (1) under aggressive_if_conv >>> option only; (2) should we split all critical edges. >>> Note that this leads to recomputing of topological order. >> >> Well, I don't mind splitting all critical edges unconditionally, thus >> do something like >> >> Index: gcc/tree-if-conv.c >> === >> --- gcc/tree-if-conv.c (revision 218515) >> +++ gcc/tree-if-conv.c (working copy) >> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f >>if (number_of_loops (fun) <= 1) >> return 0; >> >> + bool critical_edges_split_p = false; >>FOR_EACH_LOOP (loop, 0) >> if (flag_tree_loop_if_convert == 1 >> || flag_tree_loop_if_convert_stores == 1 >> || ((flag_tree_loop_vectorize || loop->force_vectorize) >> && !loop->dont_vectorize)) >> - todo |= tree_if_conversion (loop); >> + { >> + if (!critical_edges_split_p) >> + { >> + split_critical_edges (); >> + critical_edges_split_p = true; >> + todo |= TODO_cleanup_cfg; >> + } >> + todo |= tree_if_conversion (loop); >> + } >> >> #ifdef ENABLE_CHECKING >>{ >> >>> It is worth noting that in current implementation bb's with 2 >>> predecessors and both are on critical edges are accepted without >>> additional option. >> >> Yes, I know. >> >> tree-if-conv.c is a mess right now and if we can avoid adding more >> to it and even fix the critical edge missed optimization with splitting >> critical edges then I am all for that solution. >> >> Richard. >> >>> Thanks ahead. >>> Yuri. >>> 2014-12-09 18:20 GMT+03:00 Richard Biener : On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev wrote: > Richard, > > Here is updated patch2 with the following changes: > 1. Delete functions phi_has_two_different_args and find_insertion_point. > 2. Use only one function for extended predication - > predicate_extended_scalar_phi. > 3. 
Save gsi before insertion of predicate computations for basic > blocks if it has 2 predecessors and > both incoming edges are critical or it gas more than 2 predecessors > and at least one incoming edge > is critical. This saved iterator can be used by extended phi predication. > > Here is motivated test-case which explains this point. > Test-case is attached (t5.c) and it must be compiled with -O2 > -ftree-loop-vectorize -fopenmp options. > The problem phi is in bb-7: > > bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) > { > : > xmax_edge_18 = xmax_edge_36 + 1; > if (xmax_17 == xmax_27) > goto ; > else > goto ; > > } > bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) > { > : > if (xmax_17 == xmax_27) > goto ; > else > goto ; > > } > bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) > { > : > # xmax_ed
Re: [PATCH 00/13] Go closures, libffi, and the static chain
On Fri, Oct 10, 2014 at 01:42:40PM -0700, Richard Henderson wrote: > The background here is my thread from last week[1], and Ian's reply[2], > wherein he rightly points out that not needing to play games with > mmap in order to implement closures for Go is a strong reason to > continue using custom code within libgo. > > While that thread did have a go at implementing that custom code for > aarch64, I still think that replicating libffi's calling convention > knowledge for every interesting target is a mistake. > > So instead I thought about how I'd add some support for Go directly > into libffi. ... > But the comment immediately before __go_set_closure itself says > that it would be better to use the static chain register. ... > Before I go too much farther down this road, I wanted to get some > feedback. FWIW, a complete tree can be found at [4]. ... > [4] git://github.com/rth7680/gcc.git rth/go-closure 1) On s390x, the static chain register cannot be used for passing the Go closure pointer to a function: According to the Abi, the dynamic linker is allowed to destroy the contents of r0 (static chain register) eventually causing a crash if libgo is linked dynamically. The assumption that the static chain register can be used to pass information to a function is wrong for s390x. 
2) With this branch, the reflection tests on amd64 crash: $ cd /build # build gcc $ cd /libgo $ make reflect/check --> -- snip -- Aborted reflect.call ../../../libgo/runtime/go-reflect-call.c:216 reflect.call.N13_reflect.Value GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/value.go:579 reflect.Call.N13_reflect.Value GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/value.go:412 reflect_test.TestCallWithStruct GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/all_test.go:1490 testing.tRunner ../../../libgo/go/testing/testing.go:422 goroutine 16 [chan receive]: testing.RunTests ../../../libgo/go/testing/testing.go:505 testing.Main ../../../libgo/go/testing/testing.go:435 main.main GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/_testmain.go:124 created by main ../../../libgo/runtime/go-main.c:42 goroutine 18 [finalizer wait]: created by runtime_createfing ../../../libgo/runtime/mgc0.c:2572 goroutine 53 [sleep]: reflect_test.selectWatcher GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/all_test.go:1377 created by reflect_test.$nested2 GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/all_test.go:1107 FAIL: reflect make: *** [reflect/check] Error 1 -- snip -- Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: [PATCH 00/13] Go closures, libffi, and the static chain
On Thu, Dec 11, 2014 at 10:06:23AM +0100, Dominik Vogt wrote: > On s390x, the static chain register cannot be used for passing the > Go closure pointer to a function: According to the Abi, the > dynamic linker is allowed to destroy the contents of r0 (static > chain register) eventually causing a crash if libgo is linked > dynamically. The assumption that the static chain register can be > used to pass information to a function is wrong for s390x. I was worried about exactly the same "problem" on powerpc with r11 being used for the static chain and also destroyed in linkage stubs. It turns out we don't traverse any linkage stubs. See https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00446.html. -- Alan Modra Australia Development Lab, IBM
Re: [PATCH][libstdc++][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model
On 10/12/14 22:18, Mike Stump wrote: On Dec 10, 2014, at 10:05 AM, Kyrill Tkachov wrote: Thanks for the guidance. I've moved the definitions into a separate file and included that in the places that use it (more than 2 places in my count). This is the patch attached. The second patch (will send shortly after this) adds the logic to libstdc++. Ok? Ok. If anyone else wants to refactor annoying to maintain code into a single place… certainly the legacy of cut-n-paste programming is alive and well in the *.exp files. It was never a design goal to replicate annoying to maintain code. :-) Thanks, and the patch that adds the libstdc++.exp changes at https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00952.html using the new target-utils.exp file is ok too then? Cheers, Kyrill
[PATCH][ARM][cleanup] Use R0_REGNUM and R1_REGNUM instead of 0 and 1 where appropriate
Hi all,

While looking in this area on other business I noticed we could be using the names R0_REGNUM and R1_REGNUM when creating those REG rtxs, since that is a bit more descriptive than just 0 and 1.

Tested arm-none-eabi.

Ok for trunk?

Thanks,
Kyrill

2014-12-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

    * config/arm/arm.c (arm_load_tp): Use R0_REGNUM instead of
    constant 0 in gen_rtx_REG.
    (arm_tls_descseq_addr): Likewise.
    (arm_gen_movmemqi): Likewise.
    (arm_expand_epilogue_apcs_frame): Likewise.
    (arm_expand_epilogue): Likewise.
    (arm_expand_prologue): Likewise. Use R1_REGNUM instead of
    constant 1 in gen_rtx_REG.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 64494e8..d17c81d 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -7431,7 +7431,7 @@ arm_load_tp (rtx target) emit_insn (gen_load_tp_soft ()); - tmp = gen_rtx_REG (SImode, 0); + tmp = gen_rtx_REG (SImode, R0_REGNUM); emit_move_insn (target, tmp); } return target; @@ -7495,13 +7495,13 @@ arm_tls_descseq_addr (rtx x, rtx reg) gen_rtx_CONST (VOIDmode, label), GEN_INT (!TARGET_ARM)), UNSPEC_TLS); - rtx reg0 = load_tls_operand (sum, gen_rtx_REG (SImode, 0)); + rtx reg0 = load_tls_operand (sum, gen_rtx_REG (SImode, R0_REGNUM)); emit_insn (gen_tlscall (x, labelno)); if (!reg) reg = gen_reg_rtx (SImode); else -gcc_assert (REGNO (reg) != 0); +gcc_assert (REGNO (reg) != R0_REGNUM); emit_move_insn (reg, reg0); @@ -14659,7 +14659,7 @@ arm_gen_movmemqi (rtx *operands) else { mem = adjust_automodify_address (dstbase, SImode, dst, dstoffset); - emit_move_insn (mem, gen_rtx_REG (SImode, 0)); + emit_move_insn (mem, gen_rtx_REG (SImode, R0_REGNUM)); if (last_bytes != 0) { emit_insn (gen_addsi3 (dst, dst, GEN_INT (4))); @@ -21092,8 +21092,8 @@ arm_expand_prologue (void) Just tell it we saved SP in r0. 
*/ gcc_assert (TARGET_THUMB2 && !arm_arch_notm && args_to_push == 0); - r0 = gen_rtx_REG (SImode, 0); - r1 = gen_rtx_REG (SImode, 1); + r0 = gen_rtx_REG (SImode, R0_REGNUM); + r1 = gen_rtx_REG (SImode, R1_REGNUM); insn = emit_insn (gen_movsi (r0, stack_pointer_rtx)); RTX_FRAME_RELATED_P (insn) = 1; @@ -24866,7 +24866,7 @@ arm_expand_epilogue_apcs_frame (bool really_return) /* Restore the original stack pointer. Before prologue, the stack was realigned and the original stack pointer saved in r0. For details, see comment in arm_expand_prologue. */ -emit_insn (gen_movsi (stack_pointer_rtx, gen_rtx_REG (SImode, 0))); +emit_insn (gen_movsi (stack_pointer_rtx, gen_rtx_REG (SImode, R0_REGNUM))); emit_jump_insn (simple_return_rtx); } @@ -25148,7 +25148,7 @@ arm_expand_epilogue (bool really_return) /* Restore the original stack pointer. Before prologue, the stack was realigned and the original stack pointer saved in r0. For details, see comment in arm_expand_prologue. */ -emit_insn (gen_movsi (stack_pointer_rtx, gen_rtx_REG (SImode, 0))); +emit_insn (gen_movsi (stack_pointer_rtx, gen_rtx_REG (SImode, R0_REGNUM))); emit_jump_insn (simple_return_rtx); }
Re: [PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.
On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener wrote: > On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng wrote: >> Hi, >> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt >> issue still exists. >> >> Current candidate selecting algorithm tends to select fewer candidates given >> below reasons: >> 1) to better handle loops with many induction uses but the best choice is >> one generic basic induction variable; >> 2) to keep compilation time low. >> >> One fundamental weakness of the strategy is the opposite situation can't be >> handled properly sometimes. For these cases the best choice is each >> induction variable has its own candidate. >> This patch fixes the problem by shuffling candidate set after fix-point is >> reached by current implementation. The reason why this strategy works is it >> replaces candidate set by selecting local optimal candidate for some >> induction uses, and the new candidate set (has lower cost) is exact what we >> want in the mentioned case. Instrumentation data shows this can find better >> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64. >> >> This patch actually is extension to the first version patch posted at >> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds >> another selecting pass with special seed set (more or less like the shuffled >> set in this patch). Data also confirms this patch can find optimal sets for >> most loops found by the first one, as well as optimal sets for many new >> loops. >> >> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and >> test on aarch64. >> Since this patch only selects candidate set with lower cost, any regressions >> revealed are latent bugs of other components in GCC. >> I also collected GCC bootstrap time on x86_64, no regression either. >> Is this OK? 
>
> The algorithm seems to be quadratic in the number of IV candidates
> (at least):

Yes, I worried about that too; that's why I measured the bootstrap time. One option is to restrict this procedure to run once per loop. I already tried that, and it can still capture +90% of the loops. Does this sound reasonable?

BTW, do we have any compile-time benchmarks for GCC?

Thanks,
bin

>
> + for (i = 0; i < n_iv_cands (data); i++)
> + {
> ...
> + iv_ca_replace (data, ivs, cand, act_delta, &tmp_delta);
> ...
>
> and
>
> +static void
> +iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
> + struct iv_cand *cand, struct iv_ca_delta *act_delta,
> + struct iv_ca_delta **delta)
> +{
> ...
> + for (i = 0; i < ivs->upto; i++)
> +{
> ...
> + if (data->consider_all_candidates)
> + {
> + for (j = 0; j < n_iv_cands (data); j++)
> + {
>
> possibly cubic if ivs->upto is of similar value.
>
> I wonder if it is possible to restrict this to the single IV with
> the largest delta? After all we are iterating try_improve_iv_set.
> Alternatively move the handling out of iteration completey,
> thus into the caller of try_improve_iv_set?
>
> Note that compile-time issues always arise in auto-generated code,
> not during GCC bootstrap.
>
> Richard.
>
>
>> 2014-12-03 Bin Cheng bin.ch...@arm.com
>>
>> PR tree-optimization/62178
>> * tree-ssa-loop-ivopts.c (iv_ca_replace): New function.
>> (try_improve_iv_set): Shuffle candidates set in order to handle
>> case in which candidate wrto each iv use should be selected.
>>
>> gcc/testsuite/ChangeLog
>> 2014-12-03 Bin Cheng bin.ch...@arm.com
>>
>> PR tree-optimization/62178
>> * gcc.target/aarch64/pr62178.c: New test.
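To make the complexity concern in the quoted review concrete, here is a small counting model of the loop nest shown above. It is only a sketch: replace_work is a made-up name, the real functions do cost bookkeeping in each iteration rather than incrementing a counter, and the model assumes the consider_all_candidates branch is always taken.

```cpp
#include <cstdint>

// Hypothetical model: number of innermost iterations for n_cands
// candidates and n_uses induction-variable uses.
std::uint64_t replace_work(std::uint64_t n_cands, std::uint64_t n_uses)
{
  std::uint64_t steps = 0;
  for (std::uint64_t i = 0; i < n_cands; i++)        // loop around iv_ca_replace
    for (std::uint64_t u = 0; u < n_uses; u++)       // loop over ivs->upto
      for (std::uint64_t j = 0; j < n_cands; j++)    // loop over all candidates
        ++steps;
  return steps;
}
```

Doubling the number of candidates quadruples the count, and when the number of uses grows along with the candidates the total grows cubically, which matches the "quadratic (at least), possibly cubic" estimate in the review.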
Re: [PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.
On Thu, Dec 11, 2014 at 5:56 PM, Bin.Cheng wrote: > On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener > wrote: >> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng wrote: >>> Hi, >>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt >>> issue still exists. >>> >>> Current candidate selecting algorithm tends to select fewer candidates given >>> below reasons: >>> 1) to better handle loops with many induction uses but the best choice is >>> one generic basic induction variable; >>> 2) to keep compilation time low. >>> >>> One fundamental weakness of the strategy is the opposite situation can't be >>> handled properly sometimes. For these cases the best choice is each >>> induction variable has its own candidate. >>> This patch fixes the problem by shuffling candidate set after fix-point is >>> reached by current implementation. The reason why this strategy works is it >>> replaces candidate set by selecting local optimal candidate for some >>> induction uses, and the new candidate set (has lower cost) is exact what we >>> want in the mentioned case. Instrumentation data shows this can find better >>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64. >>> >>> This patch actually is extension to the first version patch posted at >>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds >>> another selecting pass with special seed set (more or less like the shuffled >>> set in this patch). Data also confirms this patch can find optimal sets for >>> most loops found by the first one, as well as optimal sets for many new >>> loops. >>> >>> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and >>> test on aarch64. >>> Since this patch only selects candidate set with lower cost, any regressions >>> revealed are latent bugs of other components in GCC. >>> I also collected GCC bootstrap time on x86_64, no regression either. >>> Is this OK? 
>> >> The algorithm seems to be quadratic in the number of IV candidates >> (at least): > Yes, I worried about that too, that's why I measured the bootstrap > time. One way is restrict this procedure one time for each loop. I > already tried that and it can capture +90% loops. Is this sounds > reasonable? By +90%, I mean 90% from the 6% improved loops, not the total loop number... > > BTW, do we have some compilation time benchmarks for GCC? > > Thanks, > bin >> >> + for (i = 0; i < n_iv_cands (data); i++) >> + { >> ... >> + iv_ca_replace (data, ivs, cand, act_delta, &tmp_delta); >> ... >> >> and >> >> +static void >> +iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs, >> + struct iv_cand *cand, struct iv_ca_delta *act_delta, >> + struct iv_ca_delta **delta) >> +{ >> ... >> + for (i = 0; i < ivs->upto; i++) >> +{ >> ... >> + if (data->consider_all_candidates) >> + { >> + for (j = 0; j < n_iv_cands (data); j++) >> + { >> >> possibly cubic if ivs->upto is of similar value. >> >> I wonder if it is possible to restrict this to the single IV with >> the largest delta? After all we are iterating try_improve_iv_set. >> Alternatively move the handling out of iteration completey, >> thus into the caller of try_improve_iv_set? >> >> Note that compile-time issues always arise in auto-generated code, >> not during GCC bootstrap. >> >> Richard. >> >> >>> 2014-12-03 Bin Cheng bin.ch...@arm.com >>> >>> PR tree-optimization/62178 >>> * tree-ssa-loop-ivopts.c (iv_ca_replace): New function. >>> (try_improve_iv_set): Shuffle candidates set in order to handle >>> case in which candidate wrto each iv use should be selected. >>> >>> gcc/testsuite/ChangeLog >>> 2014-12-03 Bin Cheng bin.ch...@arm.com >>> >>> PR tree-optimization/62178 >>> * gcc.target/aarch64/pr62178.c: New test.
[PATCH PR62151]Fix REG_DEAD note distribution issue by using right ELIM_I0/ELIM_I1
Hi,

As described in both the PR and the patch comments, this patch fixes PR62151 by setting the right values for ELIM_I0/ELIM_I1 when distributing REG_DEAD notes from i0/i1. It is said that distribute_notes has caused many bugs in the past. I think it still has a bug in it, as noted in the PR. This patch doesn't touch distribute_notes because we are in stage3 and I want to have more discussion on it.

Bootstrapped and tested on x86_64; aarch64 testing is ongoing. Is it OK?

2014-12-11  Bin Cheng

    PR rtl-optimization/62151
    * combine.c (try_combine): Reset elim_i0 and elim_i1 when
    distributing notes from i0notes or i1notes, this time don't
    check whether newi2pat sets i1dest or i0dest.

gcc/testsuite/ChangeLog
2014-12-11  Bin Cheng

    PR rtl-optimization/62151
    * gcc.c-torture/execute/pr62151.c: New test.

Index: gcc/combine.c
===
--- gcc/combine.c (revision 218200)
+++ gcc/combine.c (working copy)
@@ -4183,11 +4183,42 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
         distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
                           elim_i2, elim_i1, elim_i0);
       if (i1notes)
-        distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
-                          elim_i2, elim_i1, elim_i0);
+        {
+          /* When distributing REG_DEAD note from i1, it doesn't matter
+             if newi2pat sets i1dest/i0dest or not.  Recompute and use
+             elim_i0/elim_i1 in temp variables.
+
+             See PR62151, if we have four insns combination:
+               i0: r0 <- i0src
+               i1: r1 <- i1src (using r0)
+                   REG_DEAD (r0)
+               i2: r0 <- i2src (using r1)
+               i3: r3 <- i3src (using r0)
+               ix: using r0
+             From i1's point of view, r0 is eliminated, no matter if it is
+             set by newi2pat or not.  In other words, REG_DEAD info for r0
+             in i1 should be discarded.
+
+             Note this only affects cases in which I2 is after I0/I1, like
+             "I1->I2->I3", "I0->I1->I2->I3" or "I0&I1->I2, I2->I3".  For
+             other cases like "I0->I1, I1&I2->I3" or "I1&I2->I3", newi2pat
+             will not set i1dest or i0dest.  */
+          rtx tmp_elim_i1 = (i1 == 0 || i1dest_in_i1src || i1dest_in_i0src
+                             || !i1dest_killed
+                             ? 0 : i1dest);
+          rtx tmp_elim_i0 = (i0 == 0 || i0dest_in_i0src || !i0dest_killed
+                             ? 0 : i0dest);
+          distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
+                            elim_i2, tmp_elim_i1, tmp_elim_i0);
+        }
       if (i0notes)
-        distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
-                          elim_i2, elim_i1, elim_i0);
+        {
+          /* Same with distribution of i1notes.  */
+          rtx tmp_elim_i0 = (i0 == 0 || i0dest_in_i0src || !i0dest_killed
+                             ? 0 : i0dest);
+          distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
+                            elim_i2, elim_i1, tmp_elim_i0);
+        }
       if (midnotes)
         distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
                           elim_i2, elim_i1, elim_i0);
Index: gcc/testsuite/gcc.c-torture/execute/pr62151.c
===
--- gcc/testsuite/gcc.c-torture/execute/pr62151.c (revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr62151.c (revision 0)
@@ -0,0 +1,41 @@
+/* PR rtl-optimization/62151 */
+
+int a, c, d, e, f, g, h, i;
+short b;
+
+int
+fn1 ()
+{
+  b = 0;
+  for (;;)
+    {
+      int j[2];
+      j[f] = 0;
+      if (h)
+        d = 0;
+      else
+        {
+          for (; f; f++)
+            ;
+          for (a = 0; a < 1; a++)
+            for (;;)
+              {
+                i = b & ((b ^ 1) & 83647) ? b : b - 1;
+                g = 1 ? i : 0;
+                e = j[0];
+                if (c)
+                  break;
+                return 0;
+              }
+        }
+    }
+}
+
+int
+main ()
+{
+  fn1 ();
+  if (g != -1)
+    __builtin_abort ();
+  return 0;
+}
[PATCH AARCH64]Make ldp/stp case less vulnerable
Hi,

The test gcc.target/aarch64/ldp_stp_3.c fails on aarch64-none-elf. Instead of merging the loads into ldp, it generates:

foo:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
        ldr     w0, [x1, 4]
        ldr     w3, [x1, 20]
        ldr     w2, [x1, 32]
        ldr     w1, [x1, 16]
        add     x2, x3, x2
        add     x0, x0, x1
        add     x0, x2, x0
        ret

This happens once register allocation decides to load [x1, 16] into x1 (w1), as below:

   14: x0:DI = zero_extend([x1:DI+0x4])
    7: x3:DI = zero_extend([x1:DI+0x14])
   10: x2:DI = zero_extend([x1:DI+0x20])
   17: x1:DI = zero_extend([x1:DI+0x10])

Instructions 14/7/10 are anti-dependent on insn 17, but sched_fusion orders the ready list (14/7/10) in ascending order of address. As a result, insn 10 intervenes between 7 and 17.

This patch fixes this by making the test cases less vulnerable. Another possible fix is to move sched_fusion after regrename, which does help a lot. I didn't do that because regrename is currently disabled.

Tested on aarch64-elf. Is it OK?

Thanks,
bin

gcc/testsuite/ChangeLog
2014-12-11  Bin Cheng

    * gcc.target/aarch64/ldp_stp_2.c: Make test less vulnerable.
    * gcc.target/aarch64/ldp_stp_3.c: Ditto.

Index: gcc/testsuite/gcc.target/aarch64/ldp_stp_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/ldp_stp_2.c (revision 218558)
+++ gcc/testsuite/gcc.target/aarch64/ldp_stp_2.c (working copy)
@@ -7,10 +7,8 @@ long long
 foo ()
 {
   long long ll = 0;
-  ll += arr[0][1];
   ll += arr[1][0];
   ll += arr[1][1];
-  ll += arr[2][0];
   return ll;
 }

Index: gcc/testsuite/gcc.target/aarch64/ldp_stp_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/ldp_stp_3.c (revision 218558)
+++ gcc/testsuite/gcc.target/aarch64/ldp_stp_3.c (working copy)
@@ -7,10 +7,8 @@ unsigned long long
 foo ()
 {
   unsigned long long ll = 0;
-  ll += arr[0][1];
   ll += arr[1][0];
   ll += arr[1][1];
-  ll += arr[2][0];
   return ll;
 }
[PATCH] Fix PR42108
The following patch fixes the performance regression in PR42108 by allowing PRE and LIM to see the division (to - from) / step, generated when translating do loops, as executed unconditionally. This makes them not care about the fact that step might be zero and thus the division might trap.

This improves the runtime of the testcase from 10.7s to 8s (same as gfortran 4.3).

The caveat is that if the loop is not executed (to < from for a positive step, for example), there will be an additional executed division computing the unused countm1.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk?

Thanks,
Richard.

2014-12-11  Richard Biener

    PR tree-optimization/42108
    * trans-stmt.c (gfc_trans_do): Execute the division computing
    countm1 before the loop entry check.

    * gfortran.dg/pr42108.f90: Amend.

Index: gcc/fortran/trans-stmt.c
===================================================================
--- gcc/fortran/trans-stmt.c (revision 218515)
+++ gcc/fortran/trans-stmt.c (working copy)
@@ -1645,15 +1645,15 @@ gfc_trans_do (gfc_code * code, tree exit
      This code is executed before we enter the loop body. We generate:
      if (step > 0)
        {
+         countm1 = (to - from) / step;
          if (to < from)
            goto exit_label;
-         countm1 = (to - from) / step;
        }
      else
        {
+         countm1 = (from - to) / -step;
          if (to > from)
            goto exit_label;
-         countm1 = (from - to) / -step;
        }
    */

@@ -1675,11 +1675,12 @@ gfc_trans_do (gfc_code * code, tree exit
                               fold_build2_loc (loc, MINUS_EXPR, utype,
                                                tou, fromu),
                               stepu);
-      pos = fold_build3_loc (loc, COND_EXPR, void_type_node, tmp,
-                             fold_build1_loc (loc, GOTO_EXPR, void_type_node,
-                                              exit_label),
-                             fold_build2 (MODIFY_EXPR, void_type_node,
-                                          countm1, tmp2));
+      pos = build2 (COMPOUND_EXPR, void_type_node,
+                    fold_build2 (MODIFY_EXPR, void_type_node,
+                                 countm1, tmp2),
+                    build3_loc (loc, COND_EXPR, void_type_node, tmp,
+                                build1_loc (loc, GOTO_EXPR, void_type_node,
+                                            exit_label), NULL_TREE));

       /* For a negative step, when to > from, exit, otherwise compute
          countm1 = ((unsigned)from - (unsigned)to) / -(unsigned)step */
@@ -1688,11 +1689,12 @@ gfc_trans_do (gfc_code * code, tree exit
                               fold_build2_loc (loc, MINUS_EXPR, utype,
                                                fromu, tou),
                               fold_build1_loc (loc, NEGATE_EXPR, utype, stepu));
-      neg = fold_build3_loc (loc, COND_EXPR, void_type_node, tmp,
-                             fold_build1_loc (loc, GOTO_EXPR, void_type_node,
-                                              exit_label),
-                             fold_build2 (MODIFY_EXPR, void_type_node,
-                                          countm1, tmp2));
+      neg = build2 (COMPOUND_EXPR, void_type_node,
+                    fold_build2 (MODIFY_EXPR, void_type_node,
+                                 countm1, tmp2),
+                    build3_loc (loc, COND_EXPR, void_type_node, tmp,
+                                build1_loc (loc, GOTO_EXPR, void_type_node,
+                                            exit_label), NULL_TREE));

       tmp = fold_build2_loc (loc, LT_EXPR, boolean_type_node, step,
                              build_int_cst (TREE_TYPE (step), 0));
Index: gcc/testsuite/gfortran.dg/pr42108.f90
===================================================================
--- gcc/testsuite/gfortran.dg/pr42108.f90 (revision 218584)
+++ gcc/testsuite/gfortran.dg/pr42108.f90 (working copy)
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O2 -fdump-tree-fre1" }
+! { dg-options "-O2 -fdump-tree-fre1 -fdump-tree-pre-details" }

 subroutine eval(foo1,foo2,foo3,foo4,x,n,nnd)
   implicit real*8 (a-h,o-z)
@@ -21,7 +21,9 @@ subroutine eval(foo1,foo2,foo3,foo4,x,n
   end do
 end subroutine eval

+! We should have hoisted the division
+! { dg-final { scan-tree-dump "in all uses of countm1\[^\n\]* / " "pre" } }
 ! There should be only one load from n left
-
 ! { dg-final { scan-tree-dump-times "\\*n_" 1 "fre1" } }
 ! { dg-final { cleanup-tree-dump "fre1" } }
+! { dg-final { cleanup-tree-dump "pre" } }
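
For readers following the trans-stmt.c change, the reordered entry sequence can be sketched in plain C. This is only an illustration under stated assumptions, not GCC code: the function name do_loop_countm1 and its signature are made up here, int/unsigned stand in for the loop variable type and its unsigned counterpart (utype), and the caller is assumed to guarantee step != 0.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of GCC): models the entry sequence that
   gfc_trans_do emits for "do i = from, to, step" after the patch.
   Stores the iteration count minus one in *countm1 and returns false
   when the loop body must be skipped (the "goto exit_label" case).
   Assumption: step is nonzero.  */
bool
do_loop_countm1 (int from, int to, int step, unsigned *countm1)
{
  unsigned fromu = (unsigned) from;
  unsigned tou = (unsigned) to;
  unsigned stepu = (unsigned) step;

  if (step > 0)
    {
      /* The division now comes before the entry check, so it is
         executed unconditionally on this path.  */
      *countm1 = (tou - fromu) / stepu;
      return !(to < from);
    }
  else
    {
      *countm1 = (fromu - tou) / -stepu;
      return !(to > from);
    }
}
```

When the function returns false, *countm1 holds a meaningless value that is simply never used, which corresponds to the extra division mentioned as the caveat above.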
RE: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
Hi Christophe,

Sorry to bother you again. After my clarification email below, are you now happy for these patches to go in?

Kind Regards,
David Sherwood.

> -----Original Message-----
> From: David Sherwood [mailto:david.sherw...@arm.com]
> Sent: 27 November 2014 14:53
> To: 'Christophe Lyon'
> Cc: gcc-patches@gcc.gnu.org; Marcus Shawcroft; Alan Hayward; 'Tejas Belagod'; Richard Sandiford
> Subject: RE: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
>
> > On 18 November 2014 10:14, David Sherwood wrote:
> > > Hi Christophe,
> > >
> > > Ah sorry. My mistake - it fixes this in bugzilla:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810
> >
> > I did look at that PR, but since it has no testcase attached, I was unsure.
> > And I am still not :-)
> > PR 59810 is "[AArch64] LDn/STn implementations are not ABI-conformant
> > for bigendian."
> > but the advsimd-intrinsics/vldX.c and vldX_lane.c now PASS with Alan's
> > patches on aarch64_be, so I thought Alan's patches solve PR59810.
> >
> > What am I missing?
>
> Hi Christophe,
>
> I think probably this is our fault for making our lives way too difficult and
> artificially splitting all these patches up. :)
>
> Alan's patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00952.html
>
> fixes some issues on aarch64_be, but also causes regressions. For example,
>
> Tests that now fail, but worked before:
>
> aarch64_be-elf-aem: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects execution test
> aarch64_be-elf-aem: gcc.dg/vect/slp-perm-8.c execution test
> aarch64_be-elf-aem: gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects execution test
> ...
>
> Tests that now work, but didn't before:
>
> aarch64_be-elf-aem: gcc.dg/vect/fast-math-vect-complex-3.c execution test
> aarch64_be-elf-aem: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c execution test
> aarch64_be-elf-aem: gcc.dg/vect/no-scevccp-outer-10a.c execution test
> ...
>
> His patch is only half of the story and must be applied at the same time as the
> "[AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe."
> patch. With both patches applied the result looks much healthier:
>
> # Comparing 1 common sum files
> ## /bin/sh ./src/gcc/contrib/compare_tests /tmp/gxx-sum1.10051 /tmp/gxx-sum2.10051
> Tests that now work, but didn't before:
>
> aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer execution test
> aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
> aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer -funroll-loops execution test
> ...
>
> with no new regressions. After applying both patches the aarch64_be gcc testsuite is
> on a par with the aarch64 testsuite. Furthermore, after applying both of these patches:
>
> "[AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe"
> "[AArch64] [BE] Fix vector load/stores to not use ld1/st1"
>
> it then becomes safe for us to remove the CCMC macro, which is the cause of
> unnecessary spills to the stack for certain auto-vectorised code. So really I
> suppose when I posted my second patch
>
> "[AArch64] [BE] [2/2] Make large opaque integer modes endianness-safe"
>
> I should have really just called this
>
> "[AArch64] [BE] Remove CCMC for aarch64"
>
> in order to make it clear exactly what the purpose of these patches is.
>
> Kind Regards,
> David Sherwood.
[PATCHv2] New check and updates in check_GNU_style script
Hi all,

The attached patch adds a new check (all blocks of 8 spaces should be replaced with tabs) to contrib/check_GNU_style.sh. It also changes the script to allow reading patches from stdin and strengthens the "Dot, space, space, new sentence." check.

Is this OK to commit?

-Y

From c099086a7325d5feca28630be5a569a7de027c93 Mon Sep 17 00:00:00 2001
From: Yury Gribov
Date: Thu, 11 Dec 2014 13:19:59 +0300
Subject: [PATCH] 2014-12-11 Yury Gribov

check_GNU_style.sh: Support patches coming from stdin, check that spaces are converted to tabs and make double-space-after-dot check more precise.
---
 contrib/check_GNU_style.sh | 49 ++--
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh
index 5f90190..cf6081e 100755
--- a/contrib/check_GNU_style.sh
+++ b/contrib/check_GNU_style.sh
@@ -23,6 +23,8 @@ usage() {
 check_GNU_style.sh [patch]...

     Checks the patches for some of the GNU style formatting problems.
+    When FILE is -, read standard input.
+
     Please note that these checks are not always accurate, and
     complete. The reference documentation of the GNU Coding Standards
     can be found here: http://www.gnu.org/prep/standards_toc.html
@@ -35,19 +37,22 @@ EOF

 test $# -eq 0 && usage

+inp=check_GNU_style.inp
 tmp=check_GNU_style.tmp

 # Remove $tmp on exit and various signals.
-trap "rm -f $tmp" 0
-trap "rm -f $tmp ; exit 1" 1 2 3 5 9 13 15
+trap "rm -f $inp $tmp" 0
+trap "rm -f $inp $tmp ; exit 1" 1 2 3 5 9 13 15
+
+grep -nH '^+' $* \
+    | grep -v ':+++' \
+    > $inp

 # Grep
 g (){
     msg="$1"
     arg="$2"
-    shift 2
-    grep -nH '^+' $* \
-        | grep -v ':+++' \
+    cat $inp \
         | egrep --color=always -- "$arg" \
         > $tmp && printf "\n$msg\n"
     cat $tmp
@@ -58,9 +63,7 @@ ag (){
     msg="$1"
     arg1="$2"
     arg2="$3"
-    shift 3
-    grep -nH '^+' $* \
-        | grep -v ':+++' \
+    cat $inp \
         | egrep --color=always -- "$arg1" \
         | egrep --color=always -- "$arg2" \
         > $tmp && printf "\n$msg\n"
@@ -72,9 +75,7 @@ vg (){
     msg="$1"
     varg="$2"
     arg="$3"
-    shift 3
-    grep -nH '^+' $* \
-        | grep -v ':+++' \
+    cat $inp \
         | egrep -v -- "$varg" \
         | egrep --color=always -- "$arg" \
         > $tmp && printf "\n$msg\n"
@@ -83,9 +84,7 @@ vg (){

 col (){
     msg="$1"
-    shift 1
-    grep -nH '^+' $* \
-        | grep -v ':+++' \
+    cat $inp \
         | cut -f 2 -d '+' \
         | awk '{ if (length ($0) > 80) print $0 }' \
         > $tmp
@@ -95,30 +94,32 @@ col (){
     fi
 }

-col 'Lines should not exceed 80 characters.' $*
+col 'Lines should not exceed 80 characters.'
+
+g 'Blocks of 8 spaces should be replaced with tabs.' \
+    ' {8}'

 g 'Trailing whitespace.' \
-    '[[:space:]]$' $*
+    '[[:space:]]$'

 g 'Space before dot.' \
-    '[[:alnum:]][[:blank:]]+\.' $*
+    '[[:alnum:]][[:blank:]]+\.'

 g 'Dot, space, space, new sentence.' \
-    '[[:alnum:]]\.([[:blank:]]|[[:blank:]]{3,})[[:alnum:]]' $*
+    '[[:alnum:]]\.([[:blank:]]|[[:blank:]]{3,})[A-Z0-9]'

 g 'Dot, space, space, end of comment.' \
-    '[[:alnum:]]\.([[:blank:]]{0,1}|[[:blank:]]{3,})\*/' $*
+    '[[:alnum:]]\.([[:blank:]]{0,1}|[[:blank:]]{3,})\*/'

 g 'Sentences should end with a dot. Dot, space, space, end of the comment.' \
-    '[[:alnum:]][[:blank:]]*\*/' $*
+    '[[:alnum:]][[:blank:]]*\*/'

 vg 'There should be exactly one space between function name and parentheses.' \
-    '\#define' '[[:alnum:]]([[:blank:]]{2,})?\(' $*
+    '\#define' '[[:alnum:]]([[:blank:]]{2,})?\('

 g 'There should be no space before closing parentheses.' \
-    '[[:graph:]][[:blank:]]+\)' $*
+    '[[:graph:]][[:blank:]]+\)'

 ag 'Braces should be on a separate line.' \
-    '\{' 'if[[:blank:]]\(|while[[:blank:]]\(|switch[[:blank:]]\(' $*
-
+    '\{' 'if[[:blank:]]\(|while[[:blank:]]\(|switch[[:blank:]]\('
--
1.7.9.5
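
To make the new ' {8}' check concrete, here is a small C sketch of what that pattern matches on one added line of a patch. It is only an illustration, not part of the script: the function name has_eight_space_block is made up, and it looks at one line in isolation, whereas the script pipes every '+' line through egrep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of check_GNU_style.sh): returns true if
   LINE contains a run of at least eight consecutive space characters,
   which is what the extended regular expression ' {8}' matches.  */
bool
has_eight_space_block (const char *line)
{
  size_t run = 0;   /* Length of the current run of spaces.  */

  for (; *line != '\0'; line++)
    {
      run = (*line == ' ') ? run + 1 : 0;
      if (run >= 8)
        return true;
    }
  return false;
}
```

A line indented with eight spaces is reported, while the same line indented with one tab, or with a tab followed by fewer than eight spaces, is not.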
Re: r218609 - in /trunk/gcc: ChangeLog common.opt d...
hubi...@gcc.gnu.org writes:

> Author: hubicka
> Date: Wed Dec 10 21:17:28 2014
> New Revision: 218609
>
> URL: https://gcc.gnu.org/viewcvs?rev=218609&root=gcc&view=rev
> Log:
> * doc/invoke.texi: (-devirtualize-at-ltrans): Document.
> * lto-cgraph.c (lto_output_varpool_node): Mark initializer as removed
> when it is not streamed to the given ltrans.
> (compute_ltrans_boundary): Make code adding all polymorphic
> call targets conditional with !flag_wpa || flag_ltrans_devirtualize.
> * common.opt (fdevirtualize-at-ltrans): New flag.

/usr/local/gcc/gcc-20141211/gcc/testsuite/g++.dg/ipa/pr64059.C:56:1: internal compiler error: Segmentation fault
0x40df742f crash_signal
        ../../gcc/toplev.c:358
0x412f2c9f get_binfo_at_offset(tree_node*, long, tree_node*)
        ../../gcc/tree.c:11922
0x40a0d75f possible_polymorphic_call_targets(tree_node*, long, ipa_polymorphic_call_context, bool*, void**, bool)
        ../../gcc/ipa-devirt.c:2404
0x40b6f2ef possible_polymorphic_call_targets(cgraph_edge*, bool*, void**, bool)
        ../../gcc/ipa-utils.h:109
0x40b6f2ef compute_ltrans_boundary(lto_symtab_encoder_d*)
        ../../gcc/lto-cgraph.c:952
0x40c40f2f ipa_write_summaries(bool)
        ../../gcc/passes.c:2511
0x406584ff ipa_passes
        ../../gcc/cgraphunit.c:2091
0x406584ff symbol_table::compile()
        ../../gcc/cgraphunit.c:2187
0x4065be1f symbol_table::finalize_compilation_unit()
        ../../gcc/cgraphunit.c:2340
0x4029bcef cp_write_global_declarations()
        ../../gcc/cp/decl2.c:4688
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
FAIL: g++.dg/ipa/pr64059.C -std=gnu++11 (internal compiler error)

Andreas.

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Re: [gofrontend-dev] Re: [PATCH 00/13] Go closures, libffi, and the static chain
On Thu, Dec 11, 2014 at 07:51:44PM +1030, Alan Modra wrote:
> On Thu, Dec 11, 2014 at 10:06:23AM +0100, Dominik Vogt wrote:
> > On s390x, the static chain register cannot be used for passing the
> > Go closure pointer to a function: According to the ABI, the
> > dynamic linker is allowed to destroy the contents of r0 (static
> > chain register), eventually causing a crash if libgo is linked
> > dynamically. The assumption that the static chain register can be
> > used to pass information to a function is wrong for s390x.
>
> I was worried about exactly the same "problem" on powerpc with r11
> being used for the static chain and also destroyed in linkage stubs.
> It turns out we don't traverse any linkage stubs.

Just to make this clear: It's not something that *might* happen. It *does* happen on s390[x], which does not use libffi but the hand-written code in makefunc_s390.S and makefuncgo_s390[x].go.

The same may not happen when calling functions through libffi (which may be dynamically linked) because ffi_call_go() is passed the closure pointer as an argument and not in the static chain register.

> See https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00446.html.

Thanks for the link.

Ciao

Dominik ^_^ ^_^

--
Dominik Vogt
IBM Germany
Re: [PATCH] TYPE_OVERFLOW_* cleanup
On Thu, 11 Dec 2014, Marek Polacek wrote:

> On Wed, Dec 10, 2014 at 08:11:02PM +0100, Marc Glisse wrote:
> > >+inline tree
> > >+any_integral_type_check (tree __t, const char *__f, int __l, const char *__g)
> > >+{
> > >+  if (!(INTEGRAL_TYPE_P (__t)
> > >+        || ((TREE_CODE (__t) == COMPLEX_TYPE
> > >+             || VECTOR_TYPE_P (__t))
> > >+            && INTEGRAL_TYPE_P (TREE_TYPE (__t)))))
> > >+    tree_check_failed (__t, __f, __l, __g, BOOLEAN_TYPE, ENUMERAL_TYPE,
> > >+                       INTEGER_TYPE, 0);
> > >+  return __t;
> > >+}
> >
> > Is there a particular reason why you are avoiding ANY_INTEGRAL_TYPE_P in
> > any_integral_type_check?
>
> No, I'm just blind ;). Changed in the following, thanks for looking
> into this!
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2014-12-11  Marek Polacek
>
>     * fold-const.c (fold_negate_expr): Add ANY_INTEGRAL_TYPE_P check.
>     (extract_muldiv_1): Likewise.
>     (maybe_canonicalize_comparison_1): Likewise.
>     (fold_comparison): Likewise.
>     (tree_binary_nonnegative_warnv_p): Likewise.
>     (tree_binary_nonzero_warnv_p): Likewise.
>     * gimple-ssa-strength-reduction.c (legal_cast_p_1): Likewise.
>     * tree-scalar-evolution.c (simple_iv): Likewise.
>     (scev_const_prop): Likewise.
>     * tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
>     * tree-vect-generic.c (expand_vector_operation): Likewise.
>     * tree.h (ANY_INTEGRAL_TYPE_CHECK): Define.
>     (ANY_INTEGRAL_TYPE_P): Define.
>     (TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED, TYPE_OVERFLOW_TRAPS):
>     Add ANY_INTEGRAL_TYPE_CHECK.
>     (any_integral_type_check): New function.
> > diff --git gcc/fold-const.c gcc/fold-const.c > index 0d947ae..7b68bea 100644 > --- gcc/fold-const.c > +++ gcc/fold-const.c > @@ -558,7 +558,8 @@ fold_negate_expr (location_t loc, tree t) > case INTEGER_CST: >tem = fold_negate_const (t, type); >if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t) > - || (!TYPE_OVERFLOW_TRAPS (type) > + || (ANY_INTEGRAL_TYPE_P (type) > + && !TYPE_OVERFLOW_TRAPS (type) > && TYPE_OVERFLOW_WRAPS (type)) > || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0) > return tem; > @@ -5951,7 +5952,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, > tree wide_type, > || EXPRESSION_CLASS_P (op0)) > /* ... and has wrapping overflow, and its type is smaller >than ctype, then we cannot pass through as widening. */ > - && ((TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)) > + && (((ANY_INTEGRAL_TYPE_P (TREE_TYPE (op0)) > + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0))) > && (TYPE_PRECISION (ctype) > > TYPE_PRECISION (TREE_TYPE (op0 > /* ... or this is a truncation (t is narrower than op0), > @@ -5966,7 +5968,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, > tree wide_type, > /* ... or has undefined overflow while the converted to >type has not, we cannot do the operation in the inner type >as that would introduce undefined overflow. */ > - || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0)) > + || ((ANY_INTEGRAL_TYPE_P (TREE_TYPE (op0)) > +&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))) > && !TYPE_OVERFLOW_UNDEFINED (type > break; > > @@ -8497,7 +8500,8 @@ maybe_canonicalize_comparison_1 (location_t loc, enum > tree_code code, tree type, > >/* Match A +- CST code arg1 and CST code arg1. We can change the > first form only if overflow is undefined. */ > - if (!((TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) > + if (!(((ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0)) > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))) >/* In principle pointers also have undefined overflow behavior, > but that causes problems elsewhere. 
*/ >&& !POINTER_TYPE_P (TREE_TYPE (arg0)) > @@ -8712,7 +8716,9 @@ fold_comparison (location_t loc, enum tree_code code, > tree type, > >/* Transform comparisons of the form X +- C1 CMP C2 to X CMP C2 -+ C1. */ >if ((TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR) > - && (equality_code || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))) > + && (equality_code > + || (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0)) > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0 >&& TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST >&& !TREE_OVERFLOW (TREE_OPERAND (arg0, 1)) >&& TREE_CODE (arg1) == INTEGER_CST > @@ -9031,7 +9037,8 @@ fold_comparison (location_t loc, enum tree_code code, > tree type, > X CMP Y +- C2 +- C1 for signed X, Y. This is valid if > the resulting offset is smaller in absolute value than the > original one and has the same sign. */ > - if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) > + if (ANY_INTEGRAL_TYP
[C++ Patch] Mini maybe_warn_about_useless_cast clean up?
Hi,

yesterday, while working on c++/60955, I noticed this comment and wondered if for GCC 5 we want to do the below. It certainly passes testing on x86_64-linux.

Thanks,
Paolo.

///

2014-12-11  Paolo Carlini

    * typeck.c (maybe_warn_about_useless_cast): Remove unnecessary
    conditional.

Index: typeck.c
===================================================================
--- typeck.c (revision 218619)
+++ typeck.c (working copy)
@@ -6363,12 +6364,6 @@ maybe_warn_about_useless_cast (tree type, tree exp
   if (warn_useless_cast
       && complain & tf_warning)
     {
-      /* In C++14 mode, this interacts badly with force_paren_expr. And it
-         isn't necessary in any mode, because the code below handles
-         glvalues properly. For 4.9, just skip it in C++14 mode. */
-      if (cxx_dialect < cxx14 && REFERENCE_REF_P (expr))
-        expr = TREE_OPERAND (expr, 0);
-
       if ((TREE_CODE (type) == REFERENCE_TYPE
            && (TYPE_REF_IS_RVALUE (type)
                ? xvalue_p (expr) : real_lvalue_p (expr))
RE: RFC: PATCH to genericize C++ loops to LOOP_EXPR instead of gotos
Hi Jason,

I have now managed to reproduce this fault and filed a bug report for it:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64265

Any ideas how this patch could move the __tsan_func_entry into the loop?

Thanks
Bernd.

On Wed, 10 Dec 2014 00:10:07, Bernd Edlinger wrote:
>
> Hi Jason,
>
>> I ran the tramp3d benchmark over 500 iterations before and after the
>> change and couldn't see any measurable difference in runtime. The
>> binary with my change was 0.4% smaller.
>> I'm going to go ahead and check it in; if a performance hit shows up on
>> the automated testing we can revisit the choice.
>
> Unfortunately, this checkin broke the thread sanitizer:
>
> r217669 | jason | 2014-11-17 20:08:02 +0100 (Mon, 17 Nov 2014) | 2 lines
>
> * cp-gimplify.c (genericize_cp_loop): Use LOOP_EXPR.
> (genericize_for_stmt): Handle null statement-list.
>
> Therefore I would kindly ask you to revert this again.
>
> After that patch, some C++ functions moved the call to __tsan_func_entry
> into the loop, and we get crashes or major memory leaks from this if the
> software is compiled with -fsanitize=thread.
>
> This happens in a software package on which I currently work. It is called
> Softing OPC UA Toolbox.
>
> I found this in the generated assembler code by bisection:
>
> 000000000092747d <_ZNSt12_Destroy_auxILb0EE9__destroyIPN18SoftingOPCToolbox55ValueEEEvT_S5_>:
>   92747d: 55                      push   %rbp
>   92747e: 48 89 e5                mov    %rsp,%rbp
>   927481: 53                      push   %rbx
>   927482: 48 83 ec 18             sub    $0x18,%rsp
>   927486: 48 89 7d e8             mov    %rdi,-0x18(%rbp)
>   92748a: 48 89 75 e0             mov    %rsi,-0x20(%rbp)
>   92748e: 48 8b 45 08             mov    0x8(%rbp),%rax
>   927492: 48 89 c7                mov    %rax,%rdi
>   927495: e8 26 33 fe ff          callq  90a7c0 <__tsan_func_entry@plt>
>   92749a: 48 8b 45 e8             mov    -0x18(%rbp),%rax
>   92749e: 48 3b 45 e0             cmp    -0x20(%rbp),%rax
>   9274a2: 74 3d                   je     9274e1 <_ZNSt12_Destroy_auxILb0EE9__destroyIPN18SoftingOPCToolbox55ValueEEEvT_S5_+0x64>
>   9274a4: 48 8b 5d e8             mov    -0x18(%rbp),%rbx
>   9274a8: 48 89 d8                mov    %rbx,%rax
>   9274ab: 48 85 db                test   %rbx,%rbx
>   9274ae: 74 0b                   je     9274bb <_ZNSt12_Destroy_auxILb0EE9__destroyIPN18SoftingOPCToolbox55ValueEEEvT_S5_+0x3e>
>   9274b0: 48 89 c2                mov    %rax,%rdx
>   9274b3: 83 e2 07                and    $0x7,%edx
>   9274b6: 48 85 d2                test   %rdx,%rdx
>   9274b9: 74 0f                   je     9274ca <_ZNSt12_Destroy_auxILb0EE9__destroyIPN18SoftingOPCToolbox55ValueEEEvT_S5_+0x4d>
>   9274bb: 48 89 c6                mov    %rax,%rsi
>   9274be: 48 8d 3d 1b a3 f8 00    lea    0xf8a31b(%rip),%rdi        # 18b17e0
>   9274c5: e8 06 36 fe ff          callq  90aad0 <__ubsan_handle_type_mismatch@plt>
>   9274ca: 48 89 df                mov    %rbx,%rdi
>   9274cd: e8 1b 09 00 00          callq  927ded <_ZSt11__addressofIN18SoftingOPCToolbox55ValueEEPT_RS2_>
>   9274d2: 48 89 c7                mov    %rax,%rdi
>   9274d5: e8 21 09 00 00          callq  927dfb <_ZSt8_DestroyIN18SoftingOPCToolbox55ValueEEvPT_>
>   9274da: 48 83 45 e8 20          addq   $0x20,-0x18(%rbp)
>   9274df: eb ad                   jmp    92748e <_ZNSt12_Destroy_auxILb0EE9__destroyIPN18SoftingOPCToolbox55ValueEEEvT_S5_+0x11>
>   9274e1: e8 da 2c fe ff          callq  90a1c0 <__tsan_func_exit@plt>
>   9274e6: 48 83 c4 18             add    $0x18,%rsp
>   9274ea: 5b                      pop    %rbx
>   9274eb: 5d                      pop    %rbp
>   9274ec: c3                      retq
>   9274ed: 90                      nop
>
> See the jmp at 9274df: it jumps to _before_ the tsan_func_entry.
> I am not sure how to locate the source code of the above assembler section,
> but I'd guess it must be some kind of automatically generated default
> destructor.
>
> All I can say at the moment is that it was working perfectly before Nov 17.
>
> Thanks
> Bernd.
>
Re: [PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.
On Thu, Dec 11, 2014 at 10:56 AM, Bin.Cheng wrote:
> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener wrote:
>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng wrote:
>>> Hi,
>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt
>>> issue still exists.
>>>
>>> Current candidate selecting algorithm tends to select fewer candidates given
>>> below reasons:
>>> 1) to better handle loops with many induction uses but the best choice is
>>> one generic basic induction variable;
>>> 2) to keep compilation time low.
>>>
>>> One fundamental weakness of the strategy is the opposite situation can't be
>>> handled properly sometimes. For these cases the best choice is each
>>> induction variable has its own candidate.
>>> This patch fixes the problem by shuffling candidate set after fix-point is
>>> reached by current implementation. The reason why this strategy works is it
>>> replaces candidate set by selecting local optimal candidate for some
>>> induction uses, and the new candidate set (has lower cost) is exactly what we
>>> want in the mentioned case. Instrumentation data shows this can find better
>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>>
>>> This patch actually is extension to the first version patch posted at
>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>>> another selecting pass with special seed set (more or less like the shuffled
>>> set in this patch). Data also confirms this patch can find optimal sets for
>>> most loops found by the first one, as well as optimal sets for many new
>>> loops.
>>>
>>> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and
>>> test on aarch64.
>>> Since this patch only selects candidate set with lower cost, any regressions
>>> revealed are latent bugs of other components in GCC.
>>> I also collected GCC bootstrap time on x86_64, no regression either.
>>> Is this OK?
>>
>> The algorithm seems to be quadratic in the number of IV candidates
>> (at least):
> Yes, I worried about that too, that's why I measured the bootstrap
> time. One way is to restrict this procedure to one run for each loop. I
> already tried that and it can capture +90% loops. Does this sound
> reasonable?

Yes. That's my suggestion to handle it in the caller of try_improve_iv_set?

> BTW, do we have some compilation time benchmarks for GCC?

There are various testcases linked from PR47344, I don't remember any particular one putting load on IVOPTs (but I do remember seeing IVOPTs in the ~25% area in -ftime-report for some testcases).

Thanks,
Richard.

> Thanks,
> bin
>>
>> + for (i = 0; i < n_iv_cands (data); i++)
>> + {
>> ...
>> + iv_ca_replace (data, ivs, cand, act_delta, &tmp_delta);
>> ...
>>
>> and
>>
>> +static void
>> +iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
>> + struct iv_cand *cand, struct iv_ca_delta *act_delta,
>> + struct iv_ca_delta **delta)
>> +{
>> ...
>> + for (i = 0; i < ivs->upto; i++)
>> +{
>> ...
>> + if (data->consider_all_candidates)
>> + {
>> + for (j = 0; j < n_iv_cands (data); j++)
>> + {
>>
>> possibly cubic if ivs->upto is of similar value.
>>
>> I wonder if it is possible to restrict this to the single IV with
>> the largest delta? After all we are iterating try_improve_iv_set.
>> Alternatively move the handling out of iteration completely,
>> thus into the caller of try_improve_iv_set?
>>
>> Note that compile-time issues always arise in auto-generated code,
>> not during GCC bootstrap.
>>
>> Richard.
>>
>>
>>> 2014-12-03  Bin Cheng  bin.ch...@arm.com
>>>
>>>     PR tree-optimization/62178
>>>     * tree-ssa-loop-ivopts.c (iv_ca_replace): New function.
>>>     (try_improve_iv_set): Shuffle candidates set in order to handle
>>>     case in which candidate w.r.t. each iv use should be selected.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2014-12-03  Bin Cheng  bin.ch...@arm.com
>>>
>>>     PR tree-optimization/62178
>>>     * gcc.target/aarch64/pr62178.c: New test.
Re: [gofrontend-dev] Re: [PATCH 00/13] Go closures, libffi, and the static chain
On Thu, Dec 11, 2014 at 11:31:06AM +0100, Dominik Vogt wrote:
> Just to make this clear: It's not something that *might* happen.
> It *does* happen on s390[x] which does not use libffi but the hand
> written code in makefunc_s390.S and makefuncgo_s390[x].go.
>
> The same may not happen when calling functions through libffi
> (which may be dynamically linked) because ffi_call_go() is passed
> the closure pointer as an argument and not in the static chain
> register.

Update: If I disable the custom s390x code and switch to the implementation that just uses libffi for reflection calls, the same crash occurs with the testing/quick libgo test case. The called function sees a bogus value written by the dynamic linker as the closure pointer, for example with this line in the test code:

  CheckEqual(fComplex64, fComplex64, nil)

Ciao

Dominik ^_^ ^_^

--
Dominik Vogt
IBM Germany
PR64182: Fix rounding division and modulus
As pointed out in PR 64182, wide-int rounded division gets the ties-away-from-zero case wrong for odd-numbered divisors, while double_int gets the unsigned case wrong by unconditionally treating a divisor or remainder with the top bit set as negative. As Jakub says, the test used in double_int might also have overflow problems.

This patch uses:

  abs (remainder) >= abs (divisor) - abs (remainder)

for both wide-int and double_int and fixes the unsigned case in double_int. I didn't know how to test the double_int change using input code so resorted to doing some double_int arithmetic at the start of main.

Thanks to Joseph for the testcase.

Tested on x86_64-linux-gnu. OK to install?

Thanks,
Richard


gcc/
    PR middle-end/64182
    * wide-int.h (wi::div_round, wi::mod_round): Fix rounding of tied
    cases.
    * double-int.c (div_and_round_double): Fix handling of unsigned
    cases. Use same rounding approach as wide-int.h.

gcc/testsuite/
2014-xx-xx  Joseph Myers

    PR middle-end/64182
    * gnat.dg/round_div.adb: New test.

Index: gcc/double-int.c
===================================================================
--- gcc/double-int.c 2014-12-11 10:45:44.430786435 +0000
+++ gcc/double-int.c 2014-12-11 10:46:10.570461030 +0000
@@ -569,24 +569,23 @@ div_and_round_double (unsigned code, int
       {
         unsigned HOST_WIDE_INT labs_rem = *lrem;
         HOST_WIDE_INT habs_rem = *hrem;
-        unsigned HOST_WIDE_INT labs_den = lden, ltwice;
-        HOST_WIDE_INT habs_den = hden, htwice;
+        unsigned HOST_WIDE_INT labs_den = lden, lnegabs_rem, ldiff;
+        HOST_WIDE_INT habs_den = hden, hnegabs_rem, hdiff;

         /* Get absolute values. */
-        if (*hrem < 0)
+        if (!uns && *hrem < 0)
           neg_double (*lrem, *hrem, &labs_rem, &habs_rem);
-        if (hden < 0)
+        if (!uns && hden < 0)
           neg_double (lden, hden, &labs_den, &habs_den);

-        /* If (2 * abs (lrem) >= abs (lden)), adjust the quotient. */
-        mul_double ((HOST_WIDE_INT) 2, (HOST_WIDE_INT) 0,
-                    labs_rem, habs_rem, &ltwice, &htwice);
+        /* If abs(rem) >= abs(den) - abs(rem), adjust the quotient. */
+        neg_double (labs_rem, habs_rem, &lnegabs_rem, &hnegabs_rem);
+        add_double (labs_den, habs_den, lnegabs_rem, hnegabs_rem,
+                    &ldiff, &hdiff);

-        if (((unsigned HOST_WIDE_INT) habs_den
-             < (unsigned HOST_WIDE_INT) htwice)
-            || (((unsigned HOST_WIDE_INT) habs_den
-                 == (unsigned HOST_WIDE_INT) htwice)
-                && (labs_den <= ltwice)))
+        if (((unsigned HOST_WIDE_INT) habs_rem
+             > (unsigned HOST_WIDE_INT) hdiff)
+            || (habs_rem == hdiff && labs_rem >= ldiff))
           {
             if (quo_neg)
               /* quo = quo - 1; */
Index: gcc/testsuite/gnat.dg/round_div.adb
===================================================================
--- /dev/null 2014-11-19 08:41:51.310561007 +0000
+++ gcc/testsuite/gnat.dg/round_div.adb 2014-12-11 10:46:10.570461030 +0000
@@ -0,0 +1,17 @@
+-- { dg-do run }
+-- { dg-options "-O3" }
+procedure Round_Div is
+   type Fixed is delta 1.0 range -2147483648.0 .. 2147483647.0;
+   A : Fixed := 1.0;
+   B : Fixed := 3.0;
+   C : Integer;
+   function Divide (X, Y : Fixed) return Integer is
+   begin
+      return Integer (X / Y);
+   end;
+begin
+   C := Divide (A, B);
+   if C /= 0 then
+      raise Program_Error;
+   end if;
+end Round_Div;
Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h 2014-12-11 10:45:44.434786385 +0000
+++ gcc/wide-int.h 2014-12-11 10:46:10.570461030 +0000
@@ -2616,8 +2616,8 @@ wi::div_round (const T1 &x, const T2 &y,
     {
       if (sgn == SIGNED)
         {
-          if (wi::ges_p (wi::abs (remainder),
-                         wi::lrshift (wi::abs (y), 1)))
+          WI_BINARY_RESULT (T1, T2) abs_remainder = wi::abs (remainder);
+          if (wi::geu_p (abs_remainder, wi::abs (y) - abs_remainder))
             {
               if (wi::neg_p (x, sgn) != wi::neg_p (y, sgn))
                 return quotient - 1;
@@ -2627,7 +2627,7 @@ wi::div_round (const T1 &x, const T2 &y,
         }
       else
         {
-          if (wi::geu_p (remainder, wi::lrshift (y, 1)))
+          if (wi::geu_p (remainder, y - remainder))
             return quotient + 1;
         }
     }
@@ -2784,8 +2784,8 @@ wi::mod_round (const T1 &x, const T2 &y,
     {
       if (sgn == SIGNED)
         {
-          if (wi::ges_p (wi::abs (remainder),
-                         wi::lrshift (wi::abs (y), 1)))
+          WI_BINARY_RESULT (T1, T2) abs_remainder = wi::abs (remainder);
+          if (wi::geu_p (abs_remainder, wi::abs (y) - abs_remainder))
             {
               if (wi::neg_p (x, sgn) != wi::neg_p (y, sgn))
                 return remainder + y;
@@ -2795,7 +2795,7 @@ wi::mod_round (const T1 &x, const T2 &y,
         }
       else
         {
-
Re: [PATCH][ARM] Fix names of some rounding intrinsics, implement vrndx_f32 and vrndxq_f32
On Tue, Sep 23, 2014 at 4:07 PM, Kyrill Tkachov wrote: > Hi all, > > Some intrinsics had the wrong name (inconsistent with the NEON intrinsics > spec). This patch fixes that and adds the vrndx_f32 and vrndxq_f32 > intrinsics that were missing. > These map down to vrintx.f32 NEON instructions (d and q forms). We already > had builtins defined for them, just the intrinsics were not wired up to them > properly. > > Tested arm-none-eabi > > Ok for trunk? This is OK if no regressions. Ramana > > 2014-09-23 Kyrylo Tkachov > > * config/arm/arm_neon.h (vrndqn_f32): Rename to... > (vrndnq_f32): ... this. > (vrndqa_f32): Rename to... > (vrndaq_f32): ... this. > (vrndqp_f32): Rename to... > (vrndpq_f32): ... this. > (vrndqm_f32): Rename to... > (vrndmq_f32): ... this. > (vrndx_f32): New intrinsic. > (vrndxq_f32): Likewise. > > 2014-09-23 Kyrylo Tkachov > > * gcc.target/arm/simd/neon-vrndx_f32_1.c: New test. > * gcc.target/arm/simd/neon-vrndxq_f32_1.c: Likewise. > * gcc.target/arm/neon/vrndqaf32.c: Rename to... > * gcc.target/arm/neon/vrndaqf32.c: ... This. Update intrinsic names. > * gcc.target/arm/neon/vrndqmf32.c: Rename to... > * gcc.target/arm/neon/vrndmqf32.c: ... This. Update intrinsic names. > * gcc.target/arm/neon/vrndqnf32.c: Rename to... > * gcc.target/arm/neon/vrndnqf32.c: ... This. Update intrinsic names. > * gcc.target/arm/neon/vrndqpf32.c: Rename to... > * gcc.target/arm/neon/vrndpqf32.c: ... This. Update intrinsic names.
Re: [PATCH] IPA ICF: refactoring + fix for PR ipa/63569
On Wed, Dec 10, 2014 at 1:18 PM, Martin Liška wrote: > Hello. > > As suggested by Richard, I split compare_operand functions to various > functions > related to a specific comparison. Apart from that I added fast check for > volatility flag that caused miscompilation mentioned in PR63569. > > Patch can bootstrap on x86_64-linux-pc without any regression seen and I was > able to build Firefox with LTO. > > Ready for trunk? Hmm, I don't think the dispatch to compare_memory_operand is at the correct place. It should be called from places where currently compare_operand is called and it should recurse to compare_operand. That is, it is more "high-level". Can you please fix the volatile issue separately? It's also not necessary to do that check on every operand but just on memory operands. Thanks, Richard. > Thanks, > Martin
Re: PR64182: Fix rounding division and modulus
On Thu, Dec 11, 2014 at 1:26 PM, Richard Sandiford wrote: > As pointed out in PR 64182, wide-int rounded division gets the > ties-away-from-zero case wrong for odd-numbered dividends, while > double_int gets the unsigned case wrong by unconditionally treating > a dividend or remainder with the top bit set as negative. As Jakub > says, the test used in double_int might also have overflow problems. > > This patch uses: > >abs (remainder) >= abs (dividend) - abs (remainder) > > for both wide-int and double_int and fixes the unsigned case in double_int. > I didn't know how to test the double_int change using input code so > resorted to doing some double_int arithmetic at the start of main. > > Thanks to Joseph for the testcase. > > Tested on x86_64-linux-gnu. OK to install? Can you add a testcase? You can follow the gcc.dg/plugin/sreal_plugin.c example, maybe even make it a generic host_test_plugin.c with separate files containing the actual tests. Otherwise ok. Thanks, Richard. > Thanks, > Richard > > > gcc/ > PR middle-end/64182 > * wide-int.h (wi::div_round, wi::mod_round): Fix rounding of tied > cases. > * double-int.c (div_and_round_double): Fix handling of unsigned > cases. Use same rounding approach as wide-int.h. > > gc/testsuite/ > 2014-xx-xx Joseph Myers > > PR middle-end/64182 > * gnat.dg/round_div.adb: New test. > > Index: gcc/double-int.c > === > --- gcc/double-int.c2014-12-11 10:45:44.430786435 + > +++ gcc/double-int.c2014-12-11 10:46:10.570461030 + > @@ -569,24 +569,23 @@ div_and_round_double (unsigned code, int >{ > unsigned HOST_WIDE_INT labs_rem = *lrem; > HOST_WIDE_INT habs_rem = *hrem; > - unsigned HOST_WIDE_INT labs_den = lden, ltwice; > - HOST_WIDE_INT habs_den = hden, htwice; > + unsigned HOST_WIDE_INT labs_den = lden, lnegabs_rem, ldiff; > + HOST_WIDE_INT habs_den = hden, hnegabs_rem, hdiff; > > /* Get absolute values. 
*/ > - if (*hrem < 0) > + if (!uns && *hrem < 0) > neg_double (*lrem, *hrem, &labs_rem, &habs_rem); > - if (hden < 0) > + if (!uns && hden < 0) > neg_double (lden, hden, &labs_den, &habs_den); > > - /* If (2 * abs (lrem) >= abs (lden)), adjust the quotient. */ > - mul_double ((HOST_WIDE_INT) 2, (HOST_WIDE_INT) 0, > - labs_rem, habs_rem,+ /* If abs(rem) >= abs(den) - abs(rem), adjust the quotient. */ > + neg_double (labs_rem, habs_rem, &lnegabs_rem, &hnegabs_rem); > + add_double (labs_den, habs_den, lnegabs_rem, hnegabs_rem, > + &ldiff, &hdiff); > > - if (((unsigned HOST_WIDE_INT) habs_den > -< (unsigned HOST_WIDE_INT) htwice) > - || (((unsigned HOST_WIDE_INT) habs_den > -== (unsigned HOST_WIDE_INT) htwice) > - && (labs_den <= ltwice))) > + if (((unsigned HOST_WIDE_INT) habs_rem > +> (unsigned HOST_WIDE_INT) hdiff) > + || (habs_rem == hdiff && labs_rem >= ldiff)) > { > if (quo_neg) > /* quo = quo - 1; */ > Index: gcc/testsuite/gnat.dg/round_div.adb > === > --- /dev/null 2014-11-19 08:41:51.310561007 + > +++ gcc/testsuite/gnat.dg/round_div.adb 2014-12-11 10:46:10.570461030 + > @@ -0,0 +1,17 @@ > +-- { dg-do run } > +-- { dg-options "-O3" } > +procedure Round_Div is > + type Fixed is delta 1.0 range -2147483648.0 .. 
2147483647.0; > + A : Fixed := 1.0; > + B : Fixed := 3.0; > + C : Integer; > + function Divide (X, Y : Fixed) return Integer is > + begin > + return Integer (X / Y); > + end; > +begin > + C := Divide (A, B); > + if C /= 0 then > + raise Program_Error; > + end if; > +end Round_Div; > Index: gcc/wide-int.h > === > --- gcc/wide-int.h 2014-12-11 10:45:44.434786385 + > +++ gcc/wide-int.h 2014-12-11 10:46:10.570461030 + > @@ -2616,8 +2616,8 @@ wi::div_round (const T1 &x, const T2 &y, > { >if (sgn == SIGNED) > { > - if (wi::ges_p (wi::abs (remainder), > -wi::lrshift (wi::abs (y), 1))) > + WI_BINARY_RESULT (T1, T2) abs_remainder = wi::abs (remainder); > + if (wi::geu_p (abs_remainder, wi::abs (y) - abs_remainder)) > { > if (wi::neg_p (x, sgn) != wi::neg_p (y, sgn)) > return quotient - 1; > @@ -2627,7 +2627,7 @@ wi::div_round (const T1 &x, const T2 &y, > } >else > { > - if (wi::geu_p (remainder, wi::lrshift (y, 1))) > + if (wi::geu_p (remainder, y - remainder)) > return quotient + 1; > } > } > @@ -2784,8 +2784,8 @@ wi::mod_round (const T1 &x, const T2 &y, >
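For readers of the archive, the rounding test discussed in this thread can be illustrated outside GCC. What follows is a minimal sketch in plain C on 32-bit unsigned values, not the wide-int or double_int code itself; the names div_round_shift and div_round_diff are made up for illustration. It contrasts the old test (remainder >= divisor >> 1) with the test the patch uses (remainder >= divisor - remainder). Note that the patch compares against the divisor y, although the description in the mail says "dividend".

```c
#include <assert.h>
#include <stdint.h>

/* Old-style test (illustrative): round up when rem >= (y >> 1).
   For odd y the shift drops a bit, so 1/3 is wrongly rounded up.  */
uint32_t div_round_shift(uint32_t x, uint32_t y)
{
    assert(y != 0);
    uint32_t quo = x / y;
    uint32_t rem = x % y;
    return rem >= (y >> 1) ? quo + 1 : quo;
}

/* Test in the spirit of the patch: round up when rem >= y - rem.
   Since rem < y always holds, the subtraction cannot wrap around.  */
uint32_t div_round_diff(uint32_t x, uint32_t y)
{
    assert(y != 0);
    uint32_t quo = x / y;
    uint32_t rem = x % y;
    return rem >= y - rem ? quo + 1 : quo;
}
```

With these definitions, div_round_shift (1, 3) gives 1 while div_round_diff (1, 3) gives 0, which is the situation the round_div.adb test exercises. A comparison of 2 * rem against y would express the same condition but can overflow for large operands, which the difference form avoids.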
[PATCH] [AArch64, NEON] Add vfms_n_f32, vfmsq_n_f32 and vfmsq_n_f64 specified by the ACLE
Hi, This patch adds three intrinsics that are required by the ACLE specification. A new testcase is added which covers vfms_n_f32 and vfmsq_n_f32. Tested on both aarch64-linux-gnu and aarch64_be-linux-gnu. OK? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 218582) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-12-11 Felix Yang + + * config/aarch64/arm_neon.h (vfms_n_f32, vfmsq_n_f32, vfmsq_n_f64): New + intrinsics. + 2014-12-10 Felix Yang * config/aarch64/aarch64-protos.h (aarch64_function_profiler): Remove Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_n.c === --- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_n.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_n.c (revision 0) @@ -0,0 +1,67 @@ +#include +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +#ifdef __aarch64__ +/* Expected results. */ +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x4438ca3d, 0x44390a3d }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x44869eb8, 0x4486beb8, 0x4486deb8, 0x4486feb8 }; + +#define VECT_VAR_ASSIGN(S,Q,T1,W) S##Q##_##T1##W +#define ASSIGN(S, Q, T, W, V) T##W##_t S##Q##_##T##W = V +#define TEST_MSG "VFMS_N/VFMSQ_N" + +void exec_vfms_n (void) +{ + /* Basic test: v4=vfms_n(v1,v2), then store the result. 
*/ +#define TEST_VFMS(Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +vfms##Q##_n_##T2##W(VECT_VAR(vector1, T1, W, N), \ + VECT_VAR(vector2, T1, W, N),\ + VECT_VAR_ASSIGN(scalar, Q, T1, W)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + +#define CHECK_VFMS_RESULTS(test_name,comment) \ + {\ +CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment); \ +CHECK_FP(test_name, float, 32, 4, PRIx32, expected, comment); \ + } + +#define DECL_VABD_VAR(VAR) \ + DECL_VARIABLE(VAR, float, 32, 2);\ + DECL_VARIABLE(VAR, float, 32, 4);\ + + DECL_VABD_VAR(vector1); + DECL_VABD_VAR(vector2); + DECL_VABD_VAR(vector3); + DECL_VABD_VAR(vector_res); + + clean_results (); + + /* Initialize input "vector1" from "buffer". */ + VLOAD(vector1, buffer, , float, f, 32, 2); + VLOAD(vector1, buffer, q, float, f, 32, 4); + + /* Choose init value arbitrarily. */ + VDUP(vector2, , float, f, 32, 2, -9.3f); + VDUP(vector2, q, float, f, 32, 4, -29.7f); + + /* Choose init value arbitrarily. */ + ASSIGN(scalar, , float, 32, 81.2f); + ASSIGN(scalar, q, float, 32, 36.8f); + + /* Execute the tests. */ + TEST_VFMS(, float, f, 32, 2); + TEST_VFMS(q, float, f, 32, 4); + + CHECK_VFMS_RESULTS (TEST_MSG, ""); +} +#endif + +int main (void) +{ +#ifdef __aarch64__ + exec_vfms_n (); +#endif + return 0; +} Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 218582) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,7 @@ +2014-12-08 Felix Yang + + * gcc.target/aarch64/advsimd-intrinsics/vfms_n.c: New test. + 2014-12-10 Martin Liska * gcc.dg/ipa/pr63909.c: New test. 
Index: gcc/config/aarch64/arm_neon.h === --- gcc/config/aarch64/arm_neon.h (revision 218582) +++ gcc/config/aarch64/arm_neon.h (working copy) @@ -15254,7 +15254,24 @@ vfmsq_f64 (float64x2_t __a, float64x2_t __b, float return __builtin_aarch64_fmav2df (-__b, __c, __a); } +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vfms_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c) +{ + return __builtin_aarch64_fmav2sf (-__b, vdup_n_f32 (__c), __a); +} +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vfmsq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c) +{ + return __builtin_aarch64_fmav4sf (-__b, vdupq_n_f32 (__c), __a); +} + +__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) +vfmsq_n_f64 (float64x2_t __a, float64x2_t __b, float64_t __c) +{ + return __builtin_aarch64_fmav2df (-__b, vdupq_n_f64 (__c), __a); +} + /* vfms_lane */ __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) add-vfms_n-v1.diff Description: add-vfms_n-v1.diff
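To make the semantics of the new intrinsics easier to see, here is a minimal scalar sketch in plain C of what the patch's fma (-b, dup (c), a) form computes per lane, namely a[i] - b[i] * c. The function name vfms_n_model and the array-based interface are assumptions for illustration only and are not part of arm_neon.h. The real intrinsics use a fused multiply-add with a single rounding; this model multiplies and adds separately, so it may differ in the last bit for inexact inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of vfms_n: for each of the n lanes,
   r[i] = a[i] - b[i] * c, written in the same operand order as
   the patch's fma (-b, c, a).  Not fused, unlike the intrinsic.  */
void vfms_n_model(float *r, const float *a, const float *b, float c, size_t n)
{
    assert(r != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        r[i] = a[i] + (-b[i]) * c;
}
```

For example, with a = {10, 20}, b = {2, 3} and c = 4, the model stores {2, 8}.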
Re: [PATCH] PR other/63613: Add fixincludes for dejagnu.h
David Malcolm writes: > On Mon, 2014-12-08 at 14:13 +0100, Rainer Orth wrote: >> Jeff Law writes: >> >> > On 12/04/14 15:42, Rainer Orth wrote: >> >> David Malcolm writes: >> >> >> >>> assumed -fgnu89-inline until a recent upstream fix; >> >>> see http://lists.gnu.org/archive/html/dejagnu/2014-10/msg00011.html >> >>> >> >>> Remove the workaround from jit.exp that used -fgnu89-inline >> >>> in favor of a fixincludes to dejagnu.h that applies the upstream fix >> >>> to a local copy. >> >>> >> >>> This should make it easier to support C++ testcases from jit.exp. >> >> >> >> I wonder how this would work if dejagnu.h doesn't live in a system >> >> include dir (e.g. a self-compiled version)? fixincludes won't touch >> >> those AFAIU. The previous version with -fgnu89-inline would still work >> >> in that case provided dejagnu.h is found at all. >> > Presumably in that case the answer is upgrade dejagnu? :-) >> >> I've two problems with this: >> >> * There's not yet a DejaGnu release available with the fix and I've no >> idea if there are any planned any time soon. Not everyone is >> comfortable with random git (or whatever) snapshots. > > FWIW I've asked on the DejaGnu mailing list, and Ben Elliston said: >> Yes. I plan on releasing 1.6 over the holidays. > http://lists.gnu.org/archive/html/dejagnu/2014-12/msg1.html Thanks for checking this, but ... >> * I don't consider this a critical issue that cannot work without >> current releases. We're already working around several upstream >> DejaGnu issues in our codebase, and I don't consider this particular >> one important enough to require everyone to upgrade to a not-a-release >> version. ... a DejaGnu 1.6 release would only address one part of my concern: I still don't believe this minor issue warrants us demanding that all gcc testers upgrade to a newer DejaGnu release. I'd like my fellow testsuite maintainers to weigh in, though. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
On 11 December 2014 at 11:16, David Sherwood wrote: > Hi Christophe, > > Sorry to bother you again. After my clarification email below are you now > happy for these patches to go in? > > Kind Regards, > David Sherwood. > >> -Original Message- >> From: David Sherwood [mailto:david.sherw...@arm.com] >> Sent: 27 November 2014 14:53 >> To: 'Christophe Lyon' >> Cc: gcc-patches@gcc.gnu.org; Marcus Shawcroft; Alan Hayward; 'Tejas >> Belagod'; Richard Sandiford >> Subject: RE: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes >> endianness-safe. >> >> > On 18 November 2014 10:14, David Sherwood wrote: >> > > Hi Christophe, >> > > >> > > Ah sorry. My mistake - it fixes this in bugzilla: >> > > >> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 >> > >> > I did look at that PR, but since it has no testcase attached, I was unsure. >> > And I am still not :-) >> > PR 59810 is "[AArch64] LDn/STn implementations are not ABI-conformant >> > for bigendian." >> > but the advsimd-intrinsics/vldX.c and vldX_lane.c now PASS with Alan's >> > patches on aarch64_be, so I thought Alan's patches solve PR59810. >> > >> > What am I missing? >> >> Hi Christophe, >> >> I think probably this is our fault for making our lives way too difficult and >> artificially splitting all these patches up. :) >> >> Alan's patch: >> >> https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00952.html >> >> fixes some issues on aarch64_be, but also causes regressions. For example, >> >> >> Tests that now fail, but worked before: >> >> aarch64_be-elf-aem: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects >> execution test >> aarch64_be-elf-aem: gcc.dg/vect/slp-perm-8.c execution test >> aarch64_be-elf-aem: gcc.dg/vect/vect-over-widen-1-big-array.c -flto >> -ffat-lto-objects execution test >> ... 
>> >> Tests that now work, but didn't before: >> >> aarch64_be-elf-aem: gcc.dg/vect/fast-math-vect-complex-3.c execution test >> aarch64_be-elf-aem: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c execution test >> aarch64_be-elf-aem: gcc.dg/vect/no-scevccp-outer-10a.c execution test >> ... >> I didn't notice that because I tested Alan's patch only against the advsimd-intrinsics tests. In this respect, I don't understand why your ChangeLog entry says * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i, vec_load_lanes(o/c/x)i): Fixed to work for Big Endian. since the existing advsimd-intrinsics tests already pass with Alan's patch alone or is vld1_lane still broken (for which I haven't posted a test yet)? >> His patch is only half of the story and must be applied at the same time as >> the >> "[AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe." >> patch. With both patches applied the result looks much healthier: >> >> >> # Comparing 1 common sum files >> ## /bin/sh ./src/gcc/contrib/compare_tests /tmp/gxx-sum1.10051 >> /tmp/gxx-sum2.10051 >> Tests that now work, but didn't before: >> >> aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer >> execution test >> aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer >> -funroll-all-loops -finline- >> functions execution test >> aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer >> -funroll-loops execution test >> ... >> >> >> with no new regressions. After applying both patches the aarch64_be gcc >> testsuite is >> on a parity with the aarch64 testsuite. Furthermore, after applying both of >> these patches: >> >> "[AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe" >> "[AArch64] [BE] Fix vector load/stores to not use ld1/st1" >> >> it then becomes safe for us to remove the CCMC macro, which is the cause of >> unnecessary spills to the stack for certain auto-vectorised code. 
So really I >> suppose when I posted my second patch >> >> "[AArch64] [BE] [2/2] Make large opaque integer modes endianness-safe" >> >> I should have really just called this >> >> "[AArch64] [BE] Remove CCMC for aarch64" >> >> in order to make it clear exactly what the purpose of these patches is. well, not yet since this very does not remove it :-) >> >> Kind Regards, >> David Sherwood. > > > >
[PATCH] Fix for PR ipa/64146
Hello. In PR64146, for position independent code IPA ICF should be more careful about thunk creation. Patch can bootstrap on x86_64-linux-pc and no new regression was seen. Ready for trunk? Thank you, Martin >From e57dbf95cf27c2d5da2322ee75dca6361ab59c8a Mon Sep 17 00:00:00 2001 From: mliska Date: Wed, 10 Dec 2014 14:46:28 +0100 Subject: [PATCH] IPA ICF: Fix for PR ipa/64146 gcc/ChangeLog: 2014-12-10 Martin Liska PR ipa/64146 * ipa-icf.c (sem_function::merge): Check for decl_binds_to_current_def_p is newly added to merge operation. gcc/testsuite/ChangeLog: 2014-12-10 Martin Liska * g++.dg/ipa/pr64146.C: New test. --- gcc/ipa-icf.c | 8 gcc/testsuite/g++.dg/ipa/pr64146.C | 37 + 2 files changed, 45 insertions(+) create mode 100644 gcc/testsuite/g++.dg/ipa/pr64146.C diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c index b193200..91878b2 100644 --- a/gcc/ipa-icf.c +++ b/gcc/ipa-icf.c @@ -101,6 +101,7 @@ along with GCC; see the file COPYING3. If not see #include #include "ipa-icf-gimple.h" #include "ipa-icf.h" +#include "varasm.h" using namespace ipa_icf_gimple; @@ -624,6 +625,13 @@ sem_function::merge (sem_item *alias_item) return false; } + if (!decl_binds_to_current_def_p (alias->decl)) +{ + if (dump_file) + fprintf (dump_file, "Declaration does not bind to currect definition.\n\n"); + return false; +} + if (redirect_callers) { /* If alias is non-overwritable then diff --git a/gcc/testsuite/g++.dg/ipa/pr64146.C b/gcc/testsuite/g++.dg/ipa/pr64146.C new file mode 100644 index 000..90c5093 --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/pr64146.C @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options "-fpic -fdump-ipa-icf-details -fipa-icf" } */ + +extern "C" const char* +foo() +{ + return "original"; +} + +const char* +test_foo() +{ + return foo(); +} + +extern "C" const char* +bar() +{ + return "original"; +} + +const char* +test_bar() +{ + return bar(); +} + +int main (int argc, char **argv) +{ + test_foo (); + test_bar (); + + return 0; +} + +/* { dg-final { 
scan-ipa-dump-times "Declaration does not bind to currect definition." 2 "icf" } } */ +/* { dg-final { scan-ipa-dump "Equal symbols: 2" "icf" } } */ -- 2.1.2
Re: [PATCH] Fix for PR ipa/64146
On Thu, Dec 11, 2014 at 2:49 PM, Martin Liška wrote: > Hello. > > In PR64146, for position independent code IPA ICF should be more careful > about thunk creation. > Patch can bootstrap on x86_64-linux-pc and no new regression was seen. > > Ready for trunk? Hmm, does that merge the functions but keep a call to the original alias which can be overridden at runtime? If so, ok. Thanks, Richard. > Thank you, > Martin
Re: [C++ Patch] Mini maybe_warn_about_useless_cast clean up?
OK. Jason
Re: [patch] Fix tilepro includes
On 12/08/2014 11:23 AM, Jan-Benedict Glaw wrote: On Fri, 2014-11-21 08:45:11 -0500, Andrew MacLeod wrote: During the flattening of optabs.h, I updated all the config/* files which were affected. I've been getting spurious failures with config-list.mk where my changes would "disappear" and tracked down why. I was blissfully unaware that the tilepro ports mul-tables.c file is actually generated from gen-mul-tables.cc. This patch fixes the include issue by adding "#include insn-codes.h" to the generated files. I also added a comment indicating these are generated files, and to make changes in the generator. This allows all the tile* ports to compile properly again. OK for trunk? Seems this wasn't ever ACKed or applied up to now? I'm still seeing compilation errors for the tile targets, see eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=382169 MfG, JBG Now checked in, revision 218624 Andrew
Re: [PATCH/AARCH64] v2 Add aligning of functions/loops/jumps
On 23 November 2014 at 00:09, Andrew Pinski wrote: > Hi, > This is just a rebase of > https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01615.html as requested > by https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01736.html. Nothing > has changed in it. > > OK? Built and tested on aarch64-elf with no regressions. > > Thanks, > Andrew Pinski > > ChangeLog: > > * config/aarch64/aarch64-protos.h (tune_params): Add align field. > * config/aarch64/aarch64.c (generic_tunings): Specify align. > (cortexa53_tunings): Likewise. > (cortexa57_tunings): Likewise. > (thunderx_tunings): Likewise. > (aarch64_override_options): Set align_loops, align_jumps, > align_functions based on what the tuning struct. OK /Marcus
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Ping. https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00340.html Thanks, Kyrill On 04/12/14 09:19, Kyrill Tkachov wrote: On 02/12/14 22:58, Ramana Radhakrishnan wrote: On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov wrote: Hi all, This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT) +{ + /* We are trying to fuse + movw imm / movt imm + instructions as a group that gets scheduled together. */ + A comment here about the insn structure would be useful. Done. It's similar to the aarch64 adrp+add case. It does make it easier to read, thanks. 2014-12-04 Kyrylo Tkachov kyrylo.tkac...@arm.com\ * config/arm/arm-protos.h (tune_params): Add fuseable_ops field. * config/arm/arm.c (arm_macro_fusion_p): New function. (arm_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (ARM_FUSE_NOTHING): Likewise. (ARM_FUSE_MOVW_MOVT): Likewise. (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune, arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune, arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune arm_cortex_a5_tune): Specify fuseable_ops value. 
+ set_dest = SET_DEST (curr_set); + if (GET_CODE (set_dest) == ZERO_EXTRACT) +{ + if (CONST_INT_P (SET_SRC (curr_set)) + && CONST_INT_P (SET_SRC (prev_set)) + && REG_P (XEXP (set_dest, 0)) + && REG_P (SET_DEST (prev_set)) + && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set))) +return true; +} + else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM + && REG_P (SET_DEST (curr_set)) + && REG_P (SET_DEST (prev_set)) + && GET_CODE (SET_SRC (prev_set)) == HIGH + && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set))) +{ + return true; +} Can we add a fast path exit to be if (GET_MODE (set_dest) != SImode) return false; Done, but if/when we extend the function to handle more fusion cases it will need to be refactored, since we will want to just bail out of this MOVW+MOVT case rather than the whole function. I did think whether we wanted to use reg_overlap_mentioned_p as that may simplify the logic a bit but that's overkill here as we still want to restrict it to the cases above. Otherwise OK. Here's the updated patch. I've tested on arm-none-eabi and made sure that the fusion still happens on the benchmarks I looked at. Ok? Thanks, Kyrill Ramana +} + return false; Thanks, Kyrill 2014-11-11 Kyrylo Tkachov * config/arm/arm-protos.h (tune_params): Add fuseable_ops field. * config/arm/arm.c (arm_macro_fusion_p): New function. (arm_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (ARM_FUSE_NOTHING): Likewise. (ARM_FUSE_MOVW_MOVT): Likewise. (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune, arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune, arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune arm_cortex_a5_tune): Specify fuseable_ops value.
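As background for the movw+movt pair that the hook tries to keep together, here is a minimal sketch in plain C of how the two instructions build a 32-bit constant in one register. The helper names movw, movt and load_const32 are illustrative models only, not GCC or assembler APIs. The second step reads and writes the same register as the first, which corresponds to the REGNO equality the hook checks between the two sets.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "movw rd, #imm16": write the low halfword and
   zero the upper halfword of the destination.  */
uint32_t movw(uint16_t imm16)
{
    return imm16;
}

/* Model of "movt rd, #imm16": replace only the upper halfword,
   keeping the low halfword that movw wrote earlier.  */
uint32_t movt(uint32_t rd, uint16_t imm16)
{
    return (rd & 0xFFFFu) | ((uint32_t) imm16 << 16);
}

/* Build a full 32-bit constant the way a movw/movt pair would.  */
uint32_t load_const32(uint32_t value)
{
    uint32_t rd = movw((uint16_t) (value & 0xFFFFu));
    rd = movt(rd, (uint16_t) (value >> 16));
    assert(rd == value);   /* the pair reconstructs the constant */
    return rd;
}
```

Because movt only makes sense on the register that movw just wrote, scheduling the two back to back loses nothing, which is presumably why they are a natural candidate for fusion.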
Re: [PATCH][AArch64] Fix usage of +no in error message for aarch64_parse_extension
On 10 December 2014 at 15:30, Kyrill Tkachov wrote: > 2014-12-10 Kyrylo Tkachov kyrylo.tkac...@arm.com > > * config/aarch64/aarch64.c (aarch64_parse_extension): Update error > message to say +no only when removing extension. OK /Marcus
Re: [PATCH][AARCH64][4.9]Backport "Use selected cpu's tuning when no tuning parameter is specified."
On 10 December 2014 at 13:58, Renlin Li wrote: > This is a backport patch of > https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00287.html > > aarch64-none-elf has been built and tested on the model, no issue. > Okay for branch 4.9? > > Regards, > Renlin Li > > > gcc/ChangeLog: > > 2014-12-10 Renlin Li > > * config/aarch64/aarch64.c (aarch64_parse_cpu): Remove selected_tune > assignment as this will be done later. > (aarch64_override_options): Use selected_cpu's tuning. OK /Marcus
Re: [PATCH][AARCH64]Use AARCH64_FL_FPSIMD flags for all cores in aarch64-cores.def
On 10 December 2014 at 16:34, Renlin Li wrote: > 2014-12-10 Renlin Li > > * config/aarch64/aarch64-cores.def: Change all AARCH64_FL_FPSIMD to > AARCH64_FL_FOR_ARCH8. > * config/aarch64/aarch64.c (all_cores): Use FLAGS from aarch64-cores.def > file > only. OK /Marcus
Re: [PATCH AARCH64]Make ldp/stp case less vulnerable
On 11 December 2014 at 10:06, Bin Cheng wrote: > gcc/testsuite/ChangeLog > 2014-12-11 Bin Cheng > > * gcc.target/aarch64/ldp_stp_2.c: Make test less vulnerable. > * gcc.target/aarch64/ldp_stp_3.c: Ditto. OK /Marcus
[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and backports
Hi all
we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to revision 218412 as r218423.
We have also backported this set of revisions:
* r213382 as r218352 : [AArch64] arm_neon.h - add vpaddd_f64, vpaddd_s64, vpaddd_u64 intrinsics
* r214008 as r218354 : [AArch64] Move some code around in aarch64_expand_mov_immediate
* r214948 as r218355 : [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction
* r214949 as r218355 : [PATCH AArch64 2/2] Remove vector compare/tst __builtins
* r214950 as r218356 : [PATCH AArch64 1/2] Add execution tests of vget_low and vget_high
* r214952 as r218356 : [PATCH AArch64 2/2] Replace temporary inline assembler for vget_high
* r215013 as r218357 : Remove no-longer-needed fp-bit target macros.
* r215046 as r218358 : [AArch64] PR 61749: Do not ICE in lane intrinsics when passed non-constant lane number
* r215047 as r218359 : [AArch32] Disable xordi3-opt.c/iordi3-opt.c on thumb1 target
* r215071 as r218377 : [AArch64 Testsuite]Fix scan-assembler test false alarm on aarch64-linux-gnu
* r215072 as r218360 : [AArch64 Testsuite] Add test of vld[234]q? intrinsic
* r215077 as r218361 : [AArch64 Testsuite] Extend test of vld1+vst1 intrinsics to cover more variants
* r215078 as r218362 : [AArch64 Testsuite] Add a test of vldN_dup intrinsics
* r215126 as r218363 : [AArch64 Testsuite] Add a test of the vldN_lane intrinsic
* r215129 as r218364 : [AArch64 Testsuite] Add a test of the vst[234](q?) intrinics
* r215177 as r218365 : [AArch64 Testsuite] Add execution test of vset(q?)_lane intrinsics.
* r215206 as r218351 : [AArch64] Add cost handling of CALLER_SAVE_REGS and POINTER_REGS
* r215207 as r218351 : [AArch64] Fix cost for Q register moves
* r215208 as r218351 : [AArch64] Add regmove_costs for Cortex-A57 and A53
* r215473 as r218366 : [testsuite] whole_vector_shift
* r215475 as r218367 : [testsuite] vect-reduc-or
* r215540 as r218368 : PR rtl-optimization/63210 IRA
* r215707 as r218370 : Fix IRA ICE tmpdir-gcc-.dg-struct-layout-1/t028
* r215711 as r218371 : Accept cortex-m7/fpv5-sp-16/fpv5-d16
* r215842 as r218370 : Fix IRA ICE tmpdir-gcc-.dg-struct-layout-1/t028 -addon
* r215865 as r218373 : Add aarch64 to list of targets that support gold
* r216253 as r218374 : Remove unused variable and marco
* r216336 as r218375 : Target Legitimze Address
* r216444 as r218350 : [testsuite] Fix race in libstdc++ testsuite
* r216517 as r218378 : [testsuite] update testcases for GNU11
* r216524 as r218379 : Add -mthunderx option
* r216543 as r218380 : [testsuite] fix gcc-dg-prune glitch when filtering "relocation truncation" error
* r216544 as r218384 : [testsuite] Update testcases for GNU11
* r216630 as r218385 : PR 63173 fix vldX_dup
* r216638 as r218386 : [testsuite] fix wrap_compile_flags
* r216765 as r218387 : PR63442 libgcc_cmp_return_mode not always return word_mode
* r216996 as r218390 : [Patch 1/7] Hookize *_BY_PIECES_P
* r216998 as r218390 : [Patch 2/7 s390] Deprecate *_BY_PIECES_P, move to hookized version
* r216999 as r218390 : [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version
* r217001 as r218390 : [Patch 4/7 sh] Deprecate *_BY_PIECES_P, move to hookized version
* r217002 as r218390 : [Patch 5/7 mips] Deprecate *_BY_PIECES_P, move to hookized version
* r217003 as r218390 : [Patch 6/7 AArch64] Deprecate *_BY_PIECES_P, move to hookized version
* r217004 as r218390 : [Patch 7/7] Remove *_BY_PIECES_P
* r217014 as r218391 : Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes
* r217026 as r218393 : ifcvt: Allow CC mode if HAVE_cbranchcc4
* r217076 as r218394 : Fix predicate and constraint mismatch in logical atomic operations
* r217079 as r218398 : Migrate to new reduc_plus_scal_optab
* r217080 as r218398 : Migrate to new reduc_[us](min|max)_scal_optab
* r217742 as r218390 : PR target/63937 fix 216996
* r217971 as r218383 : [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
* r210735 as r218351 : Change CORE_REGS in GENERAL_REGS
This will be part of our 2014.12 4.9 release.
Thanks
Yvan
Re: [PATCH] Fix PR42108
On Thu, Dec 11, 2014 at 11:05:13AM +0100, Richard Biener wrote: > > The following patch fixes the performance regression in PR42108 > by allowing PRE and LIM to see the division (to - from) / step > in translating do loops executed unconditionally. This makes > them not care for the fact that step might be zero and thus > the division might trap. > step cannot be zero in standard conforming code. From F95 8.1.4.4, "Execution of a DO construct": When the DO statement is executed, the DO construct becomes active. If loop-control is [ , ] do-variable = scalar-int-expr1, scalar-int-expr2 [, scalar-int-expr3] the following steps are performed in sequence: (1) The initial parameter m1, the terminal parameter m2, and the incrementation parameter m3 are of type integer with the same kind type parameter as the do-variable. Their values are established by evaluating scalar-int-expr1, scalar-int-expr2, and scalar-int-expr3, respectively, including, if necessary, conversion to the kind type parameter of the do-variable according to the rules for numeric conversion (Table 7.10). If scalar-int-expr3 does not appear, m3 has the value 1. The value m3 shall not be zero. "The value m3 shall not be zero" is not an enumerated constraint, so the compiler does not need to catch and report m3 being zero. The prohibition is on the programmer. So, if the division traps, it is a bug in the program. > This makes the runtime of the testcase improve from 10.7s to > 8s (same as gfortran 4.3). > > The caveat is that iff the loop is not executed (to < from > for positive step for example) then there will be an additional > executed division computing the unused countm1. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok > for trunk? > OK. -- Steve
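To make the division in question concrete, here is a minimal sketch in plain C of the DO loop iteration count as the same section of the Fortran 95 standard defines it, MAX(INT((m2 - m1 + m3) / m3), 0). The function name do_trip_count is made up for illustration, and this is the standard's formula rather than the code gfortran emits (which, per the mail above, computes a countm1 value from (to - from) / step). C integer division truncates toward zero, as Fortran's INT does.

```c
#include <assert.h>

/* Illustrative model of the F95 iteration count for
   "DO i = m1, m2, m3".  A zero step is excluded up front,
   mirroring "The value m3 shall not be zero".  */
long do_trip_count(long m1, long m2, long m3)
{
    assert(m3 != 0);
    long n = (m2 - m1 + m3) / m3;
    return n > 0 ? n : 0;
}
```

For instance, a loop from 1 to 10 with step 3 runs 4 times (1, 4, 7, 10), and a loop from 5 to 1 with step 1 runs 0 times, which is the not-executed case where the hoisted division is computed but its result goes unused.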
Ping: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
So I'm afraid I'm not going to get involved in a discussion about CANNOT_CHANGE_MODE_CLASS on RS6000, and what you might want to do there - sorry, but I don't think I can really contribute anything there. However, I *am* trying to migrate all platforms off the old reduc_xxx optabs to the new version producing a scalar. Hence, can I ping the attached patch (which is just a simple combination of the previously-posted patch + snippet)? No regressions on gcc112.fsffrance.org. This works in exactly the same way as the old code path, with a second insn to pull the scalar result out of the reduction, just as the expander would have done (or the bitfieldref before that), and avoiding the v2df combine pattern (again, as previously). gcc/ChangeLog: * config/rs6000/altivec.md (reduc_splus_): Rename to... (reduc_plus_scal_): ...this, add rs6000_expand_vector_extract. (reduc_uplus_v16qi): Remove. * config/rs6000/vector.md (VEC_reduc_name): change "splus" to "plus" (reduc__v2df): Remove. (reduc__scal_v2df): New. (reduc__v4sf): Rename to... (reduc__scal_v4sf): ...this, wrap VEC_reduc in a vec_select of element 3, add scratch register. Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using this snippet on top of original patch; no regressions. Alan Lawrence wrote: So I'm no expert on RS6000 here, but following on from Segher's observation about the change in pattern...so the difference in 'expand' is exactly that, a vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by combining the two previous insns. 
However, inspecting the logs from -fdump-rtl-combine-all, *without* my patch, when the combiner tries to put those two together, I see: Trying 30 -> 31: Failed to match this instruction: (set (reg:DF 179 [ stmp_s_5.7D.2196 ]) (vec_select:DF (plus:V2DF (vec_select:V2DF (reg:V2DF 173 [ vect_s_5.6D.2195 ]) (parallel [ (const_int 1 [0x1]) (const_int 0 [0]) ])) (reg:V2DF 173 [ vect_s_5.6D.2195 ])) (parallel [ (const_int 1 [0x1]) ]))) That is, it looks like combine_simplify_rtx has transformed the (vec_concat (vec_select ... 1) (vec_select ... 0)) from the vsx_reduc_plus_v2df insn, into a single vec_select, which does not match the vsx_reduc_plus_v2df_scalar insn. So despite the comment (in vsx.md): ;; Combiner patterns with the vector reduction patterns that knows we can get ;; to the top element of the V2DF array without doing an extract. It looks like the code generation prior to my patch, considered better, was because the combiner didn't actually use the pattern? In that case whilst you may want to dig into register allocation, cannot_change_mode_class, etc., for other reasons, I think the best fix for migrating to reduc_plus_scal... is simply to avoid using the "Combiner" patterns and just emit two insns, the old pattern followed by a vec_extract. The attached snippet does this (I won't call it a patch yet, and it applies on top of the previous patch - I went the route of calling the two gen functions rather than copying their RTL sequences, but could do the latter if that were preferable???), and restores code generation to the original form on your example above; it bootstraps OK but I'm still running check-gcc on the Compile Farm... 
However, again on your example above, I note that if I *remove* the reduc_plus_scal_v2df pattern altogether, I get: .sum: li 10,512# 52 *movdi_internal64/4 [length = 4] ld 9,.LC2@toc(2) # 20 *movdi_internal64/2 [length = 4] xxlxor 0,0,0 # 17 *vsx_movv2df/12 [length = 4] mtctr 10 # 48 *movdi_internal64/11[length = 4] .align 4 .L2: lxvd2x 12,0,9# 23 *vsx_movv2df/2 [length = 4] addi 9,9,16 # 25 *adddi3_internal1/2 [length = 4] xvadddp 0,0,12 # 24 *vsx_addv2df3/1 [length = 4] bdnz .L2 # 47 *ctrdi_internal1/1 [length = 4] xxsldwi 12,0,0,2 # 30 vsx_xxsldwi_v2df[length = 4] xvadddp 1,0,12 # 31 *vsx_addv2df3/1 [length = 4] nop # 37 *vsx_extract_v2df_internal2/1 [length = 4] blr # 55 return [length = 4] this is presumably using gcc's scalar reduction code, but (to my untrained eye on powerpc!) it looks even better than the first form above (the same in the loop, and in the reduction, an xxpermdi is replaced by a nop !)... --Alan Segher Boessenkool wrote: On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote: However, the double patt
Remove unused arguments of builtin_unreachable
Hi, in firefox .optimized dumps one can see few places where __builtin_unreachable is called (as a result of devirtualization code proving the code path to be undefined). There is usually some argument setup for the parameters of __builtin_unreachable that are dead. This patch makes it somewhat better so now we get: : # prephitmp_222 = PHI <_52(27), pretmp_245(29)> _57 = prephitmp_222 + 2; pool_40(D)->ptr = _57; __builtin_unreachable (); Why DSE does not eliminate the stores prior noreturn const function? Bootstrapped/regtested x86_64-linux, OK? Honza * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Remove dead parameters of BUILT_IN_UNREACHABLE Index: tree-ssa-dce.c === --- tree-ssa-dce.c (revision 218610) +++ tree-ssa-dce.c (working copy) @@ -250,6 +250,15 @@ mark_stmt_if_obviously_necessary (gimple case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: return; + case BUILT_IN_UNREACHABLE: + /* All parameters of BUILT_IN_UNREACHABLE are dead. Remove them +from the stmt, so we can remove their definitions. */ + if (gimple_call_num_args (stmt)) + { + gimple_set_num_ops (stmt, 3); + update_stmt (stmt); + } + break; default:; }
Re: Remove unused arguments of builtin_unreachable
On Thu, Dec 11, 2014 at 06:06:55PM +0100, Jan Hubicka wrote: > Hi, > in firefox .optimized dumps one can see few places where __builtin_unreachable > is called (as a result of devirtualization code proving the code path to be > undefined). There is usually some argument setup for the parameters of > __builtin_unreachable that are dead. This patch makes it somewhat better > so now we get: > : > > # prephitmp_222 = PHI <_52(27), pretmp_245(29)> > > _57 = prephitmp_222 + 2; > > pool_40(D)->ptr = _57; > > __builtin_unreachable (); > > > Why DSE does not eliminate the stores prior noreturn const function? > > Bootstrapped/regtested x86_64-linux, OK? > > Honza > * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Remove dead > parameters > of BUILT_IN_UNREACHABLE Shouldn't this be done when you actually change the call to __builtin_unreachable ()? I mean, __builtin_unreachable () has no arguments, so leaving any arguments there is broken IL, even if you clean it up during the next DCE. > --- tree-ssa-dce.c(revision 218610) > +++ tree-ssa-dce.c(working copy) > @@ -250,6 +250,15 @@ mark_stmt_if_obviously_necessary (gimple > case BUILT_IN_ALLOCA: > case BUILT_IN_ALLOCA_WITH_ALIGN: > return; > + case BUILT_IN_UNREACHABLE: > + /* All parameters of BUILT_IN_UNREACHABLE are dead. Remove them > + from the stmt, so we can remove their definitions. */ > + if (gimple_call_num_args (stmt)) > + { > + gimple_set_num_ops (stmt, 3); > + update_stmt (stmt); > + } > + break; > > default:; > } Jakub
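As background for the point that __builtin_unreachable () takes no arguments, here is a small sketch of how the builtin is normally written at the source level. The enum and function names are hypothetical and only for illustration; the builtin itself is a GCC extension (also accepted by Clang).

```cpp
// Hypothetical example; Shape and corners are illustrative names only.
enum class Shape { Circle, Square };

int corners(Shape s)
{
  switch (s)
    {
    case Shape::Circle:
      return 0;
    case Shape::Square:
      return 4;
    }
  // The builtin takes no arguments. Reaching this call is undefined
  // behaviour, so the compiler may assume control never gets here.
  __builtin_unreachable();
}
```

When a pass rewrites some other call into __builtin_unreachable (), any operands left over from the original call have no meaning for the new callee, which is why the question above is about where they should be dropped.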
Re: [Patch, Fortran] Convert gfc_notify_std to common diagnostics
PING - https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00731.html Tobias Burnus wrote: > This patch requires that the gfc_error patch has been applied, > https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00607.html That patch has now been committed - and my patch still applies, building and regtesting still succeeds without new failures popping up. > The patch does some missing '%s' to %qs and '...' to %<...%> for gfc_error, > does likewise for gfc_notify_std and converts the latter into calls to > gfc_error and gfc_warning * * * Side note: The biggest remaining issue with regard to using the common diagnostic is supporting two locations. For the current plans, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44054#c22 Other remaining items: * Things supported by Fortran and not by the common diagnostic, e.g. "", but that requires some careful review what's missing and whether it matters. * Option handling: PR54687. One could also think of adding "error (OPT_..." support, printing the option in brackets (like: "[-std=f95]"). On the other hand, as that's all what the feature would do (contrary to -W... which also enters in -Werror=...), one could simply leave that in the error calling part. Currently, the Fortran code doesn't print this; printing it with gfc_notify_std for -std= would be trivial. See also PR31601. * libcpp-related features such as macro expansion tracking. Requires libcpp whitespace support, cf. PR64273 and links therein. And PR45179 (support unicode in 4_"..." strings) also depends on the libcpp work. Tobias
RFC: handle cached local static DIEs
Hi Jason. After my last set of dwarf changes for locals, I found some target library building failures which I am now fixing. The problem at hand is that, by design, the caching code in gen_variable_die() refuses to use a previously cached DIE if the current context and the cached context are different: else if (old_die->die_parent != context_die) { /* If the contexts differ, it means we're not talking about the same thing. Clear things so we can get a new DIE. This can happen when creating an inlined instance, in which case we need to create a new DIE that will get annotated with DW_AT_abstract_origin. */ old_die = NULL; gcc_assert (!DECL_ABSTRACT_P (decl)); } This is causing problems with local statics which are handled at dwarf2out_late_global_decl, and which originally have a context of the compilation unit (by virtue of the dwarf2out_decl call). This context then gets changed here: /* For local statics lookup proper context die. */ if (TREE_STATIC (decl) && DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL) context_die = lookup_decl_die (DECL_CONTEXT (decl)); This new context may be correct for front/middle-end purposes, but is not the DIE context I am expecting in gen_variable_die. For example, in the following example, the DECL_CONTEXT for the static is funky's DW_TAG_subprogram, whereas the caching code is expecting the DW_TAG_lexical_block: void funky() { { static const char *nested_static_const = "testing123"; } } My proposed way of handling it (attached) is by tightening the check in gen_variable_die(), and special casing this scenario (assuming, there is no other way to get a differing context). This works, and fixes all the failures, without introducing any regressions. Another approach would be to use whatever context is already cached with just "context_die = lookup_decl_die (decl)", but that feels like cheating. Are you OK with the attached approach, or do you have something else in mind? Thanks. 
Aldy commit 515a20666d0ea73f2380bae6d9b8ec1d5bb2f001 Author: Aldy Hernandez Date: Thu Dec 11 09:26:25 2014 -0800 * dwarf2out.c (gen_subprogram_die): Handle as cached die if dumped_early bit is set. (dwarf2out_decl): Abstract local static check... (local_function_static): ...into here. (gen_variable_die): Handle different contexts in a cached die gracefully for the non inline case. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index e4a7973..5d55d1f 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -18511,7 +18511,9 @@ gen_subprogram_die (tree decl, dw_die_ref context_die) apply; we just use the old DIE. */ expanded_location s = expand_location (DECL_SOURCE_LOCATION (decl)); struct dwarf_file_data * file_index = lookup_filename (s.file); - if (((is_cu_die (old_die->die_parent) || context_die == NULL) + if (((is_cu_die (old_die->die_parent) + || context_die == NULL + || dumped_early) && (DECL_ARTIFICIAL (decl) || (get_AT_file (old_die, DW_AT_decl_file) == file_index && (get_AT_unsigned (old_die, DW_AT_decl_line) @@ -19132,6 +19134,17 @@ decl_will_get_specification_p (dw_die_ref old_die, tree decl, bool declaration) && get_AT_flag (old_die, DW_AT_declaration) == 1); } +/* Return true if DECL is a local static. */ + +static inline bool +local_function_static (tree decl) +{ + gcc_assert (TREE_CODE (decl) == VAR_DECL); + return TREE_STATIC (decl) +&& DECL_CONTEXT (decl) +&& TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL; +} + /* Generate a DIE to represent a declared data object. Either DECL or ORIGIN must be non-null. */ @@ -19283,13 +19296,35 @@ gen_variable_die (tree decl, tree origin, dw_die_ref context_die) } else if (old_die->die_parent != context_die) { - /* If the contexts differ, it means we're not talking about -the same thing. Clear things so we can get a new DIE. -This can happen when creating an inlined instance, in -which case we need to create a new DIE that will get -annotated with DW_AT_abstract_origin. 
*/ - old_die = NULL; - gcc_assert (!DECL_ABSTRACT_P (decl)); + /* If the contexts differ, it means we _MAY_ not be talking +about the same thing. */ + if (origin) + { + /* If we will be creating an inlined instance, we need a +new DIE that will get annotated with +DW_AT_abstract_origin. Clear things so we can get a +new DIE. */ + gcc_assert (!DECL_ABSTRACT_P (decl)); + old_die = NULL; + } + else +
[patch] Fix std::notify_all_at_thread_exit test for older glibc
I'm seeing this test time out on glibc 2.13, which I think is because it doesn't provide __cxa_thread_atexit_impl and so the thread_local destructor and the notify_all() happen in an unspecified order. Tested x86_64-linux, committed to trunk. commit 299704c621dd7afaee7c5fb2a354b40ef41c2eba Author: Jonathan Wakely Date: Thu Dec 11 17:12:17 2014 + * testsuite/30_threads/condition_variable/members/3.cc: Only use a thread_local when __cxa_thread_atexit_impl is available. diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/members/3.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/members/3.cc index 0da545d..1788bcf 100644 --- a/libstdc++-v3/testsuite/30_threads/condition_variable/members/3.cc +++ b/libstdc++-v3/testsuite/30_threads/condition_variable/members/3.cc @@ -41,7 +41,12 @@ void func() { std::unique_lock lock{mx}; std::notify_all_at_thread_exit(cv, std::move(lock)); +#if _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL + // Correct order of thread_local destruction needs __cxa_thread_atexit_impl static thread_local Inc inc; +#else + Inc inc; +#endif } int main()
Re: [PATCH] libgccjit cleanups
On Wed, 2014-12-10 at 23:32 -0500, Ulrich Drepper wrote: > On Mon, Dec 8, 2014 at 11:36 AM, David Malcolm wrote: > > Thanks. Overall this is good, a few nitpicks inline below: > > I've made the changes and checked in the patch. ...as r218617. Thanks. The jit subdirectory has its own ChangeLog file. I realize now that your ChangeLog entries went in gcc/ChangeLog; they should have been in gcc/jit/ChangeLog. Sorry for not spotting that in review. I've fixed it in r218637. Does your editor do some kind of auto-reindent? FWIW, I see various whitespace-only changes in that commit. I assume we try to avoid such changes. I've added documentation of libgccjit++.h to the .rst files as of yesterday. So I've added documentation of the new function as r218636: gcc/jit/ChangeLog: * docs/cp/topics/contexts.rst (gccjit::context::set_str_option): Document new function. * docs/_build/texinfo/libgccjit.texi: Regenerate. --- gcc/jit/docs/cp/topics/contexts.rst | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/gcc/jit/docs/cp/topics/contexts.rst b/gcc/jit/docs/cp/topics/contexts.rst index 72815fb..4becd51 100644 --- a/gcc/jit/docs/cp/topics/contexts.rst +++ b/gcc/jit/docs/cp/topics/contexts.rst @@ -148,9 +148,18 @@ Debugging Options --- -.. - FIXME: gccjit::context::set_str_option doesn't seem to exist yet in the - C++ API +String Options +** + +.. function:: void \ + gccjit::context::set_str_option (enum gcc_jit_str_option, \ + const char *value) + + Set a string option of the context. + + This is a thin wrapper around the C API + :c:func:`gcc_jit_context_set_str_option`; the options have the same + meaning. Boolean options *** -- 1.8.5.3
Re: [Committed/AARCH64] Fix gcc.target/aarch64/test_frame_*.c testcases after ccmp patches
On 22/11/14 23:41, Andrew Pinski wrote: Hi, After the conditional compare patches, the some of the gcc.target/aarch64/test_frame_*.c testcases start to fail. This was due to no longer duplicating simple_return and causing the epilogue to be duplicated. This changes the testcases to expect the non duplicated epilogue. Committed as obvious after a test of aarch64-elf. Thanks, Andrew Pinski ChangeLog: * gcc.target/aarch64/test_frame_1.c: Expect only two loads of x30 (in the epilogue). * gcc.target/aarch64/test_frame_6.c: Likewise. * gcc.target/aarch64/test_frame_2.c: Expect only one pair load of x30 and x19 (in the epilogue). * gcc.target/aarch64/test_frame_4.c: Likewise. * gcc.target/aarch64/test_frame_7.c: Likewise. Hi Andrew, I'm still seeing the original number of ldr x30 and ldp x19, x30 insns for these tests. What am I missing? FAIL: gcc.target/aarch64/test_frame_1.c scan-assembler-times ldr\tx30, \\[sp\\], [0-9]+ 2 FAIL: gcc.target/aarch64/test_frame_2.c scan-assembler-times ldp\tx19, x30, \\[sp\\], [0-9]+ 1 FAIL: gcc.target/aarch64/test_frame_4.c scan-assembler-times ldp\tx19, x30, \\[sp\\], [0-9]+ 1 FAIL: gcc.target/aarch64/test_frame_6.c scan-assembler-times ldr\tx30, \\[sp\\], [0-9]+ 2 FAIL: gcc.target/aarch64/test_frame_7.c scan-assembler-times ldp\tx19, x30, \\[sp\\], [0-9]+ 1 Thanks, Tejas,
Re: [Committed/AARCH64] Fix gcc.target/aarch64/test_frame_*.c testcases after ccmp patches
> On Dec 11, 2014, at 10:06 AM, Tejas Belagod wrote: > >> On 22/11/14 23:41, Andrew Pinski wrote: >> Hi, >> After the conditional compare patches, the some of the >> gcc.target/aarch64/test_frame_*.c testcases start to fail. This was >> due to no longer duplicating simple_return and causing the epilogue to >> be duplicated. >> >> This changes the testcases to expect the non duplicated epilogue. >> >> Committed as obvious after a test of aarch64-elf. >> >> Thanks, >> Andrew Pinski >> >> ChangeLog: >> * gcc.target/aarch64/test_frame_1.c: Expect only two loads of x30 (in >> the epilogue). >> * gcc.target/aarch64/test_frame_6.c: Likewise. >> * gcc.target/aarch64/test_frame_2.c: Expect only one pair load of x30 >> and x19 (in the epilogue). >> * gcc.target/aarch64/test_frame_4.c: Likewise. >> * gcc.target/aarch64/test_frame_7.c: Likewise. > > Hi Andrew, > > I'm still seeing the original number of ldr x30 and ldp x19, x30 insns for > these tests. What am I missing? The ccmp patch had to be reverted. But this patch was forgotten when it was. Just revert the testcase patch. Thanks, Andrew > > FAIL: gcc.target/aarch64/test_frame_1.c scan-assembler-times ldr\tx30, > \\[sp\\], [0-9]+ 2 > FAIL: gcc.target/aarch64/test_frame_2.c scan-assembler-times ldp\tx19, x30, > \\[sp\\], [0-9]+ 1 > FAIL: gcc.target/aarch64/test_frame_4.c scan-assembler-times ldp\tx19, x30, > \\[sp\\], [0-9]+ 1 > FAIL: gcc.target/aarch64/test_frame_6.c scan-assembler-times ldr\tx30, > \\[sp\\], [0-9]+ 2 > FAIL: gcc.target/aarch64/test_frame_7.c scan-assembler-times ldp\tx19, x30, > \\[sp\\], [0-9]+ 1 > > Thanks, > Tejas, > > > > > > >
Re: Remove unused arguments of builtin_unreachable
> On Thu, Dec 11, 2014 at 06:06:55PM +0100, Jan Hubicka wrote: > > Hi, > > in firefox .optimized dumps one can see few places where > > __builtin_unreachable > > is called (as a result of devirtualization code proving the code path to be > > undefined). There is usually some argument setup for the parameters of > > __builtin_unreachable that are dead. This patch makes it somewhat better > > so now we get: > > : > > > > # prephitmp_222 = PHI <_52(27), pretmp_245(29)> > > > > _57 = prephitmp_222 + 2; > > > > pool_40(D)->ptr = _57; > > > > __builtin_unreachable (); > > > > > > Why DSE does not eliminate the stores prior noreturn const function? > > > > Bootstrapped/regtested x86_64-linux, OK? > > > > Honza > > * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Remove dead > > parameters > > of BUILT_IN_UNREACHABLE > > Shouldn't this be done when you actually change the call to > __builtin_unreachable ()? I mean, __builtin_unreachable () has no > arguments, so leaving any arguments there is broken IL, even if you clean it > up during the next DCE. Hmm, I thought there was some reason not to do so because of in-place folding and memory-SSA. I can try to update all the places where we can put builtin_unreachable into IL. (I wonder if that also includes standard constant propagation.) Honza > > > --- tree-ssa-dce.c (revision 218610) > > +++ tree-ssa-dce.c (working copy) > > @@ -250,6 +250,15 @@ mark_stmt_if_obviously_necessary (gimple > > case BUILT_IN_ALLOCA: > > case BUILT_IN_ALLOCA_WITH_ALIGN: > > return; > > + case BUILT_IN_UNREACHABLE: > > + /* All parameters of BUILT_IN_UNREACHABLE are dead. Remove them > > +from the stmt, so we can remove their definitions. */ > > + if (gimple_call_num_args (stmt)) > > + { > > + gimple_set_num_ops (stmt, 3); > > + update_stmt (stmt); > > + } > > + break; > > > > default:; > > } > > Jakub
patches for libstdc++ in #64271 (bootstrap on NetBSD)
Here are the three patches as requested for #64271. --- libstdc++-v3/config/os/bsd/netbsd/ctype_inline.h.orig 2014-12-10 22:19:05.0 +0100 +++ libstdc++-v3/config/os/bsd/netbsd/ctype_inline.h2014-12-10 22:20:46.0 +0100 @@ -48,7 +48,7 @@ is(const char* __low, const char* __high, mask* __vec) const { while (__low < __high) - *__vec++ = _M_table[*__low++]; + *__vec++ = _M_table[(unsigned char)*__low++]; return __high; } --- libstdc++-v3/config/os/bsd/netbsd/ctype_configure_char.cc.orig 2014-12-10 22:19:26.0 +0100 +++ libstdc++-v3/config/os/bsd/netbsd/ctype_configure_char.cc 2014-12-10 22:21:15.0 +0100 @@ -38,11 +38,17 @@ // Information as gleaned from /usr/include/ctype.h +#ifndef _CTYPE_BL extern "C" const u_int8_t _C_ctype_[]; +#endif const ctype_base::mask* ctype::classic_table() throw() - { return _C_ctype_ + 1; } +#ifdef _CTYPE_BL + { return _C_ctype_tab_ + 1; } +#else + { return _C_ctype_ + 1; } +#endif ctype::ctype(__c_locale, const mask* __table, bool __del, size_t __refs) @@ -69,14 +75,14 @@ char ctype::do_toupper(char __c) const - { return ::toupper((int) __c); } + { return ::toupper((int)(unsigned char) __c); } const char* ctype::do_toupper(char* __low, const char* __high) const { while (__low < __high) { - *__low = ::toupper((int) *__low); + *__low = ::toupper((int)(unsigned char) *__low); ++__low; } return __high; @@ -84,14 +90,14 @@ char ctype::do_tolower(char __c) const - { return ::tolower((int) __c); } + { return ::tolower((int)(unsigned char) __c); } const char* ctype::do_tolower(char* __low, const char* __high) const { while (__low < __high) { - *__low = ::tolower((int) *__low); + *__low = ::tolower((int)(unsigned char) *__low); ++__low; } return __high; --- libstdc++-v3/config/os/bsd/netbsd/ctype_base.h.orig 2014-12-10 22:18:50.0 +0100 +++ libstdc++-v3/config/os/bsd/netbsd/ctype_base.h 2014-12-10 22:20:31.0 +0100 @@ -43,9 +43,22 @@ // NB: Offsets into ctype::_M_table force a particular size // on the mask type. Because of this, we don't use an enum. 
-typedef unsigned char mask; -#ifndef _CTYPE_U +#if defined(_CTYPE_BL) +typedef unsigned short mask; +static const mask upper = _CTYPE_U; +static const mask lower = _CTYPE_L; +static const mask alpha = _CTYPE_A; +static const mask digit = _CTYPE_D; +static const mask xdigit= _CTYPE_X; +static const mask space = _CTYPE_S; +static const mask print = _CTYPE_R; +static const mask graph = _CTYPE_G; +static const mask cntrl = _CTYPE_C; +static const mask punct = _CTYPE_P; +static const mask alnum = _CTYPE_A | _CTYPE_D; +#elif !defined(_CTYPE_U) +typedef unsigned char mask; static const mask upper= _U; static const mask lower= _L; static const mask alpha= _U | _L; @@ -58,6 +71,7 @@ static const mask punct= _P; static const mask alnum= _U | _L | _N; #else +typedef unsigned char mask; static const mask upper= _CTYPE_U; static const mask lower= _CTYPE_L; static const mask alpha= _CTYPE_U | _CTYPE_L;
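The (unsigned char) casts in the ctype patches above may look redundant, so here is a short sketch of the reason for them. The helper names to_upper_safe and table_index are hypothetical and not part of libstdc++. The <cctype> functions require an argument that is representable as unsigned char (or equal to EOF), and on targets where plain char is signed, a byte such as 0xE9 would otherwise be passed, or used as a table index, as a negative value.

```cpp
#include <cctype>

// Hypothetical helper, not part of libstdc++.
char to_upper_safe(char c)
{
  // Convert to unsigned char first: passing a negative plain char
  // to std::toupper is undefined behaviour.
  return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Hypothetical stand-in for the _M_table[(unsigned char)*__low] lookup:
// the cast keeps the index in the range 0..255 even for bytes >= 0x80.
unsigned table_index(char c)
{
  return static_cast<unsigned char>(c);
}
```

The same reasoning applies to both the ::toupper/::tolower calls and the _M_table lookup in the patch; without the cast the index can fall before the start of the table.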
second part of patches for #64271 (bootstrap on NetBSD)
here are the non-libstdc++ patches: --- libgfortran/configure.orig 2014-12-10 22:34:06.0 +0100 +++ libgfortran/configure 2014-12-10 22:33:57.0 +0100 @@ -26447,7 +26447,7 @@ fi case "$host" in -*-*-darwin* | *-*-hpux* | *-*-cygwin* | *-*-mingw* ) +*-*-darwin* | *-*-hpux* | *-*-cygwin* | *-*-mingw* | *-*-netbsd* ) $as_echo "#define GTHREAD_USE_WEAK 0" >>confdefs.h --- libcilkrts/configure.orig 2014-12-10 22:28:55.0 +0100 +++ libcilkrts/configure2014-12-10 22:28:38.0 +0100 @@ -14519,7 +14519,7 @@ CFLAGS="$save_CFLAGS" if test $enable_shared = yes; then - link_cilkrts="-lcilkrts %{static: $LIBS}" + link_cilkrts="-rpath ${prefix}/lib --as-needed -lgcc_s -lcilkrts %{static $LIBS}" else link_cilkrts="-lcilkrts $LIBS" fi --- libcilkrts/runtime/os-unix.c.orig 2014-12-10 22:29:28.0 +0100 +++ libcilkrts/runtime/os-unix.c2014-12-10 22:29:40.0 +0100 @@ -56,7 +56,9 @@ // Uses sysconf(_SC_NPROCESSORS_ONLN) in verbose output #elif defined __DragonFly__ // No additional include files -#elif defined __FreeBSD__ +#elif defined __FreeBSD__ +// No additional include files +#elif defined __NetBSD__ // No additional include files #elif defined __CYGWIN__ // Cygwin on Windows - no additional include files @@ -376,7 +378,7 @@ assert((unsigned)count == count); return count; -#elif defined __FreeBSD__ || defined __CYGWIN__ || defined __DragonFly__ +#elif defined __FreeBSD__ || defined __CYGWIN__ || defined __DragonFly__ || defined __NetBSD__ int ncores = sysconf(_SC_NPROCESSORS_ONLN); return ncores; @@ -400,7 +402,7 @@ COMMON_SYSDEP void __cilkrts_yield(void) { -#if __APPLE__ || __FreeBSD__ || __VXWORKS__ +#if __APPLE__ || __FreeBSD__ || __NetBSD__ || __VXWORKS__ // On MacOS, call sched_yield to yield quantum. I'm not sure why we // don't do this on Linux also. sched_yield();
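For context on the os-unix.c hunks, this is a minimal sketch of the two calls the NetBSD branch ends up sharing with FreeBSD. The wrapper names online_cores and yield_quantum are hypothetical; the sketch assumes a Unix-like system where _SC_NPROCESSORS_ONLN is available, which is a common extension rather than something every POSIX system guarantees.

```cpp
#include <sched.h>
#include <unistd.h>

// Hypothetical wrapper: number of processors currently online.
long online_cores()
{
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  // sysconf returns -1 on error; fall back to a single core.
  return n > 0 ? n : 1;
}

// Hypothetical wrapper: give up the rest of the time slice.
void yield_quantum()
{
  sched_yield();
}
```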
Re: Ping: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
Sorry - it works exactly as the current optab/expander *in the v2df case*, but is the same as the previous version of the patch in the other cases. --Alan Alan Lawrence wrote: So I'm afraid I'm not going to get involved in a discussion about CANNOT_CHANGE_MODE_CLASS on RS6000, and what you might want to do there - sorry, but I don't think I can really contribute anything there. However, I *am* trying to migrate all platforms off the old reduc_xxx optabs to the new version producing a scalar. Hence, can I ping the attached patch (which is just a simple combination of the previously-posted patch + snippet)? No regressions on gcc112.fsffrance.org. This works in exactly the same way as the old code path, with a second insn to pull the scalar result out of the reduction, just as the expander would have done (or the bitfieldref before that), and avoiding the v2df combine pattern (again, as previously). gcc/ChangeLog: * config/rs6000/altivec.md (reduc_splus_): Rename to... (reduc_plus_scal_): ...this, add rs6000_expand_vector_extract. (reduc_uplus_v16qi): Remove. * config/rs6000/vector.md (VEC_reduc_name): change "splus" to "plus" (reduc__v2df): Remove. (reduc__scal_v2df): New. (reduc__v4sf): Rename to... (reduc__scal_v4sf): ...this, wrap VEC_reduc in a vec_select of element 3, add scratch register. Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using this snippet on top of original patch; no regressions. Alan Lawrence wrote: So I'm no expert on RS6000 here, but following on from Segher's observation about the change in pattern...so the difference in 'expand' is exactly that, a vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by combining the two previous insns. 
However, inspecting the logs from -fdump-rtl-combine-all, *without* my patch, when the combiner tries to put those two together, I see: Trying 30 -> 31: Failed to match this instruction: (set (reg:DF 179 [ stmp_s_5.7D.2196 ]) (vec_select:DF (plus:V2DF (vec_select:V2DF (reg:V2DF 173 [ vect_s_5.6D.2195 ]) (parallel [ (const_int 1 [0x1]) (const_int 0 [0]) ])) (reg:V2DF 173 [ vect_s_5.6D.2195 ])) (parallel [ (const_int 1 [0x1]) ]))) That is, it looks like combine_simplify_rtx has transformed the (vec_concat (vec_select ... 1) (vec_select ... 0)) from the vsx_reduc_plus_v2df insn, into a single vec_select, which does not match the vsx_reduc_plus_v2df_scalar insn. So despite the comment (in vsx.md): ;; Combiner patterns with the vector reduction patterns that knows we can get ;; to the top element of the V2DF array without doing an extract. It looks like the code generation prior to my patch, considered better, was because the combiner didn't actually use the pattern? In that case whilst you may want to dig into register allocation, cannot_change_mode_class, etc., for other reasons, I think the best fix for migrating to reduc_plus_scal... is simply to avoid using the "Combiner" patterns and just emit two insns, the old pattern followed by a vec_extract. The attached snippet does this (I won't call it a patch yet, and it applies on top of the previous patch - I went the route of calling the two gen functions rather than copying their RTL sequences, but could do the latter if that were preferable???), and restores code generation to the original form on your example above; it bootstraps OK but I'm still running check-gcc on the Compile Farm... 
However, again on your example above, I note that if I *remove* the reduc_plus_scal_v2df pattern altogether, I get: .sum: li 10,512# 52 *movdi_internal64/4 [length = 4] ld 9,.LC2@toc(2) # 20 *movdi_internal64/2 [length = 4] xxlxor 0,0,0 # 17 *vsx_movv2df/12 [length = 4] mtctr 10 # 48 *movdi_internal64/11[length = 4] .align 4 .L2: lxvd2x 12,0,9# 23 *vsx_movv2df/2 [length = 4] addi 9,9,16 # 25 *adddi3_internal1/2 [length = 4] xvadddp 0,0,12 # 24 *vsx_addv2df3/1 [length = 4] bdnz .L2 # 47 *ctrdi_internal1/1 [length = 4] xxsldwi 12,0,0,2 # 30 vsx_xxsldwi_v2df[length = 4] xvadddp 1,0,12 # 31 *vsx_addv2df3/1 [length = 4] nop # 37 *vsx_extract_v2df_internal2/1 [length = 4] blr # 55 return [length = 4] this is presumably using gcc's scalar reduction code, but (to my untrained eye on powerpc!) it looks even better than the first form above (the same in the loop, and in the reduction, an xxpe
Re: [patch] remove unused `depth' argument from dwarf2out.c
OK. Jason
Re: RFC: handle cached local static DIEs
On 12/11/2014 12:44 PM, Aldy Hernandez wrote: This context then gets changed here: /* For local statics lookup proper context die. */ if (TREE_STATIC (decl) && DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL) context_die = lookup_decl_die (DECL_CONTEXT (decl)); Can we remove this and just leave the context as NULL until it gets fixed up? Jason
Re: RFC: handle cached local static DIEs
On 12/11/14 11:23, Jason Merrill wrote: On 12/11/2014 02:19 PM, Jason Merrill wrote: On 12/11/2014 12:44 PM, Aldy Hernandez wrote: This context then gets changed here: /* For local statics lookup proper context die. */ if (TREE_STATIC (decl) && DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL) context_die = lookup_decl_die (DECL_CONTEXT (decl)); Can we remove this and just leave the context as NULL until it gets fixed up? Never mind, it looks like that'll require more work in gen_variable_die. Your patch looks fine. Hah, I was just going to say that :). I will push my patch to the branch. Thanks for your input. Aldy
[patch] remove unused `depth' argument from dwarf2out.c
Looks like `depth' is passed around and never used. OK for mainline? commit d1603304423bcb25c69d0f4bf51b142e07274275 Author: Aldy Hernandez Date: Thu Dec 11 10:51:04 2014 -0800 * dwarf2out.c (gen_lexical_block_die): Remove unused `depth' parameter. (gen_inlined_subroutine_die): Same. (gen_block_die): Same. (decls_for_scope): Same. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 34b327e..4c2ff8d 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3263,8 +3263,8 @@ static void gen_subprogram_die (tree, dw_die_ref); static void gen_variable_die (tree, tree, dw_die_ref); static void gen_const_die (tree, dw_die_ref); static void gen_label_die (tree, dw_die_ref); -static void gen_lexical_block_die (tree, dw_die_ref, int); -static void gen_inlined_subroutine_die (tree, dw_die_ref, int); +static void gen_lexical_block_die (tree, dw_die_ref); +static void gen_inlined_subroutine_die (tree, dw_die_ref); static void gen_field_die (tree, dw_die_ref); static void gen_ptr_to_mbr_type_die (tree, dw_die_ref); static dw_die_ref gen_compile_unit_die (const char *); @@ -3275,8 +3275,8 @@ static void gen_struct_or_union_type_die (tree, dw_die_ref, static void gen_subroutine_type_die (tree, dw_die_ref); static void gen_typedef_die (tree, dw_die_ref); static void gen_type_die (tree, dw_die_ref); -static void gen_block_die (tree, dw_die_ref, int); -static void decls_for_scope (tree, dw_die_ref, int); +static void gen_block_die (tree, dw_die_ref); +static void decls_for_scope (tree, dw_die_ref); static inline int is_redundant_typedef (const_tree); static bool is_naming_typedef_decl (const_tree); static inline dw_die_ref get_context_die (tree); @@ -18696,7 +18696,7 @@ gen_subprogram_die (tree decl, dw_die_ref context_die) if (DECL_NAME (DECL_RESULT (decl))) gen_decl_die (DECL_RESULT (decl), NULL, subr_die); - decls_for_scope (outer_scope, subr_die, 0); + decls_for_scope (outer_scope, subr_die); if (call_arg_locations && !dwarf_strict) { @@ -19294,7 +19294,7 @@ 
add_high_low_attributes (tree stmt, dw_die_ref die) /* Generate a DIE for a lexical block. */ static void -gen_lexical_block_die (tree stmt, dw_die_ref context_die, int depth) +gen_lexical_block_die (tree stmt, dw_die_ref context_die) { dw_die_ref stmt_die = new_die (DW_TAG_lexical_block, context_die, stmt); @@ -19308,13 +19308,13 @@ gen_lexical_block_die (tree stmt, dw_die_ref context_die, int depth) if (! BLOCK_ABSTRACT (stmt) && TREE_ASM_WRITTEN (stmt)) add_high_low_attributes (stmt, stmt_die); - decls_for_scope (stmt, stmt_die, depth); + decls_for_scope (stmt, stmt_die); } /* Generate a DIE for an inlined subprogram. */ static void -gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die, int depth) +gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die) { tree decl; @@ -19346,7 +19346,7 @@ gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die, int depth) add_high_low_attributes (stmt, subr_die); add_call_src_coords_attributes (stmt, subr_die); - decls_for_scope (stmt, subr_die, depth); + decls_for_scope (stmt, subr_die); } } @@ -20240,7 +20240,7 @@ gen_type_die (tree type, dw_die_ref context_die) things which are local to the given block. */ static void -gen_block_die (tree stmt, dw_die_ref context_die, int depth) +gen_block_die (tree stmt, dw_die_ref context_die) { int must_output_die = 0; bool inlined_func; @@ -20259,7 +20259,7 @@ gen_block_die (tree stmt, dw_die_ref context_die, int depth) tree sub; for (sub = BLOCK_SUBBLOCKS (stmt); sub; sub = BLOCK_CHAIN (sub)) - gen_block_die (sub, context_die, depth + 1); + gen_block_die (sub, context_die); return; } @@ -20314,13 +20314,13 @@ gen_block_die (tree stmt, dw_die_ref context_die, int depth) the concrete instance of STMT got inlined, the later will lead to the generation of a DW_TAG_inlined_subroutine DIE. */ if (! 
BLOCK_ABSTRACT (stmt)) - gen_inlined_subroutine_die (stmt, context_die, depth); + gen_inlined_subroutine_die (stmt, context_die); } else - gen_lexical_block_die (stmt, context_die, depth); + gen_lexical_block_die (stmt, context_die); } else -decls_for_scope (stmt, context_die, depth); +decls_for_scope (stmt, context_die); } /* Process variable DECL (or variable with origin ORIGIN) within @@ -20352,7 +20352,7 @@ process_scope_var (tree stmt, tree decl, tree origin, dw_die_ref context_die) all of its sub-blocks. */ static void -decls_for_scope (tree stmt, dw_die_ref context_die, int depth) +decls_for_scope (tree stmt, dw_die_ref context_die) { tree decl; unsigned int i; @@ -20384,7 +20384,7 @@ decls_for_scope (tree stmt, dw_die_ref context_die, int depth) for (subblocks = BLOCK_SUBBLOCKS (stmt); subblocks != NULL;
Re: [PATCH 00/13] Go closures, libffi, and the static chain
On 12/11/2014 01:06 AM, Dominik Vogt wrote:
> reflect.call
> ../../../libgo/runtime/go-reflect-call.c:216
> reflect.call.N13_reflect.Value
> GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/value.go:579
> reflect.Call.N13_reflect.Value
> GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/value.go:412
> reflect_test.TestCallWithStruct
> GCCDIR/build-go-closure/x86_64-unknown-linux-gnu/libgo/gotest30365/test/all_test.go:1490
> testing.tRunner
> ../../../libgo/go/testing/testing.go:422

Indeed. libgo uses ffi_type_void to represent empty structures, and libffi would crash for x86_64 when passing such parameters. This does go back to an open bug report about how libffi handles empty structures in general.

I've fixed this on the branch, and I'll push this through the proper channels later.

r~
Re: [gofrontend-dev] Re: [PATCH 00/13] Go closures, libffi, and the static chain
On 12/11/2014 04:25 AM, Dominik Vogt wrote:
> Update: If I disable the custom s390x code and switch to the
> implementation just using libffi for reflection calls, the same
> crash occurs with the testing/quick libgo test case. The called
> function sees a bogus value written by the dynamic linker as the
> closure pointer, for example with this line in the test code:
>
> CheckEqual(fComplex64, fComplex64, nil)

The compiler should be generating a static structure for these. On x86_64, I see

Relocation section '.rela.rodata.testing_quick.fComplex64$descriptor' at offset 0x5d4c0 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00020001 R_X86_64_64 .text + c0
00c0 t quick.fComplex64

so that is in fact a direct relocation, and will not go via the dynamic linker.

Is the s390 port somehow putting the address of a plt entry here?

r~
Re: RFC: handle cached local static DIEs
On 12/11/2014 02:19 PM, Jason Merrill wrote:
> On 12/11/2014 12:44 PM, Aldy Hernandez wrote:
> > This context then gets changed here:
> >
> > /* For local statics lookup proper context die. */
> > if (TREE_STATIC (decl)
> > && DECL_CONTEXT (decl)
> > && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
> > context_die = lookup_decl_die (DECL_CONTEXT (decl));
>
> Can we remove this and just leave the context as NULL until it gets
> fixed up?

Never mind, it looks like that'll require more work in gen_variable_die. Your patch looks fine.

Jason
Overload HONOR_INFINITIES, etc macros
Hello, after HONOR_NANS, I am turning the other HONOR_* macros into functions. As a reminder, the goal is both to make uses shorter and to fix the answer for non-native vector types. Bootstrap+testsuite on x86_64-linux-gnu. 2014-12-12 Marc Glisse * real.h (HONOR_SNANS, HONOR_INFINITIES, HONOR_SIGNED_ZEROS, HONOR_SIGN_DEPENDENT_ROUNDING): Replace macros with 3 overloaded declarations. * real.c (HONOR_NANS): Fix indentation. (HONOR_SNANS, HONOR_INFINITIES, HONOR_SIGNED_ZEROS, HONOR_SIGN_DEPENDENT_ROUNDING): Define three overloads. * builtins.c (fold_builtin_cproj, fold_builtin_signbit, fold_builtin_fmin_fmax, fold_builtin_classify): Simplify argument of HONOR_*. * fold-const.c (operand_equal_p, fold_comparison, fold_binary_loc): Likewise. * gimple-fold.c (gimple_val_nonnegative_real_p): Likewise. * ifcvt.c (noce_try_move, noce_try_minmax, noce_try_abs): Likewise. * omp-low.c (omp_reduction_init): Likewise. * rtlanal.c (may_trap_p_1): Likewise. * simplify-rtx.c (simplify_const_relational_operation): Likewise. * tree-ssa-dom.c (record_equality, record_edge_info): Likewise. * tree-ssa-phiopt.c (value_replacement, abs_replacement): Likewise. * tree-ssa-reassoc.c (eliminate_using_constants): Likewise. * tree-ssa-uncprop.c (associate_equivalences_with_edges): Likewise. -- Marc GlisseIndex: gcc/builtins.c === --- gcc/builtins.c (revision 218639) +++ gcc/builtins.c (working copy) @@ -7671,21 +7671,21 @@ build_complex_cproj (tree type, bool neg return type. Return NULL_TREE if no simplification can be made. */ static tree fold_builtin_cproj (location_t loc, tree arg, tree type) { if (!validate_arg (arg, COMPLEX_TYPE) || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE) return NULL_TREE; /* If there are no infinities, return arg. */ - if (! HONOR_INFINITIES (TYPE_MODE (TREE_TYPE (type + if (! HONOR_INFINITIES (type)) return non_lvalue_loc (loc, arg); /* Calculate the result when the argument is a constant. 
*/ if (TREE_CODE (arg) == COMPLEX_CST) { const REAL_VALUE_TYPE *real = TREE_REAL_CST_PTR (TREE_REALPART (arg)); const REAL_VALUE_TYPE *imag = TREE_REAL_CST_PTR (TREE_IMAGPART (arg)); if (real_isinf (real) || real_isinf (imag)) return build_complex_cproj (type, imag->sign); @@ -8942,21 +8942,21 @@ fold_builtin_signbit (location_t loc, tr return (REAL_VALUE_NEGATIVE (c) ? build_one_cst (type) : build_zero_cst (type)); } /* If ARG is non-negative, the result is always zero. */ if (tree_expr_nonnegative_p (arg)) return omit_one_operand_loc (loc, type, integer_zero_node, arg); /* If ARG's format doesn't have signed zeros, return "arg < 0.0". */ - if (!HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (arg + if (!HONOR_SIGNED_ZEROS (arg)) return fold_convert (type, fold_build2_loc (loc, LT_EXPR, boolean_type_node, arg, build_real (TREE_TYPE (arg), dconst0))); return NULL_TREE; } /* Fold function call to builtin copysign, copysignf or copysignl with arguments ARG1 and ARG2. Return NULL_TREE if no simplification can be made. */ @@ -9136,26 +9136,26 @@ fold_builtin_fmin_fmax (location_t loc, tree res = do_mpfr_arg2 (arg0, arg1, type, (max ? mpfr_max : mpfr_min)); if (res) return res; /* If either argument is NaN, return the other one. Avoid the transformation if we get (and honor) a signalling NaN. Using omit_one_operand() ensures we create a non-lvalue. */ if (TREE_CODE (arg0) == REAL_CST && real_isnan (&TREE_REAL_CST (arg0)) - && (! HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg0))) + && (! HONOR_SNANS (arg0) || ! TREE_REAL_CST (arg0).signalling)) return omit_one_operand_loc (loc, type, arg1, arg0); if (TREE_CODE (arg1) == REAL_CST && real_isnan (&TREE_REAL_CST (arg1)) - && (! HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg1))) + && (! HONOR_SNANS (arg1) || ! TREE_REAL_CST (arg1).signalling)) return omit_one_operand_loc (loc, type, arg0, arg1); /* Transform fmin/fmax(x,x) -> x. 
*/ if (operand_equal_p (arg0, arg1, OEP_PURE_SAME)) return omit_one_operand_loc (loc, type, arg0, arg1); /* Convert fmin/fmax to MIN_EXPR/MAX_EXPR. C99 requires these functions to return the numeric arg if the other one is NaN. These tree codes don't honor that, so only transform if @@ -9552,21 +9552,21 @@ fold_builtin_classify (location_t loc, t { tree type = TREE_TYPE (TREE_TYPE (fndecl)); REAL_VALUE_TYPE r; if (!validate_arg (arg, REAL_TYPE)) return NULL_TREE; sw
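As a rough illustration of the overloading idea described in this message (not the actual patch), here is a minimal, self-contained C++ sketch: a per-mode query gains two extra overloads, one taking a type and one taking an expression, so call sites no longer need to spell out a TYPE_MODE (TREE_TYPE (...)) chain. Every name below (Mode, Type, Expr, honor_infinities, element_mode, finite_math_only and the sample objects) is a hypothetical stand-in, not one of GCC's real types or functions. Looking through a vector type to its element mode is an assumption about what "fix the answer for non-native vector types" means.

```cpp
// Toy model only: Mode, Type and Expr are hypothetical stand-ins,
// not GCC's machine_mode, tree or rtx.
enum class Mode { Int32, Float32, Float64, Blk };

struct Type {
  Mode mode;            // mode of the whole type (Blk for a non-native vector)
  const Type *element;  // element type for vector/complex types, else nullptr
};

struct Expr {
  const Type *type;     // static type of the expression
};

// Assumed global switch, standing in for an option that disables infinities.
constexpr bool finite_math_only = false;

// Overload 1: the basic per-mode query.
constexpr bool honor_infinities(Mode m) {
  return (m == Mode::Float32 || m == Mode::Float64) && !finite_math_only;
}

// Helper: look through vector/complex types to the scalar element mode.
constexpr Mode element_mode(const Type &t) {
  return t.element ? element_mode(*t.element) : t.mode;
}

// Overload 2: query by type, so callers need not extract the mode.
constexpr bool honor_infinities(const Type &t) {
  return honor_infinities(element_mode(t));
}

// Overload 3: query by expression, so callers need not extract the type.
constexpr bool honor_infinities(const Expr &e) {
  return honor_infinities(*e.type);
}

// Sample objects for illustration.
constexpr Type int32_type{Mode::Int32, nullptr};
constexpr Type float32_type{Mode::Float32, nullptr};
constexpr Type vec4_float32{Mode::Blk, &float32_type};  // non-native vector
constexpr Expr x{&vec4_float32};
```

In this model, asking about the vector's own mode (Blk) answers "no", while asking about the vector type or an expression of that type answers "yes" because the element mode is consulted. That is the kind of difference the type and expression overloads are meant to hide from callers.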
[PATCH] backport libgo patch to add ioctl consts
Hi all, Please backport the following to gcc 4.9 https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02980.html. There has been a request to get the fixes that went into gcc trunk for gccgo ppc64 & ppc64le backported into gcc 4.9. 2014-12-11 Lynn Boger * libgo/mksysinfo.sh: Add ioctl const values Index: libgo/mksysinfo.sh === --- libgo/mksysinfo.sh (revision 218396) +++ libgo/mksysinfo.sh (working copy) @@ -174,6 +174,9 @@ enum { #ifdef TIOCGWINSZ TIOCGWINSZ_val = TIOCGWINSZ, #endif +#ifdef TIOCSWINSZ + TIOCSWINSZ_val = TIOCSWINSZ, +#endif #ifdef TIOCNOTTY TIOCNOTTY_val = TIOCNOTTY, #endif @@ -192,6 +195,12 @@ enum { #ifdef TIOCSIG TIOCSIG_val = TIOCSIG, #endif +#ifdef TCGETS + TCGETS_val = TCGETS, +#endif +#ifdef TCSETS + TCSETS_val = TCSETS, +#endif }; EOF @@ -780,6 +789,11 @@ if ! grep '^const TIOCGWINSZ' ${OUT} >/dev/null 2> echo 'const TIOCGWINSZ = _TIOCGWINSZ_val' >> ${OUT} fi fi +if ! grep '^const TIOCSWINSZ' ${OUT} >/dev/null 2>&1; then + if grep '^const _TIOCSWINSZ_val' ${OUT} >/dev/null 2>&1; then +echo 'const TIOCSWINSZ = _TIOCSWINSZ_val' >> ${OUT} + fi +fi if ! grep '^const TIOCNOTTY' ${OUT} >/dev/null 2>&1; then if grep '^const _TIOCNOTTY_val' ${OUT} >/dev/null 2>&1; then echo 'const TIOCNOTTY = _TIOCNOTTY_val' >> ${OUT} @@ -812,8 +826,18 @@ if ! grep '^const TIOCSIG' ${OUT} >/dev/null 2>&1; fi # The ioctl flags for terminal control -grep '^const _TC[GS]ET' gen-sysinfo.go | \ +grep '^const _TC[GS]ET' gen-sysinfo.go | grep -v _val | \ sed -e 's/^\(const \)_\(TC[GS]ET[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT} +if ! grep '^const TCGETS' ${OUT} >/dev/null 2>&1; then + if grep '^const _TCGETS_val' ${OUT} >/dev/null 2>&1; then +echo 'const TCGETS = _TCGETS_val' >> ${OUT} + fi +fi +if ! grep '^const TCSETS' ${OUT} >/dev/null 2>&1; then + if grep '^const _TCSETS_val' ${OUT} >/dev/null 2>&1; then +echo 'const TCSETS = _TCSETS_val' >> ${OUT} + fi +fi # ioctl constants. Might fall back to 0 if TIOCNXCL is missing, too, but # needs handling in syscalls.exec.go.
Re: Remove unused arguments of builtin_unreachable
Hi,

On Thu, Dec 11, 2014 at 07:16:43PM +0100, Jan Hubicka wrote:
> > On Thu, Dec 11, 2014 at 06:06:55PM +0100, Jan Hubicka wrote:
> > > Hi,
> > > in firefox .optimized dumps one can see few places where __builtin_unreachable
> > > is called (as a result of devirtualization code proving the code path to be
> > > undefined). There is usually some argument setup for the parameters of
> > > __builtin_unreachable that are dead. This patch makes it somewhat better
> > > so now we get:
> > > :
> > > # prephitmp_222 = PHI <_52(27), pretmp_245(29)>
> > > _57 = prephitmp_222 + 2;
> > > pool_40(D)->ptr = _57;
> > > __builtin_unreachable ();
> > >
> > > Why DSE does not eliminate the stores prior noreturn const function?
> > >
> > > Bootstrapped/regtested x86_64-linux, OK?
> > >
> > > Honza
> > > * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Remove dead parameters
> > > of BUILT_IN_UNREACHABLE
> >
> > Shouldn't this be done when you actually change the call to
> > __builtin_unreachable ()? I mean, __builtin_unreachable () has no
> > arguments, so leaving any arguments there is broken IL, even if you clean it
> > up during the next DCE.
>
> Hmm, I thought there was some reason to not do so because of inplace folding
> and memory-SSA.
> I can give a try to update all the places we can put builtin_unreachable into IL.
> (I wonder if that also include standard constant propagation)

I think that's what we ought to do, see also PR 61591.

Martin
Re: [patch] Fix ICE on unaligned record field
> Note that I think the place of the check is unfortunate as you for example
> will not remove the argument if it is unused. In fact I'm not yet sure
> what transform exactly we are disabling. I am guessing we are
> passing an aggregate by value that resides at a bit-aligned offset
> of some outer object:
>
> foo (x.aggr);
>
> and the function then does
>
> foo (Aggr a)
> {
> int i = a.foo;
> ...
> }
>
> thus use only a part of the aggregate. Then IPA SRA would like to
> pass x.aggr.foo instead of x.aggr and thus tries to materialize a
> load from x.aggr.foo at all callers but fails to do that in a valid way.

Right, it's the usual MEM_EXPR business creating ADDR_EXPRs out of nowhere and miserably failing on something not addressable.

> Eric's fix did, at all callers
>
> Aggr tem = x.aggr;
> foo (tem.foo);
>
> ?

Yes, because the code wants to take &tem afterwards.

> While we should be able to simply do
>
> foo (BIT_FIELD_REF )
>
> with the appropriate bit offset and size? (if that's of register type
> you need to do the load in a separate stmt of course).
>
> Thus similar to Eric's fix but avoiding the aggregate copy.

Yes, that should be doable, but I'm not sure it's worth the hassle.

-- 
Eric Botcazou
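The transformation discussed in this exchange can be pictured with a hand-written before/after in C++. This is only a sketch of the idea, assuming an ordinary addressable aggregate; plain C++ cannot express an aggregate sitting at a bit-aligned offset, which is exactly the case that causes trouble in the thread. All names (Aggr, Outer, foo_by_value, foo_scalarized, the caller_* functions, sample) are hypothetical.

```cpp
// Hypothetical aggregate of which the callee uses only one field.
struct Aggr {
  int foo;
  int bar;
};

// Hypothetical outer object containing the aggregate.
struct Outer {
  char tag;
  Aggr aggr;
};

// Original shape: the whole aggregate is passed by value, one field is read.
constexpr int foo_by_value(Aggr a) { return a.foo + 1; }

// Scalarized shape: the callee takes just the field it needs.
constexpr int foo_scalarized(int a_foo) { return a_foo + 1; }

// Caller before the transformation: foo (x.aggr);
constexpr int caller_before(const Outer &x) { return foo_by_value(x.aggr); }

// Caller after the transformation: a load of x.aggr.foo is materialized
// at the call site and passed instead of the aggregate.
constexpr int caller_after(const Outer &x) {
  return foo_scalarized(x.aggr.foo);
}

// Variant with a temporary copy: Aggr tem = x.aggr; foo (tem.foo);
constexpr int caller_with_copy(const Outer &x) {
  Aggr tem = x.aggr;  // addressable copy of the aggregate
  return foo_scalarized(tem.foo);
}

// Sample object for illustration.
constexpr Outer sample{'t', {41, 7}};
```

All three callers compute the same value. They differ only in how the field reaches the callee, which is where the aggregate copy in the temporary-based variant comes from.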
Fix ipa-comdats crashes
Hi,

IPA comdats performs a dataflow identifying section where every symbol is used. It sanity checks that everything is reachable. This sanity check shows latent issue with unreachable function removal. symbol_table::remove_unreachable_nodes has parameter before_inlining_p that says whether extern inline and virtual functions should be eliminated if they are not inlined. This parameter is correctly used only in call within inliner itself and in cgraphunit. All other cleanups happen with before_inlining_p true that may leave some unreachable inlines at ipa-comdats time. I fixed this by adding explicit state of symbol table for IPA passes run after inliner.

Another issue found by Trevor is that ipa-pure-const may render function unreachable in case a static cdtor is found to be pure/const. Fixed thus.

I also updated ipa.c to be more aggressive on removing functions that may be inlined at -O0. This should improve compile times since the functions do not need to bubble down in the queue. In fact there is same issue in reachability computed at callgraph construction time as well as within C++ frontend. I will send separate patches for this: it seems that those may account quite noticeable percentage of memory and compile time.

Bootstrapped/regtested x86_64-linux, committed. I am grateful to Trevor for analysis and initial patch.

Honza

PR ipa/61324
* testsuite/g++.dg/pr61324.C: New testcase by Trevor Saunders.
* testsuite/g++.dg/tm/pr51411-2.C: Update so the extern function is not eliminated early.
* testsuite/gcc.target/i386/pr57756.c: Turn extern inline into static inline.
* passes.c (execute_todo): Update call of remove_unreachable_nodes.
* ipa-chkp.c (chkp_produce_thunks): Use TODO_remove_functions.
* cgraphunit.c (symbol_table::process_new_functions): Add IPA_SSA_AFTER_INLINING.
(ipa_passes): Update call of remove_unreachable_nodes.
(symbol_table::compile): Remove call of remove_unreachable_nodes.
* ipa-inline.c (inline_small_functions): Do not ICE with -flto-partition=none (ipa_inline): Update symtab->state; fix formatting update call of remove_unreachable_nodes. * cgraphclones.c (symbol_table::materialize_all_clones): Likewise. * cgraph.h (enum symtab_state): Add IPA_SSA_AFTER_INLINING. (remove_unreachable_nodes): Update. * ipa.c (process_references): Keep external references only when optimizing. (walk_polymorphic_call_targets): Keep possible polymorphic call target only when devirtualizing. (symbol_table::remove_unreachable_nodes): Remove BEFORE_INLINING_P parameter. (ipa_single_use): Update comment. * ipa-pure-const.c (cdtor_p): New function. (propagate_pure_const): Track if some cdtor was turned pure/const. (execute): Return TODO_remove_functions if needed. * ipa-comdats.c (ipa_comdats): Update comment. * lto.c (read_cgraph_and_symbols): Update call of remove_unreachable_nodes. (do_whole_program_analysis): Remove call of symtab->remove_unreachable_nodes Index: testsuite/g++.dg/pr61324.C === --- testsuite/g++.dg/pr61324.C (revision 0) +++ testsuite/g++.dg/pr61324.C (revision 0) @@ -0,0 +1,13 @@ +// { dg-do compile } +// { dg-options "-O -fkeep-inline-functions -fno-use-cxa-atexit" } +void foo (); + +struct S +{ + ~S () + { +foo (); + } +}; + +S s; Index: testsuite/g++.dg/tm/pr51411-2.C === --- testsuite/g++.dg/tm/pr51411-2.C (revision 218610) +++ testsuite/g++.dg/tm/pr51411-2.C (working copy) @@ -26,6 +26,7 @@ public: bool compare(const basic_string& __str) const { return 0; } +void key (); }; typedef basic_string string; @@ -35,7 +36,7 @@ inline bool operator<(const basic_string return __lhs.compare(__rhs); } -extern template class basic_string; +template class basic_string; } Index: testsuite/gcc.target/i386/pr57756.c === --- testsuite/gcc.target/i386/pr57756.c (revision 218610) +++ testsuite/gcc.target/i386/pr57756.c (working copy) @@ -9,7 +9,7 @@ __inline int callee () /* { dg-error "in } __attribute__((target("sse"))) -__inline int caller () 
+static __inline int caller () { return callee(); /* { dg-error "called from here" } */ } Index: cgraph.h === --- cgraph.h(revision 218610) +++ cgraph.h(working copy) @@ -1801,12 +1801,15 @@ enum symtab_state PARSING, /* Callgraph is being constructed. It is safe to add new functions. */ CONSTRUCTION, - /* Callgraph is being at LTO time. */ + /* Callgraph is being streamed-in at LTO time. */ LTO_STREAMING, - /* Callgraph is b
Fix builtin-arith-overflow-1.c with unsigned char
The chars in gcc.dg/builtin-arith-overflow-1.c are almost all explicitly signed or unsigned, except for 2 of them, but that's enough to make it fail for targets whose char is unsigned.

Tested on x86-64 and a private port, applied on mainline as obvious.

2014-12-11 Eric Botcazou

* gcc.dg/builtin-arith-overflow-1.c (fn2): Take signed char.
(fn3): Likewise.

-- 
Eric Botcazou

Index: gcc.dg/builtin-arith-overflow-1.c
===
--- gcc.dg/builtin-arith-overflow-1.c (revision 218617)
+++ gcc.dg/builtin-arith-overflow-1.c (working copy)
@@ -17,7 +17,7 @@ fn1 (int x, unsigned int y)
 /* MUL_OVERFLOW should be folded into unsigned multiplication,
    because ovf is never used. */
 __attribute__((noinline, noclone)) int
-fn2 (char x, long int y)
+fn2 (signed char x, long int y)
 {
   short int res;
   int ovf = __builtin_mul_overflow (x, y, &res);
@@ -31,7 +31,7 @@ fn2 (char x, long int y)
 /* ADD_OVERFLOW should be folded into unsigned addition,
    because it never overflows. */
 __attribute__((noinline, noclone)) int
-fn3 (char x, unsigned short y, int *ovf)
+fn3 (signed char x, unsigned short y, int *ovf)
 {
   int res;
   *ovf = __builtin_add_overflow (x, y, &res);
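For readers unfamiliar with the underlying portability issue: whether plain char is signed is implementation-defined, so a test that feeds negative values through a plain char parameter can behave differently across targets. The following is a small sketch of that effect, assuming 8-bit chars; the names plain_char_is_signed, widen_plain and widen_signed are made up for illustration and do not come from the testcase.

```cpp
#include <limits>

// True on targets where plain char behaves like signed char.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// Widening a plain char: for a byte with the top bit set, the result
// depends on the target's choice of signedness.
constexpr int widen_plain(char c) { return c; }

// Widening an explicitly signed char: same result on every target.
constexpr int widen_signed(signed char c) { return c; }
```

With an explicit signed char parameter the value -1 stays -1 everywhere, whereas with plain char it becomes 255 on targets whose char is unsigned. Spelling out signed char in fn2 and fn3 removes that dependence.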
Fix doc about meaning of (pc) in length calculation
The doc reads:

`(pc)'
     This refers to the address of the _current_ insn. It might have
     been more consistent with other usage to make this the address of
     the _next_ insn but this would be confusing because the length of
     the current insn is to be computed.

That's incorrect for forward branches: (pc) points to the next insn for them, which is what final.c:insn_current_reference_address implements. I presume that's a quirk known in circles of seasoned GCC hackers for long, but I just ran into it and that's a little surprising...

Tested on x86_64-suse-linux, applied on all active branches.

2014-12-11 Eric Botcazou

* doc/md.texi (Insn Lengths): Fix description of (pc).

-- 
Eric Botcazou

Index: doc/md.texi
===
--- doc/md.texi (revision 218617)
+++ doc/md.texi (working copy)
@@ -8330,9 +8330,9 @@ must be a @code{label_ref}.
 @cindex @code{pc} and attributes
 @item (pc)
-This refers to the address of the @emph{current} insn. It might have
-been more consistent with other usage to make this the address of the
-@emph{next} insn but this would be confusing because the length of the
+For non-branch instructions and backward branch instructions, this refers
+to the address of the current insn. But for forward branch instructions,
+this refers to the address of the next insn, because the length of the
 current insn is to be computed.
 @end table
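The corrected rule can be written down as a toy model. This is a sketch under stated assumptions, not a transcription of final.c:insn_current_reference_address, which deals with further details that are left out here. The Insn struct and the functions is_forward_branch, pc_reference and branch_displacement are hypothetical names used only for illustration.

```cpp
// Toy description of one instruction; all fields are illustrative.
struct Insn {
  long address;    // start address of the insn
  long length;     // current length estimate, in bytes
  bool is_branch;  // whether the insn is a branch
  long target;     // branch destination address (ignored for non-branches)
};

// A branch is "forward" when its destination lies after the insn itself.
constexpr bool is_forward_branch(const Insn &i) {
  return i.is_branch && i.target > i.address;
}

// What (pc) stands for in this model: the address of the next insn for
// forward branches, the address of the current insn otherwise.
constexpr long pc_reference(const Insn &i) {
  return is_forward_branch(i) ? i.address + i.length : i.address;
}

// Distance that a length condition comparing the target with (pc) would see.
constexpr long branch_displacement(const Insn &i) {
  return i.target - pc_reference(i);
}
```

For example, a 4-byte forward branch at address 100 targeting address 200 sees (pc) as 104 in this model, so the displacement is 96 rather than 100. A backward branch from 100 to 40 sees (pc) as 100, giving a displacement of -60.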
Re: [PATCH][libstdc++][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model
On Dec 11, 2014, at 1:32 AM, Kyrill Tkachov wrote: > the patch that adds the libstdc++.exp changes at > https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00952.html using the new > target-utils.exp file is ok too then? Ok.
[PATCH 0/4] GCC port for the Visium
Hi, on behalf of Controls and Data Services, AdaCore would like to contribute a port of the GCC to the Visium. This is a 32-bit RISC architecture with an Extended Arithmetic Module implementing some 64-bit operations and an FPU designed for embedded systems. The binutils port has already been contributed and the ultimate goal is to contribute a port of the entire toolchain with simulator, debugger and embedded libc. The original port had been written by employees of CDS or companies that are now part of CDS, and AdaCore contributed enhancements and modifications on top of it. Both companies have a copyright assignment on file with the FSF for the various components of the toolchain. The Visium is a classic 32-bit RISC architecture whose branches have a delay slot and whose arithmetic and logical instructions all set the flags, and they comprise the moves between GP registers (which are inclusive ORs under the hood in the traditional RISC fashion). The port is nevertheless MODE_CC and it generates code that is as good as the original cc0 implementation with the help of the post-reload compare elimination pass (modulo a small patch for the reorg pass that I'll submit separately). 
The GCC port is split into 4 patches (toplevel, libgcc, gcc, gcc/testsuite) and is C-only for now, and 'make -k check-c' reports the following results:

Target is visium-unknown-elf
Host is x86_64-suse-linux-gnu

=== gcc tests ===

Running target visium-sim
FAIL: gcc.dg/torture/builtin-explog-1.c -O1 (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -O2 (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -O3 -fomit-frame-pointer (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -O3 -g (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -Os (test for excess errors)

=== gcc Summary for visium-sim ===

# of expected passes       81007
# of unexpected failures   6
# of expected failures     94
# of unsupported tests     1796

Running target visium-sim/-mcpu=gr6
FAIL: gcc.dg/torture/builtin-explog-1.c -O1 (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -O2 (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -O3 -fomit-frame-pointer (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -O3 -g (test for excess errors)
FAIL: gcc.dg/torture/builtin-explog-1.c -Os (test for excess errors)

=== gcc Summary for visium-sim/-mcpu=gr6 ===

# of expected passes       81007
# of unexpected failures   6
# of expected failures     94
# of unsupported tests     1796

=== gcc Summary ===

# of expected passes       162014
# of unexpected failures   12
# of expected failures     188
# of unsupported tests     3592

/home/eric/build/gcc/visium-elf/gcc/xgcc version 5.0.0 20141211 (experimental) [trunk revision 218617] (GCC)

after they are applied (on an x86_64-linux host).
I think that the failures are common to all newlib targets and very likely related to: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00357.html OK for the mainline? -- Eric Botcazou
[PATCH 1/4] Add Visium support to toplevel
ChangeLog 2014-12-11 Eric Botcazou * config.sub: Update from upstream config repo. * configure.ac: Add Visium support. * configure: Regenerate. -- Eric BotcazouIndex: config.sub === --- config.sub (revision 218617) +++ config.sub (working copy) @@ -2,7 +2,7 @@ # Configuration validation subroutine script. # Copyright 1992-2014 Free Software Foundation, Inc. -timestamp='2014-09-26' +timestamp='2014-12-03' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -313,6 +313,7 @@ case $basic_machine in | tahoe | tic4x | tic54x | tic55x | tic6x | tic80 | tron \ | ubicom32 \ | v850 | v850e | v850e1 | v850e2 | v850es | v850e2v3 \ + | visium \ | we32k \ | x86 | xc16x | xstormy16 | xtensa \ | z8k | z80) @@ -440,6 +441,7 @@ case $basic_machine in | ubicom32-* \ | v850-* | v850e-* | v850e1-* | v850es-* | v850e2-* | v850e2v3-* \ | vax-* \ + | visium-* \ | we32k-* \ | x86-* | x86_64-* | xc16x-* | xps100-* \ | xstormy16-* | xtensa*-* \ Index: configure.ac === --- configure.ac (revision 218617) +++ configure.ac (working copy) @@ -669,6 +669,10 @@ case "${target}" in # for explicit misaligned loads. noconfigdirs="$noconfigdirs target-libssp" ;; + visium-*-*) +# No hosted I/O support. +noconfigdirs="$noconfigdirs target-libssp" +;; esac # Disable libstdc++-v3 for some systems.
[PATCH 2/4] Add Visium support to libgcc
libgcc/ChangeLog 2014-12-11 Eric Botcazou * config.host: Add Visium support. * config/visium: New directory. -- Eric BotcazouIndex: config.host === --- config.host (revision 218617) +++ config.host (working copy) @@ -1233,6 +1233,10 @@ vax-*-netbsdelf*) ;; vax-*-openbsd*) ;; +visium-*-elf*) +extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o" +tmake_file="visium/t-visium t-fdpbit" +;; xstormy16-*-elf) tmake_file="stormy16/t-stormy16 t-fdpbit" ;; libgcc_visium.tar.gz Description: application/compressed-tar
[PATCH 3/4] Add Visium support to gcc
gcc/ChangeLog 2014-12-11 Eric Botcazou * config.gcc: Add Visium support. * configure.ac: Likewise. * configure: Regenerate. * doc/invoke.texi: Document Visium options. * doc/md.texi: Document Visium constraints. * common/config/visium: New directory. * config/visium: Likewise. -- Eric BotcazouIndex: config.gcc === --- config.gcc (revision 218617) +++ config.gcc (working copy) @@ -2853,6 +2853,10 @@ vax-*-openbsd*) extra_options="${extra_options} openbsd.opt" use_collect2=yes ;; +visium-*-elf*) + tm_file="dbxelf.h elfos.h ${tm_file} visium/elf.h newlib-stdint.h" + tmake_file="visium/t-visium visium/t-crtstuff" + ;; xstormy16-*-elf) # For historical reasons, the target files omit the 'x'. tm_file="dbxelf.h elfos.h newlib-stdint.h stormy16/stormy16.h" Index: configure.ac === --- configure.ac (revision 218617) +++ configure.ac (working copy) @@ -4442,7 +4442,7 @@ esac case "$cpu_type" in aarch64 | alpha | arm | avr | bfin | cris | i386 | m32c | m68k | microblaze \ | mips | nios2 | pa | rs6000 | score | sparc | spu | tilegx | tilepro \ - | xstormy16 | xtensa) + | visium | xstormy16 | xtensa) insn="nop" ;; ia64 | s390) Index: doc/invoke.texi === --- doc/invoke.texi (revision 218617) +++ doc/invoke.texi (working copy) @@ -1062,6 +1062,10 @@ See RS/6000 and PowerPC Options. @emph{VAX Options} @gccoptlist{-mg -mgnu -munix} +@emph{Visium Options} +@gccoptlist{-mdebug -msim -mfpu -mno-fpu -mhard-float -msoft-float @gol +-mcpu=@var{cpu-type} -mtune=@var{cpu-type} -msv-mode -muser-mode} + @emph{VMS Options} @gccoptlist{-mvms-return-codes -mdebug-main=@var{prefix} -mmalloc64 @gol -mpointer-size=@var{size}} @@ -11845,6 +11849,7 @@ platform. * TILEPro Options:: * V850 Options:: * VAX Options:: +* Visium Options:: * VMS Options:: * VxWorks Options:: * x86-64 Options:: @@ -22456,6 +22461,77 @@ GNU assembler is being used. Output code for G-format floating-point numbers instead of D-format. 
@end table +@node Visium Options +@subsection Visium Options +@cindex Visium options + +@table @gcctabopt + +@item -mdebug +@opindex mdebug +A program which performs file I/O and is destined to run on an MCM target +should be linked with this option. It causes the libraries libc.a and +libdebug.a to be linked. The program should be run on the target under +the control of the GDB remote debugging stub. + +@item -msim +@opindex msim +A program which performs file I/O and is destined to run on the simulator +should be linked with option. This causes libraries libc.a and libsim.a to +be linked. + +@item -mfpu +@itemx -mhard-float +@opindex mfpu +@opindex mhard-float +Generate code containing floating-point instructions. This is the +default. + +@item -mno-fpu +@itemx -msoft-float +@opindex mno-fpu +@opindex msoft-float +Generate code containing library calls for floating-point. + +@option{-msoft-float} changes the calling convention in the output file; +therefore, it is only useful if you compile @emph{all} of a program with +this option. In particular, you need to compile @file{libgcc.a}, the +library that comes with GCC, with @option{-msoft-float} in order for +this to work. + +@item -mcpu=@var{cpu_type} +@opindex mcpu +Set the instruction set, register set, and instruction scheduling parameters +for machine type @var{cpu_type}. Supported values for @var{cpu_type} are +@samp{mcm}, @samp{gr5} and @samp{gr6}. + +@samp{mcm} is a synonym of @samp{gr5} present for backward compatibility. + +By default (unless configured otherwise), GCC generates code for the GR5 +variant of the Visium architecture. + +With @option{-mcpu=gr6}, GCC generates code for the GR6 variant of the Visium +architecture. The only difference from GR5 code is that the compiler will +generate block move instructions. 
+ +@item -mtune=@var{cpu_type} +@opindex mtune +Set the instruction scheduling parameters for machine type @var{cpu_type}, +but do not set the instruction set or register set that the option +@option{-mcpu=@var{cpu_type}} would. + +@item -msv-mode +@opindex msv-mode +Generate code for the supervisor mode, where there are no restrictions on +the access to general registers. This is the default. + +@item -muser-mode +@opindex muser-mode +Generate code for the user mode, where the access to some general registers +is forbidden: on the GR5, registers r24 to r31 cannot be accessed in this +mode; on the GR6, only registers r29 to r31 are affected. +@end table + @node VMS Options @subsection VMS Options Index: doc/md.texi === --- doc/md.texi (revision 218642) +++ doc/md.texi (working copy) @@ -3974,6 +3974,56 @@ A 2-element vector constant with identic @end table +@item Visium---@file{co
[PATCH 4/4] Add Visium support to gcc/testsuite
gcc/testsuite/ChangeLog:

2014-12-11  Eric Botcazou

    * lib/target-supports.exp (check_profiling_available): Return 0 for
    Visium.
    (check_effective_target_tls_runtime): Likewise.
    (check_effective_target_logical_op_short_circuit): Return 1 for Visium.
    * gcc.dg/20020312-2.c: Adjust for Visium.
    * gcc.dg/tls/thr-cse-1.c: Likewise.
    * gcc.dg/tree-ssa/20040204-1.c: Likewise.
    * gcc.dg/tree-ssa/loop-1.c: Likewise.
    * gcc.dg/weak/typeof-2.c: Likewise.

-- 
Eric Botcazou

Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 218617)
+++ lib/target-supports.exp (working copy)
@@ -538,6 +540,7 @@ proc check_profiling_available { test_wh
          || [istarget powerpc-*-elf]
          || [istarget rx-*-*]
          || [istarget tic6x-*-elf]
+         || [istarget visium-*-*]
          || [istarget xstormy16-*]
          || [istarget xtensa*-*-elf]
          || [istarget *-*-rtems*]
@@ -707,9 +710,9 @@ proc check_effective_target_tls_emulated
 
 # Return 1 if TLS executables can run correctly, 0 otherwise.
 proc check_effective_target_tls_runtime {} {
-    # MSP430 runtime does not have TLS support, but just
+    # The runtime does not have TLS support, but just
     # running the test below is insufficient to show this.
-    if { [istarget msp430-*-*] } {
+    if { [istarget msp430-*-*] || [istarget visium-*-*] } {
         return 0
     }
     return [check_runtime tls_runtime {
@@ -6085,6 +6088,7 @@ proc check_effective_target_logical_op_s
          || [istarget s390*-*-*]
          || [istarget powerpc*-*-*]
          || [istarget nios2*-*-*]
+         || [istarget visium-*-*]
          || [check_effective_target_arm_cortex_m] } {
         return 1
     }
Index: gcc.dg/weak/typeof-2.c
===
--- gcc.dg/weak/typeof-2.c (revision 218617)
+++ gcc.dg/weak/typeof-2.c (working copy)
@@ -48,4 +48,6 @@ int bar3 (int x)
 // { dg-final { if [string match m68k-*-* $target_triplet ] {return} } }
 // Likewise for moxie targets.
 // { dg-final { if [string match moxie-*-* $target_triplet ] {return} } }
+// Likewise for Visium targets.
+// { dg-final { if [string match visium-*-* $target_triplet ] {return} } }
 // { dg-final { scan-assembler "baz3.*baz3.*baz3.*baz3.*baz3.*baz3" } }
Index: gcc.dg/tree-ssa/loop-1.c
===
--- gcc.dg/tree-ssa/loop-1.c (revision 218617)
+++ gcc.dg/tree-ssa/loop-1.c (working copy)
@@ -49,7 +49,7 @@ int xxx(void)
 /* CRIS keeps the address in a register. */
 /* m68k sometimes puts the address in a register, depending on CPU and PIC. */
 
-/* { dg-final { scan-assembler-times "foo" 5 { xfail hppa*-*-* ia64*-*-* sh*-*-* cris-*-* crisv32-*-* fido-*-* m68k-*-* i?86-*-mingw* i?86-*-cygwin* x86_64-*-mingw* } } } */
+/* { dg-final { scan-assembler-times "foo" 5 { xfail hppa*-*-* ia64*-*-* sh*-*-* cris-*-* crisv32-*-* fido-*-* m68k-*-* i?86-*-mingw* i?86-*-cygwin* x86_64-*-mingw* visium-*-* } } } */
 /* { dg-final { scan-assembler-times "foo,%r" 5 { target hppa*-*-* } } } */
 /* { dg-final { scan-assembler-times "= foo" 5 { target ia64*-*-* } } } */
 /* { dg-final { scan-assembler-times "call\[ \t\]*_foo" 5 { target i?86-*-mingw* i?86-*-cygwin* } } } */
@@ -57,3 +57,4 @@ int xxx(void)
 /* { dg-final { scan-assembler-times "jsr|bsrf|blink\ttr?,r18" 5 { target sh*-*-* } } } */
 /* { dg-final { scan-assembler-times "Jsr \\\$r" 5 { target cris-*-* } } } */
 /* { dg-final { scan-assembler-times "\[jb\]sr" 5 { target fido-*-* m68k-*-* } } } */
+/* { dg-final { scan-assembler-times "bra *tr,r\[1-9\]*,r21" 5 { target visium-*-* } } } */
Index: gcc.dg/tree-ssa/20040204-1.c
===
--- gcc.dg/tree-ssa/20040204-1.c (revision 218617)
+++ gcc.dg/tree-ssa/20040204-1.c (working copy)
@@ -33,5 +33,5 @@ void test55 (int x, int y)
    that the && should be emitted (based on BRANCH_COST). Fix this
    by teaching dom to look through && and register all components
    as true. */
-/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* aarch64*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* nds32*-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* aarch64*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* nds32*-*-* sparc*-*-* spu-*-* visium-*-* x86_64-*-*" } } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc.dg/tls/thr-cse-1.c
===
--- gcc.dg/tls/thr-cse-1.c (revision 218617)
+++ gcc.dg/tls/thr-cse-1.c (working copy)
@@ -18,11 +18,11 @@ int foo (int b, int c, int d)
   return a;
 }
 
-/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_addres
Re: [PATCH 2/4] Add Visium support to libgcc
Do you have a reason for using fp-bit instead of soft-fp?

libgcc files are generally GPL+exception, not LGPL without exception with a
very old FSF address (config/visium/div64.c, mod64.c,
set_trampoline_parity.c, udiv64.c, udivmod64.c, umod64.c).

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH 3/4] Add Visium support to gcc
Use of `%s' in diagnostics is long obsoleted by %qs (in this case, using
%qE with the identifier directly, rather than using IDENTIFIER_POINTER, is
preferred).

INTVAL / UINTVAL return HOST_WIDE_INT / unsigned HOST_WIDE_INT, not long /
unsigned long. You have lots of uses of fprintf that presume they return
long / unsigned long.

As you have the interrupt attribute, you need to add this port to the list
in extend.texi of ports with this attribute. (Generally, check the
checklist of pieces in sourcebuild.texi to update for a new port.)

At least one target for this port should be added to
contrib/config-list.mk (and you should verify that the port builds cleanly
with --enable-werror-always, for both 32-bit and 64-bit hosts, when
building using current trunk GCC).

-- 
Joseph S. Myers
jos...@codesourcery.com
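To illustrate the INTVAL / UINTVAL point: a minimal sketch, outside GCC, of printing a wide integer with a conversion that matches its type instead of assuming it is a long. The names wide_int_t and format_wide are hypothetical stand-ins for HOST_WIDE_INT and the port's output code; inside GCC itself the HOST_WIDE_INT_PRINT_DEC family of macros from hwint.h would be the natural choice.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for GCC's HOST_WIDE_INT (assumed 64 bits wide here).
typedef std::int64_t wide_int_t;

// Format a wide value with a conversion that matches its type on every host.
// "%ld" would be wrong on any host where long is only 32 bits wide.
std::string format_wide(wide_int_t value)
{
  char buf[32];
  std::snprintf(buf, sizeof buf, "%" PRId64, value);
  return std::string(buf);
}
```

A value such as 5000000000 does not fit in a 32-bit long, which is the case a "%ld" conversion would silently get wrong.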
RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
Ping?

Already applied to arm/embedded-4_9-branch, is it OK for trunk?

-Hale

> -----Original Message-----
> From: Joey Ye [mailto:joey.ye...@gmail.com]
> Sent: Thursday, November 27, 2014 10:01 AM
> To: Hale Wang
> Cc: gcc-patches
> Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
>
> OK applying to arm/embedded-4_9-branch, though you still need maintainer
> approval into trunk.
>
> - Joey
>
> On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang wrote:
> > Hi,
> >
> > This patch ports the aeabi_idiv routine from Linaro Cortex-Strings
> > (https://git.linaro.org/toolchain/cortex-strings.git), which was
> > contributed by ARM under Free BSD license.
> >
> > The new aeabi_idiv routine is used to replace the one in
> > libgcc/config/arm/lib1funcs.S. This replacement happens within the
> > Thumb1 wrapper. The new routine is under LGPLv3 license.
> >
> > The main advantage of this version is that it can improve the
> > performance of the aeabi_idiv function for Thumb1. This solution will
> > also increase the code size. So it will only be used if
> > __OPTIMIZE_SIZE__ is not defined.
> >
> > Make check passed for armv6-m.
> >
> > OK for trunk?
> >
> > Thanks,
> > Hale Wang
> >
> > libgcc/ChangeLog:
> >
> > 2014-11-26 Hale Wang
> >
> > * config/arm/lib1funcs.S: Add new wrapper.
> > > > === > > diff --git a/libgcc/config/arm/lib1funcs.S > > b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644 > > --- a/libgcc/config/arm/lib1funcs.S > > +++ b/libgcc/config/arm/lib1funcs.S > > @@ -306,34 +306,12 @@ LSYM(Lend_fde): > > #ifdef __ARM_EABI__ > > .macro THUMB_LDIV0 name signed > > #if defined(__ARM_ARCH_6M__) > > - .ifc \signed, unsigned > > - cmp r0, #0 > > - beq 1f > > - mov r0, #0 > > - mvn r0, r0 @ 0x > > -1: > > - .else > > - cmp r0, #0 > > - beq 2f > > - blt 3f > > + > > + push{r0, lr} > > mov r0, #0 > > - mvn r0, r0 > > - lsr r0, r0, #1 @ 0x7fff > > - b 2f > > -3: mov r0, #0x80 > > - lsl r0, r0, #24 @ 0x8000 > > -2: > > - .endif > > - push{r0, r1, r2} > > - ldr r0, 4f > > - adr r1, 4f > > - add r0, r1 > > - str r0, [sp, #8] > > - @ We know we are not on armv4t, so pop pc is safe. > > - pop {r0, r1, pc} > > - .align 2 > > -4: > > - .word __aeabi_idiv0 - 4b > > + bl SYM(__aeabi_idiv0) > > + pop {r1, pc} > > + > > #elif defined(__thumb2__) > > .syntax unified > > .ifc \signed, unsigned > > @@ -927,7 +905,158 @@ LSYM(Lover7): > > add dividend, work > >.endif > > LSYM(Lgot_result): > > -.endm > > +.endm > > + > > +#if defined(__prefer_thumb__) > && !defined(__OPTIMIZE_SIZE__) .macro > > +BranchToDiv n, label > > + lsr curbit, dividend, \n > > + cmp curbit, divisor > > + blo \label > > +.endm > > + > > +.macro DoDiv n > > + lsr curbit, dividend, \n > > + cmp curbit, divisor > > + bcc 1f > > + lsl curbit, divisor, \n > > + sub dividend, dividend, curbit > > + > > +1: adc result, result > > +.endm > > + > > +.macro THUMB1_Div_Positive > > + mov result, #0 > > + BranchToDiv #1, LSYM(Lthumb1_div1) > > + BranchToDiv #4, LSYM(Lthumb1_div4) > > + BranchToDiv #8, LSYM(Lthumb1_div8) > > + BranchToDiv #12, LSYM(Lthumb1_div12) > > + BranchToDiv #16, LSYM(Lthumb1_div16) > > +LSYM(Lthumb1_div_large_positive): > > + mov result, #0xff > > + lsl divisor, divisor, #8 > > + rev result, result > > + lsr curbit, dividend, #16 > > + cmp curbit, divisor > > 
+ blo 1f > > + asr result, #8 > > + lsl divisor, divisor, #8 > > + beq LSYM(Ldivbyzero_waypoint) > > + > > +1: lsr curbit, dividend, #12 > > + cmp curbit, divisor > > + blo LSYM(Lthumb1_div12) > > + b LSYM(Lthumb1_div16) > > +LSYM(Lthumb1_div_loop): > > + lsr divisor, divisor, #8 > > +LSYM(Lthumb1_div16): > > + Dodiv #15 > > + Dodiv #14 > > + Dodiv #13 > > + Dodiv #12 > > +LSYM(Lthumb1_div12): > > + Dodiv #11 > > + Dodiv #10 > > + Dodiv #9 > > + Dodiv #8 > > + bcs LSYM(Lthumb1_div_loop) > > +LSYM(Lthumb1_div8): > > + Dodiv #7 > > + Dodiv #6 > > + Dodiv #5 > > +LSYM(Lthumb1_div5): > > + Dodiv #4 > > +LSYM(Lthumb1_div4): > > + Dodiv #3 > > +LSYM(Lthumb1_div3): > > + Dodiv #2 > > +LSYM(Lthumb1_div2): > > + Dodiv #1 > > +LSYM(Lthumb1_div1): > > + sub divisor, dividend, divisor > > + bcs 1f > > + cpy divisor, dividend > > + > > +1: adc result, result > > +
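For readers following the DoDiv macro in the quoted patch, here is a sketch in C++ of the shift-and-subtract step that the macro unrolls once per bit position. It illustrates the general technique only and is not a translation of the Thumb1 routine: the name udiv_shift_subtract and the loop form are assumptions, and a zero divisor is rejected with an assertion here rather than routed to __aeabi_idiv0.

```cpp
#include <cassert>
#include <cstdint>

// One DoDiv-like step per bit position n, from high to low:
// if (dividend >> n) >= divisor, subtract (divisor << n) and shift a 1
// into the quotient; otherwise shift in a 0.
// Precondition (assumed for this sketch): divisor != 0.
std::uint32_t udiv_shift_subtract(std::uint32_t dividend, std::uint32_t divisor)
{
  assert(divisor != 0);
  std::uint32_t result = 0;
  for (int n = 31; n >= 0; --n)
    {
      result <<= 1;
      if ((dividend >> n) >= divisor)
        {
          // divisor <= (dividend >> n), so (divisor << n) cannot overflow.
          dividend -= divisor << n;
          result |= 1;
        }
    }
  return result;
}
```

The assembly gets the same effect without a loop counter: each DoDiv expansion hard-codes its shift amount, and the entry labels (Lthumb1_div4, Lthumb1_div8, and so on) let small dividends skip the high bit positions.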
Re: [PATCH 3/4] Add Visium support to gcc
On Fri, 12 Dec 2014, Joseph Myers wrote:
> At least one target for this port should be added to
> contrib/config-list.mk (and you should verify that the port builds cleanly
> with --enable-werror-always, for both 32-bit and 64-bit hosts, when
> building using current trunk GCC).

While doing that, beware that gcc has bugs causing some ports (I forgot
which ones) to get at least one spurious warning apparently not
attributable to the quality of the port. PR(s) duly entered, but I can't
quote the numbers (finding PRs is not practical for me after the https
change, but IIRC Joern was the author).

brgds, H-P
PS. of course no excuse to not get the low-hanging fruit.
C++ PATCH for c++/57510 (memory leak with initializer_list)
We want to deal with initialization of an array reference/init-list
temporary the same way that we handle initialization of an array variable.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit ad88fa39b2c68d806b58563dfe1e19ecf8d143ba
Author: Jason Merrill
Date:   Wed Dec 10 14:14:05 2014 -0500

    PR c++/57510
    * typeck2.c (split_nonconstant_init_1): Handle arrays here.
    (store_init_value): Not here.
    (split_nonconstant_init): Look through TARGET_EXPR.  No longer static.
    * cp-tree.h: Declare split_nonconstant_init.
    * call.c (set_up_extended_ref_temp): Use split_nonconstant_init.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index d8075bd..312dfdf 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -9574,7 +9574,7 @@ set_up_extended_ref_temp (tree decl, tree expr, vec<tree, va_gc> **cleanups,
   else
     /* Create the INIT_EXPR that will initialize the temporary
        variable.  */
-    init = build2 (INIT_EXPR, type, var, expr);
+    init = split_nonconstant_init (var, expr);
   if (at_function_scope_p ())
     {
       add_decl_expr (var);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index d41a834..ad1cc71 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6291,6 +6291,7 @@ extern int abstract_virtuals_error_sfinae (tree, tree, tsubst_flags_t);
 extern int abstract_virtuals_error_sfinae (abstract_class_use, tree, tsubst_flags_t);
 
 extern tree store_init_value (tree, tree, vec<tree, va_gc>**, int);
+extern tree split_nonconstant_init (tree, tree);
 extern bool check_narrowing (tree, tree, tsubst_flags_t);
 extern tree digest_init (tree, tree, tsubst_flags_t);
 extern tree digest_init_flags (tree, tree, int);
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 92c0417..c53a9b5 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -604,6 +604,17 @@ split_nonconstant_init_1 (tree dest, tree init)
     case ARRAY_TYPE:
       inner_type = TREE_TYPE (type);
       array_type_p = true;
+      if ((TREE_SIDE_EFFECTS (init)
+           && TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type))
+          || array_of_runtime_bound_p (type))
+        {
+          /* For an array, we only need/want a single cleanup region rather
+             than one per element.  */
+          tree code = build_vec_init (dest, NULL_TREE, init, false, 1,
+                                      tf_warning_or_error);
+          add_stmt (code);
+          return true;
+        }
       /* FALLTHRU */
 
     case RECORD_TYPE:
@@ -721,11 +732,13 @@ split_nonconstant_init_1 (tree dest, tree init)
    perform the non-constant part of the initialization to DEST.
    Returns the code for the runtime init.  */
 
-static tree
+tree
 split_nonconstant_init (tree dest, tree init)
 {
   tree code;
 
+  if (TREE_CODE (init) == TARGET_EXPR)
+    init = TARGET_EXPR_INITIAL (init);
   if (TREE_CODE (init) == CONSTRUCTOR)
     {
       code = push_stmt_list ();
@@ -830,17 +843,7 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** cleanups, int flags)
       && (TREE_SIDE_EFFECTS (value)
           || array_of_runtime_bound_p (type)
           || ! reduced_constant_expression_p (value)))
-    {
-      if (TREE_CODE (type) == ARRAY_TYPE
-          && (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (type))
-              || array_of_runtime_bound_p (type)))
-        /* For an array, we only need/want a single cleanup region rather
-           than one per element.  */
-        return build_vec_init (decl, NULL_TREE, value, false, 1,
-                               tf_warning_or_error);
-      else
-        return split_nonconstant_init (decl, value);
-    }
+    return split_nonconstant_init (decl, value);
   /* If the value is a constant, just put it in DECL_INITIAL.  If DECL
      is an automatic variable, the middle end will turn this into a
      dynamic initialization later.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist90.C b/gcc/testsuite/g++.dg/cpp0x/initlist90.C
new file mode 100644
index 000..330517a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist90.C
@@ -0,0 +1,35 @@
+// PR c++/57510
+// { dg-do run { target c++11 } }
+
+#include <initializer_list>
+
+struct counter
+{
+  static int n;
+
+  counter() { ++n; }
+  counter(const counter&) { ++n; }
+  ~counter() { --n; }
+};
+
+int counter::n = 0;
+
+struct X
+{
+    X () { if (counter::n > 1) throw 1; }
+
+    counter c;
+};
+
+int main ()
+{
+  try
+  {
+    auto x = { X{}, X{} };
+  }
+  catch (...)
+  {
+    if ( counter::n != 0 )
+      throw;
+  }
+}
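The new test exits silently when it passes, so a sketch of the same scenario that hands the count back to the caller may make the expected behaviour easier to see. Everything here is illustrative: live_counter, thrower and live_after_failed_init are hypothetical names, with the two types being renamed copies of those in initlist90.C.

```cpp
#include <cassert>
#include <initializer_list>

// Counts live objects: constructors increment, the destructor decrements.
struct live_counter
{
  static int n;
  live_counter() { ++n; }
  live_counter(const live_counter&) { ++n; }
  ~live_counter() { --n; }
};

int live_counter::n = 0;

// Throws from its constructor body once a second live_counter exists,
// i.e. while the second element of the list is being built.
struct thrower
{
  thrower() { if (live_counter::n > 1) throw 1; }
  live_counter c;
};

// Returns the number of live_counter objects left after the array backing
// the initializer_list fails to initialize.  With the cleanup handled
// correctly this is 0; a leaked first element would leave it at 1.
int live_after_failed_init()
{
  assert(live_counter::n == 0);  // start from a clean count
  try
    {
      auto x = { thrower{}, thrower{} };
      (void) x;
    }
  catch (...)
    {
      // Swallow the exception; only the remaining count matters here.
    }
  return live_counter::n;
}
```

On a compiler with the fix, the first element is destroyed when the second one throws, so the function is expected to return 0.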
C++ PATCH for c++/64248 (__FUNCTION__ error)
It seems that my strict reading of the standard conflicts with existing
practice in a way that is not useful. So I'm reverting my earlier patch.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 9dfef51ff302e644acf0111685a7451867049959
Author: Jason Merrill
Date:   Wed Dec 10 17:33:19 2014 -0500

    PR c++/64248
    Revert:
    * parser.c (cp_parser_unqualified_id): Handle __func__ here.
    (cp_parser_primary_expression): Not here.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 48dd64a..76725ef 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -4503,9 +4503,39 @@ cp_parser_primary_expression (cp_parser *parser,
         case RID_FUNCTION_NAME:
         case RID_PRETTY_FUNCTION_NAME:
         case RID_C99_FUNCTION_NAME:
+          {
+            non_integral_constant name;
+
           /* The symbols __FUNCTION__, __PRETTY_FUNCTION__, and
-             __func__ are the names of variables.  */
-          goto id_expression;
+             __func__ are the names of variables -- but they are
+             treated specially.  Therefore, they are handled here,
+             rather than relying on the generic id-expression logic
+             below.  Grammatically, these names are id-expressions.
+
+             Consume the token.  */
+            token = cp_lexer_consume_token (parser->lexer);
+
+            switch (token->keyword)
+              {
+              case RID_FUNCTION_NAME:
+                name = NIC_FUNC_NAME;
+                break;
+              case RID_PRETTY_FUNCTION_NAME:
+                name = NIC_PRETTY_FUNC;
+                break;
+              case RID_C99_FUNCTION_NAME:
+                name = NIC_C99_FUNC;
+                break;
+              default:
+                gcc_unreachable ();
+              }
+
+            if (cp_parser_non_integral_constant_expression (parser, name))
+              return error_mark_node;
+
+            /* Look up the name.  */
+            return finish_fname (token->u.value);
+          }
 
         case RID_VA_ARG:
           {
@@ -4926,7 +4956,6 @@ cp_parser_unqualified_id (cp_parser* parser,
                           bool optional_p)
 {
   cp_token *token;
-  tree id;
 
   /* Peek at the next token.  */
   token = cp_lexer_peek_token (parser->lexer);
@@ -4935,6 +4964,8 @@ cp_parser_unqualified_id (cp_parser* parser,
     {
     case CPP_NAME:
       {
+        tree id;
+
         /* We don't know yet whether or not this will be a
            template-id.  */
         cp_parser_parse_tentatively (parser);
@@ -5171,9 +5202,10 @@ cp_parser_unqualified_id (cp_parser* parser,
       }
 
     case CPP_KEYWORD:
-      switch (token->keyword)
+      if (token->keyword == RID_OPERATOR)
         {
-        case RID_OPERATOR:
+          tree id;
+
           /* This could be a template-id, so we try that first.  */
           cp_parser_parse_tentatively (parser);
           /* Try a template-id.  */
@@ -5203,19 +5235,6 @@ cp_parser_unqualified_id (cp_parser* parser,
             }
 
           return id;
-
-        case RID_FUNCTION_NAME:
-        case RID_PRETTY_FUNCTION_NAME:
-        case RID_C99_FUNCTION_NAME:
-          cp_lexer_consume_token (parser->lexer);
-          /* Don't try to declare this while tentatively parsing a function
-             declarator, as cp_make_fname_decl will fail.  */
-          if (current_binding_level->kind != sk_function_parms)
-            finish_fname (token->u.value);
-          return token->u.value;
-
-        default:
-          break;
         }
 
       /* Fall through.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype-func.C b/gcc/testsuite/g++.dg/cpp0x/decltype-func.C
deleted file mode 100644
index 65dd27a..000
--- a/gcc/testsuite/g++.dg/cpp0x/decltype-func.C
+++ /dev/null
@@ -1,6 +0,0 @@
-// { dg-do compile { target c++11 } }
-
-void f() {
-  typedef decltype(__func__) T;
-  T x = __func__; // { dg-error "array" }
-}
diff --git a/gcc/testsuite/g++.dg/other/error34.C b/gcc/testsuite/g++.dg/other/error34.C
index cb8fdae..d6f3eb5 100644
--- a/gcc/testsuite/g++.dg/other/error34.C
+++ b/gcc/testsuite/g++.dg/other/error34.C
@@ -4,4 +4,3 @@
 
 S () : str(__PRETTY_FUNCTION__) {} // { dg-error "forbids declaration" "decl" }
 // { dg-error "only constructors" "constructor" { target *-*-* } 5 }
-// { dg-prune-output "__PRETTY_FUNCTION__" }
diff --git a/gcc/testsuite/g++.dg/parse/fnname2.C b/gcc/testsuite/g++.dg/parse/fnname2.C
new file mode 100644
index 000..7fc0f82
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/fnname2.C
@@ -0,0 +1,19 @@
+// PR c++/64248
+
+class A
+{
+public:
+    A(const char* str) {};
+};
+
+class B
+{
+public:
+    B(A a) {};
+};
+
+int main()
+{
+  B b(A(__func__));
+  return 0;
+}
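A short sketch of the construct that fnname2.C exercises: __func__ passed as a constructor argument inside a nested constructor call. The names name_holder, wrapper and whoami are hypothetical. The contents of __func__ are implementation-defined, though GCC yields the unqualified function name, which is what the usage below assumes.

```cpp
#include <cassert>
#include <string>

// Stores whatever string it is constructed from.
struct name_holder
{
  std::string name;
  name_holder(const char* str) : name(str) {}
};

// Takes a name_holder by value, like class B in the test case.
struct wrapper
{
  name_holder held;
  wrapper(name_holder h) : held(h) {}
};

// Passes __func__ through a nested constructor call, the same shape as
// `B b(A(__func__));` in the test case, and returns what arrived.
std::string whoami()
{
  wrapper w(name_holder(__func__));
  assert(!w.held.name.empty());
  return w.held.name;
}
```

Because __func__ cannot be a declarator name, `wrapper w(name_holder(__func__));` has to be parsed as an object definition with an expression argument, not as a function declaration.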
Re: [PATCH] Do not download packages for graphite loop optimizations by default when using ./contrib/download_prerequisites
2014-12-10 21:37 GMT+08:00 Richard Biener :
> On Wed, Dec 10, 2014 at 6:16 AM, Chung-Ju Wu wrote:
>>
>> Thanks for the suggestion.
>> The followings are proposed patch to adjust comment:
>>
>> Index: contrib/ChangeLog
>> ===
>> --- contrib/ChangeLog (revision 218558)
>> +++ contrib/ChangeLog (working copy)
>> @@ -1,3 +1,7 @@
>> +2014-12-10 Chung-Ju Wu
>> +
>> + * download_prerequisites: Modify the comment for GRAPHITE_LOOP_OPT.
>> +
>> 2014-12-09 Laurynas Biveinis
>> Yury Gribov
>>
>>
>> Index: contrib/download_prerequisites
>> ===
>> --- contrib/download_prerequisites (revision 218558)
>> +++ contrib/download_prerequisites (working copy)
>> @@ -19,9 +19,9 @@
>> # You should have received a copy of the GNU General Public License
>> # along with this program. If not, see http://www.gnu.org/licenses/.
>>
>> -# If you want to build GCC with the Graphite loop optimizations,
>> -# set GRAPHITE_LOOP_OPT=yes to download optional prerequisties
>> -# ISL Library and CLooG.
>> +# If you want to disable Graphite loop optimizations while building GCC,
>> +# DO NOT set GRAPHITE_LOOP_OPT as yes so that the ISL package will not
>> +# be downloaded.
>> GRAPHITE_LOOP_OPT=yes
>>
>>
>> Is this OK for trunk?
>
> Ok.
>
> Thanks,
> Richard.
>

Thanks for approval.
Committed as Rev.218652: https://gcc.gnu.org/r218652

Best regards,
jasonwucj
C++ PATCH to remove "array of runtime bound" from -std=c++14
The C++ VLA paper, N3639, was voted into and then back out of the C++14 standard, and currently seems likely not to ever be part of a published standard. So I'm backing out most of my changes for that paper, such that -std=c++14 has the same VLA support as other standard modes. We no longer throw bad_array_length. Tested x86_64-pc-linux-gnu, applying to trunk. commit aa0d87578a3264820105a29099870dcf85e4ef98 Author: Jason Merrill Date: Thu Dec 11 15:20:11 2014 -0500 Remove N3639 "array of runtime length" from -std=c++14. gcc/cp/ * decl.c (compute_array_index_type): VLAs are not part of C++14. (create_array_type_for_decl, grokdeclarator): Likewise. * lambda.c (add_capture): Likewise. * pt.c (tsubst): Likewise. * rtti.c (get_tinfo_decl): Likewise. * semantics.c (finish_decltype_type): Likewise. * typeck.c (cxx_sizeof_or_alignof_type): Likewise. (cp_build_addr_expr_1): Likewise. * init.c (build_vec_init): Don't throw bad_array_length. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins): Define __cpp_runtime_arrays if we aren't complaining about VLAs. libstdc++-v3/ * libsupc++/new (bad_array_length): Move... * bad_array_length.cc: ...here. * cxxabi.h, eh_aux_runtime.cc (__cxa_throw_bad_array_new_length): Also move to bad_array_length.cc. * c-cppbuiltin.c (c_cpp_builtins): Define __cpp_runtime_arrays if we aren't complaining about VLAs. diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index c571d1b..54d3acd 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -828,6 +828,15 @@ c_cpp_builtins (cpp_reader *pfile) and were standardized for C++14. */ if (!pedantic || cxx_dialect > cxx11) cpp_define (pfile, "__cpp_binary_literals=201304"); + + /* Arrays of runtime bound were removed from C++14, but we still + support GNU VLAs. Let's define this macro to a low number + (corresponding to the initial test release of GNU C++) if we won't + complain about use of VLAs. */ + if (c_dialect_cxx () + && (pedantic ? 
warn_vla == 0 : warn_vla <= 0)) + cpp_define (pfile, "__cpp_runtime_arrays=198712"); + if (cxx_dialect >= cxx11) { /* Set feature test macros for C++11 */ @@ -863,9 +872,6 @@ c_cpp_builtins (cpp_reader *pfile) cpp_define (pfile, "__cpp_variable_templates=201304"); cpp_define (pfile, "__cpp_digit_separators=201309"); //cpp_define (pfile, "__cpp_sized_deallocation=201309"); - /* We'll have to see where runtime arrays wind up. - Let's put it in C++14 for now. */ - cpp_define (pfile, "__cpp_runtime_arrays=201304"); } } /* Note that we define this for C as well, so that we know if diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 9659336..efc2001 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -8515,7 +8515,7 @@ compute_array_index_type (tree name, tree size, tsubst_flags_t complain) /* We don't allow VLAs at non-function scopes, or during tentative template substitution. */ || !at_function_scope_p () - || (cxx_dialect < cxx14 && !(complain & tf_error))) + || !(complain & tf_error)) { if (!(complain & tf_error)) return error_mark_node; @@ -8527,7 +8527,7 @@ compute_array_index_type (tree name, tree size, tsubst_flags_t complain) error ("size of array is not an integral constant-expression"); size = integer_one_node; } - else if (cxx_dialect < cxx14 && pedantic && warn_vla != 0) + else if (pedantic && warn_vla != 0) { if (name) pedwarn (input_location, OPT_Wvla, "ISO C++ forbids variable length array %qD", name); @@ -8585,25 +8585,12 @@ compute_array_index_type (tree name, tree size, tsubst_flags_t complain) stabilize_vla_size (itype); - if (cxx_dialect >= cxx14 && flag_exceptions) + if (flag_sanitize & SANITIZE_VLA + && current_function_decl != NULL_TREE + && !lookup_attribute ("no_sanitize_undefined", +DECL_ATTRIBUTES +(current_function_decl))) { - /* If the VLA bound is larger than half the address space, - or less than zero, throw std::bad_array_length. 
*/ - tree comp = build2 (LT_EXPR, boolean_type_node, itype, - ssize_int (-1)); - comp = build3 (COND_EXPR, void_type_node, comp, - throw_bad_array_length (), void_node); - finish_expr_stmt (comp); - } - else if (flag_sanitize & SANITIZE_VLA - && current_function_decl != NULL_TREE - && !lookup_attribute ("no_sanitize_undefined", - DECL_ATTRIBUTES - (current_function_decl))) - { - /* From C++14 onwards, we throw an exception on a negative - length size of an array; see above. */ - /* We have to add 1 -- in the ubsan routine we generate LE_EXPR rather than LT_EXPR. */ tree t = fold_build2 (PLUS_EXPR, TREE_TYPE (itype), itype, @@ -8730,10 +8717,6 @@ create_array_type_for_decl (tree name, tree type, tree size) retu
c-family PATCH to update __cpp_constexpr macro for C++14 constexpr support
A bit I forgot in the earlier C++14 constexpr work.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 0b9dbcebc4d3bf5e9281889f34d189fb7c42dde3
Author: Jason Merrill
Date:   Thu Dec 11 22:19:36 2014 -0500

    * c-cppbuiltin.c (c_cpp_builtins): Enable C++14 __cpp_constexpr.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 54d3acd..2dfecb6 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -845,7 +845,8 @@ c_cpp_builtins (cpp_reader *pfile)
           cpp_define (pfile, "__cpp_unicode_literals=200710");
           cpp_define (pfile, "__cpp_user_defined_literals=200809");
           cpp_define (pfile, "__cpp_lambdas=200907");
-          cpp_define (pfile, "__cpp_constexpr=200704");
+          if (cxx_dialect == cxx11)
+            cpp_define (pfile, "__cpp_constexpr=200704");
           cpp_define (pfile, "__cpp_range_based_for=200907");
           cpp_define (pfile, "__cpp_static_assert=200410");
           cpp_define (pfile, "__cpp_decltype=200707");
@@ -865,8 +866,7 @@ c_cpp_builtins (cpp_reader *pfile)
           cpp_define (pfile, "__cpp_return_type_deduction=201304");
           cpp_define (pfile, "__cpp_init_captures=201304");
           cpp_define (pfile, "__cpp_generic_lambdas=201304");
-          //cpp_undef (pfile, "__cpp_constexpr");
-          //cpp_define (pfile, "__cpp_constexpr=201304");
+          cpp_define (pfile, "__cpp_constexpr=201304");
           cpp_define (pfile, "__cpp_decltype_auto=201304");
           cpp_define (pfile, "__cpp_aggregate_nsdmi=201304");
           cpp_define (pfile, "__cpp_variable_templates=201304");
diff --git a/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C b/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
index d271752..36e1135 100644
--- a/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
+++ b/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
@@ -47,12 +47,6 @@
 # error "__cpp_lambdas != 200907"
 #endif
 
-#ifndef __cpp_constexpr
-# error "__cpp_constexpr"
-#elif __cpp_constexpr != 200704
-# error "__cpp_constexpr != 200704"
-#endif
-
 #ifndef __cpp_range_based_for
 # error "__cpp_range_based_for"
 #elif __cpp_range_based_for != 200907
@@ -145,11 +139,10 @@
 # error "__cpp_generic_lambdas != 201304"
 #endif
 
-// TODO: Change 200704 to 201304 when C++14 constexpr goes in.
 #ifndef __cpp_constexpr
 # error "__cpp_constexpr"
-#elif __cpp_constexpr != 200704
-# error "__cpp_constexpr != 200704"
+#elif __cpp_constexpr != 201304
+# error "__cpp_constexpr != 201304"
 #endif
 
 #ifndef __cpp_decltype_auto
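As a sketch of how user code might consume the updated macro: a value of 201304 signals C++14 relaxed constexpr (loops and local variables inside constexpr functions), while 200704 signals the C++11 form. The factorial function below is a hypothetical example, not taken from the patch or the testsuite, and it assumes at least -std=c++11.

```cpp
#include <cassert>

#if defined(__cpp_constexpr) && __cpp_constexpr >= 201304
// C++14 relaxed constexpr: a loop and a mutable local are allowed.
constexpr unsigned long factorial(unsigned n)
{
  unsigned long result = 1;
  for (unsigned i = 2; i <= n; ++i)
    result *= i;
  return result;
}
#else
// C++11 constexpr: the body is a single return statement, so recurse.
constexpr unsigned long factorial(unsigned n)
{
  return n <= 1 ? 1 : n * factorial(n - 1);
}
#endif

// Evaluated at compile time in either mode.
static_assert(factorial(5) == 120, "factorial(5) should be 120");
```

Both branches compute the same values; the macro only selects which form of the definition the compiler is asked to accept.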
[Committed] [PATCH, ifcvt] Fix PR63917
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Wednesday, December 10, 2014 8:55 AM > To: Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [Ping] [PATCH, ifcvt] Fix PR63917 > > On 12/04/2014 05:16 PM, Zhenqiang Chen wrote: > > +static rtx > > +cc_in_cond (rtx cond) > > +{ > > + if ((HAVE_cbranchcc4) && cond > > Silly parens around the HAVE_cbranchcc4. Removed. > > + && (GET_MODE_CLASS (GET_MODE (XEXP (cond, 0))) == MODE_CC)) > > More silly parens around the ==. Removed. > > + /* Skip it if the instruction to be moved might clobber CC. */ > > + cc = cc_in_cond (cond); > > + if (cc) > > +if (set_of (cc, insn_a) > > + || (insn_b && set_of (XEXP (cond, 0), insn_b))) > > + return FALSE; > > Don't nest if's when an && will do; if the && won't do, always use braces. > > It looks like the insn_b test can be simpler, since the non-null return from > cc_in_cond is always XEXP (cond, 0). > > So: > > if (cc > && (set_of (cc, insn_a) > || (insn_b && set_of (cc, insn_b))) > return FALSE; > > Ok with those changes. Updated and committed @r218658. Here is the final patch. Index: gcc/ifcvt.c === --- gcc/ifcvt.c (revision 218657) +++ gcc/ifcvt.c (working copy) @@ -1016,6 +1016,18 @@ 0, 0, outmode, y); } +/* Return the CC reg if it is used in COND. */ + +static rtx +cc_in_cond (rtx cond) +{ + if (HAVE_cbranchcc4 && cond + && GET_MODE_CLASS (GET_MODE (XEXP (cond, 0))) == MODE_CC) +return XEXP (cond, 0); + + return NULL_RTX; +} + /* Return sequence of instructions generated by if conversion. This function calls end_sequence() to end the current stream, ensures that are instructions are unshared, recognizable non-jump insns. @@ -1026,6 +1038,7 @@ { rtx_insn *insn; rtx_insn *seq = get_insns (); + rtx cc = cc_in_cond (if_info->cond); set_used_flags (if_info->x); set_used_flags (if_info->cond); @@ -1040,7 +1053,9 @@ allows proper placement of required clobbers. 
*/ for (insn = seq; insn; insn = NEXT_INSN (insn)) if (JUMP_P (insn) - || recog_memoized (insn) == -1) + || recog_memoized (insn) == -1 + /* Make sure new generated code does not clobber CC. */ + || (cc && set_of (cc, insn))) return NULL; return seq; @@ -2544,6 +2559,7 @@ rtx_insn *insn_a, *insn_b; rtx set_a, set_b; rtx orig_x, x, a, b; + rtx cc; /* We're looking for patterns of the form @@ -2655,6 +2671,13 @@ if_info->a = a; if_info->b = b; + /* Skip it if the instruction to be moved might clobber CC. */ + cc = cc_in_cond (cond); + if (cc + && (set_of (cc, insn_a) + || (insn_b && set_of (cc, insn_b +return FALSE; + /* Try optimizations in some approximation of a useful order. */ /* ??? Should first look to see if X is live incoming at all. If it isn't, we don't need anything but an unconditional set. */ @@ -2811,6 +2834,7 @@ rtx cond) { rtx_insn *insn; + rtx cc = cc_in_cond (cond); /* We can only handle simple jumps at the end of the basic block. It is almost impossible to update the CFG otherwise. */ @@ -2868,6 +2892,10 @@ && modified_between_p (src, insn, NEXT_INSN (BB_END (bb return FALSE; + /* Skip it if the instruction to be moved might clobber CC. */ + if (cc && set_of (cc, insn)) + return FALSE; + vals->put (dest, src); regs->safe_push (dest); Index: gcc/testsuite/gcc.dg/pr64007.c === --- gcc/testsuite/gcc.dg/pr64007.c (revision 0) +++ gcc/testsuite/gcc.dg/pr64007.c (revision 0) @@ -0,0 +1,50 @@ +/* { dg-options " -O3 " } */ +/* { dg-do run } */ + +#include + +int d, i; + +struct S +{ + int f0; +} *b, c, e, h, **g = &b; + +static struct S *f = &e; + +int +fn1 (int p) +{ + int a = 0; + return a || p < 0 || p >= 2 || 1 >> p; +} + +int +main () +{ + int k = 1, l, *m = &c.f0; + + for (;;) +{ + l = fn1 (i); + *m = k && i; + if (l) + { + int n[1] = {0}; + } + break; +} + + *g = &h; + + assert (b); + + if (d) +(*m)--; + d = (f != 0) | (i >= 0); + + if (c.f0 != 0) +__builtin_abort (); + + return 0; +}
RE: [PATCH] Fix PR 61225
> -Original Message- > From: Jeff Law [mailto:l...@redhat.com] > Sent: Wednesday, December 10, 2014 3:16 AM > To: Segher Boessenkool; Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix PR 61225 > > On 12/09/14 12:07, Segher Boessenkool wrote: > > On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote: > >>> Do you need to verify SETA and SETB satisfy single_set? Or has that > >>> already been done elsewhere? > >> > >> A is NEXT_INSN (insn) > >> B is prev_nonnote_nondebug_insn (insn), > >> > >> For I1 -> I2 -> B; I2 -> A; > >> LOG_LINK can make sure I1 and I2 are single_set, > > > > It cannot, not anymore anyway. LOG_LINKs can point to an insn with > > multiple SETs; multiple LOG_LINKs can point to such an insn. > So let's go ahead and put a single_set test in this function. > > Is this fragment really needed? Does it ever trigger? I'd think > that > >>> for > 2 uses punting would be fine. Do we really commonly have > >>> cases with > 2 uses, but where they're all in SETA and SETB? > > > > Can't you just check for a death note on the second insn? Together > > with reg_used_between_p? > Yea, that'd accomplish the same thing I think Zhenqiang is trying to catch and > is simpler than walking the lists. Updated. Check for a death note is enough since b is prev_nonnote_nondebug_insn (a). > > > + /* Try to combine a compare insn that sets CC > + with a preceding insn that can set CC, and maybe with its > + logical predecessor as well. > + We need this special code because data flow connections > + do not get entered in LOG_LINKS. */ > > > > I think you mean "not _all_ data flow connections"? > I almost said something about this comment, but figured I was nitpicking too > much :-) Updated. > >>> So you've got two new combine cases here, but I think the testcase > >>> only tests one of them. 
Can you include a testcase for both of hte > >>> major paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN) > >> > >> pr61225.c is the case to cover I1->I2->I3; I2->insn. > >> > >> For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 > >> can also handle them. So I removed the code from the patch. > > > > Why? The simpler case has much better chances of being used. > The question does it actually catch anything not already handled? I guess you > could argue that doing it in combine is better than peep2 and I'd agree with > that. > > > > > In fact, there are many more cases you could handle: > > > > You handle > > > > I1 -> I2 -> I3; I2 -> insn > >I2 -> I3; I2 -> insn > > > > but there are also > > > > I1,I2 -> I3; I2 -> insn > > > > and the many 4-insn combos, too. > Yes, but I wonder how much of this is really necessary in practice. We > could do exhaustive testing here, but I suspect the payoff isn't all > that great. Thus I'm comfortable with faulting in the cases we actually > find are useful in practice. > > > > >> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. > >> + It returns TRUE, if reg1 == reg2, and no other refer of reg1 > >> + except A and B. */ > > > > That sound like the only correct inputs are such a compare etc., but the > > routine tests whether that is true. > Correct, the RTL has to have a specific form and that is tested for. > Comment updates can't hurt. Updated. > > > >> +static bool > >> +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b) > >> +{ > >> + rtx seta = single_set (a); > >> + rtx setb = single_set (b); > >> + > >> + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) > > > > Neither the comment nor the function name mention this. This test is > > better placed in the caller of this function, anyway. > Didn't consider it terribly important. Moving it to the caller doesn't > change anything significantly, though I would agree it's martinally cleaner. Updated. 
> > > >> @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, > rtx_insn > >> *i1, rtx_insn *i0, > >> rtx old = newpat; > >> total_sets = 1 + extra_sets; > >> newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets)); > >> -XVECEXP (newpat, 0, 0) = old; > >> + > >> +if (to_combined_insn) > >> + XVECEXP (newpat, 0, --total_sets) = old; > >> +else > >> + XVECEXP (newpat, 0, 0) = old; > >>} > > > > Is this correct? If so, it needs a big fat comment, because it is > > not exactly obvious :-) > > > > Also, it doesn't handle at all the case where the new pattern already is > > a PARALLEL; can that never happen? > I'd convinced myself it was. But yes, a comment here would be good. The following comments are added. + /* This is a hack to match i386 instruction pattern, which + is like + (parallel [ + (set (reg:CCZ 17 flags) + ...) + (set ...)}) + we h
[Patch, gcc/flag-types.h + Fortran] PR54687 - Fortran options cleanup
This patch cleans up Fortran's option handling and moves it closer to the common option handling. Besides being a nice cleanup, it makes sense for a couple of further reasons, as Manuel points out in the PR. I have not yet touched all options, but one has to start somewhere. Built and currently regtesting on x86-64-gnu-linux. OK for the trunk? Tobias 2014-12-12 Tobias Burnus PR fortran/54687 gcc/ * flag-types.h (gfc_init_local_real, gfc_fcoarray, gfc_convert): New enums; moved from fortran/. gcc/fortran/ * gfortran.h (gfc_option_t): Remove flags which now have a Var(). (init_local_real, gfc_fcoarray): Moved to ../flag-types.h. * libgfortran.h (unit_convert): Add comment. * lang.opt (flag-aggressive_function_elimination, flag-align_commons, flag-all_intrinsics, flag-allow_leading_underscore, flag-automatic, flag-backslash, flag-backtrace, flag-blas_matmul_limit, flag-convert, flag-cray_pointer, flag-dollar_ok, flag-dump_fortran_original, flag-dump_fortran_optimized, flag-external_blas, flag-f2c, flag-implicit_none, flag-init_real, flag-max_array_constructor, flag-module_private, flag-pack_derived, flag-range_check, flag-recursive, flag-repack_arrays, flag-coarray, flag-sign_zero, flag-underscoring): Add Var() and, where applicable, Enum(). * options.c (gfc_handle_coarray_option): Remove. (gfc_init_options, gfc_post_options, gfc_handle_fpe_option, gfc_handle_option): Update for *.opt changes. * arith.c: Update for flag-variable name changes. * array.c: Ditto. * check.c: Ditto. * cpp.c: Ditto. * decl.c: Ditto. * expr.c: Ditto. * f95-lang.c: Ditto. * frontend-passes.c: Ditto. * intrinsic.c: Ditto. * io.c: Ditto. * match.c: Ditto. * module.c: Ditto. * parse.c: Ditto. * primary.c: Ditto. * resolve.c: Ditto. * scanner.c: Ditto. * simplify.c: Ditto. * symbol.c: Ditto. * trans-array.c: Ditto. * trans-common.c: Ditto. * trans-decl.c: Ditto. * trans-expr.c: Ditto. * trans-intrinsic.c: Ditto. * trans-openmp.c: Ditto. * trans-stmt.c: Ditto. 
* trans-types.c: Ditto. * trans.c: Ditto. diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 52ff7ee..81e8fb8 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -263,4 +263,38 @@ enum lto_partition_model { LTO_PARTITION_MAX = 4 }; + +/* gfortran -finit-real= values. */ + +enum gfc_init_local_real +{ + GFC_INIT_REAL_OFF = 0, + GFC_INIT_REAL_ZERO, + GFC_INIT_REAL_NAN, + GFC_INIT_REAL_SNAN, + GFC_INIT_REAL_INF, + GFC_INIT_REAL_NEG_INF +}; + +/* gfortran -fcoarray= values. */ + +enum gfc_fcoarray +{ + GFC_FCOARRAY_NONE = 0, + GFC_FCOARRAY_SINGLE, + GFC_FCOARRAY_LIB +}; + + +/* gfortran -fconvert= values; used for unformatted I/O. + Keep in sync with GFC_CONVERT_* in gcc/fortran/libgfortran.h. */ +enum gfc_convert +{ + GFC_FLAG_CONVERT_NATIVE = 0, + GFC_FLAG_CONVERT_SWAP, + GFC_FLAG_CONVERT_BIG, + GFC_FLAG_CONVERT_LITTLE +}; + + #endif /* ! GCC_FLAG_TYPES_H */ diff --git a/gcc/fortran/arith.c b/gcc/fortran/arith.c index 6394547..e8a5efe 100644 --- a/gcc/fortran/arith.c +++ b/gcc/fortran/arith.c @@ -301,7 +301,7 @@ gfc_check_integer_range (mpz_t p, int kind) } - if (gfc_option.flag_range_check == 0) + if (flag_range_check == 0) return result; if (mpz_cmp (p, gfc_integer_kinds[i].min_int) < 0 @@ -333,12 +333,12 @@ gfc_check_real_range (mpfr_t p, int kind) if (mpfr_inf_p (p)) { - if (gfc_option.flag_range_check != 0) + if (flag_range_check != 0) retval = ARITH_OVERFLOW; } else if (mpfr_nan_p (p)) { - if (gfc_option.flag_range_check != 0) + if (flag_range_check != 0) retval = ARITH_NAN; } else if (mpfr_sgn (q) == 0) @@ -348,14 +348,14 @@ gfc_check_real_range (mpfr_t p, int kind) } else if (mpfr_cmp (q, gfc_real_kinds[i].huge) > 0) { - if (gfc_option.flag_range_check == 0) + if (flag_range_check == 0) mpfr_set_inf (p, mpfr_sgn (p)); else retval = ARITH_OVERFLOW; } else if (mpfr_cmp (q, gfc_real_kinds[i].subnormal) < 0) { - if (gfc_option.flag_range_check == 0) + if (flag_range_check == 0) { if (mpfr_sgn (p) < 0) { @@ -736,7 +736,7 @@ gfc_arith_divide (gfc_expr *op1, 
gfc_expr *op2, gfc_expr **resultp) break; case BT_REAL: - if (mpfr_sgn (op2->value.real) == 0 && gfc_option.flag_range_check == 1) + if (mpfr_sgn (op2->value.real) == 0 && flag_range_check == 1) { rc = ARITH_DIV0; break; @@ -748,7 +748,7 @@ gfc_arith_divide (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp) case BT_COMPLEX: if (mpc_cmp_si_si (op2->value.complex, 0, 0) == 0 - && gfc_option.flag_range_check == 1) + && flag_range_check == 1) { rc = ARITH_DIV0; break; @@ -863,7 +863,7 @@ arith_power (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp) int i; i = gfc_validate_kind (BT_INTEGER, result->ts.kind, false); - if (gfc_option.flag_range_check) + if (flag_range_check)
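The arith.c hunks above all follow one pattern: a read of gfc_option.flag_range_check becomes a read of a plain flag_range_check variable. The following is a minimal sketch of that pattern in isolation, not code from the patch. It assumes an ordinary global int standing in for the variable that a Var() entry in lang.opt would provide, and it uses double in place of mpfr_t; check_real_range and enum arith_result are hypothetical simplifications of gfc_check_real_range and the ARITH_* codes.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for the variable that the options machinery
   would define for an option carrying Var(flag_range_check).  Here it
   is just an ordinary global so the sketch compiles on its own.  */
int flag_range_check = 1;

/* Simplified result codes, modelled on the ARITH_* values in the diff.  */
enum arith_result
{
  ARITH_OK,
  ARITH_OVERFLOW,
  ARITH_NAN
};

/* Simplified analogue of the infinity/NaN branches of
   gfc_check_real_range: the flag is read directly instead of through
   a gfc_option struct member.  */
enum arith_result
check_real_range (double value)
{
  if (isinf (value))
    return flag_range_check != 0 ? ARITH_OVERFLOW : ARITH_OK;

  if (isnan (value))
    return flag_range_check != 0 ? ARITH_NAN : ARITH_OK;

  return ARITH_OK;
}
```

With the flag set, an infinite input is reported as ARITH_OVERFLOW; with the flag cleared, the same input is accepted, which mirrors the two branches shown in the diff.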
RE: [PATCH, AARCH64] Fix ICE in CCMP (PR64015)
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Tuesday, November 25, 2014 5:25 PM > To: Zhenqiang Chen > Cc: Marcus Shawcroft; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, AARCH64] Fix ICE in CCMP (PR64015) > > On 11/25/2014 09:41 AM, Zhenqiang Chen wrote: > > I want to confirm with you two things before I rework it. > > (1) expand_insn needs an optab_handler as input. Do I need to define a > ccmp_optab with different mode support in optabs.def? > > No, look again: expand_insn needs an enum insn_code as input. Since this is > the backend, you can use any icode name you like, which means that you can > use CODE_FOR_ccmp_and etc directly. > > > (2) To make sure later operands not clobber CC, all operands are expanded > before ccmp-first in current implementation. If taking tree/gimple as input, > what's your preferred logic to guarantee CC not clobbered? > > Hmm. Perhaps the target hook will need to output two sequences, each of > which will be concatenated while looping around the calls to gen_ccmp_next. > The first sequence will be operand preparation and the second sequence will > be ccmp generation. 
> > Something like > > bool > aarch64_gen_ccmp_start(rtx *prep_seq, rtx *gen_seq, >int cmp_code, int bit_code, >tree op0, tree op1) { > bool success; > > start_sequence (); > // Widen and expand operands > *prep_seq = get_insns (); > end_sequence (); > > start_sequence (); > // Generate the first compare > *gen_seq = get_insns (); > end_sequence (); > > return success; > } > > bool > aarch64_gen_ccmp_next(rtx *prep_seq, rtx *gen_seq, > rtx prev, int cmp_code, int bit_code, > tree op0, tree op1) { > bool success; > > push_to_sequence (*prep_seq); > // Widen and expand operands > *prep_seq = get_insns (); > end_sequence (); > > push_to_sequence (*gen_seq); > // Generate the next ccmp > *gen_seq = get_insns (); > end_sequence (); > > return success; > } > > If there are ever any failures, the middle-end can simply discard the > sequences. If everything succeeds, it simply calls emit_insn on both > sequences. > Thanks for the comments. The updated patch is attached. Note: Function "aarch64_code_to_ccmode" is the same as it was before reverting. ChangeLog: 2014-12-12 Zhenqiang Chen * ccmp.c (expand_ccmp_next): New function. (expand_ccmp_expr_1, expand_ccmp_expr): Handle operand insn sequence and compare insn sequence. * config/aarch64/aarch64.c (aarch64_code_to_ccmode, aarch64_gen_ccmp_first, aarch64_gen_ccmp_next): New functions. (TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): New macros. * config/aarch64/aarch64.md (*ccmp_and): Changed to ccmp_and. (*ccmp_ior): Changed to ccmp_ior. (cmp): New pattern. * doc/tm.texi (TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): Update parameters. * target.def (gen_ccmp_first, gen_ccmp_next): Update parameters. testsuite/ChangeLog: 2014-12-12 Zhenqiang Chen * gcc.dg/pr64015.c: New test.
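The two-sequence scheme suggested in the quoted reply can be modelled outside of GCC. The following is a minimal sketch under stated assumptions, not the attached patch: struct seq, seq_append, gen_ccmp_step and emit_ccmp_chain are hypothetical names, instructions are represented as strings, and a fixed-size array stands in for the insn lists that start_sequence, push_to_sequence, get_insns and end_sequence manage in the real compiler. It shows only the control flow: every step appends operand preparation to one sequence and the compare to another, a failure discards both, and success emits all preparation before any compare so that no later operand expansion can clobber CC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INSNS 32

/* Hypothetical stand-in for an RTL insn sequence.  */
struct seq
{
  const char *insns[MAX_INSNS];
  size_t len;
};

/* Append INSN to S.  Returns 0 if S is full, 1 otherwise.  */
int
seq_append (struct seq *s, const char *insn)
{
  if (s->len == MAX_INSNS)
    return 0;
  s->insns[s->len++] = insn;
  return 1;
}

/* Model of one gen_ccmp_first/gen_ccmp_next call: operand preparation
   goes to PREP, the compare goes to GEN.  A NULL CMP_INSN models a
   comparison the target cannot handle.  Returns 0 on failure.  */
int
gen_ccmp_step (struct seq *prep, struct seq *gen,
               const char *prep_insn, const char *cmp_insn)
{
  if (cmp_insn == NULL)
    return 0;
  return seq_append (prep, prep_insn) && seq_append (gen, cmp_insn);
}

/* Model of the middle-end loop.  On any failure both sequences are
   discarded and OUT stays empty; on success OUT receives every
   preparation insn first, then every compare.  Returns the number of
   insns emitted (0 on failure or when N is 0).  */
size_t
emit_ccmp_chain (const char *const preps[], const char *const cmps[],
                 size_t n, struct seq *out)
{
  struct seq prep, gen;
  size_t i;

  prep.len = gen.len = out->len = 0;
  if (n > MAX_INSNS / 2)
    return 0;

  for (i = 0; i < n; i++)
    if (!gen_ccmp_step (&prep, &gen, preps[i], cmps[i]))
      return 0;

  for (i = 0; i < prep.len; i++)
    seq_append (out, prep.insns[i]);
  for (i = 0; i < gen.len; i++)
    seq_append (out, gen.insns[i]);

  assert (out->len == 2 * n);
  return out->len;
}
```

Keeping the two sequences separate until the end is what makes the failure path cheap in this model: nothing has been emitted yet, so discarding is just returning without touching the output.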