[patch, fortran] PR 57071, some power optimizations
Hello world, the attached patch does some optimization on power using ishft and iand, as discussed in the PR. I have left out handling real numbers, that should be left to the middle-end (PR 57073). Regression-tested. OK for trunk? Thomas 2013-04-28 Thomas Koenig PR fortran/57071 * frontend-passes (optimize_power): New function. (optimize_op): Use it. 2013-04-28 Thomas Koenig PR fortran/57071 * gfortran.dg/power_3.f90: New test. * gfortran.dg/power_4.f90: New test. Index: frontend-passes.c === --- frontend-passes.c (Revision 198340) +++ frontend-passes.c (Arbeitskopie) @@ -1091,7 +1091,66 @@ combine_array_constructor (gfc_expr *e) return true; } +/* Change (-1)**k into 1-ishift(iand(k,1),1) and + 2**k into ishift(1,k) */ +static bool +optimize_power (gfc_expr *e) +{ + gfc_expr *op1, *op2; + gfc_expr *iand, *ishft; + + if (e->ts.type != BT_INTEGER) +return false; + + op1 = e->value.op.op1; + + if (op1 == NULL || op1->expr_type != EXPR_CONSTANT) +return false; + + if (mpz_cmp_si (op1->value.integer, -1L) == 0) +{ + gfc_free_expr (op1); + + op2 = e->value.op.op2; + + if (op2 == NULL) + return false; + + iand = gfc_build_intrinsic_call (current_ns, GFC_ISYM_IAND, + "_internal_iand", e->where, 2, op2, + gfc_get_int_expr (e->ts.kind, + &e->where, 1)); + + ishft = gfc_build_intrinsic_call (current_ns, GFC_ISYM_ISHFT, + "_internal_ishft", e->where, 2, iand, + gfc_get_int_expr (e->ts.kind, + &e->where, 1)); + + e->value.op.op = INTRINSIC_MINUS; + e->value.op.op1 = gfc_get_int_expr (e->ts.kind, &e->where, 1); + e->value.op.op2 = ishft; + return true; +} + else if (mpz_cmp_si (op1->value.integer, 2L) == 0) +{ + gfc_free_expr (op1); + + op2 = e->value.op.op2; + if (op2 == NULL) + return false; + + ishft = gfc_build_intrinsic_call (current_ns, GFC_ISYM_ISHFT, + "_internal_ishft", e->where, 2, + gfc_get_int_expr (e->ts.kind, + &e->where, 1), + op2); + *e = *ishft; + return true; +} + return false; +} + /* Recursive optimization of operators. 
*/ static bool @@ -1152,6 +1211,10 @@ optimize_op (gfc_expr *e) case INTRINSIC_DIVIDE: return combine_array_constructor (e) || changed; +case INTRINSIC_POWER: + return optimize_power (e); + break; + default: break; } ! { dg-do run } ! { dg-options "-ffrontend-optimize -fdump-tree-original" } ! PR 57071 - Check that (-1)**k is transformed into 1-2*iand(k,1). program main implicit none integer, parameter :: n = 3 integer(kind=8), dimension(-n:n) :: a, b integer, dimension(-n:n) :: c, d, e integer :: m integer :: i, v integer (kind=2) :: i2 m = n v = -1 ! Test in scalar expressions do i=-n,n if (v**i /= (-1)**i) call abort end do ! Test in array constructors a(-m:m) = [ ((-1)**i, i= -m, m) ] b(-m:m) = [ ( v**i, i= -m, m) ] if (any(a .ne. b)) call abort ! Test in array expressions c = [ ( i, i = -n , n ) ] d = (-1)**c e = v**c if (any(d .ne. e)) call abort ! Test in different kind expressions do i2=-n,n if (v**i2 /= (-1)**i2) call abort end do end program main ! { dg-final { scan-tree-dump-times "_gfortran_pow_i4_i4" 4 "original" } } ! { dg-final { cleanup-tree-dump "original" } } ! { dg-do run } ! { dg-options "-ffrontend-optimize -fdump-tree-original" } ! PR 57071 - Check that 2**k is transformed into ishift(1,k). program main implicit none integer :: i,m,v integer, parameter :: n=30 integer, dimension(-n:n) :: a,b,c,d,e m = n v = 2 ! Test scalar expressions. do i=-n,n if (2**i /= v**i) call abort end do ! Test array constructors b = [(2**i,i=-m,m)] c = [(v**i,i=-m,m)] if (any(b /= c)) call abort ! Test array expressions a = [(i,i=-m,m)] d = 2**a e = v**a if (any(d /= e)) call abort end program main ! { dg-final { scan-tree-dump-times "_gfortran_pow_i4_i4" 3 "original" } } ! { dg-final { cleanup-tree-dump "original" } }
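For readers who want to check the arithmetic behind the two rewrites, here is a minimal sketch in plain C. It is not gfortran front-end code: the function names `minus_one_pow` and `two_pow` are made up for illustration, 32-bit integers are assumed, and the treatment of exponents above 30 is an assumption of the sketch, since 2**k overflows there anyway.

```c
#include <stdint.h>

/* (-1)**k for any integer k: 1 when k is even, -1 when k is odd.
   Same shape as 1 - ishft(iand(k, 1), 1): take the low bit of k,
   double it with a shift, and subtract the result from 1.  */
int32_t minus_one_pow(int32_t k)
{
    return 1 - ((k & 1) << 1);
}

/* 2**k for integer k, in the spirit of ishft(1, k).
   A negative k shifts the single set bit out to the right, which
   matches integer 2**k == 0 for k < 0.  Exponents above 30 overflow
   a 32-bit signed integer; returning 0 there is an assumption of
   this sketch, not something the patch specifies.  */
int32_t two_pow(int32_t k)
{
    if (k < 0 || k > 30)
        return 0;
    return (int32_t)(UINT32_C(1) << k);
}
```

For example, `minus_one_pow(-3)` evaluates to -1 and `two_pow(10)` to 1024, which is what the library call would return for the same arguments.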
Re: [patch, fortran] PR 57071, some power optimizations
On 28.04.2013 10:32, Thomas Koenig wrote:
> the attached patch does some optimization on
> power using ishft and iand, as discussed in the PR.
>
> I have left out handling real numbers, that should be left
> to the middle-end (PR 57073).
>
> Regression-tested. OK for trunk?

OK - thanks for the patch.

I wonder whether one should also handle:
  1**k == 1
That should only happen (in real-world code) due to simplifying other expressions, but it is simple to implement.

(0**k is also possible, but it gets more complicated: 0 for k > 0, 1 for k == 0 and invalid for k < 0 [which one might ignore]. As this is really special, one can also leave the library call.)

Tobias

> 2013-04-28 Thomas Koenig
>
> PR fortran/57071
> * frontend-passes (optimize_power): New function.
> (optimize_op): Use it.
>
> 2013-04-28 Thomas Koenig
>
> PR fortran/57071
> * gfortran.dg/power_3.f90: New test.
> * gfortran.dg/power_4.f90: New test.
New Swedish PO file for 'gcc' (version 4.8.0)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Swedish team of translators. The file is available at: http://translationproject.org/latest/gcc/sv.po (This file, 'gcc-4.8.0.sv.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [patch, fortran] PR 57071, some power optimizations
On Sun, Apr 28, 2013 at 10:32:28AM +0200, Thomas Koenig wrote:
> Hello world,
>
> the attached patch does some optimization on
> power using ishft and iand, as discussed in the PR.
>
> I have left out handling real numbers, that should be left
> to the middle-end (PR 57073).
>
> Regression-tested. OK for trunk?
>

% cat foo.f90
function foo(k)
  integer k
  integer foo
  foo = (1)**k
end function foo

% cat foo.f90.003t.original
foo (integer(kind=4) & restrict k)
{
  integer(kind=4) __result_foo;

  __result_foo = _gfortran_pow_i4_i4 (1, *k);
  return __result_foo;
}

% cat foo.f90.143t.optimized

;; Function foo (foo_)

foo (integer(kind=4) & restrict k)
{
  integer(kind=4) __result_foo.0;
  integer(kind=4) D.1479;

:
  D.1479_2 = *k_1(D);
  __result_foo.0_3 = _gfortran_pow_i4_i4 (1, D.1479_2); [tail call]
  return __result_foo.0_3;

}

% nm foo.o
U _gfortran_pow_i4_i4
T foo_

:-)

--
Steve
[patch] libstdc++/51365 for shared_ptr
This fixes shared_ptr to allow 'final' allocators to be used. As an added bonus it also reduces the memory footprint of the shared_ptr control block when constructing a shared_ptr with an empty deleter or when using make_shared/allocate_shared. I decided not to use std::tuple here, because it's a pretty heavy template to instantiate, so added another EBO helper like the one in hashtable_policy.h -- they should be merged and reused for the other containers to fix the rest of PR 51365. PR libstdc++/51365 * include/bits/shared_ptr_base (_Sp_ebo_helper): Helper class to implement EBO safely. (_Sp_counted_base::_M_get_deleter): Add noexcept. (_Sp_counter_ptr): Use noexcept instead of comments. (_Sp_counted_deleter): Likewise. Use _Sp_ebo_helper. (_Sp_counted_ptr_inplace): Likewise. * testsuite/20_util/shared_ptr/cons/51365.cc: New. * testsuite/20_util/shared_ptr/cons/52924.cc: Add rebind member to custom allocator and test construction with custom allocator. * testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust dg-error line number. Tested x86_64-linux, committed to trunk. commit 4c5977aae386fa8c60519a8c7b9ba3e448f43c22 Author: Jonathan Wakely Date: Sun Apr 28 12:10:00 2013 +0100 PR libstdc++/51365 * include/bits/shared_ptr_base (_Sp_ebo_helper): Helper class to implement EBO safely. (_Sp_counted_base::_M_get_deleter): Add noexcept. (_Sp_counter_ptr): Use noexcept instead of comments. (_Sp_counted_deleter): Likewise. Use _Sp_ebo_helper. (_Sp_counted_ptr_inplace): Likewise. * testsuite/20_util/shared_ptr/cons/51365.cc: New. * testsuite/20_util/shared_ptr/cons/52924.cc: Add rebind member to custom allocator and test construction with custom allocator. * testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust dg-error line number. 
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h index f463645..a0f513f 100644 --- a/libstdc++-v3/include/bits/shared_ptr_base.h +++ b/libstdc++-v3/include/bits/shared_ptr_base.h @@ -126,7 +126,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { delete this; } virtual void* - _M_get_deleter(const std::type_info&) = 0; + _M_get_deleter(const std::type_info&) noexcept = 0; void _M_add_ref_copy() @@ -284,7 +284,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { public: explicit - _Sp_counted_ptr(_Ptr __p) + _Sp_counted_ptr(_Ptr __p) noexcept : _M_ptr(__p) { } virtual void @@ -296,14 +296,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { delete this; } virtual void* - _M_get_deleter(const std::type_info&) - { return 0; } + _M_get_deleter(const std::type_info&) noexcept + { return nullptr; } _Sp_counted_ptr(const _Sp_counted_ptr&) = delete; _Sp_counted_ptr& operator=(const _Sp_counted_ptr&) = delete; -protected: - _Ptr _M_ptr; // copy constructor must not throw +private: + _Ptr _M_ptr; }; template<> @@ -318,59 +318,91 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION inline void _Sp_counted_ptr::_M_dispose() noexcept { } + template +struct _Sp_ebo_helper; + + /// Specialization using EBO. + template +struct _Sp_ebo_helper<_Nm, _Tp, true> : private _Tp +{ + explicit _Sp_ebo_helper(const _Tp& __tp) : _Tp(__tp) { } + + static _Tp& + _S_get(_Sp_ebo_helper& __eboh) { return static_cast<_Tp&>(__eboh); } +}; + + /// Specialization not using EBO. + template +struct _Sp_ebo_helper<_Nm, _Tp, false> +{ + explicit _Sp_ebo_helper(const _Tp& __tp) : _M_tp(__tp) { } + + static _Tp& + _S_get(_Sp_ebo_helper& __eboh) + { return __eboh._M_tp; } + +private: + _Tp _M_tp; +}; + // Support for custom deleter and/or allocator template class _Sp_counted_deleter final : public _Sp_counted_base<_Lp> { - // Helper class that stores the Deleter and also acts as an allocator. 
- // Used to dispose of the owned pointer and the internal refcount - // Requires that copies of _Alloc can free each other's memory. - struct _My_Deleter - : public _Alloc // copy constructor must not throw + class _Impl : _Sp_ebo_helper<0, _Deleter>, _Sp_ebo_helper<1, _Alloc> { - _Deleter _M_del;// copy constructor must not throw - _My_Deleter(_Deleter __d, const _Alloc& __a) - : _Alloc(__a), _M_del(__d) { } + typedef _Sp_ebo_helper<0, _Deleter> _Del_base; + typedef _Sp_ebo_helper<1, _Alloc> _Alloc_base; + + public: + _Impl(_Ptr __p, _Deleter __d, const _Alloc& __a) noexcept + : _M_ptr(__p), _Del_base(__d), _Alloc_base(__a) + { } + + _Deleter& _M_del() noexcept { return _Del_base::_S_get(*this); } + _Alloc& _M_alloc
Re: [PATCH] Preserve loops from CFG build until after RTL loop opts
On 26/04/13 16:27, Tom de Vries wrote:
> On 25/04/13 16:19, Richard Biener wrote:
>
>> and compared to the previous patch changed the tree-ssa-tailmerge.c
>> part to deal with merging of loop latch and loop preheader (even
>> if that's a really bad idea) to not regress gcc.dg/pr50763.c.
>> Any suggestion on how to improve that part welcome.
>
> So I think this is really a cornercase, and we should disregard it if that makes
> things simpler.
>
> Rather than fixing up the loop structure, we could prevent tail-merge in these
> cases.
>
> The current fix tests for current_loops == NULL, and I'm not sure that can still
> happen there, given that we have PROP_loops.

Richard,

I've found that it happens in these g++ test-cases:
  g++.dg/ext/mv1.C
  g++.dg/ext/mv12.C
  g++.dg/ext/mv2.C
  g++.dg/ext/mv5.C
  g++.dg/torture/covariant-1.C
  g++.dg/torture/pr43068.C
  g++.dg/torture/pr47714.C
This seems rare enough to just bail out of tail-merge in those cases.

> It's not evident to me that the test bb2->loop_father->latch == bb2 is
> sufficient. Before calling tail_merge_optimize, we call loop_optimizer_finalize
> in which we assert that LOOPS_MAY_HAVE_MULTIPLE_LATCHES from there on, so in
> theory we might miss some latches.
>
> But I guess that pre (having started out with simple latches) maintains simple
> latches throughout, and that tail-merge does the same.

I've added a comment related to this in the patch.

Bootstrapped and reg-tested (ada inclusive) on x86_64.

OK for trunk?

Thanks,
- Tom

2013-04-28 Tom de Vries

* tree-ssa-tail-merge.c (find_same_succ_bb): Skip loop latch bbs.
(replace_block_by): Don't set LOOPS_NEED_FIXUP.
(tail_merge_optimize): Handle current_loops == NULL.

* gcc.dg/pr50763.c: Update test.

diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c index f2ab7444..5467ae5 100644 --- a/gcc/tree-ssa-tail-merge.c +++ b/gcc/tree-ssa-tail-merge.c @@ -689,7 +689,15 @@ find_same_succ_bb (basic_block bb, same_succ *same_p) edge_iterator ei; edge e; - if (bb == NULL) + if (bb == NULL + /* Be conservative with loop structure. It's not evident that this test + is sufficient. Before tail-merge, we've just called + loop_optimizer_finalize, and LOOPS_MAY_HAVE_MULTIPLE_LATCHES is now + set, so there's no guarantee that the loop->latch value is still valid. + But we assume that, since we've forced LOOPS_HAVE_SIMPLE_LATCHES at the + start of pre, we've kept that property intact throughout pre, and are + keeping it throughout tail-merge using this test. */ + || bb->loop_father->latch == bb) return; bitmap_set_bit (same->bbs, bb->index); FOR_EACH_EDGE (e, ei, bb->succs) @@ -1460,17 +1468,6 @@ replace_block_by (basic_block bb1, basic_block bb2) /* Mark the basic block as deleted. */ mark_basic_block_deleted (bb1); - /* ??? If we merge the loop preheader with the loop latch we are creating - additional entries into the loop, eventually rotating it. - Mark loops for fixup in this case. - ??? This is a completely unwanted transform and will wreck most - loops at this point - but with just not considering loop latches as - merge candidates we fail to commonize the two loops in gcc.dg/pr50763.c. - A better fix to avoid that regression is needed. */ - if (current_loops - && bb2->loop_father->latch == bb2) -loops_state_set (LOOPS_NEED_FIXUP); - /* Redirect the incoming edges of bb1 to bb2. 
*/ for (i = EDGE_COUNT (bb1->preds); i > 0 ; --i) { @@ -1612,7 +1609,19 @@ tail_merge_optimize (unsigned int todo) int iteration_nr = 0; int max_iterations = PARAM_VALUE (PARAM_MAX_TAIL_MERGE_ITERATIONS); - if (!flag_tree_tail_merge || max_iterations == 0) + if (!flag_tree_tail_merge + || max_iterations == 0 + /* We try to be conservative with respect to loop structure, since: + - the cases where tail-merging could both affect loop structure and be + benificial are rare, + - it prevents us from having to fixup the loops using + loops_state_set (LOOPS_NEED_FIXUP), and + - keeping loop structure may allow us to simplify the pass. + In order to be conservative, we need loop information. In rare cases + (about 7 test-cases in the g++ testsuite) there is none (because + loop_optimizer_finalize has been called before tail-merge, and + PROP_loops is not set), so we bail out. */ + || current_loops == NULL) return 0; timevar_push (TV_TREE_TAIL_MERGE); diff --git a/gcc/testsuite/gcc.dg/pr50763.c b/gcc/testsuite/gcc.dg/pr50763.c index 60025e3..695b61c 100644 --- a/gcc/testsuite/gcc.dg/pr50763.c +++ b/gcc/testsuite/gcc.dg/pr50763.c @@ -12,5 +12,5 @@ foo (int c, int d) while (c == d); } -/* { dg-final { scan-tree-dump-times "== 33" 1 "pre"} } */ +/* { dg-final { scan-tree-dump-times "== 33" 2 "pre"} } */ /* { dg-final { cleanup-tree-dump "pre" } } */
[patch] tweak some libstdc++ comments
* include/bits/hashtable_policy.h (_Hashtable_ebo_helper): Fix
comment.
* include/std/mutex (__recursive_mutex_base): Likewise.

Tested x86_64-linux, committed to trunk.

commit eebe0bf329438168c002754c4a0d5b7e1b59e6f5
Author: Jonathan Wakely
Date:   Sun Apr 28 12:49:20 2013 +0100

    * include/bits/hashtable_policy.h (_Hashtable_ebo_helper): Fix
    comment.
    * include/std/mutex (__recursive_mutex_base): Likewise.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 1cf6cb2..1c76af0 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -844,8 +844,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /**
    *  Primary class template _Hashtable_ebo_helper.
    *
-   *  Helper class using EBO when it is not forbidden, type is not
-   *  final, and when it worth it, type is empty.
+   *  Helper class using EBO when it is not forbidden (the type is not
+   *  final) and when it is worth it (the type is empty.)
    */
   template
diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 67f3418..3c666c1 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -78,7 +78,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __mutex_base& operator=(const __mutex_base&) = delete;
   };

-  // Common base class for std::recursive_mutex and std::timed_recursive_mutex
+  // Common base class for std::recursive_mutex and std::recursive_timed_mutex
   class __recursive_mutex_base
   {
   protected:
Re: [patch, fortran] PR 57071, some power optimizations
On 28.04.2013 12:10, Tobias Burnus wrote:
> OK - thanks for the patch.
>
> I wonder whether one should also handle:
>   1**k == 1

Committed, thanks for the review!

I will do 1**k as a followup patch which should be quite obvious :-)

Regarding PR 57073, the real case - I have never worked with GIMPLE, so I have no real idea how to start this. Any volunteers? Or should we handle this in the Fortran front end after all?

Thomas
RE: [gomp4] Some progress on #pragma omp simd
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez
> Sent: Saturday, April 27, 2013 1:30 PM
> To: Iyer, Balaji V
> Cc: Jakub Jelinek; Richard Henderson; gcc-patches@gcc.gnu.org
> Subject: Re: [gomp4] Some progress on #pragma omp simd
>
> Hi Balaji.
>
> >> "The syntax and semantics of the various simd-openmp-data-clauses are
> >> detailed in the OpenMP specification.
> >> (http://www.openmp.org/mp-documents/spec30.pdf, Section 2.9.3)."
> >>
> >> Balaji, can you verify which is correct? For that matter, which are
> >> the official specs from which we should be basing this work?
> >
> > Privatization clause makes a variable private for the simd lane. In
> > general, I would follow the spec. If you have further questions,
> > please feel free to ask.
>
> Ok, so the Cilk Plus 1.1 spec is incorrectly pointing to the OpenMP 3.0 spec,
> because the OpenMP 3.0 spec has the private clause being task/thread private.
> Since the OpenMP 4.0rc2 explicitly says that the private clause is for the SIMD
> lane (as you've stated), can we assume that when the Cilk Plus 1.1 spec mentions
> OpenMP, it is talking about the OpenMP 4.0 spec?

I don't know of all the references to the OMP manual in the spec, so I will be a bit hesitant to make a blanket assumption like that. In this case, I think you can assume that it behaves in the same way as 4.0. If you have further questions, please feel free to ask.

In general, #pragma simd, array notation and elemental functions deal with vectorization, not threading. But, Cilk part (Cilk keywords and reducers) deal with threading. All these parts can be mixed and matched (with restrictions) to take advantage of both threading and vectorization.

>
> One more question Balaji, the Cilk Plus spec says that for #pragma simd, the
> private, firstprivate, lastprivate, and reduction clauses are as OpenMP.
> However, for <#omp simd>, there is no firstprivate in the OpenMP 4.0rc2 spec.
> Is the firstprivate clause valid for Cilk Plus'
> <#pragma simd>?
>
> Thanks.
> Aldy
Re: [wwwdocs] C++14 support for binary literals says No instead of Yes
On 04/27/2013 02:59 AM, Jakub Jelinek wrote: On Sat, Apr 27, 2013 at 01:03:17AM -0400, Ed Smith-Rowland wrote: In htdocs/projects/cxx1y.html it says no for support of binary literals. I think that's a Yes actually. Here is a little patchlet. Am I missing something? So yes... ;-) I had tested on g++-4.1 and I guess pedantic let that go through back then. I swear -pedantic gave nothing... Oh well. I like the patch. It compiled and tested clean on x86_64-linux. I'm working on a little proposal to add std::bin manipulator and related stuff in analogy to std::hex, etc. to the library. i think you should be able to write and extract binary literals. Given ./xg++ -B ./ a.C -std=c++1y -pedantic-errors -S a.C:1:9: error: binary constants are a GCC extension int i = 0b110101; ^ I'd say the fact that it is available as a GNU extension isn't sufficient to mark this as supported. I think you need something like (untested so far except for make check-g++ RUNTESTFLAGS=*binary_const*): 2013-04-27 Jakub Jelinek N3472 binary constants * include/cpplib.h (struct cpp_options): Fix a typo in user_literals field comment. Add binary_constants field. * init.c (struct lang_flags): Add binary_constants field. (lang_defaults): Add bin_cst column to the table. (cpp_set_lang): Initialize CPP_OPTION (pfile, binary_constants). * expr.c (cpp_classify_number): Talk about C++11 instead of C++0x in diagnostics. Accept binary constants if CPP_OPTION (pfile, binary_constants) even when pedantic. Adjust pedwarn message. * g++.dg/cpp/limits.C: Adjust warning wording. * g++.dg/system-binary-constants-1.C: Likewise. * g++.dg/cpp1y/system-binary-constants-1.C: New test. --- libcpp/include/cpplib.h.jj 2013-04-25 23:47:58.0 +0200 +++ libcpp/include/cpplib.h 2013-04-27 08:31:52.349122712 +0200 @@ -423,7 +423,7 @@ struct cpp_options /* True for traditional preprocessing. */ unsigned char traditional; - /* Nonzero for C++ 2011 Standard user-defnied literals. 
*/ + /* Nonzero for C++ 2011 Standard user-defined literals. */ unsigned char user_literals; /* Nonzero means warn when a string or character literal is followed by a @@ -434,6 +434,9 @@ struct cpp_options literal number suffixes as user-defined literal number suffixes. */ unsigned char ext_numeric_literals; + /* Nonzero for C++ 2014 Standard binary constants. */ + unsigned char binary_constants; + /* Holds the name of the target (execution) character set. */ const char *narrow_charset; --- libcpp/init.c.jj 2013-04-25 23:47:58.0 +0200 +++ libcpp/init.c 2013-04-27 08:34:54.103120530 +0200 @@ -83,24 +83,25 @@ struct lang_flags char uliterals; char rliterals; char user_literals; + char binary_constants; }; static const struct lang_flags lang_defaults[] = -{ /* c99 c++ xnum xid std // digr ulit rlit user_literals */ - /* GNUC89 */ { 0, 0, 1, 0, 0, 1, 1, 0, 0,0 }, - /* GNUC99 */ { 1, 0, 1, 0, 0, 1, 1, 1, 1,0 }, - /* GNUC11 */ { 1, 0, 1, 0, 0, 1, 1, 1, 1,0 }, - /* STDC89 */ { 0, 0, 0, 0, 1, 0, 0, 0, 0,0 }, - /* STDC94 */ { 0, 0, 0, 0, 1, 0, 1, 0, 0,0 }, - /* STDC99 */ { 1, 0, 1, 0, 1, 1, 1, 0, 0,0 }, - /* STDC11 */ { 1, 0, 1, 0, 1, 1, 1, 1, 0,0 }, - /* GNUCXX */ { 0, 1, 1, 0, 0, 1, 1, 0, 0,0 }, - /* CXX98*/ { 0, 1, 1, 0, 1, 1, 1, 0, 0,0 }, - /* GNUCXX11 */ { 1, 1, 1, 0, 0, 1, 1, 1, 1,1 }, - /* CXX11*/ { 1, 1, 1, 0, 1, 1, 1, 1, 1,1 }, - /* GNUCXX1Y */ { 1, 1, 1, 0, 0, 1, 1, 1, 1,1 }, - /* CXX1Y*/ { 1, 1, 1, 0, 1, 1, 1, 1, 1,1 }, - /* ASM */ { 0, 0, 1, 0, 0, 1, 0, 0, 0,0 } +{ /* c99 c++ xnum xid std // digr ulit rlit udlit bin_cst */ + /* GNUC89 */ { 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,0 }, + /* GNUC99 */ { 1, 0, 1, 0, 0, 1, 1, 1, 1, 0,0 }, + /* GNUC11 */ { 1, 0, 1, 0, 0, 1, 1, 1, 1, 0,0 }, + /* STDC89 */ { 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0 }, + /* STDC94 */ { 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,0 }, + /* STDC99 */ { 1, 0, 1, 0, 1, 1, 1, 0, 0, 0,0 }, + /* STDC11 */ { 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,0 }, + /* GNUCXX */ { 0, 1, 1, 0, 0, 1, 1, 0, 0, 0,0 }, + /* CXX98*/ { 0, 1, 1, 0, 1, 1, 1, 0, 
0, 0,0 }, + /* GNUCXX11 */ { 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,0 }, + /* CXX11*/ { 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,0 }, + /* GNUCXX1Y */ { 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,1 }, + /* CXX1Y*/ { 1, 1,
Re: RFA: enable LRA for rs6000
- Original Message -
From: "Michael Meissner"
To: "Vladimir Makarov"
Cc: "Michael Meissner", "David Edelsohn", "gcc-patches", "Peter Bergner", aavru...@redhat.com
Sent: Friday, April 26, 2013 7:13:55 PM
Subject: Re: RFA: enable LRA for rs6000

On Fri, Apr 26, 2013 at 07:00:37PM -0400, Vladimir Makarov wrote:
> 2013-04-26 Vladimir Makarov
>
> * lra.c (setup_operand_alternative): Ignore '?'.
> * lra-constraints.c (process_alt_operands): Print cost dump for
> alternatives. Check only moves for cycling.
> (curr_insn_transform): Print insn name.

I'm not sure I'm comfortable with ignoring the '?' altogether. For example, if you do something in the GPR unit, instructions run at one cycle, while if you do it in the vector unit, it runs in two cycles. In the past, I've seen cases where it wanted to spill floating point values from the floating point registers to the CTR. And if you spill to the LR, it can interfere with the call cache. Admittedly, when to use '!', '?', and '*' is unclear, and unfortunately it has changed over time.

---

I don't like changing the '?' semantics either. So I found another solution. I've committed it to the branch, although it might not be the final solution: I'd like to see how it affects x86/x86-64.

Mike, could you send me the config file (if that is possible, of course) that you are using for spec2006, so that we are in sync?

Thanks, Vlad.

2013-04-28 Vladimir Makarov

* lra.c (setup_operand_alternative): Restore taking into account '?'.
* lra-constraints.c (process_alt_operands): Discourage a bit more using memory for pseudos. Remove printing undefined values. Modify cost values for conflicts with early clobbers.

Index: lra.c === --- lra.c (revision 198350) +++ lra.c (working copy) @@ -784,6 +784,9 @@ setup_operand_alternative (lra_insn_reco lra_assert (i != nop - 1); break; + case '?': + op_alt->reject += LRA_LOSER_COST_FACTOR; + break; case '!': op_alt->reject += LRA_MAX_REJECT; break; Index: lra-constraints.c === --- lra-constraints.c (revision 198373) +++ lra-constraints.c (working copy) @@ -2007,7 +2007,7 @@ process_alt_operands (int only_alternati although it might takes the same number of reloads. */ if (no_regs_p && REG_P (op)) - reject++; + reject += 2; #ifdef SECONDARY_MEMORY_NEEDED /* If reload requires moving value through secondary @@ -2040,9 +2040,9 @@ process_alt_operands (int only_alternati if ((best_losers == 0 || losers != 0) && best_overall < overall) { if (lra_dump_file != NULL) - fprintf (lra_dump_file, " alt=%d,overall=%d,losers=%d," -"small_class_ops=%d,rld_nregs=%d -- reject\n", -nalt, overall, losers, small_class_operands_num, reload_nregs); + fprintf (lra_dump_file, +" alt=%d,overall=%d,losers=%d -- reject\n", +nalt, overall, losers); goto fail; } @@ -2139,7 +2139,10 @@ process_alt_operands (int only_alternati curr_alt_dont_inherit_ops[curr_alt_dont_inherit_ops_num++] = last_conflict_j; losers++; - overall += LRA_LOSER_COST_FACTOR; + /* Early clobber was already reflected in REJECT. */ + lra_assert (reject > 0); + reject--; + overall += LRA_LOSER_COST_FACTOR - 1; } else { @@ -2163,7 +2166,10 @@ process_alt_operands (int only_alternati } curr_alt_win[i] = curr_alt_match_win[i] = false; losers++; - overall += LRA_LOSER_COST_FACTOR; + /* Early clobber was already reflected in REJECT. */ + lra_assert (reject > 0); + reject--; + overall += LRA_LOSER_COST_FACTOR - 1; } } small_class_operands_num = 0;
Fix minor regression with size functions
It's a minor regression present on mainline and 4.8 branch: the size functions are output as (no-fn) in the .original dump file.

Bootstrapped/regtested on x86_64-suse-linux, applied on the mainline and 4.8 branch as obvious (this only affects the Ada compiler).


2013-04-28 Eric Botcazou

* stor-layout.c (finalize_size_functions): Allocate a structure and reset cfun before dumping the functions.

--
Eric Botcazou

Index: stor-layout.c
===================================================================
--- stor-layout.c (revision 198366)
+++ stor-layout.c (working copy)
@@ -290,6 +290,8 @@ finalize_size_functions (void)

   for (i = 0; size_functions && size_functions->iterate (i, &fndecl); i++)
     {
+      allocate_struct_function (fndecl, false);
+      set_cfun (NULL);
       dump_function (TDI_original, fndecl);
       gimplify_function_tree (fndecl);
       dump_function (TDI_generic, fndecl);
Re: [patch, fortran] PR 57071, some power optimizations
Thomas Koenig wrote:
> Regarding PR 57073, the real case - I have never worked with GIMPLE, so I have no real idea how to start this. Any volunteers? Or should we handle this in the Fortran front end after all?

I think one should do it in the middle end. Actually, it shouldn't be that difficult. The starting point is gcc/builtins.c's fold_builtin_powi - all ingredients are there; (1.0)**k, x**0, x**1 and x**(-1) are already handled. Thus, only (-1)**k is missing.

For "= (a)?(b):(c)" one uses "fold_build3_loc (loc, COND_EXPR, ..." and to check whether a real argument is -1, you can use "real_minus_onep".

How about writing your first middle end patch?

Tobias
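To make the target of that fold concrete, here is a sketch of the value the "(a)?(b):(c)" expression would compute for a real base of -1 and an integer exponent. It is plain C, not GCC internals: nothing here calls fold_build3_loc or real_minus_onep, and the name `minus_one_powi` is hypothetical.

```c
/* Value of (-1.0)**k for an integer exponent k, written as the
   conditional a ? b : c that a middle-end fold could build:
   a = "k is odd", b = -1.0, c = 1.0.  */
double minus_one_powi(int k)
{
    return (k & 1) ? -1.0 : 1.0;
}
```

The low-bit test works for negative exponents too, since (-1.0)**(-k) has the same sign as (-1.0)**k.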
[Patch, fortran] Fix sign error in SYSTEM_CLOCK kind=4 Windows version
Hi,

committed the patch below as obvious:

Index: ChangeLog
===================================================================
--- ChangeLog (revision 198377)
+++ ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2013-04-28 Janne Blomqvist
+
+ * intrinsics/system_clock.c (system_clock_4): Fix sign error in
+ Windows version.
+
 2013-04-15 Tobias Burnus

  * list_read.c (finish_separator): Initialize variable.
@@ -37,7 +42,7 @@
  (nml_get_obj_data): Likewise use the correct error mechanism.
  * io/transfer.c (hit_eof): Do not set AFTER_ENDFILE if in namelist
  mode.
-
+
 2013-03-29 Tobias Burnus

  PR fortran/56737
Index: intrinsics/system_clock.c
===================================================================
--- intrinsics/system_clock.c (revision 198377)
+++ intrinsics/system_clock.c (working copy)
@@ -134,7 +134,7 @@ system_clock_4(GFC_INTEGER_4 *count, GFC
   QueryPerformanceCounter has potential issues. */
       uint32_t cnt = GetTickCount ();
       if (cnt > GFC_INTEGER_4_HUGE)
- cnt -= GFC_INTEGER_4_HUGE - 1;
+ cnt = cnt - GFC_INTEGER_4_HUGE - 1;
       *count = cnt;
     }
   if (count_rate)

--
Janne Blomqvist
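The one-character nature of the fix is easy to miss, so here is a small sketch of the two expressions side by side. It is an illustration only: `INT4_HUGE` stands in for GFC_INTEGER_4_HUGE and is assumed to be 2147483647, and the function names are made up.

```c
#include <stdint.h>

/* Stand-in for GFC_INTEGER_4_HUGE (assumed to be 2**31 - 1). */
#define INT4_HUGE INT32_MAX

/* Old code: "cnt -= HUGE - 1" subtracts (HUGE - 1),
   i.e. it computes cnt - HUGE + 1.  */
uint32_t wrap_before(uint32_t cnt)
{
    if (cnt > INT4_HUGE)
        cnt -= INT4_HUGE - 1;
    return cnt;
}

/* Fixed code: cnt - HUGE - 1, which maps HUGE + 1 to 0 and
   UINT32_MAX to HUGE, so the result always fits a signed 32-bit
   count.  */
uint32_t wrap_after(uint32_t cnt)
{
    if (cnt > INT4_HUGE)
        cnt = cnt - INT4_HUGE - 1;
    return cnt;
}
```

With the old expression the first tick past the limit lands on 2 instead of 0, and the largest tick values still exceed the signed 32-bit range; the fixed expression keeps the wrapped count within 0 to 2147483647.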
Re: [Patch, fortran] PR 56981 Improve unbuffered unformatted performance
PING

On Fri, Apr 19, 2013 at 1:30 PM, Janne Blomqvist wrote:
> Hi,
>
> the attached patch improves the performance for unformatted and
> unbuffered files. Currently unbuffered unformatted really means that
> we don't buffer anything and use the POSIX I/O syscalls directly. With
> the patch, we use the buffer but flush it at the end of each I/O
> statement.
>
> (For formatted I/O we essentially already do this, as the format
> buffer (fbuf) buffers each record).
>
> For the ever important benchmark of writing small (containing a single
> scalar 4 byte value) unformatted sequential records to /dev/null, the
> patch reduces the number of syscalls by a factor of 6, and performance
> improves by more than double.
>
> For trunk, time for the benchmark in the PR:
>
> real    0m0.727s
> user    0m0.272s
> sys     0m0.452s
>
> With the patch:
>
> real    0m0.313s
> user    0m0.220s
> sys     0m0.092s
>
> For comparison, writing to a file where we use buffered I/O:
>
> real    0m0.202s
> user    0m0.180s
> sys     0m0.020s
>
>
> As a semi-unrelated minor improvement, the patch also changes the
> ordering when writing out unformatted sequential record markers.
> Currently we do
>
> write bogus marker
> write record data
> write tail marker
> seek back to before the bogus marker
> write the correct head marker
> seek to the end of the record, behind the tail marker
>
> With the patch we instead do
>
> write bogus marker
> write record data
> seek back to before the bogus marker
> write the correct head marker
> seek to the end of the record data
> write tail marker
>
> With the patch, the slightly shorter seek distances ever-so-slightly
> increase the chance that the seeks will be contained within the buffer
> so we don't have to flush.
>
> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
>
> 2013-04-19  Janne Blomqvist
>
>         PR fortran/56981
>         * io/transfer.c (next_record_w_unf): First fix head marker, then
>         write tail.
>         (next_record): Call flush_if_unbuffered.
>         * io/unix.c (struct unix_stream): Add field unbuffered.
>         (flush_if_unbuffered): New function.
>         (fd_to_stream): New argument.
>         (open_external): Fix fd_to_stream call.
>         (input_stream): Likewise.
>         (output_stream): Likewise.
>         (error_stream): Likewise.
>         * io/unix.h (flush_if_unbuffered): New prototype.
>
>
> --
> Janne Blomqvist

--
Janne Blomqvist
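As a rough illustration of the "factor of 6" figure, the following toy model (C++14 or later) counts simulated syscalls per record. It is only a sketch: `ToyStream` and `syscalls_for_records` are hypothetical names, not libgfortran code, and it assumes that each marker write, data write, and seek costs one syscall when nothing is buffered, while a buffered stream pays one flush per I/O statement.

```cpp
// Toy model only: these names are hypothetical and do not correspond to
// libgfortran's unix_stream implementation.
struct ToyStream {
  bool use_buffer;     // true: collect operations, flush once per statement
  int syscalls = 0;    // number of simulated kernel calls
  bool dirty = false;  // buffer holds data that has not been flushed

  constexpr explicit ToyStream(bool b) : use_buffer(b) {}

  // A write either lands in the buffer or costs one syscall.
  constexpr void write() {
    if (use_buffer) dirty = true; else ++syscalls;
  }
  // Assumption: a seek that stays inside the buffer costs nothing.
  constexpr void seek() {
    if (!use_buffer) ++syscalls;
  }
  // Called at the end of each I/O statement: flush if anything is pending.
  constexpr void end_of_statement() {
    if (use_buffer && dirty) { ++syscalls; dirty = false; }
  }
};

// Write `records` unformatted sequential records, one per I/O statement.
constexpr int syscalls_for_records(bool use_buffer, int records) {
  ToyStream s(use_buffer);
  for (int i = 0; i < records; ++i) {
    s.write();  // bogus head marker
    s.write();  // record data
    s.seek();   // back to before the bogus marker
    s.write();  // correct head marker
    s.seek();   // to the end of the record data
    s.write();  // tail marker
    s.end_of_statement();
  }
  return s.syscalls;
}
```

Under these assumptions an unbuffered record costs six simulated syscalls and a buffered-then-flushed record costs one, which is consistent with the reduction reported in the message; the real accounting inside libgfortran may differ.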
Re: [wwwdocs] C++14 support for binary literals says No instead of Yes
The patch looks good to me. Jason
Re: [C++ Patch/RFC] PR 56450
On 04/27/2013 05:17 PM, Paolo Carlini wrote:
> Assuming as obvious that we don't want to crash on it, the interesting
> issue is whether we want the static_asserts to both fail or succeed: in
> practice, a rather recent ICC I have at hand fails both; a rather recent
> clang++ passes both (consistently with the expectation of Bug
> submitter). In fact, as I'm reading now 7.1.6.2/4, since we are dealing
> with a class member access in any case, it may well be possible that
> *ICC* is right.

Yes, I think so. Since it's a class member access, decltype should be the declared type, i.e. const int.

Jason
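The rule Jason cites can be checked with a small, self-contained sketch. It deliberately uses a non-static member of a hypothetical struct `A` and a hypothetical object `a`, since that case is uncontroversial across compilers; the PR itself concerns a static member reached through `declval`, which is not reproduced here.

```cpp
#include <type_traits>

// Hypothetical type for illustration; the member is declared `const int`.
struct A { const int dummy = 0; };

A a;

// Unparenthesized class member access: decltype yields the declared type
// of the member, without a reference.
static_assert(std::is_same<decltype(a.dummy), const int>::value,
              "declared type of the member");

// With extra parentheses the operand is treated as an ordinary lvalue
// expression, so decltype yields a reference type.
static_assert(std::is_same<decltype((a.dummy)), const int&>::value,
              "lvalue expression gives a reference");
```

The only difference between the two assertions is the extra pair of parentheses, which is what separates "declared type of the member" from "type and value category of the expression".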
Re: vtables patch 1/3: allow empty array initializations
On Apr 26, 2013, at 4:05 AM, Bernd Schmidt wrote:
> On 04/24/2013 09:14 PM, DJ Delorie wrote:
>>> 24 bits stored as three bytes, or four? How does this affect vtable
>>> layout? I would have expected the C++ frontend and libsupc++ to
>>> currently be inconsistent with each other given such a setup.
>>
>> In memory, four, I think. The address registers really are three
>> bytes though. They're PSImode and gcc doesn't really have a good way
>> of using any specified PSImode precision.

I have patches to let one specify a precision for partial int types, easy enough to do, and the rest of the compiler plays nicely for the most part with it...
Re: [PATCH] Allow nested use of attributes in MD-files
On Apr 26, 2013, at 8:17 AM, Michael Zolotukhin wrote:
> This patch allows to use attributes inside other attributes in MD-files.

Sweet, I like it…
[Patch, Fortran] PR57093 - fix too-small malloc size with scalar coarrays of type character
The problem is a bit nested but the solution is obvious: The type (TREE_TYPE) of an array is an array type - and one needs to drill one level deeper to get the element type. For allocatable scalar coarrays, one has an array descriptor to handle the bounds (and, with -fcoarray=lib, to store the coarray token) but the type is a scalar.

gfc_get_element_type by default applies TREE_TYPE twice - which is once too much for coarrays. There was a check so that the second TREE_TYPE is only applied to arrays. Unfortunately, character strings are internally arrays as well. Thus, the second TREE_TYPE didn't give the array (with the proper string length) but an element of the string, which only had size 1 (for kind=1 characters).

The solution is obvious, see patch.

Committed as Rev. 198379 after bootstrapping and regtesting on x86-64-gnu-linux.

Tobias

Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog (Revision 198378)
+++ gcc/fortran/ChangeLog (Arbeitskopie)
@@ -1,3 +1,10 @@
+2013-04-28  Tobias Burnus
+
+        PR fortran/57093
+        * trans-types.c (gfc_get_element_type): Fix handling
+        of scalar coarrays of type character.
+        * intrinsic.texi (PACK): Add missing ")".
+
 2013-04-28  Thomas Koenig

         PR fortran/57071
@@ -6,9 +13,9 @@

 2013-04-25  Janne Blomqvist

-PR bootstrap/57028
-* Make-lang.in (f951): Link in ZLIB.
-(CFLAGS-fortran/module.o): Add zlib include directory.
+        PR bootstrap/57028
+        * Make-lang.in (f951): Link in ZLIB.
+        (CFLAGS-fortran/module.o): Add zlib include directory.

 2013-04-22  Janus Weil

Index: gcc/fortran/intrinsic.texi
===
--- gcc/fortran/intrinsic.texi (Revision 198378)
+++ gcc/fortran/intrinsic.texi (Arbeitskopie)
@@ -9619,7 +9619,7 @@ Fortran 95 and later
 Transformational function

 @item @emph{Syntax}:
-@code{RESULT = PACK(ARRAY, MASK[,VECTOR]}
+@code{RESULT = PACK(ARRAY, MASK[,VECTOR])}

 @item @emph{Arguments}:
 @multitable @columnfractions .15 .70
Index: gcc/fortran/trans-types.c
===
--- gcc/fortran/trans-types.c (Revision 198378)
+++ gcc/fortran/trans-types.c (Arbeitskopie)
@@ -1179,7 +1179,7 @@ gfc_get_element_type (tree type)
       element = TREE_TYPE (element);

       /* For arrays, which are not scalar coarrays.  */
-      if (TREE_CODE (element) == ARRAY_TYPE)
+      if (TREE_CODE (element) == ARRAY_TYPE && !TYPE_STRING_FLAG (element))
         element = TREE_TYPE (element);
     }

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (Revision 198378)
+++ gcc/testsuite/ChangeLog (Arbeitskopie)
@@ -1,3 +1,8 @@
+2013-04-28  Tobias Burnus
+
+        PR fortran/57093
+        * gfortran.dg/coarray_30.f90: New.
+
 2013-04-28  Thomas Koenig

         PR fortran/57071
@@ -43,7 +48,7 @@
 2013-04-25  Marek Polacek

         PR tree-optimization/57066
-* gcc.dg/torture/builtin-logb-1.c: Adjust testcase.
+        * gcc.dg/torture/builtin-logb-1.c: Adjust testcase.

 2013-04-25  James Greenhalgh
             Tejas Belagod

Index: gcc/testsuite/gfortran.dg/coarray_30.f90
===
--- gcc/testsuite/gfortran.dg/coarray_30.f90 (Revision 0)
+++ gcc/testsuite/gfortran.dg/coarray_30.f90 (Arbeitskopie)
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-fcoarray=single -fdump-tree-original" }
+!
+! PR fortran/57093
+!
+! Contributed by Damian Rouson
+!
+program main
+  character(len=25), allocatable :: greeting[:]
+  allocate(greeting[*])
+  write(greeting,"(a)") "z"
+end
+
+! { dg-final { scan-tree-dump-times "greeting.data = \\(void . restrict\\) __builtin_malloc \\(25\\);" 1 "original" } }
+! { dg-final { cleanup-tree-dump "original" } }
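The effect of the one-line fix can be mimicked with a toy model of nested types. This is only a sketch: `ToyType`, `element_type`, and the object names are hypothetical stand-ins, with `inner` playing the role of TREE_TYPE, `is_array` the role of the ARRAY_TYPE check, and `is_string` the role of TYPE_STRING_FLAG; it does not use GCC's real data structures.

```cpp
// Toy stand-in for GCC's tree types; every name here is hypothetical.
struct ToyType {
  bool is_array;         // models TREE_CODE (t) == ARRAY_TYPE
  bool is_string;        // models TYPE_STRING_FLAG (t)
  int size_bytes;        // size an allocation of this type would request
  const ToyType* inner;  // models TREE_TYPE (t)
};

// character(len=25), kind=1: an array of 25 one-byte characters,
// marked as a string.
constexpr ToyType char_kind1{false, false, 1, nullptr};
constexpr ToyType string25{true, true, 25, &char_kind1};
constexpr ToyType string_data_ptr{false, false, 8, &string25};

// An ordinary array of ten 4-byte integers, for comparison.
constexpr ToyType int4{false, false, 4, nullptr};
constexpr ToyType int_array{true, false, 40, &int4};
constexpr ToyType array_data_ptr{false, false, 8, &int_array};

// First step: follow the data pointer. Second step: drill into arrays,
// optionally stopping at string types as the patched check does.
constexpr const ToyType* element_type(const ToyType* data_ptr,
                                      bool honor_string_flag) {
  return (data_ptr->inner->is_array
          && !(honor_string_flag && data_ptr->inner->is_string))
             ? data_ptr->inner->inner
             : data_ptr->inner;
}
```

Without the string check the model yields a 1-byte element for the character coarray, which corresponds to the too-small malloc in the PR; with the check it yields the full 25 bytes, while the ordinary integer array still resolves to its 4-byte element either way.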
[Patch, libfortran] Simplify SYSTEM_CLOCK implementation
Hi,

while looking at system_clock due to the recent Windows patches, it
occurred to me that the Unix versions can be simplified somewhat. The
attached patch does this.

Regtested on x86_64-unknown-linux-gnu, Ok for trunk?

2013-04-28  Janne Blomqvist

        * intrinsics/system_clock (gf_gettime_mono): Use variable
        resolution for fractional seconds argument.
        (system_clock_4): Simplify, update for gf_gettime_mono change.
        (system_clock_8): Likewise.

--
Janne Blomqvist

system_clock_simplify.diff
Description: Binary data
Re: [C++ Patch/RFC] PR 56450
Hi,

On 04/28/2013 09:10 PM, Jason Merrill wrote:
> On 04/27/2013 05:17 PM, Paolo Carlini wrote:
>> Assuming as obvious that we don't want to crash on it, the interesting
>> issue is whether we want the static_asserts to both fail or succeed: in
>> practice, a rather recent ICC I have at hand fails both; a rather recent
>> clang++ passes both (consistently with the expectation of Bug
>> submitter). In fact, as I'm reading now 7.1.6.2/4, since we are dealing
>> with a class member access in any case, it may well be possible that
>> *ICC* is right.
> Yes, I think so. Since it's a class member access, decltype should be
> the declared type, i.e. const int.

Thanks. Is the below Ok, then? Tested (again) on x86_64-linux.

Thanks,
Paolo.

//

/cp
2013-04-28  Paolo Carlini

        PR c++/56450
        * semantics.c (finish_decltype_type): Handle COMPOUND_EXPR.

/testsuite
2013-04-28  Paolo Carlini

        PR c++/56450
        * testsuite/g++.dg/cpp0x/decltype52.C: New.

Index: cp/semantics.c
===
--- cp/semantics.c (revision 198366)
+++ cp/semantics.c (working copy)
@@ -5398,6 +5398,7 @@ finish_decltype_type (tree expr, bool id_expressio
           break;

         case COMPONENT_REF:
+        case COMPOUND_EXPR:
           mark_type_use (expr);
           type = is_bitfield_expr_with_lowered_type (expr);
           if (!type)
Index: testsuite/g++.dg/cpp0x/decltype52.C
===
--- testsuite/g++.dg/cpp0x/decltype52.C (revision 0)
+++ testsuite/g++.dg/cpp0x/decltype52.C (working copy)
@@ -0,0 +1,18 @@
+// PR c++/56450
+// { dg-do compile { target c++11 } }
+
+template<typename T>
+T&& declval();
+
+template<typename, typename>
+struct is_same
+{ static constexpr bool value = false; };
+
+template<typename T>
+struct is_same<T, T>
+{ static constexpr bool value = true; };
+
+struct A { static const int dummy = 0; };
+
+static_assert(is_same<decltype(declval<A>().dummy), const int>::value, "");
+static_assert(!is_same<decltype(declval<A>().dummy), const int&>::value, "");
Re: vtables patch 1/3: allow empty array initializations
> I have patches to let one specify a precision for partial int types,
> easy enough to do, and the rest of the compiler plays nicely for the
> most part with it...

If you can make size_t truly be a 24-bit value, I'd be very happy :-)
[PATCH] Improve vec_widen_?mult_odd_* (take 2)
On Sat, Apr 27, 2013 at 11:07:50AM +0200, Uros Bizjak wrote:
> Yes, please add a new predicate, the pattern is much more descriptive
> this way. (Without the predicate, it looks like an expander that
> generates a RTX fragment, used instead of gen_RTX... sequences).

Ok, updated patch below. Bootstrapped/regtested again on x86_64-linux and
i686-linux.

> OTOH, does vector mode "general_operand" still accept scalar
> immediates? The predicate, proposed above is effectively

general_operand doesn't accept most of CONST_VECTOR constants (because they
aren't targetm.legitimate_constant_p). It won't accept CONST_INT
or CONST_DOUBLE due to:
  /* Don't accept CONST_INT or anything similar
     if the caller wants something floating.  */
  if (GET_MODE (op) == VOIDmode && mode != VOIDmode
      && GET_MODE_CLASS (mode) != MODE_INT
      && GET_MODE_CLASS (mode) != MODE_PARTIAL_INT)
    return 0;

2013-04-28  Jakub Jelinek

        * config/i386/predicates.md (general_vector_operand): New predicate.
        * config/i386/i386.c (const_vector_equal_evenodd_p): New function.
        (ix86_expand_mul_widen_evenodd): Force op1 resp. op2 into register
        if they aren't nonimmediate operands.  If their original values
        satisfy const_vector_equal_evenodd_p, don't shift them.
        * config/i386/sse.md (mul<mode>3): Use general_vector_operand
        predicates.  For the SSE4.1 case force operands[{1,2}] into registers
        if not nonimmediate_operand.
        (vec_widen_smult_even_v4si): Use nonimmediate_operand predicates
        instead of register_operand.
        (vec_widen_<s>mult_odd_<mode>): Use general_vector_operand predicates.

--- gcc/config/i386/predicates.md.jj 2013-04-28 21:29:40.0 +0200
+++ gcc/config/i386/predicates.md 2013-04-28 21:32:30.308907622 +0200
@@ -1303,3 +1303,8 @@ (define_predicate "avx2_pblendw_operand"
   HOST_WIDE_INT low = val & 0xff;
   return val == ((low << 8) | low);
 })
+
+;; Return true if OP is nonimmediate_operand or CONST_VECTOR.
+(define_predicate "general_vector_operand"
+  (ior (match_operand 0 "nonimmediate_operand")
+       (match_code "const_vector")))
--- gcc/config/i386/i386.c.jj 2013-04-28 21:29:44.248917523 +0200
+++ gcc/config/i386/i386.c 2013-04-28 21:32:07.274908996 +0200
@@ -40827,6 +40827,24 @@ ix86_expand_vecop_qihi (enum rtx_code co
                            gen_rtx_fmt_ee (code, qimode, op1, op2));
 }

+/* Helper function of ix86_expand_mul_widen_evenodd.  Return true
+   if op is CONST_VECTOR with all odd elements equal to their
+   preceeding element.  */
+
+static bool
+const_vector_equal_evenodd_p (rtx op)
+{
+  enum machine_mode mode = GET_MODE (op);
+  int i, nunits = GET_MODE_NUNITS (mode);
+  if (GET_CODE (op) != CONST_VECTOR
+      || nunits != CONST_VECTOR_NUNITS (op))
+    return false;
+  for (i = 0; i < nunits; i += 2)
+    if (CONST_VECTOR_ELT (op, i) != CONST_VECTOR_ELT (op, i + 1))
+      return false;
+  return true;
+}
+
 void
 ix86_expand_mul_widen_evenodd (rtx dest, rtx op1, rtx op2,
                                bool uns_p, bool odd_p)
@@ -40834,6 +40852,12 @@ ix86_expand_mul_widen_evenodd (rtx dest,
   enum machine_mode mode = GET_MODE (op1);
   enum machine_mode wmode = GET_MODE (dest);
   rtx x;
+  rtx orig_op1 = op1, orig_op2 = op2;
+
+  if (!nonimmediate_operand (op1, mode))
+    op1 = force_reg (mode, op1);
+  if (!nonimmediate_operand (op2, mode))
+    op2 = force_reg (mode, op2);

   /* We only play even/odd games with vectors of SImode.  */
   gcc_assert (mode == V4SImode || mode == V8SImode);
@@ -40852,10 +40876,12 @@ ix86_expand_mul_widen_evenodd (rtx dest,
         }

       x = GEN_INT (GET_MODE_UNIT_BITSIZE (mode));
-      op1 = expand_binop (wmode, lshr_optab, gen_lowpart (wmode, op1),
-                          x, NULL, 1, OPTAB_DIRECT);
-      op2 = expand_binop (wmode, lshr_optab, gen_lowpart (wmode, op2),
-                          x, NULL, 1, OPTAB_DIRECT);
+      if (!const_vector_equal_evenodd_p (orig_op1))
+        op1 = expand_binop (wmode, lshr_optab, gen_lowpart (wmode, op1),
+                            x, NULL, 1, OPTAB_DIRECT);
+      if (!const_vector_equal_evenodd_p (orig_op2))
+        op2 = expand_binop (wmode, lshr_optab, gen_lowpart (wmode, op2),
+                            x, NULL, 1, OPTAB_DIRECT);
       op1 = gen_lowpart (mode, op1);
       op2 = gen_lowpart (mode, op2);
     }
--- gcc/config/i386/sse.md.jj 2013-04-28 21:29:44.244917523 +0200
+++ gcc/config/i386/sse.md 2013-04-28 21:32:07.280908995 +0200
@@ -5631,14 +5631,16 @@ (define_insn "*sse2_pmaddwd"
 (define_expand "mul<mode>3"
   [(set (match_operand:VI4_AVX2 0 "register_operand")
         (mult:VI4_AVX2
-          (match_operand:VI4_AVX2 1 "nonimmediate_operand")
-          (match_operand:VI4_AVX2 2 "nonimmediate_operand")))]
+          (match_operand:VI4_AVX2 1 "general_vector_operand")
+          (match_operand:VI4_AVX2 2 "general_vector_operand")))]
   "TARGET_SSE2"
 {
   if (TARGET_SSE4_1)
     {
-      if (CONSTANT_P (op
Re: vtables patch 1/3: allow empty array initializations
On 04/28/2013 11:13 PM, DJ Delorie wrote:
>
>> I have patches to let one specify a precision for partial int types,
>> easy enough to do, and the rest of the compiler plays nicely for the
>> most part with it...
>
> If you can make size_t truly be a 24-bit value, I'd be very happy :-)

This confuses me a little. Currently, size_t is 16 bits, correct? How large is the address space really? Are pointers 24 bit flat objects, or are the upper 16 bit some kind of segment selector?

Bernd
Re: [C++ Patch/RFC] PR 56450
OK. Jason
Re: vtables patch 1/3: allow empty array initializations
For m32c chips, the address space is a flat 24-bit address space. Address registers are 24 bits (i.e. they cannot hold an SImode) but size_t is 16 bits originally because there aren't enough 24-bit math ops and 32-bit math is too expensive. I've tried to use PSImode for size_t recently (different port) and it just doesn't work, partly because size_t is defined by a *string* that must match a C type, and partly because PSImode turns into BLKmode in many cases (not 2**N sized).
Re: [Patch, fortran] PR 56981 Improve unbuffered unformatted performance
On 04/28/2013 11:31 AM, Janne Blomqvist wrote:
> PING
>
> On Fri, Apr 19, 2013 at 1:30 PM, Janne Blomqvist
> wrote:
>> Hi,
>>
>> the attached patch improves the performance for unformatted and
>> unbuffered files. Currently unbuffered unformatted really means that
>> we don't buffer anything and use the POSIX I/O syscalls directly. With
>> the patch, we use the buffer but flush it at the end of each I/O
>> statement.
>>
>> (For formatted I/O we essentially already do this, as the format
>> buffer (fbuf) buffers each record).
>>
>> For the ever important benchmark of writing small (containing a single
>> scalar 4 byte value) unformatted sequential records to /dev/null, the
>> patch reduces the number of syscalls by a factor of 6, and performance
>> improves by more than double.
>>
>> For trunk, time for the benchmark in the PR:
>>
>> real    0m0.727s
>> user    0m0.272s
>> sys     0m0.452s
>>
>> With the patch:
>>
>> real    0m0.313s
>> user    0m0.220s
>> sys     0m0.092s
>>
>> For comparison, writing to a file where we use buffered I/O:
>>
>> real    0m0.202s
>> user    0m0.180s
>> sys     0m0.020s
>>
>>
>> As a semi-unrelated minor improvement, the patch also changes the
>> ordering when writing out unformatted sequential record markers.
>> Currently we do
>>
>> write bogus marker
>> write record data
>> write tail marker
>> seek back to before the bogus marker
>> write the correct head marker
>> seek to the end of the record, behind the tail marker
>>
>> With the patch we instead do
>>
>> write bogus marker
>> write record data
>> seek back to before the bogus marker
>> write the correct head marker
>> seek to the end of the record data
>> write tail marker
>>
>> With the patch, the slightly shorter seek distances ever-so-slightly
>> increase the chance that the seeks will be contained within the buffer
>> so we don't have to flush.
>>
>> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
>>

OK Janne and thanks for the patch.

What are your thoughts about special casing nul devices?

Jerry
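The "slightly shorter seek distances" in the quoted description can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration, not libgfortran code: `SeekBytes` and `seek_bytes` are hypothetical names, `marker` is the size of one record marker in bytes, `data` is the record payload size, and positions are measured from the start of the record.

```cpp
// Total bytes travelled by the two seeks for one record, under each ordering.
struct SeekBytes {
  long trunk_order;    // tail marker written before fixing the head marker
  long patched_order;  // head marker fixed before writing the tail marker
};

constexpr SeekBytes seek_bytes(long marker, long data) {
  return SeekBytes{
      // Trunk: position is 2*marker + data after the tail marker. Seek back
      // to 0, rewrite the head (position = marker), then seek forward to
      // 2*marker + data again.
      (2 * marker + data) + (marker + data),
      // Patched: position is marker + data after the record data. Seek back
      // to 0, rewrite the head (position = marker), then seek forward to
      // marker + data before writing the tail.
      (marker + data) + data};
}
```

With 4-byte markers and a 4-byte payload this gives 20 bytes of seeking for the old order and 12 for the new one; in general the new order saves twice the marker size per record under these assumptions.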
Re: [Patch, libfortran] Simplify SYSTEM_CLOCK implementation
On 04/28/2013 01:16 PM, Janne Blomqvist wrote:
> Hi,
>
> while looking at system_clock due to the recent Windows patches, it
> occurred to me that the Unix versions can be simplified somewhat. The
> attached patch does this.
>
> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
>

OK for trunk. Thanks!

Jerry
Re: [PATCH] Improve vec_widen_?mult_odd_* (take 2)
On Sun, Apr 28, 2013 at 11:43 PM, Jakub Jelinek wrote:
> On Sat, Apr 27, 2013 at 11:07:50AM +0200, Uros Bizjak wrote:
>> Yes, please add a new predicate, the pattern is much more descriptive
>> this way. (Without the predicate, it looks like an expander that
>> generates a RTX fragment, used instead of gen_RTX... sequences).
>
> Ok, updated patch below. Bootstrapped/regtested again on x86_64-linux and
> i686-linux.
>
>> OTOH, does vector mode "general_operand" still accept scalar
>> immediates? The predicate, proposed above is effectively
>
> general_operand doesn't accept most of CONST_VECTOR constants (because they
> aren't targetm.legitimate_constant_p). It won't accept CONST_INT
> or CONST_DOUBLE due to:
>   /* Don't accept CONST_INT or anything similar
>      if the caller wants something floating.  */
>   if (GET_MODE (op) == VOIDmode && mode != VOIDmode
>       && GET_MODE_CLASS (mode) != MODE_INT
>       && GET_MODE_CLASS (mode) != MODE_PARTIAL_INT)
>     return 0;
>
> 2013-04-28  Jakub Jelinek
>
>         * config/i386/predicates.md (general_vector_operand): New predicate.
>         * config/i386/i386.c (const_vector_equal_evenodd_p): New function.
>         (ix86_expand_mul_widen_evenodd): Force op1 resp. op2 into register
>         if they aren't nonimmediate operands.  If their original values
>         satisfy const_vector_equal_evenodd_p, don't shift them.
>         * config/i386/sse.md (mul<mode>3): Use general_vector_operand
>         predicates.  For the SSE4.1 case force operands[{1,2}] into registers
>         if not nonimmediate_operand.
>         (vec_widen_smult_even_v4si): Use nonimmediate_operand predicates
>         instead of register_operand.
>         (vec_widen_<s>mult_odd_<mode>): Use general_vector_operand predicates.

OK for mainline.

Thanks,
Uros.
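The check that the ChangeLog calls const_vector_equal_evenodd_p can be sketched outside of GCC on a plain array. This is an illustrative analogue (C++14 or later), not the RTL code from the patch: `equal_evenodd` is a hypothetical name and it works on `std::array<int, N>` rather than on a CONST_VECTOR rtx.

```cpp
#include <array>
#include <cstddef>

// Returns true when every odd-indexed element equals the element just
// before it, i.e. the vector looks like {a, a, b, b, ...}.
template <std::size_t N>
constexpr bool equal_evenodd(const std::array<int, N>& v) {
  static_assert(N % 2 == 0, "expects an even number of elements");
  for (std::size_t i = 0; i < N; i += 2)
    if (v[i] != v[i + 1])
      return false;
  return true;
}
```

When this property holds, the even lanes already contain the same values as the odd lanes, so the right shift that would normally move the odd elements into the even positions changes nothing the multiply reads; that appears to be why the patch skips the shift for such constants.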