Re: [C++ Patch] More accurate location for conditional expressions
Paolo Carlini writes: > * call.c (build_conditional_expr_1): Add location_t parameter. > (build_conditional_expr): Likewise. ../../gcc/objc/objc-next-runtime-abi-02.c: In function 'tree_node* build_v2_build_objc_method_call(int, tree, tree, tree, tree, bool)': ../../gcc/objc/objc-next-runtime-abi-02.c:1642:83: error: invalid conversion from 'tree' to 'location_t {aka unsigned int}' [-fpermissive] ret_val = build_conditional_expr (ifexp, ret_val, ftree, tf_warning_or_error); ^ ../../gcc/objc/objc-next-runtime-abi-02.c:1642:83: error: cannot convert 'tsubst_flags' to 'tree' for argument '4' to 'tree_node* build_conditional_expr(location_t, tree, tree, tree, tsubst_flags_t)' make[3]: *** [objcp/objc-next-runtime-abi-02.o] Error 1 Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: [PATCH] Allow nested use of attributes in MD-files
> Ok. Thanks a lot! Checked into trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-05/msg00698.html Thanks, K
Re: [PATCH] Allow nested use of attributes in MD-files
BTW, do we need to update GCC internals with this change? Thanks, K On Wed, May 22, 2013 at 11:49 AM, Kirill Yukhin wrote: >> Ok. > Thanks a lot! > Checked into trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-05/msg00698.html > > Thanks, K
Re: [4.8 PATCH, i386]: Fix PR 57356, SSE2 instructions generated with '-mno-sse2'
On Tue, May 21, 2013 at 07:03:54PM +0200, Uros Bizjak wrote: > This patch avoids movdqu/movdqa when SSE2 is disabled. Although > -mno-sse2 is bordering on ABI violation for 64bit targets, the patch > is simple enough to fix movti_internal_rex64 pattern. If the TImode attr variant isn't valid for !SSE2, I'd find it clearer to do (match_test "!TARGET_SSE2") (const_string "V4SF") much earlier in the cond, before TARGET_AVX case that returns TImode (sure, when TARGET_AVX is true, then !TARGET_SSE2 is false), and not added using ior to an unrelated condition. That said, you are the maintainer, thus your decision; I think this is fine for 4.8 today (if committed before RC2 is rolled). > 2013-05-21 Uros Bizjak > > PR target/57356 > * config/i386/i386.md (*movti_internal_rex64): Emit movaps/movups > for non-sse2 targets. > > Tested on x86_64-pc-linux-gnu {,-m32}. OK for 4.8 branch? > > (The patch is also needed for 4.7 branch, but not for mainline, where > this problem is fixed by movti_internal/movti_internal_rex64 merge). > Index: config/i386/i386.md > === > --- config/i386/i386.md (revision 199017) > +++ config/i386/i386.md (working copy) > @@ -1805,7 +1805,8 @@ >(const_string "V4SF") > (match_test "TARGET_AVX") >(const_string "TI") > -(match_test "optimize_function_for_size_p (cfun)") > +(ior (not (match_test "TARGET_SSE2")) > + (match_test "optimize_function_for_size_p (cfun)")) >(const_string "V4SF") > ] > (const_string "TI")))]) Jakub
Re: [4.8 PATCH, i386]: Fix PR 57356, SSE2 instructions generated with '-mno-sse2'
On Wed, May 22, 2013 at 9:55 AM, Jakub Jelinek wrote: > On Tue, May 21, 2013 at 07:03:54PM +0200, Uros Bizjak wrote: >> This patch avoids movdqu/movdqa when SSE2 is disabled. Although >> -mno-sse2 is bordering on ABI violation for 64bit targets, the patch >> is simple enough to fix movti_internal_rex64 pattern. > > If the TImode attr variant isn't valid for !SSE2, I'd find it clearer > to do > (match_test "!TARGET_SSE2") (const_string "V4SF") > much earlier in the cond, before TARGET_AVX case that returns TImode > (sure, when TARGET_AVX is true, then !TARGET_SSE2 is false), and > not added using ior to an unrelated condition. Yes, this is a good suggestion, I was trying to be consistent with the movti_internal pattern. I will change both in the way you suggested ASAP. Thanks, Uros.
Re: [patch, powerpc] increase array alignment for Altivec
On Wed, May 22, 2013 at 3:57 AM, David Edelsohn wrote: > On Tue, May 21, 2013 at 7:13 PM, Sandra Loosemore > wrote: >> On 05/21/2013 04:04 PM, David Edelsohn wrote: >>> >>> >>> There are three issues here: >>> >>> 1) Someone in the LTC toolchain team needs to benchmark this patch on >>> POWER7. >> >> >> That would be great if somebody else could help with that. >> >> >>> 2) We need to clarify how the patch affects the ABI because it cannot >>> break the ABI. >> >> >> I understand this. >> >> >>> 3) Please stop saying that you cannot justify trying to get the patch >>> in mainline. Other developers have pointed out how the patch may be >>> incorrect. Do you want to deliver a broken compiler to CodeSourcery's >>> customers? The comment sets a bad tone for engaging with the GCC >>> community. >> >> >> I think you've misunderstood my position, here. Delivering a broken >> compiler is just what I want to avoid! We've had the original >> local-arrays-only patch in our local tree for a couple of years now, but we >> no longer have a customer for it. I thought the comments from the previous >> review would be straightforward to address and it would be worth making one >> more attempt to revise and resubmit the patch, but if the feedback we get >> from the community is that this is still broken in other ways and is going >> to need a lot more work before it's acceptable, we're going to give up on it >> and revert the previous version of the patch locally too. We have a lot of >> higher-priority patches in our local tree that we'd like to get on mainline, >> and limited resources for working on it, so we need to pick our battles. >> That's all. :-) > > I think the local arrays patch makes sense, if it does not hurt > performance. We had another recent case where increasing GCC's > knowledge about the alignment of memory returned by malloc allowed > additional vectorization opportunities, but hurt performance because > of bad spilling choices by GCC RA. This alignment patch may expose > similar RA problems. We may need to apply the patch with the > optimization disabled until the RA spilling problem is fixed. > > Increasing the alignment of arrays within structs and unions would be > nice, but that probably will change the ABI. I think that they best we > may be able to do is increase the alignment if the array is the first > element of the struct or union, see ROUND_TYPE_ALIGN for AIX. > Although this might be more trouble than it is worth. Maybe the idea was to increase the alignment of the struct (without affecting it's layout) when that increases the alignment of a contained array member. Like for struct { int i; int j; float a[1024]; int x; }; where aligning to sizeof (int) * 2 would get 'a' a bigger alignment. Richard. > Pat or Bill, can you test the performance of the array alignment patch? > > Thanks, David
Re: [Patch] Extend script ./contrib/download_prerequisites usage for isl and cloog
Il 18/05/2013 04:37, Chung-Ju Wu ha scritto: > Hi all, > > Using current trunk repository, it is now able to build > compiler with in-tree isl and cloog. > > This patch is to extend ./contrib/download_prerequisites > usage to download isl and cloog conditionally in case > people would like to build gcc with graphite loop optimizations. > > OK for the trunk? > > > contrib/ChangeLog > > 2013-05-18 Chung-Ju Wu > > * download_prerequisites: Download isl and cloog conditionally. > > > > Index: contrib/download_prerequisites > === > --- contrib/download_prerequisites (revision 199006) > +++ contrib/download_prerequisites (working copy) > @@ -19,6 +19,12 @@ > # You should have received a copy of the GNU General Public License > # along with this program. If not, see http://www.gnu.org/licenses/. > > +# If you want to build GCC with the Graphite loop optimizations, > +# set GRAPHITE_LOOP_OPT=yes to download optional prerequisties > +# ISL Library and CLooG. > +GRAPHITE_LOOP_OPT=no > + > +# Necessary to build GCC. > MPFR=mpfr-2.4.2 > GMP=gmp-4.3.2 > MPC=mpc-0.8.1 > @@ -36,3 +42,19 @@ > ln -sf $MPC mpc || exit 1 > > rm $MPFR.tar.bz2 $GMP.tar.bz2 $MPC.tar.gz || exit 1 > + > +# Necessary to build GCC with the Graphite loop optimizations. > +if [ "$GRAPHITE_LOOP_OPT" == "yes" ] ; then > + ISL=isl-0.11.1 > + CLOOG=cloog-0.18.0 > + > + wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1 > + tar xjf $ISL.tar.bz2 || exit 1 > + ln -sf $ISL isl || exit 1 > + > + wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$CLOOG.tar.gz || exit 1 > + tar xzf $CLOOG.tar.gz || exit 1 > + ln -sf $CLOOG cloog || exit 1 > + > + rm $ISL.tar.bz2 $CLOOG.tar.gz || exit 1 I would leave the tar files in place. Otherwise ok. Paolo > +fi > > > Best regards, > jasonwucj >
[patch] Default to --enable-libstdcxx-time=auto
This alters the configure script to enable C++11 thread library features based on targets that are known to support the features, rather than based on link tests which are disabled by default. With Glibc 2.17 this enables a nanosecond resolution std::system_clock in the default configuration, yay! I've tested this on two versions of Fedora and Debian, but would be grateful for test results on Solaris, Cygwin and BSD targets, and for cross-compilers to any of those targets. * acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Add KIND=auto to enable features if target OS is known to support them. * configure.ac (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Default to 'auto'. * configure: Regenerate. Tested x86_64-linux, committed to trunk. commit 47e011c33d78a00015265be39f2845209eeb97fc Author: Jonathan Wakely Date: Sun Feb 3 20:45:51 2013 + * acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Add KIND=auto to enable features if target OS is known to support them. * configure.ac (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Default to 'auto'. * configure: Regenerate. diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index 619fff0..efeb6d4 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -871,7 +871,8 @@ dnl(FEATURE, DEFAULT, HELP-ARG, HELP-STRING) dnl(FEATURE, DEFAULT, HELP-ARG, HELP-STRING, permit a|b|c) dnl(FEATURE, DEFAULT, HELP-ARG, HELP-STRING, SHELL-CODE-HANDLER) dnl -dnl See docs/html/17_intro/configury.html#enable for documentation. +dnl See manual/appendix_porting.html#appendix.porting.build_hacking for +dnl documentation. dnl m4_define([GLIBCXX_ENABLE],[dnl m4_define([_g_switch],[--enable-$1])dnl @@ -1161,8 +1162,9 @@ dnlnanosleep and sched_yield in libc and libposix4 and, if needed, dnllinks in the latter. dnl --enable-libstdcxx-time=rt dnlalso searches (and, if needed, links) librt. Note that this is -dnlnot always desirable because, in glibc, for example, in turn it -dnltriggers the linking of libpthread too, which activates locking, +dnlnot always desirable because, in glibc 2.16 and earlier, for +dnlexample, in turn it triggers the linking of libpthread too, +dnlwhich activates locking, dnla large overhead for single-thread programs. dnl --enable-libstdcxx-time=no dnl --disable-libstdcxx-time @@ -1175,8 +1177,7 @@ dnl os_defines.h and also defines _GLIBCXX_USE_SCHED_YIELD. dnl AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME], [ - AC_MSG_CHECKING([for clock_gettime, nanosleep and sched_yield]) - GLIBCXX_ENABLE(libstdcxx-time,$1,[[[=KIND]]], + GLIBCXX_ENABLE(libstdcxx-time,auto,[[[=KIND]]], [use KIND for check type], [permit yes|no|rt]) @@ -1188,9 +1189,59 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME], [ ac_has_clock_monotonic=no ac_has_clock_realtime=no - AC_MSG_RESULT($enable_libstdcxx_time) + ac_has_nanosleep=no + ac_has_sched_yield=no + + if test x"$enable_libstdcxx_time" = x"auto"; then + +case "${target_os}" in + cygwin*) +ac_has_nanosleep=yes +;; + darwin*) +ac_has_nanosleep=yes +ac_has_sched_yield=yes +;; + gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu) +AC_MSG_CHECKING([for at least GNU libc 2.17]) +AC_TRY_COMPILE( + [#include ], + [ + #if ! __GLIBC_PREREQ(2, 17) + #error + #endif + ], + [glibcxx_glibc217=yes], [glibcxx_glibc217=no]) +AC_MSG_RESULT($glibcxx_glibc217) + +if test x"$glibcxx_glibc217" = x"yes"; then + ac_has_clock_monotonic=yes + ac_has_clock_realtime=yes +fi +ac_has_nanosleep=yes +ac_has_sched_yield=yes +;; + freebsd*|netbsd*) +ac_has_clock_monotonic=yes +ac_has_clock_realtime=yes +ac_has_nanosleep=yes +ac_has_sched_yield=yes +;; + openbsd*) +ac_has_clock_monotonic=yes +ac_has_clock_realtime=yes +ac_has_nanosleep=yes +;; + solaris*) +GLIBCXX_LIBS="$GLIBCXX_LIBS -lrt" +ac_has_clock_monotonic=yes +ac_has_clock_realtime=yes +ac_has_nanosleep=yes +ac_has_sched_yield=yes +;; +esac - if test x"$enable_libstdcxx_time" != x"no"; then + elif test x"$enable_libstdcxx_time" != x"no"; then if test x"$enable_libstdcxx_time" = x"rt"; then AC_SEARCH_LIBS(clock_gettime, [rt posix4]) @@ -1214,19 +1265,16 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME], [ case "$ac_cv_search_sched_yield" in -lposix4*) GLIBCXX_LIBS="$GLIBCXX_LIBS $ac_cv_search_sched_yield" - AC_DEFINE(_GLIBCXX_USE_SCHED_YIELD, 1, - [ Defined if sched_yield is available. ]) + ac_has_sched_yield=yes ;; -lrt*) if test x"$enable_libstdcxx_time" = x"rt"; then GLIBCXX_LIBS="$GLIBCXX_LIBS $ac_cv_search_sched_
Re: [patch] Preserve the CFG until after final
> That is only partially true. Currently the transition is already de > facto going on: Just look at how many back ends use > compute_bb_for_insn to re-initialize the BLOCK_FOR_INSN pointers right > after pass_free_cfg (it's usually the first thing they do in the > machine-reorg pass). Yes, and we should do something about it. Btw, why is the line removed from ia64_reorg in the patch (and not mentioned in the ChangeLog)? > Some ports never call free_bb_or_insn after that, > and expect BLOCK_FOR_INSN to be valid in 'final'. One of those ports > is i386, look at where BLOCK_FOR_INSN is used in i386.c (in functions > deciding what asm output templates to use). AFAICS it's only used for splitters. And using BLOCK_FOR_INSN after the last split pass (pass_split_for_shorten_branches) is dubious. Here's the list: ./frv/frv.c: || BLOCK_FOR_INSN (insn) == ce_info->else_bb Only used during if-conversion. ./rs6000/rs6000.c: bb = BLOCK_FOR_INSN (label); Only used during compute_alignments. ./spu/spu.c: bitmap_set_bit (blocks, BLOCK_FOR_INSN (branch)->index); Only used during md_reorg. ./c6x/c6x.c: BLOCK_FOR_INSN (bundle) = BLOCK_FOR_INSN (slot[0]); Only used during md_reorg. ./mips/mips.c: basic_block bb = BLOCK_FOR_INSN (insn); ./mips/mips.c: /* Restore the BLOCK_FOR_INSN pointers, which are needed by DF. Also during Only used for splitting decision, deal with null BLOCK_FOR_INSN. ./i386/i386.c: basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL; ./i386/i386.c: basic_block bb = BLOCK_FOR_INSN (insn); ./i386/i386.c: basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL; ./i386/i386.c: basic_block bb = BLOCK_FOR_INSN (insn); ./i386/i386.c: basic_block bb = BLOCK_FOR_INSN (insn); ./i386/i386.c: rtx start = BB_HEAD (BLOCK_FOR_INSN (insn)); ./i386/i386.c: basic_block bb = BLOCK_FOR_INSN (insn); Only used for splitting decision (and scheduling). > Also, right now I'm stuck with a chicken-and-egg problem: dbr_schedule > is not CFG-aware, but my still slowly progressing work on a > replacement can't have a CFG because it has to run after machine-reorg > passes, and therefore after pass_free_cfg. That's why I'm suggesting to get rid of or modify pass_free_cfg so that the CFG is still usable up to the point where it is really destroyed. > > I think that an incremental step would be to allow the machine reorg > > pass to use the CFG, even if it doesn't maintain it. > > That is the current state of things already. No, as you noted earlier, because of pass_free_cfg. > But I need this exactly for that reason: To remove that blocker! :-) I don't think so. The goal for now shouldn't be to "preserve the CFG until after final" since we know it isn't feasible in the short term, but rather to preserve the CFG as long as possible. So what about doing the following instead: 1. Remove pass_free_cfg from the pipeline, 2. Remove compute_bb_for_insn from all the md_reorg passes that call it, 3. (Optionally) Do pass_free_cfg from all the md_reorg passes that don't, 4. Do pass_free_cfg at the beginning of pass_delay_slots, 5. Do pass_free_cfg at the end of pass_split_for_shorten_branches, -- Eric Botcazou
Re: [fortran, doc] Improve random_seed example
On Tue, May 21, 2013 at 2:50 AM, Steve Kargl wrote: > On Fri, May 17, 2013 at 03:18:01PM +0300, Janne Blomqvist wrote: >> Hi, >> >> the example we provide for the usage of the random_seed intrinsic >> could be better. At least one user has already been tripped over by >> the fact that on some targets the first call to system_clock returns >> 0, resulting in a poor seed. Below is an improved(?!) example program. >> >> Ok for trunk/4.8/4.7? (Changelog + patch is of course trivial once >> there is agreement on the code itself) >> > > Looks OK to me. > > It should probably also be noted in the manual that one can get > real poor results if one does not set all seed values. Thanks. Attached is the patch I committed following 'make pdf' testing. Committed to trunk/4.8 (after Jakub's approval on IRC)/4.7. -- Janne Blomqvist rseed.diff Description: Binary data
Re: [C++ Patch] More accurate location for conditional expressions
Andreas Schwab ha scritto: >Paolo Carlini writes: > >> * call.c (build_conditional_expr_1): Add location_t parameter. >> (build_conditional_expr): Likewise. Argh, I'll fix it momentarily sorry. I admit I forgot to enable objc and obj-c++ when testing. Paolo
Re: RFA: Use gen_int_mode in plus_constant
Andreas Krebbel writes: > On 21/05/13 11:26, Richard Sandiford wrote: >> gcc/ >> * recog.c (offsettable_address_addr_space_p): Fix calculation of >> address mode. Move pointer mode initialization to the same place. > > Thanks! This fixed the failure (and others). Bootstrapped on s390x - no > regressions: Thanks for testing. Also now bootstrapped & regression-tested on x86_64-linux-gnu. OK to install? OK for 4.8 once 4.8.1 is released? Thanks, Richard
Re: RFA: fix rtl-optimization/56833
> The problem was that we had some optimzations added to the > reload_cse_move2add pass that would attempt transformations with > multi-hard-register registers, without keeping track of the validity of the > values in all hard registers involved. That's not clear to me: for example in move2add_note_store, we reset the info about any multi hard-register; moveover 5 arrays are supposed to be maintained for each hard-register: for (i = FIRST_PSEUDO_REGISTER - 1; i >= 0; i--) { reg_set_luid[i] = 0; reg_offset[i] = 0; reg_base_reg[i] = 0; reg_symbol_ref[i] = NULL_RTX; reg_mode[i] = VOIDmode; } so I'm not sure whether we really properly handle multi hard-registers. So what about explicitly punting for multi hard-registers in reload_cse_move2add? Do you think that this would pessimize too much in practice? -- Eric Botcazou
Re: RFA: Use gen_int_mode in plus_constant
On Wed, May 22, 2013 at 11:02 AM, Richard Sandiford wrote: > Andreas Krebbel writes: >> On 21/05/13 11:26, Richard Sandiford wrote: >>> gcc/ >>> * recog.c (offsettable_address_addr_space_p): Fix calculation of >>> address mode. Move pointer mode initialization to the same place. >> >> Thanks! This fixed the failure (and others). Bootstrapped on s390x - no >> regressions: > > Thanks for testing. Also now bootstrapped & regression-tested on > x86_64-linux-gnu. OK to install? OK for 4.8 once 4.8.1 is released? Ok. Thanks, Richard. > Thanks, > Richard >
Re: [patch] install host specific {bits,ext}/opt_random.h headers in host specific c++ incdir
On 05/21/2013 10:25 AM, Matthias Klose wrote: Am 19.05.2013 11:40, schrieb Paolo Carlini: On 05/19/2013 11:35 AM, Andreas Schwab wrote: Tests that now fail, but worked before: Thanks Andreas. Matthias, please revert ASAP, thanks. you already did that. Looks like ext/random includes opt_random.h, which is not on any include dir, so make it ext/opt_random.h. tests all pass, and check the include with an installed version too. Ok, if nobody has further comments over the next day or so, let's try again ;) Thanks, Paolo.
Re: [PATCH] Add explicit default constructors where required by the standard
OK to merge to google/4_7 and google/4_8? On Wed, May 15, 2013 at 12:28 PM, Evgeniy Stepanov wrote: > Thanks! > > On Wed, May 15, 2013 at 3:05 AM, Jonathan Wakely > wrote: >> On 14 May 2013 10:56, Jonathan Wakely wrote: >>> On 14 May 2013 10:45, Evgeniy Stepanov wrote: This must have fallen through the cracks. >>> >>> It's still in my Git branch at home. I've been too busy to push any >>> commits recently, but I haven't forgotten it. >>> >>> I realized we also need it in the 4_7 branch. Could you backport the change there, too, if it is not too much trouble? >>> >>> Yes, for such a small, safe change I'm happy to apply it to the 4.7 branch >>> too. >> >> Done.
Re: [patch] Default to --enable-libstdcxx-time=auto
Hi Jon, On 05/22/2013 10:14 AM, Jonathan Wakely wrote: This alters the configure script to enable C++11 thread library features based on targets that are known to support the features, rather than based on link tests which are disabled by default. With Glibc 2.17 this enables a nanosecond resolution std::system_clock in the default configuration, yay! I've tested this on two versions of Fedora and Debian, but would be grateful for test results on Solaris, Cygwin and BSD targets, and for cross-compilers to any of those targets. * acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Add KIND=auto to enable features if target OS is known to support them. * configure.ac (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Default to 'auto'. * configure: Regenerate. Tested x86_64-linux, committed to trunk. Thanks a lot! Could you please double check abi_check? On this glibc 2.17 machine I'm seeing: 1 incompatible symbols 0 _ZNSt6chrono12steady_clock3nowEv std::chrono::steady_clock::now() version status: incompatible GLIBCXX_3.4.17 type: function status: added I'm going to try again, just in case I goofed something. Thanks! Paolo.
[PATCH] Fix PR57349
The following fixes FDO on setjmp using functions. We may not blindly split blocks before setjmp receivers as that disconnects the PHIs and will end up creating a bogus CFG when IPA inlining then creates new abnormal edges to the new setjmp block - we're going to have a hard time coalescing SSA names that way. Fixed by following what the SJLJ machinery does - not split blocks before ECF_RETURNS_TWICE. Bootstrap / regtest running on x86_64-unknown-linux-gnu (253.perlbmk with FDO passed with this patch). Richard. 2013-05-22 Richard Biener PR middle-end/57349 * profile.c (branch_prob): Do not split blocks that are abnormally receiving from ECF_RETURNS_TWICE functions. Index: gcc/profile.c === *** gcc/profile.c (revision 199138) --- gcc/profile.c (working copy) *** branch_prob (void) *** 1085,1102 or __builtin_setjmp_dispatcher calls. These are very special and don't expect anything to be inserted before them. */ ! if (!is_gimple_call (first) ! || (fndecl = gimple_call_fndecl (first)) == NULL ! || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL ! || (DECL_FUNCTION_CODE (fndecl) != BUILT_IN_SETJMP_RECEIVER ! && (DECL_FUNCTION_CODE (fndecl) ! != BUILT_IN_SETJMP_DISPATCHER))) ! { ! if (dump_file) ! fprintf (dump_file, "Splitting bb %i after labels\n", !bb->index); ! split_block_after_labels (bb); ! } } } } --- 1085,1104 or __builtin_setjmp_dispatcher calls. These are very special and don't expect anything to be inserted before them. */ ! if (is_gimple_call (first) ! && (((fndecl = gimple_call_fndecl (first)) != NULL ! && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL ! && (DECL_FUNCTION_CODE (fndecl) ! == BUILT_IN_SETJMP_RECEIVER ! || (DECL_FUNCTION_CODE (fndecl) ! == BUILT_IN_SETJMP_DISPATCHER))) ! || gimple_call_flags (first) & ECF_RETURNS_TWICE)) ! continue; ! ! if (dump_file) ! fprintf (dump_file, "Splitting bb %i after labels\n", !bb->index); ! split_block_after_labels (bb); } } }
RE: [PATCH][gensupport] Add optional attributes field to define_cond_exec
> > We have a new define_subst which may help here. I *think* that > > define_cond_exec is (or is nearly) a subset of define_subst. On my > > medium term > > to-do list is to support define_cond_exec within gensupport via the > > define_subst infrastructure, removing everything except the actual > > parsing of > > define_cond_exec. > > An clearer example of what I'm trying to achieve can be found here: > http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01139.html > > I've been trying to wrap my head around define_subst, and it was > considered > initially, but from the online documentation > (http://gcc.gnu.org/onlinedocs/gccint/Define-Subst.html#Define-Subst) > I couldn't think of a way to use it. > > What I want essentially, is to disable the cond_exec variant of an insn > based on some variable. > This stems from the fact that the "predicable" attribute cannot depend > on > anything else > (has to be a compile-compile time constant). > > > > > Your "predicated" attribute could be mapped to a define_subst > > substitution. > > You could expose this via a true attribute, but I'm not 100% certain > > that you'd > > need to do so. > > > > I havn't thought about all of the ramifications here, but I'd be > > interested to > > hear about your thoughts or experiments down this line. So, thinking about it a bit more... I can see how define_subst is a generalised version of define_cond_exec, but what I need in my case is more fine-grained control over the alternatives. I'd like to say, for example, that the predicated version of the 3rd alternative is disabled for, say, TARGET_RESTRICT_CE. I don't think define_subst would allow me to do this in its current form. From what I understand, using define_subst would mean creating a define_subst for every pattern that can be "predicable"? There are at least 600 predicable patterns in the arm backend, so that would be infeasible. Kyrill
Unreviewed libffi patch
The following libffi patch [libffi] Fix several libffi testsuite failures on 64-bit SPARC and PowerPC (PR libffi/56033) http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00840.html has remained unreviewed for a week. It needs a libffi maintainer, which I suppose Anthony still is, despite not being listed in MAINTAINERS. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [patch] Default to --enable-libstdcxx-time=auto
On 22 May 2013 10:47, Paolo Carlini wrote: > > Could you please double check abi_check? On this glibc 2.17 machine I'm > seeing: > > 1 incompatible symbols > 0 > _ZNSt6chrono12steady_clock3nowEv > std::chrono::steady_clock::now() > version status: incompatible > GLIBCXX_3.4.17 > type: function > status: added > > I'm going to try again, just in case I goofed something. No, I don't think you goofed, I see it too on glibc 2.17, sorry about that. How should we handle that export? It is available on older systems without glibc 2.17 and with older GCC versions if you configure with --enable-libstdcxx-time=rt, so it wouldn't be correct to just add it to the GLIBCXX_3.4.19 version.
Re: [patch] Default to --enable-libstdcxx-time=auto
Hi, On 05/22/2013 11:59 AM, Jonathan Wakely wrote: On 22 May 2013 10:47, Paolo Carlini wrote: Could you please double check abi_check? On this glibc 2.17 machine I'm seeing: 1 incompatible symbols 0 _ZNSt6chrono12steady_clock3nowEv std::chrono::steady_clock::now() version status: incompatible GLIBCXX_3.4.17 type: function status: added I'm going to try again, just in case I goofed something. No, I don't think you goofed, I see it too on glibc 2.17, sorry about that. How should we handle that export? It is available on older systems without glibc 2.17 and with older GCC versions if you configure with --enable-libstdcxx-time=rt, I don't think it could be available on older systems without something like --enable-libstdcxx-time=rt on the command line. That is, without a non-default configure switch. Anyway, indeed it could be available, depending on the configuration and we wanted it to be exported, in case. I think there isn't a risk of real, substantive abi-breakage here, the issue is largely about the way we are testing abi_check. I bet Jakub has ideas about the best way to approach this, like with the alias mechanism?!? Essentially we want to arrange things in such a way that this specific symbol is fine to be optionally exported at *any* version >= GLIBCXX_3.4.17 (thus, no abi_check error), depending on the configure switch. Paolo.
Re: [patch] Default to --enable-libstdcxx-time=auto
.. an idea I have got: we could arrange for defining the symbol unconditionally, even when isn't really meaningful in that clock configuration, and therefore exporting it uncondtionally, a dummy implementation in many configuration. We also have at the same time to regenerate the baselines to include it, not a big deal. Maybe Rainer can also help... he pays a lot of attention to this kind of issue. Paolo.
Re: rtl expansion without zero/sign extension based on VRP
On Fri, May 17, 2013 at 1:40 PM, Kugan wrote: > On 13/05/13 17:47, Richard Biener wrote: >> >> On Mon, May 13, 2013 at 5:45 AM, Kugan >> wrote: >>> >>> Hi, >>> >>> This patch removes some of the redundant sign/zero >>> extensions using value ranges informations generated by VRP. >>> >>> When GIMPLE_ASSIGN stmts with LHS type smaller than word is expanded to >>> rtl, if we can prove that RHS expression value can always fit in LHS >>> type and there is no sign conversion, truncation and extension to fit >>> the type is redundant. Subreg and Zero/sign extensions are therefore >>> redundant. >>> >>> For example, when an expression is evaluated and it's value is assigned >>> to variable of type short, the generated rtl would look something like >>> the following. >>> >>> (set (reg:SI 110) >>>(zero_extend:SI (subreg:HI (reg:SI 117) 0))) >>> >>> However, if during value range propagation, if we can say for certain >>> that the value of the expression which is present in register 117 is >>> within the limits of short and there is no sign conversion, we don’t >>> need to perform the subreg and zero_extend; instead we can generate the >>> following rtl. >>> >>> (set (reg:SI 110) >>>(reg:SI 117))) >>> >>> Attached patch adds value range information to gimple stmts during value >>> range propagation pass and expands rtl without subreg and zero/sign >>> extension if the value range is within the limits of it's type. >>> >>> This change improve the geomean of spec2k int benchmark with ref by about >>> ~3.5% on an arm chromebook. >>> >>> Tested on X86_64 and ARM. >>> >>> I would like review comments on this. >> >> >> A few comments on the way you preserve this information. >> > > Thanks Richard for reviewing it. > > >> --- a/gcc/gimple.h >> +++ b/gcc/gimple.h >> @@ -191,6 +191,11 @@ struct GTY((chain_next ("%h.next"))) >> gimple_statement_base { >>in there. */ >> unsigned int subcode : 16; >> >> + /* if an assignment gimple statement has RHS expression that can fit >> + LHS type, zero/sign extension to truncate is redundant. >> + Set this if we detect extension as redundant during VRP. */ >> + unsigned sign_zero_ext_redundant : 1; >> + >> >> this enlarges all gimple statements by 8 bytes, so it's out of the >> question. >> >> My original plan to preserve value-range information was to re-use >> SSA_NAME_PTR_INFO which is where we already store value-range >> information for pointers: >> >> struct GTY(()) ptr_info_def >> { >>/* The points-to solution. */ >>struct pt_solution pt; >> >>/* Alignment and misalignment of the pointer in bytes. Together >> align and misalign specify low known bits of the pointer. >> ptr & (align - 1) == misalign. */ >> >>/* When known, this is the power-of-two byte alignment of the object >> this >> pointer points into. This is usually DECL_ALIGN_UNIT for decls and >> MALLOC_ABI_ALIGNMENT for allocated storage. When the alignment is >> not >> known, it is zero. Do not access directly but use functions >> get_ptr_info_alignment, set_ptr_info_alignment, >> mark_ptr_info_alignment_unknown and similar. */ >>unsigned int align; >> >>/* When alignment is known, the byte offset this pointer differs from >> the >> above alignment. Access only through the same helper functions as >> align >> above. */ >>unsigned int misalign; >> }; >> >> what you'd do is to make the ptr_info_def reference from tree_ssa_name a >> reference to a union of the above (for pointer SSA names) and an alternate >> info like >> >> struct range_info_def { >>double_int min; >>double_int max; >> }; >> > > I have now changed the way I am preserving this information as you have > suggested. > > >> or a more compressed format (a reference to a mode or type it fits for >> example). >> >> The very specific case in question also points at the fact that we have >> two conversion tree codes, NOP_EXPR and CONVERT_EXPR, and we've >> tried to unify them (but didn't finish up that task)... >> > > We can also remove sign/zero extension in cases other than NOP_EXPR and > CONVERT_EXPR, if the value range of the RHS expressions fits LHS type > without sign change. > > For example, please look at the following gimple stmt and the resulting rtl: > > ;; c_3 = c_2(D) & 15; > > ;; Without the patch > (insn 6 5 7 (set (reg:SI 116) > (and:SI (reg/v:SI 115 [ c ]) > (const_int 15 [0xf]))) c5.c:4 -1 >(nil)) > > (insn 7 6 0 (set (reg/v:SI 111 [ c ]) > (zero_extend:SI (subreg:QI (reg:SI 116) 0))) c5.c:4 -1 >(nil)) > > ;; with the patch > (insn 6 5 7 (set (reg:SI 116) > (and:SI (reg/v:SI 115 [ c ]) > (const_int 15 [0xf]))) c5.c:4 -1 >(nil)) > > (insn 7 6 0 (set (reg/v:SI 111 [ c ]) > (reg:SI 116)) c5.c:4 -1 >(nil)) > > > >> +static void >> +process_stmt_for_ext_elimination () >> +{ >> >> we alread
Re: [patch] Default to --enable-libstdcxx-time=auto
.. put an abort or something in the dummy implementations, to make sure people don't try to run an executable built with headers which have the clock available linked at run time to a .so which doesn't, really. Suboptimal solution because ideally we would like to fail at build time, but this is the best I have got so far. Be very clear in the docs, anyway. Paolo.
[ARM] fix PR debug/57351 ICE: internal compiler error: in dbx_reg_number,
Hello, arm_dwarf_register_span converts regno to a dbx register number while building the PARALLEL rtx. But since http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01131.html this information is centralized in DBX_REGISTER_NUMBER that will be called when processing the operands in reg_loc_descriptor, so the DBX conversion information doesn't need to be duplicated. Build and regtested with a arm-none-eabi newlib toolset configured with --with-fpu=neon-vfpv4 --with-float=hard --with-arch=armv7-a OK for trunk ? Thanks Christian 2013-05-22 Christian Bruel PR debug/57351 * config/arm/arm.c (arm_dwarf_register_span): Do not use dbx number. 2013-05-22 Christian Bruel PR debug/57351 * gcc.dg/debug/pr57351.c: New test Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c (revision 199179) +++ gcc/config/arm/arm.c (working copy) @@ -25861,9 +25861,8 @@ arm_dwarf_register_span (rtx rtl) nregs = GET_MODE_SIZE (GET_MODE (rtl)) / 8; p = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (nregs)); - regno = (regno - FIRST_VFP_REGNUM) / 2; for (i = 0; i < nregs; i++) -XVECEXP (p, 0, i) = gen_rtx_REG (DImode, 256 + regno + i); +XVECEXP (p, 0, i) = gen_rtx_REG (DImode, regno + i); return p; } Index: gcc/testsuite/gcc.dg/debug/pr57351.c === --- gcc/testsuite/gcc.dg/debug/pr57351.c (revision 0) +++ gcc/testsuite/gcc.dg/debug/pr57351.c (revision 0) @@ -0,0 +1,53 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon } */ +/* { dg-options "-std=c99 -Os -g -march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard" { target { arm-*-* } } } */ + +typedef unsigned int size_t; +typedef int ptrdiff_t; +typedef signed char int8_t ; +typedef signed long long int64_t; +typedef int8_t GFC_INTEGER_1; +typedef GFC_INTEGER_1 GFC_LOGICAL_1; +typedef int64_t GFC_INTEGER_8; +typedef GFC_INTEGER_8 GFC_LOGICAL_8; +typedef ptrdiff_t index_type; +typedef struct descriptor_dimension +{ + index_type lower_bound; + index_type _ubound; +} +descriptor_dimension; +typedef struct { GFC_LOGICAL_1 *base_addr; size_t offset; index_type dtype; descriptor_dimension dim[7];} gfc_array_l1; +typedef struct { GFC_LOGICAL_8 *base_addr; size_t offset; index_type dtype; descriptor_dimension dim[7];} gfc_array_l8; +void +all_l8 (gfc_array_l8 * const restrict retarray, + gfc_array_l1 * const restrict array, + const index_type * const restrict pdim) +{ + GFC_LOGICAL_8 * restrict dest; + index_type n; + index_type len; + index_type delta; + index_type dim; + dim = (*pdim) - 1; + len = ((array)->dim[dim]._ubound + 1 - (array)->dim[dim].lower_bound); + for (n = 0; n < dim; n++) +{ + const GFC_LOGICAL_1 * restrict src; + GFC_LOGICAL_8 result; + { + result = 1; + { + for (n = 0; n < len; n++, src += delta) + { + if (! *src) +{ + result = 0; + break; +} + } + *dest = result; + } + } +} +}
[SPARC] Small tweaks to config/sparc/sol2-unwind.h
This silences a maybe-uninitialized warning, removes an obsolete comment and changes the CFA offset of individual registers to comprise the global CFA offset, thus making it possible to remove the special-casing of signal frames from sparc64_frob_update_context. Tested on SPARC/Solaris, applied on the mainline. 2013-05-22 Eric Botcazou * config/sparc/sol2-unwind.h (sparc64_frob_update_context): Do it for signal frames as well. (MD_FALLBACK_FRAME_STATE_FOR): Do minor cleanups throughout and add the STACK_BIAS to the CFA offset. -- Eric BotcazouIndex: libgcc/config/sparc/sol2-unwind.h === --- libgcc/config/sparc/sol2-unwind.h (revision 199091) +++ libgcc/config/sparc/sol2-unwind.h (working copy) @@ -155,12 +155,10 @@ sparc64_frob_update_context (struct _Unw { /* The column of %sp contains the old CFA, not the old value of %sp. The CFA offset already comprises the stack bias so, when %sp is the - CFA register, we must avoid counting the stack bias twice. Do not - do that for signal frames as the offset is artificial for them. */ + CFA register, we must avoid counting the stack bias twice. */ if (fs->regs.cfa_reg == __builtin_dwarf_sp_column () && fs->regs.cfa_how == CFA_REG_OFFSET - && fs->regs.cfa_offset != 0 - && !fs->signal_frame) + && fs->regs.cfa_offset != 0) { long i; @@ -296,9 +294,8 @@ MD_FALLBACK_FRAME_STATE_FOR (struct _Unw _Unwind_FrameState *fs) { void *pc = context->ra; - struct frame *fp = (struct frame *) context->cfa; - int nframes; void *this_cfa = context->cfa; + int nframes = 0; long new_cfa; void *ra_location, *shifted_ra_location; mcontext_t *mctx; @@ -318,21 +315,22 @@ MD_FALLBACK_FRAME_STATE_FOR (struct _Unw return _URC_NO_REASON; } + /* Do some pattern matching at the return address. */ if (IS_SIGHANDLER (pc, this_cfa, &nframes)) { + struct frame *fp = (struct frame *) this_cfa; struct handler_args { struct frame frwin; ucontext_t ucontext; } *handler_args; ucontext_t *ucp; - /* context->cfa points into the frame after the saved frame pointer and + /* this_cfa points into the frame after the saved frame pointer and saved pc (struct frame). The ucontext_t structure is in the kernel frame after a struct frame. Since the frame sizes vary even within OS releases, we need to walk the stack to get there. */ - for (i = 0; i < nframes; i++) fp = (struct frame *) ((char *)fp->fr_savfp + STACK_BIAS); @@ -340,19 +338,15 @@ MD_FALLBACK_FRAME_STATE_FOR (struct _Unw ucp = &handler_args->ucontext; mctx = &ucp->uc_mcontext; } - - /* Exit if the pattern at the return address does not match the - previous three patterns. */ else return _URC_END_OF_STACK; - new_cfa = mctx->gregs[REG_SP]; /* The frame address is %sp + STACK_BIAS in 64-bit mode. */ - new_cfa += STACK_BIAS; + new_cfa = mctx->gregs[REG_SP] + STACK_BIAS; fs->regs.cfa_how = CFA_REG_OFFSET; fs->regs.cfa_reg = __builtin_dwarf_sp_column (); - fs->regs.cfa_offset = new_cfa - (long) this_cfa; + fs->regs.cfa_offset = new_cfa - (long) this_cfa + STACK_BIAS; /* Restore global and out registers (in this order) from the ucontext_t structure, uc_mcontext.gregs field. */ @@ -372,7 +366,7 @@ MD_FALLBACK_FRAME_STATE_FOR (struct _Unw for (i = 0; i < 16; i++) { fs->regs.reg[i + 16].how = REG_SAVED_OFFSET; - fs->regs.reg[i + 16].loc.offset = i*sizeof(long); + fs->regs.reg[i + 16].loc.offset = i * sizeof(long); } /* Check whether we need to restore FPU registers. */
Re: [patch] Default to --enable-libstdcxx-time=auto
On 22 May 2013 11:30, Paolo Carlini wrote: > .. put an abort or something in the dummy implementations, to make sure > people don't try to run an executable built with headers which have the > clock available linked at run time to a .so which doesn't, really. > Suboptimal solution because ideally we would like to fail at build time, but > this is the best I have got so far. Be very clear in the docs, anyway. Based on feedback for users much prefer to have something undefined than to have it defined but non-functional or causing an abort.
Re: [patch] Default to --enable-libstdcxx-time=auto
On 22 May 2013 11:40, Jonathan Wakely wrote: > On 22 May 2013 11:30, Paolo Carlini wrote: >> .. put an abort or something in the dummy implementations, to make sure >> people don't try to run an executable built with headers which have the >> clock available linked at run time to a .so which doesn't, really. >> Suboptimal solution because ideally we would like to fail at build time, but >> this is the best I have got so far. Be very clear in the docs, anyway. > > Based on feedback for users much prefer to have something > undefined than to have it defined but non-functional or causing an > abort. Could leave the steady_clock::now() symbol "undesignated", like the __once_call and __once_callable symbols that are only present when _GLIBCXX_HAVE_TLS is defined?
Re: RFA: fix rtl-optimization/56833
Quoting Eric Botcazou : The problem was that we had some optimzations added to the reload_cse_move2add pass that would attempt transformations with multi-hard-register registers, without keeping track of the validity of the values in all hard registers involved. That's not clear to me: for example in move2add_note_store, we reset the info about any multi hard-register; moveover 5 arrays are supposed to be maintained for each hard-register: for (i = FIRST_PSEUDO_REGISTER - 1; i >= 0; i--) { reg_set_luid[i] = 0; reg_offset[i] = 0; reg_base_reg[i] = 0; reg_symbol_ref[i] = NULL_RTX; reg_mode[i] = VOIDmode; } so I'm not sure whether we really properly handle multi hard-registers. Initializing all the values is nice to have defined contents, but the values in reg_offset[i] / reg_base_reg[i] / reg_symbol_ref[i] just tell what's in these registers if they are valid, not if they are valid. But I can see that there could be a problem with an earlier value that used to be valid in a multi-hard-register sub register to be considered to be still valid. Setting the mode of every constituent register but the first one (which has the new value recorded) to VOIDmode at the same time as updating reg_set_luid should be sufficent to address this issue. So what about explicitly punting for multi hard-registers in reload_cse_move2add? Do you think that this would pessimize too much in practice? The pass was originally written with word_mode == Pmode targets like the SH in mind, where multi-hard-register values are uninteresting. But for targets like the avr, most or all of the interesting values will be in multi-hard-register registers. 2013-05-22 Joern Rennecke PR rtl-optimization/56833 * postreload.c (move2add_use_add2_insn): Update reg_set_luid and reg_mode for all affected hard registers. (move2add_use_add3_insn): Likewise. Check that all source hard regs have been set by the same set. (reload_cse_move2add): Before concluding that we have a pre-existing value, check that all destination hard registers have been set by the same set. Index: postreload.c === --- postreload.c(revision 199190) +++ postreload.c(working copy) @@ -1749,7 +1749,13 @@ move2add_use_add2_insn (rtx reg, rtx sym } } } - reg_set_luid[regno] = move2add_luid; + int i = hard_regno_nregs[regno][GET_MODE (reg)] -1; + do +{ + reg_set_luid[regno + i] = move2add_luid; + reg_mode[regno + i] = VOIDmode; +} + while (--i >= 0); reg_base_reg[regno] = -1; reg_mode[regno] = GET_MODE (reg); reg_symbol_ref[regno] = sym; @@ -1793,6 +1799,14 @@ move2add_use_add3_insn (rtx reg, rtx sym && reg_symbol_ref[i] != NULL_RTX && rtx_equal_p (sym, reg_symbol_ref[i])) { + int j; + + for (j = hard_regno_nregs[i][reg_mode[i]] - 1; +j > 0 && reg_set_luid[i+j] == reg_set_luid[i];) + j--; + if (j != 0) + continue; + rtx new_src = gen_int_mode (INTVAL (off) - reg_offset[i], GET_MODE (reg)); /* (set (reg) (plus (reg) (const_int 0))) is not canonical; @@ -1835,7 +1849,13 @@ move2add_use_add3_insn (rtx reg, rtx sym if (validate_change (insn, &SET_SRC (pat), tem, 0)) changed = true; } - reg_set_luid[regno] = move2add_luid; + i = hard_regno_nregs[regno][GET_MODE (reg)] -1; + do +{ + reg_set_luid[regno + i] = move2add_luid; + reg_mode[regno + i] = VOIDmode; +} + while (--i >= 0); reg_base_reg[regno] = -1; reg_mode[regno] = GET_MODE (reg); reg_symbol_ref[regno] = sym; @@ -1890,7 +1910,10 @@ reload_cse_move2add (rtx first) /* Check if we have valid information on the contents of this register in the mode of REG. */ - if (reg_set_luid[regno] > move2add_last_label_luid + for (i = hard_regno_nregs[regno][GET_MODE (reg)] - 1; + i > 0 && reg_set_luid[regno + i] == reg_set_luid[regno];) + i--; + if (i == 0 && reg_set_luid[regno] > move2add_last_label_luid && MODES_OK_FOR_MOVE2ADD (GET_MODE (reg), reg_mode[regno]) && dbg_cnt (cse2_move2add)) { @@ -2021,7 +2044,10 @@ reload_cse_move2add (rtx first) /* If the reg already contains the value which is sum of sym and some constant value, we can use an add2 insn. */ - if (reg_set_luid[regno] > move2add_last_label_luid + for (i = hard_regno_nregs[regno][GET_MODE (reg)] - 1; + i > 0 && reg_set_luid[regno + i] == reg_set_luid[regno];) + i--; + if (i == 0 && reg_set_luid[regno] > move2add_last_label_luid && MODES_OK_FOR_MOVE2ADD (GET_MODE (reg), reg_mode[regno]) && reg_base_reg[regno] < 0
Re: [patch] Default to --enable-libstdcxx-time=auto
On 05/22/2013 12:40 PM, Jonathan Wakely wrote: On 22 May 2013 11:30, Paolo Carlini wrote: .. put an abort or something in the dummy implementations, to make sure people don't try to run an executable built with headers which have the clock available linked at run time to a .so which doesn't, really. Suboptimal solution because ideally we would like to fail at build time, but this is the best I have got so far. Be very clear in the docs, anyway. Based on feedback for users much prefer to have something undefined than to have it defined but non-functional or causing an abort. That for sure. But I don't think the situation is exactly the same, to be honest. Note, I don't know what exactly "undesignated" means, I understand you already used that approach, thus don't hesitate to try it here too! Take your time (no pun intended ;) you may want to disable temporarily the auto with a FIXME comment. Paolo.
Re: [Patch, Fortran] PR57338 - add more missing constraint checks for assumed-rank
Le 21/05/2013 20:05, Tobias Burnus a écrit : > That's a follow-up the just committed patch - which came too late in > some cases. > > Build and regtested on x86-64-gnu-linux. > OK for the trunk? > OK, thanks Mikael
Re: [patch] Default to --enable-libstdcxx-time=auto
On Wed, May 22, 2013 at 11:42:45AM +0100, Jonathan Wakely wrote: > On 22 May 2013 11:40, Jonathan Wakely wrote: > > On 22 May 2013 11:30, Paolo Carlini wrote: > >> .. put an abort or something in the dummy implementations, to make sure > >> people don't try to run an executable built with headers which have the > >> clock available linked at run time to a .so which doesn't, really. > >> Suboptimal solution because ideally we would like to fail at build time, > >> but > >> this is the best I have got so far. Be very clear in the docs, anyway. > > > > Based on feedback for users much prefer to have something > > undefined than to have it defined but non-functional or causing an > > abort. > > Could leave the steady_clock::now() symbol "undesignated", like the > __once_call and __once_callable symbols that are only present when > _GLIBCXX_HAVE_TLS is defined? For now() I think it was a mistake not to export it (generally, exports conditional on supposedly non-ABI changing configure options are bad IMHO), can't it just return some perhaps less precise clock or something instead? The problem with having libstdc++.so's without now()@@GLIBCXX_3.4.17 exported in the wild and now new default flag builds having those is quite a big ABI problem, say rpm dependency tracking will not spot those, it only looks at symbol version names, not at individual symbols, so programs that will try to call that symbol might fail only at runtime. If now() can be perhaps with worse precision emulated in configurations not built against glibc 2.17, perhaps best solution would be to add now()@@GLIBCXX_3.4.18 into 4.8.1 (and change all 3.4.18 symbols to 3.4.19) and have now()@GLIBCXX_3.4.17 (note, just one @) as compatibility alias to that. Jakub
[AArch64] Support for CLZ
Hello, This patch adds support to AdvSIMD CLZ instruction and adds tests for the same. Regression test done for aarch64-none-elf with no issues. OK? Regards VP --- gcc/ChangeLog 2013-05-22 Vidya Praveen * config/aarch64/aarch64-simd.md (clzv4si2): Support for CLZ instruction (AdvSIMD). * config/aarch64/aarch64-builtins.c (aarch64_builtin_vectorized_function): Handler for BUILT_IN_CLZ. * config/aarch64/aarch-simd-builtins.def: Entry for CLZ. * testsuite/gcc.target/aarch64/vect-clz.c: New file. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 4fdfe24..2a0e5fd 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -1245,6 +1245,16 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in) return AARCH64_FIND_FRINT_VARIANT (sqrt); #undef AARCH64_CHECK_BUILTIN_MODE #define AARCH64_CHECK_BUILTIN_MODE(C, N) \ + (out_mode == SImode && out_n == C \ + && in_mode == N##Imode && in_n == C) +case BUILT_IN_CLZ: + { +if (AARCH64_CHECK_BUILTIN_MODE (4, S)) + return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_clzv4si]; +return NULL_TREE; + } +#undef AARCH64_CHECK_BUILTIN_MODE +#define AARCH64_CHECK_BUILTIN_MODE(C, N) \ (out_mode == N##Imode && out_n == C \ && in_mode == N##Fmode && in_n == C) case BUILT_IN_LFLOOR: diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index e420173..5134f96 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -49,6 +49,7 @@ BUILTIN_VDQF (UNOP, sqrt, 2) BUILTIN_VD_BHSI (BINOP, addp, 0) VAR1 (UNOP, addp, 0, di) + VAR1 (UNOP, clz, 2, v4si) BUILTIN_VD_RE (REINTERP, reinterpretdi, 0) BUILTIN_VDC (REINTERP, reinterpretv8qi, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 9069a73..82fe1ad 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1611,6 +1611,15 @@ DONE; }) +(define_insn "clz2" + [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w") + (clz:VDQ_BHSI (match_operand:VDQ_BHSI 1 "register_operand" "w")))] + "TARGET_SIMD" + "clz\\t%0., %1." + [(set_attr "simd_type" "simd_cls") + (set_attr "simd_mode" "")] +) + ;; 'across lanes' max and min ops. (define_insn "reduc__" diff --git a/gcc/testsuite/gcc.target/aarch64/vect-clz.c b/gcc/testsuite/gcc.target/aarch64/vect-clz.c new file mode 100644 index 000..8f1fe70 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vect-clz.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -save-temps -fno-inline" } */ + +extern void abort (); + +void +count_lz_v4si (unsigned *__restrict a, int *__restrict b) +{ + int i; + + for (i = 0; i < 4; i++) +b[i] = __builtin_clz (a[i]); +} + +/* { dg-final { scan-assembler "clz\tv\[0-9\]+\.4s" } } */ + +int +main () +{ + unsigned int x[4] = { 0x0, 0x, 0x1, 0x }; + int r[4] = { 32, 16, 15, 0 }; + int d[4], i; + + count_lz_v4si (x, d); + + for (i = 0; i < 4; i++) +{ + if (d[i] != r[i]) + abort (); +} + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
Re: [patch] Default to --enable-libstdcxx-time=auto
On 22 May 2013 12:15, Jakub Jelinek wrote: > > If now() can be perhaps with worse precision emulated in configurations not > built against glibc 2.17, perhaps best solution would be to > add now()@@GLIBCXX_3.4.18 into 4.8.1 (and change all 3.4.18 symbols to > 3.4.19) and have now()@GLIBCXX_3.4.17 (note, just one @) as compatibility > alias to that. The problem for steady_clock::now() isn't one of precision, it's that it isn't steady if we don't use the monotonic clock. We could define a non-steady steady_clock (with the same precision as system_clock) but is that helpful to users? If that's what we want then yes, we can do the symbol versioning you suggest.
Re: Unreviewed libffi patch
Anthony Green writes: > Rainer - sorry, I've been travelling and falling behind on email. > This patch is fine. Please commit it to GCC. I'll put it in the > libffi git tree. No worries, I just saw your vacation note. I've commited to mainline now. Jakub, what should we do about the 4.8 branch? Wait until 4.8.1 is released or apply now? This is a regression from 4.7. Thanks. Rainer > On Wed, May 22, 2013 at 5:58 AM, Andrew Haley wrote: > > On 05/22/2013 10:55 AM, Rainer Orth wrote: > > The following libffi patch > > > > [libffi] Fix several libffi testsuite failures on 64-bit > SPARC and PowerPC (PR libffi/56033) > > http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00840.html > > > > has remained unreviewed for a week. It needs a libffi > maintainer, which > > I suppose Anthony still is, despite not being listed in > MAINTAINERS. > > Forwarding to libffi. > > > > > > -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [ARM] fix PR debug/57351 ICE: internal compiler error: in dbx_reg_number,
On 05/22/13 11:31, Christian Bruel wrote: Hello, arm_dwarf_register_span converts regno to a dbx register number while building the PARALLEL rtx. But since http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01131.html this information is centralized in DBX_REGISTER_NUMBER that will be called when processing the operands in reg_loc_descriptor, so the DBX conversion information doesn't need to be duplicated. Build and regtested with a arm-none-eabi newlib toolset configured with --with-fpu=neon-vfpv4 --with-float=hard --with-arch=armv7-a OK for trunk ? Thanks Christian In the test use /* { dg-options "-std=c99 -Os -g -march=armv7-a" } */ /* { dg-add-options arm_neon } */ instead of /* { dg-options "-std=c99 -Os -g -march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard" { target { arm-*-* } } } */ Ok with that change. regards Ramana
Re: [4.8 PATCH, i386]: Fix PR 57356, SSE2 instructions generated with '-mno-sse2'
On Wed, May 22, 2013 at 10:01 AM, Uros Bizjak wrote: >>> This patch avoids movdqu/movdqa when SSE2 is disabled. Although >>> -mno-sse2 is bordering on ABI violation for 64bit targets, the patch >>> is simple enough to fix movti_internal_rex64 pattern. >> >> If the TImode attr variant isn't valid for !SSE2, I'd find it clearer >> to do >> (match_test "!TARGET_SSE2") (const_string "V4SF") >> much earlier in the cond, before TARGET_AVX case that returns TImode >> (sure, when TARGET_AVX is true, then !TARGET_SSE2 is false), and >> not added using ior to an unrelated condition. > > Yes, this is a good suggestion, I was trying to be consistent with the > movti_internal pattern. > > I will change both in the way you suggested ASAP. I have committed attached patch to 4.8 branch. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 199189) +++ config/i386/i386.md (working copy) @@ -1798,6 +1798,8 @@ (set (attr "mode") (cond [(eq_attr "alternative" "0,1") (const_string "DI") + (not (match_test "TARGET_SSE2")) +(const_string "V4SF") (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") (const_string "V4SF") (and (eq_attr "alternative" "4") @@ -1854,15 +1856,16 @@ [(set_attr "type" "sselog1,ssemov,ssemov") (set_attr "prefix" "maybe_vex") (set (attr "mode") - (cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") + (cond [(not (match_test "TARGET_SSE2")) +(const_string "V4SF") + (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") (const_string "V4SF") (and (eq_attr "alternative" "2") (match_test "TARGET_SSE_TYPELESS_STORES")) (const_string "V4SF") (match_test "TARGET_AVX") (const_string "TI") - (ior (not (match_test "TARGET_SSE2")) - (match_test "optimize_function_for_size_p (cfun)")) + (match_test "optimize_function_for_size_p (cfun)") (const_string "V4SF") ] (const_string "TI")))])
Re: [patch] Default to --enable-libstdcxx-time=auto
On 05/22/2013 01:49 PM, Jonathan Wakely wrote: On 22 May 2013 12:15, Jakub Jelinek wrote: If now() can be perhaps with worse precision emulated in configurations not built against glibc 2.17, perhaps best solution would be to add now()@@GLIBCXX_3.4.18 into 4.8.1 (and change all 3.4.18 symbols to 3.4.19) and have now()@GLIBCXX_3.4.17 (note, just one @) as compatibility alias to that. The problem for steady_clock::now() isn't one of precision, it's that it isn't steady if we don't use the monotonic clock. We could define a non-steady steady_clock (with the same precision as system_clock) but is that helpful to users? If that's what we want then yes, we can do the symbol versioning you suggest. If I understand correctly Jakub's idea, nothing would change from the user point of view in the headers, using now would be still a no-no if the real facility isn't implemented. It would be only a run-time fall back for the situation I mentioned before: the binary is built with headers providing a good clock and then moved to another machine, and dynamically linked to a library which doesn't and then run. We could also spill a warning, at run-time. That kind of solution would be definitely Ok with me. Paolo.
Re: [patch] Default to --enable-libstdcxx-time=auto
On 22 May 2013 12:49, Jonathan Wakely wrote: > On 22 May 2013 12:15, Jakub Jelinek wrote: >> >> If now() can be perhaps with worse precision emulated in configurations not >> built against glibc 2.17, perhaps best solution would be to >> add now()@@GLIBCXX_3.4.18 into 4.8.1 (and change all 3.4.18 symbols to >> 3.4.19) and have now()@GLIBCXX_3.4.17 (note, just one @) as compatibility >> alias to that. > > The problem for steady_clock::now() isn't one of precision, it's that > it isn't steady if we don't use the monotonic clock. We could define > a non-steady steady_clock (with the same precision as system_clock) > but is that helpful to users? If that's what we want then yes, we can > do the symbol versioning you suggest. In fact we were already providing a non-steady steady_clock in the default configuration, we just define it as a typedef for system_clock, so we can export steady_clock::now() as an alias for system_clock::now() when the two clocks are the same type. When they're not the same type there will be a real symbol present in the .so
Re: [patch] Default to --enable-libstdcxx-time=auto
On Wed, May 22, 2013 at 12:49:44PM +0100, Jonathan Wakely wrote: > On 22 May 2013 12:15, Jakub Jelinek wrote: > > > > If now() can be perhaps with worse precision emulated in configurations not > > built against glibc 2.17, perhaps best solution would be to > > add now()@@GLIBCXX_3.4.18 into 4.8.1 (and change all 3.4.18 symbols to > > 3.4.19) and have now()@GLIBCXX_3.4.17 (note, just one @) as compatibility > > alias to that. > > The problem for steady_clock::now() isn't one of precision, it's that > it isn't steady if we don't use the monotonic clock. We could define > a non-steady steady_clock (with the same precision as system_clock) > but is that helpful to users? If that's what we want then yes, we can > do the symbol versioning you suggest. There are two things. One is offering to use steady_clock::now() at compile time, even if we know the clock isn't steady (here sure, I think preferrably don't offer that), but the second thing is, if you have already linked program which uses steady clock, what is better, not be able to dynamically link it at all (steady_clock use might be in some library and you don't ever call now() in your program), or to abort/std::terminate when now() is called in your program, or you offer non-steady clock instead. Or, have you also considered just using for this routine #if _GLIBCXX_HAS_SYS_SYSCALL_H #include #endif #if defined (SYS_clock_gettime) && defined (CLOCK_MONOTONIC) syscall (SYS_clock_gettime, CLOCK_MONOTONIC, &tp); #endif if clock_gettime isn't available, at least on Linux? The implementation seems to ignore ENOSYS from clock_gettime, so ignoring it even here wouldn't make it worse. Jakub
Re: Unreviewed libffi patch
On Wed, May 22, 2013 at 01:49:47PM +0200, Rainer Orth wrote: > Jakub, what should we do about the 4.8 branch? Wait until 4.8.1 is > released or apply now? This is a regression from 4.7. You can apply it now. Jakub
[Patch, Fortran, committed] PR57364 - add missing gfc_commit_symbol (4.8/4.9 regression)
A rather obvious patch. Committed to the trunk as Rev. 199196 after build+regtesting on x86-64-gnu-linux. I will backport the patch to 4.9 in a while. Tobias 2013-05-22 Tobias Burnus PR fortran/57364 * resolve.c (get_temp_from_expr): Commit created sym. 2013-05-22 Tobias Burnus PR fortran/57364 * gfortran.dg/defined_assignment_6.f90: New. diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 74e0aa4..6f32df8 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -9300,6 +9300,7 @@ get_temp_from_expr (gfc_expr *e, gfc_namespace *ns) gfc_set_sym_referenced (tmp->n.sym); gfc_add_flavor (&tmp->n.sym->attr, FL_VARIABLE, name, NULL); + gfc_commit_symbol (tmp->n.sym); e = gfc_lval_expr_from_sym (tmp->n.sym); /* Should the lhs be a section, use its array ref for the diff --git a/gcc/testsuite/gfortran.dg/defined_assignment_6.f90 b/gcc/testsuite/gfortran.dg/defined_assignment_6.f90 new file mode 100644 index 000..a5666fe --- /dev/null +++ b/gcc/testsuite/gfortran.dg/defined_assignment_6.f90 @@ -0,0 +1,36 @@ +! { dg-do compile } +! +! PR fortran/57364 +! +! Contributed by Damian Rouson +! +module ref_counter_implementation + type ref_counter + contains +procedure :: assign +generic :: assignment(=) => assign + end type +contains + subroutine assign (lhs, rhs) +class (ref_counter), intent(inout) :: lhs +class (ref_counter), intent(in) :: rhs + end subroutine +end module +module foo_parent_implementation + use ref_counter_implementation ,only: ref_counter + type :: foo_parent +type(ref_counter) :: counter + end type +contains + type(foo_parent) function new_foo_parent() + end function +end module +module foo_implementation + use foo_parent_implementation ,only: foo_parent,new_foo_parent + type, extends(foo_parent) :: foo + end type +contains + type(foo) function new_foo() +new_foo%foo_parent = new_foo_parent() + end function +end module
Re: Unreviewed libffi patch
Jakub Jelinek writes: > On Wed, May 22, 2013 at 01:49:47PM +0200, Rainer Orth wrote: >> Jakub, what should we do about the 4.8 branch? Wait until 4.8.1 is >> released or apply now? This is a regression from 4.7. > > You can apply it now. Done, thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [AArch64] Support for CLZ
On 22 May 2013 12:47, Vidya Praveen wrote: > Hello, > > This patch adds support to AdvSIMD CLZ instruction and adds tests for the > same. > Regression test done for aarch64-none-elf with no issues. > > OK? > > Regards > VP > > --- > > gcc/ChangeLog > > 2013-05-22 Vidya Praveen > > * config/aarch64/aarch64-simd.md (clzv4si2): Support for CLZ > instruction (AdvSIMD). > * config/aarch64/aarch64-builtins.c > (aarch64_builtin_vectorized_function): Handler for BUILT_IN_CLZ. > * config/aarch64/aarch-simd-builtins.def: Entry for CLZ. > * testsuite/gcc.target/aarch64/vect-clz.c: New file. OK
Re: [rs6000] Add register save/restore routines for cross
I don't believe those functions should be provided by libgcc, at least not by shared libgcc.so, as explained by Alan. - David
Re: [patch, powerpc] increase array alignment for Altivec
On 05/22/2013 02:01 AM, Richard Biener wrote: On Wed, May 22, 2013 at 3:57 AM, David Edelsohn wrote: Increasing the alignment of arrays within structs and unions would be nice, but that probably will change the ABI. I think that they best we may be able to do is increase the alignment if the array is the first element of the struct or union, see ROUND_TYPE_ALIGN for AIX. Although this might be more trouble than it is worth. Maybe the idea was to increase the alignment of the struct (without affecting it's layout) when that increases the alignment of a contained array member. Like for struct { int i; int j; float a[1024]; int x; }; where aligning to sizeof (int) * 2 would get 'a' a bigger alignment. In fact, this is what the patch I posted earlier this week does, except the alignment it looks for is 16 bytes. In other words, it doesn't change the alignment of arrays inside structs (which would cause all sorts of compatibility problems), but it checks to see whether there is already some array inside the struct aligned on a 16-byte offset so that the whole containing struct object would benefit from being aligned on a 16-byte boundary. This was a generalization of the suggestion from the previous review to align structs that contain an array as the first or only element I realized we'd want to handle unions too, and look at all fields in a union instead of just the first one in a struct, and unifying the two cases made the code come out tidier. :-) -Sandra
Re: [rs6000] Add register save/restore routines for cross
On Wed, May 22, 2013 at 09:40:22AM -0400, David Edelsohn wrote: > I don't believe those functions should be provided by libgcc, at least > not by shared libgcc.so, as explained by Alan. David, I think t-savresfgpr satifies that requirement. # These can't end up in shared libgcc LIB2ADD_ST += \ ... -- Alan Modra Australia Development Lab, IBM
Re: [patch, powerpc] increase array alignment for Altivec
On Wed, May 22, 2013 at 3:57 PM, Sandra Loosemore wrote: > On 05/22/2013 02:01 AM, Richard Biener wrote: >> >> On Wed, May 22, 2013 at 3:57 AM, David Edelsohn wrote: >>> >>> >>> Increasing the alignment of arrays within structs and unions would be >>> nice, but that probably will change the ABI. I think that they best we >>> may be able to do is increase the alignment if the array is the first >>> element of the struct or union, see ROUND_TYPE_ALIGN for AIX. >>> Although this might be more trouble than it is worth. >> >> >> Maybe the idea was to increase the alignment of the struct (without >> affecting it's layout) when that increases the alignment of a contained >> array member. Like for >> >> struct { int i; int j; float a[1024]; int x; }; >> >> where aligning to sizeof (int) * 2 would get 'a' a bigger alignment. > > > In fact, this is what the patch I posted earlier this week does, except the > alignment it looks for is 16 bytes. In other words, it doesn't change the > alignment of arrays inside structs (which would cause all sorts of > compatibility problems), but it checks to see whether there is already some > array inside the struct aligned on a 16-byte offset so that the whole > containing struct object would benefit from being aligned on a 16-byte > boundary. > > This was a generalization of the suggestion from the previous review to > align structs that contain an array as the first or only element I > realized we'd want to handle unions too, and look at all fields in a union > instead of just the first one in a struct, and unifying the two cases made > the code come out tidier. :-) Right. Btw, the code should all be target independent ... just the desired alignment for an array should be queried from it. pass_ipa_increase_alignment and/or the vectorizer code itself should use that info. Richard. > -Sandra >
Re: [rs6000] Add register save/restore routines for cross
Why does cross need the functions in libgcc and not provided by the linker? - David On Wed, May 22, 2013 at 10:00 AM, Alan Modra wrote: > On Wed, May 22, 2013 at 09:40:22AM -0400, David Edelsohn wrote: >> I don't believe those functions should be provided by libgcc, at least >> not by shared libgcc.so, as explained by Alan. > > David, I think t-savresfgpr satifies that requirement. > > # These can't end up in shared libgcc > LIB2ADD_ST += \ > ... > > -- > Alan Modra > Australia Development Lab, IBM
Re: [patch] Default to --enable-libstdcxx-time=auto
Jonathan Wakely writes: > This alters the configure script to enable C++11 thread library > features based on targets that are known to support the features, > rather than based on link tests which are disabled by default. With > Glibc 2.17 this enables a nanosecond resolution std::system_clock in > the default configuration, yay! > > I've tested this on two versions of Fedora and Debian, but would be > grateful for test results on Solaris, Cygwin and BSD targets, and for > cross-compilers to any of those targets. Apart from the abi_check failure already reported, I get the following testsuite regressions on Solaris 10/x86: FAIL: 30_threads/async/54297.cc (test for excess errors) WARNING: 30_threads/async/54297.cc compilation failed to produce executable FAIL: 30_threads/condition_variable_any/53830.cc (test for excess errors) WARNING: 30_threads/condition_variable_any/53830.cc compilation failed to produ e executable FAIL: 30_threads/this_thread/3.cc (test for excess errors) WARNING: 30_threads/this_thread/3.cc compilation failed to produce executable FAIL: 30_threads/this_thread/4.cc (test for excess errors) WARNING: 30_threads/this_thread/4.cc compilation failed to produce executable FAIL: 30_threads/thread/native_handle/cancel.cc (test for excess errors) WARNING: 30_threads/thread/native_handle/cancel.cc compilation failed to produc e executable All of them have the same root cause: Excess errors: Undefined first referenced symbol in file nanosleep /var/tmp//ccQhmiwd.o (symbol belongs to implicit dependency /lib/librt.so.1) ld: fatal: symbol referencing errors. No output written to ./54297.exe collect2: error: ld returned 1 exit status It seems that now every single C++ program needs to be linked with -lrt, not only libstdc++.so. This will also happen on Solaris 9 (bootstrap still running), while on Solaris 11 nanosleep and the others were integrated into libc.so.1. Speaking of Solaris 9, there's another caveat: unlike Solaris 10 and up, CLOCK_MONOTONIC isn't defined, while the equivalent non-standard CLOCK_HIGHRES is. Instead of handling this in libstdc++-v3/src/c++11/chrono.cc directly, I've chosen the following route which allows libstdc++ to build on Solaris 9: 2013-05-22 Rainer Orth * config/os/solaris/solaris2.9/os_defines.h [!CLOCK_MONOTONIC] (CLOCK_MONOTONIC): Define. # HG changeset patch # Parent 079c0691568d1b9ef88a62ebf89ea19d3a8272ab Use CLOCK_HIGHRES on Solaris 9 libstdc++-v3: * config/os/solaris/solaris2.9/os_defines.h [!CLOCK_MONOTONIC] (CLOCK_MONOTONIC): Define. diff --git a/libstdc++-v3/config/os/solaris/solaris2.9/os_defines.h b/libstdc++-v3/config/os/solaris/solaris2.9/os_defines.h --- a/libstdc++-v3/config/os/solaris/solaris2.9/os_defines.h +++ b/libstdc++-v3/config/os/solaris/solaris2.9/os_defines.h @@ -35,5 +35,10 @@ #define __CORRECT_ISO_CPP_WCHAR_H_PROTO #endif +/* Solaris 9 uses the non-standard CLOCK_HIGHRES instead. */ +#ifndef CLOCK_MONOTONIC +#define CLOCK_MONOTONIC CLOCK_HIGHRES #endif +#endif + Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[Patch, Fortran] Create valid temporary variable to avoid assembler errors
With one Fortran file, I get the following assembler errors: /tmp/cc28epKK.s:2075: Error: junk `@1.2304+16' after expression That's due to the way a temporary variable is generated. While that variable is local to the procedure, the name somehow escapes into the assembler file. The dump looks as follows. static struct foo DA@0; static struct universal DA@1; static struct universal DA@2; ... class.30._data = &DA@0.foo_parent.universal.counter; I have now changed the mangling, see attached patch. (The test file uses finalization - hence, I do not include it into the patch. I will include it in the FINAL patch.) Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias 2013-05-22 Tobias Burnus * resolve.c (get_temp_from_expr): Fix temp var mangling. diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 6f32df8..5fabc9a 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -9254,7 +9254,7 @@ get_temp_from_expr (gfc_expr *e, gfc_namespace *ns) gfc_array_ref *aref; gfc_ref *ref; - sprintf (name, "DA@%d", serial++); + sprintf (name, "DA" GFC_PREFIX("%d"), serial++); gfc_get_sym_tree (name, ns, &tmp, false); gfc_add_type (tmp->n.sym, &e->ts, NULL);
Re: [patch, powerpc] increase array alignment for Altivec
On Wed, May 22, 2013 at 07:57:34AM -0600, Sandra Loosemore wrote: > On 05/22/2013 02:01 AM, Richard Biener wrote: > >Maybe the idea was to increase the alignment of the struct (without > >affecting it's layout) when that increases the alignment of a contained > >array member. Like for > > > >struct { int i; int j; float a[1024]; int x; }; > > > >where aligning to sizeof (int) * 2 would get 'a' a bigger alignment. > > In fact, this is what the patch I posted earlier this week does, > except the alignment it looks for is 16 bytes. In other words, it > doesn't change the alignment of arrays inside structs (which would > cause all sorts of compatibility problems), but it checks to see > whether there is already some array inside the struct aligned on a > 16-byte offset so that the whole containing struct object would > benefit from being aligned on a 16-byte boundary. Right, and I like the patch. However it does have a potential ABI problem. For example a shared library that exports a variable affected by this change might do so on a smaller alignment than new code expects. Fiddling with that variable in a new executable might give wrong results. Or the PR56564 case of linking C++ relocatable objects compiled with different compilers might bite us. Note that rs6000.h DATA_ALIGNMENT already assumes alignment greater than the ppc64 ABI guarantees for byte arrays. So we're already guilty of this sin. -- Alan Modra Australia Development Lab, IBM
Re: [rs6000] Add register save/restore routines for cross
On Wed, May 22, 2013 at 10:05:47AM -0400, David Edelsohn wrote: > Why does cross need the functions in libgcc and not provided by the linker? Only the ppc64 linker provides save/restore functions magically. -- Alan Modra Australia Development Lab, IBM
Re: [rs6000] Add register save/restore routines for cross
On Wed, May 22, 2013 at 10:35 AM, Alan Modra wrote: > On Wed, May 22, 2013 at 10:05:47AM -0400, David Edelsohn wrote: >> Why does cross need the functions in libgcc and not provided by the linker? > > Only the ppc64 linker provides save/restore functions magically. Okay, then the patch is okay. Thanks, David
Re: [patch] Default to --enable-libstdcxx-time=auto
On 22 May 2013 15:05, Rainer Orth wrote: > Jonathan Wakely writes: > >> This alters the configure script to enable C++11 thread library >> features based on targets that are known to support the features, >> rather than based on link tests which are disabled by default. With >> Glibc 2.17 this enables a nanosecond resolution std::system_clock in >> the default configuration, yay! >> >> I've tested this on two versions of Fedora and Debian, but would be >> grateful for test results on Solaris, Cygwin and BSD targets, and for >> cross-compilers to any of those targets. > > Apart from the abi_check failure already reported, I get the following > testsuite regressions on Solaris 10/x86: Thanks for checking it. > FAIL: 30_threads/async/54297.cc (test for excess errors) > WARNING: 30_threads/async/54297.cc compilation failed to produce executable > FAIL: 30_threads/condition_variable_any/53830.cc (test for excess errors) > WARNING: 30_threads/condition_variable_any/53830.cc compilation failed to > produ > e executable > FAIL: 30_threads/this_thread/3.cc (test for excess errors) > WARNING: 30_threads/this_thread/3.cc compilation failed to produce executable > FAIL: 30_threads/this_thread/4.cc (test for excess errors) > WARNING: 30_threads/this_thread/4.cc compilation failed to produce executable > FAIL: 30_threads/thread/native_handle/cancel.cc (test for excess errors) > WARNING: 30_threads/thread/native_handle/cancel.cc compilation failed to > produc > e executable > > All of them have the same root cause: > > Excess errors: > Undefined first referenced > symbol in file > nanosleep /var/tmp//ccQhmiwd.o (symbol belongs to > implicit dependency /lib/librt.so.1) > ld: fatal: symbol referencing errors. No output written to ./54297.exe > collect2: error: ld returned 1 exit status > > It seems that now every single C++ program needs to be linked with -lrt, > not only libstdc++.so. This will also happen on Solaris 9 (bootstrap > still running), while on Solaris 11 nanosleep and the others were > integrated into libc.so.1. I see. Would you prefer to disable use of nanosleep on Solaris 9 and 10, or to keep using it and link to librt?
Re: [patch] Default to --enable-libstdcxx-time=auto
Jonathan Wakely writes: >> All of them have the same root cause: >> >> Excess errors: >> Undefined first referenced >> symbol in file >> nanosleep /var/tmp//ccQhmiwd.o (symbol belongs to implicit dependency >> /lib/librt.so.1) >> ld: fatal: symbol referencing errors. No output written to ./54297.exe >> collect2: error: ld returned 1 exit status >> >> It seems that now every single C++ program needs to be linked with -lrt, >> not only libstdc++.so. This will also happen on Solaris 9 (bootstrap >> still running), while on Solaris 11 nanosleep and the others were >> integrated into libc.so.1. > > I see. Would you prefer to disable use of nanosleep on Solaris 9 and > 10, or to keep using it and link to librt? I think it's fine for both C++11 programs and libstdc++.so to depend on librt. Maybe one could restrict linking with -lrt to C++11 mode? But I doubt that: you can probably link both C++99 and C++11 objects into a single executable, and with libstdc++.so already depending on librt, there's not much point in the added complexity. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Fix incorrect discriminator assignment.
Sounds reasonable. The patch is updated, bootstrapped and passed all regression test. Dehao On Tue, May 21, 2013 at 5:34 AM, Richard Biener wrote: > On Fri, May 17, 2013 at 5:48 PM, Dehao Chen wrote: >> On Fri, May 17, 2013 at 1:22 AM, Richard Biener >> wrote: >>> On Wed, May 15, 2013 at 6:50 PM, Cary Coutant wrote: > gcc/ChangeLog: > > * tree-cfg.c (locus_descrim_hasher::hash): Only hash lineno. > (locus_descrim_hasher::equal): Likewise. > (next_discriminator_for_locus): Likewise. > (assign_discriminator): Add return value. > (make_edges): Assign more discriminator if necessary. > (make_cond_expr_edges): Likewise. > (make_goto_expr_edges): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.dg/debug/dwarf2/discriminator.c: New Test. Looks OK to me, but I can't approve patches for tree-cfg.c. Two comments: (1) In the C++ conversion, it looks like someone misspelled "discrim" in "locus_descrim_hasher". (2) Is this worth fixing in trunk when we'll eventually switch to a per-instruction discriminator? >>> This patch fixes a common case where one line is distributed to 3 BBs, but only 1 discriminator is assigned. >>> >>> As far as I can see the patch makes discriminators coarser (by hashing >>> and comparing on LOCATION_LINE and not on the location). It also has >>> the changes like >>> >>> - assign_discriminator (entry_locus, then_bb); >>> + if (assign_discriminator (entry_locus, then_bb)) >>> +assign_discriminator (entry_locus, bb); >>> >>> which is what the comment refers to? I think those changes are >>> short-sighted >>> because what happens if the 2nd assign_discriminator call has exactly the >>> same issue? Especially with the make_goto_expr_edges change there >>> may be multiple predecessors where I cannot see how the change is >>> correct. Yes, the change changes something and thus may fix the testcase >>> but it doesn't change things in a predictable way as far as I can see. >>> >>> So - the change to compare/hash LOCATION_LINE looks good to me, >>> but the assign_discriminator change needs more explaining. >> >> The new discriminator assignment algorithm is: >> >> 1 FOR_EACH_BB: >> 2 FOR_EACH_SUCC_EDGE: >> 3 if (same_line(bb, succ_bb)) >> 4 if (succ_bb has no discriminator) >> 5 succ_bb->discriminator = new_discriminator >> 6 else if (bb has no discriminator) >> 7 bb->discriminator = new_discriminator >> >> Because there are so many places to handle SUCC_EDGE, thus the logic >> line #3, #4 is embedded within assign_discriminator function, and the >> logic in line #6 is encoded as the return value of >> assign_discriminator. > > Hmm, as of current the code is hardly readable while the above makes sense > (well, apart from what happens if both succ and bb already have a > discriminator). > > Can I make you refactor the current code to postpone discriminator assignment > until after make_edges completed, thus, do it in a postprocessing step done > exactly like outlined above? That way also the whole thing, including > the currently global discriminator_per_locus can be localized into a single > function. > > Thanks, > Richard. > > >> Thanks, >> Dehao >> >>> >>> Richard. >>> -cary Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 198891) +++ gcc/tree-cfg.c (working copy) @@ -105,7 +105,7 @@ struct locus_descrim_hasher : typed_free_remove locus; + return LOCATION_LINE (item->locus); } /* Equality function for the locus-to-discriminator map. A and B @@ -114,7 +114,7 @@ locus_descrim_hasher::hash (const value_type *item inline bool locus_descrim_hasher::equal (const value_type *a, const compare_type *b) { - return a->locus == b->locus; + return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus); } static hash_table discriminator_per_locus; @@ -125,11 +125,11 @@ static void factor_computed_gotos (void); /* Edges. */ static void make_edges (void); +static void assign_discriminators (void); static void make_cond_expr_edges (basic_block); static void make_gimple_switch_edges (basic_block); static void make_goto_expr_edges (basic_block); static void make_gimple_asm_edges (basic_block); -static void assign_discriminator (location_t, basic_block); static edge gimple_redirect_edge_and_branch (edge, basic_block); static edge gimple_try_redirect_by_replacing_jump (edge, basic_block); static unsigned int split_critical_edges (void); @@ -231,6 +231,7 @@ build_gimple_cfg (gimple_seq seq) /* Create the edges of the flowgraph. */ discriminator_per_locus.create (13); make_edges (); + assign_discriminators (); cleanup_dead_labels (); discriminator_per_locus.dispose (); } @@ -690,11 +691,7 @@ make_edges (void) fallthru = true; if (fallthru) -{ - make_edge (bb, bb->next_bb, EDGE_FALLTHRU); - if
Re: [PATCH] Add explicit default constructors where required by the standard
On 2013-05-22 05:32 , Evgeniy Stepanov wrote: OK to merge to google/4_7 and google/4_8? Yes. Patches coming from trunk or other release branches need no further approval for backporting. You just need to make sure you don't introduce any regressions, of course. Diego.
[PATCH] Do not allow non-top-level BIT_FIELD_REFs, IMAGPART_EXPRs or REALPART_EXPRs
Hi, earlier this week I asked on IRC whether we could have non-top-level BIT_FIELD_REFs and Richi said that we could. However, when I later looked at SRA code, quite apparently it is not designed to handle non-top-level BIT_FIELD_REFs, IMAGPART_EXPRs or REALPART_EXPRs. So in order to test whether that assumption is OK, I added the following into the gimple verifier and ran bootstrap and testsuite of all languages including Ada and ObjC++ on x86_64. It survived, which makes me wondering whether we do not want it in trunk. What do you think? Martin 2013-05-22 Martin Jambor * tree-cfg.c (verify_types_in_gimple_reference): Do not allow non-top-level BIT_FIELD_REFs, IMAGPART_EXPRs or REALPART_EXPRs. Index: src/gcc/tree-cfg.c === --- src.orig/gcc/tree-cfg.c +++ src/gcc/tree-cfg.c @@ -2858,6 +2858,8 @@ verify_types_in_gimple_min_lval (tree ex static bool verify_types_in_gimple_reference (tree expr, bool require_lvalue) { + bool top_level = true; + while (handled_component_p (expr)) { tree op = TREE_OPERAND (expr, 0); @@ -2944,6 +2946,17 @@ verify_types_in_gimple_reference (tree e return false; } + if (!top_level + && (TREE_CODE (expr) == BIT_FIELD_REF + || TREE_CODE (expr) == IMAGPART_EXPR + || TREE_CODE (expr) == REALPART_EXPR)) + { + error ("non-top-level BIT_FIELD_REF, IMAGPART_EXPR or REALPART_EXPR"); + debug_generic_stmt (expr); + return true; + } + + top_level = false; expr = op; }
RE: [PING]RE: [patch] cilkplus: Array notation for C patch
Hello Richard, Thank you for reviewing my code. Please see my responses below. Thanks, Balaji V. Iyer. > -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Tuesday, May 21, 2013 10:57 PM > To: Iyer, Balaji V > Cc: 'Joseph S. Myers'; 'Aldy Hernandez'; 'gcc-patches' > Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch > > Let me start by saying that I think the patch is generally ok, especially > considering the advice that's already been given. That said... > > > +++ b/gcc/c/c-array-notation.c > > @@ -0,0 +1,3121 @@ > > So, like, are we going to need to replicate 3000 lines to add array notation > to > c++ too? How much of this are we going to be able to share? The overall function names are same, but the components inside it function differs greatly from C and C++. For example, in C++ I can't use build_modify_expr, but build_x_modify_expr. Also, I need to handle overloaded function types, and that requires further checking. Similarly, the trees are different from C and C++, and so I have to do more checks and handle them differently in C++. In my mind, this seem to be the most straight-forward and legible way to do it. > > While I don't think we should make the array notation global, we might be able > to share the code via c-family/. The Ideal way I think would be via > c-gimplify, in > which we don't expand the array notation until later. But even if delaying > the > expansion causes other problems that I havn't thought about, just placing the > common expansion logic in the shared directory would be better. > > > + for (ii = 0; ii < rank ; ii++) > > +{ > > + /* This will create the if statement label. */ > > + if_stmt_label[ii] = build_decl (location, LABEL_DECL, NULL_TREE, > > + void_type_node); > > + DECL_CONTEXT (if_stmt_label[ii]) = current_function_decl; > > + DECL_ARTIFICIAL (if_stmt_label[ii]) = 0; > > + DECL_IGNORED_P (if_stmt_label[ii]) = 1; > > + > > + /* This label statement will point to the loop body. */ > > + body_label[ii] = build_decl (location, LABEL_DECL, NULL_TREE, > > + void_type_node); > > + DECL_CONTEXT (body_label[ii]) = current_function_decl; > > + DECL_ARTIFICIAL (body_label[ii]) = 0; > > + DECL_IGNORED_P (body_label[ii]) = 1; > > + body_label_expr[ii] = build1 (LABEL_EXPR, void_type_node, > > + body_label[ii]) > > ; > > + > > + /* This will create the exit label, i.e. where the while loop will > > branch > > +out of. */ > > + exit_label[ii] = build_decl (location, LABEL_DECL, NULL_TREE, > > + void_type_node); > > + DECL_CONTEXT (exit_label[ii]) = current_function_decl; > > + DECL_ARTIFICIAL (exit_label[ii]) = 0; > > + DECL_IGNORED_P (exit_label[ii]) = 1; > > + exit_label_expr[ii] = build1 (LABEL_EXPR, void_type_node, > > exit_label[ii]); > > +} > > Is there any particular reason why you're open-coding the loop expansion > instead of using existing infrastructure like c_finish_loop? Before I started my implementation, I did briefly consider using c_finish_loop, but adding multiple expressions got a bit complicated. For example, if I have 2 increment/condition expressions (my initial implementation had it but later on I eliminated that need), I had to create a statement list and then add them using append_to_statement_list() and then pass that into c_finish_loop. Also, if I am not mistaken, I still had to create some of the individual labels when using c_finish_loop(). Overall, I didn't find much code-size decrease, and I found this a bit more straight forward for me. I apologize if I am mistaken. > > Yes, the specific example of c_finish_loop goes against the c++ sharing, but > it's > fairly easy to add new interfaces to c-common.h that are implemented in the > two front ends. For C++, I looked into creating FOR_STMT instead. I found this slightly more convenient than c_finish_loop, but decided to stick with this approach for convenience, and I didn't find much code-size decrease. > Are these changes a blocking issue for acceptance into trunk? Or, would it be ok to accept it as-is and then I make these changes at a later date? > > > r~
[PATCH, AArch64] Fix invalid assembler in scalar_intrinsics.c test
The test file scalar_intrinsics.c (in gcc.target/aarch64) is currently compile-only. If you attempt to make it run, as opposed to just generate assembler, you can't because it won't assemble. There are two issues causing trouble here: 1) Use of invalid instruction "mov d0, d1". It should be "mov d0, v1.d[0]". 2) The vdupd_lane_s64 and vdupd_lane_u64 calls are being given a lane that is out of range, which causes invalid assembler output. This patch fixes both, so that we can build on this to make executable test cases for scalar intrinsics. OK for trunk? Cheers, Ian 2013-05-22 Ian Bolton testsuite/ * gcc.target/aarch64/scalar_intrinsics.c (force_simd): Use a valid instruction. (test_vdupd_lane_s64): Pass a valid lane argument. (test_vdupd_lane_u64): Likewise.diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c index 7427c62..16537ce 100644 --- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c +++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c @@ -4,7 +4,7 @@ #include /* Used to force a variable to a SIMD register. */ -#define force_simd(V1) asm volatile ("mov %d0, %d1" \ +#define force_simd(V1) asm volatile ("mov %d0, %1.d[0]" \ : "=w"(V1) \ : "w"(V1)\ : /* No clobbers */); @@ -228,13 +228,13 @@ test_vdups_lane_u32 (uint32x4_t a) int64x1_t test_vdupd_lane_s64 (int64x2_t a) { - return vdupd_lane_s64 (a, 2); + return vdupd_lane_s64 (a, 1); } uint64x1_t test_vdupd_lane_u64 (uint64x2_t a) { - return vdupd_lane_u64 (a, 2); + return vdupd_lane_u64 (a, 1); } /* { dg-final { scan-assembler-times "\\tcmtst\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 2 } } */
Re: [Patch ARM] Fix PR target/19599
This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57372 A fix is forthcoming - this is a dup of PR57340. Ramana -- Roman
Re: *ping* Re: [PATCH, PR preprocessor/42014] Added LAST_SOURCE_COLUMN in while loop
> "Shakthi" == Shakthi Kannan writes: Shakthi> Is the following patch okay for trunk? I still think it needs a test case. I also don't recall -- did you check to see if the column number that is emitted is actually correct? You may want to change the Subject line of your note, as well. The bug is filed against the preprocessor but it would have to be reviewed by a diagnostics maintainer. Tom
Re: [PATCH, AArch64] Fix invalid assembler in scalar_intrinsics.c test
On 22 May 2013 16:18, Ian Bolton wrote: > The test file scalar_intrinsics.c (in gcc.target/aarch64) > is currently compile-only. > > If you attempt to make it run, as opposed to just generate > assembler, you can't because it won't assemble. > > There are two issues causing trouble here: > > 1) Use of invalid instruction "mov d0, d1". >It should be "mov d0, v1.d[0]". > > 2) The vdupd_lane_s64 and vdupd_lane_u64 calls are being given >a lane that is out of range, which causes invalid assembler >output. > > This patch fixes both, so that we can build on this to make > executable test cases for scalar intrinsics. > > OK for trunk? OK /Marcus
Re: [PING]RE: [patch] cilkplus: Array notation for C patch
On 2013-05-21 23:15, Jakub Jelinek wrote: Furthermore, do you want to generate just normal loop out of it, or shouldn't we generate a #pragma omp simd loop out of it instead? Haven't read the spec if array notation guarantees lack of forward/backward loop dependencies and what are the restrictions on the calls you can do inside of array notation. Perhaps the lowering to normal vs. simd loop could depend on whether all called functions in the expression are elemental. The only overlap allowed is exact, i.e. a[0:5] = a[0:5] + b[1:5], but not a[0:5] = a[1:5] + b[1:5]. So, yes, I believe that safelen is essentially unlimited. r~
[Patch ARM] Fix PR57340
Hi, This fixes PR target/57340 a fallout from my PR target/19599 patch. Unfortunately this didn't show up in the testing I did and I'm sorry about the breakage. The problem here is that we choose r3 as a padding register with indirect tailcalls that could use r3 for this purpose. Ofcourse this decision is made when the frame offsets is calculated and r3 isn't in the fusage of the function. The simplest way to fix this was to check if the tailcall in this case is also an indirect tailcall. If so, we don't allow the use of r3 as a padding register. Additionally, in the prologue code we choose an alternate callee save register that is free if r3 is not available so we are free to use this instead of another callee save register. Tested by bootstrapping armv5te and observing it pass well into stage2. Tested by bootstrapping and reg testing Thumb2 configurations with A9 where this problem was also observed on a version prior to breakages from PR57351. Applied. regards Ramana 2013-05-22 Ramana Radhakrishnan PR target/19599 PR target/57340 * config/arm/arm.c (any_sibcall_uses_r3): Rename to .. (any_sibcall_could_use_r3): this and handle indirect calls. (arm_get_frame_offsets): Rename use of any_sibcall_uses_r3.Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 199202) +++ gcc/config/arm/arm.c(working copy) @@ -17568,11 +17568,27 @@ thumb_force_lr_save (void) || df_regs_ever_live_p (LR_REGNUM)); } +/* We do not know if r3 will be available because + we do have an indirect tailcall happening in this + particular case. */ +static bool +is_indirect_tailcall_p (rtx call) +{ + rtx pat = PATTERN (call); + + /* Indirect tail call. */ + pat = XVECEXP (pat, 0, 0); + if (GET_CODE (pat) == SET) +pat = SET_SRC (pat); + + pat = XEXP (XEXP (pat, 0), 0); + return REG_P (pat); +} /* Return true if r3 is used by any of the tail call insns in the current function. */ static bool -any_sibcall_uses_r3 (void) +any_sibcall_could_use_r3 (void) { edge_iterator ei; edge e; @@ -17586,7 +17602,8 @@ any_sibcall_uses_r3 (void) if (!CALL_P (call)) call = prev_nonnote_nondebug_insn (call); gcc_assert (CALL_P (call) && SIBLING_CALL_P (call)); - if (find_regno_fusage (call, USE, 3)) + if (find_regno_fusage (call, USE, 3) + || is_indirect_tailcall_p (call)) return true; } return false; @@ -17753,7 +17770,7 @@ arm_get_frame_offsets (void) /* If it is safe to use r3, then do so. This sometimes generates better code on Thumb-2 by avoiding the need to use 32-bit push/pop instructions. */ - if (! any_sibcall_uses_r3 () + if (! any_sibcall_could_use_r3 () && arm_size_return_regs () <= 12 && (offsets->saved_regs_mask & (1 << 3)) == 0 && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
RE: [PING]RE: [patch] cilkplus: Array notation for C patch
Hi Jakub, Please see my response below. Thanks, Balaji V. Iyer. > -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Wednesday, May 22, 2013 2:15 AM > To: Richard Henderson > Cc: Iyer, Balaji V; 'Joseph S. Myers'; 'Aldy Hernandez'; 'gcc-patches' > Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch > > On Tue, May 21, 2013 at 07:57:10PM -0700, Richard Henderson wrote: > > >+ /* This will create the if statement label. */ > > >+ if_stmt_label[ii] = build_decl (location, LABEL_DECL, NULL_TREE, > > >+ void_type_node); > > >+ DECL_CONTEXT (if_stmt_label[ii]) = current_function_decl; > > >+ DECL_ARTIFICIAL (if_stmt_label[ii]) = 0; > > >+ DECL_IGNORED_P (if_stmt_label[ii]) = 1; > > >+ > > >+ /* This label statement will point to the loop body. */ > > >+ body_label[ii] = build_decl (location, LABEL_DECL, NULL_TREE, > > >+ void_type_node); > > >+ DECL_CONTEXT (body_label[ii]) = current_function_decl; > > >+ DECL_ARTIFICIAL (body_label[ii]) = 0; > > >+ DECL_IGNORED_P (body_label[ii]) = 1; > > >+ body_label_expr[ii] = build1 (LABEL_EXPR, void_type_node, > > >+ body_label[ii]) > > >; > > >+ > > >+ /* This will create the exit label, i.e. where the while loop will > > >branch > > >+out of. */ > > >+ exit_label[ii] = build_decl (location, LABEL_DECL, NULL_TREE, > > >+ void_type_node); > > >+ DECL_CONTEXT (exit_label[ii]) = current_function_decl; > > >+ DECL_ARTIFICIAL (exit_label[ii]) = 0; > > >+ DECL_IGNORED_P (exit_label[ii]) = 1; > > >+ exit_label_expr[ii] = build1 (LABEL_EXPR, void_type_node, > > >exit_label[ii]); > > >+} > > > > Is there any particular reason why you're open-coding the loop > > expansion instead of using existing infrastructure like c_finish_loop? > > > > Yes, the specific example of c_finish_loop goes against the c++ > > sharing, but it's fairly easy to add new interfaces to c-common.h that > > are implemented in the two front ends. > > Furthermore, do you want to generate just normal loop out of it, or shouldn't > we generate a #pragma omp simd loop out of it instead? > Haven't read the spec if array notation guarantees lack of forward/backward > loop dependencies and what are the restrictions on the calls you can do > inside of > array notation. Perhaps the lowering to normal vs. simd loop could depend on > whether all called functions in the expression are elemental. What you mentioned is a good performance optimization. It is something I have planned to do in future. But, the spec does not require the loop (or array notation expansion) to be vectorized. > > Jakub
RE: [PING]RE: [patch] cilkplus: Array notation for C patch
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Wednesday, May 22, 2013 11:30 AM > To: Jakub Jelinek > Cc: Iyer, Balaji V; 'Joseph S. Myers'; 'Aldy Hernandez'; 'gcc-patches' > Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch > > On 2013-05-21 23:15, Jakub Jelinek wrote: > > Furthermore, do you want to generate just normal loop out of it, or > > shouldn't we generate a #pragma omp simd loop out of it instead? > > Haven't read the spec if array notation guarantees lack of > > forward/backward loop dependencies and what are the restrictions on > > the calls you can do inside of array notation. Perhaps the lowering > > to normal vs. simd loop could depend on whether all called functions > > in the expression are elemental. > > The only overlap allowed is exact, i.e. a[0:5] = a[0:5] + b[1:5], but not > a[0:5] = > a[1:5] + b[1:5]. So, yes, I believe that safelen is essentially unlimited. Yes, the vectorlength/safelen can be the entire loop-size. > > > r~
Re: [google gcc-4_7] coverage callback instrumentation (issue9630043)
Looks ok to me in general. 1) the parameter name is not ideal -- it is not callonce. 2) it might be better to extend the callonce parameter into -ftest-coverage option such as -ftest-coverage=exec_once? 3) need documentation in invoke.texi 4) watch out for long lines. cc Teresa. David On Tue, May 21, 2013 at 3:28 PM, Rong Xu wrote: > This patch is to be used with customized coverage reduction. > > The functionalities are under two parameter options: > --param=COVERAGE-CALLBACK={0|1} > when enabled with 1, it injects a callback function for > each arc counter. Default off. > --param=COVERAGE-CALLONCE={0|1} > when enabled with 1, it stops incrementing the arc counter > when they flip to 1. Default off. Even with an extra condition > check, this is drastically faster than the normal coverage test > for highly threaded applications. > > It also fixes a bug in profile sampling where some counters > (like those after the call stmt) are not sampled. > > It disables the may-uninit check if the warning will not be > emitted. > > Tested with bootstrap, regressions test and google internal > benchmarks. > > Thanks, > > -Rong > > 2013-05-21 Rong Xu > > * gcc/tree-ssa-uninit.c (gate_warn_uninitialized): Disable if > may-uninit warning will not emitted. > * gcc/params.def : (PARAM_COVERAGE_CALLBACK) New. > (PARAM_COVERAGE_CALLONCE): NEW. > * gcc/profile.c (branch_prob): Not increment edge counter after > first update. > (tree_init_instrumentation_sampling): Ditto. > (COVERAGE_CALLBACK_FUNC_NAME): New. > (COVERAGE_INSERT_CALL): New. > (insert_if_then): Update branch probability. > (add_sampling_wrapper): Ditto. > (add_callback_wrapper): New. > (add_sampling_to_edge_counters): Make predicate insertion > complete. > (gimple_gen_edge_profiler): Add coverage callback function. > (gimple_gen_ic_profiler): Fix trailing space. > (gimple_gen_ic_func_topn_profiler): Ditto. > (gimple_gen_dc_profiler): Ditto. > * libgcc/libgcov.c (__coverage_callback): Add an empty callback > function. > > Index: libgcc/libgcov.c > === > --- libgcc/libgcov.c(revision 198952) > +++ libgcc/libgcov.c(working copy) > @@ -135,6 +135,14 @@ extern int gcov_dump_complete ATTRIBUTE_HIDDEN; > these symbols will always need to be resolved. */ > void (*__gcov_dummy_ref1)() = &__gcov_reset; > void (*__gcov_dummy_ref2)() = &__gcov_dump; > + > +__attribute__((weak)) void > +__coverage_callback (gcov_type funcdef_no __attribute__ ((unused)), > + int edge_no __attribute__ ((unused))) > +{ > + /* nothing */ > +} > + > #endif /* __GCOV_KERNEL__ */ > > /* Utility function for outputing errors. */ > Index: gcc/profile.c > === > --- gcc/profile.c (revision 198952) > +++ gcc/profile.c (working copy) > @@ -69,6 +69,7 @@ along with GCC; see the file COPYING3. If not see > #include "timevar.h" > #include "cfgloop.h" > #include "tree-pass.h" > +#include "params.h" > > #include "profile.h" > > @@ -1480,7 +1481,8 @@ branch_prob (void) >/* Commit changes done by instrumentation. */ >gsi_commit_edge_inserts (); > > - if (flag_profile_generate_sampling) > + if (flag_profile_generate_sampling > + || PARAM_VALUE (PARAM_COVERAGE_CALLONCE)) > add_sampling_to_edge_counters (); > } > > Index: gcc/params.def > === > --- gcc/params.def (revision 198952) > +++ gcc/params.def (working copy) > @@ -1060,6 +1060,16 @@ DEFPARAM (PARAM_PROFILE_GENERATE_SAMPLING_PERIOD, > "sampling rate with -fprofile-generate-sampling", > 100, 0, 20) > > +DEFPARAM (PARAM_COVERAGE_CALLBACK, > + "coverage-callback", > + "callback a user-define function when for arc counter increments.", > + 0, 0, 1) > + > +DEFPARAM (PARAM_COVERAGE_CALLONCE, > + "coverage-callonce", > + "Stop increment coverage counts once they become 1.", > + 0, 0, 1) > + > /* Used for debugging purpose. Tell the compiler to find > the gcda file in the current directory. */ > DEFPARAM (PARAM_GCOV_DEBUG, > Index: gcc/tree-profile.c > === > --- gcc/tree-profile.c (revision 198952) > +++ gcc/tree-profile.c (working copy) > @@ -54,6 +54,15 @@ along with GCC; see the file COPYING3. If not see > #include "target.h" > #include "output.h" > > +/* Default name for coverage callback function. */ > +#define COVERAGE_CALLBACK_FUNC_NAME "__coverage_callback" > + > +/* If we insert a callback to edge instrumentation code. Avoid this > + for the callback function itself. */ > +#define COVERAGE_INSERT_CALL ((PARAM_VALUE (PARAM_COVERAGE_CALL
Re: [PING]RE: [patch] cilkplus: Array notation for C patch
On 05/22/2013 09:37 AM, Iyer, Balaji V wrote: Furthermore, do you want to generate just normal loop out of it, or shouldn't we generate a #pragma omp simd loop out of it instead? Haven't read the spec if array notation guarantees lack of forward/backward loop dependencies and what are the restrictions on the calls you can do inside of array notation. Perhaps the lowering to normal vs. simd loop could depend on whether all called functions in the expression are elemental. What you mentioned is a good performance optimization. It is something I have planned to do in future. But, the spec does not require the loop (or array notation expansion) to be vectorized. So if we can represent array notation as an OpenMP SIMD loop and re-use the OpenMP code generation, that's a significant win. I realize the OpenMP SIMD stuff is still in-progress, but from a design standpoint we'd like to separate out the front-end issues from the code generation issues -- with the goal being that we can re-use the OpenMP SIMD path to also handle Cilk SIMD & Cilk array notation and possible bits of OpenACC. Balaji, how feasible is it to rewire the code generation aspects of the array notation patch to instead feed into the OpenMP SIMD bits Jakub is working on? Jeff
[PATCH, PR 57347] Do not create aggregate jump functions for bit-fields
Hi, I have not intended aggregate jump functions to work with bit-fields but apparently forgot to include the test to ignore them. PR 57347 testcase gives a good example why they need to be avoided. If we ever decide to optimize for them too (and not just in IPA land), they should be lowered earlier and jump functions can then be built for the stores to the representatives. The following patch disables their generation. It passes bootstrap and testing on x8664-linux on trunk, the same for the 4.8 branch is currently underway. OK for trunk and for the branch if it passes? Thanks, Martin 2013-05-21 Martin Jambor PR middle-end/57347 * tree.h (contains_bitfld_component_ref_p): Declare. * tree-sra.c (contains_bitfld_comp_ref_p): Move... * tree.c (contains_bitfld_component_ref_p): ...here. Adjust its caller. * ipa-prop.c (determine_known_aggregate_parts): Check that LHS does not access a bit-field. Assert all final offsets are byte-aligned. testsuite/ * gcc.dg/ipa/pr57347.c: New test. Index: src/gcc/ipa-prop.c === --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -1327,7 +1327,9 @@ determine_known_aggregate_parts (gimple lhs = gimple_assign_lhs (stmt); rhs = gimple_assign_rhs1 (stmt); - if (!is_gimple_reg_type (rhs)) + if (!is_gimple_reg_type (rhs) + || TREE_CODE (lhs) == BIT_FIELD_REF + || contains_bitfld_component_ref_p (lhs)) break; lhs_base = get_ref_base_and_extent (lhs, &lhs_offset, &lhs_size, @@ -1418,6 +1420,7 @@ determine_known_aggregate_parts (gimple { struct ipa_agg_jf_item item; item.offset = list->offset - arg_offset; + gcc_assert ((item.offset % BITS_PER_UNIT) == 0); item.value = unshare_expr_without_location (list->constant); jfunc->agg.items->quick_push (item); } Index: src/gcc/testsuite/gcc.dg/ipa/pr57347.c === --- /dev/null +++ src/gcc/testsuite/gcc.dg/ipa/pr57347.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-O3" } */ + +struct S1 { int f0; int f1 : 10; int f2 : 13; }; +int i; +int *j = &i; + +static void +foo (struct S1 s) +{ + int *p; + int l[88]; + int **pp = &p; + *pp = &l[1]; + l[0] = 1; + *j = 1 && s.f2; +} + +int +main () +{ + struct S1 s = { 0, 0, 1 }; + foo (s); + if (i != 1) +__builtin_abort (); + return 0; +} Index: src/gcc/tree-sra.c === --- src.orig/gcc/tree-sra.c +++ src/gcc/tree-sra.c @@ -2998,23 +2998,6 @@ get_repl_default_def_ssa_name (struct ac return get_or_create_ssa_default_def (cfun, racc->replacement_decl); } -/* Return true if REF has a COMPONENT_REF with a bit-field field declaration - somewhere in it. */ - -static inline bool -contains_bitfld_comp_ref_p (const_tree ref) -{ - while (handled_component_p (ref)) -{ - if (TREE_CODE (ref) == COMPONENT_REF - && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))) -return true; - ref = TREE_OPERAND (ref, 0); -} - - return false; -} - /* Return true if REF has an VIEW_CONVERT_EXPR or a COMPONENT_REF with a bit-field field declaration somewhere in it. */ @@ -3110,7 +3093,7 @@ sra_modify_assign (gimple *stmt, gimple_ ??? This should move to fold_stmt which we simply should call after building a VIEW_CONVERT_EXPR here. */ if (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) - && !contains_bitfld_comp_ref_p (lhs)) + && !contains_bitfld_component_ref_p (lhs)) { lhs = build_ref_for_model (loc, lhs, 0, racc, gsi, false); gimple_assign_set_lhs (*stmt, lhs); Index: src/gcc/tree.c === --- src.orig/gcc/tree.c +++ src/gcc/tree.c @@ -11785,4 +11785,21 @@ warn_deprecated_use (tree node, tree att } } +/* Return true if REF has a COMPONENT_REF with a bit-field field declaration + somewhere in it. */ + +bool +contains_bitfld_component_ref_p (const_tree ref) +{ + while (handled_component_p (ref)) +{ + if (TREE_CODE (ref) == COMPONENT_REF + && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))) +return true; + ref = TREE_OPERAND (ref, 0); +} + + return false; +} + #include "gt-tree.h" Index: src/gcc/tree.h === --- src.orig/gcc/tree.h +++ src/gcc/tree.h @@ -5974,6 +5974,7 @@ extern tree block_ultimate_origin (const extern tree get_binfo_at_offset (tree, HOST_WIDE_INT, tree); extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *, HOST_WIDE_INT *, HOST_WIDE_INT *); +extern bool contains_bitfld_component_ref_p (const_tree); /* In tree-nested.c */ extern tree build_addr (tree,
Re: [PATCH, AArch64] Allow insv_imm to handle bigger immediates via masking to 16-bits
On 17 May 2013 19:20, Ian Bolton wrote: > The MOVK instruction is currently not used when operand 2 is > more than 16 bits, which leads to sub-optimal code. > > This patch improves those situations by removing the check and > instead masking down to 16 bits within the new "X" format specifier > I added recently. > > OK for trunk? > > Cheers, > Ian > > > 2013-05-17 Ian Bolton > > * config/aarch64/aarch64.c (aarch64_print_operand): Change the X > format > specifier to only display bottom 16 bits. > > * config/aarch64/aarch64.md (insv_imm): Allow any-sized > immediate > to match for operand 2, since it will be masked. OK
Re: [PATCH] Fix incorrect discriminator assignment.
@@ -105,7 +105,7 @@ struct locus_descrim_hasher : typed_free_remove locus; + return LOCATION_LINE (item->locus); } /* Equality function for the locus-to-discriminator map. A and B @@ -114,7 +114,7 @@ locus_descrim_hasher::hash (const value_type *item inline bool locus_descrim_hasher::equal (const value_type *a, const compare_type *b) { - return a->locus == b->locus; + return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus); } While you're in that part of the code, could you fix the spelling of locus_descrim_hasher? Should be "locus_discrim_hasher". -cary
Re: [PATCH] Fix incorrect discriminator assignment.
Sure, will update the patch for that. Dehao On Wed, May 22, 2013 at 9:40 AM, Cary Coutant wrote: > @@ -105,7 +105,7 @@ struct locus_descrim_hasher : typed_free_remove inline hashval_t > locus_descrim_hasher::hash (const value_type *item) > { > - return item->locus; > + return LOCATION_LINE (item->locus); > } > > /* Equality function for the locus-to-discriminator map. A and B > @@ -114,7 +114,7 @@ locus_descrim_hasher::hash (const value_type *item > inline bool > locus_descrim_hasher::equal (const value_type *a, const compare_type *b) > { > - return a->locus == b->locus; > + return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus); > } > > > While you're in that part of the code, could you fix the spelling of > locus_descrim_hasher? Should be "locus_discrim_hasher". > > -cary
Re: [PATCH] Disable profile-use if no .gcda file is found
> If -fprofile-use is specified, but no .gcda file is found, reset all > the flags back to the values they would have had if -fprofile-use was > not specified. Since the code path where -fprofile-use is on and > .gcda files are not found is not a well tested pass, this will > increase the robustness of FDO. Further, in the event an ICE is found > when doing FDO, this allows users to delete .gcda files as a work > around without having to implement complex modifications to the build > system. To ensure the testsuite still passes (and is relevant), some > tests which used -fprofile-use but had no .gcda file had to be moved > so that they would have a .gcda file generated. This also involved > adding a main to these unit tests. > > Bootstrapped and regtested on x86_64. Ok for trunk? > > 2009-09-29 Neil Vachharajani > * coverage.c (read_counts_file): Disable profile use if no .gcda > file is found. > * opts.c (common_handle_option): Call set_profile_use instead of > directly setting parameters in -fprofile-use. > (set_profile_use): New function. > * opts.h (set_profile_use): New function. > * testsuite/g++.dg/tree-ssa/dom-invalid.C: Moved to > testsuite/g++.dg/tree-prof. > * testsuite/g++.dg/tree-prof/dom-invalid.C: See above. > * testsuite/gcc.dg/pr26570.c: Moved to > testsuite/gcc.dg/tree-prof. > * testsuite/gcc.dg/tree-prof/pr26570.c: See above. main added. > * testsuite/gcc.dg/pr32773.c: Moved to > testsuite/gcc.dg/tree-prof. > * testsuite/gcc.dg/tree-prof/pr32773.c: See above. main added. > > + > +/* Set FLAG_PROFILE_USE and dependent flags based on VALUE. This > + function is used to handle the -f(no)profile-use option as well as > + to reset flags if a .gcda file is not found. */ > + > +void > +set_profile_use (bool value, bool force) > +{ > + flag_profile_use = value; > + if (!flag_branch_probabilities_set || force) > +flag_branch_probabilities = value; > + if (!flag_profile_values_set || force) > +flag_profile_values = value; > + if (!flag_unroll_loops_set) > +flag_unroll_loops = value; > + if (!flag_peel_loops_set) > +flag_peel_loops = value; > + if (!flag_tracer_set) > +flag_tracer = value; > + if (!flag_value_profile_transformations_set || force) > +flag_value_profile_transformations = value; > + if (!flag_inline_functions_set) > +flag_inline_functions = value; > + if (!flag_ipa_cp_set) > +flag_ipa_cp = value; > + if (!flag_ipa_cp_clone_set) > +{ > + if (!flag_ipa_cp_set || flag_ipa_cp) > + flag_ipa_cp_clone = value; > +} > + if (!flag_predictive_commoning_set) > +flag_predictive_commoning = value; > + if (!flag_unswitch_loops_set) > +flag_unswitch_loops = value; > + if (!flag_gcse_after_reload_set) > +flag_gcse_after_reload = value; > +} It seems to make sense to disable profile feedback when GCDA is missing. However I wonder if we can't figure out before finishing of parsing of flags? Machine descriptions may want to alter options depending on profile-use so this approach seems fragile. Honza
Re: [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions
This patch adds support for the byte, half-word, and quad-word atomic memory operations that were added in ISA 2.07 (i.e. power8). Like the other patches, this passes bootstrap and had no regressions in make check. Is it ok to commit this patch after the previous 6 patches have been applied? [gcc] 2013-05-22 Michael Meissner Pat Haugen Peter Bergner * config/rs6000/rs6000.c (emit_load_locked): Add support for power8 byte, half-word, and quad-word atomic instructions. (emit_store_conditional): Likewise. (rs6000_expand_atomic_compare_and_swap): Likewise. (rs6000_expand_atomic_op): Likewise. * config/rs6000/sync.md (larx): Add new modes for power8. (stcx): Likewise. (AINT): New mode iterator to include TImode as well as normal integer modes on power8. (fetchop_pred): Use int_reg_operand instead of gpc_reg_operand so that VSX registers are not considered. Use AINT mode iterator instead of INT1 to allow inclusion of quad word atomic operations on power8. (load_locked): Likewise. (store_conditional): Likewise. (atomic_compare_and_swap): Likewise. (atomic_exchange): Likewise. (atomic_nand): Likewise. (atomic_fetch_): Likewise. (atomic_nand_fetch): Likewise. (mem_thread_fence): Use gen_loadsync_ instead of enumerating each type. (ATOMIC): On power8, add QImode, HImode modes. (load_locked_si): Varients of load_locked for QI/HI modes that promote to SImode. (load_lockedti): Convert TImode arguments to PTImode, so that we get a guaranteed even/odd register pair. (load_lockedpti): Likewise. (store_conditionalti): Likewise. (store_conditionalpti): Likewise. * config/rs6000/rs6000.md (QHI): New mode iterator for power8 atomic load/store instructions. (HSI): Likewise. [gcc/testsuite] 2013-05-22 Michael Meissner Pat Haugen Peter Bergner * gcc.target/powerpc/atomic-p7.c: New file, add tests for atomic load/store instructions on power7, power8. * gcc.target/powerpc/atomic-p8.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/testsuite/gcc.target/powerpc/atomic-p7.c === --- gcc/testsuite/gcc.target/powerpc/atomic-p7.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/atomic-p7.c(revision 0) @@ -0,0 +1,207 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-mcpu=power7 -O2" } */ +/* { dg-final { scan-assembler-not "lbarx" } } */ +/* { dg-final { scan-assembler-not "lharx" } } */ +/* { dg-final { scan-assembler-times "lwarx" 18 } } */ +/* { dg-final { scan-assembler-times "ldarx" 6 } } */ +/* { dg-final { scan-assembler-not "lqarx" } } */ +/* { dg-final { scan-assembler-not "stbcx" } } */ +/* { dg-final { scan-assembler-not "sthcx" } } */ +/* { dg-final { scan-assembler-times "stwcx" 18 } } */ +/* { dg-final { scan-assembler-times "stdcx" 6 } } */ +/* { dg-final { scan-assembler-not "stqcx" } } */ +/* { dg-final { scan-assembler-times "bl __atomic" 6 } } */ +/* { dg-final { scan-assembler-times "isync" 12 } } */ +/* { dg-final { scan-assembler-times "lwsync" 8 } } */ +/* { dg-final { scan-assembler-not "mtvsrd" } } */ +/* { dg-final { scan-assembler-not "mtvsrwa" } } */ +/* { dg-final { scan-assembler-not "mtvsrwz" } } */ +/* { dg-final { scan-assembler-not "mfvsrd" } } */ +/* { dg-final { scan-assembler-not "mfvsrwz" } } */ + +/* Test for the byte atomic operations on power8 using lbarx/stbcx. */ +char +char_fetch_add_relaxed (char *ptr, int value) +{ + return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED); +} + +char +char_fetch_sub_consume (char *ptr, int value) +{ + return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME); +} + +char +char_fetch_and_acquire (char *ptr, int value) +{ + return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE); +} + +char +char_fetch_ior_release (char *ptr, int value) +{ + return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE); +} + +char +char_fetch_xor_acq_rel (char *ptr, int value) +{ + return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL); +} + +char +char_fetch_nand_seq_cst (char *ptr, int value) +{ + return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST); +} + +/* Test for the half word atomic operations on power8 using lharx/sthcx. */ +short +short_fetch_add_relaxed (short *ptr, int value) +{ + return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED); +} + +short +short_fetch_sub_consume (short *ptr, int value) +{ + return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME); +} +
website patch committed: GCC 4.8.1 does not implement Go 1.1
I unfortunately have not had time to keep up with gccgo, so I'm adding a note that GCC 4.8.1 does not yet implement Go 1.1. Committed to the web site. Ian Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.117 diff -u -r1.117 changes.html --- changes.html 20 May 2013 16:38:06 - 1.117 +++ changes.html 22 May 2013 17:01:39 - @@ -441,9 +441,8 @@ Go -GCC 4.8.0 implements a preliminary version of the upcoming Go - 1.1 release. The library support is not quite complete, due to - release timing. +GCC 4.8.0 and 4.8.1 implement a preliminary version of the Go + 1.1 release. The library support is not quite complete. Go has been tested on GNU/Linux and Solaris platforms for various processors including x86, x86_64, PowerPC, SPARC, and Alpha. It may work on other platforms as well.
Re: [PATCH][gensupport] Add optional attributes field to define_cond_exec
On 05/22/2013 02:54 AM, Kyrylo Tkachov wrote: > From what I understand, using define_subst would mean creating a > define_subst for every pattern that can be "predicable"? There are at least > 600 predicable patterns in the arm backend, so that would be infeasible. No, define_subst works across patterns, keyed by attributes. Exactly like cond_exec, really. But what you ought to be able to do right now is (define_subst "ds_predicable" [(match_operand 0)] "" [(cond_exec (blah) (match_dup 0))]) (define_subst_attr "ds_predicable_enabled" "ds_predicable" "no" "yes"0 (define_insn "blah" [(blah)] "" "@ blah blah" [(set_attr "ds_predicable" "yes") (set_attr "ds_predicated" "")]) At which point you can define "enabled" in terms of ds_predicated plus whatever. With a small bit of work we ought to be able to move that ds_predicated attribute to the define_subst itself, so that you don't have to replicate that set_attr line N times. I think that's more or less what you were suggesting with your cond_exec extension, yes? r~
Re: [PING]RE: [patch] cilkplus: Array notation for C patch
On 2013-05-22 08:18, Iyer, Balaji V wrote: The overall function names are same, but the components inside it function differs greatly from C and C++. For example, in C++ I can't use build_modify_expr, but build_x_modify_expr. Also, I need to handle overloaded function types, and that requires further checking. Similarly, the trees are different from C and C++, and so I have to do more checks and handle them differently in C++. In my mind, this seem to be the most straight-forward and legible way to do it. Really, that's surprising: /* For use from the C common bits. */ tree build_modify_expr (location_t /*location*/, tree lhs, tree /*lhs_origtype*/, enum tree_code modifycode, location_t /*rhs_location*/, tree rhs, tree /*rhs_origtype*/) { return cp_build_modify_expr (lhs, modifycode, rhs, tf_warning_or_error); } I can definitely see how function overloading could potentially get in the way. Although by delaying expansion of ARRAY_NOTATION_REF, given that the TREE_TYPE of the ARRAY_NOTATION_REF is not some complex artificial slice type but the element type of the array, I wonder how much of the overloaded function selection would just happen automatically? Which reminds me: What is ARRAY_NOTATION_TYPE for? Isn't it always the same as the TREE_TYPE of the ARRAY_NOTATION_REF? Before I started my implementation, I did briefly consider using c_finish_loop, but adding multiple expressions got a bit complicated. For example, if I have 2 increment/condition expressions (my initial implementation had it but later on I eliminated that need), I had to create a statement list and then add them using append_to_statement_list() and then pass that into c_finish_loop. Also, if I am not mistaken, I still had to create some of the individual labels when using c_finish_loop(). Overall, I didn't find much code-size decrease, and I found this a bit more straight forward for me. I apologize if I am mistaken. Well, append_to_statement_list etc was certainly not needed. Mirroring exactly how you'd write it in C, with a comma operator also works. That's a COMPOUND_EXPR, fyi. There are continue/break label arguments to c_finish_loop, but they may be (and often are) null. They will only be non-null if the relevant statements actually exist inside the loop. C.f. c_finish_bc_stmt and c_break_label. You're right about it not saving *that* much code, but the point is more about if we make improvements to normal C loop representation -- such as to support #pragma simd -- then we want to be able to take advantage of that here. Are these changes a blocking issue for acceptance into trunk? Or, would it be ok to accept it as-is and then I make these changes at a later date? I'm leaning to honoring the advice you were given last year and accepting the patch more or less in its current form. r~
Re: [PATCH] Disable profile-use if no .gcda file is found
Jan Hubicka wrote: >> If -fprofile-use is specified, but no .gcda file is found, reset all >> the flags back to the values they would have had if -fprofile-use was >> not specified. Since the code path where -fprofile-use is on and >> .gcda files are not found is not a well tested pass, this will >> increase the robustness of FDO. Further, in the event an ICE is >found >> when doing FDO, this allows users to delete .gcda files as a work >> around without having to implement complex modifications to the build >> system. To ensure the testsuite still passes (and is relevant), some >> tests which used -fprofile-use but had no .gcda file had to be moved >> so that they would have a .gcda file generated. This also involved >> adding a main to these unit tests. >> >> Bootstrapped and regtested on x86_64. Ok for trunk? >> >> 2009-09-29 Neil Vachharajani >> * coverage.c (read_counts_file): Disable profile use if no .gcda >> file is found. >> * opts.c (common_handle_option): Call set_profile_use instead of >> directly setting parameters in -fprofile-use. >> (set_profile_use): New function. >> * opts.h (set_profile_use): New function. >> * testsuite/g++.dg/tree-ssa/dom-invalid.C: Moved to >> testsuite/g++.dg/tree-prof. >> * testsuite/g++.dg/tree-prof/dom-invalid.C: See above. >> * testsuite/gcc.dg/pr26570.c: Moved to >> testsuite/gcc.dg/tree-prof. >> * testsuite/gcc.dg/tree-prof/pr26570.c: See above. main added. >> * testsuite/gcc.dg/pr32773.c: Moved to >> testsuite/gcc.dg/tree-prof. >> * testsuite/gcc.dg/tree-prof/pr32773.c: See above. main added. >> > >> + >> +/* Set FLAG_PROFILE_USE and dependent flags based on VALUE. This >> + function is used to handle the -f(no)profile-use option as well >as >> + to reset flags if a .gcda file is not found. */ >> + >> +void >> +set_profile_use (bool value, bool force) >> +{ >> + flag_profile_use = value; >> + if (!flag_branch_probabilities_set || force) >> +flag_branch_probabilities = value; >> + if (!flag_profile_values_set || force) >> +flag_profile_values = value; >> + if (!flag_unroll_loops_set) >> +flag_unroll_loops = value; >> + if (!flag_peel_loops_set) >> +flag_peel_loops = value; >> + if (!flag_tracer_set) >> +flag_tracer = value; >> + if (!flag_value_profile_transformations_set || force) >> +flag_value_profile_transformations = value; >> + if (!flag_inline_functions_set) >> +flag_inline_functions = value; >> + if (!flag_ipa_cp_set) >> +flag_ipa_cp = value; >> + if (!flag_ipa_cp_clone_set) >> +{ >> + if (!flag_ipa_cp_set || flag_ipa_cp) >> +flag_ipa_cp_clone = value; >> +} >> + if (!flag_predictive_commoning_set) >> +flag_predictive_commoning = value; >> + if (!flag_unswitch_loops_set) >> +flag_unswitch_loops = value; >> + if (!flag_gcse_after_reload_set) >> +flag_gcse_after_reload = value; >> +} > >It seems to make sense to disable profile feedback when GCDA is >missing. However I wonder if we can't >figure out before finishing of parsing of flags? >Machine descriptions may want to alter options depending on profile-use >so this approach seems fragile. You'd have to move handling of fprofile-use to the driver then. And think about what to do with lto. Btw, why bother to reset any flags besides of those controlling profile computation at all? Richard. >Honza
[C++ Patch] PR 57352
Hi, avoiding this ICE on invalid seems just matter of setting up the parser->type_definition_forbidden_message string. Tested x86_64-linux. Thanks, Paolo. /// /cp 2013-05-22 Paolo Carlini PR c++/57352 * parser.c (cp_parser_conversion_type_id): Set up parser->type_definition_forbidden_message before calling cp_parser_type_specifier_seq. /testsuite 2013-05-22 Paolo Carlini PR c++/57352 * g++.dg/parse/crash62.C: New. Index: cp/parser.c === --- cp/parser.c (revision 199204) +++ cp/parser.c (working copy) @@ -11715,13 +11715,22 @@ cp_parser_conversion_type_id (cp_parser* parser) cp_decl_specifier_seq type_specifiers; cp_declarator *declarator; tree type_specified; + const char *saved_message; /* Parse the attributes. */ attributes = cp_parser_attributes_opt (parser); + + saved_message = parser->type_definition_forbidden_message; + parser->type_definition_forbidden_message += G_("types may not be defined in a conversion-type-id"); + /* Parse the type-specifiers. */ cp_parser_type_specifier_seq (parser, /*is_declaration=*/false, /*is_trailing_return=*/false, &type_specifiers); + + parser->type_definition_forbidden_message = saved_message; + /* If that didn't work, stop. */ if (type_specifiers.type == error_mark_node) return error_mark_node; Index: testsuite/g++.dg/parse/crash62.C === --- testsuite/g++.dg/parse/crash62.C(revision 0) +++ testsuite/g++.dg/parse/crash62.C(working copy) @@ -0,0 +1,6 @@ +// PR c++/57352 + +struct x +{ + operator class {} (); // { dg-error "types|expected" } +};
[Patch, Fortran] Enable the generation of the FINALization wrapper function
Pre-remark: This patch does *not* enable finalization or polymorphic deallocation. * * * Dear all, The attached patch is a bit boring and invasive, but it paves the way to FINAL support. Changes of technical kind: * Changed ABI for CLASS's virtual table (due to _final) - and, hence, it bumps the .mod version * The finalization wrapper is now generated (this should not but might lead to ICEs) * It also causes that the virtual table is now more often generated New feature: _copy no longer deallocates the "dst" argument. Doing so lead to bogus finalization with ALLOCATE (exposed with the pending FINAL patch). As a sideeffect, memset could be removed and CALLOC could be replased by MALLOC (minute performance advantage). In order to keep the deallocation in gfc_trans_class_array_init_assign, there is now a call to the finalization wrapper. Next steps: * Add end-of-scope/intent(out) deallocation for polymorphic arrays * Enable FINAL parsing * Stepwise enabling for polymorphic deallocation/finalization * Fix issues with ELEMENTAL(+optional) with intent(out) * Fix some issues related to intrinsic assignment * Fix fallout of any of those items Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias 2013-05-22 Tobias Burnus * class.c (finalize_component): Fix coarray array refs. (gfc_find_derived_vtab): _copy's dst is now intent(inout). Enable finalization-wrapper generation. * module.c (MOD_VERSION): Bump. (gfc_dump_module, gfc_use_module): Remove empty line in .mod. * trans-array.c (gfc_conv_descriptor_token): Accept nonrestricted void pointer. (gfc_array_allocate, structure_alloc_comps): Don't nullify for BT_CLASS allocations. * trans-stmt.c (gfc_trans_allocate): Ditto. * trans-expr.c (gfc_trans_class_array_init_assign): Call _final before _copy. 2013-05-22 Tobias Burnus * gfortran.dg/auto_dealloc_2.f90: Update _free count in the dump. * gfortran.dg/class_19.f03: Ditto. diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c index 349f494..c41b95a 100644 --- a/gcc/fortran/class.c +++ b/gcc/fortran/class.c @@ -832,17 +832,18 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp, ref->u.c.component = comp; e->ts = comp->ts; - if (comp->attr.dimension + if (comp->attr.dimension || comp->attr.codimension || (comp->ts.type == BT_CLASS && CLASS_DATA (comp) - && CLASS_DATA (comp)->attr.dimension)) + && (CLASS_DATA (comp)->attr.dimension + || CLASS_DATA (comp)->attr.codimension))) { ref->next = gfc_get_ref (); ref->next->type = REF_ARRAY; - ref->next->u.ar.type = AR_FULL; ref->next->u.ar.dimen = 0; ref->next->u.ar.as = comp->ts.type == BT_CLASS ? CLASS_DATA (comp)->as : comp->as; e->rank = ref->next->u.ar.as->rank; + ref->next->u.ar.type = e->rank ? AR_FULL : AR_ELEMENT; } /* Call DEALLOCATE (comp, stat=ignore). */ @@ -2363,7 +2364,7 @@ gfc_find_derived_vtab (gfc_symbol *derived) dst->attr.flavor = FL_VARIABLE; dst->attr.dummy = 1; dst->attr.artificial = 1; - dst->attr.intent = INTENT_OUT; + dst->attr.intent = INTENT_INOUT; gfc_set_sym_referenced (dst); copy->formal->next = gfc_get_formal_arglist (); copy->formal->next->sym = dst; @@ -2382,9 +2383,6 @@ gfc_find_derived_vtab (gfc_symbol *derived) components and the calls to finalization subroutines. Note: The actual wrapper function can only be generated at resolution time. */ - /* FIXME: Enable ABI-breaking "_final" generation. */ - if (0) - { if (!gfc_add_component (vtype, "_final", &c)) goto cleanup; c->attr.proc_pointer = 1; @@ -2392,7 +2390,6 @@ gfc_find_derived_vtab (gfc_symbol *derived) c->tb = XCNEW (gfc_typebound_proc); c->tb->ppc = 1; generate_finalization_wrapper (derived, ns, tname, c); - } /* Add procedure pointers for type-bound procedures. */ if (!derived->attr.unlimited_polymorphic) @@ -2651,7 +2648,7 @@ gfc_find_intrinsic_vtab (gfc_typespec *ts) dst->ts.kind = ts->kind; dst->attr.flavor = FL_VARIABLE; dst->attr.dummy = 1; - dst->attr.intent = INTENT_OUT; + dst->attr.intent = INTENT_INOUT; gfc_set_sym_referenced (dst); copy->formal->next = gfc_get_formal_arglist (); copy->formal->next->sym = dst; diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c index e6a4cd7..9486b28 100644 --- a/gcc/fortran/module.c +++ b/gcc/fortran/module.c @@ -80,10 +80,8 @@ along with GCC; see the file COPYING3. If not see #define MODULE_EXTENSION ".mod" /* Don't put any single quote (') in MOD_VERSION, if you want it to be - recognized. - TODO: When the version is bumped, remove the extra empty line at - the beginning of module files. */ -#define MOD_VERSION "10" + recognized. */ +#define MOD_VERSION "11" /* Structure that describes a position within a module file. */ @@ -5571,7 +5569,7 @@ gfc
[wwwdocs] correct email address for assignments
Fix typo in assignment email address, to make gcc match GNU site: https://www.gnu.org/prep/maintain/maintain.html#Legal-Matters It would be awesome if now GCC assigments were magically less work. -benjamin2013-05-22 Benjamin Kosnik * htdocs/contribute.html: Use ass...@gnu.org, match main GNU docs. Index: htdocs/contribute.html === RCS file: /cvs/gcc/wwwdocs/htdocs/contribute.html,v retrieving revision 1.79 diff -c -p -r1.79 contribute.html *** htdocs/contribute.html 22 Oct 2011 13:46:22 - 1.79 --- htdocs/contribute.html 22 May 2013 19:05:44 - *** the relevant forms. The most common for *** 44,50 specific change, an assignment for all future changes, and an employer disclaimer, if an employer or school owns work created by the developer. It's a good idea to send ! assignme...@gnu.org a copy of your request. If a contributor is reluctant to sign a copyright assignment for a --- 44,50 specific change, an assignment for all future changes, and an employer disclaimer, if an employer or school owns work created by the developer. It's a good idea to send ! ass...@gnu.org a copy of your request. If a contributor is reluctant to sign a copyright assignment for a
Re: [Patch, Fortran] Create valid temporary variable to avoid assembler errors
Hi Tobias, I have now changed the mangling, see attached patch. (The test file uses finalization - hence, I do not include it into the patch. I will include it in the FINAL patch.) Build and regtested on x86-64-gnu-linux. OK for the trunk? OK (obvious really). Thanks for the patch! Thomas
RE: [PING]RE: [patch] cilkplus: Array notation for C patch
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jeff Law > Sent: Wednesday, May 22, 2013 12:20 PM > To: Iyer, Balaji V > Cc: Jakub Jelinek; Richard Henderson; 'Joseph S. Myers'; 'Aldy Hernandez'; > 'gcc- > patches' > Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch > > On 05/22/2013 09:37 AM, Iyer, Balaji V wrote: > >>> > >> Furthermore, do you want to generate just normal loop out of it, or > >> shouldn't we generate a #pragma omp simd loop out of it instead? > >> Haven't read the spec if array notation guarantees lack of > >> forward/backward loop dependencies and what are the restrictions on > >> the calls you can do inside of array notation. Perhaps the lowering > >> to normal vs. simd loop could depend on whether all called functions > >> in the expression are elemental. > > > > What you mentioned is a good performance optimization. It is something > > I have planned to do in future. But, the spec does not require the > > loop (or array notation expansion) to be vectorized. > So if we can represent array notation as an OpenMP SIMD loop and re-use the > OpenMP code generation, that's a significant win. I realize the OpenMP SIMD > stuff is still in-progress, but from a design standpoint we'd like to > separate out > the front-end issues from the code generation issues -- with the goal being > that > we can re-use the OpenMP SIMD path to also handle Cilk SIMD & Cilk array > notation and possible bits of OpenACC. > > Balaji, how feasible is it to rewire the code generation aspects of the array > notation patch to instead feed into the OpenMP SIMD bits Jakub is working on? Hi Jeff, Yes, converting the array notation expansion to #pragma simd (or #pragma omp simd) trees will be beneficial performance wise. But, it will require semi-significant re-write and this can destabilize a currently stable implementation. IMHO, for now I would like to keep array notation as is and let us see how it interacts with other flavors of gcc backends. At the moment, we do not see any reason for instability in this, but we have not done any significant testing on non x86 backends. After OMP 4 hits the main-line and everything is stable, we could proceed as you indicated. Thanks, Balaji V. Iyer. > > > Jeff
Re: [Patch, Fortran] Create valid temporary variable to avoid assembler errors
On Wed, May 22, 2013 at 4:20 PM, Tobias Burnus wrote: > With one Fortran file, I get the following assembler errors: > > /tmp/cc28epKK.s:2075: Error: junk `@1.2304+16' after expression > > That's due to the way a temporary variable is generated. While that variable > is local to the procedure, the name somehow escapes into the assembler file. > The dump looks as follows. > > static struct foo DA@0; > static struct universal DA@1; > static struct universal DA@2; > ... > class.30._data = &DA@0.foo_parent.universal.counter; > > I have now changed the mangling, see attached patch. (The test file uses > finalization - hence, I do not include it into the patch. I will include it > in the FINAL patch.) > > Build and regtested on x86-64-gnu-linux. > OK for the trunk? Hm, DA_F_0 is a valid variable name. So can't your patch cause clashes with user names? Ciao! Steven
Re: [PING]RE: [patch] cilkplus: Array notation for C patch
On 05/22/2013 01:13 PM, Iyer, Balaji V wrote: Hi Jeff, Yes, converting the array notation expansion to #pragma simd (or #pragma omp simd) trees will be beneficial performance wise. But, it will require semi-significant re-write and this can destabilize a currently stable implementation. IMHO, for now I would like to keep array notation as is and let us see how it interacts with other flavors of gcc backends. At the moment, we do not see any reason for instability in this, but we have not done any significant testing on non x86 backends. After OMP 4 hits the main-line and everything is stable, we could proceed as you indicated. I was thinking more from maintenance standpoint. It seems kindof silly to have multiple blobs of code generating loops. The fact that it gets us vectorized code more often is just icing on the cake. This is certainly something that I want to see happen, though I'll defer to Richard, Joseph, Aldy, etc on the timing of that change. jeff
Re: [rs6000] Add register save/restore routines for cross
On 22 May 2013 16:36:52 David Edelsohn wrote: On Wed, May 22, 2013 at 10:35 AM, Alan Modra wrote: > On Wed, May 22, 2013 at 10:05:47AM -0400, David Edelsohn wrote: >> Why does cross need the functions in libgcc and not provided by the linker? > > Only the ppc64 linker provides save/restore functions magically. Okay, then the patch is okay. Hmm. I cannot look right now, does that sound like a cure for PR53803 too? Sent with AquaMail for Android http://www.aqua-mail.com
A trivial script to scrub ChangeLog changes from "git show"
I've been writing some automation around testing gcc patches, and kept running into ChangeLog conflicts, so I wrote the following to make it easier. The attached one-liner wraps "git show" into a form that omits changes to ChangeLog files, for use when generating patches from a git repo, so that you can run things like: # Get last commit, omitting ChangeLog changes: ./contrib/git-show-non-changelog.sh # Get a specific commit, omitting ChangeLog changes: ./contrib/git-show-non-changelog.sh 60623f1616b3144310f432174ebbb3e2cc6dff28 # Get all my non-pushed changes, omitting ChangeLog changes: ./contrib/git-show-non-changelog.sh origin/master.. etc If there's a pre-existing way of doing this, I'm all ears, naturally [1] OK for trunk? (and of course, the contrib/ChangeLog entry will no doubt conflict, irony of ironies). Dave [1] gnulib appears to have a "git-merge-changelog", but that requires building/installing it at the receiving end commit 120b040899e61ca3b9214bd71a545038025e7e68 Author: David Malcolm Date: Wed May 22 15:17:35 2013 -0400 contrib/ * git-show-non-changelog.sh: New. diff --git a/contrib/ChangeLog b/contrib/ChangeLog index 1fd90b7..b0b63f3 100644 --- a/contrib/ChangeLog +++ b/contrib/ChangeLog @@ -1,3 +1,7 @@ +2013-05-22 David Malcolm + + * git-show-non-changelog.sh: New. + 2013-05-16 Rainer Orth * config-list.mk (LIST): Add -enable-obsolete for diff --git a/contrib/git-show-non-changelog.sh b/contrib/git-show-non-changelog.sh new file mode 100755 index 000..5b6ec99 --- /dev/null +++ b/contrib/git-show-non-changelog.sh @@ -0,0 +1,22 @@ +#!/bin/sh + +# "git show OBJECT" but omitting ChangeLog files + +# Copyright (C) 2013 Free Software Foundation, Inc. +# Contributed by David Malcolm + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +git show $@ $(git show $@ | diffstat -l -p1 | grep -v ChangeLog)
Re: A trivial script to scrub ChangeLog changes from "git show"
On Wed, May 22, 2013 at 1:10 PM, David Malcolm wrote: > I've been writing some automation around testing gcc patches, and kept > running into ChangeLog conflicts, so I wrote the following to make it > easier. > > The attached one-liner wraps "git show" into a form that omits changes > to ChangeLog files, for use when generating patches from a git repo, > so that you can run things like: > > # Get last commit, omitting ChangeLog changes: > ./contrib/git-show-non-changelog.sh > > # Get a specific commit, omitting ChangeLog changes: > ./contrib/git-show-non-changelog.sh 60623f1616b3144310f432174ebbb3e2cc6dff28 > > # Get all my non-pushed changes, omitting ChangeLog changes: > ./contrib/git-show-non-changelog.sh origin/master.. > > etc > > If there's a pre-existing way of doing this, I'm all ears, naturally [1] > > OK for trunk? (and of course, the contrib/ChangeLog entry will no doubt > conflict, irony of ironies). Note changelog entries in emails should not be part of the patch anyways. Thanks, Andrew > > Dave > > [1] gnulib appears to have a "git-merge-changelog", but that requires > building/installing it at the receiving end
Re: [Patch] Fix PR56780: --disable-install-libiberty still installs libiberty.a
On Wed, Apr 3, 2013 at 7:03 AM, Matt Burgess wrote: > > 2013-04-03 Matt Burgess > > other/PR56780 > * libiberty/configure.ac: > Move test for --enable-install-libiberty outside of the > 'with_target_subdir' test so that it actually gets run. > Add output messages to show the test result. > > * libiberty/configure: > Regenerate. > > * libiberty/Makefile.in (install_to_libdir): > Place the installation of the libiberty library in the same guard as > that used for the headers to prevent it being installed unless requested > via --enable-install-libiberty. Note that in the ChangeLog entry you should not have any blank lines, and you should not break the lines after the colon. See the existing entries and follow their format. > +[ --enable-install-libiberty Install headers for end users], Change this to "Install headers and library for end users" or something like that. This is OK with those changes. Thanks, and sorry for the slow review. Ian
Re: [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc.
This is the final set of patches that I have available right now. We will be doing additional patches over the summer. The primary thing in this patch is to add support for load fusion in the power8. Power8 has two types of fusion: addi ,, lxvd2x ,, and: addis ,, ld ,() These instructions must be adjacent to each other, and in the case of fusion in loading GPRs, the register being loaded must be the base register to load from it. In this patch, I added peepholes to cover this case. In the future, I plan on reworking the problem by being more liberal in what addresses are allowed before reload/lra, and in lra, generate these forms. However, these peepholes do help find fusion cases. I also added two switches (-mlra and -mconstrain-regs) that were used in converting the powerpc port to use the LRA register allocator. Note, at the present time, Vlad and I are going back on forth on additional things needed for LRA. This patch bootstraps and has no regressions in the test suite. Is it ok to check in after the previous 7 patches have been applied? FWIW, patches 1-2 that were approved have now been checked in. 2013-05-22 Michael Meissner * config/rs6000/predicates.md (fusion_gpr_addis): New predicates to support power8 load fusion. (fusion_gpr_mem_load): Likewise. * config/rs6000/rs6000-modes.def (PTImode): Update a comment. * config/rs6000/rs6000-protos.h (fusion_gpr_load_p): New declarations for power8 load fusion. (emit_fusion_gpr_load): Likewise. * config/rs6000/rs6000.opt (-mlra): New undocumented switch to turn on using the LRA register allocator. (-mconstrain-regs): New undocumented switch to constrain non-integer values from being loaded into the LR or CTR registers. * config/rs6000/rs6000.c (TARGET_LRA_P): If -mlra, turn on using the LRA register allocator. (rs6000_lra_p): Likewise. (rs6000_hard_regno_mode_ok): Allow DI/DD/SF/SD modes in altivec registers if power8. If -mconstrain-regs, only allow int modes into LR, CTR, and special purpose registers. (rs6000_debug_reg_global): Print -mlra, -mconstrain-regs status if debugging. (rs6000_init_hard_regno_mode_ok): Mark that SFmode can use Altivec registers in the future. (rs6000_option_override_internal): If tuning for power8, turn on fusion mode by default. Turn on sign extending fusion mode if normal fusion mode is on, and we are at -O2 or -O3. (rs6000_opt_masks): Add -mlra, -mconstrain-regs. (fusion_gpr_load_p): New function, return true if we can fuse an addis instruction with a dependent load to a GPR. (emit_fusion_gpr_load): Emit the instructions for power8 load fusion to GPRs. * config/rs6000/vsx.md (VSX load fusion peepholes): Add peepholes to fuse together an addi instruction with a VSX load instruction. * config/rs6000/rs6000.md (GPR load fusion peepholes): Add peepholes to fuse an addis instruction with a load to a GPR base register, if the addis instruction is dead after the load, by using the register to be loaded for the addis. If we are supporting sign extending fusions, convert sign extending loads to zero extending loads and an explicit sign extension. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 199168) +++ gcc/config/rs6000/predicates.md (working copy) @@ -1654,3 +1654,99 @@ (define_predicate "small_toc_ref" return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL; }) + +;; Match the first insn (addis) in fusing the combination of addis and loads to +;; GPR registers on power8. Power8 currently will only do the fusion if the +;; top 11 bits of the addis value are all 1's or 0's. +(define_predicate "fusion_gpr_addis" + (match_code "const_int,high,plus") +{ + HOST_WIDE_INT value; + rtx int_const; + + /* 32-bit is not done yet. */ + if (TARGET_ELF && !TARGET_POWERPC64) +return 0; + + if (GET_CODE (op) == HIGH) +return 1; + + if (CONST_INT_P (op)) +int_const = op; + + else if (GET_CODE (op) == PLUS + && base_reg_operand (XEXP (op, 0), Pmode) + && CONST_INT_P (XEXP (op, 1))) +int_const = XEXP (op, 1); + + else +return 0; + + value = INTVAL (int_const); + if ((value & (HOST_WIDE_INT)0x) != 0) +return 0; + + if ((value & (HOST_WIDE_INT)0x) == 0) +return 0; + + return (IN_RANGE (value >> 16, -32, 31)); +}) + +;; Match the second insn (lbz, lhz, lwz, ld) in fusing the combination of addis +;; and loads to GPR registers on power8. +(define_predicate "fusio
Re: [C++ Patch] for c++/54537
Hi, Again sorry for such a delay. Paolo subtly ping'ed me on DR39 (c++/13590), and I remember that this bug (c++/54537) was blocking the patch I wrote last year for DR39. 2012/11/15 Jason Merrill : [...] > I was only thinking of the primary signature; putting > > extern "C" double pow (double, double); > > in any namespace will result in calling the C library function if that > declaration is selected by overload resolution. Ah, I see, it works. Then we end up with three solutions: 1) remove the pow(double,double) overload 2) add a specialization template <> pow(double,double) 3) add an extern "C" declaration to refer to the C library function I don't have a strong preference, I would say that all solutions are more or less equivalent. In the end, builtins are just reached in different ways I guess, depending on the optimisations enabled. Given that Paolo already OKed the solution 1), I would go for it. Shall I commit if it still passes regtests ? Thank you for your patience... -- Fabien
Re: [C++ Patch] PR 57352
OK. Jason
Re: Fix PR 53743 and other -freorder-blocks-and-partition failures
Revised patch included below. The spacing of my pasted in patch text looks funky again, let me know if you want the patch as an attachment instead. I addressed all of Steven's comments, except for the suggestion to use gcc_assert instead of error() in verify_hot_cold_block_grouping() to keep this consistent with the rest of the verify_flow_info subroutines (let me know if this is ok). The other main changes: (1) Added several test cases (cloned from the torture subdirectories, where I manually built/ran with FDO and -freorder-blocks-and-partition with both the current trunk and my fixed trunk compiler, and was able to expose some failures I fixed. (2) Changed existing tree-prof tests that used -freorder-blocks-and-partition to be built with -O2 instead of -O, so that partitioning actually kicks in. (3) Fixed a couple of failures in the new verify_hot_cold_block_grouping() checks exposed by the torture tests I ran manually with splitting (2 of the tests cloned to tree-prof in this patch). One was in computed goto where we were too aggressive about cloning crossing edges, and the other was in rtl_split_edge called from the "stack" pass which was not correctly inserting the new bb in the correct partition since bb layout is complete at that point. Re-tested on x86_64-unknown-linux-gnu with bootstrap and profiledbootstrap builds and regression testing. Re-built/ran cpu2006int with profile feedback and -freorder-blocks-and-partition enabled. Ok for trunk? Thanks! Teresa 2013-05-22 Teresa Johnson * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert as this is now done by redirect_edge_and_branch_force. * function.c (thread_prologue_and_epilogue_insns): Insert new bb after barriers, and fix interaction with splitting. * emit-rtl.c (try_split): Copy REG_CROSSING_JUMP notes. * cfgcleanup.c (try_forward_edges): Fix early return value to properly reflect changes made in the routine. * bb-reorder.c (emit_barrier_after_bb): Move to cfgrtl.c. (fix_up_fall_thru_edges): Remove incorrect check for bb layout order since this is called in cfglayout mode, and replace partition fixup with assert as that is now done by force_nonfallthru_and_redirect. (add_reg_crossing_jump_notes): Handle the fact that some jumps may already be marked with region crossing note. (insert_section_boundary_note): Make non-static, gate on flag has_bb_partition, rewrite to also check for multiple partitions. (rest_of_handle_reorder_blocks): Remove call to insert_section_boundary_note, now done later during free_cfg. (duplicate_computed_gotos): Don't duplicate partition crossing edge. * bb-reorder.h (insert_section_boundary_note): Declare. * Makefile.in (cfgrtl.o): Depend on bb-reorder.h * cfgrtl.c (rest_of_pass_free_cfg): If partitions exist invoke insert_section_boundary_note. (try_redirect_by_replacing_jump): Remove unnecessary check for region crossing note. (fixup_partition_crossing): New function. (rtl_redirect_edge_and_branch): Fixup partition boundaries. (emit_barrier_after_bb): Move here from bb-reorder.c, handle insertion in non-cfglayout mode. (force_nonfallthru_and_redirect): Fixup partition boundaries, remove old code that tried to do this. Emit barrier correctly when we are in cfglayout mode. (last_bb_in_partition): New function. (rtl_split_edge): Correctly fixup partition boundaries. (commit_one_edge_insertion): Remove old code that tried to fixup region crossing edge since this is now handled in split_block, and set up insertion point correctly since block may now end in a jump. (verify_hot_cold_block_grouping): Guard against checking when not in linearized RTL mode. (rtl_verify_edges): Add checks for incorrect/missing REG_CROSSING_JUMP notes. (rtl_verify_flow_info_1): Move verify_hot_cold_block_grouping to rtl_verify_flow_info, so not called in cfglayout mode. (rtl_verify_flow_info): Move verify_hot_cold_block_grouping here. (fixup_reorder_chain): Remove old code that attempted to fixup region crossing note as this is now handled in force_nonfallthru_and_redirect. (duplicate_insn_chain): Don't duplicate switch section notes. (rtl_can_remove_branch_p): Remove unnecessary check for region crossing note. * basic-block.h (emit_barrier_after_bb): Declare. * testsuite/gcc.dg/tree-prof/va-arg-pack-1.c: Cloned from c-torture, made into -freorder-blocks-and-partition test. * testsuite/gcc.dg/tree-prof/comp-goto-1.c: Ditto. * testsuite/gcc.dg/tree-prof/20041218-1.c: Ditto. * testsuite/gcc.dg/tree-prof/pr52027.c: Use -O2. * testsuite/gcc.dg/tree-prof/pr50907.c: Ditto. * testsuite/gcc.dg/tree-prof/pr45354.c: Ditto. * testsuite/g++.dg/tree-prof/partition2.C: Ditto. * testsuite/g++.dg/tree-prof/partition3.C: Ditto. Index: ifcvt.c === --- ifcvt.c (revi
Re: [patch] Preserve the CFG until after final
On Wed, May 22, 2013 at 10:21 AM, Eric Botcazou wrote: >> That is only partially true. Currently the transition is already de >> facto going on: Just look at how many back ends use >> compute_bb_for_insn to re-initialize the BLOCK_FOR_INSN pointers right >> after pass_free_cfg (it's usually the first thing they do in the >> machine-reorg pass). > > Yes, and we should do something about it. Btw, why is the line removed from > ia64_reorg in the patch (and not mentioned in the ChangeLog)? Mistake. I wrote these patches on an ia64 machine, an older version of the patch was also bootstrapped&tested there. >> Some ports never call free_bb_or_insn after that, >> and expect BLOCK_FOR_INSN to be valid in 'final'. One of those ports >> is i386, look at where BLOCK_FOR_INSN is used in i386.c (in functions >> deciding what asm output templates to use). > > AFAICS it's only used for splitters. And using BLOCK_FOR_INSN after the last > split pass (pass_split_for_shorten_branches) is dubious. Actually it's even wrong *during* pass_split_for_shorten_branches right now. It may work in some ports but not in others, depending whether the port's machine reorg has done its TLC on the CFG or not. This can lead to very interesting behavior. I chased a bug recently where a splitter during split5 used the DF_INSN_USES cache on an insn introduced by another splitter. Because split5 uses split_all_insns_noflow, the insn from the splitter somehow ended up a NULL BLOCK_FOR_INSN (I'm still not sure how that happened, but it did). It turns out df-scan ignores insns with a NULL BLOCK_FOR_INSN, and hence NULL DF_INSN_USES, and that lead to a dependency violation in my splitter because a USE got overlooked! Ah, the crazy stuff one can do after machine reorg. It's the Wild West of GCC :-) > Here's the list: > > ./frv/frv.c: || BLOCK_FOR_INSN (insn) == ce_info->else_bb > > Only used during if-conversion. > > ./rs6000/rs6000.c: bb = BLOCK_FOR_INSN (label); > > Only used during compute_alignments. > > ./spu/spu.c: bitmap_set_bit (blocks, BLOCK_FOR_INSN (branch)->index); > > Only used during md_reorg. > > ./c6x/c6x.c: BLOCK_FOR_INSN (bundle) = BLOCK_FOR_INSN (slot[0]); > > Only used during md_reorg. > > ./mips/mips.c: basic_block bb = BLOCK_FOR_INSN (insn); > ./mips/mips.c: /* Restore the BLOCK_FOR_INSN pointers, which are needed by > DF. Also during > > Only used for splitting decision, deal with null BLOCK_FOR_INSN. Yes, these are all fine. > ./i386/i386.c: basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL; > ./i386/i386.c: basic_block bb = BLOCK_FOR_INSN (insn); > ./i386/i386.c: basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL; > ./i386/i386.c: basic_block bb = BLOCK_FOR_INSN (insn); > ./i386/i386.c: basic_block bb = BLOCK_FOR_INSN (insn); > ./i386/i386.c: rtx start = BB_HEAD (BLOCK_FOR_INSN (insn)); > ./i386/i386.c: basic_block bb = BLOCK_FOR_INSN (insn); > > Only used for splitting decision (and scheduling). Sadly no. Most of these (the *agu* ones) are also reached from final. For example: movdi_internal -> ix86_use_lea_for_mov -> ix86_lea_outperforms -> distance_non_agu_define -> distance_non_agu_define_in_bb Likewise for movsi_internal, and zero_extendsidi2. For the mov?i_internal define_insns, it's been like that since at least r181077 (November 2011). I must admit I was surprised by that, too. It may have been coincidence that it worked when this patch was (IMHO wrongfully) accepted. Someone got away with it because i386 calls compute_bb_for_insn in its machine-reorg, and does *not* call free_bb_for_insn, leaving the BLOCK_FOR_INSN pointers in place all the way through final. There are no passes between machine-reorg and final that run for i386 and damage the CFG because split5 doesn't run on i386 (because of STACK_REGS) and the other passes, like shorten_branches, don't modify the insns chain. Most ports that call compute_bb_for_insn do not call free_bb_for_insn: $ grep compute_bb_for_insn config/*/*.c config/arm/arm.c: compute_bb_for_insn (); config/bfin/bfin.c: compute_bb_for_insn (); config/c6x/c6x.c: compute_bb_for_insn (); config/frv/frv.c: compute_bb_for_insn (); config/i386/i386.c: compute_bb_for_insn (); config/ia64/ia64.c: compute_bb_for_insn (); config/mep/mep.c: compute_bb_for_insn (); config/mips/mips.c:compute_bb_for_insn (); config/mn10300/mn10300.c: compute_bb_for_insn (); config/picochip/picochip.c: compute_bb_for_insn (); config/spu/spu.c: compute_bb_for_insn (); config/spu/spu.c: compute_bb_for_insn (); config/tilegx/tilegx.c: compute_bb_for_insn (); config/tilepro/tilepro.c: compute_bb_for_insn (); $ grep free_bb_for_insn config/*/*.c config/mips/mips.c: free_bb_for_insn (); config/spu/spu.c: free_bb_for_insn (); config/spu/spu.c: free_bb_for_insn (); Most of these, eh, let's call them "compute_bb_for_insn ports", they come out of the machine reorg pass with an insns chain that doesn't pass verify_flow_info. The most commo
RE: [PING]RE: [patch] cilkplus: Array notation for C patch
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Wednesday, May 22, 2013 1:34 PM > To: Iyer, Balaji V > Cc: 'Joseph S. Myers'; 'Aldy Hernandez'; 'gcc-patches' > Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch > > On 2013-05-22 08:18, Iyer, Balaji V wrote: > > The overall function names are same, but the components inside it > > function differs greatly from C and C++. For example, in C++ I can't > > use build_modify_expr, but build_x_modify_expr. Also, I need to handle > > overloaded function types, and that requires further checking. > > Similarly, the trees are different from C and C++, and so I have to do > > more checks and handle them differently in C++. In my mind, this seem > > to be the most straight-forward and legible way to do it. > > Really, that's surprising: > > /* For use from the C common bits. */ > tree > build_modify_expr (location_t /*location*/, > tree lhs, tree /*lhs_origtype*/, > enum tree_code modifycode, > location_t /*rhs_location*/, tree rhs, > tree /*rhs_origtype*/) { >return cp_build_modify_expr (lhs, modifycode, rhs, tf_warning_or_error); } > > I can definitely see how function overloading could potentially get in the > way. There are couple other ones such as build_x_unary_op, build_x_conditional_expr which does not have such a mapping that I know of. > > Although by delaying expansion of ARRAY_NOTATION_REF, given that the > TREE_TYPE of the ARRAY_NOTATION_REF is not some complex artificial slice > type but the element type of the array, I wonder how much of the overloaded > function selection would just happen automatically? > > Which reminds me: What is ARRAY_NOTATION_TYPE for? Isn't it always the > same as the TREE_TYPE of the ARRAY_NOTATION_REF? Yes, they are both the same. A while back, I found a couple corner cases where the TREE_TYPE of the array notations inside __sec_reduce functions that was getting changed. This is a storage location that will be untouched so that I can get the original type. > > > Before I started my implementation, I did briefly consider using > > c_finish_loop, but adding multiple expressions got a bit complicated. > > For example, if I have 2 increment/condition expressions (my initial > > implementation had it but later on I eliminated that need), I had to > > create a statement list and then add them using > > append_to_statement_list() and then pass that into c_finish_loop. > > Also, if I am not mistaken, I still had to create some of the > > individual labels when using c_finish_loop(). Overall, I didn't find > > much code-size decrease, and I found this a bit more straight forward for > > me. I > apologize if I am mistaken. > > Well, append_to_statement_list etc was certainly not needed. Mirroring > exactly > how you'd write it in C, with a comma operator also works. That's a > COMPOUND_EXPR, fyi. > > There are continue/break label arguments to c_finish_loop, but they may be > (and often are) null. They will only be non-null if the relevant statements > actually exist inside the loop. C.f. c_finish_bc_stmt and c_break_label. > > You're right about it not saving *that* much code, but the point is more > about if > we make improvements to normal C loop representation -- such as to support > #pragma simd -- then we want to be able to take advantage of that here. OK. I see your point. I will look into changing them. > > > Are these changes a blocking issue for acceptance into trunk? Or, > > would it be ok to accept it as-is and then I make these changes at a later > > date? > > I'm leaning to honoring the advice you were given last year and accepting the > patch more or less in its current form. Thank you! > > > r~
Re: [patch] Preserve the CFG until after final
On 05/22/2013 03:17 PM, Steven Bosscher wrote: Ah, the crazy stuff one can do after machine reorg. It's the Wild West of GCC :-) I still look at that hook as the contribution I most wish I'd never made. The abuses are, umm, amazing. jeff
[Patch, Fortran] Deallocate CLASS(...),INTENT(OUT),allocatable arrays
A rather simple patch found while testing the draft finalization patch. For a "class(...), allocatable, intent(out)" dummy argument, the actual argument has to be deallocated. That worked for scalar polymorphic vars, but not for polymorphic arrays. Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias PS: I think there is also a related issue for polymorphic arrays and end-of-scope. I will handle it in a follow-up patch. 2013-05-22 Tobias Burnus * trans-expr.c (gfc_conv_procedure_call): Deallocate polymorphic arrays for allocatable intent(out) dummies. 2013-05-22 Tobias Burnus * gfortran.dg/class_array_16.f90: New. diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index f8d99fd..ba878e0 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -4334,6 +4334,64 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, { /* Pass a class array. */ gfc_conv_expr_descriptor (&parmse, e); + + /* If an ALLOCATABLE dummy argument has INTENT(OUT) and is + allocated on entry, it must be deallocated. */ + if (fsym->attr.intent == INTENT_OUT + && CLASS_DATA (fsym)->attr.allocatable) + { + stmtblock_t block; + tree ptr; + + gfc_init_block (&block); + ptr = parmse.expr; + ptr = gfc_class_data_get (ptr); + + tmp = gfc_deallocate_with_status (ptr, NULL_TREE, + NULL_TREE, NULL_TREE, + NULL_TREE, true, e, + false); + gfc_add_expr_to_block (&block, tmp); + tmp = fold_build2_loc (input_location, MODIFY_EXPR, + void_type_node, ptr, + null_pointer_node); + gfc_add_expr_to_block (&block, tmp); + + if (UNLIMITED_POLY (fsym)) + gfc_add_modify (&block, ptr, +fold_convert (TREE_TYPE (ptr), + null_pointer_node)); + else + { + gfc_symbol *vtab; + vtab = gfc_find_derived_vtab (fsym->ts.u.derived); + tmp = gfc_get_symbol_decl (vtab); + tmp = gfc_build_addr_expr (NULL_TREE, tmp); + ptr = gfc_class_vptr_get (parmse.expr); + gfc_add_modify (&block, ptr, + fold_convert (TREE_TYPE (ptr), tmp)); + } + gfc_add_expr_to_block (&block, tmp); + + if (fsym->attr.optional + && e->expr_type == EXPR_VARIABLE + && (!e->ref + || (e->ref->type == REF_ARRAY + && !e->ref->u.ar.type != AR_FULL)) + && e->symtree->n.sym->attr.optional) + { + tmp = fold_build3_loc (input_location, COND_EXPR, +void_type_node, +gfc_conv_expr_present (e->symtree->n.sym), +gfc_finish_block (&block), +build_empty_stmt (input_location)); + } + else + tmp = gfc_finish_block (&block); + + gfc_add_expr_to_block (&se->pre, tmp); +} + /* The conversion does not repackage the reference to a class array - _data descriptor. */ gfc_conv_class_to_class (&parmse, e, fsym->ts, false, --- /dev/null 2013-05-22 07:37:12.475061900 +0200 +++ gcc/gcc/testsuite/gfortran.dg/class_array_16.f90 2013-05-22 23:12:43.271073681 +0200 @@ -0,0 +1,20 @@ +! { dg-do compile } +! { dg-options "-fdump-tree-original" } +! +module m + type t + end type t +contains + subroutine sub(x) + class(t), allocatable, intent(out) :: x(:) + end subroutine sub +end module m + +use m +class(t), save, allocatable :: y(:) +call sub(y) +end + +! { dg-final { scan-tree-dump-times "__builtin_free" 1 "original" } } +! { dg-final { scan-tree-dump-times "finally" 0 "original" } } +! { dg-final { cleanup-tree-dump "original" } }