Re: FW: Non-temporal move
On Tue, Feb 25, 2014 at 7:10 AM, Gopalasubramanian, Ganesh wrote:
> I could see "storent" pattern in x86 machine descriptions (in sse.md), but
> the internals doc doesn't mention it. Should we add a description about this
> in the internals doc?

This pattern was added way back in 2007 and was not documented
properly. It should be documented in "Standard Pattern Names For
Generation" in md.texi.

Uros.
Re: Non-temporal move
The original patch:
http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01862.htm

Thanks,
Andrew Pinski

> On Feb 25, 2014, at 12:00 AM, Uros Bizjak wrote:
>
> On Tue, Feb 25, 2014 at 7:10 AM, Gopalasubramanian, Ganesh wrote:
>> I could see "storent" pattern in x86 machine descriptions (in sse.md), but
>> the internals doc doesn't mention it. Should we add a description about this
>> in the internals doc?
>
> This pattern was added way back in 2007 and was not documented
> properly. It should be documented in "Standard Pattern Names For
> Generation" in md.texi.
>
> Uros.
Re: [PATCH] Fix PR60319, mixing -f[no-]strict-overflow units with LTO
On Mon, 24 Feb 2014, Jan Hubicka wrote:

> >
> > This addresses miscompilations caused by improperly mixing
> > -f[no-]strict-overflow units (and thus optimize <= 1 and optimize >= 2
> > units!). It does so by merging -f[no-]strict-overflow and the
> > related -f[no-]wrapv and -f[no-]trapv conservatively, that is,
> > if any unit uses (implicitly) -fno-strict-overflow, -fwrapv
> > or -fno-trapv then force that upon the compilation in and after
> > WPA stage.
> >
> > I know honza wants to eventually handle this via optimize attributes
> > but currently at least -fstrict-overflow is not suitable for that
> > (it's not marked 'Optimization'). Nor do we even check for
> > -f[no-]wrapv mismatch in can_inline_edge_p.
>
> Yep, we need to fix all that for 4.10
>
> > LTO bootstrap & test ongoing on x86_64-unknown-linux-gnu, ok for trunk?
> >
> > Any other "obvious" candidates for conservative merging?
> >
> > Note that this makes it quite harmful (well - maybe?) if your
> > LTO compile contains a single non-optimized TU ... but you
> > at least know if you inspect the lto_opts section of one of the
> > ltrans files.
>
> Indeed, I think we should at least mention that in news.html and perhaps
> have a short section on LTO and flags in doc/invoke.texi?

I'll try to come up with something for invoke.texi / lto.texi.

> Users generally get wrong the fact that they need -flto and optimization
> flags at command line and we do crazy things behind their back.
> Like last week I noticed that we build a static binary of Firefox's shell
> with -fpic just because they manage to merge in an object file compiled -fPIC.
> (while LLVM gives a non-PIC binary)
> -fno-pic helps there, what will be the behaviour with -fstrict-overflow?

Options given at link-time will override those constructed from
the input TUs as far as the option processing machinery handles it
(more specific options always trump general ones, so -O2 at link-time
doesn't override -fno-strict-overflow from compile-time).
This is implemented by simply appending all link-time options to
those built from compile-time.

> > Eventually this asks for a diagnostic by lto-wrapper (but that
> > gets delayed until very late ...).
>
> I am definitely trying to push myself to make command line options a
> priority for 4.10 stage1. Let's try to work on it, I think it is one of
> the main showstoppers for LTO at the moment. (along with debug info and fun
> with ASM statements defining symbols I suppose)

I think the basic scheme implemented right now is sound - we have
to pass on some options from compile to link-time for correctness
reasons. What is missing is

 1) diagnosing harmful mismatches
 2) diagnosing options that are dropped (we don't know which at the
    moment as we don't stream all options from compile-time)
 3) passing on the general optimization level (don't default to -O0)
 4) eventually using optimize/target attributes to mitigate 1) and 2)

I would go 4) at the very last resort only - it's not a good solution
IMHO (maybe a good solution for target options, but certainly not
for general opts).

Oh, and of course we need to improve our IL so that its meaning
doesn't depend on flags like -fwrapv.

I have committed the patch now.

Thanks,
Richard.

> Honza
> >
> > Thanks,
> > Richard.
> >
> > 2014-02-24  Richard Biener
> >
> >     PR lto/60319
> >     * lto-opts.c (lto_write_options): Output non-explicit conservative
> >     -fwrapv, -fno-trapv and -fno-strict-overflow.
> >     * lto-wrapper.c (merge_and_complain): Handle merging those options.
> >     (run_gcc): And pass them through.
> >
> > Index: gcc/lto-opts.c
> > ===
> > *** gcc/lto-opts.c (revision 208077)
> > --- gcc/lto-opts.c (working copy)
> > *** lto_write_options (void)
> > *** 117,122
> > --- 117,134
> >       default:
> >         gcc_unreachable ();
> >       }
> > +   /* We need to merge -f[no-]strict-overflow, -f[no-]wrapv and -f[no-]trapv
> > +      conservatively, so stream out their defaults.  */
> > +   if (!global_options_set.x_flag_wrapv
> > +       && global_options.x_flag_wrapv)
> > +     append_to_collect_gcc_options (&temporary_obstack, &first_p, "-fwrapv");
> > +   if (!global_options_set.x_flag_trapv
> > +       && !global_options.x_flag_trapv)
> > +     append_to_collect_gcc_options (&temporary_obstack, &first_p, "-fno-trapv");
> > +   if (!global_options_set.x_flag_strict_overflow
> > +       && !global_options.x_flag_strict_overflow)
> > +     append_to_collect_gcc_options (&temporary_obstack, &first_p,
> > +                                    "-fno-strict-overflow");
> >
> >     /* Output explicitly passed options.  */
> >     for (i = 1; i < save_decoded_options_count; ++i)
> > Index: gcc/lto-wrapper.c
> > ===
> > *** gcc/lto-wrapper.c (revision 208077)
> > --- gcc/lto-wrapper.c (working copy)
> > ***
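For readers following the thread, the conservative merge described in this message can be sketched in plain C. This is only an illustration under stated assumptions: `struct unit_flags` and `merge_overflow_flags` are hypothetical names, not the code in lto-wrapper.c, and each unit's effective (explicit or implied) settings are assumed to be known already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-unit view of the three related flags (effective values,
   whether given explicitly or implied by the optimization level).  */
struct unit_flags
{
  bool wrapv;            /* -fwrapv in effect */
  bool trapv;            /* -ftrapv in effect */
  bool strict_overflow;  /* -fstrict-overflow in effect */
};

/* Merge conservatively: if any unit uses -fwrapv, -fno-trapv or
   -fno-strict-overflow, force that on the merged compilation.  */
struct unit_flags
merge_overflow_flags (const struct unit_flags *units, size_t n)
{
  assert (units != NULL && n > 0);
  struct unit_flags merged = units[0];
  for (size_t i = 1; i < n; ++i)
    {
      merged.wrapv = merged.wrapv || units[i].wrapv;
      merged.trapv = merged.trapv && units[i].trapv;
      merged.strict_overflow
        = merged.strict_overflow && units[i].strict_overflow;
    }
  return merged;
}
```

Under this model a single unit built without -fstrict-overflow switches it off for the whole link, which is the harmful-but-safe case the message warns about.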
Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.
On 21 Feb 18:35, Uros Bizjak wrote:
> On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar wrote:
> >> > Latest version of AVX512 spec
> >> > http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
> >> > Has a few changes.
> >> >
> >> > 1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
> >> > We can either support new CPUID or disable PREFETCHWT1 from generating,
> >> > without removing code, and enable it in 4.9.1/latest version.
> >> > I am not sure that adding new -m flag and related stuff this late
> >> > is a good idea. Should still add it?
> >>
> >> Please submit the patch anyway. We can relax release constraints on
> >> non-algorithmic patch a bit, weighting in benefits of having gcc
> >> release that fully conforms to some published specification.
> >>
> > Patch below adds -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1,
> > and uses them for prefetchwt1 instruction. Bootstraps/passes testing.
> > Ok for trunk?
> >
> > * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
>
> Please also add new switch to gcc.target/i386/sse-{12,13,14}.c and
> g++.dg/other/i386-{2,3} and new options to
> gcc.target/i386/sse-{22,23}.c. Please re-test with new additions and
> repost the patch.
>

I've added the new switch to those tests. However, when I add prefetchwt1
to pragma GCC target ("sse") the sse-22a.c test fails with:

pmmintrin.h: In function ‘_mm_loaddup_pd’:
emmintrin.h:119:1: error: inlining failed in call to always_inline ‘_mm_load1_pd’: target specific option mismatch

I've checked and this isn't a problem with prefetchwt1. I get the same
error when I add any other option (e. g. sha) to #pragma GCC target ("sse").
So I haven't added anything there. As that was the only fail,
I'm reposting this patch.

ChangeLog for GCC:

    * common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
    (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
    (ix86_handle_option): Handle OPT_mprefetchwt1.
    * config/i386/cpuid.h (bit_PREFETCHWT1): New.
    * config/i386/driver-i386.c (host_detect_local_cpu): Detect
    PREFETCHWT1 CPUID.
    * config/i386/i386-c.c (ix86_target_macros_internal): Handle
    OPTION_MASK_ISA_PREFETCHWT1.
    * config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
    (PTA_PREFETCHWT1): New.
    (ix86_option_override_internal): Handle PTA_PREFETCHWT1.
    (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
    * config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
    New.
    * config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
    (*prefetch_avx512pf__: Change into ...
    (*prefetch_prefetchwt1_: This.
    * config/i386/i386.opt (mprefetchwt1): New.
    * config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
    (_mm_prefetch): Handle intent to write.
    * doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

ChangeLog for tests:

    * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
    * gcc.target/i386/prefetchwt1-1.c: New.
    * g++.dg/other/i386-2.C: Add new option.
    * g++.dg/other/i386-3.C: Ditto.
    * gcc.target/i386/sse-12.c: Ditto.
    * gcc.target/i386/sse-13.c: Update __builtin_prefetch, add new option.
    * gcc.target/i386/sse-22.c: Add new option.
    * gcc.target/i386/sse-23.c: Update __builtin_prefetch, add new option.

--- gcc/common/config/i386/i386-common.c | 15 +++ gcc/config/i386/cpuid.h | 4 gcc/config/i386/driver-i386.c | 7 +-- gcc/config/i386/i386-c.c | 2 ++ gcc/config/i386/i386.c| 6 ++ gcc/config/i386/i386.h| 2 ++ gcc/config/i386/i386.md | 13 ++--- gcc/config/i386/i386.opt | 4 gcc/config/i386/xmmintrin.h | 6 -- gcc/doc/invoke.texi | 4 +++- gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/avx-1.c | 2 +- gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++ gcc/testsuite/gcc.target/i386/sse-12.c| 2 +- gcc/testsuite/gcc.target/i386/sse-13.c| 4 ++-- gcc/testsuite/gcc.target/i386/sse-14.c| 2 +- gcc/testsuite/gcc.target/i386/sse-22.c| 2 +- gcc/testsuite/gcc.target/i386/sse-23.c| 4 ++-- 19 files changed, 75 insertions(+), 22 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index b7f9ff6..a6ab555 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -69,6 +69,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW #de
Re: [PATCH] Fix PR 60268
Andrey Belevantsev writes:

> Fixed by placing the initialization properly at the end of sched_rgn_init
> and also moving the check for sched_pressure != NONE outside of the if
> statement in schedule_region as discussed in the PR trail with Jakub.
>
> Bootstrapped and tested on x86-64, ok?

This breaks m68k:

$ gcc/xgcc -Bgcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -flive-range-shrinkage -c -o pr60268.o ../gcc/testsuite/gcc.c-torture/compile/pr60268.c
../gcc/testsuite/gcc.c-torture/compile/pr60268.c: In function ‘f’:
../gcc/testsuite/gcc.c-torture/compile/pr60268.c:6:1: internal compiler error: in m68k_sched_issue_rate, at config/m68k/m68k.c:5978
0xbabc8b m68k_sched_issue_rate
        ../../gcc/config/m68k/m68k.c:5978
0xc3d9dc sched_init()
        ../../gcc/haifa-sched.c:6657
0xc3eecf haifa_sched_init()
        ../../gcc/haifa-sched.c:6719
0x8e807c schedule_insns
        ../../gcc/sched-rgn.c:3407
0x8e87cb schedule_insns
        ../../gcc/sched-rgn.c:3401
0x8e87cb rest_of_handle_live_range_shrinkage
        ../../gcc/sched-rgn.c:3614
0x8e87cb execute
        ../../gcc/sched-rgn.c:3704

Andreas.

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
[PATCH, ARM] Set max_insns_skipped to MAX_INSN_PER_IT_BLOCK when optimize_size for THUMB2
Hi,

The current value of max_insns_skipped is 6. For THUMB2, that needs 2
(IF-THEN) or 3 (IF-THEN-ELSE) IT blocks to hold all the instructions, so
the overhead of IT is 4 or 6 BYTES.

If we do not generate IT blocks, for IF-THEN, the overhead of the
conditional jump is 2 or 4; for IF-THEN-ELSE, the overhead is 4, 6, or 8.
Most THUMB2 jump instructions are 2 BYTES.

So the patch sets max_insns_skipped to MAX_INSN_PER_IT_BLOCK.

No make check regression on cortex-m3.
On CSiBE, no file has a code size regression, and overall there is
a >0.01% code size improvement for cortex-a9 and cortex-m4.

Is it OK?

Thanks!
-Zhenqiang

2014-02-25  Zhenqiang Chen

    * config/arm/arm.c (arm_option_override): Set max_insns_skipped
    to MAX_INSN_PER_IT_BLOCK when optimize_size for THUMB2.

testsuite/ChangeLog:
2014-02-25  Zhenqiang Chen

    * gcc.target/arm/max-insns-skipped.c: New test.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b49f43e..99cdbc4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2743,6 +2743,15 @@ arm_option_override (void)
       /* If optimizing for size, bump the number of instructions that we
          are prepared to conditionally execute (even on a StrongARM).  */
       max_insns_skipped = 6;
+
+      /* For THUMB2, it needs 2 (IF-THEN) or 3 (IF-THEN-ELSE) IT blocks to
+         hold all the instructions.  The overhead of IT is 4 or 6 BYTES.
+         If we do not generate IT blocks, for IF-THEN, the overhead of
+         conditional jump is 2 or 4; for IF-THEN-ELSE, the overhead is 4, 6
+         or 8.  Most THUMB2 jump instructions are 2 BYTES.
+         So set max_insns_skipped to MAX_INSN_PER_IT_BLOCK.  */
+      if (TARGET_THUMB2)
+        max_insns_skipped = MAX_INSN_PER_IT_BLOCK;
     }
   else
     max_insns_skipped = current_tune->max_insns_skipped;
diff --git a/gcc/testsuite/gcc.target/arm/max-insns-skipped.c b/gcc/testsuite/gcc.target/arm/max-insns-skipped.c
new file mode 100644
index 000..0a11554
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/max-insns-skipped.c
@@ -0,0 +1,21 @@
+/* { dg-do assemble { target arm_thumb2 } } */
+/* { dg-options " -Os " } */
+
+int t (int a, int b, int c, int d)
+{
+  int r;
+  if (a > 0) {
+    r = a + b;
+    r += 0x456;
+    r *= 0x1234567;
+    }
+  else {
+    r = b - a;
+    r -= 0x123;
+    r *= 0x12387;
+    r += d;
+  }
+  return r;
+}
+
+/* { dg-final { object-size text <= 40 } } */
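The byte counts quoted in this message can be reproduced with a small C sketch. It assumes, as labeled in the code, that one IT instruction is 2 bytes and covers at most 4 conditional instructions; `it_overhead_bytes` and the two constants are hypothetical names for illustration, not part of arm.c.

```c
#include <assert.h>

/* Assumption: one IT instruction is 2 bytes and covers at most 4
   conditional instructions.  */
enum { IT_INSN_BYTES = 2, INSNS_PER_IT_BLOCK = 4 };

/* Bytes of IT overhead needed to conditionalize N_INSNS instructions.  */
int
it_overhead_bytes (int n_insns)
{
  assert (n_insns >= 0);
  int blocks = (n_insns + INSNS_PER_IT_BLOCK - 1) / INSNS_PER_IT_BLOCK;
  return blocks * IT_INSN_BYTES;
}
```

With a limit of 6 skipped instructions, an IF-THEN costs `it_overhead_bytes (6)`, i.e. 4 bytes, and an IF-THEN-ELSE with 6 instructions per arm costs `it_overhead_bytes (12)`, i.e. 6 bytes, matching the figures above. Under the same assumptions a limit of 4 gives 2 and 4 bytes, which is no worse than the smallest branch overheads quoted (2 for IF-THEN, 4 for IF-THEN-ELSE).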
Re: [PATCH] Fix PR 60268
On 25.02.2014 13:14, Andreas Schwab wrote:
> Andrey Belevantsev writes:
>
>> Fixed by placing the initialization properly at the end of sched_rgn_init
>> and also moving the check for sched_pressure != NONE outside of the if
>> statement in schedule_region as discussed in the PR trail with Jakub.
>>
>> Bootstrapped and tested on x86-64, ok?
>
> This breaks m68k:
>
> $ gcc/xgcc -Bgcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -flive-range-shrinkage -c -o pr60268.o ../gcc/testsuite/gcc.c-torture/compile/pr60268.c
> ../gcc/testsuite/gcc.c-torture/compile/pr60268.c: In function ‘f’:
> ../gcc/testsuite/gcc.c-torture/compile/pr60268.c:6:1: internal compiler error: in m68k_sched_issue_rate, at config/m68k/m68k.c:5978

The patch itself has nothing to do with the ICE. It probably means that
-flive-range-shrinkage was never tried without tuning on m68k, because in
m68k.c there is

    630   /* Setup scheduling options.  */
    631   if (TUNE_CFV1)
    632     m68k_sched_cpu = CPU_CFV1;
    633   else if (TUNE_CFV2)
    634     m68k_sched_cpu = CPU_CFV2;
    635   else if (TUNE_CFV3)
    636     m68k_sched_cpu = CPU_CFV3;
    637   else if (TUNE_CFV4)
    638     m68k_sched_cpu = CPU_CFV4;
    639   else
    640     {
    641       m68k_sched_cpu = CPU_UNKNOWN;
    642       flag_schedule_insns = 0;
    643       flag_schedule_insns_after_reload = 0;
    644       flag_modulo_sched = 0;
    645     }

On line 641, m68k_sched_cpu is set to CPU_UNKNOWN, and that is what causes
the ICE. I guess you need to turn the live range shrinkage off in that
piece of code; the patch below fixes the ICE for me:

diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index f20d071..ea1bcd4 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -642,6 +642,7 @@ m68k_option_override (void)
       flag_schedule_insns = 0;
       flag_schedule_insns_after_reload = 0;
       flag_modulo_sched = 0;
+      flag_live_range_shrinkage = 0;
     }

   if (m68k_sched_cpu != CPU_UNKNOWN)

Yours,
Andrey

> 0xbabc8b m68k_sched_issue_rate
>         ../../gcc/config/m68k/m68k.c:5978
> 0xc3d9dc sched_init()
>         ../../gcc/haifa-sched.c:6657
> 0xc3eecf haifa_sched_init()
>         ../../gcc/haifa-sched.c:6719
> 0x8e807c schedule_insns
>         ../../gcc/sched-rgn.c:3407
> 0x8e87cb schedule_insns
>         ../../gcc/sched-rgn.c:3401
> 0x8e87cb rest_of_handle_live_range_shrinkage
>         ../../gcc/sched-rgn.c:3614
> 0x8e87cb execute
>         ../../gcc/sched-rgn.c:3704
>
> Andreas.
Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.
On Tue, Feb 25, 2014 at 10:13 AM, Ilya Tocar wrote: >> >> > Latest version of AVX512 spec >> >> > http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf >> >> > Has a few changes. >> >> > >> >> > 1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1. >> >> > We can either support new CPUID or disable PREFETCHWT1 from generating, >> >> > without removing code, and enable it in 4.9.1/latest version. >> >> > I am not sure that adding new -m flag and related stuff this late >> >> > is a good idea. Should still add it? >> >> >> >> Please submit the patch anyway. We can relax release constraints on >> >> non-algorithmic patch a bit, weighting in benefits of having gcc >> >> release that fully conforms to some published specification. >> >> >> > Patch bellow add -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1, >> > and uses them for prefetchwt1 instruction. Bootstraps/passes testing. >> > Ok for trunk? >> > > >> > * gcc.target/i386/avx-1.c: Update __builtin_prefetch. >> >> Please also add new switch to gcc-target/i386/sse-{12,13,14}.c and >> g++.dg/other/i386-{2,3} and new options to >> gcc.tatget/i386/sse-{22,23}.c. Please re-test with new additions and >> repost the patch. >> > > I've added new switch to those tests. However when I add prefetchwt1 > to pragma GCC target ("sse") sse-22a.c test fails with: > pmmintrin.h: In function '_mm_loaddup_pd': > emmintrin.h:119:1: error: inlining failed in call to always_inline > '_mm_load1_pd': target specific option mismatch > > I've checked and this isn't a problem with prefetchwt1. I get the same > error when I add any other option (e. g. sha) to #pragma GCC target ("sse"). > So I haven't added anything there. As that was the only fail, > I'm reposting this patch. > > ChangeLog for GCC: > > * common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET), > (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New. > (ix86_handle_option): Handle OPT_mprefetchwt1. 
> * config/i386/cpuid.h (bit_PREFETCHWT1): New. > * config/i386/driver-i386.c (host_detect_local_cpu): Detect > PREFETCHWT1 CPUID. > * config/i386/i386-c.c (ix86_target_macros_internal): Handle > OPTION_MASK_ISA_PREFETCHWT1. > * config/i386/i386.c (ix86_target_string): Handle mprefetchwt1. > (PTA_PREFETCHWT1): New. > (ix86_option_override_internal): Handle PTA_PREFETCHWT1. > (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1. > * config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P): > New. > * config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1 > (*prefetch_avx512pf__: Change into ... > (*prefetch_prefetchwt1_: This. > * config/i386/i386.opt (mprefetchwt1): New. > * config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1. > (_mm_prefetch): Handle intent to write. > * doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Doccument. > > ChangeLog for tests: > > * gcc.target/i386/avx-1.c: Update __builtin_prefetch. > * gcc.target/i386/prefetchwt1-1.c: New. > * g++.dg/other/i386-2.C: Add new option. > * g++.dg/other/i386-3.C: Ditto. > * gcc.target/i386/sse-12.c: Ditto. > * gcc.target/i386/sse-13.c: Update __builtin_prefetch, add new option. > * gcc.target/i386/sse-22.c: Add new option. > * gcc.target/i386/sse-23.c: Update __builtin_prefetch, add new option. The patch is OK for mainline. Thanks, Uros.
[PATCH][ARM][v2] Fix PR 55426
Hi all,

A while back I sent a patch to fix this PR
(http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00652.html) by generalising the
neon_vld1_dupv2di splitter. There is an alternative safer approach at this
stage, which is to relax CANNOT_CHANGE_MODE_CLASS to allow conversions from 128
to 64-bit modes. In that case the layout in the d-registers happens to be valid
in big-endian, so we don't end up generating subregs after reg allocation.

This regression appears on 4.8 as well as trunk.

Built and bootstrapped trunk and 4.8 on arm-none-linux-gnueabihf.
Ok for those branches?

Thanks,
Kyrill

2014-02-25  Kyrylo Tkachov

    PR target/55426
    * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Allow 128 to 64-bit
    conversions.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bed056e..df30f77 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1257,9 +1257,13 @@ enum reg_class
    VFPv2.
    In big-endian mode, modes greater than word size (i.e. DFmode) are stored in
    VFP registers in little-endian order.  We can't describe that accurately to
-   GCC, so avoid taking subregs of such values.  */
+   GCC, so avoid taking subregs of such values.
+   The only exception is going from a 128-bit to a 64-bit type.  In that case
+   the data layout happens to be consistent for big-endian, so we explicitly allow
+   that case.  */
 #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
-  (TARGET_VFP && TARGET_BIG_END\
+  (TARGET_VFP && TARGET_BIG_END \
+   && !(GET_MODE_SIZE (FROM) == 16 && GET_MODE_SIZE (TO) == 8) \
    && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD \
        || GET_MODE_SIZE (TO) > UNITS_PER_WORD) \
    && reg_classes_intersect_p (VFP_REGS, (CLASS)))
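The effect of the added clause can be checked with a standalone C model of the macro's logic. This is a sketch only: `cannot_change_mode`, its boolean parameters and the 4-byte `WORD_BYTES` are hypothetical stand-ins for the target macros (TARGET_VFP, TARGET_BIG_END, UNITS_PER_WORD, reg_classes_intersect_p), with mode sizes passed as plain byte counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for UNITS_PER_WORD; 4 bytes is assumed here.  */
enum { WORD_BYTES = 4 };

/* Returns true when a mode change must be rejected, mirroring the shape of
   the patched condition with plain byte sizes instead of machine modes.  */
bool
cannot_change_mode (bool vfp, bool big_endian, int from_size, int to_size,
                    bool class_intersects_vfp)
{
  assert (from_size > 0 && to_size > 0);
  return vfp && big_endian
         && !(from_size == 16 && to_size == 8)
         && (from_size > WORD_BYTES || to_size > WORD_BYTES)
         && class_intersects_vfp;
}
```

In this model only the 16-byte to 8-byte direction is exempted; the reverse conversion, and any other change involving a mode wider than a word, is still rejected on big-endian VFP targets.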
[C++ Patch] PR 60314 (ICE with decltype(auto))
Hi,

here we ICE exactly as we did in c++/53756: the only difference is the
use of decltype(auto) instead of auto. Now, if we compare is_cxx_auto to
is_auto (the front-end helper), evidently there is an inconsistency
about the handling of decltype(auto) and the below fixes the ICE.
However, also clearly the patchlet needs a review, because an out of
class decltype(auto) is already fine. Also, I'm not 100% sure we don't
need a decltype_auto_die, etc.

Tested x86_64-linux.

Thanks,
Paolo.

///

2014-02-25  Paolo Carlini

    PR c++/60314
    * dwarf2out.c (is_cxx_auto): Handle decltype(auto).

/testsuite
2014-02-25  Paolo Carlini

    PR c++/60314
    * g++.dg/cpp1y/auto-fn24.C: New.

Index: dwarf2out.c
===
--- dwarf2out.c (revision 208113)
+++ dwarf2out.c (working copy)
@@ -10230,7 +10230,8 @@ is_cxx_auto (tree type)
       tree name = TYPE_NAME (type);
       if (TREE_CODE (name) == TYPE_DECL)
         name = DECL_NAME (name);
-      if (name == get_identifier ("auto"))
+      if (name == get_identifier ("auto")
+          || name == get_identifier ("decltype(auto)"))
         return true;
     }
   return false;
Index: testsuite/g++.dg/cpp1y/auto-fn24.C
===
--- testsuite/g++.dg/cpp1y/auto-fn24.C (revision 0)
+++ testsuite/g++.dg/cpp1y/auto-fn24.C (working copy)
@@ -0,0 +1,12 @@
+// PR c++/60314
+// { dg-options "-std=c++1y -g" }
+
+// fine
+decltype(auto) qux() { return 42; }
+
+struct foo
+{
+  // also ICEs if not static
+  static decltype(auto) bar()
+  { return 42; }
+};
[PATCH][AArch64] Fix default CPU configurations
Hi all, The problem solved in this patch is that when gcc is configured with --with-arch=armv8-a gcc will go into aarch64-arches.def, pick the representative CPU (Cortex-A53 for ARMv8-A) and use that CPUs ISA flags. Now that we specified that Cortex-A53 has CRC and crypto though, this means that gcc will choose by default to enable CRC and Crypto. What it should be doing though is to use the 4th field in the AARCH64_ARCH macro that specifies the ISA flags implied by the architecture. This patch does that by looking in aarch64-arches.def and extracting the 4th field appropriately and using that as the ext_mask when processing a --with-arch option. Furthermore, if no --with-arch or --with-cpu directives are specified config.gcc will set TARGET_DEFAULT_CPU to TARGET_CPU_generic. What it should be doing, is leaving it undefined so that the backend in aarch64.h can define its own default with the correct ISA options (currently we have this scheme where the TARGET_CPU_ is encoded in the first 6 bits of TARGET_DEFAULT_CPU and the ISA flags are encoded in the upper part of it. We should clean that up in the next release). Before this patch, the code in aarch64.h that does that initialisation was never even exercised because TARGET_CPU_DEFAULT was always defined by config.gcc no matter what! config.gcc defined it as TARGET_CPU_generic but without encoding the appropriate ISA flags in the upper bits, leading to a cpu configured without fp or simd. After a discussion with Richard, this patch sets the default CPU (if no -mcpu,-march,--with-cpu,--with-arch is given) to be generic+fp+simd. The generic CPU already schedules like the Cortex-A53, so it should give a decent generic tuning. This patch should improve the current situation a bit. 
With this patch: - If --with-arch=armv8-a is specified we will use generic+fp+simd as the CPU (without the patch it's cortex-a53+fp+simd+crc+crypto) - If no arch or cpu options specified anywhere, we will use the generic+fp+simd CPU (without the patch it would be just generic) Tested aarch64-none-elf on a model and checked the .cpu directive in the generated assembly for a variety of --with-cpu, --with-arch combinations I'm proposing this patch as an alternative to http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01072.html. Ok for trunk? Thanks, Kyrill 2014-02-25 Kyrylo Tkachov * config.gcc (aarch64*-*-*): Use ISA flags from aarch64-arches.def. Do not define target_cpu_default2 to generic. * config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Use generic cpu. * config/aarch64/aarch64.c (aarch64_override_options): Update comment. * config/aarch64/aarch64-arches.def (armv8-a): Use generic cpu. (aarch64_override_options): Update comment about default cpu.commit 918bb596fde24640a68e5d1febd6d94d6acbd9ab Author: Kyrylo Tkachov Date: Mon Feb 24 09:10:03 2014 + [AArch64] Fix default CPU options diff --git a/gcc/config.gcc b/gcc/config.gcc index 2156640..89da61b 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -3391,6 +3391,11 @@ case "${target}" in ${srcdir}/config/aarch64/$def | \ sed -e 's/^[^,]*,[ ]*//' | \ sed -e 's/,.*$//'` +# Extract the architecture flags from aarch64-arches.def +ext_mask=`grep "^$pattern(\"$base_val\"," \ + ${srcdir}/config/aarch64/$def | \ + sed -e 's/)$//' | \ + sed -e 's/^.*,//'` else base_id=`grep "^$pattern(\"$base_val\"," \ ${srcdir}/config/aarch64/$def | \ @@ -4052,10 +4057,8 @@ esac target_cpu_default2= case ${target} in aarch64*-*-*) - if test x$target_cpu_cname = x + if test x"$target_cpu_cname" != x then - target_cpu_default2=TARGET_CPU_generic - else target_cpu_default2=$target_cpu_cname fi ;; diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index 5028f61..4b796d8 100644 --- 
a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -26,4 +26,4 @@ this architecture. ARCH is the architecture revision. FLAGS are the flags implied by the architecture. */ -AARCH64_ARCH("armv8-a", cortexa53, 8, AARCH64_FL_FOR_ARCH8) +AARCH64_ARCH("armv8-a", generic, 8, AARCH64_FL_FOR_ARCH8) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 21054e5..624a7bb 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5307,7 +5307,7 @@ aarch64_override_options (void) /* If the user did not specify a processor, choose the default one for them. This will be the CPU set during configuration using - --with-cpu, otherwise it is "cortex-a53". */ + --with-cpu, otherwise it is "generic". */ if (!selected_cpu) { selected_cpu = &all_cores[TARGET_CPU_DEFAULT & 0x3f]; diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 13c424c..d0463be 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h
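The encoding described in this message (CPU ident in the low 6 bits of TARGET_CPU_DEFAULT, ISA flags above it) can be illustrated with a short C sketch. The `& 0x3f` mask is taken from the diff; the shift by 6 and the helper names `pack_cpu_default`, `cpu_ident_of` and `isa_flags_of` are assumptions made for illustration and are not GCC's actual macros.

```c
#include <assert.h>

/* Assumption for this sketch: the CPU ident occupies the low 6 bits and
   the ISA flags sit directly above them.  */
enum { CPU_BITS = 6, CPU_MASK = 0x3f };

/* Combine a CPU ident and an ISA flag mask into one default value.  */
unsigned long
pack_cpu_default (unsigned cpu_ident, unsigned long isa_flags)
{
  assert (cpu_ident <= CPU_MASK);
  return (isa_flags << CPU_BITS) | cpu_ident;
}

/* Recover the CPU ident, as the all_cores[] lookup in the diff does.  */
unsigned
cpu_ident_of (unsigned long packed)
{
  return (unsigned) (packed & CPU_MASK);
}

/* Recover the ISA flags stored above the CPU ident.  */
unsigned long
isa_flags_of (unsigned long packed)
{
  return packed >> CPU_BITS;
}
```

In this model, a default that is just a bare CPU ident decodes to an empty flag set, which corresponds to the "generic without fp or simd" configuration the message describes.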
Re: [PATCH][ARM][v2] Fix PR 55426
On 25/02/14 09:56, Kyrill Tkachov wrote: > Hi all, > > A while back I sent a patch to fix this PR > (http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00652.html) by generalising the > neon_vld1_dupv2di splitter. There is an alternative safer approach at this > stage, which is to relax CANNOT_CHANGE_MODE_CLASS to allow conversions from > 128 > to 64-bit modes. In that case the layout in the d-registers happens to be > valid > in big-endian, so we don't end up generating subregs after reg allocation. > > This regression appears on 4.8 as well as trunk. > > Built and bootstrapped trunk and 4.8 on arm-none-linux-gnueabihf. > Ok for those branches? > > Thanks, > Kyrill > > 2014-02-25 Kyrylo Tkachov > > PR target/55426 > * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Allow 128 to 64-bit > conversions. > Please fix the backslashes so that they all line up correctly. OK both with that tweak. R. > > change-mode-class.patch > > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index bed056e..df30f77 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -1257,9 +1257,13 @@ enum reg_class > VFPv2. > In big-endian mode, modes greater than word size (i.e. DFmode) are stored > in > VFP registers in little-endian order. We can't describe that accurately > to > - GCC, so avoid taking subregs of such values. */ > + GCC, so avoid taking subregs of such values. > + The only exception is going from a 128-bit to a 64-bit type. In that case > + the data layout happens to be consistent for big-endian, so we explicitly > allow > + that case. */ > #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)\ > - (TARGET_VFP && TARGET_BIG_END \ > + (TARGET_VFP && TARGET_BIG_END \ > + && !(GET_MODE_SIZE (FROM) == 16 && GET_MODE_SIZE (TO) == 8) \ > && (GET_MODE_SIZE (FROM) > UNITS_PER_WORD \ > || GET_MODE_SIZE (TO) > UNITS_PER_WORD) \ > && reg_classes_intersect_p (VFP_REGS, (CLASS))) >
Re: [PATCH][AArch64] Fix default CPU configurations
On 25/02/14 10:08, Kyrill Tkachov wrote:
> 2014-02-25  Kyrylo Tkachov
>
>     * config.gcc (aarch64*-*-*): Use ISA flags from aarch64-arches.def.
>     Do not define target_cpu_default2 to generic.
>     * config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Use generic cpu.
>     * config/aarch64/aarch64.c (aarch64_override_options): Update comment.
>     * config/aarch64/aarch64-arches.def (armv8-a): Use generic cpu.
>     (aarch64_override_options): Update comment about default cpu.

Apologies, the ChangeLog should not have that last line in it:

2014-02-25  Kyrylo Tkachov

    * config.gcc (aarch64*-*-*): Use ISA flags from aarch64-arches.def.
    Do not define target_cpu_default2 to generic.
    * config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Use generic cpu.
    * config/aarch64/aarch64.c (aarch64_override_options): Update comment.
    * config/aarch64/aarch64-arches.def (armv8-a): Use generic cpu.
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Tue, Feb 25, 2014 at 6:12 AM, bin.cheng wrote:
> Hi,
> This patch is to fix regression reported in PR60280 by removing forward loop
> headers/latches in cfg cleanup if possible. Several tests are broken by
> this change since cfg cleanup is shared by all optimizers. Some tests has
> already been fixed by recent patches, I went through and fixed the others.
> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c". When
> GCC removing a basic block, it checks profile information by calling
> check_bb_profile after redirecting incoming edges of the bb. This certainly
> results in warnings about invalid profile information and causes the case to
> fail. I will send a patch to skip checking profile information for a
> removing basic block in stage 1 if it sounds reasonable. For now I just
> twisted the case itself.
>
> Bootstrap and tested on x86_64 and arm_a15.
>
> Is it OK?

Can you document the extra threading we do in pr21559.c? The comment
still talks about two threadings we should perform.

Also the ivopt_* adjustments would be better done by matching
"ivtmp.[0-9_]* = PHI" instead of matching ivtmp in one of the PHI
arguments.

@@ -497,6 +507,9 @@ remove_forwarder_block (basic_block bb)
       set_immediate_dominator (CDI_DOMINATORS, dest, dom);
     }

+  if (current_loops && bb->loop_father->latch == bb)
+    bb->loop_father->latch = dest;
+
   /* And kill the forwarder block.  */
   delete_basic_block (bb);

Can you add a comment here? I had

@@ -497,7 +500,12 @@ remove_forwarder_block (basic_block bb)
       set_immediate_dominator (CDI_DOMINATORS, dest, dom);
     }

-  /* And kill the forwarder block.  */
+  /* And kill the forwarder block, but first adjust its parent loop
+     latch info as otherwise the cfg hook has a hard time not to
+     kill the loop.  */
+  if (current_loops
+      && bb->loop_father->latch == bb)
+    bb->loop_father->latch = dest;
   delete_basic_block (bb);

   return true;

in my patch.

Thanks,
Richard.

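The latch adjustment discussed in this review can be modelled with toy data structures. The sketch below is not GCC code: `struct toy_bb`, `struct toy_loop` and `toy_remove_forwarder` are hypothetical, and deletion is reduced to setting a flag. It only shows the ordering, i.e. that the loop's latch is retargeted to the forwarder's destination before the forwarder goes away.

```c
#include <assert.h>
#include <stddef.h>

struct toy_loop;

/* Toy basic block: only the fields this sketch needs.  */
struct toy_bb
{
  struct toy_loop *loop_father;
  int deleted;
};

/* Toy loop: records which block is its latch.  */
struct toy_loop
{
  struct toy_bb *latch;
};

/* Remove forwarder BB that jumps to DEST.  If BB is the latch of its loop,
   retarget the latch first so the loop survives the deletion.  */
void
toy_remove_forwarder (struct toy_bb *bb, struct toy_bb *dest)
{
  assert (bb != NULL && dest != NULL);
  if (bb->loop_father != NULL && bb->loop_father->latch == bb)
    bb->loop_father->latch = dest;
  bb->deleted = 1;
}
```

Without the retargeting step, the loop in this model would be left pointing at a deleted block, which is the situation the requested comment is meant to explain.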
> > 2014-02-25 Bin Cheng > > PR target/60280 > * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop > preheaders and latches only if requested. Fix latch if it > is removed. > * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set > LOOPS_HAVE_PREHEADERS. > > gcc/testsuite/ChangeLog > 2014-02-25 Bin Cheng > > PR target/60280 > * gnat.dg/renaming5.adb: Change to two expected gotos. > * gcc.dg/tree-ssa/pr21559.c: Change back to three expected > jump threads. > * gcc.dg/tree-prof/update-loopch.c: Check two "Invalid sum" > messages for removed basic block. > * gcc.dg/tree-ssa/ivopt_1.c: Fix unreliable scanning string. > * gcc.dg/tree-ssa/ivopt_2.c: Ditto. > * gcc.dg/tree-ssa/ivopt_3.c: Ditto. > * gcc.dg/tree-ssa/ivopt_4.c: Ditto.
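A minimal sketch of the scan-string suggestion above. The testsuite uses Tcl regular expressions; this sketch uses POSIX extended regexes as a stand-in, and the helper name line_matches and the dump lines used with it are hypothetical, not taken from the ivopt_* tests. It only illustrates why anchoring on the PHI result is tighter than matching ivtmp anywhere on the line.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if LINE matches the POSIX extended regex PATTERN.
   Hypothetical helper: it stands in for a dump-scanning directive.  */
static bool line_matches(const char *pattern, const char *line)
{
    regex_t re;

    /* Treat a pattern that fails to compile as "no match".  */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    bool matched = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

With the pattern "ivtmp.[0-9_]* = PHI", a line whose PHI result is an ivtmp name matches, while a line that only mentions an ivtmp name among the PHI arguments does not. A bare "ivtmp" pattern would accept both.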
[RS6000, patch] pr57936, ICE in rs6000_secondary_reload_inner
PR57936 is regarding a reload problem with rs6000_secondary_reload_inner. This is the failing instruction, as at the start of reload: (insn 61 60 62 3 (set (reg:V16QI 220) (unspec:V16QI [ (subreg:V16QI (reg:V2DI 159 [ D.2446 ]) 0) (subreg:V16QI (reg:V2DI 159 [ D.2446 ]) 0) (reg:V16QI 213) ] UNSPEC_VPERM)) /home/pthaugen/src/gcc/gcc-4_9-power8/gcc/gcc/testsuite/g++.dg/torture/vshuf-main.inc:15 1253 {altivec_vperm_v16qi} (expr_list:REG_DEAD (reg:V16QI 213) (expr_list:REG_DEAD (reg:V2DI 159 [ D.2446 ]) (nil You'll notice that none of the operands got a hard register. Here are the reloads: Reloads for insn # 61 Reload 0: BASE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1), can't combine reload_in_reg: (plus:SI (reg/f:SI 1 1) (const_int 64 [0x40])) Reload 1: BASE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2), can't combine reload_in_reg: (plus:SI (reg/f:SI 1 1) (const_int 64 [0x40])) Reload 2: BASE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1), can't combine, secondary_reload_p reload_reg_rtx: (reg:SI 10 10) Reload 3: reload_in (V16QI) = (reg:V16QI 32 0) ALTIVEC_REGS, RELOAD_FOR_INPUT (opnum = 1), can't combine reload_in_reg: (subreg:V16QI (reg:V2DI 159 [ D.2446 ]) 0) reload_reg_rtx: (reg:V16QI 78 1) secondary_in_reload = 2 secondary_in_icode = reload_v16qi_si_load Reload 4: BASE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2), can't combine, secondary_reload_p reload_reg_rtx: (reg:SI 10 10) Reload 5: reload_in (V16QI) = (reg:V16QI 32 0) ALTIVEC_REGS, RELOAD_FOR_INPUT (opnum = 2), can't combine reload_in_reg: (subreg:V16QI (reg:V2DI 159 [ D.2446 ]) 0) reload_reg_rtx: (reg:V16QI 78 1) secondary_in_reload = 4 secondary_in_icode = reload_v16qi_si_load Reload 3 completely reloads pseudo reg 159, which lives at (mem/c:V16QI (plus:SI (reg/f:SI 1 1) (const_int 64 [0x40])) [4 %sfp+64 S16 A128]) That's good and proper (except that duplicate reload 5 does the same thing unnecessarily, which is cleaned up later BTW). 
Pseudos that don't get a hard reg must be reloaded to a reg to satisfy the altivec_vperm insn constraint of "v". Normally, reload 2, the secondary reload for reload 3, would result in a call to reload_v16qi_si_load with (reg:V16QI 78 1) as its "reg" argument, and the mem home for pseudo reg 159 as its "mem" arg. However, reload1.c:choose_reload_regs has code to, as the comment says: /* First see if this pseudo is already available as reloaded for a previous insn. In this case reload finds such a register, so reload_v16qi_si_load then sees the register as its "mem" argument, and spits the dummy. I don't think the generic reload machinery is doing anything wrong at this point, although it is a little unusual in that secondary reloads like reload 2 above are not deleted when inheriting reloads. (Other reloads are. See remove_address_replacements calls in choose_reload_regs.) I guess there may be some circumstances when the secondary reload can't just be replaced with an insn moving from one reg to another.. Bootstrapped and regression tested powerpc64-linux. OK to apply? PR target/57936 * config/rs6000/rs6000.c (rs6000_secondary_reload_inner): Do not fail when "mem" arg is not a MEM. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 208097) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -16363,7 +16363,18 @@ rs6000_secondary_reload_inner (rtx reg, rtx mem, r rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p); if (GET_CODE (mem) != MEM) -rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p); +{ + /* If MEM was the home of a pseudo reg that wasn't allocated a +hard register, then when optimising reload will look at +previous instructions to see whether the MEM has already been +reloaded into a hard register. If that register is still +valid then we'll see it here rather than the MEM. 
*/ + if (store_p) + emit_insn (gen_rtx_SET (VOIDmode, mem, reg)); + else + emit_insn (gen_rtx_SET (VOIDmode, reg, mem)); + return; +} rclass = REGNO_REG_CLASS (regno); addr = find_replacement (&XEXP (mem, 0)); -- Alan Modra Australia Development Lab, IBM
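The control flow of the proposed change can be sketched in isolation. This is a toy model under stated assumptions: struct operand, enum reload_action and secondary_reload_action are hypothetical stand-ins for rtx and the real reload helpers, and the model only records which path would be taken when the "mem" argument turns out to be a register.

```c
#include <stdbool.h>

/* Toy stand-in for an rtx operand: either a register or a memory reference.  */
enum operand_kind { OPERAND_REG, OPERAND_MEM };

struct operand {
    enum operand_kind kind;
    int regno;              /* only meaningful for OPERAND_REG */
};

/* What the (hypothetical) helper decides to do.  */
enum reload_action {
    ACTION_ADDRESS_RELOAD,    /* normal path: fix up the memory address */
    ACTION_PLAIN_MOVE_LOAD,   /* "mem" is really a register: reg <- mem */
    ACTION_PLAIN_MOVE_STORE   /* "mem" is really a register: mem <- reg */
};

/* Mirrors the shape of the patch: if the "mem" argument is not a memory
   reference (for example because an inherited reload already placed the
   value in a hard register), fall back to a simple move instead of failing.  */
static enum reload_action
secondary_reload_action(const struct operand *mem, bool store_p)
{
    if (mem->kind != OPERAND_MEM)
        return store_p ? ACTION_PLAIN_MOVE_STORE : ACTION_PLAIN_MOVE_LOAD;

    return ACTION_ADDRESS_RELOAD;
}
```

The sketch says nothing about whether the back end is the right place for this fallback; it only restates the branch structure of the diff.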
RE: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
Updated as comments. Thanks, bin Index: gcc/tree-cfgcleanup.c === --- gcc/tree-cfgcleanup.c (revision 207938) +++ gcc/tree-cfgcleanup.c (working copy) @@ -308,14 +308,24 @@ tree_forwarder_block_p (basic_block bb, bool phi_w if (current_loops) { basic_block dest; - /* Protect loop latches, headers and preheaders. */ + /* Protect loop headers. */ if (bb->loop_father->header == bb) return false; + dest = EDGE_SUCC (bb, 0)->dest; + /* Protect loop preheaders and latches if requested. 
*/ + if (dest->loop_father->header == dest) + { + if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + && bb->loop_father->header != dest) + return false; - if (dest->loop_father->header == dest) - return false; + if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) + && bb->loop_father->header == dest) + return false; + } } + return true; } @@ -497,6 +507,11 @@ remove_forwarder_block (basic_block bb) set_immediate_dominator (CDI_DOMINATORS, dest, dom); } + /* Adjust latch infomation of BB's parent loop as otherwise + the cfg hook has a hard time not to kill the loop. */ + if (current_loops && bb->loop_father->latch == bb) +bb->loop_father->latch = dest; + /* And kill the forwarder block. */ delete_basic_block (bb); Index: gcc/tree-ssa-dom.c === --- gcc/tree-ssa-dom.c (revision 207938) +++ gcc/tree-ssa-dom.c (working copy) @@ -849,9 +849,15 @@ tree_ssa_dominator_optimize
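The loop-protection test in the tree_forwarder_block_p hunk can be modelled on its own. This is a sketch under stated assumptions: the struct layouts, the flag constants HAVE_PREHEADERS and HAVE_SIMPLE_LATCHES, and the function may_remove_forwarder are hypothetical simplifications, with a single successor pointer standing in for EDGE_SUCC (bb, 0)->dest.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the loop state flags tested in the patch.  */
enum { HAVE_PREHEADERS = 1, HAVE_SIMPLE_LATCHES = 2 };

struct loop;

struct block {
    struct loop *loop_father;   /* innermost loop containing the block */
    struct block *succ;         /* the single successor of a forwarder */
};

struct loop {
    struct block *header;
    struct block *latch;
};

/* Returns true if the loop structure does not forbid removing forwarder BB.
   Headers are always protected; preheaders and latches are protected only
   when the corresponding property was requested in LOOPS_STATE.  */
static bool may_remove_forwarder(const struct block *bb, unsigned loops_state)
{
    const struct block *dest = bb->succ;

    if (bb->loop_father->header == bb)
        return false;

    if (dest->loop_father->header == dest) {
        /* BB enters the loop from outside, so it acts as a preheader.  */
        if ((loops_state & HAVE_PREHEADERS) && bb->loop_father->header != dest)
            return false;

        /* BB jumps back to the header of its own loop, so it is a latch.  */
        if ((loops_state & HAVE_SIMPLE_LATCHES) && bb->loop_father->header == dest)
            return false;
    }

    return true;
}
```

When a latch is removed under this rule, the second hunk of the patch repoints the loop's latch field at the forwarder's destination before the block is deleted.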
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Tue, Feb 25, 2014 at 12:20 PM, bin.cheng wrote: > Updated as comments. Ok. Thanks, Richard.
[AArch64] Logical vector shift right conformance
Hi, This patch fixes a bug in the behavior of the vshr_n_u64 and vshrd_n_u64 intrinsics for a shift by 64. ACLE strictly defines a shift by 64 to use the ushr instruction that these intrinsics are intended to map to. The provided testcase also checks the behavior of these intrinsics with shift values other than 64. In addition, the test checks that an illegal ushr shift by 0 is not generated: that case is expected to compile and run correctly using instructions other than ushr. The patch was tested for LE and BE with no regressions. Is the patch OK for stage 4? Thanks, Alex gcc/ 2014-02-25 Alex Velenko * config/aarch64/aarch64-simd-builtins.def (lshr): DI mode excluded. (lshr_simd): DI mode added. * config/aarch64/aarch64-simd.md (aarch64_lshr_simddi): New pattern. (aarch64_ushr_simddi): Likewise. * config/aarch64/aarch64.md (UNSPEC_USHR64): New unspec. * config/aarch64/arm_neon.h (vshr_n_u64): Intrinsic fixed. (vshrd_n_u64): Likewise. gcc/testsuite/ 2014-02-25 Alex Velenko * gcc.target/aarch64/ushr64_1.c: New testcase. 
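A portable sketch of why a shift by 64 needs special handling. In plain C, shifting a 64-bit value right by 64 is undefined, so a generic shift cannot express it. The model below assumes the intended result of an unsigned logical shift right by 64 is zero, and the function name lshr_u64_model is hypothetical; it is not the intrinsic itself.

```c
#include <assert.h>
#include <stdint.h>

/* Models an unsigned 64-bit logical shift right by an immediate in [1, 64].
   The shift == 64 case is handled separately because "value >> 64" is
   undefined behaviour in C for a 64-bit operand.  */
static uint64_t lshr_u64_model(uint64_t value, int shift)
{
    assert(shift >= 1 && shift <= 64);

    if (shift == 64)
        return 0;           /* every bit is shifted out */

    return value >> shift;
}
```

This mirrors the shape of the expander in the patch, which selects a dedicated pattern when the immediate is 64 and the standard shift pattern otherwise.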
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index ebab2ce8347a4425977c5cbd0f285c3ff1d9f2f1..ac5522cac00e6dd8a808ac3c68b4fa8cc15d9120 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -183,6 +183,10 @@ aarch64_types_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define TYPES_GETLANE (aarch64_types_getlane_qualifiers) #define TYPES_SHIFTIMM (aarch64_types_getlane_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_unsigned_shift_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate }; +#define TYPES_USHIFTIMM (aarch64_types_unsigned_shift_qualifiers) +static enum aarch64_type_qualifiers aarch64_types_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate }; #define TYPES_SETLANE (aarch64_types_setlane_qualifiers) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index e5f71b479ccfd1a9cbf84aed0f96b49762053f59..c9b7570e565979cb454d594c84e625380419d0e6 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -192,7 +192,8 @@ BUILTIN_VDQ_I (SHIFTIMM, ashr, 3) VAR1 (SHIFTIMM, ashr_simd, 0, di) - BUILTIN_VSDQ_I_DI (SHIFTIMM, lshr, 3) + BUILTIN_VDQ_I (SHIFTIMM, lshr, 3) + VAR1 (USHIFTIMM, lshr_simd, 0, di) /* Implemented by aarch64_shr_n. 
*/ BUILTIN_VSDQ_I_DI (SHIFTIMM, srshr_n, 0) BUILTIN_VSDQ_I_DI (SHIFTIMM, urshr_n, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 4dffb59e856aeaafb79007255d3b91a73ef1ef13..6048d605c72e6a43b9a004a8bc89dbfa89f3ed5b 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -724,6 +724,31 @@ DONE; }) +(define_expand "aarch64_lshr_simddi" + [(match_operand:DI 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "w") + (match_operand:SI 2 "aarch64_shift_imm64_di" "")] + "TARGET_SIMD" + { +if (INTVAL (operands[2]) == 64) + emit_insn (gen_aarch64_ushr_simddi (operands[0], operands[1])); +else + emit_insn (gen_lshrdi3 (operands[0], operands[1], operands[2])); +DONE; + } +) + +;; SIMD shift by 64. This pattern is a special case as standard pattern does +;; not handle NEON shifts by 64. +(define_insn "aarch64_ushr_simddi" + [(set (match_operand:DI 0 "register_operand" "=w") +(unspec:DI + [(match_operand:DI 1 "register_operand" "w")] UNSPEC_USHR64))] + "TARGET_SIMD" + "ushr\t%d0, %d1, 64" + [(set_attr "type" "neon_shift_imm")] +) + (define_expand "vec_set" [(match_operand:VQ_S 0 "register_operand") (match_operand: 1 "register_operand") diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 99a6ac8fcbdcd24a0ea18cc037bef9cf72070281..c86a29d8e7f8df21f25e14d22df1c3e8c37c907f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -101,6 +101,7 @@ UNSPEC_TLS UNSPEC_TLSDESC UNSPEC_USHL_2S +UNSPEC_USHR64 UNSPEC_VSTRUCTDUMMY ]) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 6af99361b8e265f66026dc506cfc23f044d153b4..612b899f31584378844f1b82353e8d1dd3d5ec61 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -23364,7 +23364,7 @@ vshr_n_u32 (uint32x2_t __a, const int __b) __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) vshr_n_u64 (uint64x1_t __a, const 
int __b) { - return (uint64x1_t) __builtin_aarch64_lshrdi ((int64x1_t) __a, __b); + return __builtin_aarch64_lshr_simddi_uus ( __a, __b); } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) @@ -23421,10 +23421,10 @@ vshrd_n_s64 (int64x1_t __a, const int __b) return (i
[Patch] Added Comment
Hi, I added a comment to the cpplib.h file to aid understanding. Thanks.
[Patch] Added Comment
Added a comment to cpplib.h file. Index: cpplib.h === --- cpplib.h (revision 208118) +++ cpplib.h (working copy) @@ -103,7 +103,7 @@ OP(SEMICOLON, ";") /* structure */\ OP(ELLIPSIS, "...") \ OP(PLUS_PLUS, "++") /* increment */\ - OP(MINUS_MINUS, "--") \ + OP(MINUS_MINUS, "--") /* decrement */ \ OP(DEREF, "->") /* accessors */\ OP(DOT, ".") \ OP(SCOPE, "::") \
Re: [RS6000, patch] pr57936, ICE in rs6000_secondary_reload_inner
Alan Modra wrote: > Normally, reload 2, the secondary reload for reload 3, would result in > a call to reload_v16qi_si_load with (reg:V16QI 78 1) as its "reg" > argument, and the mem home for pseudo reg 159 as its "mem" arg. > > However, reload1.c:choose_reload_regs has code to, as the comment > says: > /* First see if this pseudo is already available as reloaded >for a previous insn. > > In this case reload finds such a register, so reload_v16qi_si_load > then sees the register as its "mem" argument, and spits the dummy. This still seems odd to me, I don't think the intent was that the back-end secondary reload code is supposed to handle such cases. Instead, there's code in emit_input_reload_insns that is supposed to re-check whether a secondary reload is still needed if something changed significantly, e.g. if a value was inherited: /* If we have a secondary reload, pick up the secondary register and icode, if any. If OLDEQUIV and OLD are different or if this is an in-out reload, recompute whether or not we still need a secondary register and what the icode should be. If we still need a secondary register and the class or icode is different, go back to reloading from OLD if using OLDEQUIV means that we got the wrong type of register. We cannot have different class or icode due to an in-out reload because we don't make such reloads when both the input and output need secondary reload registers. */ Note the condition "if OLDEQUIV and OLD are different" should be true if the input value was inherited. Can you check whether this case is hit with your test case? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH] Fix PR60327 - dealII and Xalanbmk ICEing with LTO
This fixes the ICE on our regular -flto-partition=none testers which sees an edge w/o call-stmt after inlining (see the PR for details). I'm not sure this is supposed to happen but the following re-instantiates the guard to inline_update_overall_summary which was present before the last change to that area. LTO bootstrap / regtest running on x86_64-unknown-linux-gnu, ok? Thanks, Richard. 2014-02-25 Richard Biener PR ipa/60327 * ipa.c (walk_polymorphic_call_targets): Properly guard call to inline_update_overall_summary. Index: gcc/ipa.c === *** gcc/ipa.c (revision 208124) --- gcc/ipa.c (working copy) *** walk_polymorphic_call_targets (pointer_s *** 223,232 edge->caller->order, target->name (), target->order); edge = cgraph_make_edge_direct (edge, target); ! if (!inline_summary_vec && edge->call_stmt) ! cgraph_redirect_edge_call_stmt_to_callee (edge); ! else inline_update_overall_summary (node); } } } --- 223,232 edge->caller->order, target->name (), target->order); edge = cgraph_make_edge_direct (edge, target); ! if (inline_summary_vec) inline_update_overall_summary (node); + else if (edge->call_stmt) + cgraph_redirect_edge_call_stmt_to_callee (edge); } } }
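The difference between the two guards is easiest to see as a truth table. This sketch reduces the two conditions to booleans; the enum and the function names old_guard and new_guard are hypothetical, and the two actions stand in for cgraph_redirect_edge_call_stmt_to_callee and inline_update_overall_summary.

```c
#include <stdbool.h>

enum guard_action {
    ACT_NONE,                 /* neither call is made */
    ACT_REDIRECT_CALL_STMT,   /* stands in for the call-stmt redirection */
    ACT_UPDATE_SUMMARY        /* stands in for the inline summary update */
};

/* Guard as it stood before the patch.  */
static enum guard_action old_guard(bool have_summaries, bool have_call_stmt)
{
    if (!have_summaries && have_call_stmt)
        return ACT_REDIRECT_CALL_STMT;
    return ACT_UPDATE_SUMMARY;
}

/* Guard as proposed by the patch.  */
static enum guard_action new_guard(bool have_summaries, bool have_call_stmt)
{
    if (have_summaries)
        return ACT_UPDATE_SUMMARY;
    if (have_call_stmt)
        return ACT_REDIRECT_CALL_STMT;
    return ACT_NONE;
}
```

The two versions agree in three of the four cases. They differ only when there are no inline summaries and the edge has no call statement: the old guard still updated the summary, while the new one does nothing.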
Re: [RS6000, patch] pr57936, ICE in rs6000_secondary_reload_inner
On Tue, Feb 25, 2014 at 02:30:59PM +0100, Ulrich Weigand wrote: > Alan Modra wrote: > > > Normally, reload 2, the secondary reload for reload 3, would result in > > a call to reload_v16qi_si_load with (reg:V16QI 78 1) as its "reg" > > argument, and the mem home for pseudo reg 159 as its "mem" arg. > > > > However, reload1.c:choose_reload_regs has code to, as the comment > > says: > > /* First see if this pseudo is already available as reloaded > > for a previous insn. > > > > In this case reload finds such a register, so reload_v16qi_si_load > > then sees the register as its "mem" argument, and spits the dummy. > > This still seems odd to me, I don't think the intent was that the > back-end secondary reload code is supposed to handle such cases. > > Instead, there's code in emit_input_reload_insns that is supposed > to re-check whether a secondary reload is still needed if something > changed significantly, e.g. if a value was inherited: > > /* If we have a secondary reload, pick up the secondary register > and icode, if any. If OLDEQUIV and OLD are different or > if this is an in-out reload, recompute whether or not we > still need a secondary register and what the icode should > be. If we still need a secondary register and the class or > icode is different, go back to reloading from OLD if using > OLDEQUIV means that we got the wrong type of register. We > cannot have different class or icode due to an in-out reload > because we don't make such reloads when both the input and > output need secondary reload registers. */ > > Note the condition "if OLDEQUIV and OLD are different" should be > true if the input value was inherited. Can you check whether > this case is hit with your test case? No. oldequiv and old are both (reg:V16QI 32 0). 
reload_override_in[j] is (reg:V16QI 32 0), and rl->in_reg is (subreg:V16QI (reg:V2DI 159 [ D.2446 ]) 0), so perhaps we're missing a subreg test in if (reload_override_in[j] && REG_P (rl->in_reg)) { oldequiv = old; old = rl->in_reg; } I'll follow this up after getting some sleep. Right now I'm too tired to think straight. Note that rs6000_secondary_reload_inner is not emitting a nop.
c4: 39 21 00 40 addi r9,r1,64
c8: 38 61 00 20 addi r3,r1,32
cc: 7c 3c ee 99 lxvd2x vs33,r28,r29
d0: 7c 1f ee 98 lxvd2x vs0,r31,r29
d4: 38 9f 00 d0 addi r4,r31,208
d8: 38 a0 00 10 li r5,16
dc: 7c 00 4f 98 stxvd2x vs0,0,r9 (1)
e0: 10 01 ed c4 vsld v0,v1,v29
e4: f0 20 04 91 xxlor vs33,vs0,vs0 (2)
e8: 10 00 07 ab vperm v0,v0,v0,v30
ec: 10 00 f8 00 vaddubm v0,v0,v31
f0: 10 01 08 2b vperm v0,v1,v1,v0 (3)
(1) the previous reload of pseudo 159
(2) the insn emitted by rs6000_secondary_reload_inner
(3) insn 61 (v1==vs33)
-- Alan Modra Australia Development Lab, IBM
[Ada] Illegal use of SPARK volatile object not detected
This patch simplifies the entity resolution machinery which detects an illegally used SPARK volatile object with enabled external properties Async_Writers or Effective_Reads. The mechanism no longer traverses the parent chain as this is not needed. -- Source -- -- volatile_use.ads package Volatile_Use with SPARK_Mode => On is V1 : Integer with Volatile, Async_Writers => True; procedure Test_Eval_Order_OK (X : out Boolean) with Global => (Input => V1), Depends => (X => V1); procedure Test_Eval_Order_Bad1 (X : out Boolean) with Global => (Input => V1), Depends => (X => V1); procedure Test_Eval_Order_Bad2 (X : out Boolean) with Global => (Input => V1), Depends => (X => V1); end Volatile_Use; -- volatile_use.adb package body Volatile_Use with SPARK_Mode => On is procedure Test_Eval_Order_OK (X : out Boolean) is T1 : Integer; T2 : Integer; begin T1 := V1; T2 := V1; X := (T1 <= T2); end Test_Eval_Order_OK; procedure Test_Eval_Order_Bad1 (X : out Boolean) is T1 : Integer; begin T1 := V1; X := (T1 <= V1); end Test_Eval_Order_Bad1; procedure Test_Eval_Order_Bad2 (X : out Boolean) is begin X := (V1 <= V1); end Test_Eval_Order_Bad2; end Volatile_Use; -- Compilation and output -- $ gcc -c volatile_use.adb volatile_use.adb:15:19: volatile object cannot appear in this context (SPARK RM 7.1.3(13)) volatile_use.adb:20:13: volatile object cannot appear in this context (SPARK RM 7.1.3(13)) volatile_use.adb:20:19: volatile object cannot appear in this context (SPARK RM 7.1.3(13)) Tested on x86_64-pc-linux-gnu, committed on trunk 2014-02-25 Hristian Kirtchev * sem_res.adb (Appears_In_Check): New routine. (Resolve_Entity_Name): Remove local variables Prev and Usage_OK. Par is now a constant. Remove the parent chain traversal as the placement of a volatile object with enabled property Async_Writers and/or Effective_Reads must appear immediately within a legal construct. 
Index: sem_res.adb === --- sem_res.adb (revision 208076) +++ sem_res.adb (working copy) @@ -6434,13 +6434,43 @@ -- Used to resolve identifiers and expanded names procedure Resolve_Entity_Name (N : Node_Id; Typ : Entity_Id) is - E: constant Entity_Id := Entity (N); - Par : Node_Id; - Prev : Node_Id; + function Appears_In_Check (Nod : Node_Id) return Boolean; + -- Denote whether an arbitrary node Nod appears in a check node - Usage_OK : Boolean := False; - -- Flag set when the use of a volatile object agrees with its context + -- + -- Appears_In_Check -- + -- + function Appears_In_Check (Nod : Node_Id) return Boolean is + Par : Node_Id; + + begin + -- Climb the parent chain looking for a check node + + Par := Nod; + while Present (Par) loop +if Nkind (Par) in N_Raise_xxx_Error then + return True; + +-- Prevent the search from going too far + +elsif Is_Body_Or_Package_Declaration (Par) then + exit; +end if; + +Par := Parent (Par); + end loop; + + return False; + end Appears_In_Check; + + -- Local variables + + E : constant Entity_Id := Entity (N); + Par : constant Node_Id := Parent (N); + + -- Start of processing for Resolve_Entity_Name + begin -- If garbage from errors, set to Any_Type and return @@ -6555,62 +6585,43 @@ (Async_Writers_Enabled (E) or else Effective_Reads_Enabled (E)) then - Par := Parent (N); - Prev := N; - while Present (Par) loop + -- The volatile object can appear on either side of an assignment --- The volatile object can appear on either side of an assignment + if Nkind (Par) = N_Assignment_Statement then +null; -if Nkind (Par) = N_Assignment_Statement then - Usage_OK := True; - exit; + -- The volatile object is part of the initialization expression of + -- another object. Ensure that the climb of the parent chain came + -- from the expression side and not from the name side. --- The volatile object is part of the initialization expression of --- another object. 
Ensure that the climb of the parent chain came --- from the expression side and not from the name side. + elsif Nkind (Par) = N_Object_Declaration + and then Present (Expression (Par)) + and then N = Expression (Par) + then +null; -elsif Nkind (Par) = N_
Re: Unreviewed Patch
On 25-Feb-14 01:21, Jeff Law wrote: I think this should be queued until after 4.9 branches. It's adding a new capability (posix threading on vxworks), not fixing a bug and certainly not fixing a regression AFAICT. Fair enough. It just seems somewhat trivial to me, as it doesn't add any functional code, just some #defines and a stub. Bug could be "compilation of pthread-based gthreads disallowed on platform (vxworks) supporting pthreads", but that's a stretch, so if you want I can resubmit it after 4.9 branches. I just think that users should be able to compile targeting pthreads if they know that their target will support it, especially if it enables additional capabilities (e.g. C++11 std::thread).
[Ada] Memory leak with Ada 2012 iterator loop
This patch plugs several memory leaks involving Ada 2012 iterator loops by properly managing the secondary stack at each iteration of the loop. -- Source -- -- iterator_leak.adb with Ada.Containers; use Ada.Containers; with Ada.Containers.Vectors; with Ada.Text_IO;use Ada.Text_IO; procedure Iterator_Leak is type Rec is record Comp : Integer := 0; end record; package Vecs is new Vectors (Element_Type => Rec, Index_Type => Positive); V1_Size : constant Integer := 1_000; V2_Size : constant Integer := 1_000; Total : Integer := 1; V1 : Vecs.Vector; V2 : Vecs.Vector; begin Vecs.Set_Length (V1, Count_Type (V1_Size)); Vecs.Set_length (V2, Count_Type (V2_Size)); for Elem1 of V1 loop for Elem2 of V2 loop if Elem1 = Elem2 then Total := Total + 1; end if; end loop; end loop; for Index1 in 1 .. V1_Size loop for Index2 in 1 .. V2_Size loop declare Elem1 : constant Rec := V1 (Index1); Elem2 : constant Rec := V2 (Index2); begin if Elem1 = Elem2 then Total := Total + 1; end if; end; end loop; end loop; for Cur1 in Vecs.Iterate (V1) loop for Cur2 in Vecs.Iterate (V2) loop if V1 (Cur1) = V2 (Cur2) then Total := Total + 1; end if; end loop; end loop; end Iterator_Leak; -- Compilation and output -- $ gnatmake -q iterator_leak.adb -largs -lgmem $ ./iterator_leak $ gnatmem iterator_leak > output.txt $ grep "Total number" output.txt Total number of allocations: 2 Total number of deallocations : 2 Tested on x86_64-pc-linux-gnu, committed on trunk 2014-02-25 Hristian Kirtchev * einfo.ads Update the usage of flag Uses_Sec_Stack. Uses_Sec_Stack now applies to E_Loop entities. * exp_ch5.adb (Expand_Iterator_Loop): The temporary for a cursor now starts with the letter 'C'. This makes reading expanded code easier. * exp_ch7.adb (Establish_Transient_Scope): Add local variable Iter_Loop. Signal that an Ada 2012 iterator loop requires secondary stack management when creating a transient scope for an element reference. 
* exp_util.adb (Process_Statements_For_Controlled_Objects): When wrapping the statements of a loop, pass the E_Loop entity to the wrapping machinery. (Wrap_Statements_In_Block): Add formal parameter Scop along with comment on usage. Add local variables Block_Id, Block_Nod and Iter_Loop. Mark the generated block as requiring secondary stack management when the block is created inside an Ada 2012 iterator loop. This ensures that any reference objects are reclaimed on each iteration of the loop. * sem_ch5.adb (Analyze_Loop_Statement): Mark the generated block tasked with the handling of container iterators as requiring secondary stack management. This ensures that iterators are reclaimed when the loop terminates or is exited in any fashion. * sem_util.adb (Add_Block_Identifier): New routine. (Find_Enclosing_Iterator_Loop): New routine. * sem_util.ads (Add_Block_Identifier): New routine. (Find_Enclosing_Iterator_Loop): New routine. Index: exp_ch5.adb === --- exp_ch5.adb (revision 208132) +++ exp_ch5.adb (revision 208133) @@ -3264,7 +3264,7 @@ Ent : Entity_Id; begin - Cursor := Make_Temporary (Loc, 'I'); + Cursor := Make_Temporary (Loc, 'C'); -- For an container element iterator, the iterator type -- is obtained from the corresponding aspect, whose return Index: exp_ch7.adb === --- exp_ch7.adb (revision 208132) +++ exp_ch7.adb (revision 208133) @@ -3558,6 +3558,7 @@ procedure Establish_Transient_Scope (N : Node_Id; Sec_Stack : Boolean) is Loc : constant Source_Ptr := Sloc (N); + Iter_Loop : Entity_Id; Wrap_Node : Node_Id; begin @@ -3571,8 +3572,8 @@ return; - -- If we have encountered Standard there are no enclosing - -- transient scopes. + -- If we have encountered Standard there are no enclosing transient + -- scopes. 
elsif Scope_Stack.Table (S).Entity = Standard_Standard then exit; @@ -3581,17 +3582,17 @@ Wrap_Node := Find_Node_To_Be_Wrapped (N); - -- Case of no wrap node, false alert, no transient scope needed + -- The context does not contain a node that requires a transient scope, + -- nothing to do. if No (Wrap_Node) then null; - -- If the node to
Re: [PATCH, rs6000] Canonicalize split for unordered vector compares
Hi David, Thanks. I have this upstream for mainline now. This problem appears to have been introduced in GCC 4.6. Is it ok to backport this fix to the FSF 4.7 and 4.8 branches? Thanks, Bill On Mon, 2014-02-24 at 23:29 -0500, David Edelsohn wrote: > On Mon, Feb 24, 2014 at 9:13 PM, Bill Schmidt > wrote: > > Hi, > > > > The pattern *vector_unordered performs a split that's intended to > > match the nor3 pattern. However, it doesn't use the proper > > canonical form, so the resulting insn isn't recognized. This patch > > changes the split to use the canonical form. > > > > Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no > > regressions. Is this ok for trunk? > > > > Thanks, > > Bill > > > > > > 2014-02-24 Bill Schmidt > > > > * config/rs6000/vector.md (*vector_unordered): Change split > > to use canonical form for nor3. > > Okay. > > Thanks, David >
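A side note on the canonical-form issue in this thread. The sketch below is only an illustration in C, not part of the patch. It assumes the canonical NOR shape meant here is the AND-of-complements one, following GCC's documented rule that De Morgan's law moves bitwise negation inside logical operations. All function names are made up for the example. Both shapes compute the same value, so a split that emits the non-canonical shape is arithmetically fine but is not recognized by the nor3 pattern, because insn matching is structural.

```c
#include <assert.h>
#include <stdint.h>

/* NOR written as the complement of an OR: ~(a | b).  */
static uint32_t nor_not_of_ior(uint32_t a, uint32_t b)
{
    return ~(a | b);
}

/* NOR written as the AND of two complements: ~a & ~b.
   Assumption: this mirrors the canonical shape the split should emit.  */
static uint32_t nor_and_of_nots(uint32_t a, uint32_t b)
{
    return ~a & ~b;
}

/* Returns 1 when both shapes produce the same value for the inputs.  */
static int nor_shapes_agree(uint32_t a, uint32_t b)
{
    return nor_not_of_ior(a, b) == nor_and_of_nots(a, b);
}

/* Small self-check over a few bit patterns.  */
static void check_nor_shapes(void)
{
    assert(nor_shapes_agree(0u, 0u));
    assert(nor_shapes_agree(UINT32_MAX, 0u));
    assert(nor_shapes_agree(0xdeadbeefu, 0x12345678u));
}
```

Calling check_nor_shapes () passes silently, which is the point: the two forms differ only in shape, not in value.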
[Ada] Handling of SPARK aspects/pragmas on subprogram body stubs
This patch reimplements the support for SPARK aspects/pragmas that apply to a subprogram body stub and implements a missing rule which forbids the placement of refinement annotations in subunits. -- Source -- -- error.ads package Error with SPARK_Mode => On, Abstract_State => State is procedure Spec_Stub_Body_1 with Global => (In_Out => State); procedure Spec_Stub_Body_2 with Global => (In_Out => State), Depends => (State => State); procedure Spec_Stub_Body_3 with Global => (In_Out => State), Depends => (State => State); procedure Spec_Stub_Body_4 with Global => (In_Out => State), Depends => (State => State); procedure Spec_Stub_Body_5 with Global => (In_Out => State), Depends => (State => State); end Error; -- error.adb package body Error with SPARK_Mode=> On, Refined_State => (State => (A, B)) is A : Integer := 1; B : Integer := 2; procedure Spec_Stub_Body_1 is separate with Depends => (A => B); -- error -- Depends must appear on the spec (first declaration) procedure Spec_Stub_Body_2 is separate with Refined_Global => (In_Out => (A, B)); -- Refined_Depends must appear on the stub (second declaration) procedure Spec_Stub_Body_3 is separate; -- Refined_Global and Refined_Depends must appear on the stub (second -- declaration). procedure Spec_Stub_Body_4 is separate with Refined_Global => (In_Out => (A, "error")), Refined_Depends => ("error" => B); -- Refined_Global and Refined_Depends are placed properly, but malformed procedure Spec_Stub_Body_5 is separate with Refined_Global => (In_Out => (A, "error")), Refined_Depends => ("error" => B); -- Refined_Global and Refined_Depends are placed properly, but malformed. A -- proper body is also missing. procedure Stub_Body is separate with Global => (In_Out => (A, B)), Depends => (A => B); -- Refined_Global and Refined_Depends apply to a body whose spec (the -- stub) is not visible. 
end Error; -- error-spec_stub_body_1.adb separate (Error) procedure Spec_Stub_Body_1 is begin null; end Spec_Stub_Body_1; -- error-spec_stub_body_2.adb separate (Error) procedure Spec_Stub_Body_2 with Refined_Depends => (A => B) -- error is begin null; end Spec_Stub_Body_2; -- error-spec_stub_body_3.adb separate (Error) procedure Spec_Stub_Body_3 with Refined_Global => (In_Out => (A, B)), -- error Refined_Depends => (A => B) -- error is begin null; end Spec_Stub_Body_3; -- error-spec_stub_body_4.adb separate (Error) procedure Spec_Stub_Body_4 is begin null; end Spec_Stub_Body_4; -- Compilation and output -- $ gcc -c error.adb error.adb:9:11: aspect specification must appear in subprogram declaration error.adb:25:04: warning: subunit "Error.Spec_Stub_Body_5" in file "error-spec_stub_body_5.adb" not found error-spec_stub_body_2.adb:4:08: aspect "Refined_Depends" cannot apply to a subunit error-spec_stub_body_3.adb:4:08: aspect "Refined_Global" cannot apply to a subunit error-spec_stub_body_3.adb:5:08: aspect "Refined_Depends" cannot apply to a subunit error-stub_body.adb:4:08: aspect "Refined_Global" cannot apply to a subunit error-stub_body.adb:5:08: aspect "Refined_Depends" cannot apply to a subunit Tested on x86_64-pc-linux-gnu, committed on trunk 2014-02-25 Hristian Kirtchev * exp_ch6.adb (Add_Or_Save_Precondition): New routine. (Collect_Body_Postconditions_In_Decls): New routine. (Collect_Body_Postconditions_Of_Kind): Factor out code. Handle postcondition aspects or pragmas that appear on a subprogram body stub. (Collect_Spec_Preconditions): Factor out code. Handle precondition aspects or pragmas that appear on a subprogram body stub. * sem_ch6.adb (Analyze_Subprogram_Body_Helper): The analysis of aspects that apply to a subprogram body stub is no longer delayed, the aspects are analyzed on the spot. (SPARK_Aspect_Error): Aspects that apply to a subprogram declaration cannot appear in a subunit. * sem_ch10.adb Remove with and use clause for Sem_Ch13. 
(Analyze_Proper_Body): Add local variable Comp_Unit. Unum is now a local variable. Code cleanup. Analysis related to the aspects of a subprogram body stub is now carried out by Analyze_Subprogram_Body_Helper. Do not propagate the aspects and/or pragmas of a subprogram body stub to the proper body as this is no longer needed. Do not analyze the aspects of a subprogram stub when the corresponding source unit is missing. (Analyze_Protected_Body_Stub): Flag the illegal use of aspects on a stub. (Analyze_Task_Body_Stub): Flag the illegal use of a
[Ada] Implement new pragma Warning_As_Error
This implements a new pragma Warning_As_Error which can be used to specify that selected warnings are to be treated as errors. See new documentation in GNAT RM for full details. The pragma can appear either in a global configuration pragma file (e.g. gnat.adc), or at the start of a file. Given a global configuration pragma file containing: pragma Warning_As_Error ("[-gnatwj]"); which will treat all obsolescent feature warnings as errors, the following program compiles as shown (compile options here are @option{-gnatwa.e -gnatld7 -gnatj60}). 1. pragma Warning_As_Error ("*never assigned*"); 2. function Warnerr return String is 3.X : Integer; | >>> warning(error): variable "X" is never read and never assigned [-gnatwv] 4.Y : Integer; | >>> warning: variable "Y" is assigned but never read [-gnatwu] 5. 6. begin 7.Y := 0; 8.return %ABC%; | >>> warning(error): use of "%" is an obsolescent feature (RM J.2(4)), use """ instead [-gnatwj] 9. end; 9 lines: No errors, 3 warnings (2 treated as errors) Tested on x86_64-pc-linux-gnu, committed on trunk 2014-02-25 Robert Dewar * atree.ads (Warnings_Treated_As_Errors): New variable. * errout.adb (Error_Msg_Internal): Set Warn_Err flag in error object (Initialize): Initialize Warnings_As_Errors_Count (Write_Error_Summary): Include count of warnings treated as errors. * erroutc.adb (Warning_Treated_As_Error): New function. (Matches): Function moved to outer level of package. * erroutc.ads (Error_Msg_Object): Add Warn_Err flag. (Warning_Treated_As_Error): New function. * gnat_rm.texi: Document pragma Treat_Warning_As_Error. * opt.adb: Add handling of Warnings_As_Errors_Count[_Config]. * opt.ads (Config_Switches_Type): Add entry for Warnings_As_Errors_Count. (Warnings_As_Errors_Count): New variable. (Warnings_As_Errors): New array. * par-prag.adb: Add dummy entry for Warning_As_Error. * sem_prag.adb (Analyze_Pragma): Implement new pragma Warning_As_Error. * snames.ads-tmpl: Add entries for Warning_As_Error pragma. 
Index: gnat_rm.texi === --- gnat_rm.texi(revision 208144) +++ gnat_rm.texi(working copy) @@ -275,6 +275,7 @@ * Pragma Use_VADS_Size:: * Pragma Validity_Checks:: * Pragma Volatile:: +* Pragma Warning_As_Error:: * Pragma Warnings:: * Pragma Weak_External:: * Pragma Wide_Character_Encoding:: @@ -1109,6 +1110,7 @@ * Pragma Use_VADS_Size:: * Pragma Validity_Checks:: * Pragma Volatile:: +* Pragma Warning_As_Error:: * Pragma Warnings:: * Pragma Weak_External:: * Pragma Wide_Character_Encoding:: @@ -7557,6 +7559,80 @@ implementation of pragma Volatile is upwards compatible with the implementation in DEC Ada 83. +@node Pragma Warning_As_Error +@unnumberedsec Pragma Warning_As_Error +@findex Warning_As_Error +@noindent +Syntax: + +@smallexample @c ada +pragma Warning_As_Error (static_string_EXPRESSION); +@end smallexample + +@noindent +This configuration pragma allows the programmer to specify a set +of warnings that will be treated as errors. Any warning which +matches the pattern given by the pragma argument will be treated +as an error. This gives much more precise control that -gnatwe +which treats all warnings as errors. + +The pattern may contain asterisks, which match zero or more characters in +the message. For example, you can use +@code{pragma Warnings (Off, "*bits of*unused")} to suppress the warning +message @code{warning: 960 bits of "a" unused}. No other regular +expression notations are permitted. All characters other than asterisk in +these three specific cases are treated as literal characters in the match. +The match is case insensitive, for example XYZ matches xyz. + +Another possibility for the static_string_EXPRESSION which works if +error tags are enabled (@option{-gnatw.e}) is to use the tag string +preceded by a space, +as shown in the example below. + +The pragma can appear either in a global configuration pragma file +(e.g. @file{gnat.adc}), or at the start of a file. 
Given a global +configuration pragma file containing: + +@smallexample @c ada +pragma Warning_As_Error (" [-gnatwj]"); +@end smallexample + +@noindent +which will treat all obsolescent feature warnings as errors, the +following program compiles as shown (compile options here are +@option{-gnatwa.e -gnatld7 -gnatj60}). + +@smallexample @c ada + 1. pragma Warning_As_Error ("*never assigned*"); + 2. function Warnerr return String is + 3.X : Integer; + | +>>> warning(error): variable "X" is never read and +never assigned [-gnatwv] + + 4.Y : Integer; + | +>>> warning: variable "Y" is assigned but never +
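The matching rule described in the documentation above (asterisks match zero or more characters, everything else is literal, and the comparison is case insensitive) can be sketched in C as follows. This is a hypothetical illustration, not the Matches routine from erroutc.adb. The function name is invented, and the sketch assumes the pattern must cover the whole message text, which is why the examples use leading and trailing asterisks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if MSG matches PAT, where '*' in PAT matches zero or more
   characters and all other characters are compared case-insensitively.
   Assumption: the whole message must be covered by the pattern.  */
static int warning_pattern_matches(const char *pat, const char *msg)
{
    const char *star = NULL;   /* pattern position just after the last '*' */
    const char *retry = NULL;  /* message position where that '*' attempt began */

    assert(pat != NULL && msg != NULL);

    while (*msg != '\0') {
        if (*pat == '*') {
            /* Remember the star and first try matching it against nothing.  */
            star = ++pat;
            retry = msg;
        } else if (*pat != '\0'
                   && tolower((unsigned char)*pat) == tolower((unsigned char)*msg)) {
            pat++;
            msg++;
        } else if (star != NULL) {
            /* Mismatch: let the last '*' absorb one more character.  */
            pat = star;
            msg = ++retry;
        } else {
            return 0;
        }
    }

    /* Only trailing asterisks may remain in the pattern.  */
    while (*pat == '*')
        pat++;
    return *pat == '\0';
}
```

With this sketch, the pattern "*never assigned*" accepts the message text for variable "X" in the listing above and rejects the one for variable "Y", which matches the behaviour shown there.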
[PATCH][AARCH64] Combine "ubfiz" and "orr" into "bfi" when certain conditions are met
Hi all, This is an optimization patch that combines the "ubfiz" and "orr" insns into a single "bfi" when certain conditions are met. tmp = (x & m) | ((y & n) << lsb) can be represented using and tmp, x, m bfi tmp, y, #lsb, #width if ((n+1) == 2^width) && (m & (n << lsb)) == 0. The original codegen is ubfiz tmp1, y, #lsb, #width and tmp, x, m orr tmp, tmp1, tmp A small test case is also added to verify it. Is this okay for trunk? Kind regards, Renlin Li gcc/ChangeLog: 2014-02-25 Renlin Li * config/aarch64/aarch64.md (define_insn_and_split): New *combine_bfi2 and *combine_bfi3 insns. gcc/testsuite: 2014-02-25 Renlin Li * gcc.target/aarch64/combine-and-orr.c (New): New test case for this feature. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 99a6ac8..2307f43 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3115,6 +3115,62 @@ [(set_attr "type" "bfm")] ) +(define_insn_and_split "*combine_bfi2" + [(set (match_operand:GPI 0 "register_operand" "=r") +(ior:GPI (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r") + (match_operand 2 "const_int_operand" "n")) + (match_operand 3 "const_int_operand" "n")) + (zero_extend:GPI (match_operand:SHORT 4 "register_operand" "0"] + "exact_log2 ((INTVAL (operands[3]) >> INTVAL (operands[2])) + 1) >= 0 + && (INTVAL (operands[3]) & ((1 << INTVAL (operands[2])) - 1)) == 0 + && <= INTVAL (operands[2])" + "#" + "" + [(set (match_dup 0) +(zero_extend:GPI (match_dup 4))) + (set (zero_extract:GPI (match_dup 0 ) + (match_dup 3 ) + (match_dup 2 )) + (match_dup 1 ))] + "{ + int tmp = (INTVAL (operands[3]) >> INTVAL (operands[2])) + 1; + operands[3] = GEN_INT (exact_log2 (tmp)); + }" + [(set_attr "type" "bfm")] +) + +(define_insn_and_split "*combine_bfi3" + [(set (match_operand:GPI 0 "register_operand" "=r") +(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0") + (match_operand 2 "const_int_operand" "n")) + (and:GPI (ashift:GPI (match_operand:GPI 3 
"register_operand" "r") + (match_operand 4 "const_int_operand" "n")) + (match_operand 5 "const_int_operand" "n"] + "exact_log2 ((INTVAL (operands[5]) >> INTVAL (operands[4])) + 1) >= 0 + && (INTVAL (operands[5]) & ((1 << INTVAL (operands[4])) - 1)) == 0 + && (INTVAL (operands[2]) & INTVAL (operands[5])) == 0" + "#" + "" + [(set (match_dup 0) +(and:GPI (match_dup 1) (match_dup 6))) + (set (zero_extract:GPI (match_dup 0 ) + (match_dup 5 ) + (match_dup 4 )) + (match_dup 3 ))] + "{ + int tmp = (INTVAL (operands[5]) >> INTVAL (operands[4])) + 1; + operands[5] = GEN_INT (exact_log2 (tmp)); + + enum machine_mode mode = GET_MODE (operands[0]); + operands[6] = can_create_pseudo_p () ? gen_reg_rtx (mode) : operands[0]; + if (!aarch64_bitmask_imm (INTVAL (operands[2]), mode)) +emit_move_insn (operands[6], operands[2]); + else +operands[6] = operands[2]; + }" + [(set_attr "type" "bfm")] +) + (define_insn "*extr_insv_lower_reg" [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r") (match_operand 1 "const_int_operand" "n") diff --git a/gcc/testsuite/gcc.target/aarch64/combine-and-orr.c b/gcc/testsuite/gcc.target/aarch64/combine-and-orr.c new file mode 100644 index 000..97d8d5d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/combine-and-orr.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fexpensive-optimizations" } */ + +unsigned int +foo1 (unsigned int major, unsigned int minor) +{ + unsigned int tmp = (minor & 0xff) | ((major & 0xfff) << 8); + return tmp; +} + +unsigned int +foo2 (unsigned int major, unsigned int minor) +{ + unsigned int tmp = (minor & 0x1f) | ((major & 0xfff) << 8); + return tmp; +} + +unsigned int +foo3 (unsigned int major, unsigned int minor) +{ + unsigned int tmp = (minor & 0x12) | ((major & 0xfff) << 5); + return tmp; +} + +/* { dg-final { scan-assembler-times "bfi\tw\[0-9\]+, w\[0-9\]+, 8|5, 12" 3} } */
[Ada] Better enforcement of No_Dynamic_Attachment/No_Abort_Statements
No_Dynamic_Attachment is now enforced in -gnatc mode, and includes checking for any use of any of the entities, including rename and access. No_Abort_Statements now checks for any use of Abort_Task, including renaming. The following test programs are compiled using -gnatc -gnatj55. 1. pragma Restrictions (No_Dynamic_Attachment); 2. with Ada.Interrupts; use Ada.Interrupts; 3. procedure NoDynAt is 4.X : Interrupt_ID := Interrupt_ID'First; 5.function XXX 6. (Interrupt : Interrupt_Id) return Boolean 7. renames Is_Attached; | >>> violation of restriction "NO_DYNAMIC_ATTACHMENT" at line 1 8.type M is access function 9. (Interrupt : Interrupt_Id) return Boolean; 10.MV : M := Is_Attached'Access; | >>> violation of restriction "NO_DYNAMIC_ATTACHMENT" at line 1 11. begin 12.if Ada.Interrupts.Is_Reserved (X) then | >>> violation of restriction "NO_DYNAMIC_ATTACHMENT" at line 1 13. null; 14.elsif Ada.Interrupts.Is_Attached (X) then | >>> violation of restriction "NO_DYNAMIC_ATTACHMENT" at line 1 15. null; 16.elsif XXX (X) then 17. null; 18.end if; 19. end NoDynAt; 1. pragma Restrictions (No_Abort_Statements); 2. with Ada.Task_Identification; 3. use Ada.Task_Identification; 4. procedure ATI_Abort is 5.procedure XXX (T : Task_Id) renames Abort_Task; | >>> violation of restriction "NO_ABORT_STATEMENTS" at line 1 6.procedure YYY (T : Task_Id); 7.procedure YYY (T : Task_Id) renames Abort_Task; | >>> violation of restriction "NO_ABORT_STATEMENTS" at line 1 8.type R is access procedure (T : Task_Id); 9.RV : R := Abort_Task'Access; | >>> violation of restriction "NO_ABORT_STATEMENTS" at line 1 10. begin 11.Abort_Task (Current_Task); | >>> violation of restriction "NO_ABORT_STATEMENTS" at line 1 12. end; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-02-25 Robert Dewar * rtsfind.adb (Is_RTE): Protect against entity with no scope field (previously this call blew up on the Standard entity). 
* sem_attr.adb (Analyze_Attribute, case Access): Remove test for No_Abort_Statements, this is now handled in Set_Entity_With_Checks. * exp_ch6.adb, sem_ch10.adb, sem_ch4.adb, sem_ch8.adb, sem_res.adb: Change name Set_Entity_With_Style_Check => Set_Entity_With_Checks. * sem_util.ads, sem_util.adb: Change name Set_Entity_With_Style_Check => Set_Entity_With_Checks. (Set_Entity_With_Checks): Add checks for No_Dynamic_Attachment, Add checks for No_Abort_Statements. Index: sem_ch10.adb === --- sem_ch10.adb(revision 208134) +++ sem_ch10.adb(working copy) @@ -6,7 +6,7 @@ -- -- -- B o d y -- -- -- --- Copyright (C) 1992-2013, Free Software Foundation, Inc. -- +-- Copyright (C) 1992-2014, Free Software Foundation, Inc. -- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -2632,7 +2632,7 @@ -- to consider the unit as unreferenced if this is the only reference -- that occurs. - Set_Entity_With_Style_Check (Name (N), E_Name); + Set_Entity_With_Checks (Name (N), E_Name); Generate_Reference (E_Name, Name (N), 'w', Set_Ref => False); -- Generate references and check No_Dependence restriction for parents @@ -2657,7 +2657,7 @@ exit; end if; -Set_Entity_With_Style_Check (Pref, Par_Name); +Set_Entity_With_Checks (Pref, Par_Name); Generate_Reference (Par_Name, Pref); Check_Restriction_No_Dependence (Pref, N); @@ -2697,7 +2697,7 @@ -- Guard against missing or misspelled child units if Present (Par_Name) then -Set_Entity_With_Style_Check (Pref, Par_Name); +Set_Entity_With_Checks (Pref, Par_Name); Generate_Reference (Par_Name, Pref); else Index: rtsfind.adb === --- rtsfind.adb
Re: [PATCH][i386][AVX512] Match latest spec.
On 20 Feb 17:23, Uros Bizjak wrote: > On Thu, Feb 20, 2014 at 4:39 PM, Ilya Tocar wrote: > > > Latest version of AVX512 spec > > http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf > > Has a few changes. > > 2)Currently for scatter/gather prefetches intrinsics we accept 1 as > > possible hint parameter. This is consistent with ICC. However as > > GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC > > (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefethces > > are inconsistent with normal prefetches as they won't accept _MM_HINT_T0 as > > hint. We can either change gather prefetches to accept 1 instead of 3 and > > hope that everyone will use _MM_HINT_T0 and not the raw value, or we can > > change _MM_HINT_T0 to be consistent with ICC. What solution do you > > prefer? > > Builtins, including __builtin_prefetch, are considered as internal > implementation detail, so we can pass to them wharever we like. The > published interface is in *.h files, and this includes _MM_HINT_T0. > For now, I suggest to change prefetches, so they will accept > _MM_HINT_T0, as this is the least invasive change. > The patch below changes the prefetches to accept 3 (_MM_HINT_T0) and replaces all hint values in the tests with the corresponding _MM_HINT constants. Testing passes. OK for trunk? ChangeLog: 2014-02-25 Ilya Tocar * common/config/i386/predicates.md (const1256_operand): Remove. (const2356_operand): New. (const_1_to_2_operand): Remove. * config/i386/sse.md (avx512pf_gatherpfsf): Change hint value. (*avx512pf_gatherpfsf_mask): Ditto. (*avx512pf_gatherpfsf): Ditto. (avx512pf_gatherpfdf): Ditto. (*avx512pf_gatherpfdf_mask): Ditto. (*avx512pf_gatherpfdf): Ditto. (avx512pf_scatterpfsf): Ditto. (*avx512pf_scatterpfsf_mask): Ditto. (*avx512pf_scatterpfsf): Ditto. (avx512pf_scatterpfdf): Ditto. (*avx512pf_scatterpfdf_mask): Ditto. (*avx512pf_scatterpfdf): Ditto. * common/config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET0. 
And for tests: 2014-02-25 Ilya Tocar * gcc.target/i386/avx-1.c: Use _MM_HINT_T0 in __builtin_ia32_gatherpfdps, __builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdps, __builtin_ia32_scatterpfqps, __builtin_ia32_gatherpfdpd, __builtin_ia32_gatherpfqpd, __builtin_ia32_scatterpfdpd, __builtin_ia32_scatterpfqpd. * gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Use enum values instead of raw ints. * gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. 
--- gcc/config/i386/predicates.md | 11 ++ gcc/config/i386/sse.md | 40 +++--- gcc/config/i386/xmmintrin.h| 1 + gcc/testsuite/gcc.target/i386/avx-1.c | 16 - .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c | 2 +- .../gcc.target/i386/avx512pf-vgatherpf0dps-1.c | 2 +- .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c | 2 +- .../gcc.target/i386/avx512pf-vgatherpf0qps-1.c | 2 +- .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c | 2 +- .../gcc.target/i386/avx512pf-vgatherpf1dps-1.c | 2 +- .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c | 2 +- .../gcc.target/i386/avx512pf-vgatherpf1qps-1.c | 2 +- .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c| 4 +-- .../gcc.target/i386/avx512pf-vscatterpf0dps-1.c| 4 +-- .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c| 4 +-- .../gcc.target/i386/avx512pf-vscatterpf0qps-1.c| 4 +-- .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c| 4 +-- .../gcc.target/i386/avx512pf-vscatterpf1dps-1.c| 4 +-- .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c| 4 +-- .../gcc.target/i386/avx512pf-vscatterpf1qps-1.c| 4 +-- gcc/testsuite/gcc.target/i386/sse-14.c | 16 - gcc/testsuite/gcc.target/i386/sse-22.c | 18 +- gcc/testsuite
Re: [PATCH] Fix typo and miswordings in three error messages
On 2014-02-06 21:18, Benno Schulenberg wrote: > Updating a bit the Dutch translations of GCC's messages, > I noticed the following mistakes in three msgids: > > "only displayed one" ==> "displayed only once" > "none class-method" ==> "non-class method" > "incorect" ==> "incorrect" Ping? > Below patch fixes those. I'm not entirely sure about > the second fix, but the "none" doesn't make any sense. > When found okay, please apply. > > > 2014-02-05 Benno Schulenberg > > * gcov.c (find_source): Fix miswording in error message. > * config/i386/i386.c (ix86_handle_cconv_attribute): Likewise. > (ix86_expand_sse_comi_round): Fix typo in error message. > > > Index: gcov.c > === > --- gcov.c(revision 207551) > +++ gcov.c(working copy) > @@ -1141,7 +1141,7 @@ >if (!info_emitted) > { > fnotice (stderr, > -"(the message is only displayed one per source file)\n"); > +"(the message is displayed only once per source file)\n"); > info_emitted = 1; > } >sources[idx].file_time = 0; > Index: config/i386/i386.c > === > --- config/i386/i386.c(revision 207551) > +++ config/i386/i386.c(working copy) > @@ -5446,7 +5446,7 @@ >else if (is_attribute_p ("thiscall", name)) > { >if (TREE_CODE (*node) != METHOD_TYPE && pedantic) > - warning (OPT_Wattributes, "%qE attribute is used for none class-method", > + warning (OPT_Wattributes, "%qE attribute is used for non-class method", >name); >if (lookup_attribute ("stdcall", TYPE_ATTRIBUTES (*node))) > { > @@ -34230,7 +34230,7 @@ > } >if (INTVAL (op2) < 0 || INTVAL (op2) >= 32) > { > - error ("incorect comparison mode"); > + error ("incorrect comparison mode"); >return const0_rtx; > } >
Re: [PATCH] Fix more typos in error messages
On 2014-02-07 14:48, Benno Schulenberg wrote: > The below fixes some more typos in GCC's error messages. > When found okay, please apply. Ping? > 2014-02-07 Benno Schulenberg > > * config/arc/arc.c (arc_init): Fix typo in error message. > * config/i386/i386.c (ix86_expand_builtin): Likewise. > (split_stack_prologue_scratch_regno): Likewise. > * fortran/check.c (gfc_check_fn_rc2008): Remove duplicate > word from error message. > > > Index: gcc/fortran/check.c > === > --- gcc/fortran/check.c (revision 207597) > +++ gcc/fortran/check.c (working copy) > @@ -1736,7 +1736,7 @@ > return false; > >if (a->ts.type == BT_COMPLEX > - && !gfc_notify_std (GFC_STD_F2008, "COMPLEX argument '%s' " > + && !gfc_notify_std (GFC_STD_F2008, "COMPLEX '%s' " > "argument of '%s' intrinsic at %L", > gfc_current_intrinsic_arg[0]->name, > gfc_current_intrinsic, &a->where)) > Index: gcc/config/i386/i386.c > === > --- gcc/config/i386/i386.c(revision 207597) > +++ gcc/config/i386/i386.c(working copy) > @@ -11804,7 +11804,7 @@ > if (regparm >= 2) > { > sorry ("-fsplit-stack does not support 2 register " > - " parameters for a nested function"); > + "parameters for a nested function"); > return INVALID_REGNUM; > } > return DX_REG; > @@ -36006,7 +36006,7 @@ > >if (!insn_data[icode].operand[3].predicate (op3, mode3)) > { > - error ("the forth argument must be scale 1, 2, 4, 8"); > + error ("the fourth argument must be scale 1, 2, 4, 8"); > return const0_rtx; > } > > Index: gcc/config/arc/arc.c > === > --- gcc/config/arc/arc.c (revision 207597) > +++ gcc/config/arc/arc.c (working copy) > @@ -746,7 +746,7 @@ >error ("-mmul32x16 supported only for ARC600 or ARC601"); > >if (!TARGET_DPFP && TARGET_DPFP_DISABLE_LRSR) > - error ("-mno-dpfp-lrsr suppforted only with -mdpfp"); > + error ("-mno-dpfp-lrsr supported only with -mdpfp"); > >/* FPX-1. No fast and compact together. */ >if ((TARGET_DPFP_FAST_SET && TARGET_DPFP_COMPACT_SET) >
Re: [PATCH, rs6000] Canonicalize split for unordered vector compares
On Tue, Feb 25, 2014 at 10:15 AM, Bill Schmidt wrote: > Hi David, > > Thanks. I have this upstream for mainline now. This problem appears to > have been introduced in GCC 4.6. Is it ok to backport this fix to the > FSF 4.7 and 4.8 branches? This is okay to backport. Thanks, David
Re: [PATCH][i386][AVX512] Match latest spec.
On Tue, Feb 25, 2014 at 5:04 PM, Ilya Tocar wrote: >> > Latest version of AVX512 spec >> > http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf >> > Has a few changes. > >> > 2)Currently for scatter/gather prefetches intrinsics we accept 1 as >> > possible hint parameter. This is consistent with ICC. However as >> > GCC defines _MM_HINT_T0 to 3 and not to 1 as ICC >> > (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56603), gather prefethces >> > are inconsistent with normal prefetches as they won't accept _MM_HINT_T0 as >> > hint. We can either change gather prefetches to accept 1 instead of 3 and >> > hope that everyone will use _MM_HINT_T0 and not the raw value, or we can >> > change _MM_HINT_T0 to be consistent with ICC. What solution do you >> > prefer? >> >> Builtins, including __builtin_prefetch, are considered as internal >> implementation detail, so we can pass to them wharever we like. The >> published interface is in *.h files, and this includes _MM_HINT_T0. >> For now, I suggest to change prefetches, so they will accept >> _MM_HINT_T0, as this is the least invasive change. >> > Patch bellow changes prefetches to accept 3 (_MM_HINT_T0), > and replaces all hint's values in tests with corresponding _MM_HINT. > Testing passes. Ok for trunk? > > ChangeLog: > > 2014-02-25 Ilya Tocar > > * common/config/i386/predicates.md (const1256_operand): Remove. > (const2356_operand): New. > (const_1_to_2_operand): Remove. > * config/i386/sse.md (avx512pf_gatherpfsf): Change hint value. > (*avx512pf_gatherpfsf_mask): Ditto. > (*avx512pf_gatherpfsf): Ditto. > (avx512pf_gatherpfdf): Ditto. > (*avx512pf_gatherpfdf_mask): Ditto. > (*avx512pf_gatherpfdf): Ditto. > (avx512pf_scatterpfsf): Ditto. > (*avx512pf_scatterpfsf_mask): Ditto. > (*avx512pf_scatterpfsf): Ditto. > (avx512pf_scatterpfdf): Ditto. > (*avx512pf_scatterpfdf_mask): Ditto. > (*avx512pf_scatterpfdf): Ditto. > * common/config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET0. 
> > And for tests: > > 2014-02-25 Ilya Tocar > > * gcc.target/i386/avx-1.c: Use _MM_HINT_T0 in > __builtin_ia32_gatherpfdps, > __builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdps, > __builtin_ia32_scatterpfqps, __builtin_ia32_gatherpfdpd, > __builtin_ia32_gatherpfqpd, __builtin_ia32_scatterpfdpd, > __builtin_ia32_scatterpfqpd. > * gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Use enum values instead > of raw ints. > * gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Ditto. > * gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto. > * gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Ditto. > * gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto. > * gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Ditto. > * gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto. > * gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Ditto. > * gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto. > * gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto. > * gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto. > * gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto. > * gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Ditto. > * gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Ditto. > * gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Ditto. > * gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Ditto. > * gcc.target/i386/sse-14.c: Ditto. > * gcc.target/i386/sse-22.c: Ditto. > * gcc.target/i386/sse-23.c: Ditto. OK for mainline with a small change below. > --- a/gcc/config/i386/xmmintrin.h > +++ b/gcc/config/i386/xmmintrin.h > @@ -55,6 +55,7 @@ enum _mm_hint > { >/* _MM_HINT_ET is _MM_HINT_T with set 3rd bit. */ >_MM_HINT_ET1 = 6, > + _MM_HINT_ET0 = 5, Please put new hint above HINT_ET1, to be consistent with the part below. >_MM_HINT_T0 = 3, >_MM_HINT_T1 = 2, >_MM_HINT_T2 = 1, Uros.
[AArch64] 64-bit float vreinterpret implementation
Hi, This patch implements the vreinterpret intrinsics for 64-bit float vectors and adds a testcase for them. The patch was tested on LE or BE with no regressions. Is this patch OK for stage-1? Thanks, Alex gcc/ 2014-02-14 Alex Velenko * config/aarch64/aarch64-builtins.c (aarch64_types_su_qualifiers): Qualifier added. (aarch64_types_sp_qualifiers): Likewise. (aarch64_types_us_qualifiers): Likewise. (aarch64_types_ps_qualifiers): Likewise. (TYPES_REINTERP_SS): Type macro added. (TYPES_REINTERP_SU): Likewise. (TYPES_REINTERP_SP): Likewise. (TYPES_REINTERP_US): Likewise. (TYPES_REINTERP_PS): Likewise. * config/aarch64/aarch64-simd-builtins.def (REINTERP): Declarations removed. (REINTERP_SS): Declarations added. (REINTERP_US): Likewise. (REINTERP_PS): Likewise. (REINTERP_SU): Likewise. (REINTERP_SP): Likewise. * config/aarch64/arm_neon.h (vreinterpret_p8_f64): Implemented. (vreinterpretq_p8_f64): Likewise. (vreinterpret_p16_f64): Likewise. (vreinterpretq_p16_f64): Likewise. (vreinterpret_f32_f64): Likewise. (vreinterpretq_f32_f64): Likewise. (vreinterpret_f64_f32): Likewise. (vreinterpret_f64_p8): Likewise. (vreinterpret_f64_p16): Likewise. (vreinterpret_f64_s8): Likewise. (vreinterpret_f64_s16): Likewise. (vreinterpret_f64_s32): Likewise. (vreinterpret_f64_s64): Likewise. (vreinterpret_f64_u8): Likewise. (vreinterpret_f64_u16): Likewise. (vreinterpret_f64_u32): Likewise. (vreinterpret_f64_u64): Likewise. (vreinterpretq_f64_f32): Likewise. (vreinterpretq_f64_p8): Likewise. (vreinterpretq_f64_p16): Likewise. (vreinterpretq_f64_s8): Likewise. (vreinterpretq_f64_s16): Likewise. (vreinterpretq_f64_s32): Likewise. (vreinterpretq_f64_s64): Likewise. (vreinterpretq_f64_u8): Likewise. (vreinterpretq_f64_u16): Likewise. (vreinterpretq_f64_u32): Likewise. (vreinterpretq_f64_u64): Likewise. (vreinterpret_s64_f64): Likewise. (vreinterpretq_s64_f64): Likewise. (vreinterpret_u64_f64): Likewise. (vreinterpretq_u64_f64): Likewise. (vreinterpret_s8_f64): Likewise. 
(vreinterpretq_s8_f64): Likewise. (vreinterpret_s16_f64): Likewise. (vreinterpretq_s16_f64): Likewise. (vreinterpret_s32_f64): Likewise. (vreinterpretq_s32_f64): Likewise. (vreinterpret_u8_f64): Likewise. (vreinterpretq_u8_f64): Likewise. (vreinterpret_u16_f64): Likewise. (vreinterpretq_u16_f64): Likewise. (vreinterpret_u32_f64): Likewise. (vreinterpretq_u32_f64): Likewise. gcc/testsuite/ 2014-02-14 Alex Velenko * gcc.target/aarch64/vreinterpret_f64_1.c: new_testcase diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 5e0e9b94653deb1530955d62d9842c39da95058a..0485447d266fd7542d66f01f2d4d4cbc37177079 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -147,6 +147,23 @@ aarch64_types_unopu_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned }; #define TYPES_UNOPU (aarch64_types_unopu_qualifiers) #define TYPES_CREATE (aarch64_types_unop_qualifiers) +#define TYPES_REINTERP_SS (aarch64_types_unop_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_unop_su_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned }; +#define TYPES_REINTERP_SU (aarch64_types_unop_su_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_unop_sp_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_poly }; +#define TYPES_REINTERP_SP (aarch64_types_unop_sp_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_unop_us_qualifiers[SIMD_MAX_BUILTIN_ARGS] += { qualifier_unsigned, qualifier_none }; +#define TYPES_REINTERP_US (aarch64_types_unop_us_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_unop_ps_qualifiers[SIMD_MAX_BUILTIN_ARGS] += { qualifier_poly, qualifier_none }; +#define TYPES_REINTERP_PS (aarch64_types_unop_ps_qualifiers) static enum aarch64_type_qualifiers aarch64_types_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_maybe_immediate }; diff --git 
a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 8a3d7ecbbfc7743310da3f46a03f42a524302c9f..82aceedb4ec3c639df504aaeff9a54a174b6acf8 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -51,6 +51,28 @@ VAR1 (GETLANE, get_lane, 0, di) BUILTIN_VALL (GETLANE, be_checked_get_lane, 0) + VAR1 (REINTERP_SS, reinterpretdi, 0, df) + VAR1 (REINTERP_SS, reinterpretv8qi, 0, df) + VAR1 (REINT
Re: [PATCH][AARCH64]combine "ubfiz" and "orr" with bfi when certain condition meets.
On 02/25/2014 07:56 AM, Renlin Li wrote:
> +(define_insn_and_split "*combine_bfi3"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +                          (match_operand 2 "const_int_operand" "n"))
> +                 (and:GPI (ashift:GPI (match_operand:GPI 3 "register_operand" "r")
> +                                      (match_operand 4 "const_int_operand" "n"))
> +                          (match_operand 5 "const_int_operand" "n"]
> +  "exact_log2 ((INTVAL (operands[5]) >> INTVAL (operands[4])) + 1) >= 0
> +   && (INTVAL (operands[5]) & ((1 << INTVAL (operands[4])) - 1)) == 0
> +   && (INTVAL (operands[2]) & INTVAL (operands[5])) == 0"
> +  "#"
> +  ""
> +  [(set (match_dup 0)
> +        (and:GPI (match_dup 1) (match_dup 6)))
> +   (set (zero_extract:GPI (match_dup 0 )
> +                          (match_dup 5 )
> +                          (match_dup 4 ))
> +        (match_dup 3 ))]
> +  "{

Don't use quotes and braces.  Just use braces.

Watch the extra space before close parenthesis, all over the place.

> +    int tmp = (INTVAL (operands[5]) >> INTVAL (operands[4])) + 1;
> +    operands[5] = GEN_INT (exact_log2 (tmp));
> +
> +    enum machine_mode mode = GET_MODE (operands[0]);

You know from the pattern that "GET_MODE (operands[0])" is "<MODE>mode", a
compile-time constant.

> +    operands[6] = can_create_pseudo_p () ? gen_reg_rtx (mode) : operands[0];
> +    if (!aarch64_bitmask_imm (INTVAL (operands[2]), mode))
> +      emit_move_insn (operands[6], operands[2]);
> +    else
> +      operands[6] = operands[2];

When aarch64_bitmask_imm is true, you're creating a pseudo that you don't use.

I don't see how operands[0] can be unconditionally overwritten.  Why couldn't
it overlap with operands[1] or operands[3]?

Positive tests are easier to follow than negative tests, given a choice.

I'm thinking this should be more like

  if (aarch64_bitmask_imm (INTVAL (operands[2]), mode))
    operands[6] = operands[2];
  else if (can_create_pseudo_p ())
    {
      operands[6] = gen_reg_rtx (mode);
      emit_move_insn (operands[6], operands[2]);
    }
  else
    FAIL;

Alternately, you could decline to handle non-aarch64_bitmask_imm constants by
using aarch64_logical_immediate as the predicate for operands[2], which would
make all this code go away.

> +  }"
> +  [(set_attr "type" "bfm")]
> +)

Surely "multiple" is better for a force-split insn.

r~
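For readers following the insn condition in the quoted pattern, here is a minimal standalone sketch of what the three mask tests establish: the shifted mask must be a contiguous bit-field starting at the shift amount, and it must not overlap the other AND mask. Under those conditions the IOR form and the "AND, then insert a bit-field" form give the same result. The function names below are hypothetical and are not GCC code; the arithmetic is done on plain 64-bit integers rather than RTL.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the three tests in the insn condition.
// Returns the bit-field width when (mask5, shift) describe a contiguous field
// starting at bit `shift` that is disjoint from mask2, and -1 otherwise.
int bfi_field_width(std::uint64_t mask2, unsigned shift, std::uint64_t mask5)
{
  if (shift >= 64)
    return -1;
  // No bits of mask5 may lie below the shift amount.
  if ((mask5 & ((std::uint64_t{1} << shift) - 1)) != 0)
    return -1;
  // The two AND masks must not overlap.
  if ((mask2 & mask5) != 0)
    return -1;
  // (mask5 >> shift) + 1 must be an exact power of two.
  std::uint64_t plus1 = (mask5 >> shift) + 1;
  if (plus1 == 0 || (plus1 & (plus1 - 1)) != 0)
    return -1;
  int width = 0;
  while ((plus1 >> width) != 1)
    ++width;
  return width;
}

// The expression shape matched by the pattern: (a & m2) | ((b << s) & m5).
std::uint64_t ior_form(std::uint64_t a, std::uint64_t m2, std::uint64_t b,
                       unsigned shift, std::uint64_t m5)
{
  return (a & m2) | ((b << shift) & m5);
}

// The split form: AND first, then insert the low `width` bits of b at `shift`.
// Assumes width came from bfi_field_width, so width < 64.
std::uint64_t and_then_insert(std::uint64_t a, std::uint64_t m2,
                              std::uint64_t b, unsigned shift, int width)
{
  std::uint64_t dst = a & m2;
  std::uint64_t field = ((std::uint64_t{1} << width) - 1) << shift;
  return (dst & ~field) | ((b << shift) & field);
}
```

With m2 = 0xFFFF00FF, shift = 8 and m5 = 0xFF00 the helper reports a width of 8, and both forms agree for any inputs. Overlapping masks or a non-contiguous m5 are rejected, which corresponds to the pattern not matching.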
Re: [PATCH 4/n] Add conditional compare support - test cases
On Feb 24, 2014, at 2:04 AM, Zhenqiang Chen wrote:
>> all of your testcases have mixed C++/C comments for the dg-do line.
>> Please use pure C comments instead.
>
> Thanks for the comments! Updated.

Ok.
Re: [PATCH/AARCH64 5/6] Fix TLS for ILP32.
On 02/25/14 01:23, Andrew Pinski wrote: On Wed, Dec 4, 2013 at 10:12 AM, Yufeng Zhang wrote: On 12/03/13 21:24, Andrew Pinski wrote: [snip] diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 313517f..08fcc94 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3577,35 +3577,85 @@ [(set_attr "type" "call") (set_attr "length" "16")]) -(define_insn "tlsie_small" - [(set (match_operand:DI 0 "register_operand" "=r") -(unspec:DI [(match_operand:DI 1 "aarch64_tls_ie_symref" "S")] +(define_expand "tlsie_small" + [(set (match_operand 0 "register_operand" "=r") +(unspec [(match_operand 1 "aarch64_tls_ie_symref" "S")] + UNSPEC_GOTSMALLTLS))] + "" +{ + if (TARGET_ILP32) +{ + operands[0] = gen_lowpart (ptr_mode, operands[0]); + emit_insn (gen_tlsie_small_si (operands[0], operands[1])); +} + else +emit_insn (gen_tlsie_small_di (operands[0], operands[1])); + DONE; +}) + +(define_insn "tlsie_small_" + [(set (match_operand:PTR 0 "register_operand" "=r") +(unspec:PTR [(match_operand 1 "aarch64_tls_ie_symref" "S")] UNSPEC_GOTSMALLTLS))] "" - "adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1]" + "adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1]" [(set_attr "type" "load1") (set_attr "length" "8")] ) -(define_insn "tlsle_small" - [(set (match_operand:DI 0 "register_operand" "=r") -(unspec:DI [(match_operand:DI 1 "register_operand" "r") - (match_operand:DI 2 "aarch64_tls_le_symref" "S")] + +(define_expand "tlsle_small" + [(set (match_operand 0 "register_operand" "=r") +(unspec [(match_operand 1 "register_operand" "r") + (match_operand 2 "aarch64_tls_le_symref" "S")] + UNSPEC_GOTSMALLTLS))] + "" +{ + if (TARGET_ILP32) +{ + rtx temp = gen_reg_rtx (ptr_mode); + operands[1] = gen_lowpart (ptr_mode, operands[1]); + emit_insn (gen_tlsle_small_si (temp, operands[1], operands[2])); + emit_move_insn (operands[0], gen_lowpart (GET_MODE (operands[0]), temp)); +} Looks like you hit the similar issue where the matched RTX can have either SImode or DImode in ILP32. 
The mechanism looks OK but I think the approach that 'add_losym' adopts is neater, which checks on the mode instead of TARGET_ILP32 and calls gen_add_losym_di or gen_add_losym_si accordingly. Note that the iterator used in add_losym_ is P instead of PTR. Yes I agree with this and will fix this. Same for tlsie_small above. But not with this one, tlsie_small should rather be similar to ldr_got_small instead. Agree; tlsie_small needs to handle the load of an SImode-sized item from an address having DImode. Yufeng
Re: [AArch64] 64-bit float vreinterpret implemention
On 02/25/2014 09:02 AM, Alex Velenko wrote:
> +(define_expand "aarch64_reinterpretdf"
> +  [(match_operand:DF 0 "register_operand" "")
> +   (match_operand:VD_RE 1 "register_operand" "")]
> +  "TARGET_SIMD"
> +{
> +  aarch64_simd_reinterpret (operands[0], operands[1]);
> +  DONE;
> +})

I believe you want to implement these in aarch64_fold_builtin to fold to a
VIEW_CONVERT_EXPR.  No sense in leaving these opaque until rtl expansion.

r~
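As background for the VIEW_CONVERT_EXPR suggestion: a reinterpret changes only the type through which the same bits are viewed, so there is nothing to compute. The sketch below shows that idea in portable C++ with `std::memcpy` on a scalar `double`; it is not the AArch64 intrinsic or GCC internals, and the helper names are made up for illustration. The expected bit pattern assumes IEEE-754 binary64 doubles.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// View the bits of a double as an unsigned 64-bit integer.
std::uint64_t bits_of(double d)
{
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

// View the bits of an unsigned 64-bit integer as a double.
double double_from_bits(std::uint64_t u)
{
  double d;
  std::memcpy(&d, &u, sizeof d);
  return d;
}
```

Because the round trip is the identity on the bits, a compiler that sees the operation early (rather than as an opaque builtin) is free to remove it entirely.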
[jit] Ensure that locals make it into the generated debuginfo
Committed to branch dmalcolm/jit: With this commit, it's possible to inspect local variables in the debugger when stepping through a JIT-generated function. gcc/jit/ * internal-api.h (gcc::jit::playback::function): Add field m_inner_block. * internal-api.c (gcc::jit::playback::function::function): Create BLOCK here and link it to the BIND_EXPR. (gcc::jit::playback::function::gt_ggc_mx): Walk m_inner_block. (gcc::jit::playback::function::postprocess): Set up BLOCK_VARS on the block, so that the local variables make it into the debuginfo. --- gcc/jit/ChangeLog.jit | 11 +++ gcc/jit/internal-api.c | 12 ++-- gcc/jit/internal-api.h | 1 + 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index f59b258..40d445e 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,14 @@ +2014-02-25 David Malcolm + + * internal-api.h (gcc::jit::playback::function): Add field + m_inner_block. + + * internal-api.c (gcc::jit::playback::function::function): + Create BLOCK here and link it to the BIND_EXPR. + (gcc::jit::playback::function::gt_ggc_mx): Walk m_inner_block. + (gcc::jit::playback::function::postprocess): Set up BLOCK_VARS on + the block, so that the local variables make it into the debuginfo. + 2014-02-24 Philip Herron * Make-lang.in (jit.install-common): Implement. diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 957edb7..43de7cd 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -2832,11 +2832,13 @@ function (context *ctxt, /* Create a BIND_EXPR, and within it, a statement list. 
*/ m_stmt_list = alloc_stmt_list (); m_stmt_iter = tsi_start (m_stmt_list); + m_inner_block = make_node (BLOCK); m_inner_bind_expr = - build3 (BIND_EXPR, void_type_node, NULL, m_stmt_list, NULL); + build3 (BIND_EXPR, void_type_node, NULL, m_stmt_list, m_inner_block); } else { + m_inner_block = NULL; m_stmt_list = NULL; } } @@ -2848,6 +2850,7 @@ gt_ggc_mx () gt_ggc_m_9tree_node (m_inner_fndecl); gt_ggc_m_9tree_node (m_inner_bind_expr); gt_ggc_m_9tree_node (m_stmt_list); + gt_ggc_m_9tree_node (m_inner_block); } tree @@ -2904,11 +2907,16 @@ postprocess () if (m_kind != GCC_JIT_FUNCTION_IMPORTED) { /* Seem to need this in gimple-low.c: */ - DECL_INITIAL (m_inner_fndecl) = make_node (BLOCK); + gcc_assert (m_inner_block); + DECL_INITIAL (m_inner_fndecl) = m_inner_block; /* how to add to function? the following appears to be how to set the body of a m_inner_fndecl: */ DECL_SAVED_TREE(m_inner_fndecl) = m_inner_bind_expr; + + /* Ensure that locals appear in the debuginfo. */ + BLOCK_VARS (m_inner_block) = BIND_EXPR_VARS (m_inner_bind_expr); + //debug_tree (m_inner_fndecl); /* Convert to gimple: */ diff --git a/gcc/jit/internal-api.h b/gcc/jit/internal-api.h index 83bda17..0017f6c 100644 --- a/gcc/jit/internal-api.h +++ b/gcc/jit/internal-api.h @@ -1887,6 +1887,7 @@ private: private: tree m_inner_fndecl; + tree m_inner_block; tree m_inner_bind_expr; enum gcc_jit_function_kind m_kind; tree m_stmt_list; -- 1.7.11.7
C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
The primary bug under discussion in 53808 has been fixed separately, but it
also pointed out that once devirtualization resolves the delete to use the bar
destructor, we ought to be able to inline that destructor.  So if we're
devirtualizing, always add a virtual defaulted dtor to the list of functions to
be synthesized.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 9f9f907732429f413e490be0fa969b72153fdd88
Author: Jason Merrill
Date:   Fri Mar 22 20:24:49 2013 -0400

    PR c++/53808
    * class.c (clone_function_decl): Call note_vague_linkage_fn for
    defaulted virtual dtor.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 97a1cc2..e861e4d 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4582,6 +4582,10 @@ clone_function_decl (tree fn, int update_method_vec_p)
          destructor.  */
       if (DECL_VIRTUAL_P (fn))
         {
+          if (DECL_DEFAULTED_FN (fn) && flag_devirtualize)
+            /* Make sure the destructor gets synthesized so that it can be
+               inlined after devirtualization.  */
+            note_vague_linkage_fn (fn);
           clone = build_clone (fn, deleting_dtor_identifier);
           if (update_method_vec_p)
             add_method (DECL_CONTEXT (clone), clone, NULL_TREE);
diff --git a/gcc/testsuite/g++.dg/opt/devirt4.C b/gcc/testsuite/g++.dg/opt/devirt4.C
new file mode 100644
index 000..26e8ee6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/devirt4.C
@@ -0,0 +1,16 @@
+// PR c++/53808
+// Devirtualization + inlining should produce a non-virtual
+// call to ~foo.
+// { dg-options "-O -fdevirtualize" }
+// { dg-final { scan-assembler "_ZN3fooD2Ev" } }
+
+struct foo {
+  virtual ~foo();
+};
+struct bar : public foo {
+  virtual void zed();
+};
+void f() {
+  foo *x(new bar);
+  delete x;
+}
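The testcase only declares the functions, so here is a small self-contained variant of the same source shape that can be run. It is a sketch for illustration: the counter and the function name `destroy_through_base` are additions, and the code shows only the observable behaviour (the base destructor runs exactly once), not whether a given compiler devirtualizes or inlines the call.

```cpp
// Counts how many times ~foo has run; added purely for observation.
int foo_dtor_calls = 0;

struct foo {
  virtual ~foo() { ++foo_dtor_calls; }
};

// bar does not declare a destructor, so ~bar is implicitly declared,
// defaulted, and virtual because the base destructor is virtual.
struct bar : public foo {
  virtual void zed() {}
};

// Deletes a bar through a foo pointer, as in the testcase. The dynamic type
// is visible at the call site, which is what makes devirtualization possible.
int destroy_through_base()
{
  foo_dtor_calls = 0;
  foo *x = new bar;
  delete x;
  return foo_dtor_calls;
}
```

If the compiler resolves `delete x` to bar's destructor, that implicitly defined destructor has to exist as a function body before it can be inlined, which is the situation the patch addresses.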
C++ PATCH for c++/60328 (alias template equivalence)
My initial implementation of alias template equivalence failed to handle the case here, of a nested alias being equivalent to its enclosing class template. Fixed by implementing the rules in the standard more directly. Tested x86_64-pc-linux-gnu, applying to trunk. commit f7c8a08b6ee7f9229be75efa909d6673c77a4fd3 Author: Jason Merrill Date: Tue Feb 25 11:15:36 2014 -0500 DR 1286 PR c++/60328 * pt.c (get_underlying_template): Fix equivalence calculation. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index bd59142..4a9fa71 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -5185,9 +5185,12 @@ get_underlying_template (tree tmpl) tree sub = TYPE_TI_TEMPLATE (result); if (PRIMARY_TEMPLATE_P (sub) && (num_innermost_template_parms (tmpl) - == num_innermost_template_parms (sub)) - && same_type_p (result, TREE_TYPE (sub))) + == num_innermost_template_parms (sub))) { + tree alias_args = INNERMOST_TEMPLATE_ARGS + (template_parms_to_args (DECL_TEMPLATE_PARMS (tmpl))); + if (!comp_template_args (TYPE_TI_ARGS (result), alias_args)) + break; /* The alias type is equivalent to the pattern of the underlying template, so strip the alias. */ tmpl = sub; diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-dr1286b.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-dr1286b.C new file mode 100644 index 000..fef9818 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-dr1286b.C @@ -0,0 +1,12 @@ +// PR c++/60328 +// { dg-require-effective-target c++11 } + +template +struct Foo +{ + template + using Bar = Foo<_TT, _RR...>; + + using Normal = Foo<_Rest...>; + using Fail = Bar<_Rest...>; +};
[jit] New API entrypoint: gcc_jit_function_add_void_return
Committed to branch dmalcolm/jit: gcc/jit/ * libgccjit.h (gcc_jit_function_add_void_return): New. * libgccjit.map (gcc_jit_function_add_void_return): New. * libgccjit.c (gcc_jit_function_add_void_return): New. * libgccjit++.h (add_return): Add overloaded variant with no rvalue, calling gcc_jit_function_add_void_return. * internal-api.c (gcc::jit::recording::function::add_return): Add comment that rvalue could be NULL. (gcc::jit::playback::function::add_return): Support rvalue being NULL. gcc/testsuite/ * jit.dg/test-functions.c (create_use_of_void_return): New, to add test coverage for gcc_jit_function_add_void_return. (verify_void_return): Likewise. (create_code): Add call to create_use_of_void_return. (verify_code): Add call to verify_void_return. --- gcc/jit/ChangeLog.jit | 13 gcc/jit/internal-api.c| 31 +++--- gcc/jit/libgccjit++.h | 8 + gcc/jit/libgccjit.c | 17 ++ gcc/jit/libgccjit.h | 11 +++ gcc/jit/libgccjit.map | 1 + gcc/testsuite/ChangeLog.jit | 8 + gcc/testsuite/jit.dg/test-functions.c | 62 +++ 8 files changed, 139 insertions(+), 12 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 40d445e..de352fa 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,18 @@ 2014-02-25 David Malcolm + * libgccjit.h (gcc_jit_function_add_void_return): New. + * libgccjit.map (gcc_jit_function_add_void_return): New. + * libgccjit.c (gcc_jit_function_add_void_return): New. + * libgccjit++.h (add_return): Add overloaded variant with no + rvalue, calling gcc_jit_function_add_void_return. + + * internal-api.c (gcc::jit::recording::function::add_return): Add + comment that rvalue could be NULL. + (gcc::jit::playback::function::add_return): Support rvalue being + NULL. + +2014-02-25 David Malcolm + * internal-api.h (gcc::jit::playback::function): Add field m_inner_block. 
diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 43de7cd..1cb4717 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -1309,6 +1309,9 @@ void recording::function::add_return (recording::location *loc, recording::rvalue *rvalue) { + /* This is used by both gcc_jit_function_add_return and + gcc_jit_function_add_void_return; rvalue will be non-NULL for + the former and NULL for the latter. */ statement *result = new return_ (this, loc, rvalue); m_ctxt->record (result); m_activity.safe_push (result); @@ -3114,22 +3117,26 @@ add_return (location *loc, { gcc_assert (m_kind != GCC_JIT_FUNCTION_IMPORTED); + tree modify_retval = NULL; tree return_type = TREE_TYPE (TREE_TYPE (m_inner_fndecl)); - tree t_lvalue = DECL_RESULT (m_inner_fndecl); - tree t_rvalue = rvalue->as_tree (); - if (TREE_TYPE (t_rvalue) != TREE_TYPE (t_lvalue)) -t_rvalue = build1 (CONVERT_EXPR, - TREE_TYPE (t_lvalue), - t_rvalue); - tree modify_retval = build2 (MODIFY_EXPR, return_type, - t_lvalue, t_rvalue); + if (rvalue) +{ + tree t_lvalue = DECL_RESULT (m_inner_fndecl); + tree t_rvalue = rvalue->as_tree (); + if (TREE_TYPE (t_rvalue) != TREE_TYPE (t_lvalue)) + t_rvalue = build1 (CONVERT_EXPR, + TREE_TYPE (t_lvalue), + t_rvalue); + modify_retval = build2 (MODIFY_EXPR, return_type, + t_lvalue, t_rvalue); + if (loc) + set_tree_location (modify_retval, loc); +} tree return_stmt = build1 (RETURN_EXPR, return_type, modify_retval); if (loc) -{ - set_tree_location (modify_retval, loc); - set_tree_location (return_stmt, loc); -} +set_tree_location (return_stmt, loc); + add_stmt (return_stmt); } diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h index db01053..95a8c71 100644 --- a/gcc/jit/libgccjit++.h +++ b/gcc/jit/libgccjit++.h @@ -329,6 +329,7 @@ namespace gccjit void add_return (rvalue rvalue, location loc = location ()); +void add_return (location loc = location ()); /* A way to add a function call to the body of a function being defined, with various numbers of args. 
*/ @@ -1213,6 +1214,13 @@ function::add_return (rvalue rvalue, rvalue.get_inner_rvalue ()); } +inline void +function::add_return (location loc) +{ + gcc_jit_function_add_void_return (get_inner_function (), + loc.get_inner_location ()); +} + inline rvalue function::add_call (function other, location loc) d
hard-reg-set.h replace #else #if by #elif
Not sure if this is a good idea, I thought it would be better to replace
#else #if by #elif.

* hard-reg-set.h: Replace #else #if by #elif.

Bootstrapped on x86_64-unknown-linux-gnu.
Ok for trunk ?

Thanks and Regards,
Prathamesh

Index: gcc/hard-reg-set.h
===
--- gcc/hard-reg-set.h (revision 208111)
+++ gcc/hard-reg-set.h (working copy)
@@ -221,8 +221,7 @@ hard_reg_set_empty_p (const HARD_REG_SET
   return x[0] == 0 && x[1] == 0;
 }
-#else
-#if FIRST_PSEUDO_REGISTER <= 3*HOST_BITS_PER_WIDEST_FAST_INT
+#elif FIRST_PSEUDO_REGISTER <= 3*HOST_BITS_PER_WIDEST_FAST_INT
 #define CLEAR_HARD_REG_SET(TO) \
 do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
      scan_tp_[0] = 0; \
@@ -299,8 +298,7 @@ hard_reg_set_empty_p (const HARD_REG_SET
   return x[0] == 0 && x[1] == 0 && x[2] == 0;
 }
-#else
-#if FIRST_PSEUDO_REGISTER <= 4*HOST_BITS_PER_WIDEST_FAST_INT
+#elif FIRST_PSEUDO_REGISTER <= 4*HOST_BITS_PER_WIDEST_FAST_INT
 #define CLEAR_HARD_REG_SET(TO) \
 do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
      scan_tp_[0] = 0; \
@@ -483,8 +481,6 @@ hard_reg_set_empty_p (const HARD_REG_SET
 #endif
 #endif
-#endif
-#endif
 /* Iterator for hard register sets. */
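To make the intent of the change concrete, the sketch below selects a value with a nested `#else` / `#if` ladder and again with a flat `#elif` chain. The macro names `NREGS` and `WORD_BITS` and their values are invented for the example and stand in for the real configuration macros; the point is only that the two forms choose the same branch while the flat form needs a single `#endif`.

```cpp
// Hypothetical configuration values, chosen so the third branch is taken.
#define NREGS 70
#define WORD_BITS 32

// Nested form: each #else opens a new #if, so every level needs its own #endif.
#if NREGS <= WORD_BITS
constexpr int words_nested = 1;
#else
#if NREGS <= 2 * WORD_BITS
constexpr int words_nested = 2;
#else
#if NREGS <= 3 * WORD_BITS
constexpr int words_nested = 3;
#else
constexpr int words_nested = 4;
#endif
#endif
#endif

// Flat form: one conditional group, one #endif.
#if NREGS <= WORD_BITS
constexpr int words_flat = 1;
#elif NREGS <= 2 * WORD_BITS
constexpr int words_flat = 2;
#elif NREGS <= 3 * WORD_BITS
constexpr int words_flat = 3;
#else
constexpr int words_flat = 4;
#endif

static_assert(words_nested == words_flat,
              "both ladders must select the same branch");
```

When converting, the number of trailing `#endif` lines removed has to equal the number of `#else` / `#if` pairs that were merged, otherwise the conditional nesting no longer balances.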
C++ PATCH for c++/55877 (names for linkage purposes)
My earlier patch didn't go far enough; when we apply a name for linkage purposes and fix up the visibility of the type, we also need to fix up the visibility and names of any members and implementation bits. At first I thought we would need to deal with static data members as well as functions, but it turns out that an unnamed class can't have static data members. So the second patch enforces that better. Tested x86_64-pc-linux-gnu, applying to trunk. commit f32d03db15009c0ee07b33bda6083a163bf87fa7 Author: Jason Merrill Date: Fri Jan 31 17:39:17 2014 -0500 PR c++/55877 * decl2.c (no_linkage_error): Handle C++98 semantics. (reset_type_linkage): Move from decl.c. (reset_type_linkage_1, reset_type_linkage_2, bt_reset_linkage_1) (bt_reset_linkage_2, reset_decl_linkage): New. (tentative_decl_linkage): Factor out of expand_or_defer_fn_1. (cp_write_global_declarations): Move condition into no_linkage_error. * decl.c (grokfndecl, grokvardecl): Use no_linkage_error. * semantics.c (expand_or_defer_fn_1): Factor out tentative_decl_linkage. * cp-tree.h: Adjust. 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 7681b27..3db18f3 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5313,12 +5313,15 @@ extern tree coerce_delete_type (tree); extern void comdat_linkage (tree); extern void determine_visibility (tree); extern void constrain_class_visibility (tree); +extern void reset_type_linkage (tree); +extern void tentative_decl_linkage (tree); extern void import_export_decl (tree); extern tree build_cleanup (tree); extern tree build_offset_ref_call_from_tree (tree, vec **, tsubst_flags_t); extern bool decl_constant_var_p (tree); extern bool decl_maybe_constant_var_p (tree); +extern void no_linkage_error (tree); extern void check_default_args (tree); extern bool mark_used(tree); extern bool mark_used (tree, tsubst_flags_t); diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 04c4cf5..db86d97 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -7569,29 +7569,7 @@ grokfndecl (tree ctype, declare an entity with linkage. DR 757 relaxes this restriction for C++0x. */ - t = no_linkage_check (TREE_TYPE (decl), - /*relaxed_p=*/false); - if (t) - { - if (TYPE_ANONYMOUS_P (t)) - { - if (DECL_EXTERN_C_P (decl)) - /* Allow this; it's pretty common in C. */; - else - { - permerror (input_location, "anonymous type with no linkage " - "used to declare function %q#D with linkage", - decl); - if (DECL_ORIGINAL_TYPE (TYPE_NAME (t))) - permerror (input_location, "%q+#D does not refer to the unqualified " - "type, so it is not used for linkage", - TYPE_NAME (t)); - } - } - else - permerror (input_location, "type %qT with no linkage used to " - "declare function %q#D with linkage", t, decl); - } + no_linkage_error (decl); } TREE_PUBLIC (decl) = publicp; @@ -7874,7 +7852,7 @@ set_linkage_for_static_data_member (tree decl) If SCOPE is non-NULL, it is the class type or namespace containing the variable. If SCOPE is NULL, the variable should is created in - the innermost enclosings scope. */ + the innermost enclosing scope. 
*/ static tree grokvardecl (tree type, @@ -7972,33 +7950,8 @@ grokvardecl (tree type, declare an entity with linkage. DR 757 relaxes this restriction for C++0x. */ - tree t = (cxx_dialect > cxx98 ? NULL_TREE - : no_linkage_check (TREE_TYPE (decl), /*relaxed_p=*/false)); - if (t) - { - if (TYPE_ANONYMOUS_P (t)) - { - if (DECL_EXTERN_C_P (decl)) - /* Allow this; it's pretty common in C. */ - ; - else - { - /* DRs 132, 319 and 389 seem to indicate types with - no linkage can only be used to declare extern "C" - entities. Since it's not always an error in the - ISO C++ 90 Standard, we only issue a warning. */ - warning (0, "anonymous type with no linkage used to declare " - "variable %q#D with linkage", decl); - if (DECL_ORIGINAL_TYPE (TYPE_NAME (t))) - warning (0, "%q+#D does not refer to the unqualified " - "type, so it is not used for linkage", - TYPE_NAME (t)); - } - } - else - warning (0, "type %qT with no linkage used to declare variable " - "%q#D with linkage", t, decl); - } + if (cxx_dialect < cxx11) + no_linkage_error (decl); } else DECL_INTERFACE_KNOWN (decl) = 1; @@ -8670,23 +8623,6 @@ check_var_type (tree identifier, tree type) return type; } -/* Functions for adjusting the visibility of a tagged type and its nested - types when it gets a name for linkage purposes from a typedef. */ - -static void bt_reset_linkage (binding_entry, void *); -static void -reset_type_linkage (tree type) -{ - set_linkage_according_to_type (type, TYPE_MAIN_DECL (type)); - if (CLASS_TYPE_P (type)) -binding_table_foreach (CLASSTYPE_NESTE
C++ PATCH for DR 1571 (reference binding)
Getting the reference binding rules for C++11 right (in the standard) has taken quite a few iterations. I'm pretty happy with the latest wording, which deals with user-defined conversions by recursing on the result of the conversion. This patch implements those rules. I'm a little uncertain about applying this so late in the 4.9 cycle, but I think it's a significant improvement to C++11 support. The second patch fixes a diagnostic issue I noticed while working on this: when explaining that a conversion from the result of the conversion function failed, the compiler was talking about the 'this' parameter. Tested x86_64-pc-linux-gnu, applying to trunk. commit acf9584aa8d20a2ae1e4b4505f224fc9f937e836 Author: Jason Merrill Date: Tue Feb 11 11:29:18 2014 -0800 DR 1571 * call.c (reference_binding): Recurse on user-defined conversion. (convert_like_real) [ck_ref_bind]: Explain cv-qual mismatch. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 700099d..32767ec 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -1677,20 +1677,37 @@ reference_binding (tree rto, tree rfrom, tree expr, bool c_cast_p, int flags, if (!conv) return NULL; - conv = build_conv (ck_ref_bind, rto, conv); + /* Limit this to C++11 mode for GCC 4.9, to be safe. */ + if (cxx_dialect >= cxx11 && conv->user_conv_p) +{ + /* If initializing the temporary used a conversion function, + recalculate the second conversion sequence. */ + for (conversion *t = conv; t; t = next_conversion (t)) + if (t->kind == ck_user + && DECL_CONV_FN_P (t->cand->fn)) + { + tree ftype = TREE_TYPE (TREE_TYPE (t->cand->fn)); + if (TREE_CODE (ftype) != REFERENCE_TYPE) + /* Pretend we start from an xvalue to avoid trouble from + LOOKUP_NO_TEMP_BIND. 
*/ + ftype = cp_build_reference_type (ftype, true); + conversion *new_second + = reference_binding (rto, ftype, NULL_TREE, c_cast_p, + flags|LOOKUP_NO_CONVERSION, complain); + if (!new_second) + return NULL; + conv = merge_conversion_sequences (t, new_second); + break; + } +} + + if (conv->kind != ck_ref_bind) +conv = build_conv (ck_ref_bind, rto, conv); + /* This reference binding, unlike those above, requires the creation of a temporary. */ conv->need_temporary_p = true; - if (TYPE_REF_IS_RVALUE (rto)) -{ - conv->rvaluedness_matches_p = 1; - /* In the second case, if the reference is an rvalue reference and - the second standard conversion sequence of the user-defined - conversion sequence includes an lvalue-to-rvalue conversion, the - program is ill-formed. */ - if (conv->user_conv_p && next_conversion (conv)->kind == ck_rvalue) - conv->bad_p = 1; -} + conv->rvaluedness_matches_p = TYPE_REF_IS_RVALUE (rto); return conv; } @@ -6213,12 +6230,25 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum, if (convs->bad_p && !next_conversion (convs)->bad_p) { - gcc_assert (TYPE_REF_IS_RVALUE (ref_type) - && (real_lvalue_p (expr) - || next_conversion(convs)->kind == ck_rvalue)); + gcc_assert (TYPE_REF_IS_RVALUE (ref_type)); - error_at (loc, "cannot bind %qT lvalue to %qT", - TREE_TYPE (expr), totype); + if (real_lvalue_p (expr) + || next_conversion(convs)->kind == ck_rvalue) + error_at (loc, "cannot bind %qT lvalue to %qT", + TREE_TYPE (expr), totype); + else if (!reference_compatible_p (totype, TREE_TYPE (expr))) + error_at (loc, "binding %qT to reference of type %qT " + "discards qualifiers", TREE_TYPE (expr),totype); + else + gcc_unreachable (); + if (convs->user_conv_p) + for (conversion *t = convs; t; t = next_conversion (t)) + if (t->kind == ck_user) + { + print_z_candidate (loc, "after user-defined conversion:", + t->cand); + break; + } if (fn) inform (input_location, "initializing argument %P of %q+D", argnum, fn); diff --git 
a/gcc/testsuite/g++.dg/cpp0x/overload3.C b/gcc/testsuite/g++.dg/cpp0x/overload3.C index e521b35..b8f781a 100644 --- a/gcc/testsuite/g++.dg/cpp0x/overload3.C +++ b/gcc/testsuite/g++.dg/cpp0x/overload3.C @@ -13,5 +13,5 @@ struct wrap int main() { wrap w; - f(w);// { dg-error "lvalue" } + f(w);// { dg-error "" } } diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-init1.C b/gcc/testsuite/g++.dg/cpp0x/rv-init1.C new file mode 100644 index 000..2e8d4f7 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/rv-init1.C @@ -0,0 +1,26 @@ +// Core DR 1604/1571/1572 +// { dg-require-effective-target c++11 } + +struct Banana { }; +struct Enigma { operator const Banana(); }; +struct Doof { operator Banana&(); }; +void enigmatic() { + typedef const Banana ConstBanana; + Banana &&banana1 = ConstBanana(); // { dg-error "" } + Banana &&banana2 = Enigma(); // { dg-error "" } + Banana &&banana3 = Doof();// { dg-error "" } +} + +class A { +public: + operator volatile int &(); +}
Re: [PATCH] Fix PR60327 - dealII and Xalanbmk ICEing with LTO
> This fixes the ICE on our regular -flto-partition=none testers
> which sees an edge w/o call-stmt after inlining (see the PR
> for details).  I'm not sure this is supposed to happen but the
> following re-instantiates the guard to inline_update_overall_summary
> which was present before the last change to that area.
>
> LTO bootstrap / regtest running on x86_64-unknown-linux-gnu, ok?
>
> Thanks,
> Richard.
>
> 2014-02-25  Richard Biener
>
>     PR ipa/60327
>     * ipa.c (walk_polymorphic_call_targets): Properly guard
>     call to inline_update_overall_summary.

Ah yes, that is OK!  Indeed inline summaries are freed after inlining during
normal compilation.  During WPA they are kept so lto-partitioning can use them.

Thanks!
Honza
[Committed] Fix __builtin_thread_pointer for ILP32 and other like ABIs
Hi,
  With ILP32 AARCH64, Pmode (DImode) != ptr_mode (SImode), so the variable decl
has a mode of SImode while the register is DImode.  As a result, the target
that gets passed down to expand_builtin_thread_pointer is NULL, because expand
does not know how to get a subreg for a pointer type.

This fixes the problem by handling a NULL target inside
expand_builtin_thread_pointer the same way we already handle a target that is
not a register or does not have the correct mode.

Committed as obvious after a build and test for both aarch64-linux-gnu and
x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

* builtins.c (expand_builtin_thread_pointer): Create a new target
when the target is NULL.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 616d8ec..570bff0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-24  Andrew Pinski
+
+        * builtins.c (expand_builtin_thread_pointer): Create a new target
+        when the target is NULL.
+
 2014-02-25  Vladimir Makarov

         PR rtl-optimization/60317
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 35969ad..7c6318e 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5712,7 +5712,10 @@ expand_builtin_thread_pointer (tree exp, rtx target)
   if (icode != CODE_FOR_nothing)
     {
       struct expand_operand op;
-      if (!REG_P (target) || GET_MODE (target) != Pmode)
+      /* If the target is not sutitable then create a new target. */
+      if (target == NULL_RTX
+          || !REG_P (target)
+          || GET_MODE (target) != Pmode)
         target = gen_reg_rtx (Pmode);
       create_output_operand (&op, target, Pmode);
       expand_insn (icode, 1, &op);
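The shape of the fix, stripped of GCC specifics, is a null test placed first in a short-circuit chain so that the later tests never inspect a missing object. The sketch below uses toy stand-ins: `Rtx`, `Mode` and `needs_fresh_target` are hypothetical names and are not the real `rtx`, machine modes, or any GCC function.

```cpp
// Toy stand-ins for the real data structures (assumptions for illustration).
enum class Mode { SI, DI };

struct Rtx {
  bool is_reg;  // plays the role of REG_P (x)
  Mode mode;    // plays the role of GET_MODE (x)
};

// Returns true when the caller-supplied target cannot be used and a fresh
// register in `pmode` should be allocated instead. The null test comes first,
// so the member accesses are only evaluated for a non-null target.
bool needs_fresh_target(const Rtx *target, Mode pmode)
{
  return target == nullptr
         || !target->is_reg
         || target->mode != pmode;
}
```

Without the leading null test, the second and third conditions would read through a null pointer, which matches the failure described for the ILP32 case.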
Re: [PATCH, rs6000] Canonicalize split for unordered vector compares
On Tue, 2014-02-25 at 11:23 -0500, David Edelsohn wrote:
> On Tue, Feb 25, 2014 at 10:15 AM, Bill Schmidt wrote:
> > Hi David,
> >
> > Thanks.  I have this upstream for mainline now.  This problem appears to
> > have been introduced in GCC 4.6.  Is it ok to backport this fix to the
> > FSF 4.7 and 4.8 branches?
>
> This is okay to backport.

Well, I guess the backport won't be necessary. It looks like the canonical form was changed in GCC 4.9 for some unknown reason, so previous releases don't have the bug. This happened when the 128-bit expanders were moved from vector.md to rs6000.md last year. I'll look into why the canonicalization changed at the same time.

Thanks,
Bill

>
> Thanks, David
>
[Patch, GCC/ARM] Redefine the ASM_APP_OFF in a cleaner way
Hi there,

As the assembler directive ".code 16" is equivalent to ".thumb", this small patch redefines ASM_APP_OFF in a cleaner way. Tested with the GCC regression tests; no regressions.

Is it OK for current trunk, or shall we wait until release-branch mode ends?

BR,
Terry

2014-02-25  Terry Guo

	* config/arm/arm.h (ASM_APP_OFF): Re-define it in a cleaner way.

From fd607bf3a164e885a6e16a1028feff11b8a653bc Mon Sep 17 00:00:00 2001
From: Terry Guo
Date: Thu, 2 Jan 2014 15:08:55 +0800
Subject: [PATCH] patch

---
 gcc/config/arm/arm.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 288ff8b..f9d1537 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2132,8 +2132,7 @@ extern int making_const_table;
   do { cfun->machine->thumb1_cc_insn = NULL_RTX; } while (0)
 
 #undef ASM_APP_OFF
-#define ASM_APP_OFF (TARGET_THUMB1 ? "\t.code\t16\n" : \
-                     TARGET_THUMB2 ? "\t.thumb\n" : "")
+#define ASM_APP_OFF (TARGET_ARM ? "" : "\t.thumb\n")
 
 /* Output a push or a pop instruction (only used when profiling).
    We can't push STATIC_CHAIN_REGNUM (r12) directly with Thumb-1.  We know
-- 
1.8.3.2
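To make the effect of the redefinition concrete, here is a small stand-alone sketch. It is not GCC code: `enum isa_state` and the three functions are hypothetical, and it assumes exactly one of ARM, Thumb-1, or Thumb-2 state is active at a time. Under that assumption the only textual change is for Thumb-1, where ".code 16" becomes the equivalent ".thumb".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the instruction-set state; in GCC these are the
   TARGET_ARM, TARGET_THUMB1 and TARGET_THUMB2 predicates.  */
enum isa_state { ISA_ARM, ISA_THUMB1, ISA_THUMB2 };

/* Directive selected by the old ASM_APP_OFF definition.  */
static const char *
asm_app_off_old (enum isa_state s)
{
  return s == ISA_THUMB1 ? "\t.code\t16\n"
         : s == ISA_THUMB2 ? "\t.thumb\n" : "";
}

/* Directive selected by the new ASM_APP_OFF definition.  */
static const char *
asm_app_off_new (enum isa_state s)
{
  return s == ISA_ARM ? "" : "\t.thumb\n";
}

/* True when the emitted text differs between the two definitions.  */
static bool
directive_text_changed (enum isa_state s)
{
  return strcmp (asm_app_off_old (s), asm_app_off_new (s)) != 0;
}
```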
[PATCH/AARCH64 1/3] Add AARCH64 ILP32 PCH support
Hi,

Just like most of the targets out there, we should define TRY_EMPTY_VM_SPACE to have better PCH support.

OK? Built and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

	* config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32
	definition.

---
 gcc/ChangeLog           | 5 +
 gcc/config/host-linux.c | 4 +++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 616d8ec..fd2b6cd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-25  Andrew Pinski
+
+	* config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32
+	definition.
+
 2014-02-25  Vladimir Makarov
 
 	PR rtl-optimization/60317
diff --git a/gcc/config/host-linux.c b/gcc/config/host-linux.c
index 17048d7..b298a17 100644
--- a/gcc/config/host-linux.c
+++ b/gcc/config/host-linux.c
@@ -86,8 +86,10 @@
 # define TRY_EMPTY_VM_SPACE 0x6000
 #elif defined(__mc68000__)
 # define TRY_EMPTY_VM_SPACE 0x4000
-#elif defined(__aarch64__)
+#elif defined(__aarch64__) && defined(__LP64__)
 # define TRY_EMPTY_VM_SPACE 0x10
+#elif defined(__aarch64__)
+# define TRY_EMPTY_VM_SPACE 0x6000
 #elif defined(__ARM_EABI__)
 # define TRY_EMPTY_VM_SPACE 0x6000
 #elif defined(__mips__) && defined(__LP64__)
-- 
1.7.2.5
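The ordering of the two AArch64 branches matters: the more specific `__aarch64__ && __LP64__` test has to precede the plain `__aarch64__` test, otherwise the ILP32 branch could never be reached. A minimal sketch of that ordering follows; `aarch64_pch_branch` is a hypothetical helper, not part of host-linux.c, and it reports only which branch the compiling host would take.

```c
/* Hypothetical helper: names the branch that an #elif chain ordered like
   the one in the patch selects for the host compiling this file.  */
static const char *
aarch64_pch_branch (void)
{
#if defined(__aarch64__) && defined(__LP64__)
  return "aarch64-lp64";   /* specific test first */
#elif defined(__aarch64__)
  return "aarch64-ilp32";  /* reached only when __LP64__ is not defined */
#else
  return "other";
#endif
}
```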
[PATCHv2/AARCH64 0/3] Add ILP32 GNU/Linux support
This patch set adds ILP32 support to GCC for GNU/Linux.

A patch which adds the host support for PCH. A patch which fixes TLS variables with ILP32; the problem shows up while compiling glibc, so no new testcases are added. One final patch which adds the name of the dynamic linker, passes the linker script to the linker, and allows multi-lib to work correctly.

All of these patches were tested incrementally. Only the last patch depends on the rest of the patches. The rest can be applied independently.

Thanks,
Andrew Pinski

Andrew Pinski (3):
  2014-02-25 Andrew Pinski
  2014-02-25 Andrew Pinski
  2014-02-25 Andrew Pinski

 gcc/ChangeLog                      | 28 ++
 gcc/config/aarch64/aarch64-linux.h |  4 +-
 gcc/config/aarch64/aarch64.c       | 48
 gcc/config/aarch64/aarch64.md      | 54
 gcc/config/aarch64/t-aarch64-linux |  7 +---
 gcc/config/host-linux.c            |  4 ++-
 6 files changed, 119 insertions(+), 26 deletions(-)

-- 
1.7.2.5
[PATCHv2/AARCH64 3/3] Support ILP32 multi-lib
Hi,

This is the final patch, which adds support for the dynamic linker and multi-lib directories for ILP32. I did not change multi-arch support, as I did not know what it should be changed to and, internally here at Cavium, we don't use multi-arch.

Updated for the new names that were decided on.

OK? Built and tested for aarch64-linux-gnu with and without --with-multilib-list=lp64,ilp32.

Thanks,
Andrew Pinski

	* config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER): /lib/ld-linux-aarch64_ilp32.so.1
	is used for ILP32.
	(LINUX_TARGET_LINK_SPEC): Update linker script for ILP32.
	file whose name depends on -mabi= and -mbig-endian.
	* config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64 better
	and handle ilp32 too.
	(MULTILIB_OPTIONS): Delete.
	(MULTILIB_DIRNAMES): Delete.

---
 gcc/ChangeLog                      | 11 +++
 gcc/config/aarch64/aarch64-linux.h |  4 ++--
 gcc/config/aarch64/t-aarch64-linux |  7 ++-
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 155ce45..a0cdc58 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,16 @@
 2014-02-25  Andrew Pinski
 
+	* config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER): /lib/ld-linux32-aarch64_ilp32.so.1
+	is used for ILP32.
+	(LINUX_TARGET_LINK_SPEC): Update linker script for ILP32.
+	file whose name depends on -mabi= and -mbig-endian.
+	* config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64 better
+	and handle ilp32 too.
+	(MULTILIB_OPTIONS): Delete.
+	(MULTILIB_DIRNAMES): Delete.
+
+2014-02-25  Andrew Pinski
+
 	* config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
 	Handle TLS for ILP32.
 	* config/aarch64/aarch64.md (tlsie_small): Rename to ...
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index a8f0771..48beafb 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -21,7 +21,7 @@
 #ifndef GCC_AARCH64_LINUX_H
 #define GCC_AARCH64_LINUX_H
 
-#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64%{mbig-endian:_be}.so.1"
+#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
 
 #define CPP_SPEC "%{pthread:-D_REENTRANT}"
 
@@ -33,7 +33,7 @@
    -dynamic-linker " GNU_USER_DYNAMIC_LINKER " \
    -X \
    %{mbig-endian:-EB} %{mlittle-endian:-EL} \
-   -maarch64linux%{mbig-endian:b}"
+   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"
 
 #define LINK_SPEC LINUX_TARGET_LINK_SPEC
 
diff --git a/gcc/config/aarch64/t-aarch64-linux b/gcc/config/aarch64/t-aarch64-linux
index 147452b..d6a678e 100644
--- a/gcc/config/aarch64/t-aarch64-linux
+++ b/gcc/config/aarch64/t-aarch64-linux
@@ -22,10 +22,7 @@ LIB1ASMSRC = aarch64/lib1funcs.asm
 LIB1ASMFUNCS = _aarch64_sync_cache_range
 
 AARCH_BE = $(if $(findstring TARGET_BIG_ENDIAN_DEFAULT=1, $(tm_defines)),_be)
-MULTILIB_OSDIRNAMES = .=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
+MULTILIB_OSDIRNAMES = mabi.lp64=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
 MULTIARCH_DIRNAME = $(call if_multiarch,aarch64$(AARCH_BE)-linux-gnu)
 
-# Disable the multilib for linux-gnu targets for the time being; focus
-# on the baremetal targets.
-MULTILIB_OPTIONS=
-MULTILIB_DIRNAMES =
+MULTILIB_OSDIRNAMES += mabi.ilp32=../libilp32
-- 
1.7.2.5
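As a rough illustration of what the new GLIBC_DYNAMIC_LINKER spec string expands to, the sketch below composes the path from two booleans. `aarch64_dynamic_linker` is a hypothetical helper written for this note, not a GCC function; it assumes `%{mbig-endian:_be}` and `%{mabi=ilp32:_ilp32}` each contribute their suffix exactly when the corresponding option is in effect, in the order they appear in the spec.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: write into BUF (of size LEN) the dynamic linker path
   selected by the GLIBC_DYNAMIC_LINKER spec for the given options.
   Returns the value of snprintf, i.e. the length of the full path.  */
static int
aarch64_dynamic_linker (char *buf, size_t len, bool big_endian, bool ilp32)
{
  return snprintf (buf, len, "/lib/ld-linux-aarch64%s%s.so.1",
                   big_endian ? "_be" : "",   /* %{mbig-endian:_be} */
                   ilp32 ? "_ilp32" : "");    /* %{mabi=ilp32:_ilp32} */
}
```

With both options set, the suffixes combine in spec order, giving a name ending in `_be_ilp32.so.1`.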
[PATCHv2/AARCH64 2/3] Fix TLS for ILP32.
Hi,

With ILP32, some simple usage of TLS variables causes an unrecognizable instruction, because SImode has to be used for loading pointers from memory. This fixes the three patterns (tlsie_small, tlsle_small, tlsdesc_small) to support SImode for pointers. I modified them to be like what was done for the GOT patterns.

OK? Built and tested on aarch64-elf with no regressions.

Thanks,
Andrew Pinski

	* config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
	Handle TLS for ILP32.
	* config/aarch64/aarch64.md (tlsie_small): Rename to ...
	(tlsie_small_): this and handle PTR.
	(tlsie_small_sidi): New pattern.
	(tlsle_small): Change to an expand to handle ILP32.
	(tlsle_small_): New pattern.
	(tlsdesc_small): Rename to ...
	(tlsdesc_small_): this and handle PTR.

---
 gcc/ChangeLog                 | 12 +
 gcc/config/aarch64/aarch64.c  | 48 +++
 gcc/config/aarch64/aarch64.md | 54 +++-
 3 files changed, 96 insertions(+), 18 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fd2b6cd..155ce45 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,17 @@
 2014-02-25  Andrew Pinski
 
+	* config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
+	Handle TLS for ILP32.
+	* config/aarch64/aarch64.md (tlsie_small): Rename to ...
+	(tlsie_small_): this and handle PTR.
+	(tlsie_small_sidi): New pattern.
+	(tlsle_small): Change to an expand to handle ILP32.
+	(tlsle_small_): New pattern.
+	(tlsdesc_small): Rename to ...
+	(tlsdesc_small_): this and handle PTR.
+
+2014-02-25  Andrew Pinski
+
 	* config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32
 	definition.
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 901ad3d..e65c049 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -640,22 +640,58 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 
     case SYMBOL_SMALL_TLSDESC:
       {
-        rtx x0 = gen_rtx_REG (Pmode, R0_REGNUM);
+        enum machine_mode mode = GET_MODE (dest);
+        rtx x0 = gen_rtx_REG (mode, R0_REGNUM);
         rtx tp;
 
-        emit_insn (gen_tlsdesc_small (imm));
+        gcc_assert (mode == Pmode || mode == ptr_mode);
+
+        /* In ILP32, the got entry is always of SImode size.  Unlike
+           small GOT, the dest is fixed at reg 0.  */
+        if (TARGET_ILP32)
+          emit_insn (gen_tlsdesc_small_si (imm));
+        else
+          emit_insn (gen_tlsdesc_small_di (imm));
         tp = aarch64_load_tp (NULL);
-        emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, x0)));
+
+        if (mode != Pmode)
+          tp = gen_lowpart (mode, tp);
+
+        emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, x0)));
         set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
         return;
       }
 
     case SYMBOL_SMALL_GOTTPREL:
       {
-        rtx tmp_reg = gen_reg_rtx (Pmode);
+        /* In ILP32, the mode of dest can be either SImode or DImode,
+           while the got entry is always of SImode size.  The mode of
+           dest depends on how dest is used: if dest is assigned to a
+           pointer (e.g. in the memory), it has SImode; it may have
+           DImode if dest is dereferenced to access the memory.
+           This is why we have to handle three different tlsie_small
+           patterns here (two patterns for ILP32).  */
+        enum machine_mode mode = GET_MODE (dest);
+        rtx tmp_reg = gen_reg_rtx (mode);
         rtx tp = aarch64_load_tp (NULL);
-        emit_insn (gen_tlsie_small (tmp_reg, imm));
-        emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, tmp_reg)));
+
+        if (mode == ptr_mode)
+          {
+            if (mode == DImode)
+              emit_insn (gen_tlsie_small_di (tmp_reg, imm));
+            else
+              {
+                emit_insn (gen_tlsie_small_si (tmp_reg, imm));
+                tp = gen_lowpart (mode, tp);
+              }
+          }
+        else
+          {
+            gcc_assert (mode == Pmode);
+            emit_insn (gen_tlsie_small_sidi (tmp_reg, imm));
+          }
+
+        emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, tmp_reg)));
         set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
         return;
       }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99a6ac8..7d8a645 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3581,35 +3581,65 @@
   [(set_attr "type" "call")
    (set_attr "length" "16")])
 
-(define_insn "tlsie_small"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-        (unspec:DI [(match_operand:DI 1 "aarch64_tls_ie_symref" "S")]
+(define_insn "tlsie_small_"
+  [(set (match_operand:PTR 0 "register_operand" "=r")
+        (unspec:PTR [(match_operand 1 "aarch64_tls_ie_symref" "S")]
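The three-way choice made in the SYMBOL_SMALL_GOTTPREL case above can be summarised with a small model. This is only a sketch under stated assumptions: `enum fake_mode`, `enum tlsie_variant`, and `pick_tlsie_variant` are hypothetical names, with `ptr_mode` and `pmode` passed as parameters to stand in for GCC's ptr_mode and Pmode (SImode/DImode for ILP32, DImode/DImode for LP64).

```c
#include <assert.h>

/* Hypothetical stand-ins for SImode / DImode.  */
enum fake_mode { MODE_SI, MODE_DI };

/* Which tlsie_small pattern variant would be emitted.  */
enum tlsie_variant { TLSIE_SMALL_SI, TLSIE_SMALL_DI, TLSIE_SMALL_SIDI };

/* DEST_MODE is the mode of the destination register, PTR_MODE the mode of
   a pointer stored in memory, PMODE the mode used for addresses.  */
static enum tlsie_variant
pick_tlsie_variant (enum fake_mode dest_mode, enum fake_mode ptr_mode,
                    enum fake_mode pmode)
{
  if (dest_mode == ptr_mode)
    return dest_mode == MODE_DI ? TLSIE_SMALL_DI : TLSIE_SMALL_SI;

  /* Mirrors the gcc_assert in the patch: the only other legal mode is the
     address mode, which needs the SImode load extended to DImode.  */
  assert (dest_mode == pmode);
  return TLSIE_SMALL_SIDI;
}
```

Under these assumptions, LP64 always selects the `_di` variant, while ILP32 selects `_si` when the destination is a pointer-sized value and `_sidi` when it is used as an address.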
Re: [RS6000, patch] pr57936, ICE in rs6000_secondary_reload_inner
On Wed, Feb 26, 2014 at 01:03:52AM +1030, Alan Modra wrote:
> On Tue, Feb 25, 2014 at 02:30:59PM +0100, Ulrich Weigand wrote:
> > Instead, there's code in emit_input_reload_insns that is supposed
> > to re-check whether a secondary reload is still needed if something
> > changed significantly, e.g. if a value was inherited:
> reload_override_in[j] is (reg:V16QI 32 0), and rl->in_reg is
> (subreg:V16QI (reg:V2DI 159 [ D.2446 ]) 0), so perhaps we're missing a
> subreg test in
>
>       if (reload_override_in[j]
>           && REG_P (rl->in_reg))
>         {
>           oldequiv = old;
>           old = rl->in_reg;
>         }

Yes, that was it. I did wonder why the secondary reload wasn't being deleted but didn't spot the code in emit_input_reload_insns.

Some notes: Setting old to rl->in_reg when it is a subreg doesn't change the cases where delete_output_reload is called, since that call is protected by REG_P (old). The same thing goes for the following:

      /* If we are reloading a pseudo-register that was set by the previous
         insn, see if we can get rid of that pseudo-register entirely
         by redirecting the previous insn into our reload register.  */
      else if (optimize && REG_P (old)

Perhaps the above could handle subregs too, but I figure such a change probably isn't good for stage 4. So the net result of this patch ought to just change the conditions under which we recheck secondary reloads.

Bootstrapped and regression tested powerpc64-linux; x86_64-linux bootstrap still chugging along. OK to apply, assuming no regressions?

	PR target/57936
	* reload1.c (emit_input_reload_insns): When reload_override_in,
	set old to rl->in_reg when rl->in_reg is a subreg.

Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	(revision 208097)
+++ gcc/reload1.c	(working copy)
@@ -7238,9 +7238,12 @@ emit_input_reload_insns (struct insn_chain *chain,
       /* delete_output_reload is only invoked properly if old contains
          the original pseudo register.  Since this is replaced with a
          hard reg when RELOAD_OVERRIDE_IN is set, see if we can
-         find the pseudo in RELOAD_IN_REG.  */
+         find the pseudo in RELOAD_IN_REG.  This is also used to
+         determine whether a secondary reload is needed.  */
       if (reload_override_in[j]
-          && REG_P (rl->in_reg))
+          && (REG_P (rl->in_reg)
+              || (GET_CODE (rl->in_reg) == SUBREG
+                  && REG_P (SUBREG_REG (rl->in_reg)))))
         {
           oldequiv = old;
           old = rl->in_reg;

-- 
Alan Modra
Australia Development Lab, IBM
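The widened condition in the patch above amounts to "a register, or a subreg whose inner expression is a register". A minimal sketch of that predicate follows, using hypothetical types (`enum fake_code`, `struct fake_rtx`) rather than GCC's rtx; the NULL check on the inner expression is an extra safety measure in this model only and is not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few rtx codes.  */
enum fake_code { CODE_REG, CODE_SUBREG, CODE_MEM };

struct fake_rtx
{
  enum fake_code code;           /* models GET_CODE (x) */
  const struct fake_rtx *inner;  /* models SUBREG_REG (x) for subregs */
};

/* Models REG_P (x) || (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x))).  */
static bool
is_reg_or_subreg_of_reg (const struct fake_rtx *x)
{
  return x->code == CODE_REG
         || (x->code == CODE_SUBREG
             && x->inner != NULL  /* model-only safety check */
             && x->inner->code == CODE_REG);
}
```

In this model, the rl->in_reg value from the report, a subreg wrapping a pseudo register, is accepted, whereas the original REG_P-only test would have rejected it.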