Re: [ipa-vrp] ice in set_value_range
On Nov 09 2016, Andrew Pinski wrote: > Either this patch or the patch for "Handle unary pass-through jump > functions for ipa-vrp" caused a bootstrap failure on > aarch64-linux-gnu. > Bootstrap comparison failure! > gcc/go/types.o differs > gcc/fortran/class.o differs > gcc/tree-ssa-live.o differs > gcc/data-streamer-out.o differs > gcc/ira-build.o differs > gcc/hsa-gen.o differs > gcc/hsa-brig.o differs > gcc/omp-low.o differs > gcc/lto-streamer-in.o differs > gcc/real.o differs > gcc/final.o differs > gcc/df-core.o differs Similar failure on ia64. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: [patch, avr] Add flash size to device info and make wrap around default
On Wednesday 09 November 2016 08:05 PM, Georg-Johann Lay wrote: On 09.11.2016 10:14, Pitchumani Sivanupandi wrote: On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote: On 08.11.2016 08:08, Pitchumani Sivanupandi wrote: I have updated patch to include the flash size as well. Took that info from device headers (it was fed into crt's device information note section also). The new option would render -mn-flash superfluous, but we should keep it for backward compatibility. Ok. Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from LINK_RELAX_SPEC? And what happens if relaxation is off? Yes. Removed link_pmem_wrap from link_relax. Disabling relaxation doesn't change -mpmem-wrap-around behavior. flashsize-and-wrap-around.patch diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def index 6bcc6ff..9d4aa1a 100644 /* /* Classic, > 8K, <= 64K. */ -AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,0x0060, 0x0, 1) -AVR_MCU ("at43usb355", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB355__",0x0060, 0x0, 1) -AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT76C711__", 0x0060, 0x0, 1) +AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,0x0060, 0x0, 1, 0x6000) +AVR_MCU ("at43usb355", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB355__",0x0060, 0x0, 1, 0x6000) +AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT76C711__", 0x0060, 0x0, 1, 0x4000) +AVR_MCU ("at43usb320", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB320__",0x0060, 0x0, 1, 0x1) /* Classic, == 128K. */ -AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,0x0060, 0x0, 2) -AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2) -AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 2) +AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,0x0060, 0x0, 2, 0x2) +AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2) /* Classic + MOVW + JMP/CALL. */ If at43usb320 is in the wrong multilib, then this should be handled as separate issue / patch together with its own PR. Sorry for the confusion. I just noticed that some fields don't match... It is not even clear to me from the data sheet if avr3 is the correct multilib or perhaps avr35 (if it supports MOVW) or even avr5 (if it also has MUL) as there is no reference to the exact instruction set -- Atmochip will know. Moreover, such a change should be sync'ed with avr-libc as all multilib stuff is hand-wired there: no use of --print-foo meta information retrieval by avr-libc :-(( I filed PR78275 and https://savannah.nongnu.org/bugs/index.php?49565 for this one. Thats better. I've attached the updated patch. If OK, could someone commit please? I'll try if I could find some more info for AT43USB320. Regards, Pitchumani diff --git a/gcc/config/avr/avr-arch.h b/gcc/config/avr/avr-arch.h index 42eaee5..e0961d4 100644 --- a/gcc/config/avr/avr-arch.h +++ b/gcc/config/avr/avr-arch.h @@ -122,6 +122,9 @@ typedef struct /* Number of 64k segments in the flash. */ int n_flash; + + /* Flash size in bytes. */ + int flash_size; } avr_mcu_t; /* AVR device specific features. diff --git a/gcc/config/avr/avr-devices.c b/gcc/config/avr/avr-devices.c index 7d13ba4..cef3b9a 100644 --- a/gcc/config/avr/avr-devices.c +++ b/gcc/config/avr/avr-devices.c @@ -111,12 +111,12 @@ avr_texinfo[] = const avr_mcu_t avr_mcu_types[] = { -#define AVR_MCU(NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH)\ - { NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH }, +#define AVR_MCU(NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH, FLASH_SIZE)\ + { NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH, FLASH_SIZE }, #include "avr-mcus.def" #undef AVR_MCU /* End of list. */ - { NULL, ARCH_UNKNOWN, AVR_ISA_NONE, NULL, 0, 0, 0 } + { NULL, ARCH_UNKNOWN, AVR_ISA_NONE, NULL, 0, 0, 0, 0 } }; diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def index 6bcc6ff..6e51c2e 100644 --- a/gcc/config/avr/avr-mcus.def +++ b/gcc/config/avr/avr-mcus.def @@ -62,295 +62,297 @@ N_FLASH Number of 64 KiB flash segments, rounded up. The default value for -mn-flash=. + FLASH_SIZEFlash size in bytes. + "avr2" must be first for the "0" default to work as intended. */ /* Classic, <= 8K. */ -AVR_MCU ("avr2", ARCH_AVR2, AVR_ERRATA_SKIP, NULL, 0x0060, 0x0, 6) -AVR_MCU ("at90s2313",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2313__", 0x0060, 0x0, 1) -AVR_MCU ("at90s2323",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2323__", 0x0060, 0x0, 1) -AVR_MCU ("at90s
Re: [PATCH TEST]Drop xfail for gcc.dg/vect/vect-cond-2.c
On Wed, Nov 9, 2016 at 9:46 PM, Christophe Lyon wrote: > Hi Bin > > On 8 November 2016 at 13:37, Bin Cheng wrote: >> Hi, >> Test gcc.dg/vect/vect-cond-2.c can be vectorized by GCC now, this patch >> drops the xfail. >> >> Thanks, >> bin >> >> gcc/testsuite/ChangeLog >> 2016-11-04 Bin Cheng >> >> * gcc.dg/vect/vect-cond-2.c: Drop xfail. > > But the test is now fail on armeb Thanks for reporting. Also on targets don't support vect_max_reduc, I will refine it. Thanks, bin > > Christophe
Re: [PATCH] Fix PR78189
On Wed, 9 Nov 2016, Christophe Lyon wrote: > On 9 November 2016 at 09:36, Bin.Cheng wrote: > > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener wrote: > >> On Mon, 7 Nov 2016, Christophe Lyon wrote: > >> > >>> Hi Richard, > >>> > >>> > >>> On 7 November 2016 at 09:01, Richard Biener wrote: > >>> > > >>> > The following fixes an oversight when computing alignment in the > >>> > vectorizer. > >>> > > >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > >>> > > >>> > Richard. > >>> > > >>> > 2016-11-07 Richard Biener > >>> > > >>> > PR tree-optimization/78189 > >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix > >>> > alignment computation. > >>> > > >>> > * g++.dg/torture/pr78189.C: New testcase. > >>> > > >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C > >>> > === > >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C (revision 0) > >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C (working copy) > >>> > @@ -0,0 +1,41 @@ > >>> > +/* { dg-do run } */ > >>> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" > >>> > } */ > >>> > + > >>> > +#include > >>> > + > >>> > +struct A > >>> > +{ > >>> > + void * a; > >>> > + void * b; > >>> > +}; > >>> > + > >>> > +struct alignas(16) B > >>> > +{ > >>> > + void * pad; > >>> > + void * misaligned; > >>> > + void * pad2; > >>> > + > >>> > + A a; > >>> > + > >>> > + void Null(); > >>> > +}; > >>> > + > >>> > +void B::Null() > >>> > +{ > >>> > + a.a = nullptr; > >>> > + a.b = nullptr; > >>> > +} > >>> > + > >>> > +void __attribute__((noinline,noclone)) > >>> > +NullB(void * misalignedPtr) > >>> > +{ > >>> > + B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) > >>> > - offsetof(B, misaligned)); > >>> > + b->Null(); > >>> > +} > >>> > + > >>> > +int main() > >>> > +{ > >>> > + B b; > >>> > + NullB(&b.misaligned); > >>> > + return 0; > >>> > +} > >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > >>> > index 9346cfe..b03cb1e 100644 > >>> > --- gcc/tree-vect-data-refs.c > >>> > +++ gcc/tree-vect-data-refs.c > >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct > >>> > data_reference *dr) > >>> >base = ref; > >>> >while (handled_component_p (base)) > >>> > base = TREE_OPERAND (base, 0); > >>> > + unsigned int base_alignment; > >>> > + unsigned HOST_WIDE_INT base_bitpos; > >>> > + get_object_alignment_1 (base, &base_alignment, &base_bitpos); > >>> > + /* As data-ref analysis strips the MEM_REF down to its base operand > >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to > >>> > + adjust things to make base_alignment valid as the alignment of > >>> > + DR_BASE_ADDRESS. */ > >>> >if (TREE_CODE (base) == MEM_REF) > >>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr, > >>> > - build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), > >>> > 0)); > >>> > - unsigned int base_alignment = get_object_alignment (base); > >>> > +{ > >>> > + base_bitpos -= mem_ref_offset (base).to_short_addr () * > >>> > BITS_PER_UNIT; > >>> > + base_bitpos &= (base_alignment - 1); > >>> > +} > >>> > + if (base_bitpos != 0) > >>> > +base_alignment = base_bitpos & -base_bitpos; > >>> > + /* Also look at the alignment of the base address DR analysis > >>> > + computed. */ > >>> > + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); > >>> > + if (base_addr_alignment > base_alignment) > >>> > +base_alignment = base_addr_alignment; > >>> > > >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) > >>> > DR_VECT_AUX (dr)->base_element_aligned = true; > >>> > >>> Since you committed this patch (r241892), I'm seeing execution failures: > >>> gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test > >>> gcc.dg/vect/pr40074.c execution test > >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9 > >>> --with-fpu=neon-fp16 > >>> (using qemu as simulator) > >> > >> The difference is that we now vectorize the testcase with versioning > >> for alignment (but it should never execute the vectorized variant). > >> I need arm peoples help to understand what is wrong. > > Hi All, > > I will look at it. > > > > Hi, > > This is causing new regressions on armeb: > gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vectorized 2 loops" 1 > gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect > "vectorized 2 loops" 1 It's actually an improvement as armeb can't do unaligned loads. Before the patch we versioned both loops for alignment (they can't be possibly both aligned by luck). After the patch we align 'arr' and thus the first loop becomes known-aligned and vectorized while the 2nd one is now known unaligned (and thus not vectorized because we can't do unalign
Re: [PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable
Ping. https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00040.html Andrew, do you have any objections to this version? Thanks, Kyrill On 01/11/16 15:21, Kyrill Tkachov wrote: On 31/10/16 11:54, Kyrill Tkachov wrote: On 24/10/16 17:15, Andrew Pinski wrote: On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov wrote: Hi all, When storing a 64-bit immediate that has equal bottom and top halves we currently synthesize the repeating 32-bit pattern twice and perform a single X-store. With this patch we synthesize the 32-bit pattern once into a W register and store that twice using an STP. This reduces codesize bloat from synthesising the same constant multiple times at the expense of converting a store to a store-pair. It will only trigger if we can save two or more instructions, so it will only transform: mov x1, 49370 movkx1, 0xc0da, lsl 32 str x1, [x0] into: mov w1, 49370 stp w1, w1, [x0] when optimising for -Os, whereas it will always transform a 4-insn synthesis sequence into a two-insn sequence + STP (see comments in the patch). This patch triggers already but will trigger more with the store merging pass that I'm working on since that will generate more of these repeating 64-bit constants. This helps improve codegen on 456.hmmer where store merging can sometimes create very complex repeating constants and target-specific expand needs to break them down. Doing STP might be worse on ThunderX 1 than the mov/movk. Or this might cause an ICE with -mcpu=thunderx; I can't remember if the check for slow unaligned store pair word is with the pattern or not. I can't get it to ICE with -mcpu=thunderx. The restriction is just on the STP forming code in the sched-fusion hooks AFAIK. Basically the rule is 1) if 4 byte aligned, then it is better to do two str. 2) If 8 byte aligned, then doing stp is good 3) Otherwise it is better to do two str. Ok, then I'll make the function just emit two stores and depend on the sched-fusion machinery to fuse them into an STP when appropriate since that has the logic that takes thunderx into account. Here it is. I've confirmed that it emits to STRs for 4 byte aligned stores when -mtune=thunderx and still generates STP for the other tunings, though now sched-fusion is responsible for merging them, which is ok by me. Bootstrapped and tested on aarch64. Ok for trunk? Thanks, Kyril 2016-11-01 Kyrylo Tkachov * config/aarch64/aarch64.md (mov): Call aarch64_split_dimode_const_store on DImode constant stores. * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store): New prototype. * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New function. 2016-11-01 Kyrylo Tkachov * gcc.target/aarch64/store_repeating_constant_1.c: New test. * gcc.target/aarch64/store_repeating_constant_2.c: Likewise. Thanks, Andrew Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill 2016-10-24 Kyrylo Tkachov * config/aarch64/aarch64.md (mov): Call aarch64_split_dimode_const_store on DImode constant stores. * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store): New prototype. * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New function. 2016-10-24 Kyrylo Tkachov * gcc.target/aarch64/store_repeating_constant_1.c: New test. * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
Re: [PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable
On Thu, Nov 10, 2016 at 1:04 AM, Kyrill Tkachov wrote: > Ping. > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00040.html > > Andrew, do you have any objections to this version? Not really. Thanks, Andrew > Thanks, > Kyrill > > On 01/11/16 15:21, Kyrill Tkachov wrote: >> >> >> On 31/10/16 11:54, Kyrill Tkachov wrote: >>> >>> >>> On 24/10/16 17:15, Andrew Pinski wrote: On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov wrote: > > Hi all, > > When storing a 64-bit immediate that has equal bottom and top halves we > currently > synthesize the repeating 32-bit pattern twice and perform a single > X-store. > With this patch we synthesize the 32-bit pattern once into a W register > and > store > that twice using an STP. This reduces codesize bloat from synthesising > the > same > constant multiple times at the expense of converting a store to a > store-pair. > It will only trigger if we can save two or more instructions, so it > will > only transform: > mov x1, 49370 > movkx1, 0xc0da, lsl 32 > str x1, [x0] > > into: > > mov w1, 49370 > stp w1, w1, [x0] > > when optimising for -Os, whereas it will always transform a 4-insn > synthesis > sequence into a two-insn sequence + STP (see comments in the patch). > > This patch triggers already but will trigger more with the store > merging > pass > that I'm working on since that will generate more of these repeating > 64-bit > constants. > This helps improve codegen on 456.hmmer where store merging can > sometimes > create very > complex repeating constants and target-specific expand needs to break > them > down. Doing STP might be worse on ThunderX 1 than the mov/movk. Or this might cause an ICE with -mcpu=thunderx; I can't remember if the check for slow unaligned store pair word is with the pattern or not. >>> >>> >>> I can't get it to ICE with -mcpu=thunderx. >>> The restriction is just on the STP forming code in the sched-fusion hooks >>> AFAIK. >>> Basically the rule is 1) if 4 byte aligned, then it is better to do two str. 2) If 8 byte aligned, then doing stp is good 3) Otherwise it is better to do two str. >>> >>> >>> Ok, then I'll make the function just emit two stores and depend on the >>> sched-fusion >>> machinery to fuse them into an STP when appropriate since that has the >>> logic that >>> takes thunderx into account. >>> >> >> Here it is. >> I've confirmed that it emits to STRs for 4 byte aligned stores when >> -mtune=thunderx >> and still generates STP for the other tunings, though now sched-fusion is >> responsible for >> merging them, which is ok by me. >> >> Bootstrapped and tested on aarch64. >> Ok for trunk? >> >> Thanks, >> Kyril >> >> >> 2016-11-01 Kyrylo Tkachov >> >> * config/aarch64/aarch64.md (mov): Call >> aarch64_split_dimode_const_store on DImode constant stores. >> * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store): >> New prototype. >> * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New >> function. >> >> 2016-11-01 Kyrylo Tkachov >> >> * gcc.target/aarch64/store_repeating_constant_1.c: New test. >> * gcc.target/aarch64/store_repeating_constant_2.c: Likewise. >> >>> >>> Thanks, Andrew > Bootstrapped and tested on aarch64-none-linux-gnu. > > Ok for trunk? > > Thanks, > Kyrill > > 2016-10-24 Kyrylo Tkachov > > * config/aarch64/aarch64.md (mov): Call > aarch64_split_dimode_const_store on DImode constant stores. > * config/aarch64/aarch64-protos.h > (aarch64_split_dimode_const_store): > New prototype. > * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New > function. > > 2016-10-24 Kyrylo Tkachov > > * gcc.target/aarch64/store_repeating_constant_1.c: New test. > * gcc.target/aarch64/store_repeating_constant_2.c: Likewise. >>> >>> >> >
Re: [Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]
Hi Janus, well, is it really that obvious? It fixes the ICE, correct. But what about cases where the actual has an explicit interface, but is not a variable? Like in: function f() integer :: f end function call sub(f()) ! sub() defined as in the pr. I haven't checked whether that is valid Fortran. Another thought, because I fell on my nose about this once too often, what about the actual argument being a component of a derived type. I known this is not the bug you fixes, but quite close. Think about: type t real :: r(:) end type type(t) :: foo call sub(foo%r) I assume the check for C1232 does not address this correctly. Can you comment having thought about this somewhat? - Andre On Wed, 9 Nov 2016 21:42:15 +0100 Janus Weil wrote: > Hi all, > > I have committed yet another obvious ice-on-invalid fix: > > https://gcc.gnu.org/viewcvs?rev=242020&root=gcc&view=rev > > Cheers, > Janus -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [ARM] PR 78253 do not resolve weak ref locally
On 09/11/16 21:29, Christophe Lyon wrote: > Hi, > > PR 78253 shows that the handling of weak references has changed for > ARM with gcc-5. > > When r220674 was committed, default_binds_local_p_2 gained a new > parameter (weak_dominate), which, when true, implies that a reference > to a weak symbol defined locally will be resolved locally, even though > it could be overridden by a strong definition in another object file. > > With r220674, default_binds_local_p forces weak_dominate=true, > effectively changing the previous behavior. > > The attached patch introduces default_binds_local_p_4 which is a copy > of default_binds_local_p_2, but using weak_dominate=false, and updates > the ARM target to call default_binds_local_p_4 instead of > default_binds_local_p_2. > > I ran cross-tests on various arm* configurations with no regression, > and checked that the test attached to the original bugzilla now works > as expected. > > I am not sure why weak_dominate defaults to true, and I couldn't > really understand why by reading the threads related to r220674 and > following updates to default_binds_local_p_* which all deal with other > corner cases and do not discuss the weak_dominate parameter. > > Or should this patch be made more generic? > I certainly don't think it should be ARM specific. The questions I have are: 1) What do other targets do today. Are they the same, or different? 2) If different why? 3) Is the current behaviour really what was intended by the patch? ie. Was the old behaviour actually wrong? R. > Thanks, > > Christophe >
Re: [PATCH v2] Add mcpu flag for Qualcomm falkor core
On 04/11/16 15:47, Siddhesh Poyarekar wrote: > This adds an mcpu option for the upcoming Qualcomm Falkor core. This > is identical to the qdf24xx part that was added earlier and hence > retains the same tuning structure and continues to have the a57 > pipeline for now. The part number has also been changed and this > patch fixes this for both qdf24xx and falkor options. > > Tested with aarch64 and armhf. > > Siddhesh > > * gcc/config/aarch64/aarch64-cores.def (qdf24xx): Update part > number. > (falkor): New core. > * gcc/config/aarch64/aarch64-tune.md: Regenerated. > * gcc/config/arm/arm-cores.def (falkor): New core. > * gcc/config/arm/arm-tables.opt: Regenerated. > * gcc/config/arm/arm-tune.md: Regenerated. > * gcc/config/arm/bpabi.h (BE8_LINK_SPEC): Add falkor support. > * gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add falkor > support. > * gcc/doc/invoke.texi (AArch64 Options/-mtune): Add falkor. > (ARM Options/-mtune): Add falkor. > Installed. Please can you prepare a patch for the release notes as well. R.
Re: [PATCH, RFC] Improve ivopts group costs
On Wed, Nov 9, 2016 at 1:02 PM, Bin.Cheng wrote: > On Thu, Nov 3, 2016 at 4:00 PM, Bin.Cheng wrote: >> On Thu, Nov 3, 2016 at 1:35 PM, Evgeny Kudryashov >> wrote: >>> Hello, >>> >>> I'm facing the following problem related to ivopts. The problem is that GCC >>> generates a lot of induction variables and as a result there is an >>> unnecessary increase of stack usage and register pressure. >>> >>> For instance, for the attached testcase (tc_ivopts.c) GCC generates 26 >>> induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for >>> rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm target >>> using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For reference, I'm >>> attaching assembly I get on current trunk. >>> >>> The reason might be in use groups costs, in particular, in the way of >>> estimation. Currently, the cost of a tuple (group, candidate) is a sum of >>> per-use costs in the group. So, the cost of a group grows proportional to >>> the number of uses in the group. This approach has a negative effect on the >>> algorithm for finding the best set of induction variables: the part of a >>> total cost related to adding a new candidate is almost washed out by the >>> cost of the group. In addition, when there is a lot of groups with many uses >>> inside and a target is out of free registers, the cost of spill is washing >>> out too. As a result, GCC prefers to use native candidates (candidate >>> created for a particular group) for each group of uses instead of >>> considering the real cost of introducing a new variable into a set. >>> >>> The summing approach was added as a part of this patch >>> (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the >>> motivation for taking the sum does not seem to be >>> specifically discussed. >>> >>> I propose the following patch that changes a group cost from cost summing to >>> selecting the largest one inside a group. For the given test case I have: >>> necessary size of stack is decreased by almost 3 times and ldr\str amount >>> are decreased by less than 2 times. Also I'm attaching assembly after >>> applying the patch. >>> >>> The essential change in the patch is just: >>> >>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >>> index f9211ad..a149418 100644 >>> --- a/gcc/tree-ssa-loop-ivopts.c >>> +++ b/gcc/tree-ssa-loop-ivopts.c >>> @@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct ivopts_data >>> *data, >>> offset and where the iv_use is. */ >>> cost = get_computation_cost (data, next, cand, true, >>> NULL, &can_autoinc, NULL); >>> - sum_cost += cost; >>> + if (sum_cost < cost) >>> +sum_cost = cost; >>> } >>>set_group_iv_cost (data, group, cand, sum_cost, depends_on, >>> NULL_TREE, ERROR_MARK, inv_expr); >>> >>> Any suggestions? >> Hi Evgeny, >> >> I tend to be practical here. Given cost is not model that well in >> IVOPT, any heuristic tuning in cost is quite likely resulting in >> improvement in some cases, while regressions in others. At least we >> need to run good number of benchmarks for such changes. Specific to >> this one, the summary of cost in a group is a natural result of the >> original code, in which uses' cost is added up before candidate >> selection. Take your code as an example, choosing loop's original >> candidate for group results in lots of memory accesses with [base + >> index << scale] addressing mode, which could have higher cost than >> [base + offset] mode wrto u-arch, IMHO, it makes sense to add up this >> cost. OTOH, I totally agree that IVOPT tends to choose too many >> candidates at the moment, especially for large loops. Is it possible >> to achieve the same effect by penalizing register pressure cost? >> Meanwhile, I can collect benchmark data for your patch on AArch64 and >> get back to you later. > I collected spec2k6 data on one AArch64 processors, it doesn't give > meaningful improvement or regression. Looking at the test, it boils > down to how we choose induction variable for loops having below memory > references: > for (biv) > MEM[base + biv << scale + off1]; > MEM[base + biv << scale + off2]; > // ... > MEM[base + biv << scale + offn]; > > On targets support [base + index << scale + offset] addressing mode , > the biv should be preferred (if cost of the addressing mode is not > blocking) thus register pressure is 2. While on targets only support > [base + index << scale], it is more complicated. Choosing biv > actually increases the register pressure by 1 than choosing {base_1 + > biv << scale, step} as the candidate, or an additional computation > outside of address expression is required for each memory referece. > Well, there is one exception when "base" is actually anticipated on > exit edge of this loop. So this is really target dependent cost model > issue, IMHO, decreasing group cost in target-independent wa
Re: [PATCH] PR78241, fix loop unroller when niter expr is not reliable
On Wed, Nov 9, 2016 at 11:13 PM, Pat Haugen wrote: > The following fixes a problem introduced by my earlier loop unroller patch, > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01612.html. In instances where > the niter expr is not reliable we need to still emit an initial peel copy of > the loop. > > Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk? Ok. Richard. > -Pat > > > 2016-11-09 Pat Haugen > > PR rtl-optimization/78241 > * loop-unroll.c (unroll_loop_runtime_iterations): Don't adjust > 'niter', but > emit initial peel copy if niter expr is not reliable. > > > testsuite/ChangeLog: > 2016-11-09 Pat Haugen > > * gcc.dg/pr78241.c: New test. > >
[ARM] Remove duplicated enum type for CPU identifiers
Another minor cleanup. We currently have two enumeration types for the list of CPUs. This patch gets rid of one of them and merges the uses together. * arm.h (target_cpus): Delete. * arm-opts.h (enum processor_type): Prefix entries with TARGET_CPU_. * arm.c (all_cores): Prefix IDENT with TARGET_CPU_. (all_architectures): Likewise. (arm_option_override): Adjust use of CPU enums. (arm_sched_reorder): Likewise. * vfp.md (movdi_vfp, movdi_vfp_cortexa8): Likewise. * arm.opt (mcpu, mtune): Adjust use of CPU enums. * arm/genopt.sh (processor_type): Prefix enumeration entries with TARGET_CPU_. * arm-tables.opt: Regenerated. Installed on trunk. >From 7f6788b63cdade43b245d33df667bc5af73c11aa Mon Sep 17 00:00:00 2001 From: Richard Earnshaw Date: Wed, 9 Nov 2016 13:11:10 + Subject: [PATCH] [ARM] Remove duplicated enum type for CPU identifiers. MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.7.4" This is a multi-part message in MIME format. --2.7.4 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit * arm.h (target_cpus): Delete. * arm-opts.h (enum processor_type): Prefex entires with TARGET_CPU_. * arm.c (all_cores): Prefix IDENT with TARGET_CPU_. (all_architectures): Likewise. (arm_option_override): Adjust use of CPU enums. (arm_sched_reorder): Likewise. * vfp.md (movdi_vfp): Likewise. * arm.opt (mcpu, mtune): Adjust use of CPU enums. * arm/genopt.sh (processor_type): Prefix enumeration entries with TARGET_CPU_. * arm-tables.opt: Regenerated. --- gcc/config/arm/arm-opts.h | 4 +- gcc/config/arm/arm-tables.opt | 218 +- gcc/config/arm/arm.c | 16 ++-- gcc/config/arm/arm.h | 10 -- gcc/config/arm/arm.opt| 4 +- gcc/config/arm/genopt.sh | 2 +- gcc/config/arm/vfp.md | 4 +- 7 files changed, 124 insertions(+), 134 deletions(-) --2.7.4 Content-Type: text/x-patch; name="0001-ARM-Remove-duplicated-enum-type-for-CPU-identifiers.patch" Content-Transfer-Encoding: 8bit Content-Disposition: attachment; filename="0001-ARM-Remove-duplicated-enum-type-for-CPU-identifiers.patch" diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index a649ba5..9fa8113 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -30,11 +30,11 @@ enum processor_type { #undef ARM_CORE #define ARM_CORE(NAME, INTERNAL_IDENT, IDENT, ARCH, FLAGS, COSTS) \ - INTERNAL_IDENT, + TARGET_CPU_##INTERNAL_IDENT, #include "arm-cores.def" #undef ARM_CORE /* Used to indicate that no processor has been specified. */ - arm_none + TARGET_CPU_arm_none }; /* Which __fp16 format to use. diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index f7886b9..effdfbc 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -25,331 +25,331 @@ Name(processor_type) Type(enum processor_type) Known ARM CPUs (for use with the -mcpu= and -mtune= options): EnumValue -Enum(processor_type) String(arm2) Value(arm2) +Enum(processor_type) String(arm2) Value( TARGET_CPU_arm2) EnumValue -Enum(processor_type) String(arm250) Value(arm250) +Enum(processor_type) String(arm250) Value( TARGET_CPU_arm250) EnumValue -Enum(processor_type) String(arm3) Value(arm3) +Enum(processor_type) String(arm3) Value( TARGET_CPU_arm3) EnumValue -Enum(processor_type) String(arm6) Value(arm6) +Enum(processor_type) String(arm6) Value( TARGET_CPU_arm6) EnumValue -Enum(processor_type) String(arm60) Value(arm60) +Enum(processor_type) String(arm60) Value( TARGET_CPU_arm60) EnumValue -Enum(processor_type) String(arm600) Value(arm600) +Enum(processor_type) String(arm600) Value( TARGET_CPU_arm600) EnumValue -Enum(processor_type) String(arm610) Value(arm610) +Enum(processor_type) String(arm610) Value( TARGET_CPU_arm610) EnumValue -Enum(processor_type) String(arm620) Value(arm620) +Enum(processor_type) String(arm620) Value( TARGET_CPU_arm620) EnumValue -Enum(processor_type) String(arm7) Value(arm7) +Enum(processor_type) String(arm7) Value( TARGET_CPU_arm7) EnumValue -Enum(processor_type) String(arm7d) Value(arm7d) +Enum(processor_type) String(arm7d) Value( TARGET_CPU_arm7d) EnumValue -Enum(processor_type) String(arm7di) Value(arm7di) +Enum(processor_type) String(arm7di) Value( TARGET_CPU_arm7di) EnumValue -Enum(processor_type) String(arm70) Value(arm70) +Enum(processor_type) String(arm70) Value( TARGET_CPU_arm70) EnumValue -Enum(processor_type) String(arm700) Value(arm700) +Enum(processor_type) String(arm700) Value( TARGET_CPU_arm700) EnumValue -Enum(processor_type) String(arm700i) Value(arm700i) +Enum(processor_type) String(arm700i) Value( TARGET_CPU_arm700i) EnumValue -Enum(processor_type) String(arm710) Value(arm710) +Enum(processor_type) String(arm710) Value( TARGET_CPU_arm710) EnumValue -Enum(pro
[patch 0/5] OpenACC tile clause support
Hi Jakub, this patch set is a port of the OpenACC tile clause functionality from the gomp4 branch. The tile clause allows the programmer to specify the tiling of an OpenACC loop nest. These patches were mostly implemented by Nathan, with the Fortran front-end bits by Cesar. I'm just upstreaming this on behalf of them. As for the testing status, one libgomp testcase is currently XFAILed, as we're fixing this as PR78266. Other test results look fine. Is this patch set okay for trunk? We hope to get this into GCC 7. Thanks, Chung-Lin
Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const
On Thu, Nov 10, 2016 at 12:05 AM, Segher Boessenkool wrote: > On Wed, Nov 09, 2016 at 11:29:45PM +0100, Marc Glisse wrote: >> >>>This patch makes RTL simplify transform (xor (and (xor A B) C) B) back >> >>>to (ior (and A C) (and B ~C)) for constant C (and similar with A instead >> >>>of B for that last term). >> >> >> >>Would it make sense to implement this transformation in match.pd, next to >> >>the "opposite" one, or do you need it at the RTL level because C only >> >>becomes a constant at that stage? >> > >> >It becomes a constant in the later gimple passes, but we need it in the RTL >> >simplifiers as well, even if you also do it in match.pd? >> >> (assuming it is always an improvement, even though it may use the same >> number of operations and one more constant) > > And a shallower evaluation tree. > > Does match.pd (or the gimple optimisers in general) care about the number > of constants at all? It's a matter of definition what is simpler. I think for two different constants vs one constant I'd prefer the single constant (the match.pd pattern would apply to int128 as well). Unless you have a andnot instruction and combine/CSE are clever enough to see the opportunity to re-use a previous constant if it changes bit-and to bit-and-not ... I'd at _most_ consider them equal cost which would mean we'd need to decide for a canonical form -- having the same one for constant and not constant seems like a good idea here as well. bernd notices better resource utilization due to the shallower tree for the ior variant but that also eventually increases register pressure. Thus, I'd leave this all to RTL land ;) (esp. the constant with and-not instruction trick is likely not done on RTL) Richard. >> Sure, it doesn't hurt to have it in both places. It just seems that since >> the problem was caused by match.pd in your original testcase, fixing it at >> that level (undoing the harm as soon as possible) would make the RTL >> version less useful (though not useless). Anyway, I don't feel competent >> to decide when which form is preferable, I was just curious. > > I don't know either. See PR63568. > >> (simplify >> (bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0) >> (bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2))) >> >> (this handles vectors as well, I don't know if that is desired) > > No clue. There is a theme here ;-) > > > Segher
[Patch 1/5] OpenACC tile clause support, OMP_CLAUSE_TILE adjustments
This patch contains a few supporting changes that adjusts how OMP_CLAUSE_TILE is handled. This is in support of the more elaborate omp-low.c changes in another patch. Thanks, Chung-Lin 2016-XX-XX Nathan Sidwell * tree.h (OMP_CLAUSE_TILE_ITERVAR, OMP_CLAUSE_TILE_COUNT): New. * tree.c (omp_clause_num_ops): Adjust TILE ops. * tree-nested.c (convert_nonlocal_omp_clauses): Allow OMP_CLAUSE_TILE. * gimplify.c (gimplify_scan_omp_clauses): No special handling for OMP_CLAUSE_TILE. (gomplify_adjust_omp_clauses): Don't delete TILE. (gimplify_omp_for): Deal with TILE. Index: tree.c === --- tree.c (revision 241809) +++ tree.c (working copy) @@ -327,7 +327,7 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_NUM_GANGS */ 1, /* OMP_CLAUSE_NUM_WORKERS */ 1, /* OMP_CLAUSE_VECTOR_LENGTH */ - 1, /* OMP_CLAUSE_TILE */ + 3, /* OMP_CLAUSE_TILE */ 2, /* OMP_CLAUSE__GRIDDIM_ */ }; Index: tree.h === --- tree.h (revision 241809) +++ tree.h (working copy) @@ -1654,6 +1654,10 @@ extern void protected_set_expr_location (tree, loc #define OMP_CLAUSE_TILE_LIST(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0) +#define OMP_CLAUSE_TILE_ITERVAR(NODE) \ + OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 1) +#define OMP_CLAUSE_TILE_COUNT(NODE) \ + OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 2) #define OMP_CLAUSE__GRIDDIM__DIMENSION(NODE) \ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\ Index: tree-nested.c === --- tree-nested.c (revision 241809) +++ tree-nested.c (working copy) @@ -1274,6 +1274,7 @@ convert_nonlocal_omp_clauses (tree *pclauses, stru case OMP_CLAUSE_DEFAULT: case OMP_CLAUSE_COPYIN: case OMP_CLAUSE_COLLAPSE: + case OMP_CLAUSE_TILE: case OMP_CLAUSE_UNTIED: case OMP_CLAUSE_MERGEABLE: case OMP_CLAUSE_PROC_BIND: @@ -1286,8 +1287,6 @@ convert_nonlocal_omp_clauses (tree *pclauses, stru case OMP_CLAUSE_AUTO: break; - /* OpenACC tile clauses are discarded during gimplification. */ - case OMP_CLAUSE_TILE: /* The following clause belongs to the OpenACC cache directive, which is discarded during gimplification. */ case OMP_CLAUSE__CACHE_: @@ -1982,6 +1981,7 @@ convert_local_omp_clauses (tree *pclauses, struct case OMP_CLAUSE_DEFAULT: case OMP_CLAUSE_COPYIN: case OMP_CLAUSE_COLLAPSE: + case OMP_CLAUSE_TILE: case OMP_CLAUSE_UNTIED: case OMP_CLAUSE_MERGEABLE: case OMP_CLAUSE_PROC_BIND: @@ -1994,8 +1994,6 @@ convert_local_omp_clauses (tree *pclauses, struct case OMP_CLAUSE_AUTO: break; - /* OpenACC tile clauses are discarded during gimplification. */ - case OMP_CLAUSE_TILE: /* The following clause belongs to the OpenACC cache directive, which is discarded during gimplification. */ case OMP_CLAUSE__CACHE_: Index: gimplify.c === --- gimplify.c (revision 241809) +++ gimplify.c (working copy) @@ -8138,20 +8138,11 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se remove = true; break; - case OMP_CLAUSE_TILE: - for (tree list = OMP_CLAUSE_TILE_LIST (c); !remove && list; - list = TREE_CHAIN (list)) - { - if (gimplify_expr (&TREE_VALUE (list), pre_p, NULL, - is_gimple_val, fb_rvalue) == GS_ERROR) - remove = true; - } - break; - case OMP_CLAUSE_NOWAIT: case OMP_CLAUSE_ORDERED: case OMP_CLAUSE_UNTIED: case OMP_CLAUSE_COLLAPSE: + case OMP_CLAUSE_TILE: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: case OMP_CLAUSE_INDEPENDENT: @@ -8927,13 +8918,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gi case OMP_CLAUSE_VECTOR: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: - break; - case OMP_CLAUSE_TILE: - /* We're not yet making use of the information provided by OpenACC - tile clauses. Discard these here, to simplify later middle end - processing. */ - remove = true; break; default: @@ -9388,10 +9373,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) (OMP_FOR_INIT (for_stmt)) * 2); } - int collapse = 1; - c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_COLLAPSE); - if (c) -collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c)); + int collapse = 0; + /* Find the first of COLLAPSE or TILE. */ + for (c = OMP_FOR_CLAUSES (for_stmt); c; c = TREE_CHAIN (c)) +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_COLLAPSE) + { + collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c)); + if (collapse == 1) + /* Not really collapsing. */ + collapse = 0; + break; + } +else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_TILE) + { + collapse = list_length (OMP_CLAUSE_TILE_LIST (c)); + break; + } + for
[Patch 2/5] OpenACC tile clause support, omp-low parts
This part is the bulk of the patch set. It consists of the definition of the GOACC_TILE internal fn, the lowering/expanding of this in omp-low, as well as loop auto-partitioning adjustments that help improve generated code. This patch corresponds to the gomp4 committed patches: https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00518.html https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02031.html https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02351.html https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02418.html https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00107.html https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00152.html https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00314.html with some small bits separated out to the prior patch in this series. Thanks, Chung-Lin 2016-XX-XX Nathan Sidwell * internal-fn.def (GOACC_DIM_POS): Comment may be overly conservative. (GOACC_TILE): New. * internal-fn.c (expand_GOACC_TILE): New. * omp-low.c (struct omp_for_data): Add tiling field. (struct oacc_loop): Change 'ifns' to vector of call stmts, add e_mask field. (enum oacc_loop_flags): Add OLF_TILE flag. (extract_omp_for_data): Deal with tiling. (scan_sharing_clauses): Allow OMP_CLAUSE_TILE. (lower_oacc_head_mark): Add OLF_TILE as appropriate. Ensure 2 levels for auto loops. Remove default auto determining, moved to oacc_loop_fixed_partitions. (struct oacc_collapse): Add tile and outer fields. */ (expand_oacc_collapse_init): Add LOC paramter. Initialize tile element fields. (expand_oacc_collapse_vars): Add INNER parm, adjust for tiling, avoid DIV for outermost collapse var. (expand_oacc_for): Insert tile element loop as needed. Adjust. Remove out of date comments, fix whitespace. (oacc_dim_call): New function, abstracted out from oacc_thread_numbers. (oacc_thread_numbers): Use oacc_dim_call. (oacc_xform_tile): New. (new_oacc_loop_raw): Initialize e_mask, adjust for ifns vector. (finish_oacc_loop): Adjust for ifns vector. (oacc_loop_discover_walk): Append loop abstraction sites to list, add case for GOACC_TILE fns. (oacc_loop_xform_loop): Delete. (oacc_loop_process): Iterate over call list directly, and add handling for GOACC_TILE fns. (oacc_loop_fixed_partitions): Determine default auto, deal with TILE, dump partitioning. (oacc_loop_auto_partitions): Add outer_assign parm. Assign all but vector partitioning to outer loops. Assign 2 partitions to loops when available. Add TILE handling. (oacc_loop_partition): Adjust oacc_loop_auto_partitions call. (execite_oacc_device_lower): Process GOACC_TILE fns, ignore unknown specs. Index: internal-fn.c === --- internal-fn.c (revision 241809) +++ internal-fn.c (working copy) @@ -2168,6 +2168,14 @@ expand_GOACC_REDUCTION (internal_fn, gcall *) gcc_unreachable (); } +/* This is expanded by oacc_device_lower pass. */ + +static void +expand_GOACC_TILE (internal_fn, gcall *) +{ + gcc_unreachable (); +} + /* Set errno to EDOM. */ static void Index: internal-fn.def === --- internal-fn.def (revision 241809) +++ internal-fn.def (working copy) @@ -175,7 +175,7 @@ DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW, NULL) dimension. DIM_POS is pure (and not const) so that it isn't thought to clobber memory and can be gcse'd within a single parallel region, but not across FORK/JOIN boundaries. They take a - single INTEGER_CST argument. */ + single INTEGER_CST argument. This might be overly conservative. */ DEF_INTERNAL_FN (GOACC_DIM_SIZE, ECF_CONST | ECF_NOTHROW | ECF_LEAF, ".") DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".") @@ -185,6 +185,10 @@ DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHRO /* OpenACC reduction abstraction. See internal-fn.h for usage. */ DEF_INTERNAL_FN (GOACC_REDUCTION, ECF_NOTHROW | ECF_LEAF, NULL) +/* Openacc tile abstraction. Describes the spans of the element loop. + GOACC_TILE (num-loops, loop-no, tile-arg, tile-mask, element-mask). */ +DEF_INTERNAL_FN (GOACC_TILE, ECF_NOTHROW | ECF_LEAF, NULL) + /* Set errno to EDOM, if GCC knows how to do that directly for the current target. */ DEF_INTERNAL_FN (SET_EDOM, ECF_LEAF | ECF_NOTHROW, NULL) Index: omp-low.c === --- omp-low.c (revision 241809) +++ omp-low.c (working copy) @@ -213,7 +213,8 @@ struct omp_for_data tree chunk_size; gomp_for *for_stmt; tree pre, iter_type; - int collapse; + tree tiling; /* Tiling values (if non null). */ + int collapse; /* Collapsed loops, 1 for a non-collapsed loop. */ int ordered; bool have_nowait, have_ordered, simd_schedule;
[Patch 3/5] OpenACC tile clause support, C/C++ front-end parts
These are the patches for the C/C++ front-ends, along with the testsuite patches. Thanks, Chung-Lin 2016-XX-XX Nathan Sidwell c/ * c-parser.c (c_parser_omp_clause_collapse): Disallow tile. (c_parser_oacc_clause_tile): Disallow collapse. Fix parsing and semantic checking. * c-parser.c (c_parser_omp_for_loop): Accept tiling constructs. cp/ * parser.c (cp_parser_oacc_clause_tile): Disallow collapse. Fix parsing. Parse constant expression. Remove semantic checking. (cp_parser_omp_clause_collapse): Disallow tile. (cp_parser_omp_for_loop): Deal with tile clause. Don't emit a parse error about missing for after already emitting one. Use more conventional for idiom for unbounded loop. * pt.c (tsubst_omp_clauses): Require integral constant expression for COLLAPSE and TILE. Remove broken TILE subst. * semantics.c (finish_omp_clauses): Correct TILE semantic check. (finish_omp_for): Deal with tile clause. gcc/testsuite/ * c-c++-common/goacc/loop-auto-1.c: Adjust and add additional case. * c-c++-common/goacc/loop-auto-2.c: New. * c-c++-common/goacc/tile.c: Include stdbool, fix expected errors. * g++.dg/goacc/template.C: Test tile subst. Adjust erroneous uses. * g++.dg/goacc/tile-1.C: Check tile subst. * gcc.dg/goacc/loop-processing-1.c: Adjust dg-final pattern. Index: c/c-parser.c === --- c/c-parser.c (revision 241809) +++ c/c-parser.c (working copy) @@ -11010,6 +11010,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tr location_t loc; check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse"); + check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile"); loc = c_parser_peek_token (parser)->location; if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) @@ -11920,10 +11921,11 @@ static tree c_parser_oacc_clause_tile (c_parser *parser, tree list) { tree c, expr = error_mark_node; - location_t loc, expr_loc; + location_t loc; tree tile = NULL_TREE; check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile"); + check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse"); loc = c_parser_peek_token (parser)->location; if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) @@ -11931,16 +11933,19 @@ c_parser_oacc_clause_tile (c_parser *parser, tree do { + if (tile && !c_parser_require (parser, CPP_COMMA, "expected %<,%>")) + return list; + if (c_parser_next_token_is (parser, CPP_MULT) && (c_parser_peek_2nd_token (parser)->type == CPP_COMMA || c_parser_peek_2nd_token (parser)->type == CPP_CLOSE_PAREN)) { c_parser_consume_token (parser); - expr = integer_minus_one_node; + expr = integer_zero_node; } else { - expr_loc = c_parser_peek_token (parser)->location; + location_t expr_loc = c_parser_peek_token (parser)->location; c_expr cexpr = c_parser_expr_no_commas (parser, NULL); cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true); expr = cexpr.value; @@ -11952,28 +11957,20 @@ c_parser_oacc_clause_tile (c_parser *parser, tree return list; } - if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))) - { - c_parser_error (parser, "% value must be integral"); - return list; - } - expr = c_fully_fold (expr, false, NULL); - /* Attempt to statically determine when expr isn't positive. */ - c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, expr, - build_int_cst (TREE_TYPE (expr), 0)); - protected_set_expr_location (c, expr_loc); - if (c == boolean_true_node) + if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)) + || TREE_CODE (expr) != INTEGER_CST + || !tree_fits_shwi_p (expr) + || tree_to_shwi (expr) <= 0) { - warning_at (expr_loc, 0,"% value must be positive"); - expr = integer_one_node; + error_at (expr_loc, "% argument needs positive" + " integral constant"); + expr = integer_zero_node; } } tile = tree_cons (NULL_TREE, expr, tile); - if (c_parser_next_token_is (parser, CPP_COMMA)) - c_parser_consume_token (parser); } while (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN)); @@ -14899,11 +14896,17 @@ c_parser_omp_for_loop (location_t loc, c_parser *p bool fail = false, open_brace_parsed = false; int i, collapse = 1, ordered = 0, count, nbraces = 0; location_t for_loc; + bool tiling = false; vec *for_block = make_tree_vector (); for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl)) if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE) collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl)); +else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE) + { + tiling = true; + collapse = list_length (OMP_CLAUSE_TILE_LIST (cl)); + } else if (OMP_CLAUSE_CO
[Patch 5/5] OpenACC tile clause support, libgomp testsuite patches
Some additional tests and adjustments to existing ones were made. 2016-XX-XX Nathan Sidwell Chung-Lin Tang libgomp/ * testsuite/libgomp.oacc-c-c++-common/tile-1.c: New. * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust and add additional case. * testsuite/libgomp.oacc-c-c++-common/vprop.c: XFAIL under "openacc_nvidia_accel_selected". * libgomp.oacc-fortran/nested-function-1.f90 (test2): Add num_workers(8) clause. Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c === --- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c (revision 241809) +++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c (working copy) @@ -112,7 +112,7 @@ int vector_1 (int *ary, int size) ary[ix] = place (); } - return check (ary, size, 0, 0, 1); + return check (ary, size, 0, 1, 1); } int vector_2 (int *ary, int size) @@ -196,10 +196,24 @@ int gang_3 (int *ary, int size) ary[ix + jx * 64] = place (); } + return check (ary, size, 1, 1, 1); +} + +int gang_4 (int *ary, int size) +{ + clear (ary, size); + +#pragma acc parallel vector_length(32) copy(ary[0:size]) firstprivate (size) + { +#pragma acc loop auto +for (int jx = 0; jx < size; jx++) + ary[jx] = place (); + } + return check (ary, size, 1, 0, 1); } -#define N (32*32*32) +#define N (32*32*32*2) int main () { int ondev = 0; @@ -227,6 +241,8 @@ int main () return 1; if (gang_3 (ary, N)) return 1; + if (gang_4 (ary, N)) +return 1; return 0; } Index: libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c === --- libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c (revision 0) +++ libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c (revision 0) @@ -0,0 +1,281 @@ +/* This code uses nvptx inline assembly guarded with acc_on_device, which is + not optimized away at -O0, and then confuses the target assembler. + { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */ + +/* { dg-additional-options "-fopenacc-dim=32" } */ + +#include +#include + +static int check (const int *ary, int size, int gp, int wp, int vp) +{ + int exit = 0; + int ix; + int gangs[32], workers[32], vectors[32]; + + for (ix = 0; ix < 32; ix++) +gangs[ix] = workers[ix] = vectors[ix] = 0; + + for (ix = 0; ix < size; ix++) +{ + vectors[ary[ix] & 0xff]++; + workers[(ary[ix] >> 8) & 0xff]++; + gangs[(ary[ix] >> 16) & 0xff]++; +} + + for (ix = 0; ix < 32; ix++) +{ + if (gp) + { + int expect = gangs[0]; + if (gangs[ix] != expect) + { + exit = 1; + printf ("gang %d not used %d times\n", ix, expect); + } + } + else if (ix && gangs[ix]) + { + exit = 1; + printf ("gang %d unexpectedly used\n", ix); + } + + if (wp) + { + int expect = workers[0]; + if (workers[ix] != expect) + { + exit = 1; + printf ("worker %d not used %d times\n", ix, expect); + } + } + else if (ix && workers[ix]) + { + exit = 1; + printf ("worker %d unexpectedly used\n", ix); + } + + if (vp) + { + int expect = vectors[0]; + if (vectors[ix] != expect) + { + exit = 1; + printf ("vector %d not used %d times\n", ix, expect); + } + } + else if (ix && vectors[ix]) + { + exit = 1; + printf ("vector %d unexpectedly used\n", ix); + } + +} + return exit; +} + +#pragma acc routine seq +static int __attribute__((noinline)) place () +{ + int r = 0; + + if (acc_on_device (acc_device_nvidia)) +{ + int g = 0, w = 0, v = 0; + + __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g)); + __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w)); + __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v)); + r = (g << 16) | (w << 8) | v; +} + return r; +} + +static void clear (int *ary, int size) +{ + int ix; + + for (ix = 0; ix < size; ix++) +ary[ix] = -1; +} + +int gang_vector_1 (int *ary, int size) +{ + clear (ary, size); +#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size) + { +#pragma acc loop tile(128) gang vector +for (int jx = 0; jx < size; jx++) + ary[jx] = place (); + } + + return check (ary, size, 1, 0, 1); +} + +int gang_vector_2a (int *ary, int size) +{ + if (size % 256) +return 1; + + clear (ary, size); +#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size) + { +#pragma acc loop tile(64, 64) gang vector +for (int jx = 0; jx < size / 256; jx++) + for (int ix = 0; ix < 256; ix++) + ary[jx * 256 + ix] = place (); + } + + return check (ary, size, 1, 0, 1); +} + +int gang_vector_2b (int *ary, int size) +{ + if (size % 256) +return 1; + + clear (ary, size); +#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size) + {
[Patch 4/5] OpenACC tile clause support, Fortran front-end parts
The Fortran front-end patches. These were originally written by Cesar. Thanks, Chung-Lin 2016-XX-XX Cesar Philippidis fortran/ * openmp.c (resolve_oacc_positive_int_expr): Promote the warning to an error. (resolve_oacc_loop_blocks): Use integer zero to represent the '*' tile argument. (resolve_omp_clauses): Error on directives containing both tile and collapse clauses. * trans-openmp.c (gfc_trans_omp_do): Lower tiled loops like collapsed loops. gcc/testsuite/ * gfortran.dg/goacc/loop-2.f95: Change expected tile clause warnings to errors. * gfortran.dg/goacc/loop-5.f95: Likewise. * gfortran.dg/goacc/sie.f95: Likewise. * gfortran.dg/goacc/tile-1.f90: New test. * gfortran.dg/goacc/tile-2.f90: New test * gfortran.dg/goacc/tile-lowering.f95: New test. Index: fortran/openmp.c === --- fortran/openmp.c (revision 241809) +++ fortran/openmp.c (working copy) @@ -3024,8 +3024,8 @@ resolve_oacc_positive_int_expr (gfc_expr *expr, co resolve_oacc_scalar_int_expr (expr, clause); if (expr->expr_type == EXPR_CONSTANT && expr->ts.type == BT_INTEGER && mpz_sgn(expr->value.integer) <= 0) -gfc_warning (0, "INTEGER expression of %s clause at %L must be positive", - clause, &expr->where); +gfc_error ("INTEGER expression of %s clause at %L must be positive", + clause, &expr->where); } /* Emits error when symbol is pointer, cray pointer or cray pointee @@ -3859,6 +3859,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_claus if (omp_clauses->wait_list) for (el = omp_clauses->wait_list; el; el = el->next) resolve_oacc_scalar_int_expr (el->expr, "WAIT"); + if (omp_clauses->collapse && omp_clauses->tile_list) +gfc_error ("Incompatible use of TILE and COLLAPSE at %L", &code->loc); } @@ -4964,11 +4966,11 @@ resolve_oacc_loop_blocks (gfc_code *code) if (el->expr == NULL) { /* NULL expressions are used to represent '*' arguments. - Convert those to a -1 expressions. */ + Convert those to a 0 expressions. */ el->expr = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind, &code->loc); - mpz_set_si (el->expr->value.integer, -1); + mpz_set_si (el->expr->value.integer, 0); } else { Index: fortran/trans-openmp.c === --- fortran/trans-openmp.c (revision 241809) +++ fortran/trans-openmp.c (working copy) @@ -3162,7 +3162,18 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, vec inits = vNULL; dovar_init *di; unsigned ix; + gfc_expr_list *tile = do_clauses ? do_clauses->tile_list : clauses->tile_list; + /* Both collapsed and tiled loops are lowered the same way. In + OpenACC, those clauses are not compatible, so prioritize the tile + clause, if present. */ + if (tile) +{ + collapse = 0; + for (gfc_expr_list *el = tile; el; el = el->next) + collapse++; +} + if (collapse <= 0) collapse = 1; Index: testsuite/gfortran.dg/goacc/loop-5.f95 === --- testsuite/gfortran.dg/goacc/loop-5.f95 (revision 241809) +++ testsuite/gfortran.dg/goacc/loop-5.f95 (working copy) @@ -93,9 +93,6 @@ program test DO j = 1,10 ENDDO ENDDO -!$acc loop tile(-1) ! { dg-warning "must be positive" } -do i = 1,10 -enddo !$acc loop vector tile(*) DO i = 1,10 ENDDO @@ -129,9 +126,6 @@ program test DO j = 1,10 ENDDO ENDDO -!$acc loop tile(-1) ! { dg-warning "must be positive" } -do i = 1,10 -enddo !$acc loop vector tile(*) DO i = 1,10 ENDDO @@ -242,9 +236,6 @@ program test DO j = 1,10 ENDDO ENDDO - !$acc kernels loop tile(-1) ! { dg-warning "must be positive" } - do i = 1,10 - enddo !$acc kernels loop vector tile(*) DO i = 1,10 ENDDO @@ -333,9 +324,6 @@ program test DO j = 1,10 ENDDO ENDDO - !$acc parallel loop tile(-1) ! { dg-warning "must be positive" } - do i = 1,10 - enddo !$acc parallel loop vector tile(*) DO i = 1,10 ENDDO Index: testsuite/gfortran.dg/goacc/tile-1.f90 === --- testsuite/gfortran.dg/goacc/tile-1.f90 (revision 0) +++ testsuite/gfortran.dg/goacc/tile-1.f90 (revision 0) @@ -0,0 +1,339 @@ +subroutine parloop + integer, parameter :: n = 100 + integer i, j, k, a + + !$acc parallel loop tile(10) + do i = 1, n + end do + + !$acc parallel loop tile(*) + do i = 1, n + end do + + !$acc parallel loop tile(10, *) + do i = 1, n + do j = 1, n + end do + end do + + !$acc parallel loop tile(10, *, i) ! { dg-error "" } + do i = 1, n + do j = 1, n +do k = 1, n +end do + end do + end do + + !$acc parallel loop t
Re: [PATCH] loop distribution bug fix
On Thu, Nov 10, 2016 at 6:25 AM, Jim Wilson wrote: > This fixes a bug in code adding edges to the dependence graph. Values > for this_dir can be -1 (backward edge), 0 (no edge), 1 (forward edge), > and 2 (both backward and forward edges). There can be multiple > dependencies checked, creating multiple edges that have to be merged > together. this_dir contains the current edge. dir contains the > previous edges. The code fails to handle the case where this_dir is 2, > in which case we can return immediately. This is a minor optimization > to improve compile time. The code handles the case where dir is > non-zero and this_dir is zero by returning 2, which is incorrect. dir > should be unmodified in this case. We can return 2 only if both dir > and this_dir are non-zero and unequal, i.e. one is -1 and the other is > 1. This problem creates extra unnecessary edges, which can prevent > loops from being distributed. The patch fixes both problems. > > This passed an x86_64 gcc bootstrap with -ftree-loop-distribution > added to BOOT_CFLAGS and a testsuite regression check. Curiously, I > see that I get different results for the C/C++ ubsan tests every time > I run them, but this has nothing to do with my patch, as it happens > with or without my patch. I haven't tried debugging this yet. Might > be related to my recent upgrade to Ubuntu 16.04 LTS. Otherwise, there > are no regressions. > > On SPEC CPU2006, on aarch64, I see 5879 loops distributed without the > patch, and 5906 loops distributed with the patch. So 27 extra loops > are distributed which is about 0.5% more loop distributions. There is > no measurable performance gain from the bug fix on the CPU2006 run > time though I plan to spend some more time looking at this code to see > if I can find other improvements. The biggest "lack" of loop distribution is the ability to undo CSE so for for (;;) { a[i] = a[i] + 1; b[i] = a[i]; } CSE makes us see for (;;) { tem = a[i]; tem2 = tem + 1; a[i] = tem; b[i] = tem; } and loop distribution cannot re-materialize tem3 from a[i] thus most of the time it ends up pulling redundant computations into each partition (sometimes that can reduce memory bandwith if one less stream is used but sometimes not, like in the above case). Then of course the cost model is purely modeled for STREAM (reduce the number of memory streams). So loop distribution is expected to pessimize code for the CSE case in case you are not memory bound and improve things if you are memory bound. > OK? Ok. Thanks for the improvement! Richard. > Jim
Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)
On 11/09/2016 02:47 PM, Martin Liška wrote: > On 11/09/2016 02:29 PM, Jakub Jelinek wrote: >> On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote: >>> As shown in the attached test-case, the assert cannot always be true. >>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >>> >>> Ready to be installed? >>> Martin >> >>> >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001 >>> From: marxin >>> Date: Wed, 9 Nov 2016 11:52:00 +0100 >>> Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR >>> sanitizer/78270) >>> >>> gcc/ChangeLog: >>> >>> 2016-11-09 Martin Liska >>> >>> PR sanitizer/78270 >>> * gimplify.c (gimplify_switch_expr): >> >> No description on what you've changed. >> >> That said, I'm not 100% sure it is the right fix. >> As the testcase shows, for switch without GIMPLE_BIND wrapping the body >> we can have variables that are in scope from the switch onwards. >> I bet we could also have variables that go out of scope, say if in the >> compound literal's initializer there is ({ ... }) that declares variables. >> I doubt you can have a valid case/default label in those though, so >> perhaps it would be simpler not to create live_switch_vars at all >> if SWITCH_BODY is not a BIND_EXPR? > > I like the approach you introduced! I'll re-trigger regression tests and > send a newer version of patch. > > Martin Sending the patch. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin > >> >> Jakub >> > >From fb4b852a17656309e6acfb8da97cf9bce4b3b176 Mon Sep 17 00:00:00 2001 From: marxin Date: Wed, 9 Nov 2016 11:52:00 +0100 Subject: [PATCH] Create live_switch_vars conditionally (PR sanitizer/78270) gcc/testsuite/ChangeLog: 2016-11-09 Martin Liska * gcc.dg/asan/pr78269.c: New test. gcc/ChangeLog: 2016-11-09 Martin Liska * gimplify.c (gimplify_switch_expr): Create live_switch_vars only when SWITCH_BODY is a BIND_EXPR. --- gcc/gimplify.c | 20 +++- gcc/testsuite/gcc.dg/asan/pr78269.c | 13 + 2 files changed, 28 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/asan/pr78269.c diff --git a/gcc/gimplify.c b/gcc/gimplify.c index d392450..da60c05 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -2241,7 +2241,7 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p) { vec labels; vec saved_labels; - hash_set *saved_live_switch_vars; + hash_set *saved_live_switch_vars = NULL; tree default_case = NULL_TREE; gswitch *switch_stmt; @@ -2253,8 +2253,14 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p) labels. Save all the things from the switch body to append after. */ saved_labels = gimplify_ctxp->case_labels; gimplify_ctxp->case_labels.create (8); - saved_live_switch_vars = gimplify_ctxp->live_switch_vars; - gimplify_ctxp->live_switch_vars = new hash_set (4); + + /* Do not create live_switch_vars if SWITCH_BODY is not a BIND_EXPR. */ + if (TREE_CODE (SWITCH_BODY (switch_expr)) == BIND_EXPR) + { + saved_live_switch_vars = gimplify_ctxp->live_switch_vars; + gimplify_ctxp->live_switch_vars = new hash_set (4); + } + bool old_in_switch_expr = gimplify_ctxp->in_switch_expr; gimplify_ctxp->in_switch_expr = true; @@ -2269,8 +2275,12 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p) labels = gimplify_ctxp->case_labels; gimplify_ctxp->case_labels = saved_labels; - gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0); - delete gimplify_ctxp->live_switch_vars; + + if (gimplify_ctxp->live_switch_vars) + { + gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0); + delete gimplify_ctxp->live_switch_vars; + } gimplify_ctxp->live_switch_vars = saved_live_switch_vars; preprocess_case_label_vec_for_gimple (labels, index_type, diff --git a/gcc/testsuite/gcc.dg/asan/pr78269.c b/gcc/testsuite/gcc.dg/asan/pr78269.c new file mode 100644 index 000..55840b0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/asan/pr78269.c @@ -0,0 +1,13 @@ +// { dg-do compile } +// { dg-additional-options "-Wno-switch-unreachable" } + +typedef struct +{ +} bdaddr_t; + +int a; +void fn1 () +{ + switch (a) +&(bdaddr_t){}; +} -- 2.10.1
[PATCH] [ARC] New option handling, refurbish multilib support.
Hi, Please find the revised patch which includes the refurbishing of mmpy-option option, and a new comment on DEFAULT_arc_fpu_build define. As for the last suggestion, my proposal is to have a latter patch on the topic of .cpu, synced with a related binutils patch. OK to apply? Claudiu gcc/ 2016-05-09 Claudiu Zissulescu * config/arc/arc-arch.h: New file. * config/arc/arc-arches.def: Likewise. * config/arc/arc-cpus.def: Likewise. * config/arc/arc-options.def: Likewise. * config/arc/t-multilib: Likewise. * config/arc/genmultilib.awk: Likewise. * config/arc/genoptions.awk: Likewise. * config/arc/arc-tables.opt: Likewise. * config/arc/driver-arc.c: Likewise. * common/config/arc/arc-common.c (arc_handle_option): Trace toggled options. * config.gcc (arc*-*-*): Add arc-tables.opt to arc's extra options; check for supported cpu against arc-cpus.def file. (arc*-*-elf*, arc*-*-linux-uclibc*): Use new make fragment; define TARGET_CPU_BUILD macro; add driver-arc.o as an extra object. * config/arc/arc-c.def: Add emacs local variables. * config/arc/arc-opts.h (processor_type): Use arc-cpus.def file. (FPU_FPUS, FPU_FPUD, FPU_FPUDA, FPU_FPUDA_DIV, FPU_FPUDA_FMA) (FPU_FPUDA_ALL, FPU_FPUS_DIV, FPU_FPUS_FMA, FPU_FPUS_ALL) (FPU_FPUD_DIV, FPU_FPUD_FMA, FPU_FPUD_ALL): New defines. (DEFAULT_arc_fpu_build): Define. (DEFAULT_arc_mpy_option): Define. * config/arc/arc-protos.h (arc_init): Delete. * config/arc/arc.c (arc_cpu_name): New variable. (arc_selected_cpu, arc_selected_arch, arc_arcem, arc_archs) (arc_arc700, arc_arc600, arc_arc601): New variable. (arc_init): Add static; remove selection of default tune value, cleanup obsolete error messages. (arc_override_options): Make use of .def files for selecting the right cpu and option configurations. * config/arc/arc.h (stdbool.h): Include. (TARGET_CPU_DEFAULT): Define. (CPP_SPEC): Remove mcpu=NPS400 handling. (arc_cpu_to_as): Declare. (EXTRA_SPEC_FUNCTIONS): Define. (OPTION_DEFAULT_SPECS): Likewise. (ASM_DEFAULT): Remove. (ASM_SPEC): Use arc_cpu_to_as. (DRIVER_SELF_SPECS): Remove deprecated options. (arc_base_cpu): Declare. (TARGET_ARC600, TARGET_ARC601, TARGET_ARC700, TARGET_EM) (TARGET_HS, TARGET_V2, TARGET_ARC600): Make them use arc_base_cpu variable. (MULTILIB_DEFAULTS): Use ARC_MULTILIB_CPU_DEFAULT. * config/arc/arc.md (attr_cpu): Remove. * config/arc/arc.opt (mno-mpy): Deprecate. (mcpu=ARC600, mcpu=ARC601, mcpu=ARC700, mcpu=NPS400, mcpu=ARCEM) (mcpu=ARCHS): Remove. (mcrc, mdsp-packa, mdvbf, mmac-d16, mmac-24, mtelephony, mrtsc): Deprecate. (mbarrel_shifte, mspfp_, mdpfp_, mdsp_pack, mmac_): Remove. (arc_fpu): Use new defines. (mpy-option): Change to use numeric or string like inputs. * config/arc/t-arc (driver-arc.o): New target. (arc-cpus, t-multilib, arc-tables.opt): Likewise. * config/arc/t-arc-newlib: Delete. * config/arc/t-arc-uClibc: Renamed to t-uClibc. * doc/invoke.texi (ARC): Update arc options. --- gcc/common/config/arc/arc-common.c | 69 - gcc/config.gcc | 47 + gcc/config/arc/arc-arch.h | 120 ++ gcc/config/arc/arc-arches.def | 56 ++ gcc/config/arc/arc-c.def | 4 + gcc/config/arc/arc-cpus.def| 75 ++ gcc/config/arc/arc-options.def | 109 gcc/config/arc/arc-opts.h | 49 +++-- gcc/config/arc/arc-protos.h| 1 - gcc/config/arc/arc-tables.opt | 90 gcc/config/arc/arc.c | 176 +--- gcc/config/arc/arc.h | 89 gcc/config/arc/arc.md | 5 - gcc/config/arc/arc.opt | 169 +++--- gcc/config/arc/driver-arc.c| 78 ++ gcc/config/arc/genmultilib.awk | 203 + gcc/config/arc/genoptions.awk | 86 gcc/config/arc/t-arc | 19 gcc/config/arc/t-arc-newlib| 46 - gcc/config/arc/t-arc-uClibc| 20 gcc/config/arc/t-multilib | 34 +++ gcc/config/arc/t-uClibc| 20 gcc/doc/invoke.texi| 90 +--- 23 files changed, 1277 insertions(+), 378 deletions(-) create mode 100644 gcc/config/arc/arc-arch.h create mode 100644 gcc/config/arc/arc-arches.def create mode 100644 gcc/config/arc/arc-cpus.def create mode 100644 gcc/config/arc/arc-options.def create mode 100644 gcc/config/arc/arc-tables.opt create mode 100644 gcc/config/arc/driver-a
Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes
On 11/09/2016 10:02 AM, Richard Biener wrote: Using scan-assembler-times on the dwarf? I always have a bad feeling about this kind of check as I imagine it can break very easily with legit changes. But I have nothing better to contribute, so I’ve added one such testcase. ;-) Ok to commit? Ok. This is committed as r242035. Thanks! -- Pierre-Marie de Rodat
Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)
On Thu, Nov 10, 2016 at 12:02:45PM +0100, Martin Liška wrote: > >From fb4b852a17656309e6acfb8da97cf9bce4b3b176 Mon Sep 17 00:00:00 2001 > From: marxin > Date: Wed, 9 Nov 2016 11:52:00 +0100 > Subject: [PATCH] Create live_switch_vars conditionally (PR sanitizer/78270) > > gcc/testsuite/ChangeLog: > > 2016-11-09 Martin Liska > Missing PR sanitizer/78270 note. > * gcc.dg/asan/pr78269.c: New test. Misnamed test, should be pr78270.c. > > gcc/ChangeLog: > > 2016-11-09 Martin Liska > Also missing the PR line. > * gimplify.c (gimplify_switch_expr): Create live_switch_vars > only when SWITCH_BODY is a BIND_EXPR. Ok with those nits fixed. Jakub
Patch ping: Fix dwarf2out related bootstrap failure on Solaris (PR debug/78191)
On Thu, Nov 03, 2016 at 05:42:51PM +0100, Jakub Jelinek wrote: > Bootstrapped/regtested on x86_64-linux and i686-linux and Rainer has kindly > tested it on Solaris, ok for trunk? > > 2016-11-03 Jakub Jelinek > > * dwarf2out.c (size_of_discr_list): Fix typo in function comment. > > PR debug/78191 > * dwarf2out.c (abbrev_opt_base_type_end): New variable. > (die_abbrev_cmp): Sort dies with die_abbrev smaller than > abbrev_opt_base_type_end only by increasing die_abbrev, before > any other dies. > (optimize_abbrev_table): Don't change abbrev numbers of > base types and CU or optimize implicit consts in them if > calc_base_type_die_sizes has been called during build_abbrev_table. > (calc_base_type_die_sizes): If abbrev_opt_start, set > abbrev_opt_base_type_end to one plus largest base type's > die_abbrev. I'd like to ping this patch, it fixes a bootstrap failure on Solaris (both sparc and x86), so I think it is high priority. Thanks. Jakub
Re: [PATCH] libiberty: Add Rust symbol demangling.
On Thu, 2016-11-03 at 18:39 +0100, Mark Wielaard wrote: > Adds Rust symbol demangler. Rust mangles symbols using GNU_V3 style, > adding a hash and various special character subtitutions. This adds > a new rust style to cplus_demangle and adds 3 helper functions > rust_demangle, rust_demangle_sym and rust_is_mangled. > > rust-demangle.c was written by David. Mark did the code formatting to > GNU style and integration into the gcc/libiberty build system and > testsuite. > > The original code was written for the perf tool. David agreed with > submitting it for gcc/libiberty so binutils, gdb, valgrind, etc. > can also easily reuse it. He has sent request-assign to ass...@gnu.org. > I already have write after approval for gcc. Ping. Any comments are welcome. A variant of this is being used already in perf and valgrind. It would be nice to make it part of upstream libiberty. I did also apply this patch to binutils-gdb/libiberty and with a oneliner patch (*) it allows c++filt to handle rust mangled symbols and accept --format=rust (and displays it with --help). I did not yet test gdb, which would need a similar small tweak to its current language settings. Any feedback on whether it also works as expected with gdb appreciated. Thanks, Mark (*) diff --git a/binutils/cxxfilt.c b/binutils/cxxfilt.c index d5863ee..7cf9458 100644 --- a/binutils/cxxfilt.c +++ b/binutils/cxxfilt.c @@ -242,6 +242,7 @@ main (int argc, char **argv) case gnu_v3_demangling: case dlang_demangling: case auto_demangling: +case rust_demangling: valid_symbols = standard_symbol_characters (); break; case hp_demangling:
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > Here is updated 3 patch. > > I checked that all new tests related to epilogue vectorization passed with it. > > Your comments will be appreciated. A lot better now. Instead of the ->aux dance I now prefer to pass the original loops loop_vinfo to vect_analyze_loop as optional argument (if non-NULL we analyze the epilogue of that loop_vinfo). OTOH I remember we mainly use it to get at the original vectorization factor? So we can pass down an (optional) forced vectorization factor as well? Richard. > 2016-11-08 15:38 GMT+03:00 Richard Biener : > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > > > >> Hi Richard, > >> > >> I did not understand your last remark: > >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): > >> > > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > >> > && dump_enabled_p ()) > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > >> >"loop vectorized\n"); > >> > - vect_transform_loop (loop_vinfo); > >> > + new_loop = vect_transform_loop (loop_vinfo); > >> > num_vectorized_loops++; > >> >/* Now that the loop has been vectorized, allow it to be unrolled > >> > etc. */ > >> > loop->force_vectorize = false; > >> > > >> > + /* Add new loop to a processing queue. To make it easier > >> > + to match loop and its epilogue vectorization in dumps > >> > + put new loop as the next loop to process. */ > >> > + if (new_loop) > >> > + { > >> > + loops.safe_insert (i + 1, new_loop->num); > >> > + vect_loops_num = number_of_loops (cfun); > >> > + } > >> > > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > >> f> unction which will set up stuff properly (and also perform > >> > the if-conversion of the epilogue there). > >> > > >> > That said, if we can get in non-masked epilogue vectorization > >> > separately that would be great. > >> > >> Could you please clarify your proposal. > > > > When a loop was vectorized set things up to immediately vectorize > > its epilogue, avoiding changing the loop iteration and avoiding > > the re-use of ->aux. > > > > Richard. > > > >> Thanks. > >> Yuri. > >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > >> > > >> >> Hi All, > >> >> > >> >> I re-send all patches sent by Ilya earlier for review which support > >> >> vectorization of loop epilogues and loops with low trip count. We > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not > >> >> approved by Jeff. > >> >> > >> >> I did re-base of all patches and performed bootstrapping and > >> >> regression testing that did not show any new failures. Also all > >> >> changes related to new vect_do_peeling algorithm have been changed > >> >> accordingly. > >> >> > >> >> Is it OK for trunk? > >> > > >> > I would have prefered that the series up to -03-nomask-tails would > >> > _only_ contain epilogue loop vectorization changes but unfortunately > >> > the patchset is oddly separated. > >> > > >> > I have a comment on that part nevertheless: > >> > > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > >> > loop_vinfo) > >> >/* Check if we can possibly peel the loop. */ > >> >if (!vect_can_advance_ivs_p (loop_vinfo) > >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) > >> > - || loop->inner) > >> > + || loop->inner > >> > + /* Required peeling was performed in prologue and > >> > +is not required for epilogue. */ > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > >> > do_peeling = false; > >> > > >> >if (do_peeling > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > >> > loop_vinfo) > >> > > >> >do_versioning = > >> > optimize_loop_nest_for_speed_p (loop) > >> > - && (!loop->inner); /* FORNOW */ > >> > + && (!loop->inner) /* FORNOW */ > >> > +/* Required versioning was performed for the > >> > + original loop and is not required for epilogue. */ > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); > >> > > >> >if (do_versioning) > >> > { > >> > > >> > please do that check in the single caller of this function. > >> > > >> > Otherwise I still dislike the new ->aux use and I believe that simply > >> > passing down info from the processed parent would be _much_ cleaner. > >> > That is, here (and avoid the FOR_EACH_LOOP change): > >> > > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > >> > && dump_enabled_p ()) > >> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > >> > "loop vectorized\n"); > >> > - vect_transform_loop (loop_vinfo); > >> > + new_loop = vect_transform_loop (loop_vinfo); > >> > num_vectorized_loops++; > >> > /*
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Thu, 10 Nov 2016, Richard Biener wrote: > On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: > > > Richard, > > > > Here is updated 3 patch. > > > > I checked that all new tests related to epilogue vectorization passed with > > it. > > > > Your comments will be appreciated. > > A lot better now. Instead of the ->aux dance I now prefer to > pass the original loops loop_vinfo to vect_analyze_loop as > optional argument (if non-NULL we analyze the epilogue of that > loop_vinfo). OTOH I remember we mainly use it to get at the > original vectorization factor? So we can pass down an (optional) > forced vectorization factor as well? Btw, I wonder if you can produce a single patch containing just epilogue vectorization, that is combine patches 1-3 but rip out changes only needed by later patches? Thanks, Richard. > Richard. > > > 2016-11-08 15:38 GMT+03:00 Richard Biener : > > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > > > > > >> Hi Richard, > > >> > > >> I did not understand your last remark: > > >> > > >> > That is, here (and avoid the FOR_EACH_LOOP change): > > >> > > > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > > >> > && dump_enabled_p ()) > > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > > >> >"loop vectorized\n"); > > >> > - vect_transform_loop (loop_vinfo); > > >> > + new_loop = vect_transform_loop (loop_vinfo); > > >> > num_vectorized_loops++; > > >> >/* Now that the loop has been vectorized, allow it to be > > >> > unrolled > > >> > etc. */ > > >> > loop->force_vectorize = false; > > >> > > > >> > + /* Add new loop to a processing queue. To make it easier > > >> > + to match loop and its epilogue vectorization in dumps > > >> > + put new loop as the next loop to process. */ > > >> > + if (new_loop) > > >> > + { > > >> > + loops.safe_insert (i + 1, new_loop->num); > > >> > + vect_loops_num = number_of_loops (cfun); > > >> > + } > > >> > > > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > > >> f> unction which will set up stuff properly (and also perform > > >> > the if-conversion of the epilogue there). > > >> > > > >> > That said, if we can get in non-masked epilogue vectorization > > >> > separately that would be great. > > >> > > >> Could you please clarify your proposal. > > > > > > When a loop was vectorized set things up to immediately vectorize > > > its epilogue, avoiding changing the loop iteration and avoiding > > > the re-use of ->aux. > > > > > > Richard. > > > > > >> Thanks. > > >> Yuri. > > >> > > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : > > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > > >> > > > >> >> Hi All, > > >> >> > > >> >> I re-send all patches sent by Ilya earlier for review which support > > >> >> vectorization of loop epilogues and loops with low trip count. We > > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not > > >> >> approved by Jeff. > > >> >> > > >> >> I did re-base of all patches and performed bootstrapping and > > >> >> regression testing that did not show any new failures. Also all > > >> >> changes related to new vect_do_peeling algorithm have been changed > > >> >> accordingly. > > >> >> > > >> >> Is it OK for trunk? > > >> > > > >> > I would have prefered that the series up to -03-nomask-tails would > > >> > _only_ contain epilogue loop vectorization changes but unfortunately > > >> > the patchset is oddly separated. > > >> > > > >> > I have a comment on that part nevertheless: > > >> > > > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > > >> > loop_vinfo) > > >> >/* Check if we can possibly peel the loop. */ > > >> >if (!vect_can_advance_ivs_p (loop_vinfo) > > >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) > > >> > - || loop->inner) > > >> > + || loop->inner > > >> > + /* Required peeling was performed in prologue and > > >> > +is not required for epilogue. */ > > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > > >> > do_peeling = false; > > >> > > > >> >if (do_peeling > > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > > >> > loop_vinfo) > > >> > > > >> >do_versioning = > > >> > optimize_loop_nest_for_speed_p (loop) > > >> > - && (!loop->inner); /* FORNOW */ > > >> > + && (!loop->inner) /* FORNOW */ > > >> > +/* Required versioning was performed for the > > >> > + original loop and is not required for epilogue. */ > > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); > > >> > > > >> >if (do_versioning) > > >> > { > > >> > > > >> > please do that check in the single caller of this function. > > >> > > > >> > Otherwise I still dislike the new ->aux use and I believe that simply > > >> > passing down info from the processed parent
Re: [PATCH] DWARF: make signedness explicit for enumerator const values
On Thu, 2016-10-13 at 18:12 +0200, Pierre-Marie de Rodat wrote: > Currently, the DWARF description does not specify the signedness of the > representation of enumeration types. This is a problem in some > contexts where DWARF consumers need to determine if value X is greater > than value Y. > > For instance in Ada: > > type Enum_Type is ( A, B, C, D); > for Enum_Type use (-1, 0, 1, 2); > > type Rec_Type (E : Enum_Type) is record >when A .. B => null; >when others => B : Booleann; > end record; > > The above can be described in DWARF the following way: > > DW_TAG_enumeration_type(Enum_Type) > | DW_AT_byte_size: 1 > DW_TAG_enumerator(A) > | DW_AT_const_value: -1 > DW_TAG_enumerator(B) > | DW_AT_const_value: 0 > DW_TAG_enumerator(C) > | DW_AT_const_value: 1 > DW_TAG_enumerator(D) > | DW_AT_const_value: 2 > > DW_TAG_structure_type(Rec_Type) > DW_TAG_member(E) > | DW_AT_type: > DW_TAG_variant_part > | DW_AT_discr: > DW_TAG_variant > | DW_AT_discr_list: DW_DSC_range 0x7f 0 > DW_TAG_variant > | DW_TAG_member(b) > > DWARF consumers need to know that enumerators (A, B, C and D) are signed > in order to determine the set of E values for which Rec_Type has a B > field. In practice, they need to know how to interpret the 0x7f LEB128 > number above (-1, not 127). > > There seems to be only two alternatives to solve this issue: one is to > add a DW_AT_type attribute to DW_TAG_enumerator_type DIEs to make it > point to a base type that specifies the signedness. The other is to > make sure the form of the DW_AT_const_value attribute carries the > signedness information. This patch implements the latter. IMHO having an explicit DW_AT_type pointing at the base type with size and encoding for the DW_TAG_enumerator_type is better for consumers than having to try and interpret the DW_FORM used to encode the values. Alternatively could we just attach a DW_AT_encoding to the DW_TAG_enumeration_type? The spec doesn't list it as one of the attributes for an enumeration_type, but it makes sense given it already carries bit/byte size attributes. Thanks, Mark
Re: RFA: PATCH to gengtype to avoid putting tree_node support in front end objects
On Thu, Oct 27, 2016 at 09:36:09AM -0400, Jason Merrill wrote: > Currently, the way gengtype works it scans the list of source files > with front end files at the end, and pushes data structures onto a > stack. It then processes the stack in LIFO order, so that data > structures from front ends are handled first. As a result, if a GTY > data structure in a front end depends on tree_node, gengtype happily > puts gt_ggc_mx(tree_node*&) in a front end file, leading to link > errors on all other front ends. > > This patch avoids this problem by appending to the list of data > structures so that they are processed in FIFO order, and so tree_node > gets handled in gtype-desc.o. > > Tested x86_64-pc-linux-gnu, OK for trunk? > commit 487a1c95c0d3169b2041942ff4f8d71c9ff689eb > Author: Jason Merrill > Date: Wed Oct 26 23:12:23 2016 -0400 > > * gengtype.c (new_structure): Append to structures list. > > (find_structure): Likewise. Please remove the blank line in the ChangeLog. When looking at the differences it creates, it is hard, because all the generated files have all the functions emitted in reverse order now from what it used to be, so I only looked at files where the size changed, and that is beyond gtype.state only in my case gt-tree-phinodes.h which lost void gt_ggc_mx (struct gimple *& x) { if (x) gt_ggc_mx_gimple ((void *) x); } and void gt_pch_nx (struct gimple *& x) { if (x) gt_pch_nx_gimple ((void *) x); } and gtype-desc.c which didn't contain those but now it does (for gtype-desc.c it is hard to find out due to the reordering what else has changed, but as gt-tree-phinodes.h shrunk by 170 characters and gtype-desc.c grew by 170 characters, I'd think it is all that changed). I believe those routines belong to gtype-desc.c, that is where similar ones for tree_node, etc. are, tree-phinodes.h certainly isn't the header that defines gimple. So I think this patch is ok for trunk. Thanks. Jakub
[PATCH] Fix print_node for CONSTRUCTORs
Hello. Following patch fixes indentation of print_node when printing a constructor that has some equal elements. Current implementation caches tree to prevent deep debug outputs. Such behavior is undesired for ctor elements. Apart from that, I switch to hash_set for a table that is used for tree node caching. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Before: constant 1> val constant 120> idx constant 2> val idx constant 3> val idx constant 4> val idx constant 5> val > After: constant 1> val constant 120> idx constant 2> val constant 120> idx constant 3> val constant 120> idx constant 4> val constant 120> idx constant 5> val constant 120>> Ready to be installed? Martin >From 6d18ff00ec1d8e6a8a154fbb70af25b2dda8165e Mon Sep 17 00:00:00 2001 From: marxin Date: Wed, 9 Nov 2016 16:28:52 +0100 Subject: [PATCH] Fix print_node for CONSTRUCTORs gcc/ChangeLog: 2016-11-10 Martin Liska * print-tree.c (struct bucket): Remove. (print_node): Add new argument which drives whether a tree node is printed briefly or not. (debug_tree): Replace a custom hash table with hash_set. * print-tree.h (print_node): Add the argument. --- gcc/print-tree.c | 43 +++ gcc/print-tree.h | 3 ++- 2 files changed, 17 insertions(+), 29 deletions(-) diff --git a/gcc/print-tree.c b/gcc/print-tree.c index 8c63cb8..f3ee04c 100644 --- a/gcc/print-tree.c +++ b/gcc/print-tree.c @@ -33,19 +33,14 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" /* FIXME */ #include "tree-cfg.h" #include "tree-dump.h" +#include "print-tree.h" /* Define the hash table of nodes already seen. Such nodes are not repeated; brief cross-references are used. */ #define HASH_SIZE 37 -struct bucket -{ - tree node; - struct bucket *next; -}; - -static struct bucket **table; +static hash_set *table = NULL; /* Print PREFIX and ADDR to FILE. */ void @@ -176,10 +171,9 @@ indent_to (FILE *file, int column) starting in column INDENT. */ void -print_node (FILE *file, const char *prefix, tree node, int indent) +print_node (FILE *file, const char *prefix, tree node, int indent, + bool brief_for_visited) { - int hash; - struct bucket *b; machine_mode mode; enum tree_code_class tclass; int len; @@ -219,21 +213,14 @@ print_node (FILE *file, const char *prefix, tree node, int indent) /* Allow this function to be called if the table is not there. */ if (table) { - hash = ((uintptr_t) node) % HASH_SIZE; - /* If node is in the table, just mention its address. */ - for (b = table[hash]; b; b = b->next) - if (b->node == node) - { - print_node_brief (file, prefix, node, indent); - return; - } + if (table->contains (node) && brief_for_visited) + { + print_node_brief (file, prefix, node, indent); + return; + } - /* Add this node to the table. */ - b = XNEW (struct bucket); - b->node = node; - b->next = table[hash]; - table[hash] = b; + table->add (node); } /* Indent to the specified column, since this is the long form. */ @@ -846,8 +833,8 @@ print_node (FILE *file, const char *prefix, tree node, int indent) FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (node), cnt, index, value) { - print_node (file, "idx", index, indent + 4); - print_node (file, "val", value, indent + 4); + print_node (file, "idx", index, indent + 4, false); + print_node (file, "val", value, indent + 4, false); } } break; @@ -997,10 +984,10 @@ print_node (FILE *file, const char *prefix, tree node, int indent) DEBUG_FUNCTION void debug_tree (tree node) { - table = XCNEWVEC (struct bucket *, HASH_SIZE); + table = new hash_set (HASH_SIZE); print_node (stderr, "", node, 0); - free (table); - table = 0; + delete table; + table = NULL; putc ('\n', stderr); } diff --git a/gcc/print-tree.h b/gcc/print-tree.h index 124deab..fd610f9 100644 --- a/gcc/print-tree.h +++ b/gcc/print-tree.h @@ -38,7 +38,8 @@ extern void debug_raw (vec &ref); extern void debug_raw (vec *ptr); #ifdef BUFSIZ extern void dump_addr (FILE*, const char *, const void *); -extern void print_node (FILE *, const char *, tree, int); +extern void print_node (FILE *, const char *, tree, int, + bool brief_for_visited = true); extern void print_node_brief (FILE *, const char *, const_tree, int); extern void indent_to (FILE *, int); #endif -- 2.10.1
Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)
On Tue, Nov 8, 2016 at 5:03 AM, Martin Sebor wrote: > It's taken me longer than I expected to finally get back to this > project. Sorry about the delay. > > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01110.html > > Attached is an updated patch with this enhancement and reflecting > you previous comment. > > Besides running the GCC test suite I tested the patch by building > Binutils and the Linux kernel. It found one stpcpy-related overflow > in Binutils that I'm looking into and reduced by one the number of > problems reported by the -Wformat-length option in the kernel (I > haven't yet checked which one it eliminated). > > Although I'm not done investigating the Binutils problem I'm posting > the patch for review now to allow for comments before stage 1 ends. @@ -158,14 +170,149 @@ compute_object_offset (const_tree expr, const_tree var) return size_binop (code, base, off); } +static bool +operand_unsigned_p (tree op) +{ + if (TREE_CODE (op) == SSA_NAME) new functions need a comment. But maybe you want to use tree_expr_nonnegative_p to also allow signed but known positive ones? +/* Fill the 2-element OFFRANGE array with the range of values OFF + is known to be in. Postcodition: OFFRANGE[0] <= OFFRANGE[1]. */ + +static bool +get_offset_range (tree off, HOST_WIDE_INT offrange[2]) +{ + STRIP_NOPS (off); why strip nops (even sign changes!) here? Why below convert things via to_uhwi when offrange is of type signed HOST_WIDE_INT[2]? + gimple *def = SSA_NAME_DEF_STMT (off); + if (is_gimple_assign (def)) + { + tree_code code = gimple_assign_rhs_code (def); + if (code == PLUS_EXPR) + { + /* Handle offset in the form VAR + CST where VAR's type +is unsigned so the offset must be the greater of +OFFRANGE[0] and CST. This assumes the PLUS_EXPR +is in a canonical form with CST second. */ + tree rhs2 = gimple_assign_rhs2 (def); err, what? What about overflow? Aren't you just trying to decompose 'off' into a variable and a constant part here and somehow extracting a range for the variable part? So why not just do that? + else if (range_type == VR_ANTI_RANGE) + { + offrange[0] = max.to_uhwi () + 1; + offrange[1] = min.to_uhwi () - 1; + return true; + } first of all, how do you know it fits uhwi? Second, from ~[5, 9] you get [10, 4] !? That looks bogus (and contrary to the function comment postcondition) + else if (range_type == VR_VARYING) + { + gimple *def = SSA_NAME_DEF_STMT (off); + if (is_gimple_assign (def)) + { + tree_code code = gimple_assign_rhs_code (def); + if (code == NOP_EXPR) + { please trust range-info instead of doing your own little VRP here. VR_VARYING -> return false. stopping review here noting that other parts of the compiler split constant parts from variable parts and it looks like this is what you want to do here? That is, enhance static vec object_sizes[4]; and cache a SSA-NAME, constant offset pair in addition? Or just a range (range of that SSA name + offset), if that's good enough -- wait, that's what you get from get_range_info! So I'm not sure where the whole complication in the patch comes from... Possibly from the fact that VRP on pointers is limited and thus &a[i] + 5 doesn't get you a range for i + 5? Richard. > Martin > > PS The tests added in the patch (but nothing else) depend on > the changes in the patch for c/53562: > > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00483.html >
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
PING^2 On 10/31/2016 10:13 AM, Martin Liška wrote: > PING^1 > > On 10/13/2016 05:34 PM, Martin Liška wrote: >> Hello. >> >> As it's very hard to guess from GCC driver whether a target supports atomic >> updates >> for GCOV counter or not, I decided to come up with a new option value >> (maybe-atomic), >> that would be transformed in a corresponding value (single or atomic) in >> tree-profile.c. >> The GCC driver selects the option when -pthread is present in the command >> line. >> >> That should fix all tests failures seen on AIX target. >> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> >> Ready to be installed? >> Martin >>
Re: [ipa-vrp] ice in set_value_range
Kugan Is there a PR for this failure? It broke bootstrap on AIX as well and I only was able to track it to your patch last night. Thanks, David
[PATCH] Fix PR71762
The following fixes PR71762 via reverting the transforms of ~X & Y to X < Y and similar because when the bools they apply to are expanded to RTL undefined values are not reliably zero-extended and thus the transform is invalid. Ensuring the zero-extension is too costly IMHO and the proper fix is to move the transform to RTL where we can check known-zero-bits to validate validity or to fix GIMPLE not not have operations on types not matching their mode in precision. Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk and branches? Any takers for the RTL implementation? Thanks, Richard. 2016-11-10 Richard Biener PR middle-end/71762 * match.pd ((~X & Y) -> X < Y, (X & ~Y) -> Y < X, (~X | Y) -> X <= Y, (X | ~Y) -> Y <= X): Remove. * gcc.dg/torture/pr71762-1.c: New testcase. * gcc.dg/torture/pr71762-2.c: Likewise. * gcc.dg/torture/pr71762-3.c: Likewise. * gcc.dg/tree-ssa/forwprop-28.c: XFAIL. Index: gcc/match.pd === --- gcc/match.pd(revision 242004) +++ gcc/match.pd(working copy) @@ -944,33 +944,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (op:c truth_valued_p@0 (logical_inverted_value @0)) { constant_boolean_node (op == NE_EXPR ? true : false, type); })) -/* If arg1 and arg2 are booleans (or any single bit type) - then try to simplify: - - (~X & Y) -> X < Y - (X & ~Y) -> Y < X - (~X | Y) -> X <= Y - (X | ~Y) -> Y <= X - - But only do this if our result feeds into a comparison as - this transformation is not always a win, particularly on - targets with and-not instructions. - -> simplify_bitwise_binary_boolean */ -(simplify - (ne (bit_and:c (bit_not @0) @1) integer_zerop) - (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)) - && TYPE_PRECISION (TREE_TYPE (@1)) == 1) - (if (TYPE_UNSIGNED (TREE_TYPE (@1))) -(lt @0 @1) -(gt @0 @1 -(simplify - (ne (bit_ior:c (bit_not @0) @1) integer_zerop) - (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)) - && TYPE_PRECISION (TREE_TYPE (@1)) == 1) - (if (TYPE_UNSIGNED (TREE_TYPE (@1))) -(le @0 @1) -(ge @0 @1 - /* ~~x -> x */ (simplify (bit_not (bit_not @0)) Index: gcc/testsuite/gcc.dg/torture/pr71762-1.c === --- gcc/testsuite/gcc.dg/torture/pr71762-1.c(revision 0) +++ gcc/testsuite/gcc.dg/torture/pr71762-1.c(working copy) @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-additional-options "-fdisable-rtl-init-regs" } */ + +static _Bool +foo (_Bool a, _Bool b) +{ + int x = a && ! b; + return x != 0; +} + +int y = 1; +int main() +{ + _Bool x; + if (foo (x, y)) +__builtin_abort (); + return 0; +} Index: gcc/testsuite/gcc.dg/torture/pr71762-2.c === --- gcc/testsuite/gcc.dg/torture/pr71762-2.c(revision 0) +++ gcc/testsuite/gcc.dg/torture/pr71762-2.c(working copy) @@ -0,0 +1,17 @@ +/* { dg-do run } */ + +static _Bool +foo (_Bool a, _Bool b) +{ + int x = a && ! b; + return x != 0; +} + +int y = 1; +int main() +{ + _Bool x[32]; + if (foo (x[1], y)) +__builtin_abort (); + return 0; +} Index: gcc/testsuite/gcc.dg/torture/pr71762-3.c === --- gcc/testsuite/gcc.dg/torture/pr71762-3.c(revision 0) +++ gcc/testsuite/gcc.dg/torture/pr71762-3.c(working copy) @@ -0,0 +1,22 @@ +/* { dg-do run } */ + +static _Bool +foo (_Bool a, _Bool b) +{ + int x = a && ! b; + return x != 0; +} + +int y = 1; +int main() +{ + register _Bool x + /* Add register spec for the argv parameter to main. */ +#if __i386__ || __x86_64__ + __asm__("%esi") +#endif +; + if (foo (x, y)) +__builtin_abort (); + return 0; +} Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c === --- gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c (revision 242032) +++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c (working copy) @@ -83,6 +83,8 @@ test_8 (int code) to a ordered compare. But the transform does not trigger if we transform the negated code == 22 compare to code != 22 first. It turns out if we do that we even generate better code on x86 at least. */ +/* ??? As PR71762 notices this transform causes wrong-code issues in RTL + with one uninitialized operand, thus it has been disabled. */ -/* { dg-final { scan-tree-dump-times "simplified to if \\\(\[^ ]* \[<>\]" 4 "forwprop1"} } */ +/* { dg-final { scan-tree-dump-times "simplified to if \\\(\[^ ]* \[<>\]" 4 "forwprop1" { xfail *-*-* } } } */
Re: [Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]
Hi Andre, > well, is it really that obvious? well ... what can I say. If you wanna be strict about it, I guess there is no such thing as an "obvious patch". There is basically always something that you can miss, or that can be improved. Mikael's patch was obvious to me in the sense that it is clear and simple and fixes the ICE without any side effects. Thus it's a clear improvement, and it has been rotting in bugzilla for over five years. Are you implying that it was premature to commit it as 'obvious'? > It fixes the ICE, correct. But what about > cases where the actual has an explicit interface, but is not a variable? Like > in: > [...] > Can you comment > having thought about this somewhat? In fact I have not thought about any further cases. Since you're not giving full examples, I can only guess what you mean: The cases in the attachment are working as expected. Anything else? Cheers, Janus > On Wed, 9 Nov 2016 21:42:15 +0100 > Janus Weil wrote: > >> Hi all, >> >> I have committed yet another obvious ice-on-invalid fix: >> >> https://gcc.gnu.org/viewcvs?rev=242020&root=gcc&view=rev >> >> Cheers, >> Janus > > > -- > Andre Vehreschild * Email: vehre ad gmx dot de implicit none call sub(f()) contains function f() integer :: f end function ! sub() defined as in the pr. subroutine sub (j) integer, volatile :: j end subroutine end type t integer :: i end type type(t) :: foo call sub(foo%i) contains ! sub() defined as in the pr. subroutine sub (j) integer, volatile :: j end subroutine end
Re: [ARM] PR 78253 do not resolve weak ref locally
On 10 November 2016 at 11:05, Richard Earnshaw wrote: > On 09/11/16 21:29, Christophe Lyon wrote: >> Hi, >> >> PR 78253 shows that the handling of weak references has changed for >> ARM with gcc-5. >> >> When r220674 was committed, default_binds_local_p_2 gained a new >> parameter (weak_dominate), which, when true, implies that a reference >> to a weak symbol defined locally will be resolved locally, even though >> it could be overridden by a strong definition in another object file. >> >> With r220674, default_binds_local_p forces weak_dominate=true, >> effectively changing the previous behavior. >> >> The attached patch introduces default_binds_local_p_4 which is a copy >> of default_binds_local_p_2, but using weak_dominate=false, and updates >> the ARM target to call default_binds_local_p_4 instead of >> default_binds_local_p_2. >> >> I ran cross-tests on various arm* configurations with no regression, >> and checked that the test attached to the original bugzilla now works >> as expected. >> >> I am not sure why weak_dominate defaults to true, and I couldn't >> really understand why by reading the threads related to r220674 and >> following updates to default_binds_local_p_* which all deal with other >> corner cases and do not discuss the weak_dominate parameter. >> >> Or should this patch be made more generic? >> > > I certainly don't think it should be ARM specific. That was my feeling too. > > The questions I have are: > > 1) What do other targets do today. Are they the same, or different? arm, aarch64, s390 use default_binds_local_p_2 since PR 65780, and default_binds_local_p before that. Both have weak_dominate=true i386 has its own version, calling default_binds_local_p_3 with true for weak_dominate But the behaviour of default_binds_local_p changed with r220674 as I said above. See https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=220674 and notice how weak_dominate was introduced The original bug report is about a different case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32219 The original patch submission is https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00410.html and the 1st version with weak_dominate is in: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00469.html but it's not clear to me why this was introduced > 2) If different why? on aarch64, although binds_local_p returns true, the relocations used when building the function pointer is still the same (still via the GOT). aarch64 has different logic than arm when accessing a symbol (eg aarch64_classify_symbol) > 3) Is the current behaviour really what was intended by the patch? ie. > Was the old behaviour actually wrong? > That's what I was wondering. Before r220674, calling a weak function directly or via a function pointer had the same effect (in other words, the function pointer points to the actual implementation: the strong one if any, the weak one otherwise). After r220674, on arm the function pointer points to the weak definition, which seems wrong to me, it should leave the actual resolution to the linker. > R. >> Thanks, >> >> Christophe >> >
Re: [PATCH] Fix PR78189
On 10 November 2016 at 09:34, Richard Biener wrote: > On Wed, 9 Nov 2016, Christophe Lyon wrote: > >> On 9 November 2016 at 09:36, Bin.Cheng wrote: >> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener wrote: >> >> On Mon, 7 Nov 2016, Christophe Lyon wrote: >> >> >> >>> Hi Richard, >> >>> >> >>> >> >>> On 7 November 2016 at 09:01, Richard Biener wrote: >> >>> > >> >>> > The following fixes an oversight when computing alignment in the >> >>> > vectorizer. >> >>> > >> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. >> >>> > >> >>> > Richard. >> >>> > >> >>> > 2016-11-07 Richard Biener >> >>> > >> >>> > PR tree-optimization/78189 >> >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix >> >>> > alignment computation. >> >>> > >> >>> > * g++.dg/torture/pr78189.C: New testcase. >> >>> > >> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C >> >>> > === >> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C (revision 0) >> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C (working copy) >> >>> > @@ -0,0 +1,41 @@ >> >>> > +/* { dg-do run } */ >> >>> > +/* { dg-additional-options "-ftree-slp-vectorize >> >>> > -fno-vect-cost-model" } */ >> >>> > + >> >>> > +#include >> >>> > + >> >>> > +struct A >> >>> > +{ >> >>> > + void * a; >> >>> > + void * b; >> >>> > +}; >> >>> > + >> >>> > +struct alignas(16) B >> >>> > +{ >> >>> > + void * pad; >> >>> > + void * misaligned; >> >>> > + void * pad2; >> >>> > + >> >>> > + A a; >> >>> > + >> >>> > + void Null(); >> >>> > +}; >> >>> > + >> >>> > +void B::Null() >> >>> > +{ >> >>> > + a.a = nullptr; >> >>> > + a.b = nullptr; >> >>> > +} >> >>> > + >> >>> > +void __attribute__((noinline,noclone)) >> >>> > +NullB(void * misalignedPtr) >> >>> > +{ >> >>> > + B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) >> >>> > - offsetof(B, misaligned)); >> >>> > + b->Null(); >> >>> > +} >> >>> > + >> >>> > +int main() >> >>> > +{ >> >>> > + B b; >> >>> > + NullB(&b.misaligned); >> >>> > + return 0; >> >>> > +} >> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c >> >>> > index 9346cfe..b03cb1e 100644 >> >>> > --- gcc/tree-vect-data-refs.c >> >>> > +++ gcc/tree-vect-data-refs.c >> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct >> >>> > data_reference *dr) >> >>> >base = ref; >> >>> >while (handled_component_p (base)) >> >>> > base = TREE_OPERAND (base, 0); >> >>> > + unsigned int base_alignment; >> >>> > + unsigned HOST_WIDE_INT base_bitpos; >> >>> > + get_object_alignment_1 (base, &base_alignment, &base_bitpos); >> >>> > + /* As data-ref analysis strips the MEM_REF down to its base operand >> >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >> >>> > + adjust things to make base_alignment valid as the alignment of >> >>> > + DR_BASE_ADDRESS. */ >> >>> >if (TREE_CODE (base) == MEM_REF) >> >>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr, >> >>> > - build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), >> >>> > 0)); >> >>> > - unsigned int base_alignment = get_object_alignment (base); >> >>> > +{ >> >>> > + base_bitpos -= mem_ref_offset (base).to_short_addr () * >> >>> > BITS_PER_UNIT; >> >>> > + base_bitpos &= (base_alignment - 1); >> >>> > +} >> >>> > + if (base_bitpos != 0) >> >>> > +base_alignment = base_bitpos & -base_bitpos; >> >>> > + /* Also look at the alignment of the base address DR analysis >> >>> > + computed. */ >> >>> > + unsigned int base_addr_alignment = get_pointer_alignment >> >>> > (base_addr); >> >>> > + if (base_addr_alignment > base_alignment) >> >>> > +base_alignment = base_addr_alignment; >> >>> > >> >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >> >>> > DR_VECT_AUX (dr)->base_element_aligned = true; >> >>> >> >>> Since you committed this patch (r241892), I'm seeing execution failures: >> >>> gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test >> >>> gcc.dg/vect/pr40074.c execution test >> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9 >> >>> --with-fpu=neon-fp16 >> >>> (using qemu as simulator) >> >> >> >> The difference is that we now vectorize the testcase with versioning >> >> for alignment (but it should never execute the vectorized variant). >> >> I need arm peoples help to understand what is wrong. >> > Hi All, >> > I will look at it. >> > >> >> Hi, >> >> This is causing new regressions on armeb: >> gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects >> scan-tree-dump-times vect "vectorized 2 loops" 1 >> gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect >> "vectorized 2 loops" 1 > > It's actually an improvement as armeb can't do unaligned loads. > Before the patch we versioned both loops for alignment (they > can't be possibly both ali
[PATCH][AArch64] Separate shrink wrapping hooks implementation
Hi all, This patch implements the new separate shrink-wrapping hooks for aarch64. In separate shrink wrapping (as I understand it) we consider each register save/restore as a 'component' that can be performed independently of the other save/restores in the prologue/epilogue and can be moved outside the prologue/epilogue and instead performed only in the basic blocks where it's actually needed. This allows us to avoid saving and restoring registers on execution paths where a register might not be needed. In the most general form a 'component' can be any operation that the prologue/epilogue performs, for example stack adjustment. But in this patch we only consider callee-saved register save/restores as components. The code is in many ways similar to the powerpc implementation of the hooks. The hooks implemented are: * TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS: Returns a bitmap containing a bit for each register that should be considered a 'component' i.e. its save/restore should be separated from the prologue and epilogue and placed at the basic block where it's needed. * TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB: Determine for a given basic block which 'component' registers it needs. This is determined through dataflow. If a component register is in the IN,GEN or KILL sets for the basic block it's considered as needed and marked as such in the bitmap. * TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS and TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS: Given a bitmap of component registers emits the save or restore code for them. * TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS: Given a bitmap of component registers record in the backend that the register is shrink-wrapped using this approach and that the normal prologue and epilogue expansion code should not emit code for them. This is done similarly to powerpc by defining a bool array in machine_function where we record whether each register is separately shrink-wrapped. The prologue and epilogue expansion code (through aarch64_save_callee_saves and aarch64_restore_callee_saves) is updated to not emit save/restores for these registers if they appear in that array. Our prologue and epilogue code has a lot of intricate logic to perform stack adjustments using the writeback forms of the load/store instructions. Separately shrink-wrapping those registers marked for writeback (cfun->machine->frame.wb_candidate1 and cfun->machine->frame.wb_candidate2) broke that codegen and I had to emit an explicit stack adjustment instruction that created ugly prologue/epilogue sequences. So this patch is conservative and doesn't allow shrink-wrapping of the registers marked for writeback. Maybe in the future we can relax it (for example allow wrapping of one of the two writeback registers if the writeback amount can be encoded in a single-register writeback store/load) but given the development stage of GCC I thought I'd play it safe. I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were some interesting swings. 458.sjeng +1.45% 471.omnetpp +2.19% 445.gobmk -2.01% On SPECFP: 453.povray+7.00% I'll be re-running the benchmarks with Segher's recent patch [1] to see if they fix the regression and if it does I think this can go in. [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00889.html Bootstrapped and tested on aarch64-none-linux-gnu. Thanks, Kyrill 2016-11-10 Kyrylo Tkachov * config/aarch64/aarch64.h (machine_function): Add reg_is_wrapped_separately field. * config/aarch64/aarch64.c (emit_set_insn): Change return type to rtx_insn *. (aarch64_save_callee_saves): Don't save registers that are wrapped separately. (aarch64_restore_callee_saves): Don't restore registers that are wrapped separately. (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p): Move earlier in the file. (aarch64_get_separate_components): New function. (aarch64_components_for_bb): Likewise. (aarch64_disqualify_components): Likewise. (aarch64_emit_prologue_components): Likewise. (aarch64_emit_epilogue_components): Likewise. (aarch64_set_handled_components): Likewise. (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS, TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB, TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS, TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define. commit 14c7a66d9f3a44ef40499e61ca9643c7dfbc6c82 Author: Kyrylo Tkachov Date: Tue Oct 11 09:25:54 2016 +0100 [AArch64] Separate shrink wrapping hooks implementation diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 325e725..5508333 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1138,7 +1138,7 @@ aarch64_is_extend_from_extract (machine_mode mode, rtx mult_imm, /* Emit an insn that's a simple single-set. Both the
Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression
Hi Dominique. snip > I have a last glitch (which can be deferred if needed): snip Fixed by the new patch, which is attached. Bootstraps and regtests OK. OK for trunk? Paul 2016-11-10 Paul Thomas PR fortran/44265 * gfortran.h : Add fn_result_spec bitfield to gfc_symbol. * resolve.c (flag_fn_result_spec): New function. (resolve_fntype): Call it for character result lengths. * symbol.c (gfc_new_symbol): Set fn_result_spec to zero. * trans-decl.c (gfc_sym_mangled_identifier): Include the procedure name in the mangled name for symbols with the fn_result_spec bit set. (gfc_get_symbol_decl): Mangle the name of these symbols. (gfc_create_module_variable): Allow them through the assert. (gfc_generate_function_code): Remove the assert before the initialization of sym->tlink because the frontend no longer uses this field. * trans-expr.c (gfc_map_intrinsic_function): Add a case to treat the LEN_TRIM intrinsic. 2016-11-10 Paul Thomas PR fortran/44265 * gfortran.dg/char_result_14.f90: New test. * gfortran.dg/char_result_15.f90: New test. * gfortran.dg/char_result_16.f90: New test. -- The difference between genius and stupidity is; genius has its limits. Albert Einstein Index: gcc/fortran/gfortran.h === *** gcc/fortran/gfortran.h (revision 241994) --- gcc/fortran/gfortran.h (working copy) *** typedef struct gfc_symbol *** 1498,1503 --- 1498,1505 unsigned equiv_built:1; /* Set if this variable is used as an index name in a FORALL. */ unsigned forall_index:1; + /* Set if the symbol is used in a function result specification . */ + unsigned fn_result_spec:1; /* Used to avoid multiple resolutions of a single symbol. */ unsigned resolved:1; /* Set if this is a module function or subroutine with the Index: gcc/fortran/resolve.c === *** gcc/fortran/resolve.c (revision 241994) --- gcc/fortran/resolve.c (working copy) *** resolve_equivalence (gfc_equiv *eq) *** 15732,15737 --- 15732,15785 } + /* Function called by resolve_fntype to flag other symbol used in the +length type parameter specification of function resuls. */ + + static bool + flag_fn_result_spec (gfc_expr *expr, + gfc_symbol *sym ATTRIBUTE_UNUSED, + int *f ATTRIBUTE_UNUSED) + { + gfc_namespace *ns; + gfc_symbol *s; + + if (expr->expr_type == EXPR_VARIABLE) + { + s = expr->symtree->n.sym; + for (ns = s->ns; ns; ns = ns->parent) + if (!ns->parent) + break; + + if (!s->fn_result_spec + && s->attr.flavor == FL_PARAMETER) + { + /* Function contained in a module */ + if (ns->proc_name && ns->proc_name->attr.flavor == FL_MODULE) + { + gfc_symtree *st; + s->fn_result_spec = 1; + /* Make sure that this symbol is translated as a module +variable. */ + st = gfc_get_unique_symtree (ns); + st->n.sym = s; + s->refs++; + } + /* ... which is use associated and called. */ + else if (s->attr.use_assoc || s->attr.used_in_submodule + || + /* External function matched with an interface. */ + (s->ns->proc_name + && ((s->ns == ns +&& s->ns->proc_name->attr.if_source == IFSRC_DECL) + || s->ns->proc_name->attr.if_source == IFSRC_IFBODY) + && s->ns->proc_name->attr.function)) + s->fn_result_spec = 1; + } + } + return false; + } + + /* Resolve function and ENTRY types, issue diagnostics if needed. */ static void *** resolve_fntype (gfc_namespace *ns) *** 15782,15787 --- 15830,15838 el->sym->attr.untyped = 1; } } + + if (sym->ts.type == BT_CHARACTER) + gfc_traverse_expr (sym->ts.u.cl->length, NULL, flag_fn_result_spec, 0); } Index: gcc/fortran/symbol.c === *** gcc/fortran/symbol.c(revision 241994) --- gcc/fortran/symbol.c(working copy) *** gfc_new_symbol (const char *name, gfc_na *** 2933,2938 --- 2933,2939 p->common_block = NULL; p->f2k_derived = NULL; p->assoc = NULL; + p->fn_result_spec = 0; return p; } Index: gcc/fortran/trans-decl.c === *** gcc/fortran/trans-decl.c(revision 241994) --- gcc/fortran/trans-decl.c(working copy) *** gfc_sym_mangled_identifier (gfc_symbol * *** 355,362 if (sym->attr.is_bind_c == 1 && sym->binding_label) return get_identifier (sym
[PATCH][ARM] PR78255: wrong code generation for indirect sibling calls
Hi, As reported in PR78255 there is currently an issue with indirect sibling calls in ARM when the address of the sibling call is loaded into 'r3' and that same register is chosen to align the stack. See the report for further information. As I mentioned in the bugzilla ticket I am not sure this is the right approach, though it works... Bootstrapped on ARM and no regressions. Do you think this is OK? Another solution would be to make sure that 'arm_get_frame_offsets' recalculates offsets after we know that the call is going to be indirect, i.e. after we know the address is going to be loaded into a register, but I do not know what a sane way would be to ensure this. Regards, Andre gcc/ChangeLog 2016-11-10 Andre Vieira * config/arm/arm.md (sibcall_internal): Add 'use' to pattern. (sibcall_value_internal): Likewise. (sibcall_insn): Likewise. (sibcall_value_insn): Likewise. gcc/testsuite/ChangeLog 2016-11-10 Andre Vieira * gcc.target/arm/pr78255.c: New. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 8393f65bcf4c9c3e61b91e5adcd5f59ff7c6ec3f..ab28b15f3e4ebbaca2b8ec0523493b54cce8c306 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -8192,7 +8192,8 @@ [(parallel [(call (match_operand 0 "memory_operand" "") (match_operand 1 "general_operand" "")) (return) - (use (match_operand 2 "" ""))])]) + (use (match_operand 2 "" "")) + (use (match_dup 0))])]) ;; We may also be able to do sibcalls for Thumb, but it's much harder... (define_expand "sibcall" @@ -8225,7 +8226,8 @@ (call (match_operand 1 "memory_operand" "") (match_operand 2 "general_operand" ""))) (return) - (use (match_operand 3 "" ""))])]) + (use (match_operand 3 "" "")) + (use (match_dup 1))])]) (define_expand "sibcall_value" [(parallel [(set (match_operand 0 "" "") @@ -8258,7 +8260,8 @@ [(call (mem:SI (match_operand:SI 0 "call_insn_operand" "Cs, US")) (match_operand 1 "" "")) (return) - (use (match_operand 2 "" ""))] + (use (match_operand 2 "" "")) + (use (match_operand 3 "" ""))] "TARGET_32BIT && SIBLING_CALL_P (insn)" "* if (which_alternative == 1) @@ -8279,7 +8282,8 @@ (call (mem:SI (match_operand:SI 1 "call_insn_operand" "Cs,US")) (match_operand 2 "" ""))) (return) - (use (match_operand 3 "" ""))] + (use (match_operand 3 "" "")) + (use (match_operand 4 "" ""))] "TARGET_32BIT && SIBLING_CALL_P (insn)" "* if (which_alternative == 1) diff --git a/gcc/testsuite/gcc.target/arm/pr78255.c b/gcc/testsuite/gcc.target/arm/pr78255.c new file mode 100644 index ..4901acea51466c0bac92d9cb90e52b00b450d88a --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr78255.c @@ -0,0 +1,57 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include + +struct table_s +{ +void (*fun0) +( void ); +void (*fun1) +( void ); +void (*fun2) +( void ); +void (*fun3) +( void ); +void (*fun4) +( void ); +void (*fun5) +( void ); +void (*fun6) +( void ); +void (*fun7) +( void ); +} table; + +void callback0(){__asm("mov r0, r0 \n\t");} +void callback1(){__asm("mov r0, r0 \n\t");} +void callback2(){__asm("mov r0, r0 \n\t");} +void callback3(){__asm("mov r0, r0 \n\t");} +void callback4(){__asm("mov r0, r0 \n\t");} + +void test (void) { +memset(&table, 0, sizeof table); + +asm volatile ("" : : : "r3"); + +table.fun0 = callback0; +table.fun1 = callback1; +table.fun2 = callback2; +table.fun3 = callback3; +table.fun4 = callback4; +table.fun0(); +} + +void foo (void) +{ + __builtin_abort (); +} + +int main (void) +{ + unsigned long p = (unsigned long) &foo; + asm volatile ("mov r3, %0" : : "r" (p)); + test (); + + return 0; +}
Unreviewed libstdc++ patch
The libstdc++ part of the following patch [fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+ https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00330.html has remained unreviewed for a week. Bruce already approved the fixincludes part. In the meantime, full testing on mainline has completed and backports to the gcc-5 and gcc-6 branches have also been tested successfully. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Support no newline in print_gimple_stmt
I've just noticed that tree-ssa-dse wrongly prints a new line to dump file. For the next stage1, I'll go through usages of print_gimple_stmt and remove extra new lines like: gcc/auto-profile.c: print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); gcc/auto-profile.c- fprintf (dump_file, "\n"); Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin >From ab1ed77381f78a8940ca250ee0f5ef5cd6b87e7f Mon Sep 17 00:00:00 2001 From: marxin Date: Thu, 10 Nov 2016 14:54:00 +0100 Subject: [PATCH] Support no newline in print_gimple_stmt gcc/ChangeLog: 2016-11-10 Martin Liska * gimple-pretty-print.c (print_gimple_stmt): Add new argument. * gimple-pretty-print.h (print_gimple_stmt): Declare the new argument. * tree-ssa-dse.c (dse_optimize_stmt): Use the argument to not to print a newline. --- gcc/gimple-pretty-print.c | 7 +-- gcc/gimple-pretty-print.h | 2 +- gcc/tree-ssa-dse.c| 5 +++-- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c index f588f5e..8538911 100644 --- a/gcc/gimple-pretty-print.c +++ b/gcc/gimple-pretty-print.c @@ -76,13 +76,16 @@ debug_gimple_stmt (gimple *gs) FLAGS as in pp_gimple_stmt_1. */ void -print_gimple_stmt (FILE *file, gimple *g, int spc, int flags) +print_gimple_stmt (FILE *file, gimple *g, int spc, int flags, + bool newline) { pretty_printer buffer; pp_needs_newline (&buffer) = true; buffer.buffer->stream = file; pp_gimple_stmt_1 (&buffer, g, spc, flags); - pp_newline_and_flush (&buffer); + if (newline) +pp_newline (&buffer); + pp_flush (&buffer); } DEBUG_FUNCTION void diff --git a/gcc/gimple-pretty-print.h b/gcc/gimple-pretty-print.h index f8eef99..6890cfb 100644 --- a/gcc/gimple-pretty-print.h +++ b/gcc/gimple-pretty-print.h @@ -27,7 +27,7 @@ along with GCC; see the file COPYING3. If not see extern void debug_gimple_stmt (gimple *); extern void debug_gimple_seq (gimple_seq); extern void print_gimple_seq (FILE *, gimple_seq, int, int); -extern void print_gimple_stmt (FILE *, gimple *, int, int); +extern void print_gimple_stmt (FILE *, gimple *, int, int, bool newline = true); extern void debug (gimple &ref); extern void debug (gimple *ptr); extern void print_gimple_expr (FILE *, gimple *, int, int); diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index 372a0be..b1a8c5d 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -237,7 +237,8 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi) if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, " Deleted dead call '"); - print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0); + print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0, + false); fprintf (dump_file, "'\n"); } @@ -293,7 +294,7 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi) if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, " Deleted dead store '"); - print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0); + print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0, false); fprintf (dump_file, "'\n"); } -- 2.10.1
Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx
Hello, Sorry for top-posting, but this is a ping for the attached patch. The patch doesn't seem to have been applied nor refused. So I'm pinging to see if I need to change something? I already have a copyright assignment now. I'm attaching a updated patch that doesn't conflict in the Changelog file. Regards, On Fri, Sep 16, 2016 at 3:37 AM, Felipe Magno de Almeida wrote: > Hello, > > Another patch. > > On Fri, Sep 16, 2016 at 2:53 AM, Felipe Magno de Almeida > wrote: >> On Fri, Sep 16, 2016 at 2:42 AM, Marc Glisse wrote: >>> On Thu, 15 Sep 2016, Felipe Magno de Almeida wrote: >>> >>> + || sizeof(uint32_t) == sizeof(void*) >>> +|| sizeof(uint16_t) == sizeof(void*), >>> >>> Indentation is off? >>> Call _M_extract_* functions family through temporary int objects >>> >>> >>> Would it make sense to use a template type instead of int for this >>> parameter? Or possibly have a typedef that defaults to int (what POSIX >>> requires). The hard case would be a libc that uses bitfields for the fields >>> of struct tm (that could save some space), but I don't think anyone does >>> that. >> >> I've tried both approaches. Templates were causing problems of not >> defined instantations because they were being used as ints too >> in other _M_extract functions through a tmp integer. And typedef's >> caused the same problem of having to use a tmp value of the right >> type but for example _M_extract_wday_or_month could not have the >> same type (in AVR they do) and I'd have to use a temporary anyway >> then. >> >> This was the least intrusive way. >> float pointing >>> >>> floating point? >> >> :D. Yes. >> >>> A ChangeLog entry would also help. >> >> OK. >> >>> -- >>> Marc Glisse >> >> Regards, >> -- >> Felipe Magno de Almeida > > > > -- > Felipe Magno de Almeida -- Felipe Magno de Almeida diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog index f405ccd..b0efe72 100644 --- a/libstdc++-v3/ChangeLog +++ b/libstdc++-v3/ChangeLog @@ -1,3 +1,17 @@ +2016-11-10 Felipe Magno de Almeida + + * src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers + +2016-11-10 Felipe Magno de Almeida + + * include/bits/locale_facets_nonio.tcc: Avoid compilation errors + with non-standard struct tm. + +2016-11-10 Felipe Magno de Almeida + + * crossconfig.m4: Add avr target for cross-compilation + * configure: Regenerate + 2016-11-09 Tim Shen * libstdc++-v3/include/bits/regex.h (regex_iterator::regex_iterator()): diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure index 8481a48..5e3f783 100755 --- a/libstdc++-v3/configure +++ b/libstdc++-v3/configure @@ -28840,6 +28840,55 @@ case "${host}" in # This is a freestanding configuration; there is nothing to do here. ;; + avr*-*-*) +$as_echo "#define HAVE_ACOSF 1" >>confdefs.h + +$as_echo "#define HAVE_ASINF 1" >>confdefs.h + +$as_echo "#define HAVE_ATAN2F 1" >>confdefs.h + +$as_echo "#define HAVE_ATANF 1" >>confdefs.h + +$as_echo "#define HAVE_CEILF 1" >>confdefs.h + +$as_echo "#define HAVE_COSF 1" >>confdefs.h + +$as_echo "#define HAVE_COSHF 1" >>confdefs.h + +$as_echo "#define HAVE_EXPF 1" >>confdefs.h + +$as_echo "#define HAVE_FABSF 1" >>confdefs.h + +$as_echo "#define HAVE_FLOORF 1" >>confdefs.h + +$as_echo "#define HAVE_FMODF 1" >>confdefs.h + +$as_echo "#define HAVE_FREXPF 1" >>confdefs.h + +$as_echo "#define HAVE_SQRTF 1" >>confdefs.h + +$as_echo "#define HAVE_HYPOTF 1" >>confdefs.h + +$as_echo "#define HAVE_LDEXPF 1" >>confdefs.h + +$as_echo "#define HAVE_LOG10F 1" >>confdefs.h + +$as_echo "#define HAVE_LOGF 1" >>confdefs.h + +$as_echo "#define HAVE_MODFF 1" >>confdefs.h + +$as_echo "#define HAVE_POWF 1" >>confdefs.h + +$as_echo "#define HAVE_SINF 1" >>confdefs.h + +$as_echo "#define HAVE_SINHF 1" >>confdefs.h + +$as_echo "#define HAVE_TANF 1" >>confdefs.h + +$as_echo "#define HAVE_TANHF 1" >>confdefs.h + +;; + mips*-sde-elf*) # These definitions are for the SDE C library rather than newlib. SECTION_FLAGS='-ffunction-sections -fdata-sections' diff --git a/libstdc++-v3/crossconfig.m4 b/libstdc++-v3/crossconfig.m4 index 6abc84f..2b955ec 100644 --- a/libstdc++-v3/crossconfig.m4 +++ b/libstdc++-v3/crossconfig.m4 @@ -9,6 +9,32 @@ case "${host}" in # This is a freestanding configuration; there is nothing to do here. ;; + avr*-*-*) +AC_DEFINE(HAVE_ACOSF) +AC_DEFINE(HAVE_ASINF) +AC_DEFINE(HAVE_ATAN2F) +AC_DEFINE(HAVE_ATANF) +AC_DEFINE(HAVE_CEILF) +AC_DEFINE(HAVE_COSF) +AC_DEFINE(HAVE_COSHF) +AC_DEFINE(HAVE_EXPF) +AC_DEFINE(HAVE_FABSF) +AC_DEFINE(HAVE_FLOORF) +AC_DEFINE(HAVE_FMODF) +AC_DEFINE(HAVE_FREXPF) +AC_DEFINE(HAVE_SQRTF) +AC_DEFINE(HAVE_HYPOTF) +AC_DEFINE(HAVE_LDEXPF) +AC_DEFINE(HAVE_LOG10F) +AC_DEFINE(HAVE_LOGF) +AC_DEFINE(HAVE_MODFF) +AC_DEFINE(HAVE_POWF) +AC_DEFINE(HAVE_SINF) +AC_DEFINE(HAVE
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On 11/10/2016 05:19 AM, Martin Liška wrote: On 10/13/2016 05:34 PM, Martin Liška wrote: Hello. As it's very hard to guess from GCC driver whether a target supports atomic updates for GCOV counter or not, I decided to come up with a new option value (maybe-atomic), that would be transformed in a corresponding value (single or atomic) in tree-profile.c. The GCC driver selects the option when -pthread is present in the command line. That should fix all tests failures seen on AIX target. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? I dislike this. If it's hard for gcc itself to know, how much harder for the user must it be? (does gcc have another instance of an option that behaves 'prefer-A-or-B-if-you-can't'? It's also not clear what problem it's solving for the user? If the user needs atomic update, they should get a hard error if the target doesn't support it. If they don't need atomic, why ask for it? But as ever, I'm not going to veto it. nathan -- Nathan Sidwell
Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx
On Thu, Nov 10, 2016 at 1:39 PM, Felipe Magno de Almeida wrote: > Hello, > > Sorry for top-posting, but this is a ping for the attached patch. > > The patch doesn't seem to have been applied nor refused. So I'm > pinging to see if I need to change something? I already have a > copyright assignment now. > > I'm attaching a updated patch that doesn't conflict in the Changelog > file. Reattaching the patch with the correct git commit descriptions. > Regards, [snip] > -- > Felipe Magno de Almeida Kind regards, -- Felipe Magno de Almeida From 2e62f3a4b3c307e2abec0cb0302ec601b8f693e8 Mon Sep 17 00:00:00 2001 From: Felipe Magno de Almeida Date: Thu, 15 Sep 2016 15:54:50 -0300 Subject: [PATCH 1/3] Add #ifdef case for 16 bits in cow-stdexcept.cc Added #ifdef case for when void* is 16 bits so it compiles in AVR target. --- libstdc++-v3/ChangeLog | 4 libstdc++-v3/src/c++11/cow-stdexcept.cc | 11 --- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog index f405ccd..80d2e9d 100644 --- a/libstdc++-v3/ChangeLog +++ b/libstdc++-v3/ChangeLog @@ -1,3 +1,7 @@ +2016-11-10 Felipe Magno de Almeida + + * src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers + 2016-11-09 Tim Shen * libstdc++-v3/include/bits/regex.h (regex_iterator::regex_iterator()): diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc index 31a89df..641b372 100644 --- a/libstdc++-v3/src/c++11/cow-stdexcept.cc +++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc @@ -208,6 +208,8 @@ extern void* _ZGTtnaX (size_t sz) __attribute__((weak)); extern void _ZGTtdlPv (void* ptr) __attribute__((weak)); extern uint8_t _ITM_RU1(const uint8_t *p) ITM_REGPARM __attribute__((weak)); +extern uint16_t _ITM_RU2(const uint16_t *p) + ITM_REGPARM __attribute__((weak)); extern uint32_t _ITM_RU4(const uint32_t *p) ITM_REGPARM __attribute__((weak)); extern uint64_t _ITM_RU8(const uint64_t *p) @@ -272,12 +274,15 @@ _txnal_cow_string_C1_for_exceptions(void* that, const char* s, static void* txnal_read_ptr(void* const * ptr) { static_assert(sizeof(uint64_t) == sizeof(void*) - || sizeof(uint32_t) == sizeof(void*), - "Pointers must be 32 bits or 64 bits wide"); + || sizeof(uint32_t) == sizeof(void*) + || sizeof(uint16_t) == sizeof(void*), + "Pointers must be 16 bits, 32 bits or 64 bits wide"); #if __UINTPTR_MAX__ == __UINT64_MAX__ return (void*)_ITM_RU8((const uint64_t*)ptr); -#else +#elif __UINTPTR_MAX__ == __UINT32_MAX__ return (void*)_ITM_RU4((const uint32_t*)ptr); +#else + return (void*)_ITM_RU2((const uint16_t*)ptr); #endif } -- 2.10.2 From 7ed4af72fe0bdee1a38c7487955590fb64f76a5d Mon Sep 17 00:00:00 2001 From: Felipe Magno de Almeida Date: Thu, 15 Sep 2016 18:52:57 -0300 Subject: [PATCH 2/3] Use temporary int objects to access struct tm members Call _M_extract_* functions family through temporary int objects, so it doesn't convert from lvalue to rvalue through a temporary in AVR because of the incompatible types used in AVR-Libc. This fixes compilation errors with AVR-Libc while compiling libstdc++ for AVR target. --- libstdc++-v3/ChangeLog| 5 +++ libstdc++-v3/include/bits/locale_facets_nonio.tcc | 44 --- 2 files changed, 37 insertions(+), 12 deletions(-) diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog index 80d2e9d..80c75c6 100644 --- a/libstdc++-v3/ChangeLog +++ b/libstdc++-v3/ChangeLog @@ -1,5 +1,10 @@ 2016-11-10 Felipe Magno de Almeida + * include/bits/locale_facets_nonio.tcc: Avoid compilation errors + with non-standard struct tm. + +2016-11-10 Felipe Magno de Almeida + * src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers 2016-11-09 Tim Shen diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc b/libstdc++-v3/include/bits/locale_facets_nonio.tcc index 1a4f9a0..cc9d2df 100644 --- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc +++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc @@ -659,30 +659,38 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11 // Abbreviated weekday name [tm_wday] const char_type* __days1[7]; __tp._M_days_abbreviated(__days1); - __beg = _M_extract_name(__beg, __end, __tm->tm_wday, __days1, + __tm->tm_wday = __mem; + __beg = _M_extract_name(__beg, __end, __mem, __days1, 7, __io, __tmperr); + __mem = __tm->tm_wday; break; case 'A': // Weekday name [tm_wday]. const char_type* __days2[7]; __tp._M_days(__days2); - __beg = _M_extract_name(__beg, __end, __tm->tm_wday, __days2, + __mem = __tm->tm_wday; + __beg = _M_extract_name(__beg, __end, __mem, __days2, 7, __io, __tmperr); + __tm->tm_wday = __mem; break; case 'h': case 'b': // Abbreviated month name [tm_mon] const char_type* __months1[12]; __tp._M_months_abbrevi
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On Thu, Nov 10, 2016 at 10:43 AM, Nathan Sidwell wrote: > On 11/10/2016 05:19 AM, Martin Liška wrote: > >>> On 10/13/2016 05:34 PM, Martin Liška wrote: Hello. As it's very hard to guess from GCC driver whether a target supports atomic updates for GCOV counter or not, I decided to come up with a new option value (maybe-atomic), that would be transformed in a corresponding value (single or atomic) in tree-profile.c. The GCC driver selects the option when -pthread is present in the command line. That should fix all tests failures seen on AIX target. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? > > > I dislike this. If it's hard for gcc itself to know, how much harder for > the user must it be? (does gcc have another instance of an option that > behaves 'prefer-A-or-B-if-you-can't'? > > It's also not clear what problem it's solving for the user? If the user > needs atomic update, they should get a hard error if the target doesn't > support it. If they don't need atomic, why ask for it? > > But as ever, I'm not going to veto it. Do you have a better suggestion? gcc.c now imposes profile-update=atomic if -pthread is used, even if the target does not support profile-update=atomic. Either gcc.c must not impose profile-update=atomic or we need some way of differentiating between when the request should fail because the user really expects it and when the request should silently and gently be ignored. The atomic update feature is nice, but currently GCC is trying to be too smart to guess how important the feature is to the user. Thanks, David
Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)
On Wed, Nov 09, 2016 at 10:16:35PM +, Joseph Myers wrote: > On Wed, 9 Nov 2016, Bernd Edlinger wrote: > > > Yes, but maybe introduce a test if the half-wide value fits? > > > > like: > > > > #define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1) > > Something like that. In patch form, that would look like this... I've checked on my ARM and AArch64 trees with _Float16 support that this lets the tests pass. OK? Thanks, James --- 2016-11-10 James Greenhalgh * gcc.dg/torture/fp-int-convert.h (M_OK2): New, use it in WVAL0S tests added in r241817. diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert.h b/gcc/testsuite/gcc.dg/torture/fp-int-convert.h index bbe9666..2b904b6 100644 --- a/gcc/testsuite/gcc.dg/torture/fp-int-convert.h +++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert.h @@ -53,13 +53,14 @@ do {\ TEST_I_F_VAL (U, F, HVAL1U (P, U), P_OK (P, U)); \ TEST_I_F_VAL (U, F, HVAL1U (P, U) + 1, P_OK (P, U)); \ TEST_I_F_VAL (U, F, HVAL1U (P, U) - 1, P_OK (P, U)); \ - TEST_I_F_VAL (I, F, WVAL0S (I), 1);\ - TEST_I_F_VAL (I, F, -WVAL0S (I), 1);\ + TEST_I_F_VAL (I, F, WVAL0S (I), M_OK2 (M, U)); \ + TEST_I_F_VAL (I, F, -WVAL0S (I), M_OK2 (M, U)); \ } while (0) #define P_OK(P, T) ((P) >= sizeof(T) * CHAR_BIT) #define P_OK1(P, T) ((P) >= sizeof(T) * CHAR_BIT - 1) #define M_OK1(M, T) ((M) > sizeof(T) * CHAR_BIT - 1) +#define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1) #define HVAL0U(P, U) (U)(P_OK (P, U) \ ? (U)1 \ : (((U)1 << (sizeof(U) * CHAR_BIT - 1)) \
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On 11/10/2016 04:43 PM, Nathan Sidwell wrote: > On 11/10/2016 05:19 AM, Martin Liška wrote: > >>> On 10/13/2016 05:34 PM, Martin Liška wrote: Hello. As it's very hard to guess from GCC driver whether a target supports atomic updates for GCOV counter or not, I decided to come up with a new option value (maybe-atomic), that would be transformed in a corresponding value (single or atomic) in tree-profile.c. The GCC driver selects the option when -pthread is present in the command line. That should fix all tests failures seen on AIX target. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? > > I dislike this. If it's hard for gcc itself to know, how much harder for the > user must it be? (does gcc have another instance of an option that behaves > 'prefer-A-or-B-if-you-can't'? > > It's also not clear what problem it's solving for the user? If the user > needs atomic update, they should get a hard error if the target doesn't > support it. If they don't need atomic, why ask for it? My initial motivation was to automatically selected -fprofile-update=atomic if supported by a target and when '-pthread' is present on command line. As it's very problematic to identify (from GCC driver) whether a target supports or not atomic updates, 'maybe' option is the only possible we can guess. > > But as ever, I'm not going to veto it. Other option is to disable selection of -fprofile-update=atomic automatically. Martin > > nathan >
Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers
Hi Mike, > I have built the spec 2006 CPU benchmark suite with these changes, and the > power8 (ISA 2.07) code generation does not change. Very good to hear :-) Just some nits; okay for trunk with that fixed: > +(define_split > + [(set (match_operand:EXTHI 0 "altivec_register_operand" "") > + (sign_extend:EXTHI > + (match_operand:HI 1 "indexed_or_indirect_operand" "")))] > + "TARGET_P9_VECTOR && reload_completed" > + [(set (match_dup 2) > + (match_dup 1)) > + (set (match_dup 0) > + (sign_extend:EXTHI (match_dup 2)))] > +{ > + operands[2] = gen_rtx_REG (HImode, REGNO (operands[1])); > +}) Please lose the default "" (here and elsewhere). > Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c > ___ > Modified: svn:mergeinfo >Merged /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c:r241733-241924 > > > Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c > ___ > Modified: svn:mergeinfo >Merged /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c:r241733-241924 I don't know what this is? Thanks! Segher
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On 11/10/2016 07:43 AM, Nathan Sidwell wrote: On 11/10/2016 05:19 AM, Martin Liška wrote: On 10/13/2016 05:34 PM, Martin Liška wrote: Hello. As it's very hard to guess from GCC driver whether a target supports atomic updates for GCOV counter or not, I decided to come up with a new option value (maybe-atomic), that would be transformed in a corresponding value (single or atomic) in tree-profile.c. The GCC driver selects the option when -pthread is present in the command line. That should fix all tests failures seen on AIX target. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? I dislike this. If it's hard for gcc itself to know, how much harder for the user must it be? (does gcc have another instance of an option that behaves 'prefer-A-or-B-if-you-can't'? Thinking further. why isn't the right solution for -fprofile-update=atomic when faced with a target that cannot support it to: a) issue an error and bail out at the first opportunity b) or issue a warning and fall back to single threaded update? For #b presumably there'll be the capability of suppressing that particular warning? nathan -- Nathan Sidwell
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On Thu, Nov 10, 2016 at 11:14 AM, Nathan Sidwell wrote: > On 11/10/2016 07:43 AM, Nathan Sidwell wrote: >> >> On 11/10/2016 05:19 AM, Martin Liška wrote: >> On 10/13/2016 05:34 PM, Martin Liška wrote: > > Hello. > > As it's very hard to guess from GCC driver whether a target supports > atomic updates > for GCOV counter or not, I decided to come up with a new option > value (maybe-atomic), > that would be transformed in a corresponding value (single or > atomic) in tree-profile.c. > The GCC driver selects the option when -pthread is present in the > command line. > > That should fix all tests failures seen on AIX target. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression > tests. > > Ready to be installed? >> >> >> I dislike this. If it's hard for gcc itself to know, how much harder >> for the user must it be? (does gcc have another instance of an option >> that behaves 'prefer-A-or-B-if-you-can't'? > > > Thinking further. why isn't the right solution for -fprofile-update=atomic > when faced with a target that cannot support it to: > a) issue an error and bail out at the first opportunity > b) or issue a warning and fall back to single threaded update? > > For #b presumably there'll be the capability of suppressing that particular > warning? Because that incorrectly breaks a huge portion of the testsuite. that's not what the user intended. - David
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On 11/10/2016 07:55 AM, David Edelsohn wrote: gcc.c now imposes profile-update=atomic if -pthread is used, even if the target does not support profile-update=atomic. ah, that's where this is coming from. nathan -- Nathan Sidwell
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On 11/10/2016 05:17 PM, David Edelsohn wrote: > Maybe instead of adding "maybe", we need to change the severity of the > warning so that the warning is not emitted by default. Adding the warning option to -Wextra can be solution. Is it acceptable approach? Martin
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
Hi! Great to see this. Just a few comments... On Thu, Nov 10, 2016 at 02:25:47PM +, Kyrill Tkachov wrote: > +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */ > + > +static sbitmap > +aarch64_get_separate_components (void) > +{ > + /* Calls to alloca further extend the stack frame and it can be messy to > + figure out the location of the stack slots for each register. > + For now be conservative. */ > + if (cfun->calls_alloca) > +return NULL; The generic code already disallows functions with alloca (in try_shrink_wrapping_separate). > +static void > +aarch64_emit_prologue_components (sbitmap components) > +{ > + rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed > + ? HARD_FRAME_POINTER_REGNUM > + : STACK_POINTER_REGNUM); > + > + for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++) > +if (bitmap_bit_p (components, regno)) > + { > + rtx reg = gen_rtx_REG (Pmode, regno); > + HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno]; > + if (!frame_pointer_needed) > + offset += cfun->machine->frame.frame_size > + - cfun->machine->frame.hard_fp_offset; > + rtx addr = plus_constant (Pmode, ptr_reg, offset); > + rtx mem = gen_frame_mem (Pmode, addr); > + > + RTX_FRAME_RELATED_P (emit_move_insn (mem, reg)) = 1; > + } > +} I think you should emit the CFI notes here directly, just like for the epilogue components. Segher
[PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
Hi, this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions. It requires additional patch for register allocator from Vladimir Makarov to be committed before. gcc/ * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX5124FMAPS_SET, OPTION_MASK_ISA_AVX5124FMAPS_UNSET, OPTION_MASK_ISA_AVX5124VNNIW_SET, OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New. (ix86_handle_option): Handle OPT_mavx5124fmaps, OPT_mavx5124vnniw. * config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h. * config/i386/avx5124fmapsintrin.h: New file. * config/i386/avx5124vnniwintrin.h: Ditto. * config/i386/constraints.md (h): New constraint. * config/i386/cpuid.h: (bit_AVX5124VNNIW, bit_AVX5124FMAPS): New. * config/i386/driver-i386.c (host_detect_local_cpu): Detect avx5124fmaps, avx5124vnniw. * config/i386/i386-builtin-types.def: Add types V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI, V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF, V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF, V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI, V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI, V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI. * config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask, __builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss, __builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask, __builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss, __builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd, __builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds, __builtin_ia32_vp4dpwssds_mask): New. * config/i386/i386-c.c (ix86_target_macros_internal): Define __AVX5124FMAPS__, __AVX5124VNNIW__. * config/i386/i386-modes.def (VECTOR_MODES (FLOAT, 256), VECTOR_MODE (INT, SI, 64)): New modes. * config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps, -mavx5124vnniw. (PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define. (ix86_option_override_internal): Handle new options. (ix86_valid_target_attribute_inner_p): Add avx5124fmaps, avx5124vnniw. (ix86_expand_builtin): Handle new builtins. (ix86_additional_allocno_class_p): New. * config/i386/i386.h (TARGET_AVX5124FMAPS, TARGET_AVX5124FMAPS_P, TARGET_AVX5124VNNIW, TARGET_AVX5124VNNIW_P): Define. (reg_class): Add MOD4_SSE_REGS. (MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New. * config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw. * config/i386/immintrin.h: Include avx5124fmapsintrin.h, avx5124vnniwintrin.h. * config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD, UNSPEC_VP4FNMADD, UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS. (define_mode_iterator IMOD4): New. (define_mode_attr imod4_narrow): Ditto. (define_insn "mov"): Ditto. (define_insn "avx5124fmaddps_4fmaddps"): Ditto. (define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto. (define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto. (define_insn "avx5124fmaddps_4fmaddss"): Ditto. (define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto. (define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto. (define_insn "avx5124fmaddps_4fnmaddps"): Ditto. (define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto. (define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto. (define_insn "avx5124fmaddps_4fnmaddss"): Ditto. (define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto. (define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto. (define_insn "avx5124vnniw_vp4dpwssd"): Ditto. (define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto. (define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto. (define_insn "avx5124vnniw_vp4dpwssds"): Ditto. (define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto. (define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto. * init-regs.c (initialize_uninitialized_regs): Add emit_clobber call. * genmodes.c (mode_size_inline): Extend return type. * machmode.h (mode_size, mode_base_align): Extend type. gcc/testsuite/ * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test. * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto. * gcc.target/i386/avx5124fmaps-check.h: Ditto. * gcc.target/i386/avx5124vnniw-check.h: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote: > Hi, > > this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions. > > It requires additional patch for register allocator from Vladimir > Makarov to be committed before. Your MUA ate tabs (and in the ChangeLog you're using spaces instead of tabs), can you repost as attachment or configure your MUA not to do this? Just a couple of random nits follow: > * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps. This mentions an option that doesn't exist, is that s/dd// ? > * gcc.target/i386/sse-13.c: Ditto. > @@ -399,6 +403,13 @@ ix86_handle_option (struct gcc_options *opts, > { >opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET; >opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET; > + > + //turn off additional isa flags Comments start with capital letter, end with ., there should be space between // and T, better use /* ... */ style comment to match other comments in the file. > + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET; > + opts->x_ix86_isa_flags2_explicit |= > OPTION_MASK_ISA_AVX5124FMAPS_UNSET; > + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET; > + opts->x_ix86_isa_flags2_explicit |= > OPTION_MASK_ISA_AVX5124VNNIW_UNSET; > + > } The formatting looks very weird. Jakub
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On Thu, Nov 10, 2016 at 10:58 AM, Martin Liška wrote: > On 11/10/2016 04:43 PM, Nathan Sidwell wrote: >> On 11/10/2016 05:19 AM, Martin Liška wrote: >> On 10/13/2016 05:34 PM, Martin Liška wrote: > Hello. > > As it's very hard to guess from GCC driver whether a target supports > atomic updates > for GCOV counter or not, I decided to come up with a new option value > (maybe-atomic), > that would be transformed in a corresponding value (single or atomic) in > tree-profile.c. > The GCC driver selects the option when -pthread is present in the command > line. > > That should fix all tests failures seen on AIX target. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? >> >> I dislike this. If it's hard for gcc itself to know, how much harder for >> the user must it be? (does gcc have another instance of an option that >> behaves 'prefer-A-or-B-if-you-can't'? >> >> It's also not clear what problem it's solving for the user? If the user >> needs atomic update, they should get a hard error if the target doesn't >> support it. If they don't need atomic, why ask for it? > > My initial motivation was to automatically selected -fprofile-update=atomic > if supported by a target and when '-pthread' is present on command line. > As it's very problematic to identify (from GCC driver) whether a target > supports or not atomic updates, 'maybe' option is the only possible we can > guess. > >> >> But as ever, I'm not going to veto it. > > Other option is to disable selection of -fprofile-update=atomic automatically. Unfortunately, this cannot use a configure test or manually set value based on target because the same gcc.c driver is invoked with different options that may provide atomic update in some variants and no atomic update in other (e.g., -m64) because the profile counter is 64 bits. Maybe instead of adding "maybe", we need to change the severity of the warning so that the warning is not emitted by default. - David
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On 10/11/16 16:26, Segher Boessenkool wrote: Hi! Hi, Great to see this. Just a few comments... On Thu, Nov 10, 2016 at 02:25:47PM +, Kyrill Tkachov wrote: +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */ + +static sbitmap +aarch64_get_separate_components (void) +{ + /* Calls to alloca further extend the stack frame and it can be messy to + figure out the location of the stack slots for each register. + For now be conservative. */ + if (cfun->calls_alloca) +return NULL; The generic code already disallows functions with alloca (in try_shrink_wrapping_separate). Ok, I'll remove this. +static void +aarch64_emit_prologue_components (sbitmap components) +{ + rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed +? HARD_FRAME_POINTER_REGNUM +: STACK_POINTER_REGNUM); + + for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++) +if (bitmap_bit_p (components, regno)) + { + rtx reg = gen_rtx_REG (Pmode, regno); + HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno]; + if (!frame_pointer_needed) + offset += cfun->machine->frame.frame_size + - cfun->machine->frame.hard_fp_offset; + rtx addr = plus_constant (Pmode, ptr_reg, offset); + rtx mem = gen_frame_mem (Pmode, addr); + + RTX_FRAME_RELATED_P (emit_move_insn (mem, reg)) = 1; + } +} I think you should emit the CFI notes here directly, just like for the epilogue components. The prologue code in expand_prologue doesn't attach any explicit notes, so I didn't want to deviate from that. Looking at the powerpc implementation, would that be a REG_CFA_OFFSET with the (SET (mem) (reg)) expression for saving the reg? Thanks, Kyrill Segher
Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)
On Thu, 10 Nov 2016, James Greenhalgh wrote: > > On Wed, Nov 09, 2016 at 10:16:35PM +, Joseph Myers wrote: > > On Wed, 9 Nov 2016, Bernd Edlinger wrote: > > > > > Yes, but maybe introduce a test if the half-wide value fits? > > > > > > like: > > > > > > #define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1) > > > > Something like that. > > In patch form, that would look like this... > > I've checked on my ARM and AArch64 trees with _Float16 support that this > lets the tests pass. > > OK? OK. -- Joseph S. Myers jos...@codesourcery.com
[C++ PATCH] Only warn with -Wc++1z-compat about mangling of visible symbols
Hi! It seems -Wabi/-Wc++1z-compat warns about mangling changes even for symbols that are not visible outside of its TU (so likely only inline asm or tools looking at .symtab STB_LOCAL symbols would notice). Perhaps that is fine for -Wabi that isn't enabled in -Wall/-Wextra, but -Wc++1z-compat is, and I think most of the users don't really care about mangling of those symbols. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-11-10 Jakub Jelinek * mangle.c (mangle_decl): Only emit -Wc++1z-compat warnings for public or external symbols. * g++.dg/cpp1z/noexcept-type14.C: New test. * g++.dg/asan/asan_test.C: Remove -Wno-c++1z-compat from dg-options. --- gcc/cp/mangle.c.jj 2016-11-09 23:55:59.0 +0100 +++ gcc/cp/mangle.c 2016-11-10 10:14:26.914059686 +0100 @@ -3836,7 +3836,8 @@ mangle_decl (const tree decl) } SET_DECL_ASSEMBLER_NAME (decl, id); - if (G.need_cxx1z_warning) + if (G.need_cxx1z_warning + && (TREE_PUBLIC (decl) || DECL_REALLY_EXTERN (decl))) warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc__1z_compat, "mangled name for %qD will change in C++17 because the " "exception specification is part of a function type", --- gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C.jj 2016-11-10 10:14:42.898857481 +0100 +++ gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C2016-11-10 10:22:46.412741092 +0100 @@ -0,0 +1,26 @@ +// { dg-do compile { target c++11 } } +// { dg-options "-Wall" } + +#define A asm volatile ("" : : : "memory") +void foo () throw () {} +extern void f1 (decltype (foo) *); // { dg-bogus "mangled name" } +void f2 (decltype (foo) *);// { dg-bogus "mangled name" } +extern void f3 (decltype (foo) *); // { dg-warning "mangled name" "" { target c++14_down } } +void f4 (decltype (foo) *);// { dg-warning "mangled name" "" { target c++14_down } } +void f5 (decltype (foo) *) { A; } // { dg-warning "mangled name" "" { target c++14_down } } +static void f6 (decltype (foo) *) { A; }// { dg-bogus "mangled name" } +namespace N { +void f7 (decltype (foo) *) { A; } // { dg-warning "mangled name" "" { target c++14_down } } +} +namespace { +void f8 (decltype (foo) *) { A; } // { dg-bogus "mangled name" } +} +void bar () +{ + f3 (foo); + f4 (foo); + f5 (foo); + f6 (foo); + N::f7 (foo); + f8 (foo); +} --- gcc/testsuite/g++.dg/asan/asan_test.C.jj2016-11-10 00:00:00.0 +0100 +++ gcc/testsuite/g++.dg/asan/asan_test.C 2016-11-10 13:28:55.348073019 +0100 @@ -2,7 +2,7 @@ // { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } // { dg-skip-if "" { *-*-* } { "-flto" } { "" } } // { dg-additional-sources "asan_globals_test-wrapper.cc" } -// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Wno-c++1z-compat -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" } +// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" } // { dg-additional-options "-DASAN_NEEDS_SEGV=1" { target { ! arm*-*-* } } } // { dg-additional-options "-DASAN_LOW_MEMORY=1 -DASAN_NEEDS_SEGV=0" { target arm*-*-* } } // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! run_expensive_tests } } } Jakub
Re: [C++ PATCH] Only warn with -Wc++1z-compat about mangling of visible symbols
OK. On Thu, Nov 10, 2016 at 8:52 AM, Jakub Jelinek wrote: > Hi! > > It seems -Wabi/-Wc++1z-compat warns about mangling changes even for symbols > that are > not visible outside of its TU (so likely only inline asm or tools > looking at .symtab STB_LOCAL symbols would notice). Perhaps that is fine > for -Wabi that isn't enabled in -Wall/-Wextra, but -Wc++1z-compat is, and > I think most of the users don't really care about mangling of those symbols. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2016-11-10 Jakub Jelinek > > * mangle.c (mangle_decl): Only emit -Wc++1z-compat warnings for > public or external symbols. > > * g++.dg/cpp1z/noexcept-type14.C: New test. > * g++.dg/asan/asan_test.C: Remove -Wno-c++1z-compat from dg-options. > > --- gcc/cp/mangle.c.jj 2016-11-09 23:55:59.0 +0100 > +++ gcc/cp/mangle.c 2016-11-10 10:14:26.914059686 +0100 > @@ -3836,7 +3836,8 @@ mangle_decl (const tree decl) > } >SET_DECL_ASSEMBLER_NAME (decl, id); > > - if (G.need_cxx1z_warning) > + if (G.need_cxx1z_warning > + && (TREE_PUBLIC (decl) || DECL_REALLY_EXTERN (decl))) > warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc__1z_compat, > "mangled name for %qD will change in C++17 because the " > "exception specification is part of a function type", > --- gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C.jj 2016-11-10 > 10:14:42.898857481 +0100 > +++ gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C2016-11-10 > 10:22:46.412741092 +0100 > @@ -0,0 +1,26 @@ > +// { dg-do compile { target c++11 } } > +// { dg-options "-Wall" } > + > +#define A asm volatile ("" : : : "memory") > +void foo () throw () {} > +extern void f1 (decltype (foo) *); // { dg-bogus "mangled name" } > +void f2 (decltype (foo) *);// { dg-bogus "mangled name" } > +extern void f3 (decltype (foo) *); // { dg-warning "mangled name" "" { > target c++14_down } } > +void f4 (decltype (foo) *);// { dg-warning "mangled name" "" { > target c++14_down } } > +void f5 (decltype (foo) *) { A; } // { dg-warning "mangled name" "" { > target c++14_down } } > +static void f6 (decltype (foo) *) { A; }// { dg-bogus "mangled name" } > +namespace N { > +void f7 (decltype (foo) *) { A; } // { dg-warning "mangled name" "" { > target c++14_down } } > +} > +namespace { > +void f8 (decltype (foo) *) { A; } // { dg-bogus "mangled name" } > +} > +void bar () > +{ > + f3 (foo); > + f4 (foo); > + f5 (foo); > + f6 (foo); > + N::f7 (foo); > + f8 (foo); > +} > --- gcc/testsuite/g++.dg/asan/asan_test.C.jj2016-11-10 00:00:00.0 > +0100 > +++ gcc/testsuite/g++.dg/asan/asan_test.C 2016-11-10 13:28:55.348073019 > +0100 > @@ -2,7 +2,7 @@ > // { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } > // { dg-skip-if "" { *-*-* } { "-flto" } { "" } } > // { dg-additional-sources "asan_globals_test-wrapper.cc" } > -// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall > -Wno-c++1z-compat -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 > -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" > } > +// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Werror -g > -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 > -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" } > // { dg-additional-options "-DASAN_NEEDS_SEGV=1" { target { ! arm*-*-* } } } > // { dg-additional-options "-DASAN_LOW_MEMORY=1 -DASAN_NEEDS_SEGV=0" { > target arm*-*-* } } > // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! > run_expensive_tests } } } > > Jakub
[PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)
Hi! On arm/aarch64 we ICE because some decls that make it here has non-NULL DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that doesn't fit into shwi would ICE similarly). While it is arguably a FE bug that it creates for VLA initialization from STRING_CST such a decl, I believe we have some PRs about it already open. I think it won't hurt to check for the large sizes properly even in this function though, and punt on unexpected cases, or even extremely large ones. Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going to test on arm and/or aarch64. Ok for trunk if the testing there passes too? 2016-11-10 Jakub Jelinek PR middle-end/78201 * varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo. Don't test decl != NULL. Don't look at DECL_SIZE, but DECL_SIZE_UNIT instead, return false if it is NULL, or doesn't fit into uhwi, or is larger or equal to targetm.max_anchor_offset. * g++.dg/opt/pr78201.C: New test. --- gcc/varasm.c.jj 2016-10-31 13:28:12.0 +0100 +++ gcc/varasm.c2016-11-10 15:18:41.282886244 +0100 @@ -6804,11 +6804,12 @@ default_use_anchors_for_symbol_p (const_ return false; /* Don't use section anchors for decls that won't fit inside a single -anchor range to reduce the amount of instructions require to refer +anchor range to reduce the amount of instructions required to refer to the entire declaration. */ - if (decl && DECL_SIZE (decl) -&& tree_to_shwi (DECL_SIZE (decl)) - >= (targetm.max_anchor_offset * BITS_PER_UNIT)) + if (DECL_SIZE_UNIT (decl) == NULL_TREE + || !tree_fits_uhwi_p (DECL_SIZE_UNIT (decl)) + || (tree_to_uhwi (DECL_SIZE_UNIT (decl)) + >= (unsigned HOST_WIDE_INT) targetm.max_anchor_offset)) return false; } --- gcc/testsuite/g++.dg/opt/pr78201.C.jj 2016-11-10 15:20:18.398660681 +0100 +++ gcc/testsuite/g++.dg/opt/pr78201.C 2016-11-10 15:19:58.0 +0100 @@ -0,0 +1,13 @@ +// PR middle-end/78201 +// { dg-do compile } +// { dg-options "-O2" } + +struct B { long d (); } *c; +long e; + +void +foo () +{ + char a[e] = ""; + c && c->d(); +} Jakub
[committed] Bump fortran _OPENMP macro and openmp_version
Hi! I think it is better to announce 4.5 than 4.0 at the current state. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2016-11-10 Jakub Jelinek gcc/fortran/ * cpp.c (cpp_define_builtins): Define _OPENMP to 201511 instead of 201307. * gfortran.texi: Mention partial OpenMP 4.5 support. * intrinsic.texi: Update for OpenMP 4.5. gcc/testsuite/ * gfortran.dg/openmp-define-3.f90: Expect 201511 instead of 201307. libgomp/ * omp_lib.f90.in (openmp_version): Change to 201511 from 201307. * omp_lib.h.in (openmp_version): Likewise. * testsuite/libgomp.fortran/openmp_version-1.f: Expect 201511 instead of 201307. * testsuite/libgomp.fortran/openmp_version-2.f90: Likewise. --- gcc/fortran/cpp.c.jj2016-01-04 14:55:59.0 +0100 +++ gcc/fortran/cpp.c 2016-11-10 12:59:33.829477276 +0100 @@ -168,7 +168,7 @@ cpp_define_builtins (cpp_reader *pfile) cpp_define (pfile, "_OPENACC=201306"); if (flag_openmp) -cpp_define (pfile, "_OPENMP=201307"); +cpp_define (pfile, "_OPENMP=201511"); /* The defines below are necessary for the TARGET_* macros. --- gcc/fortran/gfortran.texi.jj2016-11-03 22:03:29.0 +0100 +++ gcc/fortran/gfortran.texi 2016-11-10 13:02:11.380466602 +0100 @@ -536,7 +536,8 @@ The current status of the support is can and @ref{TS 18508 status} sections of the documentation. Additionally, the GNU Fortran compilers supports the OpenMP specification -(version 4.0, @url{http://openmp.org/@/wp/@/openmp-specifications/}). +(version 4.0 and most of the features of the 4.5 version, +@url{http://openmp.org/@/wp/@/openmp-specifications/}). There also is initial support for the OpenACC specification (targeting version 2.0, @uref{http://www.openacc.org/}). Note that this is an experimental feature, incomplete, and subject to @@ -1999,7 +2000,7 @@ and environment variables that influence GNU Fortran strives to be compatible to the @uref{http://openmp.org/wp/openmp-specifications/, -OpenMP Application Program Interface v4.0}. +OpenMP Application Program Interface v4.5}. To enable the processing of the OpenMP directive @code{!$omp} in free-form source code; the @code{c$omp}, @code{*$omp} and @code{!$omp} --- gcc/fortran/intrinsic.texi.jj 2016-10-31 13:28:11.0 +0100 +++ gcc/fortran/intrinsic.texi 2016-11-10 13:02:57.463880355 +0100 @@ -14769,7 +14769,7 @@ with the following options: @code{-fno-u @section OpenMP Modules @code{OMP_LIB} and @code{OMP_LIB_KINDS} @table @asis @item @emph{Standard}: -OpenMP Application Program Interface v4.0 +OpenMP Application Program Interface v4.5 @end table @@ -14783,8 +14783,8 @@ the named constants defined in the modul below. For details refer to the actual -@uref{http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf, -OpenMP Application Program Interface v4.0}. +@uref{http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf, +OpenMP Application Program Interface v4.5}. @code{OMP_LIB_KINDS} provides the following scalar default-integer named constants: @@ -14799,7 +14799,7 @@ named constants: @code{OMP_LIB} provides the scalar default-integer named constant @code{openmp_version} with a value of the form @var{mm}, where @code{} is the year and @var{mm} the month -of the OpenMP version; for OpenMP v4.0 the value is @code{201307}. +of the OpenMP version; for OpenMP v4.5 the value is @code{201511}. The following scalar integer named constants of the kind @code{omp_sched_kind}: --- gcc/testsuite/gfortran.dg/openmp-define-3.f90.jj2014-06-18 09:11:57.0 +0200 +++ gcc/testsuite/gfortran.dg/openmp-define-3.f90 2016-11-10 13:04:12.381927290 +0100 @@ -6,6 +6,6 @@ # error _OPENMP not defined #endif -#if _OPENMP != 201307 +#if _OPENMP != 201511 # error _OPENMP defined to wrong value #endif --- libgomp/omp_lib.f90.in.jj 2016-01-04 14:38:59.0 +0100 +++ libgomp/omp_lib.f90.in 2016-11-10 13:04:30.704694198 +0100 @@ -59,7 +59,7 @@ module omp_lib use omp_lib_kinds implicit none -integer, parameter :: openmp_version = 201307 +integer, parameter :: openmp_version = 201511 interface subroutine omp_init_lock (svar) --- libgomp/omp_lib.h.in.jj 2016-01-04 14:38:59.0 +0100 +++ libgomp/omp_lib.h.in2016-11-10 13:04:43.363533159 +0100 @@ -58,7 +58,7 @@ parameter (omp_lock_hint_contended = 2) parameter (omp_lock_hint_nonspeculative = 4) parameter (omp_lock_hint_speculative = 8) - parameter (openmp_version = 201307) + parameter (openmp_version = 201511) external omp_init_lock, omp_init_nest_lock external omp_init_lock_with_hint --- libgomp/testsuite/libgomp.fortran/openmp_version-1.f.jj 2014-06-18 09:11:57.0 +0200 +++ libgomp/testsuite/libgomp.fortran/openmp_version-1.f2016-11-10 13:05:04.255267386 +0100 @@ -4,6
[PATCH][AArch64] Tweak Cortex-A57 vector cost
The existing vector costs stop some beneficial vectorization. This is mostly due to vector statement cost being set to 3 as well as vector loads having a higher cost than scalar loads. This means that even when we vectorize 4x, it is possible that the cost of a vectorized loop is similar to the scalar version, and we fail to vectorize. For example for a particular loop the costs for -mcpu=generic are: note: Cost model analysis: Vector inside of loop cost: 146 Vector prologue cost: 5 Vector epilogue cost: 0 Scalar iteration cost: 50 Scalar outside cost: 0 Vector outside cost: 5 prologue iterations: 0 epilogue iterations: 0 Calculated minimum iters for profitability: 1 note: Runtime profitability threshold = 3 note: Static estimate profitability threshold = 3 note: loop vectorized While -mcpu=cortex-a57 reports: note: Cost model analysis: Vector inside of loop cost: 294 Vector prologue cost: 15 Vector epilogue cost: 0 Scalar iteration cost: 74 Scalar outside cost: 0 Vector outside cost: 15 prologue iterations: 0 epilogue iterations: 0 Calculated minimum iters for profitability: 31 note: Runtime profitability threshold = 30 note: Static estimate profitability threshold = 30 note: not vectorized: vectorization not profitable. note: not vectorized: iteration count smaller than user specified loop bound parameter or minimum profitable iterations (whichever is more conservative). Using a cost of 3 for a vector operation suggests they are 3 times as expensive as scalar operations. Since most vector operations have a similar throughput as scalar operations, this is not correct. Using slightly lower values for these heuristics now allows this loop and many others to be vectorized. On a proprietary benchmark the gain from vectorizing this loop is around 15-30% which shows vectorizing it is indeed beneficial. ChangeLog: 2016-11-10 Wilco Dijkstra * config/aarch64/aarch64.c (cortexa57_vector_cost): Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 279a6dfaa4a9c306bc7a8dba9f4f53704f61fefe..cff2e8fc6e9309e6aa4f68a5aba3bfac3b737283 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -382,12 +382,12 @@ static const struct cpu_vector_cost cortexa57_vector_cost = 1, /* scalar_stmt_cost */ 4, /* scalar_load_cost */ 1, /* scalar_store_cost */ - 3, /* vec_stmt_cost */ + 2, /* vec_stmt_cost */ 3, /* vec_permute_cost */ 8, /* vec_to_scalar_cost */ 8, /* scalar_to_vec_cost */ - 5, /* vec_align_load_cost */ - 5, /* vec_unalign_load_cost */ + 4, /* vec_align_load_cost */ + 4, /* vec_unalign_load_cost */ 1, /* vec_unalign_store_cost */ 1, /* vec_store_cost */ 1, /* cond_taken_branch_cost */
gomp-nvptx branch status
Hello, I'd like to provide an overview of the gomp-nvptx branch status. In response to this message I'll send two more emails, with libgomp and middle-end changes on the branch. Some of the changes to libgomp such as build machinery adaptations have already received substantial comments in 2015, but the middle-end stuff is mostly unreviewed I believe. Middle-end changes mostly amount to adding SIMD-to-SIMT transforms in omp-low.c, as shown on the Cauldron. SIMT outlining via gimplifier abuse is not there, and neither is cloning of SIMD/SIMT loops. Outlining is required for correctness, and cloning is useful as it allows to avoid intermixing SIMD+SIMT and thus be sure that SIMT lowering does not 'dirty' SIMD loops and regress host/MIC vectorization. I could argue that it's possible to improve my SIMT lowering to avoid some dirtying (like moving loop-invariant calls to GOMP_SIMT_VF()), but the need for outlining makes that moot anyway, I think. To get great performance this will need further changes everywhere, including in target-independent code, due to accidents like this bug (which I'd like to ping given the topic): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68706 With OpenMP/PTX offloading there are 5 additional failures in check-target-libgomp: Two due to tests using 'usleep' in a target region: FAIL: libgomp.c/target-32.c (test for excess errors) FAIL: libgomp.c/thread-limit-2.c (test for excess errors) Two with 'target nowait' (not implemented) FAIL: libgomp.c/target-33.c execution test FAIL: libgomp.c/target-34.c execution test One with 'target link' (not implemented) FAIL: libgomp.c/target-link-1.c (test for excess errors) Eventually these can be fixed by implementing the two missing OpenMP 4.5 features; for the 'usleep' issues, while I think it's not good to have tests with that, eventually I'd like to provide a port of musl libc for PTX which would also provide usleep (either a no-op stub, or based on a busy loop). Short term, it should be possible to implement something like -foffload=^nvptx to skip PTX (and only PTX) offloading on those tests. Thanks. Alexander
[PATCH 1/2][AArch64] Add bfx attribute
Currently the SBFM, UBFM and BFM instructions all use the attribute "bfm". SBFM and UBFM include all shifts on AArch64, which are simpler than bitfield insert. Add a new bfx attribute for these instructions so that they can be modelled more accurately in the future. There is no difference in code generation. ChangeLog: 2016-11-10 Wilco Dijkstra * config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3) Use bfx attribute. (aarch64_lshr_sisd_or_int_3): Likewise. (aarch64_ashr_sisd_or_int_3): Likewise. (si3_insn_uxtw): Likewise. (3_insn): Likewise. (_ashl): Likewise. (zero_extend_lshr): Likewise. (extend_ashr): Likewise. (): Likewise. (insv): Likewise. (andim_ashift_bfiz): Likewise. * config/aarch64/thunderx.md (thunderx_shift): Add bfx. * config/arm/cortex-a53.md (cortex_a53_alu_shift): Likewise. * config/arm/cortex-a57.md (cortex_a57_alu): Add bfx. * config/arm/exynos-m1.md (exynos_m1_alu): Add bfx. (exynos_m1_alu_p): Likewise. * config/arm/types.md: Add bfx. * config/arm/xgene1.md (xgene1_bfm): Add bfx. -- diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 62eda569f9b642ac569a61718d7debf7eae1b59e..afd463602af4c3f19db8f8cc834aa8cf0b78867e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3955,7 +3955,7 @@ shl\t%0, %1, %2 ushl\t%0, %1, %2" [(set_attr "simd" "no,no,yes,yes") - (set_attr "type" "bfm,shift_reg,neon_shift_imm, neon_shift_reg")] + (set_attr "type" "bfx,shift_reg,neon_shift_imm, neon_shift_reg")] ) ;; Logical right shift using SISD or Integer instruction @@ -3972,7 +3972,7 @@ # #" [(set_attr "simd" "no,no,yes,yes,yes") - (set_attr "type" "bfm,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")] + (set_attr "type" "bfx,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")] ) (define_split @@ -4019,7 +4019,7 @@ # #" [(set_attr "simd" "no,no,yes,yes,yes") - (set_attr "type" "bfm,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")] + (set_attr "type" "bfx,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")] ) (define_split @@ -4129,7 +4129,7 @@ "@ \\t%w0, %w1, %2 \\t%w0, %w1, %w2" - [(set_attr "type" "bfm,shift_reg")] + [(set_attr "type" "bfx,shift_reg")] ) (define_insn "*3_insn" @@ -4141,7 +4141,7 @@ operands[3] = GEN_INT ( - UINTVAL (operands[2])); return "\t%w0, %w1, %2, %3"; } - [(set_attr "type" "bfm")] + [(set_attr "type" "bfx")] ) (define_insn "*extr5_insn" @@ -4234,7 +4234,7 @@ operands[3] = GEN_INT ( - UINTVAL (operands[2])); return "bfiz\t%0, %1, %2, %3"; } - [(set_attr "type" "bfm")] + [(set_attr "type" "bfx")] ) (define_insn "*zero_extend_lshr" @@ -4247,7 +4247,7 @@ operands[3] = GEN_INT ( - UINTVAL (operands[2])); return "ubfx\t%0, %1, %2, %3"; } - [(set_attr "type" "bfm")] + [(set_attr "type" "bfx")] ) (define_insn "*extend_ashr" @@ -4260,7 +4260,7 @@ operands[3] = GEN_INT ( - UINTVAL (operands[2])); return "sbfx\\t%0, %1, %2, %3"; } - [(set_attr "type" "bfm")] + [(set_attr "type" "bfx")] ) ;; --- @@ -4283,7 +4283,7 @@ (match_operand 3 "const_int_operand" "n")))] "" "bfx\\t%0, %1, %3, %2" - [(set_attr "type" "bfm")] + [(set_attr "type" "bfx")] ) ;; Bitfield Insert (insv) @@ -4365,7 +4365,7 @@ : GEN_INT ( - UINTVAL (operands[2])); return "bfiz\t%0, %1, %2, %3"; } - [(set_attr "type" "bfm")] + [(set_attr "type" "bfx")] ) ;; XXX We should match (any_extend (ashift)) here, like (and (ashift)) below @@ -4377,7 +4377,7 @@ (match_operand 3 "const_int_operand" "n")))] "aarch64_mask_and_shift_for_ubfiz_p (mode, operands[3], operands[2])" "ubfiz\\t%0, %1, %2, %P3" - [(set_attr "type" "bfm")] + [(set_attr "type" "bfx")] ) (define_insn "bswap2" diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md index 058713a2ad98a364d36a3faaf0e93c39cb89adbc..7c1c28b0498cfe0129e3f0de7e29e31536fe421a 100644 --- a/gcc/config/aarch64/thunderx.md +++ b/gcc/config/aarch64/thunderx.md @@ -39,7 +39,7 @@ (define_insn_reservation "thunderx_shift" 1 (and (eq_attr "tune" "thunderx") - (eq_attr "type" "bfm,extend,rotate_imm,shift_imm,shift_reg,rbit,rev")) + (eq_attr "type" "bfm,bfx,extend,rotate_imm,shift_imm,shift_reg,rbit,rev")) "thunderx_pipe0 | thunderx_pipe1") diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md index 70c0f4daabe0ccb8e32808f1af51f5460e087a18..eb6d0b04976aaf441dd95cc43d02918226e75387 100644 --- a/gcc/config/arm/cortex-a53.md +++ b/gcc/config/arm/cortex-a53.md @@ -93,7 +93,7 @@ (and (eq_attr "tune" "cortexa53") (eq_attr "type" "alu_shift_imm,alus_shift_imm, crc,logic_shift_imm,logics_
A RA patch necessary for new Intel insns generation
Hi, the following patch is necessary for generation of new Intel insns requiring 4 aligned zmm regs. Committed as rev. 242043. Index: ChangeLog === --- ChangeLog (revision 242040) +++ ChangeLog (working copy) @@ -1,3 +1,12 @@ +2016-11-10 Vladimir Makarov + + * target.def (additional_allocno_class_p): New. + * hooks.h (hook_bool_reg_class_t_false): New prototype. + * hooks.c (hook_bool_reg_class_t_false): New. + * ira.c (setup_allocno_and_important_classes): Use the new hook. + * doc/tm.texi.in (TARGET_ADDITIONAL_ALLOCNO_CLASS_P): Add it. + * doc/tm.texi: Update. + 2016-11-10 Jason Merrill * gengtype.c (new_structure): Append to structures list. Index: hooks.h === --- hooks.h (revision 242040) +++ hooks.h (working copy) @@ -55,6 +55,7 @@ extern bool hook_bool_rtx_insn_true (rtx extern bool hook_bool_rtx_false (rtx); extern bool hook_bool_rtx_insn_int_false (rtx_insn *, int); extern bool hook_bool_uintp_uintp_false (unsigned int *, unsigned int *); +extern bool hook_bool_reg_class_t_false (reg_class_t regclass); extern bool hook_bool_rtx_mode_int_int_intp_bool_false (rtx, machine_mode, int, int, int *, bool); extern bool hook_bool_tree_tree_false (tree, tree); Index: hooks.c === --- hooks.c (revision 242040) +++ hooks.c (working copy) @@ -466,3 +466,11 @@ hook_bool_uint_uintp_false (unsigned int { return false; } + +/* Generic hook that takes a register class and returns false. */ +bool +hook_bool_reg_class_t_false (reg_class_t regclass ATTRIBUTE_UNUSED) +{ + return false; +} + Index: target.def === --- target.def (revision 242040) +++ target.def (working copy) @@ -5029,6 +5029,18 @@ DEFHOOK reg_class_t, (reg_class_t, machine_mode), NULL) +/* Determine an additional allocno class. */ +DEFHOOK +(additional_allocno_class_p, + "This hook should return @code{true} if given class of registers should\ + be an allocno class in any way. Usually RA uses only one register\ + class from all classes containing the same register set. In some\ + complicated cases, you need to have two or more such classes as\ + allocno ones for RA correct work. Not defining this hook is\ + equivalent to returning @code{false} for all inputs.", + bool, (reg_class_t), + hook_bool_reg_class_t_false) + DEFHOOK (cstore_mode, "This hook defines the machine mode to use for the boolean result of\ Index: ira.c === --- ira.c (revision 242040) +++ ira.c (working copy) @@ -1012,7 +1012,7 @@ setup_allocno_and_important_classes (voi temp_hard_regset2)) break; } - if (j >= n) + if (j >= n || targetm.additional_allocno_class_p (i)) classes[n++] = (enum reg_class) i; else if (i == GENERAL_REGS) /* Prefer general regs. For i386 example, it means that Index: doc/tm.texi.in === --- doc/tm.texi.in (revision 242040) +++ doc/tm.texi.in (working copy) @@ -2507,6 +2507,8 @@ value that the middle-end intended. @hook TARGET_SPILL_CLASS +@hook TARGET_ADDITIONAL_ALLOCNO_CLASS_P + @hook TARGET_CSTORE_MODE @hook TARGET_COMPUTE_PRESSURE_CLASSES Index: doc/tm.texi === --- doc/tm.texi (revision 242040) +++ doc/tm.texi (working copy) @@ -2899,6 +2899,10 @@ addressing. This hook defines a class of registers which could be used for spilling pseudos of the given mode and class, or @code{NO_REGS} if only memory should be used. Not defining this hook is equivalent to returning @code{NO_REGS} for all inputs. @end deftypefn +@deftypefn {Target Hook} bool TARGET_ADDITIONAL_ALLOCNO_CLASS_P (reg_class_t) +This hook should return @code{true} if given class of registers should be an allocno class in any way. Usually RA uses only one register class from all classes containing the same register set. In some complicated cases, you need to have two or more such classes as allocno ones for RA correct work. Not defining this hook is equivalent to returning @code{false} for all inputs. +@end deftypefn + @deftypefn {Target Hook} machine_mode TARGET_CSTORE_MODE (enum insn_code @var{icode}) This hook defines the machine mode to use for the boolean result of conditional store patterns. The ICODE argument is the instruction code for the cstore being performed. Not definiting this hook is the same as accepting the mode encoded into operand 0 of the cstore expander patterns. @end deftypefn
Re: gomp-nvptx branch - libgomp changes
libgomp/ * Makefile.am (libgomp_la_SOURCES): Add atomic.c, icv.c, icv-device.c. * Makefile.in. Regenerate. * configure.ac [nvptx*-*-*] (libgomp_use_pthreads): Set and use it... (LIBGOMP_USE_PTHREADS): ...here; new define. * configure: Regenerate. * config.h.in: Likewise. * config/posix/affinity.c: Move to... * affinity.c: ...here (new file). Guard use of PThreads-specific interface by LIBGOMP_USE_PTHREADS. * critical.c: Split out GOMP_atomic_{start,end} into... * atomic.c: ...here (new file). * env.c: Split out ICV definitions into... * icv.c: ...here (new file) and... * icv-device.c: ...here. New file. * config/linux/lock.c (gomp_init_lock_30): Move to generic lock.c. (gomp_destroy_lock_30): Ditto. (gomp_set_lock_30): Ditto. (gomp_unset_lock_30): Ditto. (gomp_test_lock_30): Ditto. (gomp_init_nest_lock_30): Ditto. (gomp_destroy_nest_lock_30): Ditto. (gomp_set_nest_lock_30): Ditto. (gomp_unset_nest_lock_30): Ditto. (gomp_test_nest_lock_30): Ditto. * lock.c: New. * config/nvptx/lock.c: New. * config/nvptx/bar.c: New. * config/nvptx/bar.h: New. * config/nvptx/doacross.h: New. * config/nvptx/error.c: New. * config/nvptx/icv-device.c: New. * config/nvptx/mutex.h: New. * config/nvptx/pool.h: New. * config/nvptx/proc.c: New. * config/nvptx/ptrlock.h: New. * config/nvptx/sem.h: New. * config/nvptx/simple-bar.h: New. * config/nvptx/target.c: New. * config/nvptx/task.c: New. * config/nvptx/team.c: New. * config/nvptx/time.c: New. * config/posix/simple-bar.h: New. * libgomp.h: Guard pthread.h inclusion. Include simple-bar.h. (gomp_num_teams_var): Declare. (struct gomp_thread_pool): Change threads_dock member to gomp_simple_barrier_t. [__nvptx__] (gomp_thread): New implementation. (gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS. (gomp_thread_destructor): Ditto. (gomp_init_thread_affinity): Ditto. * team.c: Guard uses of PThreads-specific interfaces by LIBGOMP_USE_PTHREADS. Adjust all uses of threads_dock. (gomp_free_thread) [__nvptx__]: Do not call 'free'. * config/nvptx/alloc.c: Delete. * config/nvptx/barrier.c: Ditto. * config/nvptx/fortran.c: Ditto. * config/nvptx/iter.c: Ditto. * config/nvptx/iter_ull.c: Ditto. * config/nvptx/loop.c: Ditto. * config/nvptx/loop_ull.c: Ditto. * config/nvptx/ordered.c: Ditto. * config/nvptx/parallel.c: Ditto. * config/nvptx/section.c: Ditto. * config/nvptx/single.c: Ditto. * config/nvptx/splay-tree.c: Ditto. * config/nvptx/work.c: Ditto. * testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass -foffload=-lgfortran in addition to -lgfortran. * testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto. * plugin/plugin-nvptx.c: Include . (struct targ_fn_descriptor): Add new fields. (struct ptx_device): Ditto. Set them... (nvptx_open_device): ...here. (nvptx_adjust_launch_bounds): New. (nvptx_host2dev): Allow NULL 'nvthd'. (nvptx_dev2host): Ditto. (GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400. (link_ptx): Adjust log sizes. (nvptx_host2dev): Allow NULL 'nvthd'. (nvptx_dev2host): Ditto. (nvptx_set_clocktick): New. Use it... (GOMP_OFFLOAD_load_image): ...here. Set new targ_fn_descriptor fields. (GOMP_OFFLOAD_dev2dev): New. (nvptx_adjust_launch_bounds): New. (nvptx_stacks_size): New. (nvptx_stacks_alloc): New. (nvptx_stacks_free): New. (GOMP_OFFLOAD_run): New. (GOMP_OFFLOAD_async_run): New (stub). diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am index a3e1c2b..4090336 100644 --- a/libgomp/Makefile.am +++ b/libgomp/Makefile.am @@ -58,12 +58,12 @@ libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \ libgomp_la_DEPENDENCIES = $(libgomp_version_dep) libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS) -libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \ - iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \ - task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \ - time.c fortran.c affinity.c target.c splay-tree.c libgomp-plugin.c \ - oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \ - oacc-plugin.c oacc-cuda.c priority_queue.c +libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \ + icv.c icv-device.c iter.c iter_ull.c loop.c loop_ull.c ordered.c \ + parallel.c sections.c single.c task.c team.
Re: gomp-nvptx branch - middle-end changes
gcc/ * internal-fn.c (expand_GOMP_SIMT_LANE): New. (expand_GOMP_SIMT_VF): New. (expand_GOMP_SIMT_LAST_LANE): New. (expand_GOMP_SIMT_ORDERED_PRED): New. (expand_GOMP_SIMT_VOTE_ANY): New. (expand_GOMP_SIMT_XCHG_BFLY): New. (expand_GOMP_SIMT_XCHG_IDX): New. * internal-fn.def (GOMP_SIMT_LANE): New. (GOMP_SIMT_VF): New. (GOMP_SIMT_LAST_LANE): New. (GOMP_SIMT_ORDERED_PRED): New. (GOMP_SIMT_VOTE_ANY): New. (GOMP_SIMT_XCHG_BFLY): New. (GOMP_SIMT_XCHG_IDX): New. * omp-low.c (omp_maybe_offloaded_ctx): New, outlined from... (create_omp_child_function): ...here. Set "omp target entrypoint" or "omp declare target" attribute based on is_gimple_omp_offloaded. (omp_max_simt_vf): New. Use it... (omp_max_vf): ...here. (lower_rec_input_clauses): Add reduction lowering for SIMT execution. (lower_lastprivate_clauses): Likewise, for "lastprivate" lowering. (lower_omp_ordered): Likewise, for "ordered" lowering. (expand_omp_simd): Add SIMT transforms. (pass_data_lower_omp): Add PROP_gimple_lomp_dev. (execute_omp_device_lower): New. (pass_data_omp_device_lower): New. (pass_omp_device_lower): New pass. (make_pass_omp_device_lower): New. * passes.def (pass_omp_device_lower): Position new pass. * tree-pass.h (PROP_gimple_lomp_dev): Define. (make_pass_omp_device_lower): Declare. diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index cbee97e..fd1cd8b 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -157,6 +157,132 @@ expand_ANNOTATE (internal_fn, gcall *) gcc_unreachable (); } +/* Lane index on SIMT targets: thread index in the warp on NVPTX. On targets + without SIMT execution this should be expanded in omp_device_lower pass. */ + +static void +expand_GOMP_SIMT_LANE (internal_fn, gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + if (!lhs) +return; + + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + gcc_assert (targetm.have_omp_simt_lane ()); + emit_insn (targetm.gen_omp_simt_lane (target)); +} + +/* This should get expanded in omp_device_lower pass. */ + +static void +expand_GOMP_SIMT_VF (internal_fn, gcall *) +{ + gcc_unreachable (); +} + +/* Lane index of the first SIMT lane that supplies a non-zero argument. + This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the + lane that executed the last iteration for handling OpenMP lastprivate. */ + +static void +expand_GOMP_SIMT_LAST_LANE (internal_fn, gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + if (!lhs) +return; + + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + rtx cond = expand_normal (gimple_call_arg (stmt, 0)); + machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); + struct expand_operand ops[2]; + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], cond, mode); + gcc_assert (targetm.have_omp_simt_last_lane ()); + expand_insn (targetm.code_for_omp_simt_last_lane, 2, ops); +} + +/* Non-transparent predicate used in SIMT lowering of OpenMP "ordered". */ + +static void +expand_GOMP_SIMT_ORDERED_PRED (internal_fn, gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + if (!lhs) +return; + + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + rtx ctr = expand_normal (gimple_call_arg (stmt, 0)); + machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); + struct expand_operand ops[2]; + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], ctr, mode); + gcc_assert (targetm.have_omp_simt_ordered ()); + expand_insn (targetm.code_for_omp_simt_ordered, 2, ops); +} + +/* "Or" boolean reduction across SIMT lanes: return non-zero in all lanes if + any lane supplies a non-zero argument. */ + +static void +expand_GOMP_SIMT_VOTE_ANY (internal_fn, gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + if (!lhs) +return; + + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + rtx cond = expand_normal (gimple_call_arg (stmt, 0)); + machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); + struct expand_operand ops[2]; + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], cond, mode); + gcc_assert (targetm.have_omp_simt_vote_any ()); + expand_insn (targetm.code_for_omp_simt_vote_any, 2, ops); +} + +/* Exchange between SIMT lanes with a "butterfly" pattern: source lane index + is destination lane index XOR given offset. */ + +static void +expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + if (!lhs) +return; + + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + rtx src = expand_normal (gimple_call_arg (stmt, 0)); + rtx idx = expand_normal (gimple_call_arg (stmt, 1)); + machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); + struct expand_op
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On 11/10/2016 11:27 AM, Andrew Senkevich wrote: Hi, this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions. It requires additional patch for register allocator from Vladimir Makarov to be committed before. I've just committed the necessary patch.
[PATCH 2/2][AArch64] Add bfx attribute
The second patch updates the Cortex-A57 scheduler now that we can differentiate between shifts and bitfield inserts. The Cortex-A57 Software Optimization Guide indicates that BFM operations use the integer multi-cycle pipeline, while ARM UXTB/H instructions use the Integer 1 or Integer 0 pipelines, so swap the bfm and extend reservations. This results in minor scheduling differences. I think the XGene-1 scheduler might need a similar change as currently all AArch64 shifts are modelled as 2-cycle operations. ChangeLog: 2016-11-10 Wilco Dijkstra * config/arm/cortex-a57.md (cortex_a57_alu): Move extend here, move bfm... (cortex_a57_alu_shift): ...here. -- diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md index da461846baa5b28ce3d9c9f731dbfd7becb31a85..63072509e50375929f75c44af900a4803a6285f3 100644 --- a/gcc/config/arm/cortex-a57.md +++ b/gcc/config/arm/cortex-a57.md @@ -297,7 +297,7 @@ (eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,\ alu_sreg,alus_sreg,logic_reg,logics_reg,\ adc_imm,adcs_imm,adc_reg,adcs_reg,\ - adr,bfm,bfx,clz,rbit,rev,alu_dsp_reg,\ + adr,bfx,extend,clz,rbit,rev,alu_dsp_reg,\ rotate_imm,shift_imm,shift_reg,\ mov_imm,mov_reg,\ mvn_imm,mvn_reg,\ @@ -307,7 +307,7 @@ ;; ALU ops with immediate shift (define_insn_reservation "cortex_a57_alu_shift" 3 (and (eq_attr "tune" "cortexa57") - (eq_attr "type" "extend,\ + (eq_attr "type" "bfm,\ alu_shift_imm,alus_shift_imm,\ crc,logic_shift_imm,logics_shift_imm,\ mov_shift,mvn_shift"))
[PATCH][AArch64] Improve TI mode address offsets
Improve TI mode address offsets - these may either use LDP of 64-bit or LDR of 128-bit, so we need to use the correct intersection of offsets. When splitting a large offset into base and offset, use a signed 9-bit unscaled offset. Remove the Ump constraint on movti and movtf instructions as this blocks the reload optimizer from merging address CSEs (is this supposed to work only on 'm' constraints?). The result is improved codesize, especially wrf and gamess in SPEC2006. int f (int x) { __int128_t arr[100]; arr[31] = 0; arr[48] = 0; arr[79] = 0; arr[65] = 0; arr[70] = 0; return arr[x]; } Before patch (note the multiple redundant add x1, sp, 1024): sub sp, sp, #1600 sbfiz x0, x0, 4, 32 add x1, sp, 256 stp xzr, xzr, [x1, 240] add x1, sp, 768 stp xzr, xzr, [x1] add x1, sp, 1024 stp xzr, xzr, [x1, 240] add x1, sp, 1024 stp xzr, xzr, [x1, 16] add x1, sp, 1024 stp xzr, xzr, [x1, 96] ldr w0, [sp, x0] add sp, sp, 1600 ret After patch: sub sp, sp, #1600 sbfiz x0, x0, 4, 32 add x1, sp, 1024 stp xzr, xzr, [sp, 496] stp xzr, xzr, [x1, -256] stp xzr, xzr, [x1, 240] stp xzr, xzr, [x1, 16] stp xzr, xzr, [x1, 96] ldr w0, [sp, x0] add sp, sp, 1600 ret Bootstrap & regress OK. ChangeLog: 2015-11-10 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (movti_aarch64): Change Ump to m. (movtf_aarch64): Likewise. * config/aarch64/aarch64.c (aarch64_classify_address): Use correct intersection of offsets. (aarch64_legitimize_address_displacement): Use 9-bit signed offsets. (aarch64_legitimize_address): Use 9-bit signed offsets for TI/TF mode. Use 7-bit signed scaled mode for modes > 16 bytes. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 3045e6d6447d5c1860feb51708eeb2a21d2caca9..45f44e96ba9e9d3c8c41d977aa509fa13398a8fd 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4066,7 +4066,8 @@ aarch64_classify_address (struct aarch64_address_info *info, instruction memory accesses. */ if (mode == TImode || mode == TFmode) return (aarch64_offset_7bit_signed_scaled_p (DImode, offset) - && offset_9bit_signed_unscaled_p (mode, offset)); + && (offset_9bit_signed_unscaled_p (mode, offset) + || offset_12bit_unsigned_scaled_p (mode, offset))); /* A 7bit offset check because OImode will emit a ldp/stp instruction (only big endian will get here). @@ -4270,18 +4271,19 @@ aarch64_legitimate_address_p (machine_mode mode, rtx x, /* Split an out-of-range address displacement into a base and offset. Use 4KB range for 1- and 2-byte accesses and a 16KB range otherwise to increase opportunities for sharing the base address of different sizes. - For TI/TFmode and unaligned accesses use a 256-byte range. */ + For unaligned accesses and TI/TF mode use the signed 9-bit range. */ static bool aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode mode) { - HOST_WIDE_INT mask = GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3fff; + HOST_WIDE_INT offset = INTVAL (*disp); + HOST_WIDE_INT base = offset & ~(GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3ffc); - if (mode == TImode || mode == TFmode || - (INTVAL (*disp) & (GET_MODE_SIZE (mode) - 1)) != 0) -mask = 0xff; + if (mode == TImode || mode == TFmode + || (offset & (GET_MODE_SIZE (mode) - 1)) != 0) +base = (offset + 0x100) & ~0x1ff; - *off = GEN_INT (INTVAL (*disp) & ~mask); - *disp = GEN_INT (INTVAL (*disp) & mask); + *off = GEN_INT (base); + *disp = GEN_INT (offset - base); return true; } @@ -5148,12 +5150,10 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode) x = gen_rtx_PLUS (Pmode, base, offset_rtx); } - /* Does it look like we'll need a load/store-pair operation? */ + /* Does it look like we'll need a 16-byte load/store-pair operation? */ HOST_WIDE_INT base_offset; - if (GET_MODE_SIZE (mode) > 16 - || mode == TImode) - base_offset = ((offset + 64 * GET_MODE_SIZE (mode)) - & ~((128 * GET_MODE_SIZE (mode)) - 1)); + if (GET_MODE_SIZE (mode) > 16) + base_offset = (offset + 0x400) & ~0x7f0; /* For offsets aren't a multiple of the access size, the limit is -256...255. */ else if (offset & (GET_MODE_SIZE (mode) - 1)) @@ -5167,6 +5167,8 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode) /* Small negative offsets are supported. */ else if (IN_RANGE (offset, -256, 0)) base_offset = 0; + else if (mode == TImod
Re: [PATCH 2/2][AArch64] Add bfx attribute
> On 10 Nov 2016, at 18:14, Wilco Dijkstra wrote: > > I think the XGene-1 scheduler might need a similar change as currently all > AArch64 > shifts are modelled as 2-cycle operations. Thanks for the heads-up. We’ll indeed need to update this. Regards, Philipp.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-10 19:36 GMT+03:00 Jakub Jelinek : > On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote: >> Hi, >> >> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions. >> >> It requires additional patch for register allocator from Vladimir >> Makarov to be committed before. > > Your MUA ate tabs (and in the ChangeLog you're using spaces instead of > tabs), can you repost as attachment or configure your MUA not to do this? > > Just a couple of random nits follow: > >> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps. > > This mentions an option that doesn't exist, is that s/dd// ? Yes. Attached fixed version. -- WBR, Andrew new_avx512_instructions.patch Description: Binary data
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-10 20:14 GMT+03:00 Vladimir N Makarov : > > > On 11/10/2016 11:27 AM, Andrew Senkevich wrote: >> >> Hi, >> >> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions. >> >> It requires additional patch for register allocator from Vladimir >> Makarov to be committed before. >> >> > I've just committed the necessary patch. Thanks, Vladimir. -- WBR, Andrew
[PATCH][ARM] Improve max_insns_skipped logic
Improve the logic when setting max_insns_skipped. Limit the maximum size of IT to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed, increasing codesize. Given 4 works well for Thumb-2, use the same limit for ARM for consistency. ChangeLog: 2016-11-04 Wilco Dijkstra * config/arm/arm.c (arm_option_params_internal): Improve setting of max_insns_skipped. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2901,20 +2901,12 @@ arm_option_params_internal (void) targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET; } - if (optimize_size) -{ - /* If optimizing for size, bump the number of instructions that we - are prepared to conditionally execute (even on a StrongARM). */ - max_insns_skipped = 6; + /* Increase the number of conditional instructions with -Os. */ + max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped; - /* For THUMB2, we limit the conditional sequence to one IT block. */ - if (TARGET_THUMB2) -max_insns_skipped = arm_restrict_it ? 1 : 4; -} - else -/* When -mrestrict-it is in use tone down the if-conversion. */ -max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it) - ? 1 : current_tune->max_insns_skipped; + /* For THUMB2, we limit the conditional sequence to one IT block. */ + if (TARGET_THUMB2) +max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK); } /* True if -mflip-thumb should next add an attribute for the default
Re: [PATCH] dwarf2cfi: Dump row differences before asserting
Ping. On Mon, Oct 31, 2016 at 05:15:53PM +, Segher Boessenkool wrote: > If maybe_record_trace_start fails because the CFI is inconsistent on two > paths into a block it currently just ICEs. This changes it to also dump > the CFI on those two paths in the dump file; debugging it without that > information is hopeless. > > Tested on powerpc64-linux {-m32,-m64}. Is this okay for trunk? > > > Segher > > > 2016-10-31 Segher Boessenkool > > * dwarf2cfi.c (dump_cfi_row): Add forward declaration. > (maybe_record_trace_start): If the CFI is different on the new and > old paths, print out both to the dump file before ICEing. > > --- > gcc/dwarf2cfi.c | 18 +- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c > index da9da52..19edc28 100644 > --- a/gcc/dwarf2cfi.c > +++ b/gcc/dwarf2cfi.c > @@ -2239,6 +2239,8 @@ add_cfis_to_fde (void) > } > } > > +static void dump_cfi_row (FILE *f, dw_cfi_row *row); > + > /* If LABEL is the start of a trace, then initialize the state of that > trace from CUR_TRACE and CUR_ROW. */ > > @@ -2282,7 +2284,21 @@ maybe_record_trace_start (rtx_insn *start, rtx_insn > *origin) >/* We ought to have the same state incoming to a given trace no >matter how we arrive at the trace. Anything else means we've >got some kind of optimization error. */ > - gcc_checking_assert (cfi_row_equal_p (cur_row, ti->beg_row)); > +#if CHECKING_P > + if (!cfi_row_equal_p (cur_row, ti->beg_row)) > + { > + if (dump_file) > + { > + fprintf (dump_file, "Inconsistent CFI state!\n"); > + fprintf (dump_file, "SHOULD have:\n"); > + dump_cfi_row (dump_file, ti->beg_row); > + fprintf (dump_file, "DO have:\n"); > + dump_cfi_row (dump_file, cur_row); > + } > + > + gcc_unreachable (); > + } > +#endif > >/* The args_size is allowed to conflict if it isn't actually used. */ >if (ti->beg_true_args_size != args_size) > -- > 1.9.3
Re: [PATCH] Introduce -fprofile-update=maybe-atomic
On 11/10/2016 08:24 AM, Martin Liška wrote: On 11/10/2016 05:17 PM, David Edelsohn wrote: Maybe instead of adding "maybe", we need to change the severity of the warning so that the warning is not emitted by default. Adding the warning option to -Wextra can be solution. Is it acceptable approach? I don't think that's good. Now I understand the -pthreads thing, we have different use cases. 1) user explicitly said -fprofile-update=FOO. They shouldn't have to enable something else to get a diagnostic that FOO doesn't work. 2) driver implicitly said -fprofile-update=FOO, because the user said -pthreads but the driver doesn't know if FOO is acceptable. We want to silently fallback to the old behaviour. The proposed solution addresses #2 by having the driver say -fprofile-update=META-FOO. My dislike is that we're exposing this to the user and they're going to start using it. That strikes me as undesirable. How hard is it to implement the fprofile-update option value as a list. I.e. '-fprofile-update=atomic,single', with semantics of 'pick the first one you can do'? If that's straightforwards, then that seems to me as a better solution for #2. [flyby-thought, have 'atomic,single' as an acceptable single option value?] Failing that, Martin's solution is probably the sanest available solution, but I'd like to rename 'maybe-atomic' to the more meaningful 'prefer-atomic'. With 'maybe-atomic', I'm left wondering if it looks at the phase of the moon. nathan -- Nathan Sidwell
Re: [PATCH] dwarf2cfi: Dump row differences before asserting
On Thu, Nov 10, 2016 at 11:29:12AM -0600, Segher Boessenkool wrote: > Ping. > > On Mon, Oct 31, 2016 at 05:15:53PM +, Segher Boessenkool wrote: > > If maybe_record_trace_start fails because the CFI is inconsistent on two > > paths into a block it currently just ICEs. This changes it to also dump > > the CFI on those two paths in the dump file; debugging it without that > > information is hopeless. > > > > Tested on powerpc64-linux {-m32,-m64}. Is this okay for trunk? > > > > > > Segher > > > > > > 2016-10-31 Segher Boessenkool > > > > * dwarf2cfi.c (dump_cfi_row): Add forward declaration. > > (maybe_record_trace_start): If the CFI is different on the new and > > old paths, print out both to the dump file before ICEing. Ok, thanks. Jakub
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On Fri, 28 Oct 2016, Richard Biener wrote: > +/* Parse a gimple expression. > + > + gimple-expression: > + gimple-unary-expression > + gimple-call-statement > + gimple-binary-expression > + gimple-assign-expression > + gimple-cast-expression I don't see any comments expanding what the syntax is for most of these constructs. > + if (c_parser_next_token_is (parser, CPP_EQ)) > +c_parser_consume_token (parser); That implies you're allowing an optional '=' at this point in the syntax. That doesn't seem to make sense to me; I'd expect you to do if (=) { process assignment; } else { other cases; } or similar. > + /* GIMPLE PHI expression. */ > + if (c_parser_next_token_is_keyword (parser, RID_PHI)) I don't see this mentioned in any of the syntax comments. > + struct { > +/* The expression at this stack level. */ > +struct c_expr expr; > +/* The operation on its left. */ > +enum tree_code op; > +/* The source location of this operation. */ > +location_t loc; > + } stack[2]; > + int sp; > + /* Location of the binary operator. */ > + location_t binary_loc = UNKNOWN_LOCATION; /* Quiet warning. */ > +#define POP\ This all looks like excess complexity. The syntax in the comment indicates that in GIMPLE, the operands of a binary expression are unary expressions. So nothing related to precedence is needed at all, and you shouldn't need this stack construct. -- Joseph S. Myers jos...@codesourcery.com
[gomp4] add support for derived types in ACC UPDATE
OpenACC 2.0a has limited support for fortran derived types. Basically, derived type variables are only supported in ACC UPDATE. Rather than adding generalized support for derived times in the gimplifier, this patch has the fortran FE pass both subarrays and arrays as void pointers with an appropriate OMP_CLAUSE_SIZE. ACC UPDATE is an executable directive, and the gimplifier would just end up pruning out all of the unnecessary supporting data clauses otherwise. As of right now, GCC still lacks support for non-contiguous subarray arguments to ACC UPDATE. I'm not sure why it was decided to let ACC UPDATE support non-contiguous subarrays, but it already is an oddball with its lone support for derived types. This patch has been committed to gomp-4_0-branch. Cesar 2016-11-10 Cesar Philippidis gcc/fortran/ * openmp.c (gfc_match_omp_variable_list): New allow_derived argument. (gfc_match_omp_map_clause): Update call to gfc_match_omp_variable_list. (gfc_match_omp_clauses): Update calls to gfc_match_omp_map_clause. (gfc_match_oacc_update): Update call to gfc_match_omp_clauses. (resolve_omp_clauses): Permit derived type variables in ACC UPDATE clauses. * trans-openmp.c (gfc_trans_omp_clauses_1): Lower derived type members. gcc/ * gimplify.c (gimplify_scan_omp_clauses): Update handling of ACC UPDATE variables. gcc/testsuite/ * gfortran.dg/goacc/derived-types.f90: New test. libgomp/ * testsuite/libgomp.oacc-fortran/update-2.f90: New test. diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 95885bc..0a9d137 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -216,7 +216,8 @@ static match gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list, bool allow_common, bool *end_colon = NULL, gfc_omp_namelist ***headp = NULL, - bool allow_sections = false) + bool allow_sections = false, + bool allow_derived = false) { gfc_omp_namelist *head, *tail, *p; locus old_loc, cur_loc; @@ -242,7 +243,8 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list, case MATCH_YES: gfc_expr *expr; expr = NULL; - if (allow_sections && gfc_peek_ascii_char () == '(') + if (allow_sections && gfc_peek_ascii_char () == '(' + || allow_derived && gfc_peek_ascii_char () == '%') { gfc_current_locus = cur_loc; m = gfc_match_variable (&expr, 0); @@ -634,10 +636,11 @@ cleanup: static bool gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op, - bool common_blocks) + bool common_blocks, bool allow_derived) { gfc_omp_namelist **head = NULL; - if (gfc_match_omp_variable_list ("", list, common_blocks, NULL, &head, true) + if (gfc_match_omp_variable_list ("", list, common_blocks, NULL, &head, true, + allow_derived) == MATCH_YES) { gfc_omp_namelist *n; @@ -655,7 +658,8 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op, static match gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask, uint64_t dtype_mask, bool first = true, - bool needs_space = true, bool openacc = false) + bool needs_space = true, bool openacc = false, + bool allow_derived = false) { gfc_omp_clauses *base_clauses, *c = gfc_get_omp_clauses (); locus old_loc; @@ -773,7 +777,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask, if ((mask & OMP_CLAUSE_COPY) && gfc_match ("copy ( ") == MATCH_YES && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP], - OMP_MAP_FORCE_TOFROM, openacc)) + OMP_MAP_FORCE_TOFROM, openacc, + allow_derived)) continue; if (mask & OMP_CLAUSE_COPYIN) { @@ -781,7 +786,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask, { if (gfc_match ("copyin ( ") == MATCH_YES && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP], - OMP_MAP_FORCE_TO, true)) + OMP_MAP_FORCE_TO, true, + allow_derived)) continue; } else if (gfc_match_omp_variable_list ("copyin (", @@ -792,7 +798,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask, if ((mask & OMP_CLAUSE_COPYOUT) && gfc_match ("copyout ( ") == MATCH_YES && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP], - OMP_MAP_FORCE_FROM, true)) + OMP_MAP_FORCE_FROM, true, + allow_derived)) continue; if ((mask & OMP_CLAUSE_COPYPRIVATE) && gfc_match_omp_variable_list ("copyprivate (", @@ -802,14 +809,15 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask, if ((mask & OMP_CLAUSE_CREATE) && gfc_match ("create ( ") == MATCH_YES && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP], - OMP_MAP_FORCE_ALLOC, true)) + OMP_MAP_FORCE_ALLOC, true, + allow_derived)) continue; break; case 'd': if ((mask & OMP_CLAUSE_DELETE) && gfc_match ("delete ( ") == MATCH_YES && gfc_match_omp_map_cl
Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers
On Thu, Nov 10, 2016 at 10:05:25AM -0600, Segher Boessenkool wrote: > Hi Mike, > > > I have built the spec 2006 CPU benchmark suite with these changes, and the > > power8 (ISA 2.07) code generation does not change. > > Very good to hear :-) > > Just some nits; okay for trunk with that fixed: > > > +(define_split > > + [(set (match_operand:EXTHI 0 "altivec_register_operand" "") > > + (sign_extend:EXTHI > > +(match_operand:HI 1 "indexed_or_indirect_operand" "")))] > > + "TARGET_P9_VECTOR && reload_completed" > > + [(set (match_dup 2) > > + (match_dup 1)) > > + (set (match_dup 0) > > + (sign_extend:EXTHI (match_dup 2)))] > > +{ > > + operands[2] = gen_rtx_REG (HImode, REGNO (operands[1])); > > +}) > > Please lose the default "" (here and elsewhere). Ok for the match_operands, but I discovered that match_scratch still needs the "". > > Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c > > ___ > > Modified: svn:mergeinfo > >Merged > > /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c:r241733-241924 > > > > > > Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c > > ___ > > Modified: svn:mergeinfo > >Merged > > /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c:r241733-241924 > > I don't know what this is? This is some artifact from svnmerge. It shows up a lot of times when I do a svn diff against the branch. Usually I try to delete it, but once in awhile it slips through. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[patch, doc] PR 37998 -- Unclear documentation of -fno-common
This patch is for a long-standing issue with the way -fno-common is documented. The specific bug report was about an ambiguous use of the word "This", but as I looked more at the way this option was described, I realized the whole thing was needlessly confusing because it didn't use the correct terminology -- the current text tries to use "declaration" to describe what the C standard calls a "tentative definition". Joseph, can you look this over with your language-lawyer hat on, before I commit it? I'm pretty sure I understand what this option does, but I want to be sure I'm explaining it correctly now. -Sandra 2016-11-09 Sandra Loosemore PR c/37998 gcc/ * doc/invoke.texi (Code Gen Options) [-fno-common]: Use correct terminology. Expand to remove ambiguity. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 241974) +++ gcc/doc/invoke.texi (working copy) @@ -11953,25 +11953,32 @@ Use it to conform to a non-default appli @item -fno-common @opindex fno-common -In C code, controls the placement of uninitialized global variables. -Unix C compilers have traditionally permitted multiple definitions of -such variables in different compilation units by placing the variables -in a common block. -This is the behavior specified by @option{-fcommon}, and is the default -for GCC on most targets. -On the other hand, this behavior is not required by ISO C, and on some -targets may carry a speed or code size penalty on variable references. -The @option{-fno-common} option specifies that the compiler should place -uninitialized global variables in the data section of the object file, -rather than generating them as common blocks. -This has the effect that if the same variable is declared -(without @code{extern}) in two different compilations, -you get a multiple-definition error when you link them. -In this case, you must compile with @option{-fcommon} instead. +@cindex tentative definitions +In C code, this option controls the placement of global variables +defined without an initializer, known as @dfn{tentative definitions} +in the C standard. Tentative definitions are distinct from declarations +of a variable with the @code{extern} keyword, which do not allocate storage. + +Unix C compilers have traditionally allocated storage for +uninitialized global variables in a common block. This allows the +linker to resolve all tentative definitions of the same variable +in different compilation units to the same object, or to a non-tentative +definition. +This is the behavior specified by @option{-fcommon}, and is the default for +GCC on most targets. +On the other hand, this behavior is not required by ISO +C, and on some targets may carry a speed or code size penalty on +variable references. + +The @option{-fno-common} option specifies that the compiler should instead +place uninitialized global variables in the data section of the object file. +This inhibits the merging of tentative definitions by the linker so +you get a multiple-definition error if the same +variable is defined in more than one compilation unit. Compiling with @option{-fno-common} is useful on targets for which it provides better performance, or if you wish to verify that the program will work on other systems that always treat uninitialized -variable declarations this way. +variable definitions this way. @item -fno-ident @opindex fno-ident
Re: [patch, doc] PR 37998 -- Unclear documentation of -fno-common
On Thu, 10 Nov 2016, Sandra Loosemore wrote: > This patch is for a long-standing issue with the way -fno-common is > documented. The specific bug report was about an ambiguous use of the word > "This", but as I looked more at the way this option was described, I realized > the whole thing was needlessly confusing because it didn't use the correct > terminology -- the current text tries to use "declaration" to describe what > the C standard calls a "tentative definition". > > Joseph, can you look this over with your language-lawyer hat on, before I > commit it? I'm pretty sure I understand what this option does, but I want to > be sure I'm explaining it correctly now. This looks correct to me. -- Joseph S. Myers jos...@codesourcery.com
[PATCH, gcc, wwwdocs] Document upcoming Qualcomm Falkor processor support
Hi, This patch documents the newly added flag in gcc 7 for the upcoming Qualcomm Falkor processor core. Siddhesh Index: htdocs/gcc-7/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v retrieving revision 1.24 diff -u -r1.24 changes.html --- htdocs/gcc-7/changes.html 9 Nov 2016 14:28:59 - 1.24 +++ htdocs/gcc-7/changes.html 10 Nov 2016 13:43:42 - @@ -296,6 +296,11 @@ -mcpu=vulcan and -mtune=vulcan options as well as the equivalent target attributes and pragmas. + + The Qualcomm Falkor processor is now supported via the + -mcpu=falkor and -mtune=falkor options as + well as the equivalent target attributes and pragmas. + ARM
Re: [PATCH] Fix PR71762
On Thu, 10 Nov 2016, Richard Biener wrote: The following fixes PR71762 via reverting the transforms of ~X & Y to X < Y and similar because when the bools they apply to are expanded to RTL undefined values are not reliably zero-extended and thus the transform is invalid. Ensuring the zero-extension is too costly IMHO and the proper fix is to move the transform to RTL where we can check known-zero-bits to validate validity or to fix GIMPLE not not have operations on types not matching their mode in precision. Can you explain why this particular transformation is special? We have a number of other optimizations that take advantage of the fact that bool is in [0, 1], even without looking at VRP, say for instance x != 0 -> x. Are we supposed to remove all of them? -- Marc Glisse
Re: [PATCH] Fix PR71762
On November 10, 2016 7:39:57 PM GMT+01:00, Marc Glisse wrote: >On Thu, 10 Nov 2016, Richard Biener wrote: > >> The following fixes PR71762 via reverting the transforms of >> ~X & Y to X < Y and similar because when the bools they apply to >> are expanded to RTL undefined values are not reliably zero-extended >> and thus the transform is invalid. Ensuring the zero-extension >> is too costly IMHO and the proper fix is to move the transform >> to RTL where we can check known-zero-bits to validate validity >> or to fix GIMPLE not not have operations on types not matching their >mode >> in precision. > >Can you explain why this particular transformation is special? We have >a >number of other optimizations that take advantage of the fact that bool >is >in [0, 1], even without looking at VRP, say for instance x != 0 -> x. >Are >we supposed to remove all of them? No. But UNDEFINED & 0 is well-defined Even if precision ends up being bigger. Undefined != 0 is never well-defined. Richard.
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On November 10, 2016 6:38:12 PM GMT+01:00, Joseph Myers wrote: >On Fri, 28 Oct 2016, Richard Biener wrote: > >> +/* Parse a gimple expression. >> + >> + gimple-expression: >> + gimple-unary-expression >> + gimple-call-statement >> + gimple-binary-expression >> + gimple-assign-expression >> + gimple-cast-expression > >I don't see any comments expanding what the syntax is for most of these > >constructs. > >> + if (c_parser_next_token_is (parser, CPP_EQ)) >> +c_parser_consume_token (parser); > >That implies you're allowing an optional '=' at this point in the >syntax. >That doesn't seem to make sense to me; I'd expect you to do if (=) { >process assignment; } else { other cases; } or similar. > >> + /* GIMPLE PHI expression. */ >> + if (c_parser_next_token_is_keyword (parser, RID_PHI)) > >I don't see this mentioned in any of the syntax comments. > >> + struct { >> +/* The expression at this stack level. */ >> +struct c_expr expr; >> +/* The operation on its left. */ >> +enum tree_code op; >> +/* The source location of this operation. */ >> +location_t loc; >> + } stack[2]; >> + int sp; >> + /* Location of the binary operator. */ >> + location_t binary_loc = UNKNOWN_LOCATION; /* Quiet warning. */ >> +#define POP \ > >This all looks like excess complexity. The syntax in the comment >indicates that in GIMPLE, the operands of a binary expression are unary > >expressions. So nothing related to precedence is needed at all, and >you >shouldn't need this stack construct. I'll address those comments. As you did not have any comments on the c-parser.[CH] parts does that mean you are fine with them? That is, does the above constitute a complete review of the patch? Thanks, Richard.
[PATCH, Fortran] PR78277: ICE in is_anonymous_component, at fortran/interface.c:450
All, The attached fixes an ICE-on-invalid-code, specifically due to invalid anonymous structure declarations, as seen in the attached test case. This also improves error handling in such cases- the anonymous structure body will continue to be parsed even if the variable-decl after the opening variable-type-decl is invalid. (Something similar could be done to improve regular structure declarations.) See the in-code comments and comments on the on the PR for an additional description Along with the first patch, I've attached another patch (struct_whitespace) containing whitespace-only changes to some dec-structure-related code; the poor formatting was introduced before I had my vim settings right. I intend to commit the two attached patches soon for trunk if nobody finds any issues with it. They both regtest on x86_64-redhat-linux of course. --- Fritz Reese > pr78277.diff From: Fritz O. Reese Date: Thu, 10 Nov 2016 13:36:54 -0500 Subject: [PATCH] Fix ICE and improve errors for invalid anonymous structure declarations. PR fortran/78277 * gcc/fortran/decl.c (gfc_match_data_decl): Gracefully handle bad anonymous structure declarations. PR fortran/78277 * gcc/testsuite/gfortran.dg/dec_structure_17.f90: New test. < > struct_whitespace.diff From: Fritz O. Reese Date: Thu, 10 Nov 2016 11:02:08 -0500 Subject: [PATCH] Fix some whitespace. gcc/fortran/ * decl.c (get_struct_decl, gfc_match_map, gfc_match_union): Fix whitespace. * interface.c (gfc_compare_union_types): Likewise. < diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 1272f1f..bf6bc24 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -4901,7 +4901,28 @@ ok: } if (!gfc_error_flag_test ()) -gfc_error ("Syntax error in data declaration at %C"); +{ + /* An anonymous structure declaration is unambiguous; if we matched one +according to gfc_match_structure_decl, we need to return MATCH_YES +here to avoid confusing the remaining matchers, even if there was an +error during variable_decl. We must flush any such errors. Note this +causes the parser to gracefully continue parsing the remaining input +as a structure body, which likely follows. */ + if (current_ts.type == BT_DERIVED && current_ts.u.derived + && gfc_fl_struct (current_ts.u.derived->attr.flavor)) + { + gfc_error_now ("Syntax error in anonymous structure declaration" +" at %C"); + /* Skip the bad variable_decl and line up for the start of the +structure body. */ + gfc_error_recovery (); + m = MATCH_YES; + goto cleanup; + } + + gfc_error ("Syntax error in data declaration at %C"); +} + m = MATCH_ERROR; gfc_free_data_all (gfc_current_ns); diff --git a/gcc/testsuite/gfortran.dg/dec_structure_17.f90 b/gcc/testsuite/gfortran.dg/dec_structure_17.f90 new file mode 100644 index 000..18d3193 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/dec_structure_17.f90 @@ -0,0 +1,27 @@ +! { dg-do compile } +! { dg-options "-fdec-structure" } +! +! PR fortran/78277 +! +! Fix ICE for invalid structure declaration code. +! + +subroutine sub1() + structure /s/ +structure t + integer i +end structure + end structure + record /s/ u + interface +subroutine sub0(u) + structure /s/ +structure t. ! { dg-error "Syntax error in anonymous structure decl" } + integer i +end structure + end structure + record /s/ u +end + end interface + call sub0(u) +end diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 0120ceb..1272f1f 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -8597,31 +8597,31 @@ get_struct_decl (const char *name, sym_flavor fl, locus *decl, match gfc_match_map (void) { -/* Counter used to give unique internal names to map structures. */ -static unsigned int gfc_map_id = 0; -char name[GFC_MAX_SYMBOL_LEN + 1]; -gfc_symbol *sym; -locus old_loc; + /* Counter used to give unique internal names to map structures. */ + static unsigned int gfc_map_id = 0; + char name[GFC_MAX_SYMBOL_LEN + 1]; + gfc_symbol *sym; + locus old_loc; -old_loc = gfc_current_locus; + old_loc = gfc_current_locus; -if (gfc_match_eos () != MATCH_YES) - { - gfc_error ("Junk after MAP statement at %C"); - gfc_current_locus = old_loc; - return MATCH_ERROR; - } + if (gfc_match_eos () != MATCH_YES) +{ + gfc_error ("Junk after MAP statement at %C"); + gfc_current_locus = old_loc; + return MATCH_ERROR; +} -/* Map blocks are anonymous so we make up unique names for the symbol table - which are invalid Fortran identifiers. */ -snprintf (name, GFC_MAX_SYMBOL_LEN + 1, "MM$%u", gfc_map_id++); + /* Map blocks are anonymous so we make up unique names for the symbol table +
Re: [ipa-vrp] ice in set_value_range
Hi David, Sorry about the breakage. I have already reverted this patch as this is causing bootstrap failures. I will test it on more targets before submitting this patch again. Thanks, Kugan On 11/11/16 00:25, David Edelsohn wrote: Kugan Is there a PR for this failure? It broke bootstrap on AIX as well and I only was able to track it to your patch last night. Thanks, David
Re: [PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)
Hi, On 10 November 2016 at 18:00, Jakub Jelinek wrote: > Hi! > > On arm/aarch64 we ICE because some decls that make it here has non-NULL > DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that > doesn't fit into shwi would ICE similarly). While it is arguably a FE bug > that it creates for VLA initialization from STRING_CST such a decl, > I believe we have some PRs about it already open. > I think it won't hurt to check for the large sizes properly even in this > function though, and punt on unexpected cases, or even extremely large ones. > > Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going > to test on arm and/or aarch64. Ok for trunk if the testing there passes > too? Bootstrapped and regtested on arm, aarch64, linux and bare targets without issue. Thanks, Yvan > 2016-11-10 Jakub Jelinek > > PR middle-end/78201 > * varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo. > Don't test decl != NULL. Don't look at DECL_SIZE, but DECL_SIZE_UNIT > instead, return false if it is NULL, or doesn't fit into uhwi, or > is larger or equal to targetm.max_anchor_offset. > > * g++.dg/opt/pr78201.C: New test. > > --- gcc/varasm.c.jj 2016-10-31 13:28:12.0 +0100 > +++ gcc/varasm.c2016-11-10 15:18:41.282886244 +0100 > @@ -6804,11 +6804,12 @@ default_use_anchors_for_symbol_p (const_ > return false; > >/* Don't use section anchors for decls that won't fit inside a single > -anchor range to reduce the amount of instructions require to refer > +anchor range to reduce the amount of instructions required to refer > to the entire declaration. */ > - if (decl && DECL_SIZE (decl) > -&& tree_to_shwi (DECL_SIZE (decl)) > - >= (targetm.max_anchor_offset * BITS_PER_UNIT)) > + if (DECL_SIZE_UNIT (decl) == NULL_TREE > + || !tree_fits_uhwi_p (DECL_SIZE_UNIT (decl)) > + || (tree_to_uhwi (DECL_SIZE_UNIT (decl)) > + >= (unsigned HOST_WIDE_INT) targetm.max_anchor_offset)) > return false; > > } > --- gcc/testsuite/g++.dg/opt/pr78201.C.jj 2016-11-10 15:20:18.398660681 > +0100 > +++ gcc/testsuite/g++.dg/opt/pr78201.C 2016-11-10 15:19:58.0 +0100 > @@ -0,0 +1,13 @@ > +// PR middle-end/78201 > +// { dg-do compile } > +// { dg-options "-O2" } > + > +struct B { long d (); } *c; > +long e; > + > +void > +foo () > +{ > + char a[e] = ""; > + c && c->d(); > +} > > Jakub
[gomp4] remove OMP_CLAUSE_DEVICE_RESIDENT
I've committed this patch to gomp-4_0-branch which removes OMP_CLAUSE_DEVICE_RESIDENT. This standalone clause is no longer necessary, and hasn't been for a while, because device_resident is treated as a data mapping type for OMP_CLAUSE_MAP, and not a clause itself. Cesar 2016-11-10 Cesar Philippidis gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses_1): Remove OMP_CLAUSE_DEVICE_RESIDENT. gcc/ * gimplify.c (gimplify_scan_omp_clauses): Remove OMP_CLAUSE_DEVICE_RESIDENT. (gimplify_adjust_omp_clauses): Likewise. * omp-low.c (scan_sharing_clauses): Likewise. * tree-core.h (enum omp_clause_code): Likewise. * tree-nested.c (convert_nonlocal_omp_clauses): (convert_local_omp_clauses): * tree-pretty-print.c (dump_omp_clause): Likewise. * tree.c (walk_tree_1): Likewise. diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 9924872..3c53414 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -1781,9 +1781,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses, case OMP_LIST_USE_DEVICE: clause_code = OMP_CLAUSE_USE_DEVICE_PTR; goto add_clause; - case OMP_LIST_DEVICE_RESIDENT: - clause_code = OMP_CLAUSE_DEVICE_RESIDENT; - goto add_clause; add_clause: omp_clauses diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 36c128b..9649fae 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -7594,10 +7594,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, remove = true; break; - case OMP_CLAUSE_DEVICE_RESIDENT: - remove = true; - break; - case OMP_CLAUSE_NOWAIT: case OMP_CLAUSE_ORDERED: case OMP_CLAUSE_UNTIED: @@ -8445,7 +8441,6 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p, case OMP_CLAUSE__CILK_FOR_COUNT_: case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WAIT: - case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_NUM_GANGS: case OMP_CLAUSE_NUM_WORKERS: diff --git a/gcc/omp-low.c b/gcc/omp-low.c index e7cb66c..142330c 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -2246,7 +2246,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx, break; case OMP_CLAUSE_BIND: - case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_NOHOST: case OMP_CLAUSE__CACHE_: default: @@ -2414,7 +2413,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx, break; case OMP_CLAUSE_BIND: - case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_NOHOST: case OMP_CLAUSE__CACHE_: default: diff --git a/gcc/tree-core.h b/gcc/tree-core.h index b07ff30..a3c4b18 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -313,9 +313,6 @@ enum omp_clause_code { #pragma acc cache (variable-list). */ OMP_CLAUSE__CACHE_, - /* OpenACC clause: device_resident (variable_list). */ - OMP_CLAUSE_DEVICE_RESIDENT, - /* OpenACC clause: gang [(gang-argument-list)]. Where gang-argument-list: [gang-argument-list, ] gang-argument diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c index ed77798..55f1f20 100644 --- a/gcc/tree-nested.c +++ b/gcc/tree-nested.c @@ -1215,7 +1215,6 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) gcc_unreachable (); case OMP_CLAUSE_BIND: - case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_NOHOST: default: gcc_unreachable (); @@ -1914,7 +1913,6 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) gcc_unreachable (); case OMP_CLAUSE_BIND: - case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_NOHOST: default: gcc_unreachable (); diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index f5970e1..f1af737 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -407,9 +407,6 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags) case OMP_CLAUSE__LOOPTEMP_: name = "_looptemp_"; goto print_remap; -case OMP_CLAUSE_DEVICE_RESIDENT: - name = "device_resident"; - goto print_remap; case OMP_CLAUSE_TO_DECLARE: name = "to"; goto print_remap; diff --git a/gcc/tree.c b/gcc/tree.c index f26e282..7c87612 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -281,7 +281,6 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_USE_DEVICE_PTR */ 1, /* OMP_CLAUSE_IS_DEVICE_PTR */ 2, /* OMP_CLAUSE__CACHE_ */ - 1, /* OMP_CLAUSE_DEVICE_RESIDENT */ 2, /* OMP_CLAUSE_GANG */ 1, /* OMP_CLAUSE_ASYNC */ 1, /* OMP_CLAUSE_WAIT */ @@ -356,7 +355,6 @@ const char * const omp_clause_code_name[] = "use_device_ptr", "is_device_ptr", "_cache_", - "device_resident", "gang", "async", "wait", @@ -11651,7 +11649,6 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data, WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1)); /* FALLTHRU */ - case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WAIT: case OMP_CLAUSE_WORKER:
Re: [RFC] Check number of uses in simplify_cond_using_ranges().
On Wed, Nov 09, 2016 at 03:46:38PM +0100, Richard Biener wrote: > On Wed, Nov 9, 2016 at 3:30 PM, Dominik Vogt wrote: > > Something like the attached patch? Robin and me have spent quite > > some time to figure out the new pattern. Two questions: > > > > 1) In the match expression you cannot just use SSA_NAME@0 because > >then the "case SSA_NAME:" is added to a switch for another > >pattern that already has that label. Thus we made that "proxy" > >predicate "ssa_name_p" that forces the code for the new pattern > >out of the old switch and into a separate code block. We > >couldn't figure out whether this joining of case labels is a > >feature in the matching language. So, is this the right way to > >deal with the conflicting labels? > > No, just do not match SSA_NAME. And instead of > > + (with { gimple *def_stmt = SSA_NAME_DEF_STMT (@0); } > + (if (is_gimple_assign (def_stmt) > + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))) > > you instead want to change the pattern to > > (simpify > (cmp (convert @0) INTEGER_CST@1) > > @0 will then be your innerop > > note that you can't use get_value_range but you have to use the > get_range_info interface instead. I suppose a helper function > somewhere that computes whether an expression fits a type > would be helpful (see expr_not_equal_to for sth related, > thus expr_fits_type_p (@0, TREE_TYPE (@1))) All right, I think we got that (new patch attached). > Likewise the overflow_infinity checks do not translate to match.pd > (or rahter the range info you get). Can you give us another hint here, please? The overflow check should probably go into expr_fits_type_p, but with only the min and max values from get_range_info, how can the overflow TREE_OVERFLOW_P flag be retrieved from @1, to duplicate the logic from is_{nega,posi}tive_overflow_infinity? Is it availably somewhere, or is it necessary to somehow re-calculate it from the expression? (This is really necessary so that cases like this don't start folding with the patch: -- signed char foo3uu (unsigned char a) { unsigned char d; unsigned long un; d = (a & 63) + 200; un = d; if (un >= 12) ubar(un); return d; } -- ) Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany >From d6f30094d892d86598f6af35cb3b99c258642e9b Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Wed, 2 Nov 2016 14:01:46 +0100 Subject: [PATCH] Convert folding in simplify_cond_using_ranges() to match.pd. In cases like this ac_4 = *p_3(D); bc_5 = ac_4 & 63; l_6 = (long int) bc_5; if (l_6 > 9) bar(l_6); the function would fold bc_5 into the condition, replacing l_6, but l_6 cannot be eliminated. We'd end up with bc_5 being used in two places and l_6 in one place without the chance to eliminate either. The patched code suppresses folding if bc_5 has only a single use (i.e. being cast to l_6), AND l_6 has more than one use. However, the patch does not catch the case where all uses of l_6 could be eliminated by the function, e.g. ... if (l_6 > 9) bar(); else if (l_6 > 5) foo(); --- gcc/fold-const.c | 64 ++ gcc/fold-const.h | 1 + gcc/match.pd | 28 gcc/tree-vrp.c | 66 4 files changed, 93 insertions(+), 66 deletions(-) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 593ea16..082ad46 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -9100,6 +9100,70 @@ expr_not_equal_to (tree t, const wide_int &w) } } +/* Return true if the value range of T is known not fit into TYPE. */ + +bool +expr_fits_type_p (tree t, tree type) +{ + wide_int min, max; + value_range_type rtype; + signop sign; + signop tsign; + bool has_range = false; + + if (!INTEGRAL_TYPE_P (type)) +return false; + tsign = TYPE_SIGN (type); + sign = TYPE_SIGN (TREE_TYPE (t)); + switch (TREE_CODE (t)) +{ +case INTEGER_CST: + min = t; + max = t; + has_range = true; + break; + +case SSA_NAME: + if (!INTEGRAL_TYPE_P (TREE_TYPE (t))) + return false; + rtype = get_range_info (t, &min, &max); + if (rtype == VR_RANGE) + has_range = true; + else if (rtype == VR_ANTI_RANGE) + { + /* As ANTI_RANGEs always wrap around, just check if T's whole type's +value range fits into TYPE. */ + min = TYPE_MIN_VALUE (TREE_TYPE (t)); + max = TYPE_MAX_VALUE (TREE_TYPE (t)); + has_range = true; + } + break; +} + if (has_range) +{ + if (sign == tsign) + { + if (wi::le_p (max, TYPE_MAX_VALUE (type), sign) + && wi::ge_p (min, TYPE_MIN_VALUE (type), sign)) + return true; + } + else if (sign == SIGNED && tsign == UNSIGNED) + { + if (wi::ge_p (min, 0, SIGNED) + && wi::le_p (max, TYPE_MAX_VALUE (type), UN
Re: Go patch committed: copy print code from Go 1.7 runtime
On Tue, Oct 18, 2016 at 1:06 AM, Uros Bizjak wrote: > >> This patch copies the code that implements the print and println >> predeclared functions from the Go 1.7 runtime. The compiler is >> changed to use the new names, and to call the printlock and >> printunlock functions around a sequence of print calls. The writebuf >> field in the g struct changes to a slice. Bootstrapped and ran Go >> testsuite on x86_64-pc-linux-gnu. Committed to mainline. > > This patch probably introduced recent regression on 32bit x86 multilib: > > Running target unix/-m32 > FAIL: go.test/test/fixedbugs/bug114.go compilation, -O2 -g > FAIL: go.test/test/printbig.go -O (test for excess errors) > FAIL: go.test/test/printbig.go execution > > === go Summary for unix/-m32 === > > # of expected passes6875 > # of unexpected failures3 > # of expected failures 1 > # of untested testcases 12 > # of unsupported tests 2 > > e.g.: > > /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:27: > error: integer constant overflow > /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:45: > error: integer constant overflow > /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:38: > error: integer constant overflow > /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:56: > error: integer constant overflow > > FAIL: go.test/test/fixedbugs/bug114.go compilation, -O2 -g > UNTESTED: go.test/test/fixedbugs/bug114.go execution, -O2 -g > > FAIL: go.test/test/printbig.go -O (test for excess errors) > Excess errors: > /home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:12:8: error: > integer constant overflow > /home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:13:15: > error: integer constant overflow > > ./printbig.exe >printbig.p 2>&1 > couldn't execute "./printbig.exe": no such file or directory > FAIL: go.test/test/printbig.go execution > UNTESTED: go.test/test/printbig.go compare For the record, Uros filed this as https://gcc.gnu.org/PR78145 and it is now fixed. Ian
Re: Reject out-of-range bit pos in bit-fields insns operating on a register.
On 11/09/2016 03:40 AM, Andreas Schwab wrote: As seen by the testcase in PR77822, combine can generate out-of-range bit pos in a bit-field insn, unless the pattern explicitly rejects it. This only makes a difference for expressions that are undefined at runtime. Without that we would either generate bad assembler or ICE in output_btst. PR target/78254 * config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields insns operating on a register. Could you please include a testcase for this? Even if it's something that triggered during a bootstrap -- few people bootstrap m68k these days :-) jeff
Re: Reject out-of-range bit pos in bit-fields insns operating on a register.
On Nov 10 2016, Jeff Law wrote: > On 11/09/2016 03:40 AM, Andreas Schwab wrote: >> As seen by the testcase in PR77822, combine can generate out-of-range >> bit pos in a bit-field insn, unless the pattern explicitly rejects it. >> This only makes a difference for expressions that are undefined at >> runtime. Without that we would either generate bad assembler or ICE in >> output_btst. >> >> PR target/78254 >> * config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields >> insns operating on a register. > Could you please include a testcase for this? Even if it's something that > triggered during a bootstrap -- few people bootstrap m68k these days :-) It didn't trigger during bootstrap. It was the testcase for PR77822. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
C++ PATCH for c++/77337 (auto return and constexpr lambda)
The constexpr lambda change introduced some problematic dependency ordering, since instantiate constexpr functions aggressively so that they are available for constexpr evaluation. The patch for 65942 delayed that instantiation by triggering it from constexpr evaluation directly rather than earlier in mark_used, but instantiate_decl still wanted to instantiate them right away. This patch fixes that and a latent bug in tsubst_friend_function, which was leaving DECL_INITIAL set on a function that was not yet instantiated. Tested x86_64-pc-linux-gnu, applying to trunk. commit aea89e80dc0b7b976fd69680771a8534af265d7e Author: Jason Merrill Date: Thu Nov 10 10:13:44 2016 -0800 PR c++/77337 - auto return and lambda * pt.c (tsubst_friend_function): Don't set DECL_INITIAL. (instantiate_decl): It's OK to defer a constexpr function. * cp-tree.h (DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION): Check DECL_LANG_SPECIFIC. * decl2.c (decl_defined_p): Use it. No longer static. * decl.c (redeclaration_error_message): Use decl_defined_p. * constexpr.c (cxx_eval_call_expression): Set input_location around call to instantiate_decl. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 43457d2..f75f0b0 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -1464,9 +1464,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, if (!DECL_INITIAL (fun) && DECL_TEMPLOID_INSTANTIATION (fun)) { + location_t save_loc = input_location; + input_location = loc; ++function_depth; instantiate_decl (fun, /*defer_ok*/false, /*expl_inst*/false); --function_depth; + input_location = save_loc; } /* If in direct recursive call, optimize definition search. */ diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 9b5b5bc..8183775 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -4380,7 +4380,8 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter) instantiated will not be a DECL_TEMPLATE_INSTANTIATION, but will be a DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION. */ #define DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION(DECL) \ - (DECL_TEMPLATE_INFO (DECL) && !DECL_USE_TEMPLATE (DECL)) + (DECL_LANG_SPECIFIC (DECL) && DECL_TEMPLATE_INFO (DECL) \ + && !DECL_USE_TEMPLATE (DECL)) /* Nonzero if DECL is a function generated from a function 'temploid', i.e. template, member of class template, or dependent friend. */ @@ -5895,6 +5896,7 @@ extern void import_export_decl(tree); extern tree build_cleanup (tree); extern tree build_offset_ref_call_from_tree(tree, vec **, tsubst_flags_t); +extern bool decl_defined_p (tree); extern bool decl_constant_var_p(tree); extern bool decl_maybe_constant_var_p (tree); extern void no_linkage_error (tree); diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 4b18d4e..185c98b 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -2778,8 +2778,8 @@ redeclaration_error_message (tree newdecl, tree olddecl) warn_extern_redeclared_static. */ /* Defining the same name twice is no good. */ - if (DECL_INITIAL (olddecl) != NULL_TREE - && DECL_INITIAL (newdecl) != NULL_TREE) + if (decl_defined_p (olddecl) + && decl_defined_p (newdecl)) { if (DECL_NAME (olddecl) == NULL_TREE) return G_("%q#D not declared in class"); diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index e0fff1e..4ebc7dc 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -80,7 +80,6 @@ static void import_export_class (tree); static tree get_guard_bits (tree); static void determine_visibility_from_class (tree, tree); static bool determine_hidden_inline (tree); -static bool decl_defined_p (tree); static void maybe_instantiate_decl (tree); /* A list of static class variables. This is needed, because a @@ -4085,11 +4084,15 @@ collect_ada_namespace (tree namespc, const char *source_file) /* Returns true iff there is a definition available for variable or function DECL. */ -static bool +bool decl_defined_p (tree decl) { if (TREE_CODE (decl) == FUNCTION_DECL) -return (DECL_INITIAL (decl) != NULL_TREE); +return (DECL_INITIAL (decl) != NULL_TREE + /* A pending instantiation of a friend temploid is defined. */ + || (DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION (decl) + && DECL_INITIAL (DECL_TEMPLATE_RESULT +(DECL_TI_TEMPLATE (decl); else { gcc_assert (VAR_P (decl)); diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index e8b6afd..d4855d5 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -9383,10 +9383,6 @@ tsubst_friend_function (tree decl, tree args) else new_friend_result_template_info = NULL_TREE; -
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On Thu, 10 Nov 2016, Richard Biener wrote: > I'll address those comments. As you did not have any comments on the > c-parser.[CH] parts does that mean you are fine with them? That is, > does the above constitute a complete review of the patch? I am fine with the c-parser.[ch] parts. -- Joseph S. Myers jos...@codesourcery.com
Re: C++ PATCH for c++/77337 (auto return and constexpr lambda)
On Thu, Nov 10, 2016 at 01:42:20PM -0800, Jason Merrill wrote: > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn33.C > @@ -0,0 +1,27 @@ > +// PR c++/77337 > +// { dg-do compile { target c++14 } } > + > +template > +struct fix_type { > + Functor functor; > + > + decltype(auto) operator()() > + { return functor(*this); } > +}; > + > +template > +fix_type fix(Functor functor) > +{ return { functor }; } > + > +int main() > +{ > + auto zero = fix > +([](auto& self) -> int // N.B. non-deduced, non-dependent return type > + { > + return 0; > + > + self(); // error: use of 'decltype(auto) > fix_type::operator()() [with Functor = main()::]' > before deduction of 'auto' Wouldn't it be clearer to turn that // error: line into // { dg-bogus "use of \[^\n\r]* before deduction of 'auto'" } so that it is clear that the error is undesirable even to casual reader? > + }); > + > + return zero(); > +} Jakub