Re: [gimplefe] hacking pass manager
On 30 June 2016 at 17:10, Richard Biener wrote:
> On Wed, Jun 29, 2016 at 9:13 PM, Prasad Ghangal wrote:
>> On 29 June 2016 at 22:15, Richard Biener wrote:
>>> On June 29, 2016 6:20:29 PM GMT+02:00, Prathamesh Kulkarni wrote:
On 18 June 2016 at 12:02, Prasad Ghangal wrote:
> Hi,
>
> I tried hacking the pass manager to execute only given passes. For this I
> am adding a new member opt_pass *custom_pass_list to the function
> structure to store the passes that need to execute, and providing the
> custom_pass_list to execute_pass_list() instead of all passes.
>
> For a test case like -
>
> int a;
> void __GIMPLE (execute ("tree-ccp1", "tree-fre1")) foo()
> {
> bb_1:
>   a = 1 + a;
> }
>
> it will execute only the given passes, i.e. the ccp1 and fre1 passes, on
> the function, and for a test case like -
>
> int a;
> void __GIMPLE (startwith ("tree-ccp1")) foo()
> {
> bb_1:
>   a = 1 + a;
> }
>
> it will act as an entry point to the pipeline and will execute passes
> starting from the given pass.

Bike-shedding:
Would it make sense to have syntax for defining pass ranges to execute?
For instance:
void __GIMPLE(execute (pass_start : pass_end))
which would execute all the passes within the range [pass_start, pass_end],
which would be convenient if the range is large.

>>> But it would rely on a particular pass pipeline, f.e. pass-start
>>> appearing before pass-end.
>>>
>>> Currently control doesn't work 100% as it only replaces
>>> all_optimizations but not lowering passes or early opts, nor IPA opts.
>>>
>>
>> Each pass needs GIMPLE in some specific form. So I am letting lowering
>> and early opt passes execute. I think we have to execute some
>> passes (like cfg) anyway to put the GIMPLE into proper form.
>
> Yes, that's true. Note that early opt passes only optimize but we need
> pass_build_ssa_passes at least (for into-SSA). For proper unit-testing
> of GIMPLE passes we do need to guard off early opts somehow
> (I guess a simple if (flag_gimple && cfun->custom_pass_list) would do
> that).
>
> Then there is of course the question about IPA passes, which I think is
> somewhat harder (one could always disable all IPA passes manually
> via flags of course, or finally have a global -fipa/no-ipa like most
> other compilers).
>

Can we iterate through all IPA passes and do the -fdisable-ipa-pass or
-fenable-ipa-pass equivalent for each?

Thanks,
Prasad

> Richard.
>
>>> Richard.
>>>
Thanks,
Prathamesh

> Thanks,
> Prasad Ghangal
Re: [gimplefe] hacking pass manager
On Wed, Jul 6, 2016 at 9:51 AM, Prasad Ghangal wrote: > On 30 June 2016 at 17:10, Richard Biener wrote: >> On Wed, Jun 29, 2016 at 9:13 PM, Prasad Ghangal >> wrote: >>> On 29 June 2016 at 22:15, Richard Biener wrote: On June 29, 2016 6:20:29 PM GMT+02:00, Prathamesh Kulkarni wrote: >On 18 June 2016 at 12:02, Prasad Ghangal >wrote: >> Hi, >> >> I tried hacking pass manager to execute only given passes. For this I >> am adding new member as opt_pass *custom_pass_list to the function >> structure to store passes need to execute and providing the >> custom_pass_list to execute_pass_list() function instead of all >passes >> >> for test case like- >> >> int a; >> void __GIMPLE (execute ("tree-ccp1", "tree-fre1")) foo() >> { >> bb_1: >> a = 1 + a; >> } >> >> it will execute only given passes i.e. ccp1 and fre1 pass on the >function >> >> and for test case like - >> >> int a; >> void __GIMPLE (startwith ("tree-ccp1")) foo() >> { >> bb_1: >> a = 1 + a; >> } >> >> it will act as a entry point to the pipeline and will execute passes >> starting from given pass. >Bike-shedding: >Would it make sense to have syntax for defining pass ranges to execute >? >for instance: >void __GIMPLE(execute (pass_start : pass_end)) >which would execute all the passes within range [pass_start, pass_end], >which would be convenient if the range is large. But it would rely on a particular pass pipeline, f.e. pass-start appearing before pass-end. Currently control doesn't work 100% as it only replaces all_optimizations but not lowering passes or early opts, nor IPA opts. >>> >>> Each pass needs GIMPLE in some specific form. So I am letting lowering >>> and early opt passes to execute. I think we have to execute some >>> passes (like cfg) anyway to represent GIMPLE into proper form >> >> Yes, that's true. Note that early opt passes only optimize but we need >> pass_build_ssa_passes at least (for into-SSA). 
>> For proper unit-testing
>> of GIMPLE passes we do need to guard off early opts somehow
>> (I guess a simple if (flag_gimple && cfun->custom_pass_list) would do
>> that).
>>
>> Then there is of course the question about IPA passes, which I think is
>> somewhat harder (one could always disable all IPA passes manually
>> via flags of course, or finally have a global -fipa/no-ipa like most
>> other compilers).
>>
> Can we iterate through all IPA passes and do the -fdisable-ipa-pass or
> -fenable-ipa-pass equivalent for each?

We could do that, yes. But let's postpone this issue. I think that
startwith is going to be most useful, and rather than constructing a pass
list for it, "native" support for it in the pass manager is likely to
produce better results (add a 'startwith' member alongside the pass list
member; if it is set, the pass manager skips all passes that do not match
'startwith', and once it reaches it, it clears the field).

In the future I hope we can get away from a static pass list and move
towards rule-driven pass execution (we have all that PROP_* stuff already,
but it isn't really used, for example). But well, that would be a separate
GSoC project ;)

IMHO startwith will provide everything needed for unit-testing. We can
add a flag on whether further passes should be executed or not, and even
a pass list like execute ("ccp1", "fre") can be implemented by startwith
ccp1 and then, from there, executing the rest of the passes in the list
and stopping at the end. As said, unit-testing should exercise a single
pass if we can control its input.

Thanks,
Richard.

> Thanks,
> Prasad
>
>> Richard.
>> Richard.
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prasad Ghangal
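The "native" startwith scheme sketched above can be illustrated with a minimal stand-alone model. All names here are hypothetical stand-ins: the real pass manager works on opt_pass objects and would keep the startwith name on struct function, not pass it as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for GCC's pass list.  */
struct pass
{
  const char *name;
  struct pass *next;
};

/* Walk the pass list; while *startwith is set, skip passes whose name
   does not match, clear it when the matching pass is reached, and run
   everything from there on.  Returns the number of passes "executed".  */
static int
execute_pass_list_startwith (struct pass *p, const char **startwith)
{
  int executed = 0;
  for (; p; p = p->next)
    {
      if (*startwith)
        {
          if (strcmp (p->name, *startwith) != 0)
            continue;            /* not the entry point yet: skip */
          *startwith = NULL;     /* entry point reached: run the rest */
        }
      executed++;                /* stand-in for execute_one_pass (p) */
    }
  return executed;
}
```

With startwith = "ccp1" and a pipeline lower → ccp1 → fre1, only the last two passes run; with startwith unset, all three do.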
Re: [Patch 0,1a] Improving effectiveness and generality of autovectorization using unified representation.
On Wed, Jul 6, 2016 at 12:49 PM, Sameera Deshpande wrote:
>
> From: Sameera Deshpande [sameera.deshpa...@imgtec.com]
> Sent: 20 June 2016 11:37:58
> To: Richard Biener
> Cc: Matthew Fortune; Rich Fuhler; Prachi Godbole; gcc@gcc.gnu.org; Jaydeep Patil
> Subject: Re: [Patch 0,1a] Improving effectiveness and generality of
> autovectorization using unified representation.
>
> On Wednesday 15 June 2016 05:52 PM, Richard Biener wrote:
>> On Mon, Jun 13, 2016 at 12:56 PM, Sameera Deshpande wrote:
>>> On Thursday 09 June 2016 05:45 PM, Richard Biener wrote:
On Thu, Jun 9, 2016 at 10:54 AM, Richard Biener wrote:
> On Tue, Jun 7, 2016 at 3:59 PM, Sameera Deshpande wrote:
>> Hi Richard,
>>
>> This is with reference to our discussion at GNU Tools Cauldron 2015
>> regarding my talk titled "Improving the effectiveness and generality
>> of GCC auto-vectorization." Further to our prototype implementation
>> of the concept, we have started implementing it in GCC.
>>
>> We are following an incremental model to add language support in our
>> front-end, and the corresponding back-end (for the auto-vectorizer)
>> will be added for feature completion.
>>
>> Looking at the complexity and scale of the project, we have divided it
>> into the subtasks listed below, for ease of implementation, testing
>> and review.
>>
>> 0. Add a new pass to perform autovectorization using the unified
>>    representation - The current GCC framework does not give a complete
>>    overview of the loop to be vectorized: it either breaks the loop
>>    across the body, or across iterations. Because of this, those data
>>    structures cannot be reused for our approach, which gathers all the
>>    information about the loop body in one place using primitive permute
>>    operations. Hence, define new data structures and populate them.
>>
>> 1. Add support for vectorization of LOAD/STORE instructions:
>>    a. Create the permute order tree for the loop with LOAD and STORE
>>       instructions for single or multi-dimensional arrays, and
>>       aggregates within nested loops.
>>    b. Basic transformation phase to generate vectorized code for the
>>       primitive reorder tree generated at stage 1a using the tree
>>       tiling algorithm. This phase handles code generation for SCATTER,
>>       GATHER, strided memory accesses etc., along with permute
>>       instruction generation.
>>
>> 2. Implementation of k-arity promotion/reduction: The permute nodes
>>    within the primitive reorder tree generated from the input program
>>    can have any arity. However, the target can support a maximum arity
>>    of 2 in most cases. Hence, we need to promote or reduce the arity of
>>    the permute order tree to enable successful tree tiling.
>>
>> 3. Vector size reduction: Depending upon the vector size for the
>>    target, reduce the vector size per statement and adjust the loop
>>    count for the vectorized loop accordingly.
>>
>> 4. Support simple arithmetic operations:
>>    a. Add support for analyzing statements with simple arithmetic
>>       operations like +, -, *, / for vectorization, and create the
>>       primitive reorder tree with compute_op.
>>    b. Generate vector code for the primitive reorder tree generated at
>>       stage 4a using the tree tiling algorithm - here support for
>>       complex patterns like multiply-add should be checked and the
>>       appropriate instruction generated.
>>
>> 5. Support reduction operations:
>>    a. Add support for reduction operation analysis and primitive
>>       reorder tree generation. The reduction operation needs special
>>       handling, as the finish statement should COLLAPSE the temporary
>>       reduction vector TEMP_VAR into the original reduction variable.
>>    b. The code generation for the primitive reorder tree does not need
>>       any special handling - the reduction tree is the same as the tree
>>       generated in 4a, with the only difference that in 4a the
>>       destination is a MEMREF (because of the STORE operation) whereas
>>       for reduction it is TEMP_VAR. At this stage, generate code for
>>       the COLLAPSE node in finish statements.
>>
>> 6. Support other vectorizable statements like complex arithmetic
>>    operations, bitwise operations, type conversions etc.:
>>    a. Add support for analysis and primitive reorder tree generation.
>>    b. Vector code generation.
>>
>> 7. Cost-effective tree tiling algorithm: Till now, the tree tiling
>>    happens without considering the cost of computation. However, there
>>    can be multiple target instructions covering the tree - hence,
>>    instead of picking the first matched largest instruction cover,
>>    select the instruction cover based on the cost of instruction given in
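Item 2 above (k-arity reduction) can be sketched with made-up data structures: an interleave node with k operands is lowered to a left-leaning chain of 2-operand nodes, since most targets only provide binary permute instructions, so reducing arity k creates k - 1 binary nodes. This is only an illustration of the idea, not the patch's actual representation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy permute-tree node: both operand pointers NULL for a leaf.  */
struct pnode
{
  struct pnode *op0, *op1;
};

/* Combine k operands into a chain of binary nodes; returns the root and
   stores the number of binary nodes created in *created.  */
static struct pnode *
reduce_arity (struct pnode **ops, int k, int *created)
{
  struct pnode *root = ops[0];
  *created = 0;
  for (int i = 1; i < k; i++)
    {
      struct pnode *n = calloc (1, sizeof *n);
      n->op0 = root;             /* chain built so far */
      n->op1 = ops[i];           /* next operand */
      root = n;
      ++*created;
    }
  return root;
}
```

For k = 4 this produces three binary nodes, with the last operand attached at the root and the first at the bottom of the chain.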
Re: [RFC] lto partitioning of varpool_nodes for section anchors
On 4 July 2016 at 13:51, Andrew Pinski wrote:
> On Mon, Jul 4, 2016 at 12:58 AM, Prathamesh Kulkarni wrote:
>> Hi,
>> I have attached a "quick and dirty" prototype patch (var-partition-1.diff)
>> that attempts to partition variables to reduce the number of
>> external references and to increase usage of section anchors
>> to CSE the address computation of global variables.
>>
>> We could put a variable in the partition that has max references for
>> it; however, that doesn't lend itself directly to the section anchor
>> optimization. For instance, if a partition has max references for
>> variables 'a' and 'b', but no function in that partition references
>> both 'a' and 'b', then AFAIU it makes no difference from a section
>> anchors perspective to have them in the same partition.
>>
>> The patch tries to assign a set of variables (>= 2)
>> to the partition whose functions have maximum references for that set.
>> Functions within the partition that reference the variables
>> in the set can take advantage of section anchors. Functions
>> referencing the variables in the set outside the partition
>> would need to load them as external references (using movw/movt);
>> however, since we are placing the set in the partition that has
>> maximal references for it, the number of external references should
>> be reduced overall.
>>
>> Partitioning is gated by -flto-var-partition and enabled
>> only for arm and aarch64.
>
> Why only for arm and aarch64? Shouldn't it be enabled for all section
> anchor targets?

AFAIK the only targets supporting section anchors are arm, aarch64 and
powerpc. I didn't enable it for ppc64 because I am not sure how
profitable it is for that target. Honza mentioned to me some time back
that the effect of partitioning on powerpc was nearly zero.

Thanks,
Prathamesh

>
> Thanks,
> Andrew
>
>> As per previous discussion [1], I haven't
>> touched function partitioning. Does this approach look OK,
>> especially regarding correctness?
>> So far, I have cross-tested the patch on arm*-*-* and aarch64*-*-*.
>>
>> I haven't yet managed to benchmark the patch.
>> As a cheap measurement, I tried to measure the number of external
>> references with and without the patch by writing a small IPA pass
>> which is run during ltrans and simply walks over varpool nodes,
>> counting the varpool_nodes for which DECL_EXTERNAL (vnode->decl) is
>> true and vnode->definition is 0. Is that a sufficient condition to
>> determine whether a variable is externally defined? I have attached
>> the pass (count-external-refs.diff) and the comparison done with it
>> for SPEC2000 [2]. The entries in the "before" and "after" columns
>> contain the summation of the number of external refs (total_count)
>> across all partitions before and after applying the patch. Does the
>> comparison hold any merit? I was wondering if we could use a better
>> way of measuring the effects of variable partitioning statically.
>> I hope also to get the benchmarking done soon.
>>
>> I have not yet figured out how to integrate it with the existing cost
>> metrics for balanced partitioning; I am looking into that.
>> I would be grateful for suggestions on the patch.
>>
>> [1] https://gcc.gnu.org/ml/gcc/2016-04/msg00090.html
>>
>> [2] SPEC2000 comparison:
>> https://docs.google.com/spreadsheets/d/1xnszyw04ksoyBspmCVYesq6KARLw-PA2n3T4aoaKdYw/edit?usp=sharing
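The placement rule described in the patch - put the candidate variable set in the partition whose functions reference it most - reduces to an argmax over per-partition reference counts. A trivial sketch with hypothetical inputs (not the patch's actual data structures, which work on varpool/cgraph nodes):

```c
#include <assert.h>

/* set_refs[i] = number of references that the functions of partition i
   make to the candidate variable set.  Returns the index of the
   partition that should receive the set.  */
static int
best_partition (const int *set_refs, int nparts)
{
  int best = 0;
  for (int i = 1; i < nparts; i++)
    if (set_refs[i] > set_refs[best])
      best = i;
  return best;
}
```

Every other partition then accesses the set via external references, but since the chosen partition held the maximum, the total external-reference count is minimized for that set.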
Re: [RFC] lto partitioning of varpool_nodes for section anchors
On Wed, Jul 6, 2016 at 5:00 AM, Prathamesh Kulkarni wrote:
> On 4 July 2016 at 13:51, Andrew Pinski wrote:
>> On Mon, Jul 4, 2016 at 12:58 AM, Prathamesh Kulkarni wrote:
>>> Hi,
>>> I have attached a "quick and dirty" prototype patch (var-partition-1.diff)
>>> that attempts to partition variables to reduce the number of
>>> external references and to increase usage of section anchors
>>> to CSE the address computation of global variables.
>>>
>>> We could put a variable in the partition that has max references for
>>> it; however, that doesn't lend itself directly to the section anchor
>>> optimization. For instance, if a partition has max references for
>>> variables 'a' and 'b', but no function in that partition references
>>> both 'a' and 'b', then AFAIU it makes no difference from a section
>>> anchors perspective to have them in the same partition.
>>>
>>> The patch tries to assign a set of variables (>= 2)
>>> to the partition whose functions have maximum references for that set.
>>> Functions within the partition that reference the variables
>>> in the set can take advantage of section anchors. Functions
>>> referencing the variables in the set outside the partition
>>> would need to load them as external references (using movw/movt);
>>> however, since we are placing the set in the partition that has
>>> maximal references for it, the number of external references should
>>> be reduced overall.
>>>
>>> Partitioning is gated by -flto-var-partition and enabled
>>> only for arm and aarch64.
>>
>> Why only for arm and aarch64? Shouldn't it be enabled for all section
>> anchor targets?
> AFAIK the only targets supporting section anchors are arm, aarch64 and
> powerpc. I didn't enable it for ppc64 because I am not sure how
> profitable it is for that target. Honza mentioned to me some time back
> that the effect of partitioning on powerpc was nearly zero.

No, MIPS has section anchors enabled too. Plus MIPS will benefit the
same way as AArch64 and ARM. PowerPC32 would too.

I don't think it is correct to enable it only for arm and aarch64.

Thanks,
Andrew Pinski

>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Andrew
>>
>>> As per previous discussion [1], I haven't
>>> touched function partitioning. Does this approach look OK,
>>> especially regarding correctness?
>>> So far, I have cross-tested the patch on arm*-*-* and aarch64*-*-*.
>>>
>>> I haven't yet managed to benchmark the patch.
>>> As a cheap measurement, I tried to measure the number of external
>>> references with and without the patch by writing a small IPA pass
>>> which is run during ltrans and simply walks over varpool nodes,
>>> counting the varpool_nodes for which DECL_EXTERNAL (vnode->decl) is
>>> true and vnode->definition is 0. Is that a sufficient condition to
>>> determine whether a variable is externally defined? I have attached
>>> the pass (count-external-refs.diff) and the comparison done with it
>>> for SPEC2000 [2]. The entries in the "before" and "after" columns
>>> contain the summation of the number of external refs (total_count)
>>> across all partitions before and after applying the patch. Does the
>>> comparison hold any merit? I was wondering if we could use a better
>>> way of measuring the effects of variable partitioning statically.
>>> I hope also to get the benchmarking done soon.
>>>
>>> I have not yet figured out how to integrate it with the existing cost
>>> metrics for balanced partitioning; I am looking into that.
>>> I would be grateful for suggestions on the patch.
>>>
>>> [1] https://gcc.gnu.org/ml/gcc/2016-04/msg00090.html
>>>
>>> [2] SPEC2000 comparison:
>>> https://docs.google.com/spreadsheets/d/1xnszyw04ksoyBspmCVYesq6KARLw-PA2n3T4aoaKdYw/edit?usp=sharing
Fwd: Re: GCC libatomic questions
Redirecting to the gcc list for discussion. I'll follow up on that
thread directly.

r~

-------- Forwarded Message --------
Subject: Re: GCC libatomic questions
Date: Wed, 6 Jul 2016 10:27:20 -0700
From: Bin Fan
Organization: Oracle Corporation
To: Richard Henderson

Hello Richard,

This is Bin in the Sun/Oracle compiler group. Sorry about the long delay
for the libatomic ABI specification I mentioned a long time ago; I was
assigned to some other tasks. Please find a draft of the libatomic ABI
specification attached. The text is also pasted at the end of this email.

The goal of the ABI specification is twofold. First, to check with the
GCC community that the ABI matches the latest GCC libatomic
implementation; this would make sure that GCC and the Oracle Developer
Studio C/C++ compiler can work well together without any compatibility
issues on Solaris/Linux + SPARC/x86. Second, and a longer-term goal, to
integrate the libatomic ABI into the current SPARC/x86 ABI
specifications.

Could you please review the draft and/or forward it to the community for
review?

Thanks,
- Bin


1. Overview

1.1. Why we need an ABI for atomics

The C11 standard allows different size, representation and alignment
between atomic types and the corresponding non-atomic types [1], so the
size, representation and alignment of atomic types need to be specified
in the ABI. A runtime support library, libatomic, already exists on
Solaris and Linux. The interface of this library needs to be
standardized as part of the ABI specification, so that

- on a system that supplies libatomic, all compilers in compliance with
  the ABI can generate compatible binaries linking this library;
- the binaries remain backward compatible across versions of the system,
  as long as those versions support the same ABI.

1.2. What does the atomics ABI specify

The ABI specifies the following:

- Data representation of the atomic types.
- The names and behaviors of the implementation-specific support
  functions.
- The atomic types for which the compiler may generate inlined code.
- The lock-free property of the inlined atomic operations.

Note that the names and behavior of the libatomic functions specified in
the C standard do not need to be part of this ABI, because they are
already required to meet the specification in the standard.

1.3. Affected platforms

The following platforms are affected by this ABI specification:

- SPARC (32-bit and 64-bit)
- x86 (32-bit and 64-bit)

Sections 1.1 and 1.2, and the Rationale, Notes and Appendix sections in
the rest of the document, are for explanatory purposes only; they are
not considered part of the formal ABI specification.

2. Data Representation

2.1. General Rules

The general rules for the size, representation and alignment of atomic
types are the following:

1) Atomic types have the same size as the corresponding non-atomic
   types.
2) Atomic types have the same representation as the corresponding
   non-atomic types.
3) Atomic types have the same alignment as the corresponding non-atomic
   types, with the following exceptions:
   - On 32- and 64-bit x86 platforms and on 64-bit SPARC platforms,
     atomic types of size 1, 2, 4, 8 or 16 bytes have the alignment that
     matches the size.
   - On 32-bit SPARC platforms, atomic types of size 1, 2, 4 or 8 bytes
     have the alignment that matches the size. If the alignment of a
     16-byte non-atomic type is less than 8 bytes, the alignment of the
     corresponding atomic type is increased to 8 bytes.

Note: the above rules apply to both scalar types and aggregate types.

2.2.
Atomic scalar types

x86

                            LP64 (AMD64)                  ILP32 (i386)
C Type                  sizeof Alignment Inlineable  sizeof Alignment Inlineable
atomic_flag                1       1         Y          1       1         Y
_Atomic _Bool              1       1         Y          1       1         Y
_Atomic char               1       1         Y          1       1         Y
_Atomic signed char        1       1         Y          1       1         Y
_Atomic unsigned char      1       1         Y          1       1         Y
_Atomic short              2       2         Y          2       2         Y
_Atomic signed short       2       2         Y          2       2         Y
_Atomic unsigned short     2       2         Y          2       2         Y
_Atomic int                4       4         Y          4       4         Y
_Atomic signed int         4       4         Y          4       4         Y
_Atomic enum               4       4         Y          4       4         Y
_Atomic unsigned int       4       4         Y          4       4         Y
_Atomic long               8       8         Y
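The rules behind the table - atomic types keep the size of their non-atomic counterparts, and for power-of-two sizes the alignment matches the size - can be checked directly in C11. This sketch assumes a typical LP64 target, as in the LP64 column above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Same-size rule: the atomic type is as large as the non-atomic one.  */
_Static_assert (sizeof (_Atomic int) == sizeof (int), "same-size rule");

/* Alignment rule: for a 4-byte type, alignment equals the size.  */
_Static_assert (_Alignof (_Atomic int) == sizeof (int), "align == size");

/* atomic_flag occupies a single byte, per the first table row.  */
_Static_assert (sizeof (atomic_flag) == 1, "atomic_flag is one byte");
```

The same checks can be repeated for the other rows; on targets where a non-atomic 8-byte type is under-aligned (e.g. long long on i386), the _Atomic version is still boosted to 8-byte alignment by rule 3.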
Re: Fwd: Re: GCC libatomic questions
> CMPXCHG16B is not always available on 64-bit x86 platforms, so 16-byte
> naturally aligned atomics are not inlineable. The support functions for
> such atomics are free to use a lock-free implementation if the
> instruction is available on specific platforms.

Except that it is available on almost all 64-bit x86 platforms. As far
as I know, only 2004-era AMD processors didn't have it; all Intel 64-bit
CPUs have supported it. Further, gcc will most certainly make use of it
when one specifies any command-line option that enables it, such as
-march=native. Therefore we must specify that for x86_64, 16-byte
objects are non-locking on CPUs that support cmpxchg16b.

> However, if a compiler inlines an atomic operation on an _Atomic long
> double object and uses the new lock-free instructions, it could break
> compatibility if the library implementation is still non-lock-free. So
> such a compiler change must be accompanied by a library change, and the
> ABI must be updated as well.

The tie between the gcc version and the libgcc.so version is tight; I
see no reason that the libatomic.so version should not also be tight
with the compiler version.

It is sufficient that libatomic use atomic instructions when they are
available. If a new processor comes out with new capabilities, the
compiler and runtime are upgraded in lock-step. How that is selected is
beyond the ABI, but possible solutions are (1) an ld.so search path
based on processor capabilities, (2) ifunc (or workalike), where the
function is selected at startup, or (3) an explicit runtime test within
the relevant functions. All of these expose the same function interface,
so the function call ABI is not affected.

> _Bool __atomic_is_lock_free (size_t size, void *object);
>
> Returns whether the object pointed to by object is lock-free. The
> function assumes that the size of the object is size. If object is
> NULL, then the function assumes that object is aligned on a size-byte
> address.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65033

The actual code change is completely within libstdc++, but it affects
the description of the libatomic function. C++ requires that
is_lock_free return the same result for all objects of a given type,
whereas __atomic_is_lock_free, given a non-null object, determines
whether we will implement lock-free for that *specific* object, using
the specific object's alignment. Rather than break the ABI and add a
different function that passes the type alignment, the solution we hit
upon was to pass a "fake", minimally aligned pointer as the object
parameter: (void *)(uintptr_t)-__alignof(type).

The final component of the ABI that you've forgotten to specify, if you
want full compatibility of linked binaries, is symbol versioning. We
have had two ABI additions since the original release. See
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libatomic/libatomic.map;h=39e7c2c6b9a70121b5f4031da346a27ae6c1be98;hb=HEAD

r~
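The fake-pointer trick described above can be written out as a small C sketch. TYPE_IS_LOCK_FREE is a made-up macro name; __atomic_is_lock_free is the real GCC builtin. The fake pointer holds the most poorly aligned address the type allows, so the answer depends only on the type's alignment, never on one specific object.

```c
#include <assert.h>
#include <stdint.h>

/* Ask about lock-freedom per *type*, libstdc++-style: -_Alignof (type)
   viewed as an address is the lowest address with exactly the type's
   minimum alignment, so no object can be less aligned than this.  */
#define TYPE_IS_LOCK_FREE(type) \
  __atomic_is_lock_free (sizeof (type), \
                         (void *) (uintptr_t) - _Alignof (type))
```

With constant size and a pointer of statically known alignment, GCC can usually fold this to a compile-time constant, matching the guarantee is_lock_free must give for all objects of the type.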
question about illegal utf-8 encoding in string literals
Hello,

I work for Intel on the Intel C++ compiler, and we strive to be
compatible with the GNU compiler. When we process a source file assuming
UTF-8 encoding and encounter a string literal with an invalid UTF-8
sequence, such as an 8-bit character with the high bit set like 0xa3,
testing shows that gcc passes the invalid UTF-8 byte through without a
diagnostic message, as though it were an "extended ASCII" character. I
don't see a way to enable warnings for this issue. Please confirm that
gcc handles invalid UTF-8 encodings this way.

Thanks and regards,
Melanie Blower
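For context on why a lone 0xa3 byte is invalid: in UTF-8, bytes 0x80-0xBF are continuation bytes and cannot begin a character, and 0xC0-0xC1 and 0xF5-0xFF can never appear at all. A minimal check (a sketch, not cpplib's actual logic) for whether a byte can start a well-formed UTF-8 sequence:

```c
#include <assert.h>

/* Returns nonzero iff C may begin a well-formed UTF-8 sequence.  */
static int
utf8_valid_lead_byte (unsigned char c)
{
  return c < 0x80                      /* ASCII */
         || (c >= 0xC2 && c <= 0xF4);  /* leads of 2- to 4-byte sequences */
}
```

A full validator would additionally check that each lead byte is followed by the right number of 0x80-0xBF continuation bytes and reject overlong and out-of-range (above U+10FFFF) encodings.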
Re: [RFC] lto partitioning of varpool_nodes for section anchors
On 6 July 2016 at 22:25, Andrew Pinski wrote: > On Wed, Jul 6, 2016 at 5:00 AM, Prathamesh Kulkarni > wrote: >> On 4 July 2016 at 13:51, Andrew Pinski wrote: >>> On Mon, Jul 4, 2016 at 12:58 AM, Prathamesh Kulkarni >>> wrote: Hi, I have attached a "quick and dirty" prototype patch (var-partition-1.diff), that attempts to partition variables to reduce number of external references and to increase usage of section-anchors to CSE address computation of global variables. We could put a variable in a partition that has max references for it, however it doesn't lend itself directly to section anchor optimization. For instance if a partition has max references for variables 'a' and 'b', but no function in that partition references both 'a', and 'b' then AFAIU it doesn't make any difference from section anchors perspective to have them in same partition. The patch tries to assign a set of variables (>= 2) to a partition whose functions have maximum references for that set. Functions within the partition that reference the variables in the set can take advantage of section-anchors. Functions referencing the variables in the set outside the partition would need to load them as external references (using movw/movt), however since we are placing the set in partition that has maximal references for it, number of external references should be overall reduced. Partitioning is gated by -flto-var-partition and enabled only for arm and aarch64. >>> >>> Why only for arm and aarch64? Shouldn't it be enabled for all section >>> anchor targets? >> AFAIK the only targets supporting section anchors are arm, aarch64 and >> powerpc. >> I didn't enable it for ppc64 because I am not sure how much profitable >> it is for that target. >> Honza mentioned to me some time back that effect of partitioning on >> powerpc was nearly zero. > > > No MIPS has section anchors enabled too. Plus MIPS will benefit the > same way as AARCH64 and ARM. PowerPC32 would too. 
>
> I don't think it is correct to enable it only for arm and aarch64.

Thanks, I updated the patch to remove -flto-var-partition and gated the
partitioning on target_supports_section_anchors_p() (although it doesn't
test whether -fsection-anchors is passed).

Um, I am not able to see where MIPS has section anchors enabled.
mips.c does not seem to override the min_anchor_offset and
max_anchor_offset hooks. Both hooks default to 0, and
target_supports_section_anchors_p() returns false if both are 0.

Thanks,
Prathamesh

>
> Thanks,
> Andrew Pinski
>
>> Thanks,
>> Prathamesh
>>>
>>> Thanks,
>>> Andrew
>>>
>>>> As per previous discussion [1], I haven't touched function
>>>> partitioning. Does this approach look ok especially regarding
>>>> correctness? So far, I have cross-tested the patch on arm*-*-* and
>>>> aarch64*-*-*.
>>>>
>>>> I haven't yet managed to benchmark the patch. As a cheap
>>>> measurement, I tried to measure the number of external references
>>>> with and without the patch by writing a small IPA pass which is run
>>>> during ltrans and simply walks over varpool nodes, counting the
>>>> varpool_nodes for which DECL_EXTERNAL (vnode->decl) is true and
>>>> vnode->definition is 0. Is that a sufficient condition to determine
>>>> whether a variable is externally defined? I have attached the pass
>>>> (count-external-refs.diff) and the comparison done with it for
>>>> SPEC2000 [2]. The entries in the "before" and "after" columns
>>>> contain the summation of the number of external refs (total_count)
>>>> across all partitions before and after applying the patch. Does the
>>>> comparison hold any merit? I was wondering if we could use a better
>>>> way of measuring the effects of variable partitioning statically.
>>>> I hope also to get the benchmarking done soon.
>>>>
>>>> I have not yet figured out how to integrate it with the existing
>>>> cost metrics for balanced partitioning; I am looking into that.
>>>> I would be grateful for suggestions on the patch.
[1] https://gcc.gnu.org/ml/gcc/2016-04/msg00090.html

[2] SPEC2000 comparison:
https://docs.google.com/spreadsheets/d/1xnszyw04ksoyBspmCVYesq6KARLw-PA2n3T4aoaKdYw/edit?usp=sharing

diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 453343a..09b525e 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,6 +34,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-inline.h"
 #include "lto-partition.h"
+#include "toplev.h" /* for target_supports_section_anchors_p() */
+#include
+#include
+#include
+#include
+#include

 vec ltrans_partitions;

@@ -407,6 +413,274 @@ add_sorted_nodes (vec &next_nodes, ltrans_partition partition)
       add_symbol_to_partition (partition, node);
 }

+/* FIXME: Currently I don't care to compute power set if set has more
gcc-4.9-20160706 is now available
Snapshot gcc-4.9-20160706 is now available on

  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20160706/

and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options:
  svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 238060

You'll find:

 gcc-4.9-20160706.tar.bz2             Complete GCC

  MD5=44e8cd46bf8ffc9a61f0222e15b3288c
  SHA1=1415de843c84d9fb366ec0ef565aca97fbd6aac4

Diffs from 4.9-20160629 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.