Re: [gimplefe] hacking pass manager
On 30 June 2016 at 17:10, Richard Biener wrote:
> On Wed, Jun 29, 2016 at 9:13 PM, Prasad Ghangal wrote:
>> On 29 June 2016 at 22:15, Richard Biener wrote:
>>> On June 29, 2016 6:20:29 PM GMT+02:00, Prathamesh Kulkarni wrote:
On 18 June 2016 at 12:02, Prasad Ghangal wrote:
> Hi,
>
> I tried hacking the pass manager to execute only given passes. For this I
> am adding a new member opt_pass *custom_pass_list to the function
> structure to store the passes that need to execute, and providing the
> custom_pass_list to execute_pass_list() instead of all passes.
>
> For a test case like -
>
> int a;
> void __GIMPLE (execute ("tree-ccp1", "tree-fre1")) foo()
> {
> bb_1:
>   a = 1 + a;
> }
>
> it will execute only the given passes, i.e. the ccp1 and fre1 passes, on
> the function, and for a test case like -
>
> int a;
> void __GIMPLE (startwith ("tree-ccp1")) foo()
> {
> bb_1:
>   a = 1 + a;
> }
>
> it will act as an entry point to the pipeline and will execute passes
> starting from the given pass.

Bike-shedding:
Would it make sense to have syntax for defining pass ranges to execute?
For instance:
void __GIMPLE(execute (pass_start : pass_end))
which would execute all the passes within the range [pass_start, pass_end],
which would be convenient if the range is large.

>>> But it would rely on a particular pass pipeline, f.e. pass-start
>>> appearing before pass-end.
>>>
>>> Currently control doesn't work 100% as it only replaces
>>> all_optimizations but not lowering passes or early opts, nor IPA opts.
>>>
>>
>> Each pass needs GIMPLE in some specific form. So I am letting lowering
>> and early opt passes execute. I think we have to execute some
>> passes (like cfg) anyway to put the GIMPLE into proper form.
>
> Yes, that's true. Note that early opt passes only optimize but we need
> pass_build_ssa_passes at least (for into-SSA). For proper unit-testing
> of GIMPLE passes we do need to guard off early opts somehow
> (I guess a simple if (flag_gimple && cfun->custom_pass_list) would do
> that).
>
> Then there is of course the question about IPA passes, which I think is
> somewhat harder (one could always disable all IPA passes manually
> via flags of course, or finally have a global -fipa/no-ipa like most
> other compilers).
>

Can we iterate through all IPA passes and do the -fdisable-ipa-pass or
-fenable-ipa-pass equivalent for each?

Thanks,
Prasad

> Richard.
>
>>> Richard.
>>>
Thanks,
Prathamesh

> Thanks,
> Prasad Ghangal
Re: [gimplefe] hacking pass manager
On Wed, Jul 6, 2016 at 9:51 AM, Prasad Ghangal wrote: > On 30 June 2016 at 17:10, Richard Biener wrote: >> On Wed, Jun 29, 2016 at 9:13 PM, Prasad Ghangal >> wrote: >>> On 29 June 2016 at 22:15, Richard Biener wrote: On June 29, 2016 6:20:29 PM GMT+02:00, Prathamesh Kulkarni wrote: >On 18 June 2016 at 12:02, Prasad Ghangal >wrote: >> Hi, >> >> I tried hacking pass manager to execute only given passes. For this I >> am adding new member as opt_pass *custom_pass_list to the function >> structure to store passes need to execute and providing the >> custom_pass_list to execute_pass_list() function instead of all >passes >> >> for test case like- >> >> int a; >> void __GIMPLE (execute ("tree-ccp1", "tree-fre1")) foo() >> { >> bb_1: >> a = 1 + a; >> } >> >> it will execute only given passes i.e. ccp1 and fre1 pass on the >function >> >> and for test case like - >> >> int a; >> void __GIMPLE (startwith ("tree-ccp1")) foo() >> { >> bb_1: >> a = 1 + a; >> } >> >> it will act as a entry point to the pipeline and will execute passes >> starting from given pass. >Bike-shedding: >Would it make sense to have syntax for defining pass ranges to execute >? >for instance: >void __GIMPLE(execute (pass_start : pass_end)) >which would execute all the passes within range [pass_start, pass_end], >which would be convenient if the range is large. But it would rely on a particular pass pipeline, f.e. pass-start appearing before pass-end. Currently control doesn't work 100% as it only replaces all_optimizations but not lowering passes or early opts, nor IPA opts. >>> >>> Each pass needs GIMPLE in some specific form. So I am letting lowering >>> and early opt passes to execute. I think we have to execute some >>> passes (like cfg) anyway to represent GIMPLE into proper form >> >> Yes, that's true. Note that early opt passes only optimize but we need >> pass_build_ssa_passes at least (for into-SSA). 
>> For proper unit-testing
>> of GIMPLE passes we do need to guard off early opts somehow
>> (I guess a simple if (flag_gimple && cfun->custom_pass_list) would do
>> that).
>>
>> Then there is of course the question about IPA passes, which I think is
>> somewhat harder (one could always disable all IPA passes manually
>> via flags of course, or finally have a global -fipa/no-ipa like most
>> other compilers).
>>
> Can we iterate through all IPA passes and do the -fdisable-ipa-pass or
> -fenable-ipa-pass equivalent for each?

We could do that, yes. But let's postpone this issue. I think that
startwith is going to be most useful, and rather than constructing a pass
list for it, "native" support for it in the pass manager is likely to
produce better results (add a 'startwith' member alongside the pass list
member; if it is set, the pass manager skips all passes that do not match
'startwith', and once it reaches it, it clears the field).

In the future I hope we can get away from a static pass list and move
towards rule-driven pass execution (we have all that PROP_* stuff already,
but it isn't really used, for example). But well, that would be a separate
GSoC project ;)

IMHO startwith will provide everything needed for unit-testing. We can
add a flag on whether further passes should be executed or not, and even
a pass list like execute ("ccp1", "fre") can be implemented by startwith
ccp1 and then, from there, executing the rest of the passes in the list
and stopping at the end. As said, unit-testing should exercise a single
pass if we can control its input.

Thanks,
Richard.

> Thanks,
> Prasad
>
>> Richard.
>> Richard.
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prasad Ghangal
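The "native" startwith scheme sketched above can be illustrated with a minimal stand-alone model. All names here are hypothetical stand-ins: the real pass manager works on opt_pass objects and would keep the startwith name on struct function, not pass it as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for GCC's pass list.  */
struct pass
{
  const char *name;
  struct pass *next;
};

/* Walk the pass list; while *startwith is set, skip passes whose name
   does not match, clear it when the matching pass is reached, and run
   everything from there on.  Returns the number of passes "executed".  */
static int
execute_pass_list_startwith (struct pass *p, const char **startwith)
{
  int executed = 0;
  for (; p; p = p->next)
    {
      if (*startwith)
        {
          if (strcmp (p->name, *startwith) != 0)
            continue;            /* not the entry point yet: skip */
          *startwith = NULL;     /* entry point reached: run the rest */
        }
      executed++;                /* stand-in for execute_one_pass (p) */
    }
  return executed;
}
```

With startwith = "ccp1" and a pipeline lower → ccp1 → fre1, only the last two passes run; with startwith unset, all three do.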
Re: [Patch 0,1a] Improving effectiveness and generality of autovectorization using unified representation.
On Wed, Jul 6, 2016 at 12:49 PM, Sameera Deshpande wrote:
>
> From: Sameera Deshpande [sameera.deshpa...@imgtec.com]
> Sent: 20 June 2016 11:37:58
> To: Richard Biener
> Cc: Matthew Fortune; Rich Fuhler; Prachi Godbole; gcc@gcc.gnu.org; Jaydeep Patil
> Subject: Re: [Patch 0,1a] Improving effectiveness and generality of
> autovectorization using unified representation.
>
> On Wednesday 15 June 2016 05:52 PM, Richard Biener wrote:
>> On Mon, Jun 13, 2016 at 12:56 PM, Sameera Deshpande wrote:
>>> On Thursday 09 June 2016 05:45 PM, Richard Biener wrote:
On Thu, Jun 9, 2016 at 10:54 AM, Richard Biener wrote:
> On Tue, Jun 7, 2016 at 3:59 PM, Sameera Deshpande wrote:
>> Hi Richard,
>>
>> This is with reference to our discussion at GNU Tools Cauldron 2015
>> regarding my talk titled "Improving the effectiveness and generality
>> of GCC auto-vectorization." Further to our prototype implementation
>> of the concept, we have started implementing it in GCC.
>>
>> We are following an incremental model to add language support in our
>> front-end, and the corresponding back-end (for the auto-vectorizer)
>> will be added for feature completion.
>>
>> Looking at the complexity and scale of the project, we have divided it
>> into the subtasks listed below, for ease of implementation, testing
>> and review.
>>
>> 0. Add a new pass to perform autovectorization using the unified
>>    representation - The current GCC framework does not give a complete
>>    overview of the loop to be vectorized: it either breaks the loop
>>    across the body, or across iterations. Because of this, those data
>>    structures cannot be reused for our approach, which gathers all the
>>    information about the loop body in one place using primitive permute
>>    operations. Hence, define new data structures and populate them.
>>
>> 1. Add support for vectorization of LOAD/STORE instructions:
>>    a. Create the permute order tree for the loop with LOAD and STORE
>>       instructions for single or multi-dimensional arrays, and
>>       aggregates within nested loops.
>>    b. Basic transformation phase to generate vectorized code for the
>>       primitive reorder tree generated at stage 1a using the tree
>>       tiling algorithm. This phase handles code generation for SCATTER,
>>       GATHER, strided memory accesses etc., along with permute
>>       instruction generation.
>>
>> 2. Implementation of k-arity promotion/reduction: The permute nodes
>>    within the primitive reorder tree generated from the input program
>>    can have any arity. However, the target can support a maximum arity
>>    of 2 in most cases. Hence, we need to promote or reduce the arity of
>>    the permute order tree to enable successful tree tiling.
>>
>> 3. Vector size reduction: Depending upon the vector size for the
>>    target, reduce the vector size per statement and adjust the loop
>>    count for the vectorized loop accordingly.
>>
>> 4. Support simple arithmetic operations:
>>    a. Add support for analyzing statements with simple arithmetic
>>       operations like +, -, *, / for vectorization, and create the
>>       primitive reorder tree with compute_op.
>>    b. Generate vector code for the primitive reorder tree generated at
>>       stage 4a using the tree tiling algorithm - here support for
>>       complex patterns like multiply-add should be checked and the
>>       appropriate instruction generated.
>>
>> 5. Support reduction operations:
>>    a. Add support for reduction operation analysis and primitive
>>       reorder tree generation. The reduction operation needs special
>>       handling, as the finish statement should COLLAPSE the temporary
>>       reduction vector TEMP_VAR into the original reduction variable.
>>    b. The code generation for the primitive reorder tree does not need
>>       any special handling - the reduction tree is the same as the tree
>>       generated in 4a, with the only difference that in 4a the
>>       destination is a MEMREF (because of the STORE operation) whereas
>>       for reduction it is TEMP_VAR. At this stage, generate code for
>>       the COLLAPSE node in finish statements.
>>
>> 6. Support other vectorizable statements like complex arithmetic
>>    operations, bitwise operations, type conversions etc.:
>>    a. Add support for analysis and primitive reorder tree generation.
>>    b. Vector code generation.
>>
>> 7. Cost-effective tree tiling algorithm: Till now, the tree tiling
>>    happens without considering the cost of computation. However, there
>>    can be multiple target instructions covering the tree - hence,
>>    instead of picking the first matched largest instruction cover,
>>    select the instruction cover based on the cost of instruction given in
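Item 2 above (k-arity reduction) can be sketched with made-up data structures: an interleave node with k operands is lowered to a left-leaning chain of 2-operand nodes, since most targets only provide binary permute instructions, so reducing arity k creates k - 1 binary nodes. This is only an illustration of the idea, not the patch's actual representation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy permute-tree node: both operand pointers NULL for a leaf.  */
struct pnode
{
  struct pnode *op0, *op1;
};

/* Combine k operands into a chain of binary nodes; returns the root and
   stores the number of binary nodes created in *created.  */
static struct pnode *
reduce_arity (struct pnode **ops, int k, int *created)
{
  struct pnode *root = ops[0];
  *created = 0;
  for (int i = 1; i < k; i++)
    {
      struct pnode *n = calloc (1, sizeof *n);
      n->op0 = root;             /* chain built so far */
      n->op1 = ops[i];           /* next operand */
      root = n;
      ++*created;
    }
  return root;
}
```

For k = 4 this produces three binary nodes, with the last operand attached at the root and the first at the bottom of the chain.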
Re: [RFC] lto partitioning of varpool_nodes for section anchors
On 4 July 2016 at 13:51, Andrew Pinski wrote:
> On Mon, Jul 4, 2016 at 12:58 AM, Prathamesh Kulkarni wrote:
>> Hi,
>> I have attached a "quick and dirty" prototype patch (var-partition-1.diff)
>> that attempts to partition variables to reduce the number of
>> external references and to increase usage of section anchors
>> to CSE the address computation of global variables.
>>
>> We could put a variable in the partition that has max references for
>> it; however, that doesn't lend itself directly to the section anchor
>> optimization. For instance, if a partition has max references for
>> variables 'a' and 'b', but no function in that partition references
>> both 'a' and 'b', then AFAIU it makes no difference from a section
>> anchors perspective to have them in the same partition.
>>
>> The patch tries to assign a set of variables (>= 2)
>> to the partition whose functions have maximum references for that set.
>> Functions within the partition that reference the variables
>> in the set can take advantage of section anchors. Functions
>> referencing the variables in the set outside the partition
>> would need to load them as external references (using movw/movt);
>> however, since we are placing the set in the partition that has
>> maximal references for it, the number of external references should
>> be reduced overall.
>>
>> Partitioning is gated by -flto-var-partition and enabled
>> only for arm and aarch64.
>
> Why only for arm and aarch64? Shouldn't it be enabled for all section
> anchor targets?

AFAIK the only targets supporting section anchors are arm, aarch64 and
powerpc. I didn't enable it for ppc64 because I am not sure how
profitable it is for that target. Honza mentioned to me some time back
that the effect of partitioning on powerpc was nearly zero.

Thanks,
Prathamesh

>
> Thanks,
> Andrew
>
>> As per previous discussion [1], I haven't
>> touched function partitioning. Does this approach look OK,
>> especially regarding correctness?
>> So far, I have cross-tested the patch on arm*-*-* and aarch64*-*-*.
>>
>> I haven't yet managed to benchmark the patch.
>> As a cheap measurement, I tried to measure the number of external
>> references with and without the patch by writing a small IPA pass
>> which is run during ltrans and simply walks over varpool nodes,
>> counting the varpool_nodes for which DECL_EXTERNAL (vnode->decl) is
>> true and vnode->definition is 0. Is that a sufficient condition to
>> determine whether a variable is externally defined? I have attached
>> the pass (count-external-refs.diff) and the comparison done with it
>> for SPEC2000 [2]. The entries in the "before" and "after" columns
>> contain the summation of the number of external refs (total_count)
>> across all partitions before and after applying the patch. Does the
>> comparison hold any merit? I was wondering if we could use a better
>> way of measuring the effects of variable partitioning statically.
>> I hope also to get the benchmarking done soon.
>>
>> I have not yet figured out how to integrate it with the existing cost
>> metrics for balanced partitioning; I am looking into that.
>> I would be grateful for suggestions on the patch.
>>
>> [1] https://gcc.gnu.org/ml/gcc/2016-04/msg00090.html
>>
>> [2] SPEC2000 comparison:
>> https://docs.google.com/spreadsheets/d/1xnszyw04ksoyBspmCVYesq6KARLw-PA2n3T4aoaKdYw/edit?usp=sharing
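The placement rule described in the patch - put the candidate variable set in the partition whose functions reference it most - reduces to an argmax over per-partition reference counts. A trivial sketch with hypothetical inputs (not the patch's actual data structures, which work on varpool/cgraph nodes):

```c
#include <assert.h>

/* set_refs[i] = number of references that the functions of partition i
   make to the candidate variable set.  Returns the index of the
   partition that should receive the set.  */
static int
best_partition (const int *set_refs, int nparts)
{
  int best = 0;
  for (int i = 1; i < nparts; i++)
    if (set_refs[i] > set_refs[best])
      best = i;
  return best;
}
```

Every other partition then accesses the set via external references, but since the chosen partition held the maximum, the total external-reference count is minimized for that set.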
Re: [RFC] lto partitioning of varpool_nodes for section anchors
On Wed, Jul 6, 2016 at 5:00 AM, Prathamesh Kulkarni wrote:
> On 4 July 2016 at 13:51, Andrew Pinski wrote:
>> On Mon, Jul 4, 2016 at 12:58 AM, Prathamesh Kulkarni wrote:
>>> Hi,
>>> I have attached a "quick and dirty" prototype patch (var-partition-1.diff)
>>> that attempts to partition variables to reduce the number of
>>> external references and to increase usage of section anchors
>>> to CSE the address computation of global variables.
>>>
>>> We could put a variable in the partition that has max references for
>>> it; however, that doesn't lend itself directly to the section anchor
>>> optimization. For instance, if a partition has max references for
>>> variables 'a' and 'b', but no function in that partition references
>>> both 'a' and 'b', then AFAIU it makes no difference from a section
>>> anchors perspective to have them in the same partition.
>>>
>>> The patch tries to assign a set of variables (>= 2)
>>> to the partition whose functions have maximum references for that set.
>>> Functions within the partition that reference the variables
>>> in the set can take advantage of section anchors. Functions
>>> referencing the variables in the set outside the partition
>>> would need to load them as external references (using movw/movt);
>>> however, since we are placing the set in the partition that has
>>> maximal references for it, the number of external references should
>>> be reduced overall.
>>>
>>> Partitioning is gated by -flto-var-partition and enabled
>>> only for arm and aarch64.
>>
>> Why only for arm and aarch64? Shouldn't it be enabled for all section
>> anchor targets?
> AFAIK the only targets supporting section anchors are arm, aarch64 and
> powerpc. I didn't enable it for ppc64 because I am not sure how
> profitable it is for that target. Honza mentioned to me some time back
> that the effect of partitioning on powerpc was nearly zero.

No, MIPS has section anchors enabled too. Plus MIPS will benefit the
same way as AArch64 and ARM. PowerPC32 would too.

I don't think it is correct to enable it only for arm and aarch64.

Thanks,
Andrew Pinski

>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Andrew
>>
>>> As per previous discussion [1], I haven't
>>> touched function partitioning. Does this approach look OK,
>>> especially regarding correctness?
>>> So far, I have cross-tested the patch on arm*-*-* and aarch64*-*-*.
>>>
>>> I haven't yet managed to benchmark the patch.
>>> As a cheap measurement, I tried to measure the number of external
>>> references with and without the patch by writing a small IPA pass
>>> which is run during ltrans and simply walks over varpool nodes,
>>> counting the varpool_nodes for which DECL_EXTERNAL (vnode->decl) is
>>> true and vnode->definition is 0. Is that a sufficient condition to
>>> determine whether a variable is externally defined? I have attached
>>> the pass (count-external-refs.diff) and the comparison done with it
>>> for SPEC2000 [2]. The entries in the "before" and "after" columns
>>> contain the summation of the number of external refs (total_count)
>>> across all partitions before and after applying the patch. Does the
>>> comparison hold any merit? I was wondering if we could use a better
>>> way of measuring the effects of variable partitioning statically.
>>> I hope also to get the benchmarking done soon.
>>>
>>> I have not yet figured out how to integrate it with the existing cost
>>> metrics for balanced partitioning; I am looking into that.
>>> I would be grateful for suggestions on the patch.
>>>
>>> [1] https://gcc.gnu.org/ml/gcc/2016-04/msg00090.html
>>>
>>> [2] SPEC2000 comparison:
>>> https://docs.google.com/spreadsheets/d/1xnszyw04ksoyBspmCVYesq6KARLw-PA2n3T4aoaKdYw/edit?usp=sharing
Fwd: Re: GCC libatomic questions
Redirecting to the gcc list for discussion. I'll follow up on that
thread directly.

r~

-------- Forwarded Message --------
Subject: Re: GCC libatomic questions
Date: Wed, 6 Jul 2016 10:27:20 -0700
From: Bin Fan
Organization: Oracle Corporation
To: Richard Henderson

Hello Richard,

This is Bin in the Sun/Oracle compiler group. Sorry about the long delay
for the libatomic ABI specification I mentioned a long time ago; I was
assigned to some other tasks. Please find a draft of the libatomic ABI
specification attached. The text is also pasted at the end of this email.

The goal of the ABI specification is twofold. First, to check with the
GCC community that the ABI matches the latest GCC libatomic
implementation; this would make sure that GCC and the Oracle Developer
Studio C/C++ compiler can work well together without any compatibility
issues on Solaris/Linux + SPARC/x86. Second, and a longer-term goal, to
integrate the libatomic ABI into the current SPARC/x86 ABI
specifications.

Could you please review the draft and/or forward it to the community for
review?

Thanks,
- Bin


1. Overview

1.1. Why we need an ABI for atomics

The C11 standard allows different size, representation and alignment
between atomic types and the corresponding non-atomic types [1], so the
size, representation and alignment of atomic types need to be specified
in the ABI. A runtime support library, libatomic, already exists on
Solaris and Linux. The interface of this library needs to be
standardized as part of the ABI specification, so that

- on a system that supplies libatomic, all compilers in compliance with
  the ABI can generate compatible binaries linking this library;
- the binaries remain backward compatible across versions of the system,
  as long as those versions support the same ABI.

1.2. What does the atomics ABI specify

The ABI specifies the following:

- Data representation of the atomic types.
- The names and behaviors of the implementation-specific support
  functions.
- The atomic types for which the compiler may generate inlined code.
- The lock-free property of the inlined atomic operations.

Note that the names and behavior of the libatomic functions specified in
the C standard do not need to be part of this ABI, because they are
already required to meet the specification in the standard.

1.3. Affected platforms

The following platforms are affected by this ABI specification:

- SPARC (32-bit and 64-bit)
- x86 (32-bit and 64-bit)

Sections 1.1 and 1.2, and the Rationale, Notes and Appendix sections in
the rest of the document, are for explanatory purposes only; they are
not considered part of the formal ABI specification.

2. Data Representation

2.1. General Rules

The general rules for the size, representation and alignment of atomic
types are the following:

1) Atomic types have the same size as the corresponding non-atomic
   types.
2) Atomic types have the same representation as the corresponding
   non-atomic types.
3) Atomic types have the same alignment as the corresponding non-atomic
   types, with the following exceptions:
   - On 32- and 64-bit x86 platforms and on 64-bit SPARC platforms,
     atomic types of size 1, 2, 4, 8 or 16 bytes have the alignment that
     matches the size.
   - On 32-bit SPARC platforms, atomic types of size 1, 2, 4 or 8 bytes
     have the alignment that matches the size. If the alignment of a
     16-byte non-atomic type is less than 8 bytes, the alignment of the
     corresponding atomic type is increased to 8 bytes.

Note: the above rules apply to both scalar types and aggregate types.

2.2.
Atomic scalar types

x86

                            LP64 (AMD64)                  ILP32 (i386)
C Type                  sizeof Alignment Inlineable  sizeof Alignment Inlineable
atomic_flag                1       1         Y          1       1         Y
_Atomic _Bool              1       1         Y          1       1         Y
_Atomic char               1       1         Y          1       1         Y
_Atomic signed char        1       1         Y          1       1         Y
_Atomic unsigned char      1       1         Y          1       1         Y
_Atomic short              2       2         Y          2       2         Y
_Atomic signed short       2       2         Y          2       2         Y
_Atomic unsigned short     2       2         Y          2       2         Y
_Atomic int                4       4         Y          4       4         Y
_Atomic signed int         4       4         Y          4       4         Y
_Atomic enum               4       4         Y          4       4         Y
_Atomic unsigned int       4       4         Y          4       4         Y
_Atomic long               8       8         Y
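The rules behind the table - atomic types keep the size of their non-atomic counterparts, and for power-of-two sizes the alignment matches the size - can be checked directly in C11. This sketch assumes a typical LP64 target, as in the LP64 column above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Same-size rule: the atomic type is as large as the non-atomic one.  */
_Static_assert (sizeof (_Atomic int) == sizeof (int), "same-size rule");

/* Alignment rule: for a 4-byte type, alignment equals the size.  */
_Static_assert (_Alignof (_Atomic int) == sizeof (int), "align == size");

/* atomic_flag occupies a single byte, per the first table row.  */
_Static_assert (sizeof (atomic_flag) == 1, "atomic_flag is one byte");
```

The same checks can be repeated for the other rows; on targets where a non-atomic 8-byte type is under-aligned (e.g. long long on i386), the _Atomic version is still boosted to 8-byte alignment by rule 3.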
Re: Fwd: Re: GCC libatomic questions
> CMPXCHG16B is not always available on 64-bit x86 platforms, so 16-byte
> naturally aligned atomics are not inlineable. The support functions for
> such atomics are free to use a lock-free implementation if the
> instruction is available on specific platforms.

Except that it is available on almost all 64-bit x86 platforms. As far
as I know, only 2004-era AMD processors didn't have it; all Intel 64-bit
CPUs have supported it. Further, gcc will most certainly make use of it
when one specifies any command-line option that enables it, such as
-march=native. Therefore we must specify that for x86_64, 16-byte
objects are non-locking on CPUs that support cmpxchg16b.

> However, if a compiler inlines an atomic operation on an _Atomic long
> double object and uses the new lock-free instructions, it could break
> compatibility if the library implementation is still non-lock-free. So
> such a compiler change must be accompanied by a library change, and the
> ABI must be updated as well.

The tie between the gcc version and the libgcc.so version is tight; I
see no reason that the libatomic.so version should not also be tight
with the compiler version.

It is sufficient that libatomic use atomic instructions when they are
available. If a new processor comes out with new capabilities, the
compiler and runtime are upgraded in lock-step. How that is selected is
beyond the ABI, but possible solutions are (1) an ld.so search path
based on processor capabilities, (2) ifunc (or workalike), where the
function is selected at startup, or (3) an explicit runtime test within
the relevant functions. All of these expose the same function interface,
so the function call ABI is not affected.

> _Bool __atomic_is_lock_free (size_t size, void *object);
>
> Returns whether the object pointed to by object is lock-free. The
> function assumes that the size of the object is size. If object is
> NULL, then the function assumes that object is aligned on a size-byte
> address.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65033

The actual code change is completely within libstdc++, but it affects
the description of the libatomic function. C++ requires that
is_lock_free return the same result for all objects of a given type,
whereas __atomic_is_lock_free, given a non-null object, determines
whether we will implement lock-free for that *specific* object, using
the specific object's alignment. Rather than break the ABI and add a
different function that passes the type alignment, the solution we hit
upon was to pass a "fake", minimally aligned pointer as the object
parameter: (void *)(uintptr_t)-__alignof(type).

The final component of the ABI that you've forgotten to specify, if you
want full compatibility of linked binaries, is symbol versioning. We
have had two ABI additions since the original release. See
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libatomic/libatomic.map;h=39e7c2c6b9a70121b5f4031da346a27ae6c1be98;hb=HEAD

r~
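The fake-pointer trick described above can be written out as a small C sketch. TYPE_IS_LOCK_FREE is a made-up macro name; __atomic_is_lock_free is the real GCC builtin. The fake pointer holds the most poorly aligned address the type allows, so the answer depends only on the type's alignment, never on one specific object.

```c
#include <assert.h>
#include <stdint.h>

/* Ask about lock-freedom per *type*, libstdc++-style: -_Alignof (type)
   viewed as an address is the lowest address with exactly the type's
   minimum alignment, so no object can be less aligned than this.  */
#define TYPE_IS_LOCK_FREE(type) \
  __atomic_is_lock_free (sizeof (type), \
                         (void *) (uintptr_t) - _Alignof (type))
```

With constant size and a pointer of statically known alignment, GCC can usually fold this to a compile-time constant, matching the guarantee is_lock_free must give for all objects of the type.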
question about illegal utf-8 encoding in string literals
Hello,

I work for Intel on the Intel C++ compiler, and we strive to be
compatible with the GNU compiler. When we process a source file assuming
UTF-8 encoding and encounter a string literal with an invalid UTF-8
sequence, such as an 8-bit character with the high bit set like 0xa3,
testing shows that gcc passes the invalid UTF-8 byte through without a
diagnostic message, as though it were an "extended ASCII" character. I
don't see a way to enable warnings for this issue. Please confirm that
gcc handles invalid UTF-8 encodings this way.

Thanks and regards,
Melanie Blower
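For context on why a lone 0xa3 byte is invalid: in UTF-8, bytes 0x80-0xBF are continuation bytes and cannot begin a character, and 0xC0-0xC1 and 0xF5-0xFF can never appear at all. A minimal check (a sketch, not cpplib's actual logic) for whether a byte can start a well-formed UTF-8 sequence:

```c
#include <assert.h>

/* Returns nonzero iff C may begin a well-formed UTF-8 sequence.  */
static int
utf8_valid_lead_byte (unsigned char c)
{
  return c < 0x80                      /* ASCII */
         || (c >= 0xC2 && c <= 0xF4);  /* leads of 2- to 4-byte sequences */
}
```

A full validator would additionally check that each lead byte is followed by the right number of 0x80-0xBF continuation bytes and reject overlong and out-of-range (above U+10FFFF) encodings.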
Re: [RFC] lto partitioning of varpool_nodes for section anchors
On 6 July 2016 at 22:25, Andrew Pinski wrote: > On Wed, Jul 6, 2016 at 5:00 AM, Prathamesh Kulkarni > wrote: >> On 4 July 2016 at 13:51, Andrew Pinski wrote: >>> On Mon, Jul 4, 2016 at 12:58 AM, Prathamesh Kulkarni >>> wrote: Hi, I have attached a "quick and dirty" prototype patch (var-partition-1.diff), that attempts to partition variables to reduce number of external references and to increase usage of section-anchors to CSE address computation of global variables. We could put a variable in a partition that has max references for it, however it doesn't lend itself directly to section anchor optimization. For instance if a partition has max references for variables 'a' and 'b', but no function in that partition references both 'a', and 'b' then AFAIU it doesn't make any difference from section anchors perspective to have them in same partition. The patch tries to assign a set of variables (>= 2) to a partition whose functions have maximum references for that set. Functions within the partition that reference the variables in the set can take advantage of section-anchors. Functions referencing the variables in the set outside the partition would need to load them as external references (using movw/movt), however since we are placing the set in partition that has maximal references for it, number of external references should be overall reduced. Partitioning is gated by -flto-var-partition and enabled only for arm and aarch64. >>> >>> Why only for arm and aarch64? Shouldn't it be enabled for all section >>> anchor targets? >> AFAIK the only targets supporting section anchors are arm, aarch64 and >> powerpc. >> I didn't enable it for ppc64 because I am not sure how much profitable >> it is for that target. >> Honza mentioned to me some time back that effect of partitioning on >> powerpc was nearly zero. > > > No MIPS has section anchors enabled too. Plus MIPS will benefit the > same way as AARCH64 and ARM. PowerPC32 would too. 
>
> I don't think it is correct to enable it only for arm and aarch64.

Thanks, I updated the patch to remove -flto-var-partition and gated the
partitioning on target_supports_section_anchors_p() (although it doesn't
test whether -fsection-anchors is passed).

Um, I am not able to see where MIPS has section anchors enabled.
mips.c does not seem to override the min_anchor_offset and
max_anchor_offset hooks. Both hooks default to 0, and
target_supports_section_anchors_p() returns false if both are 0.

Thanks,
Prathamesh

>
> Thanks,
> Andrew Pinski
>
>> Thanks,
>> Prathamesh
>>>
>>> Thanks,
>>> Andrew
>>>
>>>> As per previous discussion [1], I haven't touched function
>>>> partitioning. Does this approach look ok especially regarding
>>>> correctness? So far, I have cross-tested the patch on arm*-*-* and
>>>> aarch64*-*-*.
>>>>
>>>> I haven't yet managed to benchmark the patch. As a cheap
>>>> measurement, I tried to measure the number of external references
>>>> with and without the patch by writing a small IPA pass which is run
>>>> during ltrans and simply walks over varpool nodes, counting the
>>>> varpool_nodes for which DECL_EXTERNAL (vnode->decl) is true and
>>>> vnode->definition is 0. Is that a sufficient condition to determine
>>>> whether a variable is externally defined? I have attached the pass
>>>> (count-external-refs.diff) and the comparison done with it for
>>>> SPEC2000 [2]. The entries in the "before" and "after" columns
>>>> contain the summation of the number of external refs (total_count)
>>>> across all partitions before and after applying the patch. Does the
>>>> comparison hold any merit? I was wondering if we could use a better
>>>> way of measuring the effects of variable partitioning statically.
>>>> I hope also to get the benchmarking done soon.
>>>>
>>>> I have not yet figured out how to integrate it with the existing
>>>> cost metrics for balanced partitioning; I am looking into that.
>>>> I would be grateful for suggestions on the patch.
[1] https://gcc.gnu.org/ml/gcc/2016-04/msg00090.html

[2] SPEC2000 comparison:
https://docs.google.com/spreadsheets/d/1xnszyw04ksoyBspmCVYesq6KARLw-PA2n3T4aoaKdYw/edit?usp=sharing

diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 453343a..09b525e 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,6 +34,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-inline.h"
 #include "lto-partition.h"
+#include "toplev.h" /* for target_supports_section_anchors_p() */
+#include
+#include
+#include
+#include
+#include

 vec ltrans_partitions;

@@ -407,6 +413,274 @@ add_sorted_nodes (vec &next_nodes, ltrans_partition partition)
       add_symbol_to_partition (partition, node);
 }

+/* FIXME: Currently I don't care to compute power set if set has more
gcc-4.9-20160706 is now available
Snapshot gcc-4.9-20160706 is now available on

  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20160706/

and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options:
  svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 238060

You'll find:

 gcc-4.9-20160706.tar.bz2             Complete GCC

  MD5=44e8cd46bf8ffc9a61f0222e15b3288c
  SHA1=1415de843c84d9fb366ec0ef565aca97fbd6aac4

Diffs from 4.9-20160629 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.