Re: Prototype implementation: Improving effectiveness and generality of auto-vectorization

2016-01-11 Thread Richard Biener
On Fri, Jan 8, 2016 at 5:11 PM, Alan Lawrence
 wrote:
> On Tue, Oct 27, 2015 at 2:39 PM, Richard Biener
>  wrote:
>>
>> On Mon, Oct 26, 2015 at 6:59 AM, sameera 
>> wrote:
>>>
>>>
>>> Richard, we have defined the input language for convenience in the
>>> prototype implementation. However, we will be using GIMPLE as our IR.
>>> As per the grammar of our tree, a p-tree denotes the permute order
>>> associated with the statement, whereas a c-tree is actually the GIMPLE
>>> instruction which performs the compute operation. I tried looking at
>>> the structures used in SLP; however, they cannot be used as they are.
>>> The main difference between the current SLP implementation in GCC and
>>> our representation is that the permute order in SLP is part of the
>>> tree node in current GCC, whereas in our representation the permute
>>> order is represented as an independent tree node. Hence, I have
>>> created a new tree structure for our pass, which will create p-tree
>>> nodes for the permute order and c-tree nodes which point to the
>>> appropriate GIMPLE statement.
>>
>>
>> Yes, that's the whole purpose - get the vectorizer (and SLP) a better
>> data structure
>> which is explicit about permutes.
>
>
>>>> I somewhat miss an idea on how to deal with irregular SLP cases - that
>>>> is, does the AST model each (current) SLP node as a single statement or
>>>> does it model individual vector elements (so you can insert no-op
>>>> compensation code to make the operations match).  Consider
>>>>
>>>>   a[i] = b[i] * 3;
>>>>   a[i+1] = b[i+1];
>>>>
>>>> which can be vectorized with SLP if you realize you can multiply b[i+1]
>>>> by 1.
>>>
>>>
>>>
>>> The pass structure we have for our optimization is as follows:
>>> - New pass target-aware-loop-vect with the following subpasses:
>>>   - Loop structure identification
>>>   - Restricted loop construction
>>>   - Loop analysis (6 phases of the target-aware reordering-instruction
>>> selection algorithm)
>>>   - Loop transformation (cost-aware gimple code generation)
>>> We might need some optimizations to be performed on the loop body
>>> before we start our optimization for reordering-instruction selection,
>>> so that the input program can be transformed to fit into our restricted
>>> loop structure. These transformations can be performed by the
>>> restricted loop construction subpass. E.g. multi-dimensional arrays,
>>> nested loops, reduction chains etc. can be supported by transforming
>>> the input loop. The case that you have mentioned can be transformed at
>>> this level.
>
>
>>>> As for implementing the whole thing in GCC, I recently spent some more
>>>> time thinking and came to the conclusion that the path to the fastest
>>>> improvements in GCC code-gen would be to build the new AST after the
>>>> current analysis phase finished, do the optimization and drive
>>>> code-generation off the new AST.
>>>> Thus keep things like data-ref and dependence analysis as is, as well
>>>> as group analysis and SLP tree construction.  Build the AST from the
>>>> vectorizer "IL" at the point vect_transform_loop / vect_slp_transform_bb
>>>> is called.
>>>
>>>
>>>
>>> The current pass structure that we have designed does exactly that. The
>>> only difference is that we are planning to use the transform phase as a
>>> co-process of our transform pass, rather than reconstructing an AST from
>>> our representation to pass to vect_transform_loop.
>>>

>>>> More and more of the rest of the vectorizer code can then be "slowly"
>>>> moved to work on the AST, and AST construction be moved earlier.
>
>
> So I think an incremental approach is much more likely to get us some
> benefit, and I'm wondering how much of the benefit here stems from which
> bit, and how many of these changes can be broken off.
>
> For example, making permutes into explicit operations on the SLP tree, and
> adding a pass that tries to push them up/down the tree into each other,
> might be done on the existing codebase, and would be a significant win in
> itself (bringing some/much(?) of the improvements in interleaving code from
> your proposal).
>
> I'm not really sure how much of the benefit stems from what, though - e.g.
> does reducing vector length down to fit the target machine only after the
> other steps, bring any benefits besides raising abstraction? (Which
> certainly can be a benefit!). AFAICT this could be done independently of
> other changes?
>
> Another limitation of the present SLP structure is that it doesn't
> understand sharing (until later CSE, hence costing wrongly); it would be
> good to fix this (making the SLP tree into a DAG, and potentially reusing
> subtrees by adding permutes). This is a step we'd like to take anyway, but
> it might tie in with your "bottom-up commoning of p-node subtrees". And it
> would help with (BB as well as loop) cost calculations.
>
> Are there any really fundamental differences that we cannot overcome in
> steps? ( / are there any steps we shouldn't take in isolation, because we'd
> hav

Re: __builtin_memcpy and alignment assumptions

2016-01-11 Thread Richard Biener
On Fri, Jan 8, 2016 at 7:36 PM, Steve Ellcey  wrote:
> On Fri, 2016-01-08 at 12:56 +0100, Richard Biener wrote:
>> On Fri, Jan 8, 2016 at 12:40 PM, Eric Botcazou  
>> wrote:
>> >> I think we only assume it if the pointer is actually dereferenced, 
>> >> otherwise
>> >> it just breaks too much code in the wild.  And while memcpy dereferences,
>> >> it dereferences it through a char * cast, and thus only the minimum
>> >> alignment is assumed.
>> >
>> > Yet the compiler was generating the expected code for Steve's testcase on
>> > strict-alignment architectures until very recently (GCC 4.5 IIUC) and this
>> > worked perfectly.
>
> Yes, I just checked and I did get the better code in GCC 4.5 and I get
> the current slower code in GCC 4.6.
>
>> Consider
>>
>> int a[256];
>> int
>> main()
>> {
>>   void *p = (char *)a + 1;
>>   void *q = (char *)a + 5;
>>   __builtin_memcpy (p, q, 4);
>>   return 0;
>> }
>>
>> where the ME would be entitled to "drop" the char */void * conversions
>> and use &a typed temps.
>
> I am not sure how this works but I tweaked get_pointer_alignment_1 so
> that if there was no align info or if get_ptr_info_alignment returned
> false then the routine would return type based alignment information
> instead of default 'void *' alignment.  In that case and using your
> example, GCC still accessed p & q as pointers to unaligned data.
>
> In fact if I used int pointers:
>
> int a[256];
> int main()
> {
>   int *p = (int *)((char *)a + 1);
>   int *q = (int *)((char *)a + 5);
>   __builtin_memcpy (p, q, 4);
>   return 0;
> }
>
> GCC did unaligned accesses when optimizing, but when unoptimized (and
> with my change) GCC did aligned accesses, which would not work on a
> strict-alignment machine like MIPS.  This seems to match what happens
> with:
>
> int a[256];
> int main()
> {
>   int *p = (int *)((char *)a + 1);
>   int *q = (int *)((char *)a + 5);
>   *p = *q;
>   return 0;
> }
>
> When I optimize it, GCC does unaligned accesses and when unoptimized
> GCC does aligned accesses which will not work on MIPS.

Of course, reading the fine print in the C standard makes these testcases
undefined (you use an int * type for not properly aligned pointers;
-Wcast-align should warn about this).

Btw, your change to get_pointer_alignment would basically boil down to
doing get_object_alignment.

Richard.


Re: ipa vrp implementation in gcc

2016-01-11 Thread Richard Biener
On Mon, Jan 11, 2016 at 1:38 AM, Kugan
 wrote:
> Hi All,
>
> I am looking at implementing an ipa vrp pass. Jan Hubicka also talked
> about this at the 2013 GNU Cauldron as one of the optimizations he would
> like to see in gcc. So my question is, is anyone implementing it? If not,
> we would like to do that.
>
> I also looked at the ipa-cp implementation to see how this can be done.
> Going by this, one of the way to implement this is (skipping all the
> details):
>
> - Have an early tree-vrp so that we can have value ranges for parameters
> at call sites.

I'd rather use the IPA analysis phase for this and use a VRP algorithm
that doesn't require ASSERT_EXPR insertion.

> - Create jump functions that capture the value ranges of call sites and
> propagate the value ranges. In the 2013 talk, Jan Hubicka talks about
>
> - Modifying ipa-prop.[h|c] to handle this, but wouldn't it be easier to
> have its own and much simpler implementation?

No idea.

> - Once we have the value ranges for parameter/return values, we could
> rely on tree-vrp to use this and do the optimizations

Yep.  IPA transform phase should annotate parameter default defs with
computed ranges.

> Does this make any sense? Any thoughts/suggestions to work on this is
> highly appreciated.

IPA alignment propagation should already be somewhat similar as in doing
an intersection step during propagation.

Richard.

> Thanks,
> Kugan


Re: Some real-life feedback on -Wmisleading-indentation

2016-01-11 Thread David Malcolm
On Mon, 2016-01-11 at 15:20 +0800, Gerald Pfeifer wrote:
> Compiling Wine with GCC trunk (to become GCC 6) I noticed four
> dozen warnings triggered by -Wmisleading-indentation.
> 
> Some are simply weird formatting, some may be indicative of 
> real issues -- and I have started to look into them one by 
> one and submitting patches (to Wine).
> 
> However, there is a pattern I noticed which, while a bit weird 
> in terms of formatting, does not really strike me as something
> -Wmisleading-indentation should warn about:
> 
> VariantInit(&ret);
> hr = IWinHttpRequest_Invoke(request, ...);
> ok(hr == DISP_E_UNKNOWNINTERFACE, "error %#x\n", hr);
> 
> VariantInit(&ret);
> if (0) /* crashes */
> hr = IWinHttpRequest_Invoke(request, ...);
> 
> params.cArgs = 1;
> hr = IWinHttpRequest_Invoke(request, ...);
> ok(hr == DISP_E_TYPEMISMATCH, "error %#x\n", hr);
> 
> VariantInit(&arg[2]);
> 
> This is from the Wine testsuite, and the if (0) in column one guards 
> one invocation of the function under test that would crash (so it is 
> the equivalent of #if 0...#endif, except that it avoids conditional 
> compilation).
> 
> Is this a bit unusual?  Definitely.  Does it look like one of
> those cases a programmer would actually be tempted to misread?
> I don't think so.
> 
> What do you think?

How often does this pattern occur in Wine?


In Chapter 31 of "Code Complete" (I'm looking at the 2nd edition),
McConnell discusses the use of whitespace in code layout; he talks about
both blank lines and indentation as being useful for providing hints to
the human reader about the structure of the code.

In several places he talks about the use of blank lines for separating
groups of related statements into "paragraphs", that blank lines in code
can and should be used to demarcate logical "blocks" of related
statements (e.g. pp747-8 of 2nd edition).

I think that in the Wine example above the blank lines do effectively
create "paragraphs" of code to a human, and I agree that a human reader
is unlikely to think of the

if (0)

as guarding the

   params.cArgs = 1;

since they're in different "paragraphs".

I apologize if I'm belaboring the point here, but I think that
-Wmisleading-indentation should take blank lines in the code into account.

I've posted patches to do so, firstly here:

https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03225.html

which was rejected thusly by Jeff:
> I would argue that each of these does represent misleading
> indentation and that the warning is warranted for each.
> Perhaps they aren't as bad as prior cases, but I'd still
> consider them mis-leading.
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03242.html)

and Bernd: 
> I think you could (barely) make a case for the last one being
> ok, but I disagree fairly strongly about the other two cases -
> they are clearly misindented, so why wouldn't we warn?
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03253.html)


I later posted:
"[PATCH] Add levels to -Wmisleading-indentation; add level 1 to -Wall"
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01011.html

which was rejected by Bernd:
> I personally see no use for the blank line heuristic, in fact
> I think it is misguided. To my eyes a warning is clearly
> warranted for the examples in this testcase. Do we actually
> have any users calling for this heuristic?
(https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01015.html)


> That's a fairly low false-positive rate if we even want to
> call it that. I'd fix the three examples and leave the warning
> code as is. We can revisit the decision if we get a flood of
> bug reports.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01040.html


and Pedro noted:
> IMHO, the problem with the levels idea will be if/when later
> you come up with some other orthogonal heuristic to catch some
> other class of indentation problem, and users want to enable
> it but not the blank-lines heuristic, or vice-versa.
> Also, levels don't allow finer-selection
> with "-Werror -Wno-error=misleading-indentation", IIUC.
(https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01036.html)



In the light of Gerald's report on Wine, can we revisit this please, and
perhaps use the blank line heuristic for -Wall?

I'd like to keep the levels idea, and to simply have something like this
in the docs:

"The warning can optionally accept a level (either 1 or 2), for
specifying increasing levels of strictness.
@option{-Wmisleading-indentation=1} is enabled by @option{-Wall} in C
and C++ and uses a set of heuristics intended to provide useful results
on typical code.
@option{-Wmisleading-indentation=2} is stricter."

or somesuch.

There may be an argument that -Wmisleading-indentation should be renamed
to -Wmisleading-whitespace, since indentation is a subset of whitespace.
I don't know how I feel about that.

Hope this is constructive
Dave



Re: ipa vrp implementation in gcc

2016-01-11 Thread vivek pandya
> On Mon, Jan 11, 2016 at 4:07 PM, Richard Biener 
> wrote:
>>
>> On Mon, Jan 11, 2016 at 1:38 AM, Kugan
>>  wrote:
>> > Hi All,
>> >
>> > I am looking at implementing a ipa vrp pass. Jan Hubicka also talks
>> > about this in 2013 GNU Cauldron as one of the optimization he would like
>> > to see in gcc. So my question is, is any one implementing it. If not we
>> > would like to do that.
>> >
>
> Hello, I am Vivek Pandya. I am actually working on a GSoC 2016 proposal
> for this work and it is very similar to extending the ipa-cp pass. I am
> also in touch with Jan Hubicka. These comments will certainly help me,
> but if it is urgent for anyone, you can begin work on this. Jan has shown
> interest in mentoring me for this project, but any help from the
> community is always appreciated.
>
>>
>> > I also looked at the ipa-cp implementation to see how this can be done.
>> > Going by this, one of the way to implement this is (skipping all the
>> > details):
>> >
>> > - Have an early tree-vrp so that we can have value ranges for parameters
>> > at call sites.
>
> Actually a tree-vrp pass already exists. But Jan has suggested to me that
> the ipa-vrp implementation should not be too costly. So I am also thinking
> of including this work in my proposal and also using the analysis to
> improve LTO heuristics, as the project duration will be around 2.5 months.
>>
>>
>> I'd rather use the IPA analysis phase for this and use a VRP algorithm
>> that doesn't require ASSERT_EXPR insertion.
>>
>> > - Create jump functions that captures the value ranges of call sites
>> > propagate the value ranges. In 2013 talk, Jan Hubicka talks about
>> >
>> > - Modifying ipa-prop.[h|c] to handles this but wouldn't it be easier to
>> > have its own and much more simpler implementation ?
>>
>> No idea.
>>
>> > - Once we have the value ranges for parameter/return values, we could
>> > rely on tree-vrp to use this and do the optimizations
>>
>> Yep.  IPA transform phase should annotate parameter default defs with
>> computed ranges.
>>
>> > Does this make any sense? Any thoughts/suggestions to work on this is
>> > highly appreciated.
>>
>> IPA alignment propagation should already be somewhat similar as in doing
>> an intersection step during propagation.
>>
>> Richard.
>>
>> > Thanks,
>> > Kugan
>
> Your comments certainly help me to develop my proposal. Please let me
> know of any updates, to avoid confusion and duplication of work.
> Sincerely,
> Vivek


Re: ipa vrp implementation in gcc

2016-01-11 Thread Jan Hubicka
> On Mon, Jan 11, 2016 at 1:38 AM, Kugan
>  wrote:
> > Hi All,
> >
> > I am looking at implementing a ipa vrp pass. Jan Hubicka also talks
> > about this in 2013 GNU Cauldron as one of the optimization he would like
> > to see in gcc. So my question is, is any one implementing it. If not we
> > would like to do that.
> >
> > I also looked at the ipa-cp implementation to see how this can be done.
> > Going by this, one of the way to implement this is (skipping all the
> > details):
> >
> > - Have an early tree-vrp so that we can have value ranges for parameters
> > at call sites.
> 
> I'd rather use the IPA analysis phase for this and use a VRP algorithm
> that doesn't require ASSERT_EXPR insertion.

Another potential use of value ranges is profile estimation:
http://www.lighterra.com/papers/valuerangeprop/Patterson1995-ValueRangeProp.pdf
It seems to me that we may want to have something that can feed sane loop
bounds for profile estimation as well, and we can easily store the known
value ranges in SSA name annotations.
So I think a separate local pass to compute value ranges (perhaps with less
accuracy than full-blown VRP) is desirable.
> 
> > - Create jump functions that captures the value ranges of call sites
> > propagate the value ranges. In 2013 talk, Jan Hubicka talks about
> >
> > - Modifying ipa-prop.[h|c] to handles this but wouldn't it be easier to
> > have its own and much more simpler implementation ?
> 
> No idea.

I think ipa-prop.c probably won't need any significant changes.  The
code basically analyzes what values are passed through the function and
this works for constants as well as for intervals. In fact ipa-cp already
uses the same ipa-prop analysis for
 1) constant propagation
 2) alignment propagation
 3) propagation of known polymorphic call contexts.

So replacing 1) by value range propagation should be easily doable. 
I would also like to replace alignment propagation by bitwise constant
propagation (i.e. propagating what bits are known to be zero and what
bits are known to be one). We already do have bitwise CCP, so we could
deal with this basically in the same way as we deal with value ranges.

ipa-prop could use a bit of cleaning up and modularizing, which I hope will
be done next stage1 :)

> 
> > - Once we have the value ranges for parameter/return values, we could
> > rely on tree-vrp to use this and do the optimizations
> 
> Yep.  IPA transform phase should annotate parameter default defs with
> computed ranges.

Yep; in addition we will end up with known value ranges stored in
aggregates, for which we need a better separate representation.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68930
> 
> > Does this make any sense? Any thoughts/suggestions to work on this is
> > highly appreciated.
> 
> IPA alignment propagation should already be somewhat similar as in doing
> an intersection step during propagation.

The merge operation is the same as VRP does, but yep, the IPA alignment code
is a good one to follow for a start.  Martin implemented it in a somewhat
limited way so it never performs cloning, but we can deal with that
incrementally.
> 
> Richard.
> 
> > Thanks,
> > Kugan


Re: ivopts vs. garbage collection

2016-01-11 Thread Michael Matz
Hi,

On Fri, 8 Jan 2016, Richard Biener wrote:

> > The only solution here is for ivopts to keep a pointer to the array, 
> > not a pointer to some location near, but outside of the array.
> 
> Yes, the solution is to make IVOPTs not do this (eventually controlled 
> by a parameter because clearly it thinks doing this is beneficial 
> cost-wise).

Well, that's a hack.  A solution is to design something that works 
generally for garbage collected languages with such requirements instead 
of arbitrarily limiting transformations here and there.  It could be 
something like the notion of derived pointers, where the base pointer 
needs to stay alive as long as the derived pointers are.  All potential GC 
points where a derived pointer is alive also need to have the base 
pointer alive (they could explicitly add uses of the base pointers, or 
alternatively anything computing liveness needs to deal with this).  For 
normal transformations that don't deal with explicit liveness sets (i.e. 
most of our SSA transforms) it's enough if the instruction creating the 
derived pointer from the base can't be looked through, i.e. is an 
optimization barrier.


Ciao,
Michael.


Re: Some real-life feedback on -Wmisleading-indentation

2016-01-11 Thread Manuel López-Ibáñez

On 11/01/16 07:20, Gerald Pfeifer wrote:

This is from the Wine testsuite, and the if (0) in colum one guards
one invication of the function under test that would crash (so is
the equivalent of #if 0...#endif, except that it avoids conditional
compilation).


Perhaps a good heuristic is to disable the warning if the 'if' starts in the 
first column. Surely, such code is "special"!


About the blank-line heuristic: either include it by default or not at all. I 
really dislike the idea of having several levels: it just shows that we could 
not decide on robust heuristics, or that there are limitations in the current 
implementation that no one is willing to fix (talking here about other warnings 
with numerical levels). What goes in each level becomes arbitrary, and 
non-default levels are rarely used.


Cheers,

Manuel.





Re: ivopts vs. garbage collection

2016-01-11 Thread Ian Lance Taylor
On Mon, Jan 11, 2016 at 10:00 AM, Michael Matz  wrote:
>
> On Fri, 8 Jan 2016, Richard Biener wrote:
>
>> > The only solution here is for ivopts to keep a pointer to the array,
>> > not a pointer to some location near, but outside of the array.
>>
>> Yes, the solution is to make IVOPTs not do this (eventually controlled
>> by a parameter because clearly it thinks doing this is beneficial
>> cost-wise).
>
> Well, that's a hack.  A solution is to design something that works
> generally for garbage collected languages with such requirements instead
> of arbitrarily limiting transformations here and there.  It could be
> something like the notion of derived pointers, where the base pointer
> needs to stay alive as long as the derived pointers are.  All potential GC
> points where a derived pointer is alive also needs to have the base
> pointer be alive (they could explicitely add uses of the base pointers, or
> alternatively anthing computing liveness needs to deal with this).  For
> normal transformation that don't deal with explicit liveness sets (i.e.
> most of our SSA transforms) it's enough if the instruction creating the
> derived pointer from the base can't be looked through, i.e. is an
> optimization barrier.

What do you suggest we do for GCC 6?

Your suggestion of every derived pointer keeping the base pointer
alive sounds too strong to me; it certainly sounds too strong for Go.
For Go it should be sufficient to ensure that every derived pointer
points within the bounds of the object.  Only when a derived pointer
points outside of the object (including just after the end of the
object) is it necessary to explicitly keep a base pointer alive.

It's also not obvious to me that making a pointer transformation into
an optimization barrier would be a win overall.  For something like
ivopts it seems better to simply not introduce the pointer
transformation--to apply ivopts only to non-pointers when GC matters
(GC matters not only for Go and Java, but also for any C/C++ program
using something like the Boehm collector).

(I'm still puzzled by the fact that Java has apparently not
encountered this problem in all these years.)

Ian


Re: How to determine source location of pragma?

2016-01-11 Thread Manuel López-Ibáñez

On 11/01/16 01:08, Vanush Vaswani wrote:

I am new to GCC internals.

I'm trying to create a plugin to operate on pragmas. Currently have
this working using c_register_pragma with a callback.

The callback performs pragma_lex and is able to retrieve the string
token of the pragma based on this example.
https://github.com/wh5a/gcc-plugin/blob/master/pragma_plugin.c

How can I determine the line number of the pragma? I tried
DECL_SOURCE_LINE on the tree type but this returns 0. Do I have to
interact with cpplib?


pragma_lex in GCC 6 has an optional &loc parameter. See 
handle_pragma_diagnostic() in c-pragma.c to see how it is used.


Cheers,

Manuel.



Re: ivopts vs. garbage collection

2016-01-11 Thread Michael Matz
Hi,

On Mon, 11 Jan 2016, Ian Lance Taylor wrote:

> > Well, that's a hack.  A solution is to design something that works 
> > generally for garbage collected languages with such requirements 
> > instead of arbitrarily limiting transformations here and there.  It 
> > could be something like the notion of derived pointers, where the base 
> > pointer needs to stay alive as long as the derived pointers are.  All 
> > potential GC points where a derived pointer is alive also needs to 
> > have the base pointer be alive (they could explicitely add uses of the 
> > base pointers, or alternatively anthing computing liveness needs to 
> > deal with this).  For normal transformation that don't deal with 
> > explicit liveness sets (i.e. most of our SSA transforms) it's enough 
> > if the instruction creating the derived pointer from the base can't be 
> > looked through, i.e. is an optimization barrier.
> 
> What do you suggest we do for GCC 6?

Realistically?  The hack of course.

> Your suggestion of every derived pointer keeping the base pointer alive 
> sounds too strong to me; it certainly sounds too strong for Go. For Go 
> it should be sufficient to ensure that every derived pointer points 
> within the bounds of the object.

Okay, that's a certain type of GC.  Others might need exact offset-zero 
base pointers.

> It's also not obvious to me that making a pointer transformation into
> an optimization barrier would be a win overall.

It will almost always be a pessimization (the additional live value needs 
at least a place on the stack).

> For something like
> ivopts it seems better to simply not introduce the pointer
> transformation--to apply ivopts only to non-pointers when GC matters

Of course that deals only with ivopts.  In practice that might be enough, 
even for a long time, but it's not really a full solution; there are other 
transformations on pointers (e.g. the vectorizer fiddling with alignment), 
some of them creating out-of-object pointers (e.g. chopping off the lowest 
few bits).

Another problem is to define "when GC matters".  With LTO you can't rely 
on anything from the frontend, so it needs to be encoded in the IL, or at 
the very least in a per-function item, with corresponding avoidance of 
inlining of GC into non-GC-aware functions.

> (I'm still puzzled by the fact that Java has apparently not encountered 
> this problem in all these years.)

Probably it just so happened that the base pointer for some derived 
pointer was lying around in some memory location.  It's not very likely 
that the only reference is in a register; it can happen only shortly after 
allocating the new block, but before it's actually stored into some other 
structure (or freed, i.e. when it's only temporary, but then the base 
pointer would be live as well).  Perhaps this pattern doesn't happen very 
often in Java code; obviously it does in Go.  Perhaps we can limit the 
ivopts hack to pointers that are problematic: only if the base pointer 
comes from a new allocation (is not loaded from memory) and isn't stored 
into memory before use do we need to avoid manipulating it too much.


Ciao,
Michael.


Re: ivopts vs. garbage collection

2016-01-11 Thread Richard Biener
On January 11, 2016 8:35:25 PM GMT+01:00, Michael Matz  wrote:
>Hi,
>
>On Mon, 11 Jan 2016, Ian Lance Taylor wrote:
>
>> > Well, that's a hack.  A solution is to design something that works
>> > generally for garbage collected languages with such requirements
>> > instead of arbitrarily limiting transformations here and there.  It
>> > could be something like the notion of derived pointers, where the base
>> > pointer needs to stay alive as long as the derived pointers are.  All
>> > potential GC points where a derived pointer is alive also need to
>> > have the base pointer alive (they could explicitly add uses of the
>> > base pointers, or alternatively anything computing liveness needs to
>> > deal with this).  For normal transformations that don't deal with
>> > explicit liveness sets (i.e. most of our SSA transforms) it's enough
>> > if the instruction creating the derived pointer from the base can't be
>> > looked through, i.e. is an optimization barrier.
>>
>> What do you suggest we do for GCC 6?
>
>Realistically?  The hack of course.
>
>> Your suggestion of every derived pointer keeping the base pointer alive
>> sounds too strong to me; it certainly sounds too strong for Go. For Go
>> it should be sufficient to ensure that every derived pointer points
>> within the bounds of the object.
>
>Okay, that's a certain type of GC.  Others might need exact offset-zero
>base pointers.
>
>> It's also not obvious to me that making a pointer transformation into
>> an optimization barrier would be a win overall.
>
>It will almost always be a pessimization (the additional live value needs
>at least a place on the stack).
>
>> For something like
>> ivopts it seems better to simply not introduce the pointer
>> transformation--to apply ivopts only to non-pointers when GC matters
>
>Of course that deals only with ivopts.  In practice that might be enough,
>even for a long time, but it's not really a full solution; there are other
>transformations on pointers (e.g. the vectorizer fiddling with alignment),
>some of them creating out-of-object pointers (e.g. chopping off the lowest
>few bits).
>
>Another problem is to define "when GC matters".  With LTO you can't rely
>on anything from the frontend, so it needs to be encoded in the IL, or at
>the very least in a per-function item, with corresponding avoidance of
>inlining of GC into non-GC-aware functions.

Here you can extend the hack by applying said flag to the final link if it
is set in at least one LTO object, as we already do for some flags.

And yes, avoiding IVOPTs on pointer values is certainly the easiest fix.

Richard.

>> (I'm still puzzled by the fact that Java has apparently not encountered
>> this problem in all these years.)
>
>Probably it just so happened that the base pointer for some derived
>pointer was lying around in some memory location.  It's not very likely
>that the only reference is in a register; it can happen only shortly after
>allocating the new block, but before it's actually stored into some other
>structure (or freed, i.e. when it's only temporary, but then the base
>pointer would be live as well).  Perhaps this pattern doesn't happen very
>often in Java code; obviously it does in Go.  Perhaps we can limit the
>ivopts hack to pointers that are problematic: only if the base pointer
>comes from a new allocation (is not loaded from memory) and isn't stored
>into memory before use do we need to avoid manipulating it too much.
>
>
>Ciao,
>Michael.




Re: ivopts vs. garbage collection

2016-01-11 Thread Tom Tromey
> "Michael" == Michael Matz  writes:

Michael> Well, that's a hack.  A solution is to design something that
Michael> works generally for garbage collected languages with such
Michael> requirements instead of arbitrarily limiting transformations
Michael> here and there.  It could be something like the notion of
Michael> derived pointers, where the base pointer needs to stay alive as
Michael> long as the derived pointers are.

This was done once in GCC, for the Modula-3 compiler.
There was a paper about it, but I can't find it any more.

The basic idea was to emit a description of the stack frame that their
GC could read.  They had a moving GC that could use this information to
rewrite the frame when moving objects.

Tom


Re: [Patch] MIPS FDE deletion

2016-01-11 Thread Maciej W. Rozycki
Hi Catherine,

On Fri, 16 Oct 2015, Moore, Catherine wrote:

> MIPS16 call stubs now have .cfi directives.  If the linker decides that 
> one of these call stubs is not required, it will emit incorrect frame 
> info.  This patch suppresses the generation of the frame info by setting 
> the output section of the stub to ABS.

 Does it mean PR target/53276 has been fixed now?  What was the commit to 
add .cfi support for the stubs?

  Maciej


RE: [Patch] MIPS FDE deletion

2016-01-11 Thread Moore, Catherine


> -Original Message-
> From: Maciej W. Rozycki [mailto:ma...@imgtec.com]
> Sent: Monday, January 11, 2016 5:00 PM
> To: Moore, Catherine
> Cc: binut...@sourceware.org; gcc@gcc.gnu.org; Richard Sandiford
> Subject: Re: [Patch] MIPS FDE deletion
> 
> Hi Catherine,
> 
> On Fri, 16 Oct 2015, Moore, Catherine wrote:
> 
> > MIPS16 call stubs now have .cfi directives.  If the linker decides
> > that one of these call stubs is not required, it will emit incorrect
> > frame info.  This patch suppresses the generation of the frame info by
> > setting the output section of the stub to ABS.

I have a slight update to this patch that I need to submit.  

> 
>  Does it mean PR target/53276 has been fixed now?  What was the commit to
> add .cfi support for the stubs?
> 

Hi Maciej,

I don't know about the status of PR target/53276.  The commit to add .cfi 
support for call stubs was this one:

r184379 | rsandifo | 2012-02-19 08:44:54 -0800 (Sun, 19 Feb 2012) | 7 lines

gcc/
* config/mips/mips.c (mips16_build_call_stub): Add CFI information
to stubs with non-sibling calls.

libgcc/
* config/mips/mips16.S (CALL_STUB_RET): Add CFI information.


Catherine



Re: Some real-life feedback on -Wmisleading-indentation

2016-01-11 Thread Jeff Law

On 01/11/2016 07:53 AM, David Malcolm wrote:



In Chapter 31 of "Code Complete" (I'm looking at the 2nd edition),
McConnell discusses the use of whitespace in code layout; he talks about
both blank lines and indentation as being useful for providing hints to
the human reader about the structure of the code.

In several places he talks about the use of blank lines for separating
groups of related statements into "paragraphs", that blank lines in code
can and should be used to demarcate logical "blocks" of related
statements (e.g. pp747-8 of 2nd edition).

I think that in the Wine example above the blank lines do effectively
create "paragraphs" of code to a human, and I agree that a human reader
is unlikely to think of the

if (0)

as guarding the

params.cArgs = 1;

since they're in different "paragraphs".

I apologize if I'm belaboring the point here, but I think that
-Wmisleading-indentation should take blank lines into account.

I've posted patches to do so; the first is here:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03225.html

which was rejected thusly by Jeff:

I would argue that each of these does represent misleading
indentation and that the warning is warranted for each.
Perhaps they aren't as bad as prior cases, but I'd still
consider them mis-leading.

(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03242.html)

I still stand by that assessment.

ISTM that for the Wine case the backwards (column-wise) indentation of the IF
may be the right filter, perhaps in conjunction with the blank-line
heuristic.  However, I stand by my belief that the blank-line heuristic
is wrong when used by itself.


Jeff


Re: [RFC][AArch64] function prologue analyzer in linux kernel

2016-01-11 Thread AKASHI Takahiro

Will,

On 01/09/2016 12:53 AM, Will Deacon wrote:

On Fri, Jan 08, 2016 at 02:36:32PM +0900, AKASHI Takahiro wrote:

On 01/07/2016 11:56 PM, Richard Earnshaw (lists) wrote:

On 07/01/16 14:22, Will Deacon wrote:

On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote:

So I'd like to introduce a function prologue analyzer to determine
the size allocated by a function's prologue and deduct it from "Depth".
My implementation of this analyzer has been submitted to the
linux-arm-kernel mailing list[1].
I borrowed some ideas from gdb's analyzer[2], especially the loop of
instruction decoding as well as stopping decoding when exiting a basic block,
but implemented my own simplified version because the gdb one seems to do
a bit more than what we need here.
Anyhow, since it is somewhat heuristic (and may not be maintainable in
the long term), could you review it from a broader toolchain viewpoint,
please?


My main issue with this is that we cannot rely on the frame layout
generated by the compiler and there's little point in asking for
commitment here. Therefore, the heuristics will need updating as and
when we identify new frames that we can't handle. That's pretty fragile
and puts us on the back foot when faced with newer compilers. This might
be sustainable if we don't expect to encounter much variation, but even
that would require some sort of "buy-in" from the various toolchain
communities.

GCC already has an option (-fstack-usage) to determine the stack usage
on a per-function basis and produce a report at build time. Why can't
we use that to provide the information we need, rather than attempt to
compute it at runtime based on your analyser?

If -fstack-usage is not sufficient, understanding why might allow us to
propose a better option.


Can you not use the dwarf frame unwind data?  That's always sufficient
to recover the CFA (canonical frame address - the value in SP when
executing the first instruction in a function).  It seems to me it's
unlikely you're going to need something that's an exceedingly high
performance operation.


Thank you for your comment.
Yeah, but we need some utility routines to handle unwind data (.debug_frame).
In fact, someone has already attempted to merge (part of) libunwind into
the kernel[1], but it was rejected by the kernel community (including Linus,
if I remember correctly).  It seems that they thought the code was still buggy.


The ARC guys seem to have sneaked something in for their architecture:

   
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arc/kernel/unwind.c

so it might not be impossible if we don't require all the bells and
whistles of libunwind.


Thanks. I didn't notice this code.


That is one of the reasons I wanted to implement my own analyzer.


I still don't understand why you can't use fstack-usage. Can you please
tell me why that doesn't work? Am I missing something?


I don't know how gcc calculates the usage here, but I guess it would be more
robust than my analyzer.

The issues that come to my mind are:
- -fstack-usage generates a separate output file, *.su, so we would have to
  manage those files to get the information incorporated into the kernel binary.
  This implies that (common) kernel makefiles might have to be changed a bit.
- worse, what about the kernel module case? We would have no way to let the
  kernel know the stack usage without adding an extra step at load time.

If we need to put some information about stack usage in the kernel, that should
not be much different from dwarf frame data (.eh_frame).

-Takahiro AKASHI



Will



Re: [RFC][AArch64] function prologue analyzer in linux kernel

2016-01-11 Thread Andrew Pinski
On Mon, Jan 11, 2016 at 10:11 PM, AKASHI Takahiro
 wrote:
> Will,
>
>
> On 01/09/2016 12:53 AM, Will Deacon wrote:
>>
>> On Fri, Jan 08, 2016 at 02:36:32PM +0900, AKASHI Takahiro wrote:
>>>
>>> On 01/07/2016 11:56 PM, Richard Earnshaw (lists) wrote:

 On 07/01/16 14:22, Will Deacon wrote:
>
> On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote:
>>
>> So I'd like to introduce a function prologue analyzer to determine
>> a size allocated by a function's prologue and deduce it from "Depth".
>> My implementation of this analyzer has been submitted to
>> linux-arm-kernel mailing list[1].
>> I borrowed some ideas from gdb's analyzer[2], especially a loop of
>> instruction decoding as well as stop of decoding at exiting a basic
>> block,
>> but implemented my own simplified one because gdb version seems to do
>> a bit more than what we expect here.
>> Anyhow, since it is somewhat heuristic (and may not be maintainable
>> for
>> a long term), could you review it from a broader viewpoint of
>> toolchain,
>> please?
>>
> My main issue with this is that we cannot rely on the frame layout
> generated by the compiler and there's little point in asking for
> commitment here. Therefore, the heuristics will need updating as and
> when we identify new frames that we can't handle. That's pretty fragile
> and puts us on the back foot when faced with newer compilers. This
> might
> be sustainable if we don't expect to encounter much variation, but even
> that would require some sort of "buy-in" from the various toolchain
> communities.
>
> GCC already has an option (-fstack-usage) to determine the stack usage
> on a per-function basis and produce a report at build time. Why can't
> we use that to provide the information we need, rather than attempt to
> compute it at runtime based on your analyser?
>
> If -fstack-usage is not sufficient, understanding why might allow us to
> propose a better option.


 Can you not use the dwarf frame unwind data?  That's always sufficient
 to recover the CFA (canonical frame address - the value in SP when
 executing the first instruction in a function).  It seems to me it's
 unlikely you're going to need something that's an exceedingly high
 performance operation.
>>>
>>>
>>> Thank you for your comment.
>>> Yeah, but we need some utility routines to handle unwind
>>> data(.debug_frame).
>>> In fact, some guy has already attempted to merge (part of) libunwind into
>>> the kernel[1], but it was rejected by the kernel community (including
>>> Linus
>>> if I correctly remember). It seems that they thought the code was still
>>> buggy.
>>
>>
>> The ARC guys seem to have sneaked something in for their architecture:
>>
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arc/kernel/unwind.c
>>
>> so it might not be impossible if we don't require all the bells and
>> whistles of libunwind.
>
>
> Thanks. I didn't notice this code.
>
>>> That is one of reasons that I wanted to implement my own analyzer.
>>
>>
>> I still don't understand why you can't use fstack-usage. Can you please
>> tell me why that doesn't work? Am I missing something?
>
>
> I don't know how gcc calculates the usage here, but I guess it would be more
> robust than my analyzer.

-fstack-usage does not work when there are VLAs or calls to alloca, so
there is no way to figure that part out without analyzing the actual
assembly code.
I still think dwarf unwind info is the way forward.

Thanks,
Andrew


>
> The issues, that come up to my mind, are
> - -fstack-usage generates a separate output file, *.su and so we have to
>   manage them to be incorporated in the kernel binary.
>   This implies that (common) kernel makefiles might have to be a bit
> changed.
> - more worse, what if kernel module case? We will have no way to let the
> kernel
>   know the stack usage without adding an extra step at loading.
>
> If we need to put some information about stack usage in the kernel, that
> should
> not be much different from dwarf frame data (.eh_frame).
>
> -Takahiro AKASHI
>
>
>> Will
>>
>