Re: Prototype implementation: Improving effectiveness and generality of auto-vectorization
On Fri, Jan 8, 2016 at 5:11 PM, Alan Lawrence wrote: > On Tue, Oct 27, 2015 at 2:39 PM, Richard Biener wrote: >> On Mon, Oct 26, 2015 at 6:59 AM, sameera wrote: >>> Richard, we have defined the input language for convenience in the prototype implementation. However, we will be using GIMPLE as our IR. As per the grammar of our tree, a p-tree denotes the permute order associated with a statement, whereas a c-tree is actually the GIMPLE instruction which performs the compute operation. I tried looking at the structures used in SLP; however, they cannot be used as they are: the main difference between the current SLP implementation in GCC and our representation is that the permute order in SLP is part of the tree node, whereas in our representation the permute order is represented as an independent tree node. Hence, I have created a new tree structure for our pass, which creates p-tree nodes for the permute order and c-tree nodes which point to the appropriate GIMPLE statement. >> Yes, that's the whole purpose - get the vectorizer (and SLP) a better data structure which is explicit about permutes. > I somewhat miss an idea on how to deal with irregular SLP cases - that is, does the AST model each (current) SLP node as a single statement, or does it model individual vector elements (so you can insert no-op compensation code to make the operations match)? Consider a[i] = b[i] * 3; a[i+1] = b[i+1]; which can be vectorized with SLP if you realize you can multiply b[i+1] by 1.
>>> The pass structure we have for our optimization is as follows: >>> - New pass target-aware-loop-vect with the following subpasses: >>> - Loop structure identification >>> - Restricted loop construction >>> - Loop analysis (6 phases of the target-aware reordering-instruction selection algorithm) >>> - Loop transformation (cost-aware GIMPLE code generation) >>> We might need some optimizations to be performed on the loop body before we start our optimization for reordering-instruction selection, so that the input program can be transformed to fit into our restricted loop structure. These transformations can be performed by the restricted loop construction subpass. E.g. multi-dimensional arrays, nested loops, reduction chains etc. can be supported by transforming the input loop. The case that you have mentioned can be transformed at this level. > As for implementing the whole thing in GCC, I recently spent some more time thinking and came to the conclusion that the path to the fastest improvements in GCC code-gen would be to build the new AST after the current analysis phase has finished, do the optimization, and drive code generation off the new AST. Thus keep things like data-ref and dependence analysis as is, as well as group analysis and SLP tree construction. Build the AST from the vectorizer "IL" at the point vect_transform_loop / vect_slp_transform_bb is called. >>> The current pass structure that we have designed does exactly that. The only difference is that we are planning to use the transform phase as a co-process of our transform pass, rather than reconstruct an AST from our representation to pass to vect_transform_loop. >>> More and more of the rest of the vectorizer code can then be "slowly" moved to work on the AST, and AST construction be moved earlier.
> So I think an incremental approach is much more likely to get us some benefit, and I'm wondering how much of the benefit here stems from which bit, and how many of these changes can be broken off. > For example, making permutes into explicit operations on the SLP tree, and adding a pass that tries to push them up/down the tree into each other, might be done on the existing codebase, and would be a significant win in itself (bringing some/much(?) of the improvements in interleaving code of your proposal). > I'm not really sure how much of the benefit stems from what, though - e.g. does reducing vector length down to fit the target machine only after the other steps bring any benefits besides raising abstraction? (Which certainly can be a benefit!) AFAICT this could be done independently of other changes? > Another limitation of the present SLP structure is that it does not understand sharing (until later CSE, hence costing wrongly); it would be good to fix this (making the SLP tree into a DAG, and potentially reusing subtrees by adding permutes). This is a step we'd like to take anyway, but it might tie in with your "Bottom-up commoning of p-node subtrees". And it would help with (BB as well as loop) cost calculations. > Are there any really fundamental differences that we cannot overcome in steps? ( / are there any steps we shouldn't take in isolation, because we'd hav
Re: __builtin_memcpy and alignment assumptions
On Fri, Jan 8, 2016 at 7:36 PM, Steve Ellcey wrote: > On Fri, 2016-01-08 at 12:56 +0100, Richard Biener wrote: >> On Fri, Jan 8, 2016 at 12:40 PM, Eric Botcazou >> wrote: >> >> I think we only assume it if the pointer is actually dereferenced, >> >> otherwise >> >> it just breaks too much code in the wild. And while memcpy dereferences, >> >> it dereferences it through a char * cast, and thus only the minimum >> >> alignment is assumed. >> > >> > Yet the compiler was generating the expected code for Steve's testcase on >> > strict-alignment architectures until very recently (GCC 4.5 IIUC) and this >> > worked perfectly. > > Yes, I just checked and I did get the better code in GCC 4.5 and I get > the current slower code in GCC 4.6. > >> Consider >> >> int a[256]; >> int >> main() >> { >> void *p = (char *)a + 1; >> void *q = (char *)a + 5; >> __builtin_memcpy (p, q, 4); >> return 0; >> } >> >> where the ME would be entitled to "drop" the char */void * conversions >> and use &a typed temps. > > I am not sure how this works but I tweaked get_pointer_alignment_1 so > that if there was no align info or if get_ptr_info_alignment returned > false then the routine would return type based alignment information > instead of default 'void *' alignment. In that case and using your > example, GCC still accessed p & q as pointers to unaligned data. 
> > In fact if I used int pointers: > > int a[256]; > int main() > { > int *p = (int *)((char *)a + 1); > int *q = (int *)((char *)a + 5); > __builtin_memcpy (p, q, 4); > return 0; > } > > GCC did unaligned accesses when optimizing, but when unoptimized (and > with my change) GCC did aligned accesses, which would not work on a > strict alignment machine like MIPS. This seems to match what happens > with: > > int a[256]; > int main() > { > int *p = (int *)((char *)a + 1); > int *q = (int *)((char *)a + 5); > *p = *q; > return 0; > } > > When I optimize it, GCC does unaligned accesses and when unoptimized > GCC does aligned accesses, which will not work on MIPS. Of course reading the fine print in the C standard makes these testcases undefined (you use an int * type for pointers that are not properly aligned; -Wcast-align should warn about this). Btw, your change to get_pointer_alignment would basically boil down to doing get_object_alignment. Richard.
Re: ipa vrp implementation in gcc
On Mon, Jan 11, 2016 at 1:38 AM, Kugan wrote: > Hi All, > > I am looking at implementing a ipa vrp pass. Jan Hubicka also talks > about this in 2013 GNU Cauldron as one of the optimization he would like > to see in gcc. So my question is, is any one implementing it. If not we > would like to do that. > > I also looked at the ipa-cp implementation to see how this can be done. > Going by this, one of the way to implement this is (skipping all the > details): > > - Have an early tree-vrp so that we can have value ranges for parameters > at call sites. I'd rather use the IPA analysis phase for this and use a VRP algorithm that doesn't require ASSERT_EXPR insertion. > - Create jump functions that captures the value ranges of call sites > propagate the value ranges. In 2013 talk, Jan Hubicka talks about > > - Modifying ipa-prop.[h|c] to handles this but wouldn't it be easier to > have its own and much more simpler implementation ? No idea. > - Once we have the value ranges for parameter/return values, we could > rely on tree-vrp to use this and do the optimizations Yep. IPA transform phase should annotate parameter default defs with computed ranges. > Does this make any sense? Any thoughts/suggestions to work on this is > highly appreciated. IPA alignment propagation should already be somewhat similar as in doing an intersection step during propagation. Richard. > Thanks, > Kugan
Re: Some real-life feedback on -Wmisleading-indentation
On Mon, 2016-01-11 at 15:20 +0800, Gerald Pfeifer wrote: > Compiling Wine with GCC trunk (to become GCC 6) I noticed four > dozen warnings triggered by -Wmisleading-indentation. > > Some are simply weird formatting, some may be indicative of > real issues -- and I have started to look into them one by > one and to submit patches (to Wine). > > However, there is a pattern I noticed which, while a bit weird > in terms of formatting, does not really strike me as something > -Wmisleading-indentation should warn about: > > VariantInit(&ret); > hr = IWinHttpRequest_Invoke(request, ...); > ok(hr == DISP_E_UNKNOWNINTERFACE, "error %#x\n", hr); > > VariantInit(&ret); > if (0) /* crashes */ > hr = IWinHttpRequest_Invoke(request, ...); > > params.cArgs = 1; > hr = IWinHttpRequest_Invoke(request, ...); > ok(hr == DISP_E_TYPEMISMATCH, "error %#x\n", hr); > > VariantInit(&arg[2]); > > This is from the Wine testsuite, and the if (0) in column one guards > one invocation of the function under test that would crash (so it is > the equivalent of #if 0...#endif, except that it avoids conditional > compilation). > > Is this a bit unusual? Definitely. Does it look like one of > those cases a programmer would actually be tempted to misread? > I don't think so. > > What do you think? How often does this pattern occur in Wine? In Chapter 31 of "Code Complete" (I'm looking at the 2nd edition), McConnell discusses the use of whitespace in code layout; he talks about both blank lines and indentation as being useful for providing hints to the human reader about the structure of the code. In several places he talks about the use of blank lines for separating groups of related statements into "paragraphs", and says that blank lines in code can and should be used to demarcate logical "blocks" of related statements (e.g. pp747-8 of the 2nd edition).
I think that in the Wine example above the blank lines do effectively create "paragraphs" of code to a human, and I agree that a human reader is unlikely to think of the if (0) as guarding the params.cArgs = 1; since they're in different "paragraphs". I apologize if I'm belaboring the point here, but I think that -Wmisleading-indentation should make use of blank lines in code. I've posted patches to do so, firstly here: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03225.html which was rejected thusly by Jeff: > I would argue that each of these does represent misleading > indentation and that the warning is warranted for each. > Perhaps they aren't as bad as prior cases, but I'd still > consider them mis-leading. (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03242.html) and Bernd: > I think you could (barely) make a case for the last one being > ok, but I disagree fairly strongly about the other two cases - > they are clearly misindented, so why wouldn't we warn? (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03253.html) I later posted: "[PATCH] Add levels to -Wmisleading-indentation; add level 1 to -Wall" https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01011.html which was rejected by Bernd: > I personally see no use for the blank line heuristic, in fact > I think it is misguided. To my eyes a warning is clearly > warranted for the examples in this testcase. Do we actually > have any users calling for this heuristic? (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01015.html) > That's a fairly low false-positive rate if we even want to > call it that. I'd fix the three examples and leave the warning > code as is. We can revisit the decision if we get a flood of > bug reports.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01040.html and Pedro noted: > IMHO, the problem with the levels idea will be if/when later > you come up with some other orthogonal heuristic to catch some > other class of indentation problem, and users want to enable > it but not the blank-lines heuristic, or vice-versa. > Also, levels don't allow finer selection > with "-Werror -Wno-error=misleading-indentation", IIUC. (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01036.html) In the light of Gerald's report on Wine, can we revisit this please, and perhaps use the blank-line heuristic for -Wall? I'd like to keep the levels idea, and to simply have something like this in the docs: "The warning can optionally accept a level (either 1 or 2), for specifying increasing levels of strictness. @option{-Wmisleading-indentation=1} is enabled by @option{-Wall} in C and C++ and uses a set of heuristics intended to provide useful results on typical code. @option{-Wmisleading-indentation=2} is stricter." or somesuch. There may be an argument that -Wmisleading-indentation should be renamed to -Wmisleading-whitespace, since indentation is a subset of whitespace. I don't know how I feel about that. Hope this is constructive. Dave
Re: ipa vrp implementation in gcc
> On Mon, Jan 11, 2016 at 4:07 PM, Richard Biener wrote: >> On Mon, Jan 11, 2016 at 1:38 AM, Kugan wrote: >> > Hi All, >> > >> > I am looking at implementing a ipa vrp pass. Jan Hubicka also talks >> > about this in 2013 GNU Cauldron as one of the optimization he would like >> > to see in gcc. So my question is, is any one implementing it. If not we >> > would like to do that. > Hello, I am Vivek Pandya. I am actually working on a GSoC 2016 proposal for this work, and it is very similar to extending the ipa-cp pass. I am also in touch with Jan Hubicka. These comments will certainly help me, but if it is urgent for anyone, you can begin work on this. Jan has shown interest in mentoring me for this project, but any help from the community is always appreciated. >> > I also looked at the ipa-cp implementation to see how this can be done. >> > Going by this, one of the way to implement this is (skipping all the >> > details): >> > >> > - Have an early tree-vrp so that we can have value ranges for parameters >> > at call sites. > Actually a tree-vrp pass already exists. But Jan has suggested to me that the ipa-vrp implementation should not be too costly. So I am also thinking of including this work in my proposal, and of using the analysis to improve LTO heuristics as well, as the project duration will be around 2.5 months. >> I'd rather use the IPA analysis phase for this and use a VRP algorithm >> that doesn't require ASSERT_EXPR insertion. >> >> > - Create jump functions that captures the value ranges of call sites >> > propagate the value ranges. In 2013 talk, Jan Hubicka talks about >> > >> > - Modifying ipa-prop.[h|c] to handles this but wouldn't it be easier to >> > have its own and much more simpler implementation ? >> >> No idea. >> >> > - Once we have the value ranges for parameter/return values, we could >> > rely on tree-vrp to use this and do the optimizations >> >> Yep.
IPA transform phase should annotate parameter default defs with computed ranges. >> >> > Does this make any sense? Any thoughts/suggestions to work on this is >> > highly appreciated. >> >> IPA alignment propagation should already be somewhat similar as in doing >> an intersection step during propagation. >> >> Richard. >> >> > Thanks, >> > Kugan > Your comments certainly help me to develop my proposal. Please let me know of any updates, to avoid confusion and duplication of work. > Sincerely, > Vivek
Re: ipa vrp implementation in gcc
> On Mon, Jan 11, 2016 at 1:38 AM, Kugan > wrote: > > Hi All, > > > > I am looking at implementing a ipa vrp pass. Jan Hubicka also talks > > about this in 2013 GNU Cauldron as one of the optimization he would like > > to see in gcc. So my question is, is any one implementing it. If not we > > would like to do that. > > > > I also looked at the ipa-cp implementation to see how this can be done. > > Going by this, one of the way to implement this is (skipping all the > > details): > > > > - Have an early tree-vrp so that we can have value ranges for parameters > > at call sites. > > I'd rather use the IPA analysis phase for this and use a VRP algorithm > that doesn't require ASSERT_EXPR insertion. Another potential use of value ranges is profile estimation. http://www.lighterra.com/papers/valuerangeprop/Patterson1995-ValueRangeProp.pdf It seems to me that we may want to have something that can feed sane loop bounds to profile estimation as well, and we can easily store the known value ranges in SSA name annotations. So I think a separate local pass to compute value ranges (perhaps with less accuracy than full-blown VRP) is desirable. > > > - Create jump functions that captures the value ranges of call sites > > propagate the value ranges. In 2013 talk, Jan Hubicka talks about > > > > - Modifying ipa-prop.[h|c] to handles this but wouldn't it be easier to > > have its own and much more simpler implementation ? > > No idea. I think ipa-prop.c probably won't need any significant changes. The code basically analyzes what values are passed through the function, and this works for constants as well as for intervals. In fact ipa-cp already uses the same ipa-prop analysis for 1) constant propagation 2) alignment propagation 3) propagation of known polymorphic call contexts. So replacing 1) by value range propagation should be easily doable. I would also like to replace alignment propagation by bitwise constant propagation (i.e.
propagating what bits are known to be zero and what bits are known to be one). We already do have bitwise CCP, so we could deal with this basically in the same way as we deal with value ranges. ipa-prop could use a bit of cleaning up and modularizing, which I hope will be done next stage1 :) > > > - Once we have the value ranges for parameter/return values, we could > > rely on tree-vrp to use this and do the optimizations > > Yep. IPA transform phase should annotate parameter default defs with > computed ranges. Yep; in addition we will end up with known value ranges stored in aggregates, for which we need a better separate representation. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68930 > > > Does this make any sense? Any thoughts/suggestions to work on this is > > highly appreciated. > > IPA alignment propagation should already be somewhat similar as in doing > an intersection step during propagation. The merge operation is the same as VRP does, but yep, the IPA alignment code is a good one to follow for a start. Martin implemented it in a bit limited way, so it never performs cloning, but we can deal with that incrementally. > > Richard. > > > Thanks, > > Kugan
Re: ivopts vs. garbage collection
Hi, On Fri, 8 Jan 2016, Richard Biener wrote: > > The only solution here is for ivopts to keep a pointer to the array, > > not a pointer to some location near, but outside of the array. > > Yes, the solution is to make IVOPTs not do this (eventually controlled > by a parameter because clearly it thinks doing this is beneficial > cost-wise). Well, that's a hack. A solution is to design something that works generally for garbage-collected languages with such requirements instead of arbitrarily limiting transformations here and there. It could be something like the notion of derived pointers, where the base pointer needs to stay alive as long as the derived pointers are. All potential GC points where a derived pointer is alive also need to have the base pointer be alive (they could explicitly add uses of the base pointers, or alternatively anything computing liveness needs to deal with this). For normal transformations that don't deal with explicit liveness sets (i.e. most of our SSA transforms) it's enough if the instruction creating the derived pointer from the base can't be looked through, i.e. is an optimization barrier. Ciao, Michael.
Re: Some real-life feedback on -Wmisleading-indentation
On 11/01/16 07:20, Gerald Pfeifer wrote: This is from the Wine testsuite, and the if (0) in column one guards one invocation of the function under test that would crash (so it is the equivalent of #if 0...#endif, except that it avoids conditional compilation). Perhaps a good heuristic is to disable the warning if the 'if' starts at the first column. Surely, such code is "special"! About the blank-line heuristic: either include it by default or not at all. I really dislike the idea of having several levels: it just shows that we could not decide on robust heuristics, or that there are limitations in the current implementation that no one is willing to fix (talking here about other warnings with numerical levels). What goes in each level becomes arbitrary, and non-default levels are rarely used. Cheers, Manuel.
Re: ivopts vs. garbage collection
On Mon, Jan 11, 2016 at 10:00 AM, Michael Matz wrote: > On Fri, 8 Jan 2016, Richard Biener wrote: >> > The only solution here is for ivopts to keep a pointer to the array, >> > not a pointer to some location near, but outside of the array. >> >> Yes, the solution is to make IVOPTs not do this (eventually controlled >> by a parameter because clearly it thinks doing this is beneficial >> cost-wise). > > Well, that's a hack. A solution is to design something that works generally for garbage-collected languages with such requirements instead of arbitrarily limiting transformations here and there. It could be something like the notion of derived pointers, where the base pointer needs to stay alive as long as the derived pointers are. All potential GC points where a derived pointer is alive also need to have the base pointer be alive (they could explicitly add uses of the base pointers, or alternatively anything computing liveness needs to deal with this). For normal transformations that don't deal with explicit liveness sets (i.e. most of our SSA transforms) it's enough if the instruction creating the derived pointer from the base can't be looked through, i.e. is an optimization barrier. What do you suggest we do for GCC 6? Your suggestion of every derived pointer keeping the base pointer alive sounds too strong to me; it certainly sounds too strong for Go. For Go it should be sufficient to ensure that every derived pointer points within the bounds of the object. Only when a derived pointer points outside of the object (including just after the end of the object) is it necessary to explicitly keep a base pointer alive. It's also not obvious to me that making a pointer transformation into an optimization barrier would be a win overall.
For something like ivopts it seems better to simply not introduce the pointer transformation--to apply ivopts only to non-pointers when GC matters (GC matters not only for Go and Java, but also for any C/C++ program using something like the Boehm collector). (I'm still puzzled by the fact that Java has apparently not encountered this problem in all these years.) Ian
Re: How to determine source location of pragma?
On 11/01/16 01:08, Vanush Vaswani wrote: I am new to GCC internals. I'm trying to create a plugin to operate on pragmas. Currently have this working using c_register_pragma with a callback. The callback performs pragma_lex and is able to retrieve the string token of the pragma based on this example. https://github.com/wh5a/gcc-plugin/blob/master/pragma_plugin.c How can I determine the line number of the pragma? I tried DECL_SOURCE_LINE on the tree type but this returns 0. Do I have to interact with cpplib? pragma_lex in GCC 6 has an optional &loc parameter. See handle_pragma_diagnostic() in c-pragma.c to see how it is used. Cheers, Manuel.
Re: ivopts vs. garbage collection
Hi, On Mon, 11 Jan 2016, Ian Lance Taylor wrote: > > Well, that's a hack. A solution is to design something that works > > generally for garbage-collected languages with such requirements > > instead of arbitrarily limiting transformations here and there. It > > could be something like the notion of derived pointers, where the base > > pointer needs to stay alive as long as the derived pointers are. All > > potential GC points where a derived pointer is alive also need to > > have the base pointer be alive (they could explicitly add uses of the > > base pointers, or alternatively anything computing liveness needs to > > deal with this). For normal transformations that don't deal with > > explicit liveness sets (i.e. most of our SSA transforms) it's enough > > if the instruction creating the derived pointer from the base can't be > > looked through, i.e. is an optimization barrier. > > What do you suggest we do for GCC 6? Realistically? The hack of course. > Your suggestion of every derived pointer keeping the base pointer alive > sounds too strong to me; it certainly sounds too strong for Go. For Go > it should be sufficient to ensure that every derived pointer points > within the bounds of the object. Okay, that's a certain type of GC. Others might need exact offset-zero base pointers. > It's also not obvious to me that making a pointer transformation into > an optimization barrier would be a win overall. It will almost always be a pessimization (the additional live value needs at least a place on the stack). > For something like > ivopts it seems better to simply not introduce the pointer > transformation--to apply ivopts only to non-pointers when GC matters Of course that deals only with ivopts. In practice that might be enough, even for a long time, but it's not really a full solution; there are other transformations on pointers (e.g.
Another problem is to define "when GC matters". With LTO you can't rely on anything from the frontend, so it needs to be encoded in the IL, or at the very least in a per-function item, with corresponding avoidance of inlining of GC into non-GC aware functions. > (I'm still puzzled by the fact that Java has apparently not encountered > this problem in all these years.) Probably it just so happened that the base pointer for some derived pointer was lying around in some memory location. It's not very likely that the only reference is in a register, it can happen only shortly after allocating the new block, but before it's actually stored into some other structure (or freed, i.e. when it's only temporary, but then the base pointer would be live as well), perhaps this pattern doesn't happen very often in java code. Obviously it does in Go. Perhaps we can limit the ivopts hack also to pointers that are problematic. Only if the base pointer comes from a new allocation (is not loaded from memory) and isn't stored into memory before use do we need to avoid manipulating it too much. Ciao, Michael.
Re: ivopts vs. garbage collection
On January 11, 2016 8:35:25 PM GMT+01:00, Michael Matz wrote: > Hi, > On Mon, 11 Jan 2016, Ian Lance Taylor wrote: >> > Well, that's a hack. A solution is to design something that works generally for garbage-collected languages with such requirements instead of arbitrarily limiting transformations here and there. It could be something like the notion of derived pointers, where the base pointer needs to stay alive as long as the derived pointers are. All potential GC points where a derived pointer is alive also need to have the base pointer be alive (they could explicitly add uses of the base pointers, or alternatively anything computing liveness needs to deal with this). For normal transformations that don't deal with explicit liveness sets (i.e. most of our SSA transforms) it's enough if the instruction creating the derived pointer from the base can't be looked through, i.e. is an optimization barrier. >> What do you suggest we do for GCC 6? > Realistically? The hack of course. >> Your suggestion of every derived pointer keeping the base pointer alive sounds too strong to me; it certainly sounds too strong for Go. For Go it should be sufficient to ensure that every derived pointer points within the bounds of the object. > Okay, that's a certain type of GC. Others might need exact offset-zero base pointers. >> It's also not obvious to me that making a pointer transformation into an optimization barrier would be a win overall. > It will almost always be a pessimization (the additional live value needs at least a place on the stack). >> For something like ivopts it seems better to simply not introduce the pointer transformation--to apply ivopts only to non-pointers when GC matters > Of course that deals only with ivopts. In practice that might be enough, even for a long time, but it's not really a full solution; there are other transformations on pointers (e.g. the vectorizer fiddling with alignment), some of them creating out-of-object pointers (e.g. chopping off the lowest few bits). > Another problem is to define "when GC matters". With LTO you can't rely on anything from the frontend, so it needs to be encoded in the IL, or at the very least in a per-function item, with corresponding avoidance of inlining of GC into non-GC-aware functions. Here you can extend the hack to apply said flag to the final link if it is in at least one LTO object, as we do for some already. And yes, avoiding IVOPTs on pointer values is certainly the easiest fix. Richard. >> (I'm still puzzled by the fact that Java has apparently not encountered this problem in all these years.) > Probably it just so happened that the base pointer for some derived pointer was lying around in some memory location. It's not very likely that the only reference is in a register; it can happen only shortly after allocating the new block, but before it's actually stored into some other structure (or freed, i.e. when it's only temporary, but then the base pointer would be live as well). Perhaps this pattern doesn't happen very often in Java code. Obviously it does in Go. Perhaps we can limit the ivopts hack also to pointers that are problematic. Only if the base pointer comes from a new allocation (is not loaded from memory) and isn't stored into memory before use do we need to avoid manipulating it too much. > Ciao, > Michael.
Re: ivopts vs. garbage collection
> "Michael" == Michael Matz writes: Michael> Well, that's a hack. A solution is to design something that Michael> works generally for garbage collected languages with such Michael> requirements instead of arbitrarily limiting transformations Michael> here and there. It could be something like the notion of Michael> derived pointers, where the base pointer needs to stay alive as Michael> long as the derived pointers are. This was done once in GCC, for the Modula 3 compiler. There was a paper about it, but I can't find it any more. The basic idea was to emit a description of the stack frame that their GC could read. They had a moving GC that could use this information to rewrite the frame when moving objects. Tom
Re: [Patch] MIPS FDE deletion
Hi Catherine, On Fri, 16 Oct 2015, Moore, Catherine wrote: > MIPS16 call stubs now have .cfi directives. If the linker decides that > one of these call stubs is not required, it will emit incorrect frame > info. This patch suppresses the generation of the frame info by setting > the output section of the stub to ABS. Does it mean PR target/53276 has been fixed now? What was the commit to add .cfi support for the stubs? Maciej
RE: [Patch] MIPS FDE deletion
> -----Original Message-----
> From: Maciej W. Rozycki [mailto:ma...@imgtec.com]
> Sent: Monday, January 11, 2016 5:00 PM
> To: Moore, Catherine
> Cc: binut...@sourceware.org; gcc@gcc.gnu.org; Richard Sandiford
> Subject: Re: [Patch] MIPS FDE deletion
>
> Hi Catherine,
>
> On Fri, 16 Oct 2015, Moore, Catherine wrote:
>
> > MIPS16 call stubs now have .cfi directives.  If the linker decides
> > that one of these call stubs is not required, it will emit incorrect
> > frame info.  This patch suppresses the generation of the frame info by
> > setting the output section of the stub to ABS.

I have a slight update to this patch that I need to submit.

> Does it mean PR target/53276 has been fixed now?  What was the commit to
> add .cfi support for the stubs?

Hi Maciej,

I don't know about the status of PR target/53276.  The commit to add .cfi
support for call stubs was this one:

r184379 | rsandifo | 2012-02-19 08:44:54 -0800 (Sun, 19 Feb 2012) | 7 lines

gcc/
	* config/mips/mips.c (mips16_build_call_stub): Add CFI information
	to stubs with non-sibling calls.

libgcc/
	* config/mips/mips16.S (CALL_STUB_RET): Add CFI information.

Catherine
Re: Some real-life feedback on -Wmisleading-indentation
On 01/11/2016 07:53 AM, David Malcolm wrote:
> In Chapter 31 of "Code Complete" (I'm looking at the 2nd edition),
> McConnell discusses the use of whitespace in code layout; he talks about
> both blank lines and indentation as being useful for providing hints to
> the human reader about the structure of the code.  In several places he
> talks about the use of blank lines for separating groups of related
> statements into "paragraphs": blank lines in code can and should be used
> to demarcate logical "blocks" of related statements (e.g. pp. 747-8 of
> the 2nd edition).
>
> I think that in the Wine example above the blank lines do effectively
> create "paragraphs" of code to a human, and I agree that a human reader
> is unlikely to think of the if (0) as guarding the
>   params.cArgs = 1;
> since they're in different "paragraphs".
>
> I apologize if I'm belaboring the point here, but I think that
> -Wmisleading-indentation should make use of blank lines of code.  I've
> posted patches to do so, firstly here:
>   https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03225.html
> which was rejected thusly by Jeff:
>
>> I would argue that each of these does represent misleading indentation
>> and that the warning is warranted for each.  Perhaps they aren't as bad
>> as prior cases, but I'd still consider them mis-leading.
>
> (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03242.html)

I still stand by that assessment.  ISTM for the Wine case the backwards
indentation (column-wise) of the IF may be the right filter, maybe that in
conjunction with the blank-line heuristic.  However, I stand by my belief
that the blank-line heuristic is wrong when used by itself.

Jeff
Re: [RFC][AArch64] function prologue analyzer in linux kernel
Will,

On 01/09/2016 12:53 AM, Will Deacon wrote:
> On Fri, Jan 08, 2016 at 02:36:32PM +0900, AKASHI Takahiro wrote:
>> On 01/07/2016 11:56 PM, Richard Earnshaw (lists) wrote:
>>> On 07/01/16 14:22, Will Deacon wrote:
>>>> On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote:
>>>>> So I'd like to introduce a function prologue analyzer to determine
>>>>> a size allocated by a function's prologue and deduce it from
>>>>> "Depth".
>>>>> My implementation of this analyzer has been submitted to
>>>>> linux-arm-kernel mailing list[1].
>>>>> I borrowed some ideas from gdb's analyzer[2], especially a loop of
>>>>> instruction decoding as well as stop of decoding at exiting a basic
>>>>> block, but implemented my own simplified one because gdb version
>>>>> seems to do a bit more than what we expect here.
>>>>> Anyhow, since it is somewhat heuristic (and may not be maintainable
>>>>> for a long term), could you review it from a broader viewpoint of
>>>>> toolchain, please?
>>>>
>>>> My main issue with this is that we cannot rely on the frame layout
>>>> generated by the compiler and there's little point in asking for
>>>> commitment here.  Therefore, the heuristics will need updating as
>>>> and when we identify new frames that we can't handle.  That's pretty
>>>> fragile and puts us on the back foot when faced with newer
>>>> compilers.  This might be sustainable if we don't expect to
>>>> encounter much variation, but even that would require some sort of
>>>> "buy-in" from the various toolchain communities.
>>>>
>>>> GCC already has an option (-fstack-usage) to determine the stack
>>>> usage on a per-function basis and produce a report at build time.
>>>> Why can't we use that to provide the information we need, rather
>>>> than attempt to compute it at runtime based on your analyser?
>>>>
>>>> If -fstack-usage is not sufficient, understanding why might allow us
>>>> to propose a better option.
>>>
>>> Can you not use the dwarf frame unwind data?  That's always
>>> sufficient to recover the CFA (canonical frame address - the value in
>>> SP when executing the first instruction in a function).  It seems to
>>> me it's unlikely you're going to need something that's an exceedingly
>>> high performance operation.
>>
>> Thank you for your comment.
>> Yeah, but we need some utility routines to handle unwind data
>> (.debug_frame).
>> In fact, some guy has already attempted to merge (part of) libunwind
>> into the kernel[1], but it was rejected by the kernel community
>> (including Linus if I correctly remember).  It seems that they thought
>> the code was still buggy.
>
> The ARC guys seem to have sneaked something in for their architecture:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arc/kernel/unwind.c
>
> so it might not be impossible if we don't require all the bells and
> whistles of libunwind.

Thanks.  I didn't notice this code.

>> That is one of reasons that I wanted to implement my own analyzer.
>
> I still don't understand why you can't use fstack-usage.  Can you
> please tell me why that doesn't work?  Am I missing something?

I don't know how gcc calculates the usage here, but I guess it would be
more robust than my analyzer.

The issues that come to my mind are:
- -fstack-usage generates a separate output file, *.su, and so we have to
  manage those files to be incorporated in the kernel binary.  This
  implies that (common) kernel makefiles might have to be changed a bit.
- worse, what about the kernel module case?  We will have no way to let
  the kernel know the stack usage without adding an extra step at
  loading.

If we need to put some information about stack usage in the kernel, that
should not be much different from dwarf frame data (.eh_frame).

-Takahiro AKASHI

> Will
Re: [RFC][AArch64] function prologue analyzer in linux kernel
On Mon, Jan 11, 2016 at 10:11 PM, AKASHI Takahiro wrote:
> Will,
>
> On 01/09/2016 12:53 AM, Will Deacon wrote:
>> On Fri, Jan 08, 2016 at 02:36:32PM +0900, AKASHI Takahiro wrote:
>>> On 01/07/2016 11:56 PM, Richard Earnshaw (lists) wrote:
>>>> On 07/01/16 14:22, Will Deacon wrote:
>>>>> On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote:
>>>>>> So I'd like to introduce a function prologue analyzer to determine
>>>>>> a size allocated by a function's prologue and deduce it from
>>>>>> "Depth".
>>>>>> My implementation of this analyzer has been submitted to
>>>>>> linux-arm-kernel mailing list[1].
>>>>>> I borrowed some ideas from gdb's analyzer[2], especially a loop of
>>>>>> instruction decoding as well as stop of decoding at exiting a
>>>>>> basic block, but implemented my own simplified one because gdb
>>>>>> version seems to do a bit more than what we expect here.
>>>>>> Anyhow, since it is somewhat heuristic (and may not be
>>>>>> maintainable for a long term), could you review it from a broader
>>>>>> viewpoint of toolchain, please?
>>>>>
>>>>> My main issue with this is that we cannot rely on the frame layout
>>>>> generated by the compiler and there's little point in asking for
>>>>> commitment here.  Therefore, the heuristics will need updating as
>>>>> and when we identify new frames that we can't handle.  That's
>>>>> pretty fragile and puts us on the back foot when faced with newer
>>>>> compilers.  This might be sustainable if we don't expect to
>>>>> encounter much variation, but even that would require some sort of
>>>>> "buy-in" from the various toolchain communities.
>>>>>
>>>>> GCC already has an option (-fstack-usage) to determine the stack
>>>>> usage on a per-function basis and produce a report at build time.
>>>>> Why can't we use that to provide the information we need, rather
>>>>> than attempt to compute it at runtime based on your analyser?
>>>>>
>>>>> If -fstack-usage is not sufficient, understanding why might allow
>>>>> us to propose a better option.
>>>>
>>>> Can you not use the dwarf frame unwind data?  That's always
>>>> sufficient to recover the CFA (canonical frame address - the value
>>>> in SP when executing the first instruction in a function).  It seems
>>>> to me it's unlikely you're going to need something that's an
>>>> exceedingly high performance operation.
>>>
>>> Thank you for your comment.
>>> Yeah, but we need some utility routines to handle unwind data
>>> (.debug_frame).
>>> In fact, some guy has already attempted to merge (part of) libunwind
>>> into the kernel[1], but it was rejected by the kernel community
>>> (including Linus if I correctly remember).  It seems that they
>>> thought the code was still buggy.
>>
>> The ARC guys seem to have sneaked something in for their architecture:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arc/kernel/unwind.c
>>
>> so it might not be impossible if we don't require all the bells and
>> whistles of libunwind.
>
> Thanks.  I didn't notice this code.
>
>>> That is one of reasons that I wanted to implement my own analyzer.
>>
>> I still don't understand why you can't use fstack-usage.  Can you
>> please tell me why that doesn't work?  Am I missing something?
>
> I don't know how gcc calculates the usage here, but I guess it would be
> more robust than my analyzer.

-fstack-usage does not work when there are VLAs or alloca's.  So there is
no way to figure that part out without analysis of the actual assembly
code.  I still think dwarf unwind info is the way forward.

Thanks,
Andrew

> The issues that come up to my mind are
> - -fstack-usage generates a separate output file, *.su, and so we have
>   to manage them to be incorporated in the kernel binary.
>   This implies that (common) kernel makefiles might have to be a bit
>   changed.
> - more worse, what if kernel module case?  We will have no way to let
>   the kernel know the stack usage without adding an extra step at
>   loading.
>
> If we need to put some information about stack usage in the kernel,
> that should not be much different from dwarf frame data (.eh_frame).
>
> -Takahiro AKASHI
>
>> Will