On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
> On 11/04/2012 11:54 AM, Richard Biener wrote:
>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>> <rdsandif...@googlemail.com> wrote:
>>> Kenneth Zadeck <zad...@naturalbridge.com> writes:
>>>> I would like you to respond to at least point 1 of this email.  In it
>>>> there is code from the rtl level that was written twice, once for the
>>>> case when the size of the mode is less than the size of a HWI and once
>>>> for the case where the size of the mode is less than 2 HWIs.
>>>>
>>>> My patch changes this to one instance of the code that works no matter
>>>> how large the data passed to it is.
>>>>
>>>> You have made a specific requirement for wide_int to be a template that
>>>> can be instantiated in several sizes: one for 1 HWI, one for 2 HWIs.  I
>>>> would like to know how this particular fragment is to be rewritten in
>>>> that model.  It seems that I would have to retain the structure where
>>>> there is one version of the code for each size at which the template is
>>>> instantiated.
>>>
>>> I think richi's argument was that wide_int should be split into two.
>>> There should be a "bare-metal" class that just has a length and HWIs,
>>> and the main wide_int class should be an extension on top of that
>>> which works to a bit precision instead.  Presumably with some
>>> template magic so that the length (number of HWIs) is a constant for:
>>>
>>>   typedef foo<2> double_int;
>>>
>>> and a variable for wide_int (because in wide_int the length would be
>>> the number of significant HWIs rather than the size of the underlying
>>> array).  wide_int would also record the precision and apply it after
>>> the full HWI operation.
>>>
>>> So the wide_int class would still provide "as wide as we need"
>>> arithmetic, as in your rtl patch.  I don't think he was objecting
>>> to that.
>>
>> That summarizes one part of my complaints / suggestions correctly.  In
>> other mails I suggested not making it a template but instead giving it
>> a 'bitsize' (or maxlen) field that is constant over the object's
>> lifetime.  Both suggestions likely require more thought than I have
>> put into them.  The main reason is that with C++ you can abstract away
>> where the pieces of wide-int information are stored, and thus use the
>> arithmetic / operation workers without copying the (source) "wide-int"
>> objects.  You should then be able to write adaptors for double-int,
>> tree, or RTX storage.
>
> We had considered something along these lines and rejected it.  I am
> not really opposed to doing something like this, but it is not
> obviously a winning idea and quite possibly not a good one.  Here was
> our thought process:
>
> If you abstract away the storage inside a wide_int, then you should be
> able to copy a pointer to the block of data from either the rtl-level
> integer constant or the tree-level one into the wide_int.  It is
> certainly true that making a wide_int from one of these is an
> extremely common operation, and doing this would avoid those copies.
>
> However, this causes two problems:
>
> 1) Mike's first cut at CONST_WIDE_INT did two ggc allocations to make
> the object: it created the base object and then allocated the array.
> Richard S noticed that we could just allocate one CONST_WIDE_INT that
> had the array in it.  Doing it this way saves one ggc allocation and
> one indirection when accessing the data within the CONST_WIDE_INT.
> Our plan is to use the same trick at the tree level.
> So to avoid the copying, you seem to need a more expensive
> representation for CONST_WIDE_INT and INT_CST.
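For concreteness, the single-allocation trick is roughly the following
shape - a simplified sketch with invented names; in GCC the allocation
would of course be a single ggc allocation rather than malloc, and
HOST_WIDE_INT comes from hwint.h:

#include <stdlib.h>
#include <string.h>

typedef long HOST_WIDE_INT;       /* stand-in for GCC's typedef */

/* The HWI array lives at the tail of the same allocation, so header
   and payload are one object: one allocation, and reaching element I
   is a plain array access rather than a pointer chase.  */
struct const_wide_int_node
{
  unsigned short len;             /* number of significant HWIs */
  HOST_WIDE_INT elts[1];          /* trailing array, really LEN long */
};

static const_wide_int_node *
alloc_const_wide_int (const HOST_WIDE_INT *val, unsigned short len)
{
  size_t sz = sizeof (const_wide_int_node)
              + (len - 1) * sizeof (HOST_WIDE_INT);
  /* Error handling omitted; GCC would use a ggc allocator here.  */
  const_wide_int_node *p = (const_wide_int_node *) malloc (sz);
  p->len = len;
  memcpy (p->elts, val, len * sizeof (HOST_WIDE_INT));
  return p;
}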
I did not propose having a pointer to the data in the RTX or tree int.
Just the short-lived wide-ints (which are on the stack) would have a
pointer to the data - which can then obviously point into the RTX and
tree data.

> 2) You are now stuck either ggc'ing the storage inside a wide_int when
> it is created as part of an expression, or you have to play some game
> to represent the two different storage plans inside of wide_int.

Hm?  wide-ints are short-lived and thus never live across a garbage
collection point.  We create non-GCed objects pointing to GCed objects
all the time and everywhere this way.

> Clearly this is where you think we should be going by suggesting that
> we abstract away the internal storage.  However, this comes at a
> price: what is currently an array access in my patches would (I
> believe) become a function call.

No, the workers (that perform the array accesses) will simply get a
pointer to the first data element.  Then whether it's embedded or
external is of no interest to them.

> From a performance point of view, I believe that this is a
> non-starter.  If you can figure out how to design this so that it is
> not a function call, I would consider this a viable option.
>
> On the other side of this, you are clearly correct that we are copying
> the data when we make wide ints from INT_CSTs or CONST_WIDE_INTs.  But
> this is why we represent the data inside the wide_ints, the INT_CSTs
> and the CONST_WIDE_INTs in a compressed form.  Even with very big
> types, which are generally rare, the constants themselves are very
> small.  So the copy operation is a loop that almost always copies one
> element, even with tree-vrp, which doubles the sizes of every type.
>
> There is a third option, which is that the storage inside the wide_int
> is just ggc'ed storage.  We rejected this because of the functional
> nature of wide-ints.  There are zillions created, they can be stack
> allocated, and they last for very short periods of time.

Of course - GCing wide-ints is a non-starter.

>>> As is probably obvious, I don't agree FWIW.  It seems like an
>>> unnecessary complication without any clear use.  Especially since
>>> the number of

Maybe the double_int typedef is without any clear use.  Properly
abstracting from the storage / information providers will save
compile-time, memory and code, though.  I don't see that any thought
was spent on how to avoid excessive copying or on dealing with
long(er)-lived objects and their storage needs.

> I actually disagree.  Wide ints can use a bloated amount of storage
> because they are designed to be very short-lived, very low-cost
> objects that are stack allocated.  For long-term storage, there is
> INT_CST at the tree level and CONST_WIDE_INT at the rtl level.  Those
> use a very compact storage model.  The copying entailed is only a
> small part of the overall cost.

Well, but both trees and RTXen are not viable for short-lived things
because they are GCed!  double-ints were suitable for this kind of
stuff because they also have a moderate size.  With wide-ints, size
becomes a problem (or GC, if you instead use trees or RTXen).

> Everything that you are suggesting along these lines adds to the
> weight of a wide-int object.

On the contrary - it lessens their weight (with external, already
existing storage) or does not change it (with the embedded storage).

> You have to understand there will be many more wide-ints created in a
> normal compilation than were ever created with double-int.  This is
> because the rtl level had no object like this at all, and at the tree
> level many of the places that should have used double_int
> short-circuited the code and only did the transformations if the
> types fit in a HWI.

Your argument shows that the copy-in/out from tree/RTX to/from
wide-int will become a very frequent operation and is thus worth
optimizing.
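To illustrate: the "wide-int" a worker sees is just a pointer to the
first data element plus a length.  A hedged sketch of the adaptor idea
follows - the names are invented, and it assumes GCC's rtl.h for
const_rtx plus the CONST_WIDE_INT_ELT / CONST_WIDE_INT_NUNITS
accessors from your patch:

/* A short-lived, stack-only view: it does not own its digits, it only
   records where they start.  Since such an object never lives across
   a GC point, the pointer may legitimately target GC-managed storage
   (a CONST_WIDE_INT or an INT_CST) directly.  */
struct wide_int_ref
{
  const HOST_WIDE_INT *val;   /* first data element */
  unsigned int len;           /* number of significant HWIs */
  unsigned int precision;     /* bit precision */
};

/* Adaptor: borrow the array embedded in a CONST_WIDE_INT - no copy,
   no extra allocation.  */
static inline wide_int_ref
wide_int_ref_from_rtx (const_rtx x, unsigned int precision)
{
  wide_int_ref r;
  r.val = &CONST_WIDE_INT_ELT (x, 0);
  r.len = CONST_WIDE_INT_NUNITS (x);
  r.precision = precision;
  return r;
}

/* A worker sees only pointers and lengths, so whether the storage is
   embedded in a wide_int or borrowed from a tree/RTX is of no
   interest to it; the accesses below stay plain array accesses, not
   function calls.  */
static bool
eq_p_worker (const HOST_WIDE_INT *a, unsigned int alen,
             const HOST_WIDE_INT *b, unsigned int blen)
{
  if (alen != blen)
    return false;
  for (unsigned int i = 0; i < alen; i++)
    if (a[i] != b[i])
      return false;
  return true;
}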
> This is why we are extremely defensive about this issue.  We really
> did think a lot about it.

I'm sure you did.

Richard.
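P.S. For concreteness, the bare-metal / wide_int split Richard S
describes above might look roughly like this - shape only, with
invented names and a placeholder upper bound, not a proposed
implementation:

/* "Bare metal" layer: a length plus an HWI array whose size is a
   compile-time constant, so foo<2> costs exactly what double_int
   costs today.  */
template <int N>
struct foo
{
  HOST_WIDE_INT val[N];
  unsigned short len;           /* significant HWIs, <= N */
};

typedef foo<2> double_int_rep;  /* the foo<2> from Richard S's mail */

/* Precision-aware layer: len counts significant HWIs rather than the
   size of the underlying array, and the precision is applied after
   each full-HWI operation.  */
const int MAX_HWIS = 4;         /* placeholder for a real maximum */

struct wide_int_sketch : foo<MAX_HWIS>
{
  unsigned short precision;
};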