On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
>
> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>
>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>> <rdsandif...@googlemail.com> wrote:
>>>
>>> Kenneth Zadeck <zad...@naturalbridge.com> writes:
>>>>
>>>> I would like you to respond to at least point 1 of this email.   In it
>>>> there is code from the rtl level that was written twice, once for the
>>>> case when the size of the mode is less than the size of a HWI and once
>>>> for the case where the size of the mode is less than 2 HWIs.
>>>>
>>>> My patch changes this to one instance of the code that works no matter
>>>> how large the data passed to it is.
>>>>
>>>> You have made a specific requirement for wide int to be a template
>>>> that can be instantiated in several sizes, one for 1 HWI, one for
>>>> 2 HWIs.   I would like to know how this particular fragment is to be
>>>> rewritten in this model?   It seems that I would have to retain the
>>>> structure where there is one version of the code for each size at
>>>> which the template is instantiated.
>>>
>>> I think richi's argument was that wide_int should be split into two.
>>> There should be a "bare-metal" class that just has a length and HWIs,
>>> and the main wide_int class should be an extension on top of that
>>> that does things to a bit precision instead.  Presumably with some
>>> template magic so that the length (number of HWIs) is a constant for:
>>>
>>>    typedef foo<2> double_int;
>>>
>>> and a variable for wide_int (because in wide_int the length would be
>>> the number of significant HWIs rather than the size of the underlying
>>> array).  wide_int would also record the precision and apply it after
>>> the full HWI operation.
>>>
>>> So the wide_int class would still provide "as wide as we need"
>>> arithmetic, as in your rtl patch.  I don't think he was objecting
>>> to that.
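For concreteness, the split described above might look roughly like the
following sketch (all names here are invented for illustration; nothing
is taken from an actual patch):

  /* Stand-in for GCC's HOST_WIDE_INT.  */
  typedef long HOST_WIDE_INT;

  /* "Bare-metal" storage: a fixed array of HWIs plus the number of
     significant elements.  */
  template <int N>
  struct fixed_wide_storage
  {
    HOST_WIDE_INT val[N];   /* least significant element first */
    unsigned short len;     /* 1 <= len <= N significant elements */
  };

  /* double_int would be the two-HWI instantiation... */
  typedef fixed_wide_storage<2> double_int_storage;

  /* ...while wide_int would layer a runtime bit precision on top,
     applying it after each full-HWI operation.  MAX_LEN is whatever
     the largest supported mode requires.  */
  template <int MAX_LEN>
  struct wide_int_t : fixed_wide_storage<MAX_LEN>
  {
    unsigned short precision;  /* precision in bits */
  };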
>>
>> That summarizes one part of my complaints / suggestions correctly.
>> In other mails I suggested not making it a template but instead
>> giving it a 'bitsize' (or maxlen) field that is constant over the
>> object's lifetime.  Both suggestions likely require more thought than
>> I put into them.  The main reason is that with C++ you can abstract
>> from where the wide-int information pieces are stored and thus use
>> the arithmetic / operation workers without copying the (source)
>> "wide-int" objects.  Thus you should be able to write adaptors for
>> double-int storage, tree or RTX storage.
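To illustrate what I mean by adaptors (again only a sketch with
invented names, not a concrete API proposal):

  typedef long HOST_WIDE_INT;  /* stand-in for GCC's HOST_WIDE_INT */

  /* A non-owning reference to wide-int data, wherever it lives.  */
  struct wide_int_ref
  {
    const HOST_WIDE_INT *val;  /* into a tree, an RTX or a stack buffer */
    unsigned short len;        /* number of significant elements */
    unsigned short precision;  /* precision in bits */
  };

  /* Hypothetical adaptors -- each merely points at the storage it
     wraps, so constructing a reference copies nothing:

       wide_int_ref wi_from_tree (const_tree t);
       wide_int_ref wi_from_rtx (const_rtx x, enum machine_mode mode);

     The arithmetic workers then take such references (or the raw
     pointer/length pair) as inputs.  */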
>
> We had considered something along these lines and rejected it.   I am
> not really opposed to doing something like this, but it is not an
> obviously winning idea and is likely not a good one.   Here was our
> thought process:
>
> If you abstract away the storage inside a wide int, then you should be
> able to copy a pointer to the block of data from either the rtl-level
> integer constant or the tree-level one into the wide int.   It is
> certainly true that making a wide_int from one of these is an
> extremely common operation, and doing this would avoid those copies.
>
> However, this causes two problems:
> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
> make the object.   It created the base object and then it allocated
> the array.   Richard S noticed that we could just allocate one
> CONST_WIDE_INT that had the array in it.   Doing it this way saves one
> ggc allocation and one indirection when accessing the data within the
> CONST_WIDE_INT.   Our plan is to use the same trick at the tree level.
> So to avoid the copying, you seem to need a more expensive rep for
> CONST_WIDE_INT and INT_CST.

I did not propose having a pointer to the data in the RTX or tree int.
Only the short-lived wide-ints (which are on the stack) would have a
pointer to the data - which can then obviously point into the RTX and
tree data.
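To make that concrete: with the single-allocation layout you describe
in 1), the data sits at the tail of the GCed object, and a short-lived
wide-int can point straight at it.  A sketch (invented names; plain
malloc instead of ggc_alloc to keep it self-contained):

  #include <cstdlib>
  #include <cstring>

  typedef long HOST_WIDE_INT;  /* stand-in for GCC's HOST_WIDE_INT */

  struct const_wide_int
  {
    unsigned short len;    /* number of significant HWIs */
    HOST_WIDE_INT val[1];  /* trailing array, really LEN elements */
  };

  /* One allocation, sized for the object plus LEN - 1 extra elements
     (the classic trailing-array trick).  */
  static const_wide_int *
  make_const_wide_int (const HOST_WIDE_INT *src, unsigned short len)
  {
    const_wide_int *cwi = (const_wide_int *)
      malloc (sizeof (const_wide_int)
              + (len - 1) * sizeof (HOST_WIDE_INT));
    cwi->len = len;
    memcpy (cwi->val, src, len * sizeof (HOST_WIDE_INT));
    return cwi;
  }

  /* A stack wide-int then just records cwi->val and cwi->len --
     no second allocation, no copy.  */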

> 2) You are now stuck either ggcing the storage inside a wide_int when
> one is created as part of an expression, or playing some game to
> represent the two different storage plans inside of wide_int.

Hm?  wide-ints are short-lived and thus never live across a garbage
collection point.  We create non-GCed objects pointing to GCed objects
this way all the time, everywhere.

>   Clearly this is where you think that we should be going by
> suggesting that we abstract away the internal storage.   However,
> this comes at a price:   what is currently an array access in my
> patches would (I believe) become a function call.

No, the workers (that perform the array accesses) will simply get
a pointer to the first data element.  Then whether it's embedded or
external is of no interest to them.
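A sketch of such a worker (simplified: no precision handling, and the
shorter operand is assumed sign-extended via its top element; all names
are invented):

  typedef long HOST_WIDE_INT;  /* stand-ins for GCC's HWI types */
  typedef unsigned long uhwi;

  static unsigned
  wi_add_worker (HOST_WIDE_INT *res,
                 const HOST_WIDE_INT *a, unsigned alen,
                 const HOST_WIDE_INT *b, unsigned blen)
  {
    unsigned len = alen > blen ? alen : blen;
    uhwi carry = 0;
    for (unsigned i = 0; i < len; i++)
      {
        /* Implicitly sign-extend the shorter operand.  */
        uhwi ai = i < alen ? (uhwi) a[i]
                           : (a[alen - 1] < 0 ? (uhwi) -1 : 0);
        uhwi bi = i < blen ? (uhwi) b[i]
                           : (b[blen - 1] < 0 ? (uhwi) -1 : 0);
        uhwi t = ai + bi;
        uhwi s = t + carry;
        res[i] = (HOST_WIDE_INT) s;
        carry = (t < ai) | (s < t);  /* carry out of this element */
      }
    return len;
  }

The array accesses stay plain array accesses whether A and B point into
a wide-int's embedded buffer, a CONST_WIDE_INT or an INT_CST.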

>  From a performance point of view, I believe that this is a
> non-starter.  If you can figure out how to design this so that it is
> not a function call, I would consider this a viable option.
>
> On the other side of this, you are clearly correct that we are copying
> the data when we are making wide ints from INT_CSTs or
> CONST_WIDE_INTs.    But this is why we represent the data inside the
> wide_ints, the INT_CSTs and the CONST_WIDE_INTs in a compressed form.
> Even with very big types, which are generally rare, the constants
> themselves are very small.   So the copy operation is a loop that
> almost always copies one element, even with tree-vrp, which doubles
> the sizes of every type.
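Right - so the copy under discussion is essentially this (a sketch with
invented names; the bounds check on the destination buffer is omitted):

  typedef long HOST_WIDE_INT;  /* stand-in for GCC's HOST_WIDE_INT */

  struct wide_int_sketch
  {
    HOST_WIDE_INT val[4];      /* some buffer maximum */
    unsigned short len;        /* significant elements only */
    unsigned short precision;  /* precision in bits */
  };

  static void
  wi_copy_from (wide_int_sketch *w, const HOST_WIDE_INT *src,
                unsigned short len, unsigned short precision)
  {
    w->len = len;
    w->precision = precision;
    /* len is 1 for nearly all constants, however wide the type.  */
    for (unsigned i = 0; i < len; i++)
      w->val[i] = src[i];
  }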
>
> There is a third option, which is that the storage inside the wide int
> is just ggced storage.  We rejected this because of the functional
> nature of wide-ints.    There are zillions created, they can be stack
> allocated, and they last for very short periods of time.

Of course - GCing wide-ints is a non-starter.

>
>>> As is probably obvious, I don't agree FWIW.  It seems like an unnecessary
>>> complication without any clear use.  Especially since the number of
>>
>> Maybe the double_int typedef is without any clear use.  Properly
>> abstracting from the storage / information providers will save
>> compile-time, memory and code though.  I don't see that any thought
>> was spent on how to avoid excessive copying or on dealing with
>> long(er)-lived objects and their storage needs.
>
> I actually disagree.    Wide ints can afford a bloated amount of
> storage because they are designed to be very short-lived, very
> low-cost objects that are stack allocated.   For long-term storage,
> there is INT_CST at the tree level and CONST_WIDE_INT at the rtl
> level.  Those use a very compact storage model.   The copying entailed
> is only a small part of the overall cost.

Well, but both trees and RTXen are not viable for short-lived things
because they are GCed!  double-ints were suitable for this kind of
stuff because they also have a moderate size.  With wide-ints, size
becomes a problem (or GC, if you instead use trees or RTXen).

> Everything that you are suggesting along these lines is adding to the weight
> of a wide-int object.

On the contrary - it lessens their weight (with external, already
existing storage) or does not change it at all (with the embedded
storage).

>  You have to understand there will be many more wide-ints created in
> a normal compilation than were ever created with double-int.    This
> is because the rtl level had no object like this at all, and at the
> tree level many of the places that should have used double int took a
> shortcut and only did the transformations if the types fit in a HWI.

Your argument shows that the copy-in/out from tree/RTX to/from wide-int
will become a very frequent operation and thus is worth optimizing.

> This is why we are extremely defensive about this issue.   We really did
> think a lot about it.

I'm sure you did.

Richard.
