On 11/26/2012 10:03 AM, Richard Biener wrote:
On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
On 11/04/2012 11:54 AM, Richard Biener wrote:
On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
Kenneth Zadeck <zad...@naturalbridge.com> writes:
I would like you to respond to at least point 1 of this email. In it
there is code from the rtl level that was written twice, once for the
case when the size of the mode is less than the size of a HWI and once
for the case where the size of the mode is less than 2 HWIs.
My patch changes this to one instance of the code that works no matter
how large the data passed to it is.
You have made a specific requirement for wide_int to be a template that
can be instantiated in several sizes, one for 1 HWI and one for 2 HWIs. I
would like to know how this particular fragment is to be rewritten in
this model? It seems that I would have to retain the structure where
there is one version of the code for each size at which the template is
instantiated.
I think richi's argument was that wide_int should be split into two.
There should be a "bare-metal" class that just has a length and HWIs,
and the main wide_int class should be an extension on top of that
that does things to a bit precision instead. Presumably with some
template magic so that the length (number of HWIs) is a constant for:
typedef foo<2> double_int;
and a variable for wide_int (because in wide_int the length would be
the number of significant HWIs rather than the size of the underlying
array). wide_int would also record the precision and apply it after
the full HWI operation.
So the wide_int class would still provide "as wide as we need"
arithmetic,
as in your rtl patch. I don't think he was objecting to that.
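A minimal sketch of that split, purely illustrative (the names and the
buffer bound here are invented, not taken from any patch; HOST_WIDE_INT
is GCC's host-side integer of at least 64 bits):

  /* "Bare-metal" fixed-size integer: N is a compile-time constant.  */
  template <int N>
  struct fixed_wide_int
  {
    HOST_WIDE_INT val[N];   /* Always exactly N HWIs.  */
  };

  typedef fixed_wide_int<2> double_int;

  /* Illustrative bound on the buffer; the real limit would come from
     the largest supported mode.  */
  #define WIDE_INT_MAX_ELTS 4

  /* Precision-aware class: len is the number of significant HWIs,
     and precision is applied after each full-HWI operation.  */
  struct wide_int
  {
    unsigned short len;
    unsigned short precision;
    HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
  };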
That summarizes one part of my complaints / suggestions correctly. In
other mails I suggested not making it a template but instead giving it a
'bitsize' (or maxlen) field that is constant over the object's lifetime.
Both suggestions likely require more thought than I put into them. The
main reason is that with C++ you can abstract away where the pieces of
wide-int information are stored and thus use the arithmetic / operation
workers without copying the (source) "wide-int" objects. Thus you should
be able to write adaptors for double-int storage, tree storage or RTX
storage.
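One way to picture such an adaptor, as a hedged sketch (all names here
are invented for illustration, not taken from an existing patch):

  /* A non-owning view: the workers only need a pointer to the HWIs
     plus a length and a precision, so this can point directly into
     tree or RTX data, or at a stack buffer, without any copy.  */
  class wide_int_ref
  {
  public:
    wide_int_ref (const HOST_WIDE_INT *v, unsigned int l, unsigned int p)
      : val (v), len (l), precision (p) {}

    const HOST_WIDE_INT *val;  /* Borrowed storage, never freed here.  */
    unsigned int len;          /* Number of significant HWIs.  */
    unsigned int precision;    /* Bit precision of the value.  */
  };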
We had considered something along these lines and rejected it. I am not
really opposed to doing something like this, but it is not obviously a
win and may well not be a good idea. Here was our thought process:
If you abstract away the storage inside a wide-int, then you should be able
to copy a pointer to the block of data from either the rtl-level integer
constant or the tree-level one into the wide-int. It is certainly true
that making a wide_int from one of these is an extremely common operation,
and doing this would avoid those copies.
However, this causes two problems:
1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make
the object: it created the base object and then it allocated the array.
Richard S noticed that we could just allocate one CONST_WIDE_INT that had
the array in it. Doing it this way saves one ggc allocation and one
indirection when accessing the data within the CONST_WIDE_INT. Our plan is
to use the same trick at the tree level. So to avoid the copying, you seem
to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
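The single-allocation layout is the usual trailing-array idiom; roughly
(field names illustrative, not the actual rtl definition):

  /* Header and data in one ggc block: no second allocation and no
     extra indirection to reach the HWIs.  */
  struct const_wide_int
  {
    unsigned short num_elem;   /* Number of HWIs actually stored.  */
    HOST_WIDE_INT elem[1];     /* Trailing array, really num_elem long;
                                  allocated as
                                  sizeof (const_wide_int)
                                  + (num_elem - 1) * sizeof (HOST_WIDE_INT).  */
  };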
I did not propose having a pointer to the data in the RTX or tree int. Just
the short-lived wide-ints (which are on the stack) would have a pointer to
the data - which can then obviously point into the RTX and tree data.
There is then the issue of what happens if some wide-ints are not
short-lived. It makes me nervous to create internal pointers into GCed
memory.
2) You are now stuck either ggc'ing the storage inside a wide_int when one
is created as part of an expression, or playing some game to represent
the two different storage plans inside of wide_int.
Hm? wide-ints are short-lived and thus never live across a garbage collection
point. We create non-GCed objects pointing to GCed objects all the time
and everywhere this way.
Again, this makes me nervous, but it could be done. However, it does
mean that the wide-ints that are not created from rtxes or trees
will be more expensive, because they are not going to get their storage
"for free"; they are going to have to alloca it.
However, it still is not clear, given that 99% of the wide-ints are
going to fit in a single HWI, that this would be a noticeable win.
Clearly this
is where you think that we should be going by suggesting that we abstract
away the internal storage. However, this comes at a price: what is
currently an array access in my patches would (I believe) become a function
call.
No, the workers (that perform the array accesses) will simply get
a pointer to the first data element. Then whether it's embedded or
external is of no interest to them.
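In other words, the workers would presumably look something like this
(a hypothetical signature, just to make the point concrete):

  /* The worker sees only raw pointers and lengths; whether op0/op1
     live inside a wide_int, a tree or an RTX is invisible here.
     Returns the number of significant HWIs written to res.  */
  unsigned int
  add_large (HOST_WIDE_INT *res,
             const HOST_WIDE_INT *op0, unsigned int op0len,
             const HOST_WIDE_INT *op1, unsigned int op1len,
             unsigned int precision);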
So is your plan that the wide-int constructors from rtx or tree would
just copy the pointer to the array on top of the array that is otherwise
allocated on the stack? I can easily do this. But as I said, the
gain seems quite small.
And of course, going the other way still does need the copy.
From a performance point of view, I believe that this is a non-starter.
If you can figure out how to design this so that it is not a function
call, I would consider this a viable option.
On the other side of this, you are clearly correct that we are copying the
data when we are making wide-ints from INT_CSTs or CONST_WIDE_INTs. But
this is why we represent the data inside of the wide_ints, the INT_CSTs and
the CONST_WIDE_INTs in a compressed form. Even with very big types, which
are generally rare, the constants themselves are very small. So the copy
operation is a loop that almost always copies one element, even with
tree-vrp, which doubles the sizes of every type.
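Concretely, the copy out of the compressed form is just a short loop
(a sketch; the field names follow the illustrative structs above):

  /* len is the number of significant HWIs in the compressed constant;
     it is almost always 1, so the loop body almost always runs once.  */
  static void
  copy_from_compressed (wide_int *result, const HOST_WIDE_INT *src,
                        unsigned int len)
  {
    for (unsigned int i = 0; i < len; i++)
      result->val[i] = src[i];
    result->len = len;
  }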
There is a third option, which is that the storage inside the wide-int is
just ggc'ed storage. We rejected this because of the functional nature of
wide-ints: zillions are created, they can be stack allocated, and
they last for very short periods of time.
Of course - GCing wide-ints is a non-starter.
As is probably obvious, I don't agree FWIW. It seems like an unnecessary
complication without any clear use. Especially since the number of
Maybe the double_int typedef is without any clear use. Properly
abstracting away the storage / information providers will save
compile time, memory and code, though. I don't see that any thought
was spent on how to avoid excessive copying or on how to deal with
long(er)-lived objects and their storage needs.
I actually disagree. Wide-ints can use a bloated amount of storage
because they are designed to be very short-lived, very low-cost objects
that are stack allocated. For long-term storage, there is INT_CST at the
tree level and CONST_WIDE_INT at the rtl level. Those use a very compact
storage model. The copying entailed is only a small part of the overall
cost.
Well, but both trees and RTXen are not viable for short-lived things because
they are GCed! double-ints were suitable for this kind of stuff because
they also have a moderate size. With wide-ints, size becomes a problem
(or GC, if you instead use trees or RTXen).
Everything that you are suggesting along these lines is adding to the weight
of a wide-int object.
On the contrary - it lessens their weight (with external, already
existing storage) or does not do anything to it (with embedded storage).
You have to understand that there will be many more
wide-ints created in a normal compilation than were ever created with
double-int. This is because the rtl level had no object like this at all,
and at the tree level many of the places that should have used double_int
short-cut the code and only did the transformation if the type fit in a
HWI.
Your argument shows that the copy-in/out from tree/RTX to/from wide-int
will become a very frequent operation and is thus worth optimizing.
This is why we are extremely defensive about this issue. We really did
think a lot about it.
I'm sure you did.
Richard.