On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
>
> On 10/31/2012 10:05 AM, Richard Biener wrote:
>>
>> On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
>> <zad...@naturalbridge.com> wrote:
>>>
>>> On 10/31/2012 08:11 AM, Richard Biener wrote:
>>>>
>>>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>>>> <rdsandif...@googlemail.com> wrote:
>>>>>
>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>>
>>>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>>>>>> <rdsandif...@googlemail.com> wrote:
>>>>>>>
>>>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>>>>
>>>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>>>
>>>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
>>>>>>>>>> <mikest...@comcast.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener
>>>>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max
>>>>>>>>>>>>>> byte mode size 4 on a 64-bit host.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>>>>> includes all modes up to TImode, so by default we already go
>>>>>>>>>>>>> up to 128 bits.
>>>>>>>>>>>>
>>>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>>>
>>>>>>>>>>> Actually, no, they are not.
>>>>>>>>>>> Partial int modes can have bit sizes that are not a power of
>>>>>>>>>>> two, and, if there isn't an int mode that is bigger, we'd want
>>>>>>>>>>> to round up the partial int bit size.  Something like
>>>>>>>>>>> ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>>>>>>> HOST_BITS_PER_WIDE_INT) should do it.
>>>>>>>>>>>
>>>>>>>>>>>>>> I still would like to have the ability to provide
>>>>>>>>>>>>>> specializations of wide_int for "small" sizes, thus ideally
>>>>>>>>>>>>>> wide_int would be a template templated on the number of HWIs
>>>>>>>>>>>>>> in val.  Interface-wise wide_int<2> should be identical to
>>>>>>>>>>>>>> double_int, thus we should be able to do
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to go down this path after the patches get in, go
>>>>>>>>>>>>> for it.  I see no use at all for this.  This was not meant to
>>>>>>>>>>>>> be a plug-in replacement for double_int.  The goal of this
>>>>>>>>>>>>> patch is to get the compiler to do the constant math the way
>>>>>>>>>>>>> that the target does it.  Any such instantiation is by
>>>>>>>>>>>>> definition placing some predefined limit that some target may
>>>>>>>>>>>>> not want.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>>>>> implementations of functions that perform integer math on
>>>>>>>>>>>> two-HWI-sized integers.  What I also don't like too much is
>>>>>>>>>>>> that we have two different interfaces to operate on them!
>>>>>>>>>>>> Can't you see how I come to not liking this?  Especially the
>>>>>>>>>>>> latter ...
>>>>>>>>>>>
>>>>>>>>>>> double_int is logically dead.
>>>>>>>>>>> Refactoring wide-int and double-int is a waste of time, as the
>>>>>>>>>>> time is better spent removing double-int from the compiler.
>>>>>>>>>>> All the necessary semantics and code of double-int _have_ been
>>>>>>>>>>> refactored into wide-int already.  Changing wide-int in any way
>>>>>>>>>>> to vend anything to double-int is wrong, as once double-int is
>>>>>>>>>>> removed, all the API changes made so that double-int shares
>>>>>>>>>>> from wide-int are wasted and must then be removed.  The path
>>>>>>>>>>> forward is the complete removal of double-int; it is wrong, has
>>>>>>>>>>> been wrong and always will be wrong, and nothing can change
>>>>>>>>>>> that.
>>>>>>>>>>
>>>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we
>>>>>>>>>> will get rid of it - you will make compile-time math a _lot_
>>>>>>>>>> slower.  Just profile what happens when you, for example, change
>>>>>>>>>> get_inner_reference to use wide_ints.
>>>>>>>>>>
>>>>>>>>>> To be able to remove double_int in favor of wide_int requires
>>>>>>>>>> _at least_ templating wide_int on 'len' and providing
>>>>>>>>>> specializations for 1 and 2.
>>>>>>>>>>
>>>>>>>>>> It might be a non-issue for math that operates on trees or RTXen
>>>>>>>>>> due to the allocation overhead we pay, but in recent years we
>>>>>>>>>> transitioned important paths away from using tree math to using
>>>>>>>>>> double_ints _for speed reasons_.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>> I do not know why you believe this about the speed.  double_int
>>>>>>>>> always does synthetic math, since you do everything at 128-bit
>>>>>>>>> precision.
>>>>>>>>>
>>>>>>>>> The thing about wide_int is that, since it does math to the
>>>>>>>>> precision's size, it almost never uses synthetic operations,
>>>>>>>>> because the sizes for almost every instance can be done using the
>>>>>>>>> native math on the machine.  Almost every call has a check to see
>>>>>>>>> if the operation can be done natively.  I seriously doubt that
>>>>>>>>> you are going to do TImode math much faster than I do it, and if
>>>>>>>>> you do, who cares.
>>>>>>>>>
>>>>>>>>> The number of calls does not affect the performance in any
>>>>>>>>> negative way, and in fact it is more efficient, since common
>>>>>>>>> things that require more than one operation in double-int are
>>>>>>>>> typically done in a single operation.
>>>>>>>>
>>>>>>>> Simple double-int operations like
>>>>>>>>
>>>>>>>> inline double_int
>>>>>>>> double_int::and_not (double_int b) const
>>>>>>>> {
>>>>>>>>   double_int result;
>>>>>>>>   result.low = low & ~b.low;
>>>>>>>>   result.high = high & ~b.high;
>>>>>>>>   return result;
>>>>>>>> }
>>>>>>>>
>>>>>>>> are always going to be faster than conditionally executing only
>>>>>>>> one operation (but inside an offline function).
>>>>>>>
>>>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>>>> appropriate here.
>>>>>>>
>>>>>>> It's interesting that you gave this example, since before you were
>>>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>>>> removed in favour of separate and() and not() operations, but why
>>>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>>>
>>>>>> I was more concerned about fused operations that use precision
>>>>>> or bitsize as input.
>>>>>> That is, for example,
>>>>>>
>>>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>>>> +  bool only_sign_bit_p () const;
>>>>>>
>>>>>> The first is: construct a wide-int with precision prec (and sign-
>>>>>> or zero-extend it) and then call only_sign_bit_p on it.  Such a
>>>>>> function should not be necessary, and existing callers should be
>>>>>> questioned instead of introducing it.
>>>>>>
>>>>>> In fact wide-int seems to have so many "fused" operations that we
>>>>>> run out of sensible recognizable names for them.  Which results in
>>>>>> a lot of confusion about what the functions actually do (at least
>>>>>> for me).
>>>>>
>>>>> Well, I suppose I can't really say anything useful either way on
>>>>> that one, since I'm not writing the patch and I'm not reviewing
>>>>> it :-)
>>>>>
>>>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>>>> There doesn't seem to be anything sacrosanct about the current
>>>>>>> choice of what's fused and what isn't.
>>>>>>
>>>>>> Maybe.  I'd rather have seen an initial small wide-int API and
>>>>>> fused operations introduced separately together with the places
>>>>>> they are used.  In the current form it's way too tedious to go over
>>>>>> all of them and match them with callers, look up enough context and
>>>>>> then make up my mind on whether the caller should do something
>>>>>> different or not.
>>>>>>
>>>>>> Thus, consider the big initial API a reason that all this review
>>>>>> takes so long ...
>>>>>>
>>>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>>>> IMO a good argument for keeping double_int in preference to
>>>>>>> wide_int.  Allocating and constructing tree objects to hold
>>>>>>> temporary values, storing an integer representation in them, then
>>>>>>> calling tree arithmetic routines that pull out the integer
>>>>>>> representation again and create a tree to hold the result, is
>>>>>>> going to be much slower than using either double_int or wide_int.
>>>>>>> I'd be very surprised if we noticed any measurable difference
>>>>>>> between double_int and wide_int here.
>>>>>>>
>>>>>>> I still see no reason to keep double_int around.  The width of a
>>>>>>> host wide integer really shouldn't have any significance.
>>>>>>>
>>>>>>> Your main complaint seems to be that the wide_int API is different
>>>>>>> from the double_int one, but we can't literally use the same API,
>>>>>>> since double_int has an implicit precision and bitsize, and
>>>>>>> wide_int doesn't.  Having a precision that is separate from the
>>>>>>> underlying representation is IMO the most important feature of
>>>>>>> wide_int, so:
>>>>>>>
>>>>>>>   typedef wide_int<2> double_int;
>>>>>>>
>>>>>>> is never going to be a drop-in, API-compatible replacement for
>>>>>>> double_int.
>>>>>>
>>>>>> My reasoning was that if you strip wide-int of precision and
>>>>>> bitsize you have a double_int<N> class.
>>>>>
>>>>> But you don't!  Because...
>>>>>
>>>>>> Thus wide-int should have a base of that kind and just add
>>>>>> precision / bitsize on top of that.  It wouldn't be a step forward
>>>>>> if we end up replacing double_int uses with wide_int uses with a
>>>>>> precision of 2 * HOST_BITS_PER_WIDE_INT, would it?
>>>>>
>>>>> ...the precision and bitsize aren't an optional extra, either
>>>>> conceptually or in implementation.  wide_int happens to use N
>>>>> HOST_WIDE_INTs under the hood, but the value of N is an internal
>>>>> implementation detail.  No operations are done to N HWIs; they're
>>>>> done to the number of bits in the operands.  Whereas a double_int<N>
>>>>> class does everything to N HWIs.
>>>>
>>>> If that's the only effect then either bitsize or precision is
>>>> redundant (and we also have len ...).  Note I did not mention len
>>>> above; thus the base class would retain 'len' and double-int would
>>>> simply use 2 for it (if you don't template it but make it variable).
>>>>
>>>> Richard.
>>>>
>>> NO, in your own words, there are two parts of the compiler that want
>>> the infinite model.  The rest wants to do the math the way the target
>>> does it.  My version now accommodates both.  In tree vrp it scans the
>>> gimple and determines what the largest type is, and that is the basis
>>> of all of the math in this pass.  If you just make double-int bigger,
>>> then you are paying for big math everywhere.
>>
>> You have an artificial limit on what 'len' can be.  And you do not
>> accommodate users that do not want to pay the storage penalty for that
>> arbitrary upper limit choice.  That's all because 'len' may grow
>> (mutate).  You could alternatively not allow bitsize to grow / mutate
>> and have allocation tied to bitsize instead of len.
>
> It is not artificial, it is based on the target.  I chose to do it that
> way because I "knew" that having a fixed size would be faster than
> doing an alloca for every wide-int.  If we feel that it is important to
> be able to do truly arbitrary infinite-precision arithmetic (as opposed
> to just some fixed amount larger than the size of the type), then we
> can have a subclass that does this.  However, there is no need to do
> this in the compiler currently, and so using this as an argument
> against wide-int is really not fair.
Well, it is artificial, as you had to increase it by a factor of two to
handle the use in VRP.  It is also "artificial" in that it wastes
storage.

> However, as machines allow wider math, you really do not want to
> penalize every program that is compiled with this burden.  It was OK
> for double-int to do so, but when machines commonly have OImode, it
> will still be the case that more than 99% of the variables will be 64
> bits or less.
>
> I do worry a lot about the effect of adding layers like this on the
> efficiency of the compiler.  You made a valid point that a lot of the
> double-int routines could be done inline, and my plan is to take the
> parts of the functions that notice that the precision fits in an HWI
> and do them inline.
>
> If your proposed solution causes a function call to access the
> elements, then we are doomed.

Well, if it causes a function call to access a pointer to the array of
elements which you can cache, then it wouldn't be that bad.  And with
templates we can inline the access anyway.

Richard.

>> Ideally the wide-int interface would have two storage models:
>>
>> class alloc_storage
>> {
>>   unsigned len; /* or bitsize */
>>   HOST_WIDE_INT *hwis;
>>
>>   HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
>> }
>>
>> class max_mode
>> {
>>   HOST_WIDE_INT hwis[largest integer mode size in hwi];
>>
>>   HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
>> }
>>
>> template <class storage>
>> class wide_int : storage
>> {
>>
>> so you can even re-use the (const) in-place storage of INTEGER_CSTs
>> or RTL CONST_WIDEs.  And VRP would simply have its own storage
>> supporting 2 times the largest integer mode (or whatever choice it
>> has).  And double-int would simply embed two.
>>
>> Maybe this is the perfect example for introducing virtual functions
>> as well to ease inter-operability between the wide-int variants
>> without making each member a template on the 2nd wide-int operand
>> (it's of course auto-deduced, but well ...).
>>
>> The above is just a brain-dump; details may need further thinking.
>> Like the operator[]: maybe the storage model should just be able to
>> return a pointer to the array of HWIs; this way the actual workers
>> can be out-of-line and non-templates.
>>
>> Richard.
>>
>>>>> Richard
>>>
>>>
>