On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>> <rdsandif...@googlemail.com> wrote:
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>>>> <rdsandif...@googlemail.com> wrote:
>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>
>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
>>>>>>>> <mikest...@comcast.net> wrote:
>>>>>>>>>
>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener
>>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>
>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max
>>>>>>>>>>>> byte mode size 4 on a 64-bit host.
>>>>>>>>>>>
>>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>>> includes all modes up to TImode, so by default we already go
>>>>>>>>>>> up to 128 bits.
>>>>>>>>>>
>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>
>>>>>>>>> Actually, no, they are not.  Partial int modes can have bit
>>>>>>>>> sizes that are not powers of two, and if there isn't an int
>>>>>>>>> mode that is bigger, we'd want to round up the partial int bit
>>>>>>>>> size.  Something like
>>>>>>>>> ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)
>>>>>>>>>  / HOST_BITS_PER_WIDE_INT)
>>>>>>>>> should do it.
>>>>>>>>>
>>>>>>>>>>>> I still would like to have the ability to provide
>>>>>>>>>>>> specializations of wide_int for "small" sizes, thus ideally
>>>>>>>>>>>> wide_int would be a template templated on the number of HWIs
>>>>>>>>>>>> in val.  Interface-wise wide_int<2> should be identical to
>>>>>>>>>>>> double_int, thus we should be able to do
>>>>>>>>>>>>
>>>>>>>>>>>>   typedef wide_int<2> double_int;
>>>>>>>>>>>
>>>>>>>>>>> If you want to go down this path after the patches get in, go
>>>>>>>>>>> for it.  I see no use at all for this.  This was not meant to
>>>>>>>>>>> be a plug-in replacement for double_int.  The goal of this
>>>>>>>>>>> patch is to get the compiler to do the constant math the way
>>>>>>>>>>> that the target does it.  Any such instantiation is by
>>>>>>>>>>> definition placing some predefined limit that some target may
>>>>>>>>>>> not want.
>>>>>>>>>>
>>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>>> implementations of functions that perform integer math on
>>>>>>>>>> two-HWI sized integers.  What I also don't like too much is
>>>>>>>>>> that we have two different interfaces to operate on them!
>>>>>>>>>> Can't you see how I come to not liking this?  Especially the
>>>>>>>>>> latter …
>>>>>>>>>
>>>>>>>>> double_int is logically dead.  Refactoring wide-int and
>>>>>>>>> double-int is a waste of time, as the time is better spent
>>>>>>>>> removing double-int from the compiler.
>>>>>>>>> All the necessary semantics and code of double-int _have_ been
>>>>>>>>> refactored into wide-int already.  Changing wide-int in any way
>>>>>>>>> to vend anything to double-int is wrong, as once double-int is
>>>>>>>>> removed, all the API changes made so that double-int shares
>>>>>>>>> from wide-int are wasted and must then be removed.  The path
>>>>>>>>> forward is the complete removal of double-int; it is wrong, has
>>>>>>>>> been wrong and always will be wrong, and nothing can change
>>>>>>>>> that.
>>>>>>>>
>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we
>>>>>>>> will get rid of it - you will make compile-time math a _lot_
>>>>>>>> slower.  Just profile what happens when you, for example, change
>>>>>>>> get_inner_reference to use wide_ints.
>>>>>>>>
>>>>>>>> To be able to remove double_int in favor of wide_int requires
>>>>>>>> _at least_ templating wide_int on 'len' and providing
>>>>>>>> specializations for 1 and 2.
>>>>>>>>
>>>>>>>> It might be a non-issue for math that operates on trees or
>>>>>>>> RTXen due to the allocation overhead we pay, but in recent years
>>>>>>>> we transitioned important paths away from using tree math to
>>>>>>>> using double_ints _for speed reasons_.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>
>>>>>>> I do not know why you believe this about the speed.  double_int
>>>>>>> always does synthetic math, since you do everything at 128-bit
>>>>>>> precision.
>>>>>>>
>>>>>>> The thing about wide_int is that, since it does math to the
>>>>>>> precision's size, it almost never uses synthetic operations,
>>>>>>> because the sizes for almost every instance can be handled using
>>>>>>> the native math on the machine.  Almost every call has a check to
>>>>>>> see if the operation can be done natively.  I seriously doubt
>>>>>>> that you are going to do TImode math much faster than I do it,
>>>>>>> and if you do, who cares?
>>>>>>>
>>>>>>> The number of calls does not affect the performance in any
>>>>>>> negative way, and in fact it is more efficient, since common
>>>>>>> things that require more than one operation in double_int are
>>>>>>> typically done in a single operation.
>>>>>>
>>>>>> Simple double-int operations like
>>>>>>
>>>>>> inline double_int
>>>>>> double_int::and_not (double_int b) const
>>>>>> {
>>>>>>   double_int result;
>>>>>>   result.low = low & ~b.low;
>>>>>>   result.high = high & ~b.high;
>>>>>>   return result;
>>>>>> }
>>>>>>
>>>>>> are always going to be faster than conditionally executing only
>>>>>> one operation (but inside an offline function).
>>>>>
>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>> appropriate here.
>>>>>
>>>>> It's interesting that you gave this example, since before you were
>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>> removed in favour of separate and() and not() operations, but why
>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>
>>>> I was more concerned about fused operations that use precision
>>>> or bitsize as input.  That is for example
>>>>
>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>> +  bool only_sign_bit_p () const;
>>>>
>>>> The first is: construct a wide-int with precision prec (and sign-
>>>> or zero-extend it) and then call only_sign_bit_p on it.  Such a
>>>> function should not be necessary, and existing callers should be
>>>> questioned instead of introducing it.
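(To make concrete what I mean by "fused" here: the prec-taking
variant is equivalent to first forcing the value to prec bits and
then applying the plain predicate.  A rough sketch - plain 64-bit
host math with made-up names, not the actual wide-int API:

#include <stdint.h>
#include <assert.h>

/* Sketch only: is VAL, viewed as a PREC-bit value, exactly the sign
   bit and nothing else?  Assumes 1 <= PREC <= 64.  */
static int
only_sign_bit_p (uint64_t val, unsigned int prec)
{
  /* Step 1, the "construct with precision prec" half: keep only the
     low PREC bits.  */
  uint64_t mask = prec >= 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << prec) - 1;
  uint64_t x = val & mask;

  /* Step 2, the plain predicate: only bit PREC-1 may be set.  */
  return x == ((uint64_t) 1 << (prec - 1));
}

int
main (void)
{
  assert (only_sign_bit_p (0x80, 8));   /* sign bit of an 8-bit value */
  assert (!only_sign_bit_p (0x81, 8));  /* an extra low bit is set */
  assert (only_sign_bit_p (0x180, 8));  /* bits above PREC are masked away */
  return 0;
}

The composition is trivial for a caller to spell out, which is why I
question whether the fused entry point pays for the API surface it
adds.)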
>>>>
>>>> In fact wide-int seems to have so many "fused" operations that we
>>>> run out of sensible recognizable names for them.  Which results in
>>>> a lot of confusion about what the functions actually do (at least
>>>> for me).
>>>
>>> Well, I suppose I can't really say anything useful either way on
>>> that one, since I'm not writing the patch and I'm not reviewing
>>> it :-)
>>>
>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>> There doesn't seem to be anything sacrosanct about the current
>>>>> choice of what's fused and what isn't.
>>>>
>>>> Maybe.  I'd rather have seen an initial small wide-int API and
>>>> fused operations introduced separately together with the places
>>>> they are used.  In the current form it's way too tedious to go over
>>>> all of them, match them with callers, look up enough context and
>>>> then make up my mind on whether the caller should do something
>>>> different or not.
>>>>
>>>> Thus, consider the big initial API a reason that all this review
>>>> takes so long ...
>>>>
>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>> IMO a good argument for keeping double_int in preference to
>>>>> wide_int.  Allocating and constructing tree objects to hold
>>>>> temporary values, storing an integer representation in them, then
>>>>> calling tree arithmetic routines that pull out the integer
>>>>> representation again and create a tree to hold the result, is
>>>>> going to be much slower than using either double_int or wide_int.
>>>>> I'd be very surprised if we noticed any measurable difference
>>>>> between double_int and wide_int here.
>>>>>
>>>>> I still see no reason to keep double_int around.  The width of a
>>>>> host wide integer really shouldn't have any significance.
>>>>>
>>>>> Your main complaint seems to be that the wide_int API is different
>>>>> from the double_int one, but we can't literally use the same API,
>>>>> since double_int has an implicit precision and bitsize and
>>>>> wide_int doesn't.  Having a precision that is separate from the
>>>>> underlying representation is IMO the most important feature of
>>>>> wide_int, so:
>>>>>
>>>>>   typedef wide_int<2> double_int;
>>>>>
>>>>> is never going to be a drop-in, API-compatible replacement for
>>>>> double_int.
>>>>
>>>> My reasoning was that if you strip wide-int of precision and
>>>> bitsize you have a double_int<N> class.
>>>
>>> But you don't!  Because...
>>>
>>>> Thus wide-int should have a base of that kind and just add
>>>> precision / bitsize on top of that.  It wouldn't be a step forward
>>>> if we end up replacing double_int uses with wide_int uses with a
>>>> precision of 2 * HOST_BITS_PER_WIDE_INT, would it?
>>>
>>> ...the precision and bitsize aren't an optional extra, either
>>> conceptually or in implementation.  wide_int happens to use N
>>> HOST_WIDE_INTs under the hood, but the value of N is an internal
>>> implementation detail.  No operations are done on N HWIs; they're
>>> done on the number of bits in the operands.  Whereas a double_int<N>
>>> class does everything on N HWIs.
>>
>> If that's the only effect then either bitsize or precision is
>> redundant (and we also have len ...).  Note I did not mention len
>> above, thus the base class would retain 'len' and double-int would
>> simply use 2 for it (if you don't template it but make it variable).
>
> But that means that wide_int has to model a P-bit operation as a
> "normal" len*HOST_WIDE_INT operation and then fix up the result
> after the fact, which seems unnecessarily convoluted.
It does that right now.  The operations are carried out in a loop
over len HOST_WIDE_INT parts; the last HWI is then special-treated to
account for precision/size.  (Yes, 'len' is also used as an
optimization - the fact that len ends up being mutable is another
thing I dislike about wide-int.  If wide-ints are cheap then all ops
should be non-mutating (at least with respect to 'len').)

> I still don't see why a full-precision 2*HOST_WIDE_INT operation (or
> a full-precision X*HOST_WIDE_INT operation for any X) has any
> special meaning.

Well, for the same reason that a HOST_WIDE_INT variable has a
meaning.  We use it to constrain what we (efficiently) want to work
on.  For example, CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT
times when doing bit-constant-propagation in loops (for TImode
integers on an x86_64 host).

Oh, and I don't necessarily see a use for double_int in its current
form other than as an integer representation on the host that is
efficient for manipulating integer constants of a target-dependent
size.  For example, the target detail that we have partial integer
modes with bitsize > precision, and that the bits > precision
apparently have a meaning when looking at the bit representation of a
constant, should not be part of the base class of wide-int.  (I doubt
it belongs in wide-int at all, but I guess you know more about the
reason we track bitsize in addition to precision - I think it's
abstraction at the wrong level; the tree level does fine without
knowing about bitsize.)

Richard.

> Richard
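P.S. For concreteness, a sketch of the loop-plus-fixup scheme I
describe above - fixed 64-bit HWIs and a made-up wide_add helper, not
the actual wide-int implementation:

#include <stdint.h>

#define HOST_BITS_PER_WIDE_INT 64
typedef int64_t HWI;

/* Add two values stored as LEN host wide ints, then canonicalize the
   most significant element so that bits above PRECISION are a sign
   extension of bit PRECISION-1 (assumes arithmetic right shifts).  */
static void
wide_add (HWI *res, const HWI *a, const HWI *b,
          unsigned int len, unsigned int precision)
{
  uint64_t carry = 0;
  for (unsigned int i = 0; i < len; i++)
    {
      /* Add with carry, detecting unsigned overflow at each step.  */
      uint64_t t = (uint64_t) a[i] + (uint64_t) b[i];
      uint64_t c1 = t < (uint64_t) a[i];
      uint64_t x = t + carry;
      uint64_t c2 = x < t;
      res[i] = (HWI) x;
      carry = c1 | c2;
    }

  /* The special-treated last HWI: sign-extend from the precision so
     the bits above it stay canonical.  */
  unsigned int small_prec = precision % HOST_BITS_PER_WIDE_INT;
  if (small_prec != 0)
    {
      unsigned int shift = HOST_BITS_PER_WIDE_INT - small_prec;
      res[len - 1] = (HWI) ((uint64_t) res[len - 1] << shift) >> shift;
    }
}

The final fixup is exactly the part that would disappear if you
dropped precision and operated on all len*64 bits, which is why I
keep saying the plain len-HWI representation and the precision layer
are separable.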