On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>> <rdsandif...@googlemail.com> wrote:
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>>>> <rdsandif...@googlemail.com> wrote:
>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>
>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
>>>>>>>> <mikest...@comcast.net> wrote:
>>>>>>>>>
>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener
>>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>
>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max
>>>>>>>>>>>> byte mode size 4 on a 64-bit host.
>>>>>>>>>>>
>>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>>> includes all modes up to TImode, so by default we already go
>>>>>>>>>>> up to 128 bits.
>>>>>>>>>>
>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>
>>>>>>>>> Actually, no, they are not.  Partial int modes can have bit
>>>>>>>>> sizes that are not powers of two, and if there isn't an int
>>>>>>>>> mode that is bigger, we'd want to round up the partial int bit
>>>>>>>>> size.  Something like
>>>>>>>>> ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)
>>>>>>>>>  / HOST_BITS_PER_WIDE_INT)
>>>>>>>>> should do it.
>>>>>>>>>
>>>>>>>>>>>> I still would like to have the ability to provide
>>>>>>>>>>>> specializations of wide_int for "small" sizes, thus ideally
>>>>>>>>>>>> wide_int would be a template templated on the number of HWIs
>>>>>>>>>>>> in val.  Interface-wise wide_int<2> should be identical to
>>>>>>>>>>>> double_int, thus we should be able to do
>>>>>>>>>>>>
>>>>>>>>>>>>   typedef wide_int<2> double_int;
>>>>>>>>>>>
>>>>>>>>>>> If you want to go down this path after the patches get in, go
>>>>>>>>>>> for it.  I see no use at all for this.  This was not meant to
>>>>>>>>>>> be a plug-in replacement for double_int.  The goal of this
>>>>>>>>>>> patch is to get the compiler to do the constant math the way
>>>>>>>>>>> that the target does it.  Any such instantiation is by
>>>>>>>>>>> definition placing some predefined limit that some target may
>>>>>>>>>>> not want.
>>>>>>>>>>
>>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>>> implementations of functions that perform integer math on
>>>>>>>>>> two-HWI sized integers.  What I also don't like too much is
>>>>>>>>>> that we have two different interfaces to operate on them!
>>>>>>>>>> Can't you see how I come to not liking this?  Especially the
>>>>>>>>>> latter …
>>>>>>>>>
>>>>>>>>> double_int is logically dead.  Refactoring wide-int and
>>>>>>>>> double-int is a waste of time, as the time is better spent
>>>>>>>>> removing double-int from the compiler.
>>>>>>>>> All the necessary semantics and code of double-int _have_ been
>>>>>>>>> refactored into wide-int already.  Changing wide-int in any way
>>>>>>>>> to vend anything to double-int is wrong, as once double-int is
>>>>>>>>> removed, all the API changes made so that double-int shares
>>>>>>>>> from wide-int are wasted and must then be removed.  The path
>>>>>>>>> forward is the complete removal of double-int; it is wrong, has
>>>>>>>>> been wrong and always will be wrong, and nothing can change
>>>>>>>>> that.
>>>>>>>>
>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we
>>>>>>>> will get rid of it - you will make compile-time math a _lot_
>>>>>>>> slower.  Just profile what happens when you, for example, change
>>>>>>>> get_inner_reference to use wide_ints.
>>>>>>>>
>>>>>>>> To be able to remove double_int in favor of wide_int requires
>>>>>>>> _at least_ templating wide_int on 'len' and providing
>>>>>>>> specializations for 1 and 2.
>>>>>>>>
>>>>>>>> It might be a non-issue for math that operates on trees or
>>>>>>>> RTXen due to the allocation overhead we pay, but in recent years
>>>>>>>> we transitioned important paths away from using tree math to
>>>>>>>> using double_ints _for speed reasons_.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>
>>>>>>> I do not know why you believe this about the speed.  double_int
>>>>>>> always does synthetic math, since you do everything at 128-bit
>>>>>>> precision.
>>>>>>>
>>>>>>> The thing about wide_int is that, since it does math to the
>>>>>>> precision's size, it almost never uses synthetic operations,
>>>>>>> because the sizes for almost every instance can be handled using
>>>>>>> the native math on the machine.  Almost every call has a check to
>>>>>>> see if the operation can be done natively.  I seriously doubt
>>>>>>> that you are going to do TImode math much faster than I do it,
>>>>>>> and if you do, who cares?
>>>>>>>
>>>>>>> The number of calls does not affect the performance in any
>>>>>>> negative way, and in fact it is more efficient, since common
>>>>>>> things that require more than one operation in double_int are
>>>>>>> typically done in a single operation.
>>>>>>
>>>>>> Simple double-int operations like
>>>>>>
>>>>>> inline double_int
>>>>>> double_int::and_not (double_int b) const
>>>>>> {
>>>>>>   double_int result;
>>>>>>   result.low = low & ~b.low;
>>>>>>   result.high = high & ~b.high;
>>>>>>   return result;
>>>>>> }
>>>>>>
>>>>>> are always going to be faster than conditionally executing only
>>>>>> one operation (but inside an offline function).
>>>>>
>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>> appropriate here.
>>>>>
>>>>> It's interesting that you gave this example, since before you were
>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>> removed in favour of separate and() and not() operations, but why
>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>
>>>> I was more concerned about fused operations that use precision
>>>> or bitsize as input.  That is for example
>>>>
>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>> +  bool only_sign_bit_p () const;
>>>>
>>>> The first is: construct a wide-int with precision prec (and sign-
>>>> or zero-extend it) and then call only_sign_bit_p on it.  Such a
>>>> function should not be necessary, and existing callers should be
>>>> questioned instead of introducing it.
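(To make concrete what I mean by "fused" here: the prec-taking
variant is equivalent to first forcing the value to prec bits and
then applying the plain predicate.  A rough sketch - plain 64-bit
host math with made-up names, not the actual wide-int API:

#include <stdint.h>
#include <assert.h>

/* Sketch only: is VAL, viewed as a PREC-bit value, exactly the sign
   bit and nothing else?  Assumes 1 <= PREC <= 64.  */
static int
only_sign_bit_p (uint64_t val, unsigned int prec)
{
  /* Step 1, the "construct with precision prec" half: keep only the
     low PREC bits.  */
  uint64_t mask = prec >= 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << prec) - 1;
  uint64_t x = val & mask;

  /* Step 2, the plain predicate: only bit PREC-1 may be set.  */
  return x == ((uint64_t) 1 << (prec - 1));
}

int
main (void)
{
  assert (only_sign_bit_p (0x80, 8));   /* sign bit of an 8-bit value */
  assert (!only_sign_bit_p (0x81, 8));  /* an extra low bit is set */
  assert (only_sign_bit_p (0x180, 8));  /* bits above PREC are masked away */
  return 0;
}

The composition is trivial for a caller to spell out, which is why I
question whether the fused entry point pays for the API surface it
adds.)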
>>>>
>>>> In fact wide-int seems to have so many "fused" operations that we
>>>> run out of sensible recognizable names for them.  Which results in
>>>> a lot of confusion about what the functions actually do (at least
>>>> for me).
>>>
>>> Well, I suppose I can't really say anything useful either way on
>>> that one, since I'm not writing the patch and I'm not reviewing
>>> it :-)
>>>
>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>> There doesn't seem to be anything sacrosanct about the current
>>>>> choice of what's fused and what isn't.
>>>>
>>>> Maybe.  I'd rather have seen an initial small wide-int API and
>>>> fused operations introduced separately together with the places
>>>> they are used.  In the current form it's way too tedious to go over
>>>> all of them, match them with callers, look up enough context and
>>>> then make up my mind on whether the caller should do something
>>>> different or not.
>>>>
>>>> Thus, consider the big initial API a reason that all this review
>>>> takes so long ...
>>>>
>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>> IMO a good argument for keeping double_int in preference to
>>>>> wide_int.  Allocating and constructing tree objects to hold
>>>>> temporary values, storing an integer representation in them, then
>>>>> calling tree arithmetic routines that pull out the integer
>>>>> representation again and create a tree to hold the result, is
>>>>> going to be much slower than using either double_int or wide_int.
>>>>> I'd be very surprised if we noticed any measurable difference
>>>>> between double_int and wide_int here.
>>>>>
>>>>> I still see no reason to keep double_int around.  The width of a
>>>>> host wide integer really shouldn't have any significance.
>>>>>
>>>>> Your main complaint seems to be that the wide_int API is different
>>>>> from the double_int one, but we can't literally use the same API,
>>>>> since double_int has an implicit precision and bitsize and
>>>>> wide_int doesn't.  Having a precision that is separate from the
>>>>> underlying representation is IMO the most important feature of
>>>>> wide_int, so:
>>>>>
>>>>>   typedef wide_int<2> double_int;
>>>>>
>>>>> is never going to be a drop-in, API-compatible replacement for
>>>>> double_int.
>>>>
>>>> My reasoning was that if you strip wide-int of precision and
>>>> bitsize you have a double_int<N> class.
>>>
>>> But you don't!  Because...
>>>
>>>> Thus wide-int should have a base of that kind and just add
>>>> precision / bitsize on top of that.  It wouldn't be a step forward
>>>> if we end up replacing double_int uses with wide_int uses with a
>>>> precision of 2 * HOST_BITS_PER_WIDE_INT, would it?
>>>
>>> ...the precision and bitsize aren't an optional extra, either
>>> conceptually or in implementation.  wide_int happens to use N
>>> HOST_WIDE_INTs under the hood, but the value of N is an internal
>>> implementation detail.  No operations are done on N HWIs; they're
>>> done on the number of bits in the operands.  Whereas a double_int<N>
>>> class does everything on N HWIs.
>>
>> If that's the only effect then either bitsize or precision is
>> redundant (and we also have len ...).  Note I did not mention len
>> above, thus the base class would retain 'len' and double-int would
>> simply use 2 for it (if you don't template it but make it variable).
>
> But that means that wide_int has to model a P-bit operation as a
> "normal" len*HOST_WIDE_INT operation and then fix up the result
> after the fact, which seems unnecessarily convoluted.
It does that right now.  The operations are carried out in a loop
over len HOST_WIDE_INT parts; the last HWI is then special-treated to
account for precision/size.  (Yes, 'len' is also used as an
optimization - the fact that len ends up being mutable is another
thing I dislike about wide-int.  If wide-ints are cheap then all ops
should be non-mutating (at least with respect to 'len').)

> I still don't see why a full-precision 2*HOST_WIDE_INT operation (or
> a full-precision X*HOST_WIDE_INT operation for any X) has any
> special meaning.

Well, for the same reason that a HOST_WIDE_INT variable has a
meaning.  We use it to constrain what we (efficiently) want to work
on.  For example, CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT
times when doing bit-constant-propagation in loops (for TImode
integers on an x86_64 host).

Oh, and I don't necessarily see a use for double_int in its current
form other than as an integer representation on the host that is
efficient for manipulating integer constants of a target-dependent
size.  For example, the target detail that we have partial integer
modes with bitsize > precision, and that the bits > precision
apparently have a meaning when looking at the bit representation of a
constant, should not be part of the base class of wide-int.  (I doubt
it belongs in wide-int at all, but I guess you know more about the
reason we track bitsize in addition to precision - I think it's
abstraction at the wrong level; the tree level does fine without
knowing about bitsize.)

Richard.

> Richard
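P.S. For concreteness, a sketch of the loop-plus-fixup scheme I
describe above - fixed 64-bit HWIs and a made-up wide_add helper, not
the actual wide-int implementation:

#include <stdint.h>

#define HOST_BITS_PER_WIDE_INT 64
typedef int64_t HWI;

/* Add two values stored as LEN host wide ints, then canonicalize the
   most significant element so that bits above PRECISION are a sign
   extension of bit PRECISION-1 (assumes arithmetic right shifts).  */
static void
wide_add (HWI *res, const HWI *a, const HWI *b,
          unsigned int len, unsigned int precision)
{
  uint64_t carry = 0;
  for (unsigned int i = 0; i < len; i++)
    {
      /* Add with carry, detecting unsigned overflow at each step.  */
      uint64_t t = (uint64_t) a[i] + (uint64_t) b[i];
      uint64_t c1 = t < (uint64_t) a[i];
      uint64_t x = t + carry;
      uint64_t c2 = x < t;
      res[i] = (HWI) x;
      carry = c1 | c2;
    }

  /* The special-treated last HWI: sign-extend from the precision so
     the bits above it stay canonical.  */
  unsigned int small_prec = precision % HOST_BITS_PER_WIDE_INT;
  if (small_prec != 0)
    {
      unsigned int shift = HOST_BITS_PER_WIDE_INT - small_prec;
      res[len - 1] = (HWI) ((uint64_t) res[len - 1] << shift) >> shift;
    }
}

The final fixup is exactly the part that would disappear if you
dropped precision and operated on all len*64 bits, which is why I
keep saying the plain len-HWI representation and the precision layer
are separable.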