On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
>
> On 10/31/2012 10:05 AM, Richard Biener wrote:
>>
>> On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
>> <zad...@naturalbridge.com> wrote:
>>>
>>> On 10/31/2012 08:11 AM, Richard Biener wrote:
>>>>
>>>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>>>> <rdsandif...@googlemail.com> wrote:
>>>>>
>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>>
>>>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>>>>>> <rdsandif...@googlemail.com> wrote:
>>>>>>>
>>>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>>>>
>>>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>>>
>>>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
>>>>>>>>>> <mikest...@comcast.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener
>>>>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max
>>>>>>>>>>>>>> byte mode size 4 on a 64-bit host.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>>>>> includes all modes up to TImode, so by default we already go
>>>>>>>>>>>>> up to 128 bits.
>>>>>>>>>>>>
>>>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>>>
>>>>>>>>>>> Actually, no, they are not.
>>>>>>>>>>> Partial int modes can have bit sizes that are not a power of
>>>>>>>>>>> two, and, if there isn't an int mode that is bigger, we'd want
>>>>>>>>>>> to round up the partial int bit size.  Something like
>>>>>>>>>>> ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>>>>>>> HOST_BITS_PER_WIDE_INT) should do it.
>>>>>>>>>>>
>>>>>>>>>>>>>> I still would like to have the ability to provide
>>>>>>>>>>>>>> specializations of wide_int for "small" sizes, thus ideally
>>>>>>>>>>>>>> wide_int would be a template templated on the number of HWIs
>>>>>>>>>>>>>> in val.  Interface-wise wide_int<2> should be identical to
>>>>>>>>>>>>>> double_int, thus we should be able to do
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to go down this path after the patches get in, go
>>>>>>>>>>>>> for it.  I see no use at all for this.  This was not meant to
>>>>>>>>>>>>> be a plug-in replacement for double_int.  The goal of this
>>>>>>>>>>>>> patch is to get the compiler to do the constant math the way
>>>>>>>>>>>>> that the target does it.  Any such instantiation is by
>>>>>>>>>>>>> definition placing some predefined limit that some target may
>>>>>>>>>>>>> not want.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>>>>> implementations of functions that perform integer math on
>>>>>>>>>>>> two-HWI-sized integers.  What I also don't like too much is
>>>>>>>>>>>> that we have two different interfaces to operate on them!
>>>>>>>>>>>> Can't you see how I come to not liking this?  Especially the
>>>>>>>>>>>> latter ...
>>>>>>>>>>>
>>>>>>>>>>> double_int is logically dead.
>>>>>>>>>>> Refactoring wide-int and double-int is a waste of time, as the
>>>>>>>>>>> time is better spent removing double-int from the compiler.
>>>>>>>>>>> All the necessary semantics and code of double-int _have_ been
>>>>>>>>>>> refactored into wide-int already.  Changing wide-int in any way
>>>>>>>>>>> to vend anything to double-int is wrong, as once double-int is
>>>>>>>>>>> removed, all the API changes made so that double-int shares
>>>>>>>>>>> from wide-int are wasted and must then be removed.  The path
>>>>>>>>>>> forward is the complete removal of double-int; it is wrong, has
>>>>>>>>>>> been wrong and always will be wrong, and nothing can change
>>>>>>>>>>> that.
>>>>>>>>>>
>>>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we
>>>>>>>>>> will get rid of it - you will make compile-time math a _lot_
>>>>>>>>>> slower.  Just profile what happens when you, for example, change
>>>>>>>>>> get_inner_reference to use wide_ints.
>>>>>>>>>>
>>>>>>>>>> To be able to remove double_int in favor of wide_int requires
>>>>>>>>>> _at least_ templating wide_int on 'len' and providing
>>>>>>>>>> specializations for 1 and 2.
>>>>>>>>>>
>>>>>>>>>> It might be a non-issue for math that operates on trees or RTXen
>>>>>>>>>> due to the allocation overhead we pay, but in recent years we
>>>>>>>>>> transitioned important paths away from using tree math to using
>>>>>>>>>> double_ints _for speed reasons_.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>> I do not know why you believe this about the speed.  double_int
>>>>>>>>> always does synthetic math, since you do everything at 128-bit
>>>>>>>>> precision.
>>>>>>>>>
>>>>>>>>> The thing about wide_int is that, since it does math to the
>>>>>>>>> precision's size, it almost never uses synthetic operations,
>>>>>>>>> because the sizes for almost every instance can be done using the
>>>>>>>>> native math on the machine.  Almost every call has a check to see
>>>>>>>>> if the operation can be done natively.  I seriously doubt that
>>>>>>>>> you are going to do TImode math much faster than I do it, and if
>>>>>>>>> you do, who cares.
>>>>>>>>>
>>>>>>>>> The number of calls does not affect the performance in any
>>>>>>>>> negative way, and in fact it is more efficient, since common
>>>>>>>>> things that require more than one operation in double-int are
>>>>>>>>> typically done in a single operation.
>>>>>>>>
>>>>>>>> Simple double-int operations like
>>>>>>>>
>>>>>>>> inline double_int
>>>>>>>> double_int::and_not (double_int b) const
>>>>>>>> {
>>>>>>>>   double_int result;
>>>>>>>>   result.low = low & ~b.low;
>>>>>>>>   result.high = high & ~b.high;
>>>>>>>>   return result;
>>>>>>>> }
>>>>>>>>
>>>>>>>> are always going to be faster than conditionally executing only
>>>>>>>> one operation (but inside an offline function).
>>>>>>>
>>>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>>>> appropriate here.
>>>>>>>
>>>>>>> It's interesting that you gave this example, since before you were
>>>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>>>> removed in favour of separate and() and not() operations, but why
>>>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>>>
>>>>>> I was more concerned about fused operations that use precision
>>>>>> or bitsize as input.
>>>>>> That is, for example,
>>>>>>
>>>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>>>> +  bool only_sign_bit_p () const;
>>>>>>
>>>>>> The first is: construct a wide-int with precision prec (and sign-
>>>>>> or zero-extend it) and then call only_sign_bit_p on it.  Such a
>>>>>> function should not be necessary, and existing callers should be
>>>>>> questioned instead of introducing it.
>>>>>>
>>>>>> In fact wide-int seems to have so many "fused" operations that we
>>>>>> run out of sensible recognizable names for them.  Which results in
>>>>>> a lot of confusion about what the functions actually do (at least
>>>>>> for me).
>>>>>
>>>>> Well, I suppose I can't really say anything useful either way on
>>>>> that one, since I'm not writing the patch and I'm not reviewing
>>>>> it :-)
>>>>>
>>>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>>>> There doesn't seem to be anything sacrosanct about the current
>>>>>>> choice of what's fused and what isn't.
>>>>>>
>>>>>> Maybe.  I'd rather have seen an initial small wide-int API and
>>>>>> fused operations introduced separately together with the places
>>>>>> they are used.  In the current form it's way too tedious to go over
>>>>>> all of them and match them with callers, look up enough context and
>>>>>> then make up my mind on whether the caller should do something
>>>>>> different or not.
>>>>>>
>>>>>> Thus, consider the big initial API a reason that all this review
>>>>>> takes so long ...
>>>>>>
>>>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>>>> IMO a good argument for keeping double_int in preference to
>>>>>>> wide_int.  Allocating and constructing tree objects to hold
>>>>>>> temporary values, storing an integer representation in them, then
>>>>>>> calling tree arithmetic routines that pull out the integer
>>>>>>> representation again and create a tree to hold the result, is
>>>>>>> going to be much slower than using either double_int or wide_int.
>>>>>>> I'd be very surprised if we noticed any measurable difference
>>>>>>> between double_int and wide_int here.
>>>>>>>
>>>>>>> I still see no reason to keep double_int around.  The width of a
>>>>>>> host wide integer really shouldn't have any significance.
>>>>>>>
>>>>>>> Your main complaint seems to be that the wide_int API is different
>>>>>>> from the double_int one, but we can't literally use the same API,
>>>>>>> since double_int has an implicit precision and bitsize, and
>>>>>>> wide_int doesn't.  Having a precision that is separate from the
>>>>>>> underlying representation is IMO the most important feature of
>>>>>>> wide_int, so:
>>>>>>>
>>>>>>>   typedef wide_int<2> double_int;
>>>>>>>
>>>>>>> is never going to be a drop-in, API-compatible replacement for
>>>>>>> double_int.
>>>>>>
>>>>>> My reasoning was that if you strip wide-int of precision and
>>>>>> bitsize you have a double_int<N> class.
>>>>>
>>>>> But you don't!  Because...
>>>>>
>>>>>> Thus wide-int should have a base of that kind and just add
>>>>>> precision / bitsize on top of that.  It wouldn't be a step forward
>>>>>> if we end up replacing double_int uses with wide_int uses with a
>>>>>> precision of 2 * HOST_BITS_PER_WIDE_INT, would it?
>>>>>
>>>>> ...the precision and bitsize aren't an optional extra, either
>>>>> conceptually or in implementation.  wide_int happens to use N
>>>>> HOST_WIDE_INTs under the hood, but the value of N is an internal
>>>>> implementation detail.  No operations are done to N HWIs; they're
>>>>> done to the number of bits in the operands.  Whereas a double_int<N>
>>>>> class does everything to N HWIs.
>>>>
>>>> If that's the only effect then either bitsize or precision is
>>>> redundant (and we also have len ...).  Note I did not mention len
>>>> above; thus the base class would retain 'len' and double-int would
>>>> simply use 2 for it (if you don't template it but make it variable).
>>>>
>>>> Richard.
>>>>
>>> NO, in your own words, there are two parts of the compiler that want
>>> the infinite model.  The rest wants to do the math the way the target
>>> does it.  My version now accommodates both.  In tree vrp it scans the
>>> gimple and determines what the largest type is, and that is the basis
>>> of all of the math in this pass.  If you just make double-int bigger,
>>> then you are paying for big math everywhere.
>>
>> You have an artificial limit on what 'len' can be.  And you do not
>> accommodate users that do not want to pay the storage penalty for that
>> arbitrary upper limit choice.  That's all because 'len' may grow
>> (mutate).  You could alternatively not allow bitsize to grow / mutate
>> and have allocation tied to bitsize instead of len.
>
> It is not artificial, it is based on the target.  I chose to do it that
> way because I "knew" that having a fixed size would be faster than
> doing an alloca for every wide-int.  If we feel that it is important to
> be able to do truly arbitrary infinite-precision arithmetic (as opposed
> to just some fixed amount larger than the size of the type), then we
> can have a subclass that does this.  However, there is no need to do
> this in the compiler currently, and so using this as an argument
> against wide-int is really not fair.
Well, it is artificial, as you had to increase it by a factor of two to
handle the use in VRP.  It is also "artificial" in that it wastes
storage.

> However, as machines allow wider math, you really do not want to
> penalize every program that is compiled with this burden.  It was OK
> for double-int to do so, but when machines commonly have OImode, it
> will still be the case that more than 99% of the variables will be 64
> bits or less.
>
> I do worry a lot about the effect of adding layers like this on the
> efficiency of the compiler.  You made a valid point that a lot of the
> double-int routines could be done inline, and my plan is to take the
> parts of the functions that notice that the precision fits in an HWI
> and do them inline.
>
> If your proposed solution causes a function call to access the
> elements, then we are doomed.

Well, if it causes a function call to access a pointer to the array of
elements which you can cache, then it wouldn't be that bad.  And with
templates we can inline the access anyway.

Richard.

>> Ideally the wide-int interface would have two storage models:
>>
>> class alloc_storage
>> {
>>   unsigned len; /* or bitsize */
>>   HOST_WIDE_INT *hwis;
>>
>>   HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
>> }
>>
>> class max_mode
>> {
>>   HOST_WIDE_INT hwis[largest integer mode size in hwi];
>>
>>   HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
>> }
>>
>> template <class storage>
>> class wide_int : storage
>> {
>>
>> so you can even re-use the (const) in-place storage of INTEGER_CSTs
>> or RTL CONST_WIDEs.  And VRP would simply have its own storage
>> supporting 2 times the largest integer mode (or whatever choice it
>> has).  And double-int would simply embed two.
>>
>> Maybe this is the perfect example for introducing virtual functions
>> as well to ease inter-operability between the wide-int variants
>> without making each member a template on the 2nd wide-int operand
>> (it's of course auto-deduced, but well ...).
>>
>> The above is just a brain-dump; details may need further thinking.
>> Like the operator[]: maybe the storage model should just be able to
>> return a pointer to the array of HWIs; this way the actual workers
>> can be out-of-line and non-templates.
>>
>> Richard.
>>
>>>>> Richard
>>>
>>>
>