On Wed, Jun 5, 2013 at 12:00 PM, Charles R Harris <[email protected] > wrote:
> > > On Wed, Jun 5, 2013 at 11:59 AM, Charles R Harris < > [email protected]> wrote: > >> >> >> On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith <[email protected]> wrote: >> >>> On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor >>> <[email protected]> wrote: >>> > On 05.06.2013 16:33, Nathaniel Smith wrote: >>> >> The slow down people are worried about is, suppose that 'xp' has >>> >> 1,000,000 entries, and the user wants to interpolate 1 point. If we >>> >> can assume the array is sorted, then we can find which bin the 1 point >>> >> falls into by using binary search (np.searchsorted), which will >>> >> require examining ~log2(1,000,000) = 20 entries in the array. Checking >>> >> that it's sorted, though, will require examining all 1,000,000 points >>> >> -- it turns an O(log n) operation into an O(n) operation. And if this >>> >> is being called as part of some larger algorithm, this effect can >>> >> cascade -- if it gets called m times from a loop, now that's O(mn) >>> >> instead of (m log n), etc. That's often the difference between >>> >> tractable and intractable. >>> >> >>> >> If you submit a PR that gives interp the option to check for >>> >> monotonicity, then we'll almost certainly merge it :-). If you also >>> >> make it the default then there might be some debate, though I'd argue >>> >> for it... >>> > >>> > I would not like it when the performance goes from log to linear by >>> default. >>> > It is documented that the coordinates must be increasing after all. >>> >>> I agree, I don't like it either. But the problem is there are two >>> contradictory things and I don't like either of them: >>> 1) performance going from log to linear by default (for a subset of >>> somewhat extreme cases) >>> 2) people silently getting the wrong results (and according to reports >>> in this thread, the warning in the docs does not actually prevent >>> this) >>> >>> The question is which one do we dislike more. My feeling is that in >>> the cases where (1) comes up, it will annoy people and get them to >>> check the docs and find the go-faster switch, while the first warning >>> of (2) is when your spaceship crashes, so we should worry more about >>> (2). >>> >>> > How about as a compromise the function checks one or two closest >>> > neighbors around the interpolation point and warns/errors if those are >>> > not monotonic. >>> > >>> > Its not fool proof but should at least catch the very wrong cases with >>> > an acceptable performance loss. >>> >>> There are tons of ways for data to accidentally end up sorted within >>> each local region, but unsorted overall, e.g., if you're combining >>> results from parallel simulation runs. Such data would systematically >>> pass this check, but still give nonsensical answers. >>> >> >> Some actual benchmarks might be useful to the discussion. >> > > And perhaps an inplace C function ismonotone would be generally useful. > Or make greater.reduce operate correctly. Chuck
_______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
