subject:"\[math\] speeding up percentile based statistics"

Re: [math] speeding up percentile based statistics

2010-11-03 Thread Leanne Guy

On 9/27/10 09:55 PM, luc.maison...@free.fr wrote: - "Gilles Sadowski" a écrit : [...] ---CUT--- double[] a = new double[] {1, 2, 3}; double[] b = new double[] {1, 2, 3, 4, 5}; Percentile pA = new Percentile(a); Percentile pB = new Percentile(b); double r; r = pA

Re: [math] speeding up percentile based statistics

2010-10-03 Thread Phil Steitz

On 10/3/10 3:55 PM, Luc Maisonobe wrote: Le 26/09/2010 21:34, Luc Maisonobe a écrit : Le 26/09/2010 21:05, Phil Steitz a écrit : On 9/26/10 2:30 PM, Luc Maisonobe wrote: Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : 2010/9/26 Gilles Sadowski: [...] 1) do nothing to check the array

Re: [math] speeding up percentile based statistics

2010-10-03 Thread Luc Maisonobe

Le 26/09/2010 21:34, Luc Maisonobe a écrit : > Le 26/09/2010 21:05, Phil Steitz a écrit : >> On 9/26/10 2:30 PM, Luc Maisonobe wrote: >>> Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : 2010/9/26 Gilles Sadowski: >> [...] >> >> 1) do nothing to check the array is the same be

Re: [math] speeding up percentile based statistics

2010-09-27 Thread luc . maisonobe

- "Gilles Sadowski" a écrit : > > [...] > > >> > > >> ---CUT--- > > >>double[] a = new double[] {1, 2, 3}; > > >>double[] b = new double[] {1, 2, 3, 4, 5}; > > >>Percentile pA = new Percentile(a); > > >>Percentile pB = new Percentile(b); > > >> > > >>double r; > > >>r

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Gilles Sadowski

> [...] > >> > >> ---CUT--- > >>double[] a = new double[] {1, 2, 3}; > >>double[] b = new double[] {1, 2, 3, 4, 5}; > >>Percentile pA = new Percentile(a); > >>Percentile pB = new Percentile(b); > >> > >>double r; > >>r = pA.evaluate(50); > >>r = pB.evaluate(50); > >>

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe

Le 26/09/2010 21:05, Phil Steitz a écrit : > On 9/26/10 2:30 PM, Luc Maisonobe wrote: >> Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : >>> 2010/9/26 Gilles Sadowski: > [...] > > 1) do nothing to check the array is the same between calls and > blindly > assumes it I

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz

On 9/26/10 2:30 PM, Luc Maisonobe wrote: Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : 2010/9/26 Gilles Sadowski: [...] 1) do nothing to check the array is the same between calls and blindly assumes it IS the same. Users would really need to call clearCache when they provide

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe

Le 26/09/2010 20:24, Phil Steitz a écrit : > On 9/26/10 12:51 PM, Luc Maisonobe wrote: >> Le 26/09/2010 18:21, Ted Dunning a écrit : >>> I was about to write this suggestion down. >>> >>> Instead, I will simply concur. >>> >>> +1 >>> >>> The safe version can do the checksum or make a copy. Either

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz

On 9/26/10 1:27 PM, Mikkel Meyer Andersen wrote: 2010/9/26 Gilles Sadowski: [...] 1) do nothing to check the array is the same between calls and blindly assumes it IS the same. Users would really need to call clearCache when they provide a new array pros: very simple cons:

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe

Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : > 2010/9/26 Gilles Sadowski : >>> [...] >>> >>> 1) do nothing to check the array is the same between calls and blindly >>> assumes it IS the same. Users would really need to call clearCache >>> when they provide a new array >>> pros:

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz

On 9/26/10 12:51 PM, Luc Maisonobe wrote: Le 26/09/2010 18:21, Ted Dunning a écrit : I was about to write this suggestion down. Instead, I will simply concur. +1 The safe version can do the checksum or make a copy. Either is safe and the costs are similar (O(n) on every call and no memory or

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen

2010/9/26 Gilles Sadowski : >> [...] >> >> 1) do nothing to check the array is the same between calls and blindly >> assumes it IS the same. Users would really need to call clearCache >> when they provide a new array >> pros: very simple >> cons: error-prone for the user as it reli

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Gilles Sadowski

> [...] > > 1) do nothing to check the array is the same between calls and blindly > assumes it IS the same. Users would really need to call clearCache > when they provide a new array > pros: very simple > cons: error-prone for the user as it relies only on reading and > under

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe

Le 26/09/2010 18:55, Mikkel Meyer Andersen a écrit : > I don't see any season for computing hash value if pointers don't match. If pointers don't match, the two arrays are obviously different, but wen still need to compute the hash to cache it for the check on next call. Luc > Den 2010 9 26 18:5

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen

I don't see any season for computing hash value if pointers don't match. Den 2010 9 26 18:52 skrev "Luc Maisonobe" : > Le 26/09/2010 18:21, Ted Dunning a écrit : >> I was about to write this suggestion down. >> >> Instead, I will simply concur. >> >> +1 >> >> The safe version can do the checksum or

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe

Le 26/09/2010 18:21, Ted Dunning a écrit : > I was about to write this suggestion down. > > Instead, I will simply concur. > > +1 > > The safe version can do the checksum or make a copy. Either is safe and the > costs are similar (O(n) on every call and no memory or O(1) on every call > after t

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Ted Dunning

I was about to write this suggestion down. Instead, I will simply concur. +1 The safe version can do the checksum or make a copy. Either is safe and the costs are similar (O(n) on every call and no memory or O(1) on every call after the first, O(n) memory). I would tend toward the copy option.

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen

Hi, Why not simply make 3) the default, but also supply a *IMSureWhatIMDoing-method only implementing 2) (it is so cheap that we might as well do it instead of not doing any check at all). This means that users who read the documentation can gain something when they explicitly ask for it, while us

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe

Le 23/09/2010 20:57, Luc Maisonobe a écrit : > Hi, > > Resuming this thread, as I will be soon able to work on this (I hope). > I propose to start by solving > using a simple > partition-based selection algorithm and preserving pivot information. >

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Luc Maisonobe

Le 23/09/2010 22:38, Ted Dunning a écrit : > If you use the median of three or five randomly selected points, the worst > case/average case distinction is not going to come up. You could even use > the first, middle, end and two randomly selected points if you get nutty > about it. I intended to

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Ted Dunning

If you use the median of three or five randomly selected points, the worst case/average case distinction is not going to come up. You could even use the first, middle, end and two randomly selected points if you get nutty about it. On Thu, Sep 23, 2010 at 11:57 AM, Luc Maisonobe wrote: > Partiti

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Luc Maisonobe

Hi, Resuming this thread, as I will be soon able to work on this (I hope). I propose to start by solving using a simple partition-based selection algorithm and preserving pivot information. This would help both the single call use-case (we don't sor

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix

Ted, O(n), not n Expected case is n + n/2 + n/4 ... < 2n Yes, that I am already more incline to buy. Dim. Dimitri Pourbaix * Institut d'Astronomie et d'Astrophysique * Don't worry, be ha

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Mikkel Meyer Andersen

Hi, Just a small pedantry correction: Luc, you wrote: > The evaluation that is only an approximation is the on-line algorithm, > because it does not keep all values in memory. However, even when > everything is in memory one should be aware that the result of the > computation is the sample quant

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Ted Dunning

O(n), not n Expected case is n + n/2 + n/4 ... < 2n On Sat, Sep 18, 2010 at 12:05 PM, Luc Maisonobe wrote: > Le 18/09/2010 20:43, Dimitri Pourbaix a écrit : > > Luc, > > > >>> On the other hand, the speed-up version would no longer > >>> be mathematically correct, as a rough approximate of the p

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe

Le 18/09/2010 20:43, Dimitri Pourbaix a écrit : > Luc, > >>> On the other hand, the speed-up version would no longer >>> be mathematically correct, as a rough approximate of the pivot would be >>> adopted. >> >> No, the exact pivot is used. Its evaluation is only delayed. It would be >> correct. >

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix

Luc, On the other hand, the speed-up version would no longer be mathematically correct, as a rough approximate of the pivot would be adopted. No, the exact pivot is used. Its evaluation is only delayed. It would be correct. Given a **general** array, I do not think one can identify the media

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe

Le 18/09/2010 20:01, Dimitri Pourbaix a écrit : > Hi, > > If I understand correctly, the argument against the present implementation > is that if several percentiles are requested, they all require the sorting > of the array. Yes. > On the other hand, the speed-up version would no longer > be ma

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix

Hi, If I understand correctly, the argument against the present implementation is that if several percentiles are requested, they all require the sorting of the array. On the other hand, the speed-up version would no longer be mathematically correct, as a rough approximate of the pivot would be

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe

Le 18/09/2010 17:39, Luc Maisonobe a écrit : > Le 18/09/2010 17:36, Phil Steitz a écrit : >> On 9/18/10 10:24 AM, Luc Maisonobe wrote: >>> Le 18/09/2010 16:16, Phil Steitz a écrit : On 9/17/10 3:19 PM, Luc Maisonobe wrote: > Le 17/09/2010 19:55, Ted Dunning a écrit : >> There are also

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe

Le 18/09/2010 17:36, Phil Steitz a écrit : > On 9/18/10 10:24 AM, Luc Maisonobe wrote: >> Le 18/09/2010 16:16, Phil Steitz a écrit : >>> On 9/17/10 3:19 PM, Luc Maisonobe wrote: Le 17/09/2010 19:55, Ted Dunning a écrit : > There are also on-line percentile estimation methods that require >

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Phil Steitz

On 9/18/10 10:24 AM, Luc Maisonobe wrote: Le 18/09/2010 16:16, Phil Steitz a écrit : On 9/17/10 3:19 PM, Luc Maisonobe wrote: Le 17/09/2010 19:55, Ted Dunning a écrit : There are also on-line percentile estimation methods that require only a single pass over the data in order to get good estim

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe

Le 18/09/2010 16:16, Phil Steitz a écrit : > On 9/17/10 3:19 PM, Luc Maisonobe wrote: >> Le 17/09/2010 19:55, Ted Dunning a écrit : >>> There are also on-line percentile estimation methods that require only a >>> single pass over the data in order to get good estimates of several >>> quantile >>> a

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Phil Steitz

On 9/17/10 3:19 PM, Luc Maisonobe wrote: Le 17/09/2010 19:55, Ted Dunning a écrit : There are also on-line percentile estimation methods that require only a single pass over the data in order to get good estimates of several quantile at the same time. If you are interested in getting an estimat

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning

I just meant that with the nearly correct estimates from the first pass you could do a more accurate job in the second pass, possibly by partitions, possibly by just applying the incremental algorithm a second time. Because it would have good density estimates from the first pass, it would presuma

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning

No. I think I missed you point. My comment about code complexity still applies somewhat. Storing the state of the sort and testing the corner cases can be difficult. On Fri, Sep 17, 2010 at 12:19 PM, Luc Maisonobe wrote: > > Moreover, since sort is n log n without a radix sort, then the > > in

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Luc Maisonobe

Le 17/09/2010 19:55, Ted Dunning a écrit : > There are also on-line percentile estimation methods that require only a > single pass over the data in order to get good estimates of several quantile > at the same time. > > If you are interested in getting an estimate of some quantile of the > underl

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning

There are also on-line percentile estimation methods that require only a single pass over the data in order to get good estimates of several quantile at the same time. If you are interested in getting an estimate of some quantile of the underlying distribution that generated your data then these o

[math] speeding up percentile based statistics

2010-09-17 Thread Luc Maisonobe

Hi all, During a recent face to face discussion with some commons-math users from a large project, it appeared one implementation choice for Percentile statistics was a huge performance bottleneck. When the Percentile.evaluate method is called, the sample array is copied and sorted. In one use ca

39 matches

Mail list logo