Re: [math] speeding up percentile based statistics

2010-11-03 Thread Leanne Guy
On 9/27/10 09:55 PM, luc.maison...@free.fr wrote: - "Gilles Sadowski" a écrit : [...] ---CUT--- double[] a = new double[] {1, 2, 3}; double[] b = new double[] {1, 2, 3, 4, 5}; Percentile pA = new Percentile(a); Percentile pB = new Percentile(b); double r; r = pA

Re: [math] speeding up percentile based statistics

2010-10-03 Thread Phil Steitz
On 10/3/10 3:55 PM, Luc Maisonobe wrote: Le 26/09/2010 21:34, Luc Maisonobe a écrit : Le 26/09/2010 21:05, Phil Steitz a écrit : On 9/26/10 2:30 PM, Luc Maisonobe wrote: Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : 2010/9/26 Gilles Sadowski: [...] 1) do nothing to check the array

Re: [math] speeding up percentile based statistics

2010-10-03 Thread Luc Maisonobe
Le 26/09/2010 21:34, Luc Maisonobe a écrit : > Le 26/09/2010 21:05, Phil Steitz a écrit : >> On 9/26/10 2:30 PM, Luc Maisonobe wrote: >>> Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : 2010/9/26 Gilles Sadowski: >> [...] >> >> 1) do nothing to check the array is the same be

Re: [math] speeding up percentile based statistics

2010-09-27 Thread luc . maisonobe
- "Gilles Sadowski" a écrit : > > [...] > > >> > > >> ---CUT--- > > >>double[] a = new double[] {1, 2, 3}; > > >>double[] b = new double[] {1, 2, 3, 4, 5}; > > >>Percentile pA = new Percentile(a); > > >>Percentile pB = new Percentile(b); > > >> > > >>double r; > > >>r

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Gilles Sadowski
> [...] > >> > >> ---CUT--- > >>double[] a = new double[] {1, 2, 3}; > >>double[] b = new double[] {1, 2, 3, 4, 5}; > >>Percentile pA = new Percentile(a); > >>Percentile pB = new Percentile(b); > >> > >>double r; > >>r = pA.evaluate(50); > >>r = pB.evaluate(50); > >>

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
Le 26/09/2010 21:05, Phil Steitz a écrit : > On 9/26/10 2:30 PM, Luc Maisonobe wrote: >> Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : >>> 2010/9/26 Gilles Sadowski: > [...] > > 1) do nothing to check the array is the same between calls and > blindly > assumes it I

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz
On 9/26/10 2:30 PM, Luc Maisonobe wrote: Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : 2010/9/26 Gilles Sadowski: [...] 1) do nothing to check the array is the same between calls and blindly assumes it IS the same. Users would really need to call clearCache when they provide

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
Le 26/09/2010 20:24, Phil Steitz a écrit : > On 9/26/10 12:51 PM, Luc Maisonobe wrote: >> Le 26/09/2010 18:21, Ted Dunning a écrit : >>> I was about to write this suggestion down. >>> >>> Instead, I will simply concur. >>> >>> +1 >>> >>> The safe version can do the checksum or make a copy. Either

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz
On 9/26/10 1:27 PM, Mikkel Meyer Andersen wrote: 2010/9/26 Gilles Sadowski: [...] 1) do nothing to check the array is the same between calls and blindly assumes it IS the same. Users would really need to call clearCache when they provide a new array pros: very simple cons:

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : > 2010/9/26 Gilles Sadowski : >>> [...] >>> >>> 1) do nothing to check the array is the same between calls and blindly >>> assumes it IS the same. Users would really need to call clearCache >>> when they provide a new array >>> pros:

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz
On 9/26/10 12:51 PM, Luc Maisonobe wrote: Le 26/09/2010 18:21, Ted Dunning a écrit : I was about to write this suggestion down. Instead, I will simply concur. +1 The safe version can do the checksum or make a copy. Either is safe and the costs are similar (O(n) on every call and no memory or

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen
2010/9/26 Gilles Sadowski : >> [...] >> >>  1) do nothing to check the array is the same between calls and blindly >>     assumes it IS the same. Users would really need to call clearCache >>     when they provide a new array >>     pros: very simple >>     cons: error-prone for the user as it reli

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Gilles Sadowski
> [...] > > 1) do nothing to check the array is the same between calls and blindly > assumes it IS the same. Users would really need to call clearCache > when they provide a new array > pros: very simple > cons: error-prone for the user as it relies only on reading and > under

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
Le 26/09/2010 18:55, Mikkel Meyer Andersen a écrit : > I don't see any season for computing hash value if pointers don't match. If pointers don't match, the two arrays are obviously different, but wen still need to compute the hash to cache it for the check on next call. Luc > Den 2010 9 26 18:5

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen
I don't see any season for computing hash value if pointers don't match. Den 2010 9 26 18:52 skrev "Luc Maisonobe" : > Le 26/09/2010 18:21, Ted Dunning a écrit : >> I was about to write this suggestion down. >> >> Instead, I will simply concur. >> >> +1 >> >> The safe version can do the checksum or

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
Le 26/09/2010 18:21, Ted Dunning a écrit : > I was about to write this suggestion down. > > Instead, I will simply concur. > > +1 > > The safe version can do the checksum or make a copy. Either is safe and the > costs are similar (O(n) on every call and no memory or O(1) on every call > after t

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Ted Dunning
I was about to write this suggestion down. Instead, I will simply concur. +1 The safe version can do the checksum or make a copy. Either is safe and the costs are similar (O(n) on every call and no memory or O(1) on every call after the first, O(n) memory). I would tend toward the copy option.

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen
Hi, Why not simply make 3) the default, but also supply a *IMSureWhatIMDoing-method only implementing 2) (it is so cheap that we might as well do it instead of not doing any check at all). This means that users who read the documentation can gain something when they explicitly ask for it, while us

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
Le 23/09/2010 20:57, Luc Maisonobe a écrit : > Hi, > > Resuming this thread, as I will be soon able to work on this (I hope). > I propose to start by solving > using a simple > partition-based selection algorithm and preserving pivot information. >

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Luc Maisonobe
Le 23/09/2010 22:38, Ted Dunning a écrit : > If you use the median of three or five randomly selected points, the worst > case/average case distinction is not going to come up. You could even use > the first, middle, end and two randomly selected points if you get nutty > about it. I intended to

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Ted Dunning
If you use the median of three or five randomly selected points, the worst case/average case distinction is not going to come up. You could even use the first, middle, end and two randomly selected points if you get nutty about it. On Thu, Sep 23, 2010 at 11:57 AM, Luc Maisonobe wrote: > Partiti

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Luc Maisonobe
Hi, Resuming this thread, as I will be soon able to work on this (I hope). I propose to start by solving using a simple partition-based selection algorithm and preserving pivot information. This would help both the single call use-case (we don't sor

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix
Ted, O(n), not n Expected case is n + n/2 + n/4 ... < 2n Yes, that I am already more incline to buy. Dim. Dimitri Pourbaix * Institut d'Astronomie et d'Astrophysique * Don't worry, be ha

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Mikkel Meyer Andersen
Hi, Just a small pedantry correction: Luc, you wrote: > The evaluation that is only an approximation is the on-line algorithm, > because it does not keep all values in memory. However, even when > everything is in memory one should be aware that the result of the > computation is the sample quant

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Ted Dunning
O(n), not n Expected case is n + n/2 + n/4 ... < 2n On Sat, Sep 18, 2010 at 12:05 PM, Luc Maisonobe wrote: > Le 18/09/2010 20:43, Dimitri Pourbaix a écrit : > > Luc, > > > >>> On the other hand, the speed-up version would no longer > >>> be mathematically correct, as a rough approximate of the p

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
Le 18/09/2010 20:43, Dimitri Pourbaix a écrit : > Luc, > >>> On the other hand, the speed-up version would no longer >>> be mathematically correct, as a rough approximate of the pivot would be >>> adopted. >> >> No, the exact pivot is used. Its evaluation is only delayed. It would be >> correct. >

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix
Luc, On the other hand, the speed-up version would no longer be mathematically correct, as a rough approximate of the pivot would be adopted. No, the exact pivot is used. Its evaluation is only delayed. It would be correct. Given a **general** array, I do not think one can identify the media

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
Le 18/09/2010 20:01, Dimitri Pourbaix a écrit : > Hi, > > If I understand correctly, the argument against the present implementation > is that if several percentiles are requested, they all require the sorting > of the array. Yes. > On the other hand, the speed-up version would no longer > be ma

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix
Hi, If I understand correctly, the argument against the present implementation is that if several percentiles are requested, they all require the sorting of the array. On the other hand, the speed-up version would no longer be mathematically correct, as a rough approximate of the pivot would be

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
Le 18/09/2010 17:39, Luc Maisonobe a écrit : > Le 18/09/2010 17:36, Phil Steitz a écrit : >> On 9/18/10 10:24 AM, Luc Maisonobe wrote: >>> Le 18/09/2010 16:16, Phil Steitz a écrit : On 9/17/10 3:19 PM, Luc Maisonobe wrote: > Le 17/09/2010 19:55, Ted Dunning a écrit : >> There are also

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
Le 18/09/2010 17:36, Phil Steitz a écrit : > On 9/18/10 10:24 AM, Luc Maisonobe wrote: >> Le 18/09/2010 16:16, Phil Steitz a écrit : >>> On 9/17/10 3:19 PM, Luc Maisonobe wrote: Le 17/09/2010 19:55, Ted Dunning a écrit : > There are also on-line percentile estimation methods that require >

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Phil Steitz
On 9/18/10 10:24 AM, Luc Maisonobe wrote: Le 18/09/2010 16:16, Phil Steitz a écrit : On 9/17/10 3:19 PM, Luc Maisonobe wrote: Le 17/09/2010 19:55, Ted Dunning a écrit : There are also on-line percentile estimation methods that require only a single pass over the data in order to get good estim

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
Le 18/09/2010 16:16, Phil Steitz a écrit : > On 9/17/10 3:19 PM, Luc Maisonobe wrote: >> Le 17/09/2010 19:55, Ted Dunning a écrit : >>> There are also on-line percentile estimation methods that require only a >>> single pass over the data in order to get good estimates of several >>> quantile >>> a

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Phil Steitz
On 9/17/10 3:19 PM, Luc Maisonobe wrote: Le 17/09/2010 19:55, Ted Dunning a écrit : There are also on-line percentile estimation methods that require only a single pass over the data in order to get good estimates of several quantile at the same time. If you are interested in getting an estimat

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning
I just meant that with the nearly correct estimates from the first pass you could do a more accurate job in the second pass, possibly by partitions, possibly by just applying the incremental algorithm a second time. Because it would have good density estimates from the first pass, it would presuma

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning
No. I think I missed you point. My comment about code complexity still applies somewhat. Storing the state of the sort and testing the corner cases can be difficult. On Fri, Sep 17, 2010 at 12:19 PM, Luc Maisonobe wrote: > > Moreover, since sort is n log n without a radix sort, then the > > in

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Luc Maisonobe
Le 17/09/2010 19:55, Ted Dunning a écrit : > There are also on-line percentile estimation methods that require only a > single pass over the data in order to get good estimates of several quantile > at the same time. > > If you are interested in getting an estimate of some quantile of the > underl

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning
There are also on-line percentile estimation methods that require only a single pass over the data in order to get good estimates of several quantile at the same time. If you are interested in getting an estimate of some quantile of the underlying distribution that generated your data then these o

[math] speeding up percentile based statistics

2010-09-17 Thread Luc Maisonobe
Hi all, During a recent face to face discussion with some commons-math users from a large project, it appeared one implementation choice for Percentile statistics was a huge performance bottleneck. When the Percentile.evaluate method is called, the sample array is copied and sorted. In one use ca