Re: [math] weighted percentile function

2020-07-23 Thread Xeno Amess
pr first, and if this not suitable, they will tell you. 余政倫 于2020年7月23日周四 下午10:03写道: > For some reason, > I need to build a function to compute percentiles of weighted samples. > The only formula I found to estimate weighted percentile is: > [image: image.png] > This formula corres

[math] weighted percentile function

2020-07-23 Thread 余政倫
For some reason, I need to build a function to compute percentiles of weighted samples. The only formula I found to estimate weighted percentile is: [image: image.png] This formula corresponds to R-7 (one of the current rules to estimate with non-weighted samples.) I have implemented it and face

Re: [MATH-1120] Refactor KthSelector and PivotingStrategy out of Percentile

2014-07-12 Thread venkatesha murthy
8 AM, venkatesha murthy < > >> venkateshamurth...@gmail.com> wrote: > >> > >> > >> > >> On Wed, Jun 25, 2014 at 12:54 PM, Luc Maisonobe > >> wrote: > >> > >>> Hi Venkat, > >>> > >>> Le 25/

Re: [MATH-1120] Refactor KthSelector and PivotingStrategy out of Percentile

2014-07-09 Thread Luc Maisonobe
atesha m > wrote: > >> >> >> >> >> On Saturday, 28 June 2014 12:08 AM, venkatesha murthy < >> venkateshamurth...@gmail.com> wrote: >> >> >> >> On Wed, Jun 25, 2014 at 12:54 PM, Luc Maisonobe >> wrote: >> >>> Hi Venkat

Re: [MATH-1120] Refactor KthSelector and PivotingStrategy out of Percentile

2014-07-09 Thread venkatesha murthy
t; > > Hi Venkat, > > > > Le 25/06/2014 06:21, venkatesha murthy a écrit : > > > The Percentile actually uses KthSelector logic and is dependent on only > > > KthSelector > > > however the variability part is pivoting strategy. > > > > &

Re: [MATH-1120] Refactor KthSelector and PivotingStrategy out of Percentile

2014-07-05 Thread venkatesha m
On Saturday, 28 June 2014 12:08 AM, venkatesha murthy wrote: On Wed, Jun 25, 2014 at 12:54 PM, Luc Maisonobe wrote: > Hi Venkat, > > Le 25/06/2014 06:21, venkatesha murthy a écrit : > > The Percentile actually uses KthSelector logic and is dependent on only > > Kt

Re: [MATH-1120] Refactor KthSelector and PivotingStrategy out of Percentile

2014-06-27 Thread venkatesha murthy
On Wed, Jun 25, 2014 at 12:54 PM, Luc Maisonobe wrote: > Hi Venkat, > > Le 25/06/2014 06:21, venkatesha murthy a écrit : > > The Percentile actually uses KthSelector logic and is dependent on only > > KthSelector > > however the variability part is pivoting strate

Re: [MATH-1120] Refactor KthSelector and PivotingStrategy out of Percentile

2014-06-25 Thread Luc Maisonobe
Hi Venkat, Le 25/06/2014 06:21, venkatesha murthy a écrit : > The Percentile actually uses KthSelector logic and is dependent on only > KthSelector > however the variability part is pivoting strategy. > > Given that both KthSelector and Pivoting are independent we could make th

[MATH-1120] Refactor KthSelector and PivotingStrategy out of Percentile

2014-06-24 Thread venkatesha murthy
The Percentile actually uses KthSelector logic and is dependent on only KthSelector however the variability part is pivoting strategy. Given that both KthSelector and Pivoting are independent we could make them as utility classes or may be functions with in MathUtils with exposed interfaces

Re: [Math] MATH-1129 and Re: [MATH-1120] Needed opinion about support on variations in, percentile calculation

2014-06-19 Thread venkatesha murthy
; > > Please find my response below. The new patch has the suggested > > changes in the switch case for nan handling; But; However i have my > > view points on the different default nan strategies associated to > > types. Please permit me to explain (sorry for long summary) &

Re: [Math] MATH-1129 and Re: [MATH-1120] Needed opinion about support on variations in, percentile calculation

2014-06-18 Thread Luc Maisonobe
low. The new patch has the suggested > changes in the switch case for nan handling; But; However i have my > view points on the different default nan strategies associated to > types. Please permit me to explain (sorry for long summary) > > First, i would like to leave the Default imple

Re: [Math] MATH-1129 and Re: [MATH-1120] Needed opinion about support on variations in, percentile calculation

2014-06-18 Thread venkatesha m
es in the switch case for nan handling; But; However i have my view points on the different default nan strategies associated to types. Please permit me to explain (sorry for long summary) First, i would like to leave the Default implementation of Percentile as-is (Meaning in my MATH-1120 pat

Re: [Math] MATH-1129 and Re: [MATH-1120] Needed opinion about support on variations in, percentile calculation

2014-06-18 Thread Gilles
rformances. That's fine with me. I was really waiting from someone else to apply this patch. ;-) Gilles What is the meaning of percentile when NaNs are kept in the data (strategy "FIXED")? For all the other strategies, the NaNs will not cause a problem within the algorithms b

Re: [Math] MATH-1129 and Re: [MATH-1120] Needed opinion about support on variations in, percentile calculation

2014-06-18 Thread Luc Maisonobe
end); I can put the insertion sort back here while editing the patch before committing it. > IMHO, it is too many changes at once... You know the popular saying about « premature optimization ». I think it would be fair to temporarily forget about performance, fix the issue and set up a new grou

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-06-12 Thread venkatesha murthy
n 11, 2014 at 4:10 PM, Gilles > wrote: > >> On Mon, 9 Jun 2014 20:03:57 +0800 (SGT), venkatesha m wrote: >> >>> Hi All, >>> >>> I am looking for opinion on the name of the enum for the various >>> estimation strategies. >>> This is a pu

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-06-11 Thread venkatesha murthy
On Wed, Jun 11, 2014 at 4:10 PM, Gilles wrote: > On Mon, 9 Jun 2014 20:03:57 +0800 (SGT), venkatesha m wrote: > >> Hi All, >> >> I am looking for opinion on the name of the enum for the various >> estimation strategies. >> This is a public static enum un

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-06-11 Thread Gilles
3:57 +0800 (SGT), venkatesha m wrote: Hi All, I am looking for opinion on the name of the enum for the various estimation strategies. This is a public static enum under Percentile and i wish to call it EstimationTecnique. Would appreciate if you can provide feedback on the name or the current pro

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-06-11 Thread Schalk W . Cronjé
ing for opinion on the name of the enum for the various >> estimation strategies. >> This is a public static enum under Percentile and i wish to call it >> EstimationTecnique. >> Would appreciate if you can provide feedback on the name or the >> current pro

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-06-11 Thread Gilles
On Mon, 9 Jun 2014 20:03:57 +0800 (SGT), venkatesha m wrote: Hi All, I am looking for opinion on the name of the enum for the various estimation strategies. This is a public static enum under Percentile and i wish to call it EstimationTecnique. Would appreciate if you can provide feedback on

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-06-09 Thread venkatesha m
Hi All, I am looking for opinion on the name of the enum for the various estimation strategies. This is a public static enum under Percentile and i wish to call it EstimationTecnique. Would appreciate if you can provide feedback on the name or the current proposed name is fine. I have the

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-06-01 Thread venkatesha murthy
I have gone through Wikipedia and R functions to get an understanding. My idea is to come up with different estimation techniques as strategies (Enums) and constrction inject during percentile object creation. The evaluate method could then use this estimation tecnhique to complete the

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-05-21 Thread venkatesha murthy
:26 -0700, Phil Steitz wrote: > >> On 5/21/14, 12:18 PM, venkatesha murthy wrote: > >>> Hi All, > >>> > >>> The existing Percentile class calculates the percentile based on > >>> the > >>> quantile position of the array fixed as >

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-05-21 Thread Phil Steitz
On 5/21/14, 1:43 PM, Gilles wrote: > On Wed, 21 May 2014 13:16:26 -0700, Phil Steitz wrote: >> On 5/21/14, 12:18 PM, venkatesha murthy wrote: >>> Hi All, >>> >>> The existing Percentile class calculates the percentile based on >>> the >>> qua

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-05-21 Thread Gilles
On Wed, 21 May 2014 13:16:26 -0700, Phil Steitz wrote: On 5/21/14, 12:18 PM, venkatesha murthy wrote: Hi All, The existing Percentile class calculates the percentile based on the quantile position of the array fixed as p * (N+1)/100 for a pth Percentile on an Array of size N. However if we

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-05-21 Thread Phil Steitz
On 5/21/14, 12:18 PM, venkatesha murthy wrote: > Hi All, > > The existing Percentile class calculates the percentile based on the > quantile position of the array fixed as > p * (N+1)/100 for a pth Percentile on an Array of size N. However if we > were to add these number

RE: [MATH-1120] Needed opinion about support on variations in percentile calculation

2014-05-21 Thread Patrick Meyer
There are various ways to compute percentiles and statistical programs use different methods by default. I would prefer that we do like R and provide multiple options for the type of percentile computation. See http://stat.ethz.ch/R-manual/R-patched/library/stats/html/quantile.html Patrick

[MATH-1120] Needed opinion about support on variations in percentile calculation

2014-05-21 Thread venkatesha murthy
Hi All, The existing Percentile class calculates the percentile based on the quantile position of the array fixed as p * (N+1)/100 for a pth Percentile on an Array of size N. However if we were to add these numbers in MS Excel to calculate the percentile it provides a different result and closely

[MATH-1120] Needed opinion about support on variations in percentile calculation

2014-05-21 Thread venkatesha m
Hi All, The existing Percentile class calculates the percentile based on the quantile position of the array fixed as p * (N+1)/100 for a pth Percentile on an Array of size N. However if we were to add these numbers in MS Excel to calculate the percentile it provides a different result and

Re: [math] Proposal for New way of Computing an approximate Percentile without storing input data

2014-03-23 Thread Ted Dunning
On Sun, Mar 23, 2014 at 2:09 AM, Thomas Neidhart wrote: > > There is already an issue for this: > > https://issues.apache.org/jira/browse/MATH-418 > > It links also other implementations and algorithms, maybe you could add > a link to your's as well? > Done. Thanks for the pointer.

Re: [math] Proposal for New way of Computing an approximate Percentile without storing input data

2014-03-23 Thread Thomas Neidhart
On 03/22/2014 11:31 PM, Ted Dunning wrote: > Murthy, > > I recently developed an alternative algorithm which provides superior > accuracy for extreme quantiles. You can read more at > > https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf?raw=true > > The library invol

Re: [math] Proposal for New way of Computing an approximate Percentile without storing input data

2014-03-22 Thread Ted Dunning
, 2:11 PM, venkatesha m wrote: > > Hi, > > > > I would like to propose for adding new way of computing the percentile > without needing to store most of input data. Since this is my first time > on contributing to apache; please help me / correct me if i miss any

Re: [math] Proposal for New way of Computing an approximate Percentile without storing input data

2014-03-22 Thread Phil Steitz
On 3/22/14, 2:11 PM, venkatesha m wrote: > Hi, > > I would like to propose for adding new way of computing the percentile > without needing to store most of input data. Since this is my first time on > contributing to apache; please help me / correct me if i miss any procedure &

[math] Proposal for New way of Computing an approximate Percentile without storing input data

2014-03-22 Thread venkatesha m
Hi, I would like to propose for adding new way of computing the percentile without needing to store most of input data.  Since this is my first time on contributing to apache; please help me / correct me if i miss any procedure here. Here are the details. Description:  The Percentile

Re: [math] Method min, max, percentile in StatUtils support int[], long[], etc

2013-10-17 Thread Karl Yang
pt integer arrays. 2013/10/18 Phil Steitz > On 10/16/13 6:36 PM, 杨栓 wrote: > > Hi, > > > > I always need to compute min value, max value, percentile values of > > millions data, these data may are int, long or double. But class > StatUtils only > > supports doubl

Re: [math] Method min, max, percentile in StatUtils support int[], long[], etc

2013-10-17 Thread Phil Steitz
On 10/16/13 6:36 PM, 杨栓 wrote: > Hi, > > I always need to compute min value, max value, percentile values of > millions data, these data may are int, long or double. But class StatUtils > only > supports double[], I need to convert int[], long[] to double[]. Shall we &g

Re: [math] Method min, max, percentile in StatUtils support int[], long[], etc

2013-10-16 Thread Anshul Zunke
+1 On Thu, Oct 17, 2013 at 7:06 AM, 杨栓 wrote: > Hi, > > I always need to compute min value, max value, percentile values of > millions data, these data may are int, long or double. But class StatUtils > only > supports double[], I need to convert int[], long[] to double[]

[math] Method min, max, percentile in StatUtils support int[], long[], etc

2013-10-16 Thread 杨栓
Hi, I always need to compute min value, max value, percentile values of millions data, these data may are int, long or double. But class StatUtils only supports double[], I need to convert int[], long[] to double[]. Shall we add a feature that StatUtils supports any other numeric type?

[Math] Percentile (low number of samples)

2012-07-23 Thread Gilles Sadowski
Hi. It could be considered strange that CM reports 0.2 (instead of some value close to 1) as the 10-percentile of the { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 } array. Quoting the Javadoc: --- For large samples, the different methods agree closely, but when sample sizes are small, different methods

Re: [math] speeding up percentile based statistics

2010-11-03 Thread Leanne Guy
On 9/27/10 09:55 PM, luc.maison...@free.fr wrote: - "Gilles Sadowski" a écrit : [...] ---CUT--- double[] a = new double[] {1, 2, 3}; double[] b = new double[] {1, 2, 3, 4, 5}; Percentile pA = new Percentile(a); Percentile pB = new Percentile(b); doubl

Re: (math] Percentile performance improvement

2010-11-03 Thread Leanne Guy
i all, The new implementation of Percentile improving performances has been checked in. It should solve issue <https://issues.apache.org/jira/browse/MATH-417>. As discussed here, the implementation is based on a selection algorithm instead of a complete sort. A setData method has also been a

(math] Percentile performance improvement

2010-10-10 Thread Luc Maisonobe
Hi all, The new implementation of Percentile improving performances has been checked in. It should solve issue <https://issues.apache.org/jira/browse/MATH-417>. As discussed here, the implementation is based on a selection algorithm instead of a complete sort. A setData method has als

Re: [math] speeding up percentile based statistics

2010-10-03 Thread Phil Steitz
cking 4) remove the double[] values parameter from the API and use a separate addValues method, hence the class will reset its cache only when addValues is called pros: works in all cases cons: Percentile could not implement UnivariateStatistic anymore My preference is c

Re: [math] speeding up percentile based statistics

2010-10-03 Thread Luc Maisonobe
nt using an hash code. Users would generally >>>>>> don't >>>>>> need to call clearCache at all so it would not be provided >>>>>> pros: works in all cases >>>>>> cons: add linear cost for array checking >>>>&g

Re: [math] speeding up percentile based statistics

2010-09-27 Thread luc . maisonobe
- "Gilles Sadowski" a écrit : > > [...] > > >> > > >> ---CUT--- > > >>double[] a = new double[] {1, 2, 3}; > > >> double[] b = new double[] {1, 2, 3, 4, 5}; > > >> Percentile pA = new Percentil

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Gilles Sadowski
> [...] > >> > >> ---CUT--- > >>double[] a = new double[] {1, 2, 3}; > >>double[] b = new double[] {1, 2, 3, 4, 5}; > >>Percentile pA = new Percentile(a); > >>Percentile pB = new Percentile(b); > >> > &g

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
ot be provided >>>>> pros: works in all cases >>>>> cons: add linear cost for array checking >>>>> >>>>> 4) remove the double[] values parameter from the API and use a >>>>> separate >>>>> addValues met

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz
when addValues is called pros: works in all cases cons: Percentile could not implement UnivariateStatistic anymore My preference is choice 2. What do you think ? IIUC, the interface method ("evaluate") was designed around the assumption that successive calls are indepen

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
g when they >>>> explicitly ask for it, while users who don't read the docs are still >>>> safe. >>>> >>>> Cheers, Mikkel. >>>> >>>> 2010/9/26 Luc Maisonobe: >>>>> Le 23/09/2010 20:57, Luc Ma

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz
n all cases cons: Percentile could not implement UnivariateStatistic anymore My preference is choice 2. What do you think ? IIUC, the interface method ("evaluate") was designed around the assumption that successive calls are independent: the array argument is not encapsula

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
n all cases >>> cons: add linear cost for array checking >>> >>> 4) remove the double[] values parameter from the API and use a separate >>> addValues method, hence the class will reset its cache only when >>> addValues is called >&

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Phil Steitz
o on subsequent calls work already done in the first calls. The current API of the Percentile method is derived from AbstractUnivariateStatistic and provides several evaluate methods that have the data array as a parameter. This is compatible with the one-call use case without any semantic change, but

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen
and use a separate >>     addValues method, hence the class will reset its cache only when >>     addValues is called >>     pros: works in all cases >>     cons: Percentile could not implement UnivariateStatistic anymore >> >> My preference is choice 2. >>

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Gilles Sadowski
he at all so it would not be provided > pros: works in all cases > cons: add linear cost for array checking > > 4) remove the double[] values parameter from the API and use a separate > addValues method, hence the class will reset its cache only when > addValues is

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
heers, Mikkel. >>>> >>>> 2010/9/26 Luc Maisonobe : >>>>> Le 23/09/2010 20:57, Luc Maisonobe a écrit : >>>>>> Hi, >>>>>> >>>>>> Resuming this thread, as I will be soon able to work on this (I hope). >>&

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen
t;>>>> Hi, >>>>> >>>>> Resuming this thread, as I will be soon able to work on this (I hope). >>>>> I propose to start by solving >>>>> <https://issues.apache.org/jira/browse/MATH-417> using a simple >>>>&g

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
pivot information. >>>> This would help both the single call use-case (we don't sort everything) >>>> and the multiple call case (we don't redo on subsequent calls work >>>> already done in the first calls. >>> >>> The current API of the Perce

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Ted Dunning
n. > >> This would help both the single call use-case (we don't sort everything) > >> and the multiple call case (we don't redo on subsequent calls work > >> already done in the first calls. > > > > The current API of the Percentile method is derived fro

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Mikkel Meyer Andersen
nt calls work >> already done in the first calls. > > The current API of the Percentile method is derived from > AbstractUnivariateStatistic and provides several evaluate methods that > have the data array as a parameter. > > This is compatible with the one-call use case withou

Re: [math] speeding up percentile based statistics

2010-09-26 Thread Luc Maisonobe
erving pivot information. > This would help both the single call use-case (we don't sort everything) > and the multiple call case (we don't redo on subsequent calls work > already done in the first calls. The current API of the Percentile method is derived from AbstractUnivar

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Luc Maisonobe
Le 23/09/2010 22:38, Ted Dunning a écrit : > If you use the median of three or five randomly selected points, the worst > case/average case distinction is not going to come up. You could even use > the first, middle, end and two randomly selected points if you get nutty > about it. I intended to

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Ted Dunning
If you use the median of three or five randomly selected points, the worst case/average case distinction is not going to come up. You could even use the first, middle, end and two randomly selected points if you get nutty about it. On Thu, Sep 23, 2010 at 11:57 AM, Luc Maisonobe wrote: > Partiti

Re: [math] speeding up percentile based statistics

2010-09-23 Thread Luc Maisonobe
Hi, Resuming this thread, as I will be soon able to work on this (I hope). I propose to start by solving using a simple partition-based selection algorithm and preserving pivot information. This would help both the single call use-case (we don't sor

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix
Ted, O(n), not n Expected case is n + n/2 + n/4 ... < 2n Yes, that I am already more incline to buy. Dim. Dimitri Pourbaix * Institut d'Astronomie et d'Astrophysique * Don't worry, be ha

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Mikkel Meyer Andersen
Hi, Just a small pedantry correction: Luc, you wrote: > The evaluation that is only an approximation is the on-line algorithm, > because it does not keep all values in memory. However, even when > everything is in memory one should be aware that the result of the > computation is the sample quant

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Ted Dunning
O(n), not n Expected case is n + n/2 + n/4 ... < 2n On Sat, Sep 18, 2010 at 12:05 PM, Luc Maisonobe wrote: > Le 18/09/2010 20:43, Dimitri Pourbaix a écrit : > > Luc, > > > >>> On the other hand, the speed-up version would no longer > >>> be mathematically correct, as a rough approximate of the p

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
Le 18/09/2010 20:43, Dimitri Pourbaix a écrit : > Luc, > >>> On the other hand, the speed-up version would no longer >>> be mathematically correct, as a rough approximate of the pivot would be >>> adopted. >> >> No, the exact pivot is used. Its evaluation is only delayed. It would be >> correct. >

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix
Luc, On the other hand, the speed-up version would no longer be mathematically correct, as a rough approximate of the pivot would be adopted. No, the exact pivot is used. Its evaluation is only delayed. It would be correct. Given a **general** array, I do not think one can identify the media

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
factors). > There is a compromise which would satisfy both sides, is not > there? What about sorting and storing the sorted version of the array, > once for all at the constructor level? That is what one would do anyway > if speed was an issue (in one application, I need 5, 50, and 95-pe

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Dimitri Pourbaix
adopted. There is a compromise which would satisfy both sides, is not there? What about sorting and storing the sorted version of the array, once for all at the constructor level? That is what one would do anyway if speed was an issue (in one application, I need 5, 50, and 95-percentile of the

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
, Ted Dunning a écrit : >>>>>> There are also on-line percentile estimation methods that require >>>>>> only a >>>>>> single pass over the data in order to get good estimates of several >>>>>> quantile >>>>>> a

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
Le 18/09/2010 17:36, Phil Steitz a écrit : > On 9/18/10 10:24 AM, Luc Maisonobe wrote: >> Le 18/09/2010 16:16, Phil Steitz a écrit : >>> On 9/17/10 3:19 PM, Luc Maisonobe wrote: >>>> Le 17/09/2010 19:55, Ted Dunning a écrit : >>>>> There are also on-li

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Phil Steitz
On 9/18/10 10:24 AM, Luc Maisonobe wrote: Le 18/09/2010 16:16, Phil Steitz a écrit : On 9/17/10 3:19 PM, Luc Maisonobe wrote: Le 17/09/2010 19:55, Ted Dunning a écrit : There are also on-line percentile estimation methods that require only a single pass over the data in order to get good

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Luc Maisonobe
Le 18/09/2010 16:16, Phil Steitz a écrit : > On 9/17/10 3:19 PM, Luc Maisonobe wrote: >> Le 17/09/2010 19:55, Ted Dunning a écrit : >>> There are also on-line percentile estimation methods that require only a >>> single pass over the data in order to get good estima

Re: [math] speeding up percentile based statistics

2010-09-18 Thread Phil Steitz
On 9/17/10 3:19 PM, Luc Maisonobe wrote: Le 17/09/2010 19:55, Ted Dunning a écrit : There are also on-line percentile estimation methods that require only a single pass over the data in order to get good estimates of several quantile at the same time. If you are interested in getting an

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning
I just meant that with the nearly correct estimates from the first pass you could do a more accurate job in the second pass, possibly by partitions, possibly by just applying the incremental algorithm a second time. Because it would have good density estimates from the first pass, it would presuma

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning
No. I think I missed you point. My comment about code complexity still applies somewhat. Storing the state of the sort and testing the corner cases can be difficult. On Fri, Sep 17, 2010 at 12:19 PM, Luc Maisonobe wrote: > > Moreover, since sort is n log n without a radix sort, then the > > in

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Luc Maisonobe
Le 17/09/2010 19:55, Ted Dunning a écrit : > There are also on-line percentile estimation methods that require only a > single pass over the data in order to get good estimates of several quantile > at the same time. > > If you are interested in getting an estimate of some

Re: [math] speeding up percentile based statistics

2010-09-17 Thread Ted Dunning
There are also on-line percentile estimation methods that require only a single pass over the data in order to get good estimates of several quantile at the same time. If you are interested in getting an estimate of some quantile of the underlying distribution that generated your data then these

[math] speeding up percentile based statistics

2010-09-17 Thread Luc Maisonobe
Hi all, During a recent face to face discussion with some commons-math users from a large project, it appeared one implementation choice for Percentile statistics was a huge performance bottleneck. When the Percentile.evaluate method is called, the sample array is copied and sorted. In one use

Re: Percentile

2007-10-15 Thread Bradford Cross
Did u take a look at the thread about my rolling statistics? if not i will forward it. for the rolling percentile, i needed a sorted deque. for the first pass, i just wrapped the ResizeableDoubleArray from commons math together with a LinkedList using the Collections.BinarySearch() for inserts

Re: Percentile

2007-10-15 Thread Hanson Char
this weekend as part of my work on Rolling > statistics (RollingPercentile.) I need to submit the patch this week so that > we can commit the changes - but there is some pending discussion about API > and high level design. > > I agree that we should also replace the algorithm for Per

Re: Percentile

2007-10-15 Thread Bradford Cross
for Percentile. Would you care to join me? ;-) On 10/15/07, Hanson Char <[EMAIL PROTECTED]> wrote: > > The current implementation of Percentile relies on sorting the > underlying array. There is much faster way like the use of Hoare's > partitioning that would take only l

Percentile

2007-10-15 Thread Hanson Char
The current implementation of Percentile relies on sorting the underlying array. There is much faster way like the use of Hoare's partitioning that would take only linear instead of n*ln(n) time. Has such improvement be considered before ? Any reason not to do so ? Cheers, Hanson