Ted: Thanks for the pointer. I've been looking for a replacement for Colt
for a while because it doesn't allow the merge operation for QuantileBin1D
and, as you pointed out, it is a generally messy library. stream-lib could be
the replacement I'm looking for ... if it passes other requirements. Will
keep you posted.

Phil: Looks like deriving the KS-stats from first principles may be the
best way. IMHO, it is the distribution of extrema of binomial distributions
at each discrete quantile of m (best if m < n). That extremum is probably
most efficiently computed with MC simulations. It's an interesting
problem, so best of luck.
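
A minimal sketch of such an MC simulation, written in Python for brevity.
It assumes continuous data, so that the null distribution of the two-sample
KS statistic does not depend on the underlying distribution and uniform
draws are enough. The function names, trial count, and the add-one p-value
correction are illustrative choices, not taken from any library.

```python
import random


def ks_statistic(xs, ys):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < m and j < n:
        v = min(xs[i], ys[j])
        # Advance past every observation equal to v in both samples,
        # so ties are handled before the CDF gap is measured.
        while i < m and xs[i] == v:
            i += 1
        while j < n and ys[j] == v:
            j += 1
        d = max(d, abs(i / m - j / n))
    return d


def mc_null_distribution(m, n, trials=20000, seed=0):
    """Simulate the KS statistic under the null for sample sizes m and n."""
    rng = random.Random(seed)
    stats = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(m)]
        ys = [rng.random() for _ in range(n)]
        stats.append(ks_statistic(xs, ys))
    return stats


def mc_p_value(observed, null_stats):
    """Fraction of simulated statistics at least as large as the observed one."""
    # Small tolerance guards against floating-point noise in i/m - j/n.
    hits = sum(1 for d in null_stats if d >= observed - 1e-12)
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (len(null_stats) + 1)
```

Usage would look like mc_p_value(ks_statistic(a, b),
mc_null_distribution(len(a), len(b))). For very small m and n the statistic
takes only a handful of distinct values, so the simulated distribution is
coarse and the attainable p-values are limited.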

I'm curious about the application that led to needing the test for small
samples.

Cheers,
-Ajo


On Sat, Aug 10, 2013 at 1:26 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> On Sat, Aug 10, 2013 at 8:59 AM, Ajo Fod <ajo....@gmail.com> wrote:
>
> > If the data doesn't fit, you probably need a StorelessQuantile estimator
> > like QuantileBin1D from the colt libraries. Then pick a resolution and do
> > the single pass search.
> >
>
> Peripheral to the actual topic, but the Colt libraries are out of date in
> almost every respect.  When we added unit tests, even the most basic
> functions turned up dozens of serious bugs.  With respect to more advanced
> estimation such as quantiles, nothing in Colt comes close to streamlib.
>  Even the Mahout on-line estimators are generally superior.
>
> QuantileBin1D, in particular, lacks the machinery of QDigests (not
> surprising since they were published in 2004, long after Colt went dormant).
>  Check out
>
>
> https://github.com/clearspring/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java
>
> and the original paper
>
> http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
>
