Ted: Thanks for the pointer. I've been looking for a replacement for Colt for a while because it doesn't allow the merge operation for QuantileBin1D and, as you pointed out, it is a generally messy library. stream-lib could be the replacement I'm looking for ... if it passes other requirements. Will keep you posted.
Phil: Looks like deriving the KS-stats from first principles may be the best way. IMHO, it is the distribution of extrema of binomial distributions at each discrete quantile of m (best if m < n). Those extrema are probably most efficiently computed with MC simulations. It's an interesting problem, so best of luck. I'm curious about the application that led to needing the test for small samples.

Cheers,
-Ajo

On Sat, Aug 10, 2013 at 1:26 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> On Sat, Aug 10, 2013 at 8:59 AM, Ajo Fod <ajo....@gmail.com> wrote:
>
> > If the data doesn't fit, you probably need a StorelessQuantile estimator
> > like QuantileBin1D from the colt libraries. Then pick a resolution and do
> > the single pass search.
> >
>
> Peripheral to the actual topic, but the Colt libraries are out of date in
> almost every respect. When we added unit tests, even the most basic
> functions turned up dozens of serious bugs. With respect to more advanced
> estimation such as quantiles, nothing in Colt comes close to streamlib.
> Even the Mahout on-line estimators are generally superior.
>
> QuantileBin1D, in particular, lacks the machinery of QDigests (not
> surprising since they were published in 2004, long after Colt went dormant).
> Check out
>
> https://github.com/clearspring/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java
>
> and the original paper
>
> http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
>
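
P.S. Phil: in case it helps, here is a rough sketch of the MC route mentioned above. It is only an illustration under my own assumptions: I am assuming a two-sample test with sample sizes m and n and continuous data (no ties), so uniform draws are enough because the KS statistic is distribution-free under the null. The function names (ks_statistic, simulate_null, critical_value) are made up for this example and do not come from any library.

```python
import random


def ks_statistic(xs, ys):
    """Two-sample KS statistic: max over t of |F_m(t) - G_n(t)|."""
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples; after each step i/m and j/n are the two
    # empirical CDFs evaluated at the current point t.
    while i < m and j < n:
        t = min(xs[i], ys[j])
        while i < m and xs[i] <= t:
            i += 1
        while j < n and ys[j] <= t:
            j += 1
        d = max(d, abs(i / m - j / n))
    return d


def simulate_null(m, n, trials=20000, seed=0):
    """Sorted Monte Carlo draws of the KS statistic under the null."""
    rng = random.Random(seed)
    stats = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(m)]
        ys = [rng.random() for _ in range(n)]
        stats.append(ks_statistic(xs, ys))
    stats.sort()
    return stats


def critical_value(stats, alpha=0.05):
    """Approximate (1 - alpha) quantile of the simulated statistics."""
    k = min(len(stats) - 1, int((1 - alpha) * len(stats)))
    return stats[k]


if __name__ == "__main__":
    null_stats = simulate_null(m=5, n=8)
    print("approx. 5% critical value:", critical_value(null_stats))
```

For small m and n the statistic takes only a handful of distinct values, so the simulated critical value is coarse, and the exact null distribution can also be enumerated if the simulation noise matters.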