On Wed, Jun 18, 2008 at 12:10:18AM +0100, S. Nunes wrote:
> Thanks for the suggestion, however I'm looking for a score since my
> goal is to rank thousands of distributions.
> For instance, given a large text, I would like to rank all terms
> according to their distribution (dispersion) within the text.
>
> Terms evenly distributed in the text should have a low score. Terms
> following an uneven distribution should rank higher.
As a perhaps rather rough-and-ready approach, you could look at the
variance of the difference series, i.e. of the gaps between successive
positions. Considering your example:

> Thanks again,
> --
> Sérgio Nunes
>
> 2008/6/17 Moshe Olshansky <[EMAIL PROTECTED]>:
> > You could also look at the difference between your empirical distribution
> > and the uniform distribution (something like Kolmogorov-Smirnov test).
> >
> >
> > --- On Tue, 17/6/08, S. Nunes <[EMAIL PROTECTED]> wrote:
[...]
> >> An example:
> >>
> >> [0; 0.2; 0.4; 0.6; 0.8; 1] - function should be ~ 0

> var(diff(c(0, 0.2, 0.4, 0.6, 0.8, 1)))
[1] 2.311116e-33

(that's obviously 0, apart from floating-point rounding error)

> >> [0; 0.1; 0.1; 0.15; 1] - function should be > 1

> var(diff(c(0, 0.1, 0.1, 0.15, 1)))
[1] 0.1616667

Best regards, Jan
-- 
 +- Jan T. Kim ------------------------------------------------------+
 |             email: [EMAIL PROTECTED]                              |
 |             WWW:   http://www.cmp.uea.ac.uk/people/jtk            |
 *-----=< hierarchical systems are for files, not for humans >=-----*
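The following minimal R sketch turns the two suggestions in this thread into scores that can rank many terms. It assumes each term's occurrences have already been converted to relative positions in [0, 1], as in the quoted examples. `gap_variance()` and `ks_score()` are hypothetical helper names, not functions from any package. `ks_score()` is only one reading of the Kolmogorov-Smirnov idea: it uses the D statistic returned by base R's `ks.test()` against Uniform(0, 1).

```r
# Sketch: two dispersion scores for a term, assuming its occurrences are
# given as relative positions in [0, 1] (as in the quoted examples).
# gap_variance() and ks_score() are hypothetical helper names.

# Variance of the gaps between successive positions.
# Even spacing gives ~0, and clustered occurrences give a larger value.
gap_variance <- function(pos) {
  pos <- sort(pos)
  if (length(pos) < 3) return(NA_real_)  # var() needs at least 2 gaps
  var(diff(pos))
}

# Kolmogorov-Smirnov D statistic against Uniform(0, 1): the largest
# distance between the empirical CDF and the uniform CDF.
ks_score <- function(pos) {
  # suppressWarnings() hides the warning ks.test() gives for tied positions
  test <- suppressWarnings(ks.test(pos, "punif"))
  unname(test$statistic)
}

positions <- list(
  even   = c(0, 0.2, 0.4, 0.6, 0.8, 1),
  uneven = c(0, 0.1, 0.1, 0.15, 1)
)

# Higher score = less even distribution, so sort in decreasing order.
# sort() drops NA scores (terms with fewer than 3 occurrences) by default.
ranking    <- sort(sapply(positions, gap_variance), decreasing = TRUE)
ks_ranking <- sort(sapply(positions, ks_score), decreasing = TRUE)

print(names(ranking))     # [1] "uneven" "even"
print(names(ks_ranking))  # [1] "uneven" "even"
```

Both scores depend on how often a term occurs. For example, the gaps shrink as the number of occurrences grows. Comparing very frequent and very rare terms this way is therefore only a rough guide.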