Re: [R] Measuring dispersion

Jim Lemon Wed, 18 Jun 2008 04:40:47 -0700

S. Nunes wrote:

Thanks for the suggestion. However, I'm looking for a score, since my
goal is to rank thousands of distributions.
For instance, given a large text, I would like to rank all terms
according to their distribution (dispersion) within the text.

Terms evenly distributed in the text should have a low score. Terms
following an uneven distribution should rank higher.

Hi Sergio,

para1 <- "If you just want an index of the uniformity of the distribution of words within a given block of text, one method is to take the variance of the differences between the indices."

para2 <- "As an example, consider the distribution of the word the in this sentence and the one above by taking the two variances of the differences between the indices."


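# imagine that the paragraphs are stored as two character strings
# split each paragraph into words on single spaces
splitpara1 <- unlist(strsplit(para1, " "))
splitpara2 <- unlist(strsplit(para2, " "))
# positions at which the word "the" occurs
paraindex1 <- which(splitpara1 == "the")
paraindex2 <- which(splitpara2 == "the")
# variance of the gaps between successive occurrences:
# lower means more evenly spread
para1var <- var(diff(paraindex1))
para2var <- var(diff(paraindex2))
para1var
para2var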
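
To rank every term rather than a single word, the same idea can be wrapped in a function. What follows is only a rough sketch under a few assumptions: the names gap_variance and sample_text are made up for illustration, punctuation is stripped and case is folded before matching (the code above does neither), and terms occurring fewer than three times get NA because the variance of fewer than two gaps is undefined.

```r
# Hypothetical helper: for each distinct term, the variance of the gaps
# between its successive positions in the word vector.
gap_variance <- function(words) {
  # list of position vectors, one per distinct term
  positions <- split(seq_along(words), words)
  vapply(positions, function(idx) {
    # fewer than three occurrences gives fewer than two gaps: no variance
    if (length(idx) < 3) NA_real_ else var(diff(idx))
  }, numeric(1))
}

# Illustrative input only; any character string would do.
sample_text <- "If you just want an index of the uniformity of the distribution of words within a given block of text, one method is to take the variance of the differences between the indices."

# Assumed preprocessing: split on spaces, drop punctuation, fold case.
words <- tolower(gsub("[[:punct:]]", "", unlist(strsplit(sample_text, " +"))))
words <- words[nzchar(words)]

# Highest score first, i.e. most unevenly spread; NA scores are dropped.
scores <- sort(gap_variance(words), decreasing = TRUE, na.last = NA)
scores
```

Note that a raw variance of gaps grows with the typical gap size, so rare and frequent terms are not directly comparable on this score; whether that matters depends on how the ranking is to be used.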

Jim

