cut(data, breaks=n) splits the range of the data into n bins of (approximately) the same width.
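As a rough illustration, the breakpoints can be rebuilt by hand. This is only a sketch, assuming the behaviour described in ?cut for recent R versions (equally spaced breaks over the range of the data, with the two outer limits moved outward by 0.1% of the range). equal_width_breaks() is a hypothetical helper name, not part of base R, and the values in x are made up for the example.

```r
# Sketch: rebuild the breakpoints that cut(x, breaks = n) is documented to use.
# equal_width_breaks() is a hypothetical helper, not a base R function.
equal_width_breaks <- function(x, n) {
  rx <- range(x, na.rm = TRUE)   # c(min, max)
  dx <- diff(rx)                 # max - min
  # n + 1 equally spaced points from min(x) to max(x)
  b <- seq(rx[1], rx[2], length.out = n + 1)
  # move the outer limits out by 0.1% of the range so that
  # min(x) and max(x) fall inside the first and last bin
  b[1] <- rx[1] - dx / 1000
  b[n + 1] <- rx[2] + dx / 1000
  b
}

# Made-up data for illustration only
x <- c(-6.0, -2.5, 0.3, 1.1, 4.8, 7.2, 9.9, 12.4, 15.0, 18.0)

b <- equal_width_breaks(x, 3)
manual <- cut(x, breaks = b)   # bins from the hand-built breakpoints
auto <- cut(x, breaks = 3)     # bins chosen by cut() itself

# Compare the bin assignments (integer codes), not the printed labels
identical(as.integer(manual), as.integer(auto))
table(auto)
```

Note that the bins have equal width, not equal counts. If bins with roughly the same number of observations are wanted, passing quantile-based breakpoints to cut() is one common alternative.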
The width used is obtained by:

    max(data) - min(data)
    ---------------------
              n

> x=rnorm(x)
> cut(x,breaks=3)
 [1] (1.79,9.97]  (-6.39,1.79] (9.97,18.2]  (9.97,18.2]  (-6.39,1.79]
 [6] (1.79,9.97]  (-6.39,1.79] (1.79,9.97]  (-6.39,1.79] (-6.39,1.79]
Levels: (-6.39,1.79] (1.79,9.97] (9.97,18.2]

Then you have:

> 18.2-9.97
[1] 8.23
> 9.97-1.79
[1] 8.18
> 1.79+6.39
[1] 8.18
> (max(x)-min(x))/3
[1] 8.164187

The small differences have two causes. First, cut() moves the outer limits outward by 0.1% of the range so that the extreme values fall inside the bins; in the output above this gives 8.164187 * 1.002, about 8.18 per bin. Second, the interval labels are rounded to 3 significant digits (dig.lab = 3), which is why 18.2 - 9.97 comes out as 8.23.

I hope it is useful.

domenico

melissa cline wrote:
> Hello,
>
> I'm trying to bin a quantity into 2-3 bins for calculating entropy and
> mutual information. One of the approaches I'm exploring is the cut()
> function, which is what the mutualInfo function in bioDist uses. When it's
> called in the format cut(data, breaks=n), it somehow splits the data into n
> distinct bins. Can anyone tell me how cut() decides where to cut?
>
> Thanks,
>
> Melissa
>
> ---------------------------------------------------------------
> Melissa Cline, Independent Investigator
> MCD Biology, UCSC

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.