melissa cline wrote:
> Hello,
>
> I'm trying to bin a quantity into 2-3 bins for calculating entropy and
> mutual information. One of the approaches I'm exploring is the cut()
> function, which is what the mutualInfo function in binDist uses. When it's
> called in the format cut(data, breaks=n), it somehow splits the data into n
> distinct bins. Can anyone tell me how cut() decides where to cut?
>

This is one case where reading the actual R code is easier than explaining
what it does. From cut.default:

    if (length(breaks) == 1) {
        if (is.na(breaks) | breaks < 2)
            stop("invalid number of intervals")
        nb <- as.integer(breaks + 1)
        dx <- diff(rx <- range(x, na.rm = TRUE))
        if (dx == 0)
            dx <- rx[1]
        breaks <- seq.int(rx[1] - dx/1000, rx[2] + dx/1000, length.out = nb)
    }

So basically it takes the range of the data, extends it at each end by
1/1000 of its width, and splits it into <breaks> equally long segments.

(For the sometimes more attractive option of splitting into groups of
roughly equal size, there is cut2 in the Hmisc package, or use quantile()
to compute the break points yourself.)

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
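
To make the difference between the two binning strategies concrete, here is a small sketch in base R. The data vector and the variable names (x, n, manual_breaks, and so on) are made up for illustration and are not from the original question. The first half recomputes the break points the way the quoted cut.default code does. Newer versions of R may place the interior breaks slightly differently, so the sketch compares bin membership rather than interval labels. The second half uses quantile() for bins of roughly equal size.

```r
# Toy data, invented for illustration: skewed, so the two strategies differ.
x <- c(1, 2, 3, 4, 10, 20, 50, 100)
n <- 3

# Equal-width bins: recompute the breaks as in the quoted cut.default code.
rx <- range(x, na.rm = TRUE)
dx <- diff(rx)
manual_breaks <- seq(rx[1] - dx/1000, rx[2] + dx/1000, length.out = n + 1)

equal_width <- cut(x, breaks = n)              # let cut() pick the breaks
by_hand     <- cut(x, breaks = manual_breaks)  # use the recomputed breaks

# For this data both assign every value to the same bin.
stopifnot(identical(as.integer(equal_width), as.integer(by_hand)))
print(table(equal_width))   # counts per bin: 6, 1, 1

# Equal-count bins: use quantiles as break points instead.
# include.lowest = TRUE keeps the minimum of x inside the first bin.
q_breaks    <- quantile(x, probs = seq(0, 1, length.out = n + 1))
equal_count <- cut(x, breaks = q_breaks, include.lowest = TRUE)
print(table(equal_count))   # counts per bin: 3, 2, 3
```

With skewed data like this, the equal-width bins put most values in the first bin, which is usually what makes the quantile-based split more attractive for entropy estimates. If the data contain many ties, the quantile breaks may not be unique and cut() will then stop with an error, so that case needs separate handling.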