Dear list members,

I’m looking for a way to divide numbers into simple (i.e., integer-valued) intervals, and thought the ‘cut’ function in ‘base’ or the ‘cut2’ function in ‘Hmisc’ would, er, cut it. However, they seem to give rather surprising results.
Since I want the endpoints of the intervals to be integers, I used the ‘dig.lab’ and ‘digits’ arguments. One assumption I made: If the number x gets the label (a, b], then x lies in the interval (a, b]. It turns out that this assumption was incorrect. Example:

$ cut(c(20.8, 21.3, 21.7, 23, 25), 2, dig.lab=1)
[1] (21,23] (21,23] (21,23] (23,25] (23,25]
Levels: (21,23] (23,25]

So the first number, 20.8, gets put in the interval (21,23], which seems strange. I can see why this could happen, though, as perhaps the 20.8 is rounded to 21 before binning. But it’s even stranger that the *integer* 23 is put in the interval (23,25] instead of in the interval (21,23]. Can anyone explain why?

I then turned to ‘cut2’ in ‘Hmisc’. But again I was surprised by the result:

$ cut2(c(20.8, 21.3, 21.7, 23), g=2, digits=1)
[1] [21,22) [21,22) [22,23] [22,23]
Levels: [21,22) [22,23]

Again 20.8 is placed in an interval that doesn’t mathematically contain it. And 21.3 and 21.7 are placed in *different* intervals, instead of both being placed in the interval [21,22). Strictly speaking, this may not be a bug, but it’s certainly surprising behaviour!

Since neither of the two functions does what I require, is there a different function that does, hidden deep inside some R package? This function should take as input a vector of numbers, and output a vector of non-overlapping (but ‘touching’) intervals with integer end-points so that each number is in exactly one interval. It should of course also include information on which interval each number belongs to.
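To make the requirement concrete, here is a minimal sketch of the behaviour I am after. It assumes that passing explicit integer breaks to ‘cut’ is acceptable. The helper name ‘int_cut’ and its ‘width’ argument are made up for illustration and do not exist in ‘base’ or ‘Hmisc’.

```r
# Hypothetical helper (not part of base R or Hmisc): bin numbers into
# touching intervals with integer end-points by giving cut() explicit breaks.
int_cut <- function(x, width = 1) {
  # Assumption: 'width' is a positive whole number and 'x' has no NAs
  stopifnot(width >= 1, width == round(width), !anyNA(x))
  lo <- floor(min(x))                      # integer at or below the smallest value
  hi <- ceiling(max(x))                    # integer at or above the largest value
  n <- max(1, ceiling((hi - lo) / width))  # number of intervals needed to cover [lo, hi]
  breaks <- lo + width * (0:n)             # integer break points
  # right = FALSE gives [a,b) intervals; include.lowest = TRUE closes the last one
  cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE, dig.lab = 10)
}

x <- c(20.8, 21.3, 21.7, 23, 25)
res <- int_cut(x, width = 2)
print(data.frame(x = x, interval = res))
```

With these inputs the breaks are 20, 22, 24 and 26, so 20.8 lands in [20,22) and the integer 23 in [22,24), i.e. every number lies in the interval it is labelled with.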
Version information (though I also observe this on R 2.13.1 on Windows):

$ sessionInfo()
R version 2.13.1 Patched (2011-07-25 r56494)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=nn_NO.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nn_NO.UTF-8        LC_COLLATE=nn_NO.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=nn_NO.UTF-8
 [7] LC_PAPER=nn_NO.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nn_NO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Hmisc_3.8-3     survival_2.36-9

loaded via a namespace (and not attached):
[1] cluster_1.14.0  grid_2.13.1     lattice_0.19-30

-- 
Karl Ove Hufthammer

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.