Re: [R] Trying to understand cut

peter dalgaard Sun, 17 Apr 2016 00:38:20 -0700

This isn't really FAQ 7.31 (for once). 

The clue is in this part of cut.default():


            breaks <- seq.int(rx[1L], rx[2L], length.out = nb)
            breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + 
                dx/1000)

which _is_ as documented. Notice that the breaks are based on range(values), 
which in your example is 0 to 99.9, so the whole thing boils down to

> rx <- range(values)
> dx <- diff(rx)
> dx
[1] 99.9
> nb <- 11
> breaks <- seq.int(rx[1L], rx[2L], length.out = nb)
> breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + 
+                 dx/1000)
>  breaks
 [1] -0.0999  9.9900 19.9800 29.9700 39.9600 49.9500 59.9400 69.9300 79.9200
[10] 89.9100 99.9999

Notice that every breakpoint has a nonzero 2nd decimal digit, which none of 
your data have, so no data fall on interval boundaries and the right and 
include.lowest arguments have no effect. The little fuzz at the two ends 
(dx/1000) keeps the extremes from being excluded without your having to set 
include.lowest=TRUE explicitly.
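
For anyone who wants to check this rather than take it on faith, here is a small sketch using the values from the original post. The names min_gap and same_bins are just made up for the illustration, and the break construction simply repeats the snippet from cut.default() quoted above.

```r
# Data from the original post: 0, 1, ..., 99 and 0.9, 1.9, ..., 99.9
values <- c(0:99, 0.9:99.9)

# Rebuild the breaks the way the quoted part of cut.default() does
nb <- 11
rx <- range(values)
dx <- diff(rx)
breaks <- seq.int(rx[1L], rx[2L], length.out = nb)
breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + dx/1000)

# Smallest distance between any data value and any breakpoint;
# it is clearly away from zero, so nothing sits on a boundary
min_gap <- min(abs(outer(values, breaks, "-")))
min_gap

# Hence all four calls put every value into the same bin number
c1 <- cut(values, 10, include.lowest = FALSE, right = TRUE)
c2 <- cut(values, 10, include.lowest = FALSE, right = FALSE)
c3 <- cut(values, 10, include.lowest = TRUE,  right = TRUE)
c4 <- cut(values, 10, include.lowest = TRUE,  right = FALSE)
same_bins <- all(as.integer(c1) == as.integer(c2),
                 as.integer(c1) == as.integer(c3),
                 as.integer(c1) == as.integer(c4))
same_bins
```

Only the level labels (round versus square brackets) differ between the four results; the assignment of values to bins does not.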

Short version: If you want fine control over the cutpoints, do not use 
cut(x,n)...
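
One way to get that control, as a sketch only: pass the cutpoints yourself. I am assuming here that the groups wanted are the [0,10), [10,20), ... ranges mentioned earlier in the thread; the toy vectors x and b are invented just to show what right and include.lowest do once data actually sit on the cutpoints.

```r
values <- c(0:99, 0.9:99.9)

# Explicit cutpoints at 0, 10, ..., 100; right = FALSE closes each
# interval on the left, giving [0,10), [10,20), ..., [90,100)
g <- cut(values, breaks = seq(0, 100, by = 10), right = FALSE)
table(g)

# With data that do sit on the cutpoints, the two arguments matter
x <- c(0, 10, 20)
b <- c(0, 10, 20)
cut(x, b)                                        # (0,10] (10,20]: 0 becomes NA
cut(x, b, include.lowest = TRUE)                 # [0,10] (10,20]: 0 is kept
cut(x, b, right = FALSE)                         # [0,10) [10,20): 20 becomes NA
cut(x, b, right = FALSE, include.lowest = TRUE)  # [0,10) [10,20]: 20 is kept
```

So right decides which end of each interval is closed, and include.lowest only affects the one extreme cutpoint that would otherwise be left out (the lowest for right=TRUE, the highest for right=FALSE).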

-pd

PS: To read the FAQ, go to www.r-project.org, and click "FAQs" (under 
Documentation, to the left).


> On 17 Apr 2016, at 06:12 , John Sorkin <jsor...@grecc.umaryland.edu> wrote:
> 
> Jeff,
> Perhaps I was sloppy with my notation:
> I want groups
> >=0 <10
> >=10 <20
> >=20 <30
> ......
> >=90 <100
> 
> In any event, my question remains, why did the four different versions of cut 
> give me the same results? I hope someone can explain to me the function of 
> include.lowest and right in the call to cut. As demonstrated in my example 
> below, the parameters do not seem to alter the results of using cut.
> Thank you,
> John
> 
> 
> P.S. How do I find FAQ 7.31?
> Thank you,
> John
> 
> 
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and 
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing) 
> >>> Jeff Newmiller <jdnew...@dcn.davis.ca.us> 04/16/16 11:07 PM >>>
> Have you read FAQ 7.31 recently, John? Your whole premise is flawed. You 
> should be thinking of ranges [0,10), [10,20), and so on because numbers 
> ending in 0.9 are never going to be exact. 
> -- 
> Sent from my phone. Please excuse my brevity.
> 
> 
> On April 16, 2016 7:38:50 PM PDT, John Sorkin <jsor...@grecc.umaryland.edu> 
> wrote:
> I am trying to understand cut so I can divide a list of numbers into 10 groups:
>  0-9.9
> 10-19.9
> 20-29.9
> 30-39.9
> 40-49.9
> 50-59.9
> 60-69.9
> 70-79.9
> 80-89.9
> 90-99.9
> 
> As I try to do this, I have been playing with the cut function. Surprisingly, 
> the following four applications of cut give me exactly the same groups, even 
> though I have varied the parameters include.lowest and right. 
> Can someone help me understand what include.lowest and right do? I have 
> looked at the help page, but I don't seem to understand what I am being told!
> Thank you,
> John
> 
> values <- c((0:99),c(0.9:99.9))
> sort(values)
> c1<-cut(values,10,include.lowest=FALSE,right=TRUE)
> c2<-cut(values,10,include.lowest=FALSE,right=FALSE)
> c3<-cut(values,10,include.lowest=TRUE,right=TRUE)
> c4<-cut(values,10,include.lowest=TRUE,right=FALSE)
> cbind(min=aggregate(values,list(c1),min),max=aggregate(values,list(c1),max))
> cbind(min=aggregate(values,list(c2),min),max=aggregate(values,list(c2),max))
> cbind(min=aggregate(values,list(c3),min),max=aggregate(values,list(c3),max))
> cbind(min=aggregate(values,list(c4),min),max=aggregate(values,list(c4),max))
> 
> You can run the code above, or inspect the results I got, which are 
> reproduced below:
> 
> cbind(min=aggregate(values,list(c1),min),max=aggregate(values,list(c1),max))
> 
>      min.Group.1 min.x    max.Group.1 max.x
> 1  (-0.0999,9.91]     0 (-0.0999,9.91]   9.9
> 2     (9.91,19.9]    10    (9.91,19.9]  19.9
> 3     (19.9,29.9]    20    (19.9,29.9]  29.9
> 4     (29.9,39.9]    30    (29.9,39.9]  39.9
> 5       (39.9,50]    40      (39.9,50]  49.9
> 6         (50,60]    50        (50,60]  59.9
> 7         (60,70]    60        (60,70]  69.9
> 8         (70,80]    70        (70,80]  79.9
> 9         (80,90]    80        (80,90]  89.9
> 10       (90,100]    90       (90,100]  99.9
> cbind(min=aggregate(values,list(c2),min),max=aggregate(values,list(c2),max))
> 
>      min.Group.1 min.x    max.Group.1 max.x
> 1  [-0.0999,9.91)     0 [-0.0999,9.91)   9.9
> 2     [9.91,19.9)    10    [9.91,19.9)  19.9
> 3     [19.9,29.9)    20    [19.9,29.9)  29.9
> 4     [29.9,39.9)    30    [29.9,39.9)  39.9
> 5       [39.9,50)    40      [39.9,50)  49.9
> 6         [50,60)    50        [50,60)  59.9
> 7         [60,70)    60        [60,70)  69.9
> 8         [70,80)    70        [70,80)  79.9
> 9         [80,90)    80        [80,90)  89.9
> 10       [90,100)    90       [90,100)  99.9
> cbind(min=aggregate(values,list(c3),min),max=aggregate(values,list(c3),max))
> 
>      min.Group.1 min.x    max.Group.1 max.x
> 1  [-0.0999,9.91]     0 [-0.0999,9.91]   9.9
> 2     (9.91,19.9]    10    (9.91,19.9]  19.9
> 3     (19.9,29.9]    20    (19.9,29.9]  29.9
> 4     (29.9,39.9]    30    (29.9,39.9]  39.9
> 5       (39.9,50]    40      (39.9,50]  49.9
> 6         (50,60]    50        (50,60]  59.9
> 7         (60,70]    60        (60,70]  69.9
> 8         (70,80]    70        (70,80]  79.9
> 9         (80,90]    80        (80,90]  89.9
> 10       (90,100]    90       (90,100]  99.9
> cbind(min=aggregate(values,list(c4),min),max=aggregate(values,list(c4),max))
> 
>      min.Group.1 min.x    max.Group.1 max.x
> 1 [-0.0999,9.91)     0 [-0.0999,9.91)   9.9
> 2     [9.91,19.9)    10    [9.91,19.9)  19.9
> 3     [19.9,29.9)    20    [19.9,29.9)  29.9
> 4     [29.9,39.9)    30    [29.9,39.9)  39.9
> 5       [39.9,50)    40      [39.9,50)  49.9
> 6         [50,60)    50        [50,60)  59.9
> 7         [60,70)    60        [60,70)  69.9
> 8         [70,80)    70        [70,80)  79.9
> 9         [80,90)    80        [80,90)  89.9
> 10       [90,100]    90       [90,100]  99.9
> 
> Confidentiality Statement:
> This email message, including any attachments, is for the sole use of the 
> intended recipient(s) and may contain confidential and privileged 
> information. Any unauthorized use, disclosure or distribution is prohibited. 
> If you are not the intended recipient, please contact the sender by reply 
> email and destroy all copies of the original message. 
> 
> 
> 
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
