Re: [R] grouping

Val Tue, 03 Apr 2012 11:23:37 -0700

Hi All,

Using the same data points,
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

I want the following output as a data frame:

x      group   group_mean
46     1       42.3
125    2       89.7
36     1       42.3
193    3       235.25
209    3       235.25
78     2       89.7
66     2       89.7
242    3       235.25
297    3       235.25
45     1       42.3

I tried the following code:

dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
gxc <- with(dat, tapply(xc, group, mean))
dat$gxc <- gxce[as.character(dat$group)]
txc=dat$gxc

It did not work for me.
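A possible corrected version of the attempt above is sketched below. It keeps the original 0.333/0.66 probabilities. The original fails for several reasons:

- `split()` returns a list of vectors of different lengths, and `data.frame()` cannot combine them.
- The data frame has no `group` column for `tapply()` to use.
- `gxce` is a typo for `gxc`.
- `cut()` intervals are closed on the right by default, so the smallest value (36) falls outside every interval and becomes NA unless `include.lowest = TRUE` is set.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# label each value with its group (1, 2, 3); include.lowest = TRUE
# closes the first interval on the left so the minimum is not NA
grp <- cut(x, quantile(x, prob = c(0, .333, .66, 1)),
           include.lowest = TRUE, labels = FALSE)
dat <- data.frame(x = x, group = grp)

# one mean per group, then look up each row's mean by its group label
gxc <- with(dat, tapply(x, group, mean))
dat$group_mean <- as.vector(gxc[as.character(dat$group)])
dat
```

With these data the 0.333 and 0.66 quantiles fall at about 65.9 and 188.9, which gives the 3/3/4 split in the table. For other data, or another number of groups, the probabilities would have to be chosen differently.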

On Tue, Apr 3, 2012 at 10:15 AM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Apr 3, 2012, at 10:11 AM, Val wrote:
>
> David W and all,
>
> Thank you very much for your help.
>
> Here is the final output I want, in the form of a data frame. The data
> frame should contain x, group and group_mean, as follows:
>
> x      group   group_mean
> 46     1       42.3
> 125    2       89.7
> 36     1       42.3
> 193    3       235.25
> 209    3       235.25
> 78     2       89.7
> 66     2       89.7
> 242    3       235.25
> 297    3       235.25
> 45     1       42.3
>
>
> If you want group means in a vector the same length as x, then instead of
> using tapply() as in the earlier solutions, you should use `ave`.
>
> --
> DW
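The difference is illustrated below. The sketch takes the group column from the table above as given, so it does not depend on how the groups were formed. `tapply()` returns one mean per group. `ave()` returns each observation's group mean in the original order, which is exactly the third column wanted.

```r
x     <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
group <- c(1, 2, 1, 3, 3, 2, 2, 3, 3, 1)   # group column copied from the table

tapply(x, group, mean)   # 3 values, one per group
ave(x, group)            # 10 values, aligned with x (ave() uses mean by default)

dat <- data.frame(x = x, group = group, group_mean = ave(x, group))
dat
```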
>
>
>
> Thanks a lot
>
> On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>>
>> On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:
>>
>>  Use cut2 as I suggested and David demonstrated.
>>>
>>
>> Agree that Hmisc::cut2 is extremely handy, and I also like the fact that
>> its intervals are closed on the left side (which is not the behavior of
>> cut()). That has the further effect of including the lowest value, the
>> equivalent of include.lowest = TRUE, which is not the default for cut()
>> either (to my continued amazement).
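The sketch below shows this boundary behaviour in base R. It assumes split points at the 0, 1/3, 2/3 and 1 quantiles of Val's data. For these ten values, those split points land exactly on observed values: 36, 66, 193 and 297.

- With the default right-closed intervals, the minimum is dropped.
- `include.lowest = TRUE` keeps it.
- `right = FALSE` gives left-closed intervals, the convention described above for Hmisc::cut2. `cut2` itself is not used here.

```r
x  <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
br <- quantile(x, prob = seq(0, 1, length.out = 4))   # 36, 66, 193, 297 here

# default cut(): intervals (a, b], so the value equal to the lowest break is NA
sum(is.na(cut(x, br)))

# include.lowest = TRUE also closes the first interval on the left
sum(is.na(cut(x, br, include.lowest = TRUE)))

# right = FALSE: intervals [a, b); include.lowest then closes the last one
table(cut(x, br, right = FALSE, include.lowest = TRUE))
```

For these data the left-closed version puts 66 and 193 into the upper group, giving group sizes 3, 3 and 4. The right-closed version with `include.lowest = TRUE` puts them into the lower group, giving sizes 4, 3 and 3.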
>>
>> But let me add the method I use when doing it "by hand":
>>
>> cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)
>>
>> --
>> David.
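This also bears on Val's question further down about 10 or 15 groups. The one-liner above already generalizes, because `seq()` generates the probabilities for any number of groups. The sketch below wraps it in a small helper and attaches group means with `ave()`. `quantile_groups()` is a hypothetical name, not an existing function.

With right-closed intervals, a value equal to a split point goes to the lower group. So for Val's data with ngrps = 3, the sizes appear to come out 4, 3, 3 rather than 3, 3, 4.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# hypothetical helper: the cut()/quantile() line above with ngrps as an argument
quantile_groups <- function(x, ngrps) {
  cut(x, quantile(x, prob = seq(0, 1, length.out = ngrps + 1)),
      include.lowest = TRUE, labels = FALSE)
}

grp <- quantile_groups(x, 3)
table(grp)                       # group sizes
data.frame(x = x, group = grp, group_mean = ave(x, grp))

table(quantile_groups(x, 5))     # same call, five groups
```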
>>
>>> Michael
>>>
>>> On Tue, Apr 3, 2012 at 9:31 AM, Val <valkr...@gmail.com> wrote:
>>>
>>>> Thank you all (David, Michael, Giovanni) for your prompt responses.
>>>>
>>>> First, there was a typo in the group mean: it was 89.7, not 87.
>>>>
>>>> For a small data set and few groups I can use prob=c(0, .333, .66, 1)
>>>> to split the data into three groups, as in this case. However, if I want
>>>> to extend the number of groups to, say, 10 or 15, do I have to work out
>>>> the probabilities myself in
>>>> split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1))))
>>>>
>>>> Is there a shortcut for that?
>>>>
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
>>>> <michael.weyla...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Ignoring the fact that your desired answers are wrong, I'd break the
>>>>> grouping and the group-mean calculation into three steps:
>>>>>
>>>>> i) quantile() can help you get the split points,
>>>>> ii) findInterval() can assign each element of y to a group, and
>>>>> iii) ave() or tapply() will then compute the group-wise means.
>>>>>
>>>>> Something like:
>>>>>
>>>>> y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.
>>>>> ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
>>>>> tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
>>>>>
>>>>> You could also use cut2 from the Hmisc package to combine findInterval
>>>>> and quantile into a single step.
>>>>>
>>>>> Which one to use depends on your desired output.
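The sketch below puts the three steps together into the kind of data frame Val asked for. `findInterval()` numbers the intervals from 0, so 1 is added here to get groups 1, 2, 3. That shift is an adjustment not made in the reply above.

```r
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

cuts  <- quantile(y, c(0.33, 0.66))      # i) split points (about 65.4 and 188.9)
group <- findInterval(y, cuts) + 1       # ii) 0-based interval index, shifted to 1..3
dat <- data.frame(y = y, group = group,
                  group_mean = ave(y, group))  # iii) group mean on every row
dat
```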
>>>>>
>>>>> Hope that helps,
>>>>> Michael
>>>>>
>>>>> On Tue, Apr 3, 2012 at 8:47 AM, Val <valkr...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Assume that I have the following 10 data points.
>>>>>> x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
>>>>>>
>>>>>> Sorting x gives
>>>>>> y = (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
>>>>>>
>>>>>> I want to group the sorted data points (y) into groups with an equal
>>>>>> number of observations. In this case there will be three groups: the
>>>>>> first two groups will have three observations each and the third will
>>>>>> have four.
>>>>>>
>>>>>> group 1 = 36, 45, 46
>>>>>> group 2 = 66, 78, 125
>>>>>> group 3 = 193, 209, 242, 297
>>>>>>
>>>>>> Finally, I want to calculate the group means:
>>>>>>
>>>>>> group 1  =  42
>>>>>> group 2  =  87
>>>>>> group 3  =  234
>>>>>>
>>>>>> Can anyone help me out?
>>>>>>
>>>>>> In SAS I used to do it using proc rank.
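The goal is a fixed number of observations per group rather than particular cut points. Another option is therefore to group by rank, loosely in the spirit of PROC RANK; this is not a reproduction of its GROUPS= rule.

The sketch below assumes groups of floor(n / k) consecutive ranks, with any leftover observations added to the last group. For k = 3 this gives the 3/3/4 split described above. `rank_groups()` is a hypothetical helper name. Ties are broken by order of appearance, so tied values may land in different groups.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# hypothetical helper: k groups of floor(n / k) consecutive ranks each,
# with the remainder (n %% k observations) added to the last group
rank_groups <- function(x, k) {
  stopifnot(k >= 1, k <= length(x))
  size <- length(x) %/% k
  pmin((rank(x, ties.method = "first") - 1) %/% size + 1, k)
}

grp <- rank_groups(x, 3)
table(grp)               # group sizes 3, 3, 4
tapply(x, grp, mean)     # group means
```

The same call works for 10 or 15 groups, as long as there are at least that many observations.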
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Val
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>


