Max,
Since the rows for each ID are contiguous, with N increasing within
each ID, the following should do it and do it quickly. It grabs the
last row of each run of identical IDs: every row just before ID
changes, plus the final row of the dataset.
> with(data, data[c(ID[-1] != ID[-length(ID)], TRUE), , drop = FALSE])
      ID Type N
7  45900    I 7
10 49270    E 3
24 46550    I 7
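
In case it helps to see the pieces separately, here is a small
self-contained sketch. It rebuilds the data frame from your post
(the data.frame() call is my reconstruction, not your original code),
applies the same "last row of each run" test, and then shows an
alternative using ave() that does not depend on row order. The names
isLastInRun, lastRows and maxRows are just illustrative.

```r
# Rebuild the example data from the original post, keeping its row names.
data <- data.frame(
  ID   = rep(c(45900, 49270, 46550), times = c(7, 3, 7)),
  Type = c("A", "B", "C", "D", "E", "F", "I",
           "A", "B", "E",
           "A", "B", "C", "D", "E", "F", "I"),
  N    = c(1:7, 1:3, 1:7),
  row.names = c(1:10, 18:24)
)

# TRUE where the next row has a different ID; the final row is always kept.
# This assumes rows with the same ID are adjacent and N increases within ID.
isLastInRun <- with(data, c(ID[-1] != ID[-length(ID)], TRUE))
lastRows <- data[isLastInRun, , drop = FALSE]

# Order-independent alternative: keep rows whose N equals the per-ID maximum.
# Note that this keeps every tied row if the maximum is not unique.
maxRows <- data[data$N == ave(data$N, data$ID, FUN = max), , drop = FALSE]
```

On this dataset both approaches pick rows 7, 10 and 24. The first
does a single vectorized comparison, so it should be the faster of
the two on large inputs, but it gives wrong answers if the rows are
not grouped by ID.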
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Max Webber
> Sent: Wednesday, May 06, 2009 3:09 PM
> To: [email protected]
> Subject: [R] by-group processing
>
> Given a dataframe like
>
> > data
>       ID Type N
> 1  45900    A 1
> 2  45900    B 2
> 3  45900    C 3
> 4  45900    D 4
> 5  45900    E 5
> 6  45900    F 6
> 7  45900    I 7
> 8  49270    A 1
> 9  49270    B 2
> 10 49270    E 3
> 18 46550    A 1
> 19 46550    B 2
> 20 46550    C 3
> 21 46550    D 4
> 22 46550    E 5
> 23 46550    F 6
> 24 46550    I 7
> >
>
> containing an identifier (ID), a variable type code (Type), and
> a running count of the number of records per ID (N), how can I
> return a dataframe of only those records with the maximum value
> of N for each ID? For instance,
>
> > data
>       ID Type N
> 7  45900    I 7
> 10 49270    E 3
> 24 46550    I 7
>
> I know that I can use
>
> > tapply(data$N, data$ID, max)
> 45900 46550 49270
>     7     7     3
> >
>
> to get the values of the maximum N for each ID, but how is it
> that I can find the index of these values to subsequently use to
> subscript data?
>
>
> --
> maxine-webber
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>