Thanks, everyone. Joshua's response seemed the most concise, but it used so
much memory that R simply threw an error. I went through the other replies,
combined them into the approach below, and thought I'd share it and ask for
comments.

My data frame (my.mapping) was structured as follows:

ACCOUNT   RULE   DATE
A1        xxxx   2010-01-01
A2        xxxx   2007-05-01
A2        xxxx   2007-05-01
A2        xxxx   2005-05-01
A2        xxxx   2005-05-01
A1        xxxx   2009-01-01
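For anyone who wants to try the steps below, the table above can be rebuilt
as a small data frame. This is a hypothetical reconstruction: it assumes DATE
holds character strings in "YYYY-MM-DD" form (which the as.Date(a, "%Y-%m-%d")
call in step 1 suggests) and that "xxxx" is only a placeholder RULE value.

```r
# Hypothetical reconstruction of the sample table above.
# Assumption: DATE is a character vector of "YYYY-MM-DD" strings
# (not a factor or Date), and "xxxx" stands in for the real RULE values.
my.mapping <- data.frame(
  ACCOUNT = c("A1", "A2", "A2", "A2", "A2", "A1"),
  RULE    = rep("xxxx", 6),
  DATE    = c("2010-01-01", "2007-05-01", "2007-05-01",
              "2005-05-01", "2005-05-01", "2009-01-01"),
  stringsAsFactors = FALSE
)
str(my.mapping)
```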

The most efficient solution I came across involves the following steps:

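1. Find the latest date for each account and convert the result to a data frame:

# max() of "YYYY-MM-DD" strings returns each account's latest date
a <- tapply(my.mapping$DATE, my.mapping$ACCOUNT, max)
# names(a) holds the account IDs; DT is the latest date as a Date
a <- data.frame(ACCOUNT = names(a), DT = as.Date(a, "%Y-%m-%d"))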

2. Merge the result with the original data, matching on ACCOUNT:

my.mapping <- merge(my.mapping, a, by = "ACCOUNT")

3. Add a TAKE column that is TRUE when the row's date equals the latest date
for its account:

my.mapping$TAKE <- my.mapping$DATE == my.mapping$DT

4. Drop every row except those where TAKE is TRUE:

my.mapping <- my.mapping[my.mapping$TAKE, ]
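Put together, steps 1-4 can be sketched as a single function and run on the
sample data. The names latest_per_account() and demo_data are hypothetical,
and the sketch again assumes DATE holds "YYYY-MM-DD" strings; as.vector() is
used only to drop the array dimension that tapply() returns.

```r
# Sketch of steps 1-4 as one function. latest_per_account() and demo_data
# are hypothetical names; DATE is assumed to hold "YYYY-MM-DD" strings.
latest_per_account <- function(df) {
  # Step 1: latest date per account. max() works on ISO date strings
  # because they sort in the same order as the dates they represent.
  a <- tapply(df$DATE, df$ACCOUNT, max)
  a <- data.frame(ACCOUNT = names(a),
                  DT = as.Date(as.vector(a), "%Y-%m-%d"),
                  stringsAsFactors = FALSE)
  # Step 2: attach each account's latest date (DT) to all of its rows.
  df <- merge(df, a, by = "ACCOUNT")
  # Steps 3-4: keep only rows whose own date equals that latest date.
  df[as.Date(df$DATE, "%Y-%m-%d") == df$DT, ]
}

demo_data <- data.frame(
  ACCOUNT = c("A1", "A2", "A2", "A2", "A2", "A1"),
  RULE    = rep("xxxx", 6),
  DATE    = c("2010-01-01", "2007-05-01", "2007-05-01",
              "2005-05-01", "2005-05-01", "2009-01-01"),
  stringsAsFactors = FALSE
)
latest <- latest_per_account(demo_data)
print(latest)
```

On the sample data this keeps three rows: the 2010-01-01 row for A1 and both
identical 2007-05-01 rows for A2, since ties on the latest date are all kept.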

On my full data set the whole process ran in 4.5 seconds, far faster than any
other approach I tried. I'd welcome your thoughts on it.
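One alternative that might be worth timing against the steps above is base
R's ave(), which returns, for every row, the maximum DATE within that row's
ACCOUNT, so neither the intermediate data frame nor the merge is needed. This
is a different technique from the one described, sketched under the same
assumption that DATE holds "YYYY-MM-DD" strings; its speed and memory use on
the full list are untested.

```r
# Alternative sketch (not the merge-based method above): ave() computes the
# per-ACCOUNT maximum of DATE and aligns it back to every original row.
# Assumes DATE is a character vector of "YYYY-MM-DD" strings; demo_data is
# a hypothetical copy of the sample table.
demo_data <- data.frame(
  ACCOUNT = c("A1", "A2", "A2", "A2", "A2", "A1"),
  RULE    = rep("xxxx", 6),
  DATE    = c("2010-01-01", "2007-05-01", "2007-05-01",
              "2005-05-01", "2005-05-01", "2009-01-01"),
  stringsAsFactors = FALSE
)
# latest_date[i] is the latest DATE for the account in row i
latest_date <- ave(demo_data$DATE, demo_data$ACCOUNT, FUN = max)
# keep rows whose own date is the latest for their account
latest <- demo_data[demo_data$DATE == latest_date, ]
print(latest)
```

Because the filter is a plain logical index on the original rows, the result
keeps the original row order rather than the sorted order that merge()
produces.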

Ali


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
