Re: [R] Removing rows with earlier dates

David Winsemius Wed, 29 Dec 2010 07:28:28 -0800


On Dec 29, 2010, at 9:24 AM, Ali Salekfard wrote:

Thanks to everyone. Joshua's response seemed the most concise one, but it
used up so much memory that my R just gave an error. I checked the other
replies and all in all I came up with this, and thought to share it with
others and get comments.

My structure was as follows:

ACCOUNT   RULE  DATE
A1             xxxx     2010-01-01
A2             xxxx     2007-05-01
A2             xxxx     2007-05-01
A2             xxxx     2005-05-01
A2             xxxx     2005-05-01
A1             xxxx     2009-01-01
The most efficient solution I came across involves the following steps:

1. Find the latest date for each account, and convert it to a data frame:

a<-tapply(my.mapping$DATE,my.mapping$ACCOUNT,max)
a<-data.frame(ACCOUNT=names(a),DT=as.Date(a,"%Y-%m-%d"))

2. Merge the set with the original data:

my.mapping<-merge(x=my.mapping,y=a,by.x="ACCOUNT",by.y="ACCOUNT")

3. Create a TAKE column, which confirms whether the date of the row is the
maximum date for the account:

my.mapping<-cbind(my.mapping,TAKE=my.mapping$DATE==my.mapping$DT)

4. Filter out all lines except those with TAKE==TRUE:

my.mapping<-my.mapping[my.mapping$TAKE==TRUE,]

The running time for my whole list was 4.5 sec, which is far better than any
other way I tried. Let me have your thoughts on that.

My first thought is that you should use more spaces in your code. It looks quite a bit more complex than the method I suggested (and my benchmark says mine was maybe 50% faster, but with Maechler's improvements it is now about 4 times faster. I guess I shouldn't throw too many stones about coding style.)


my.mapping[ with(my.mapping, DATE == ave( DATE,
                                          ACCOUNT,
                                          FUN=max) ), ]
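A minimal sketch of that one-liner on toy data shaped like the example in the question. The RULE values are placeholders and DATE is assumed to be of class Date, since the original posting did not include a reproducible dataset:

```r
# Toy data modelled on the ACCOUNT / RULE / DATE layout in the post.
# RULE values are placeholders; DATE is assumed to be of class Date.
my.mapping <- data.frame(
  ACCOUNT = c("A1", "A2", "A2", "A2", "A2", "A1"),
  RULE    = c("r1", "r2", "r3", "r4", "r5", "r6"),
  DATE    = as.Date(c("2010-01-01", "2007-05-01", "2007-05-01",
                      "2005-05-01", "2005-05-01", "2009-01-01")),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum DATE within that row's ACCOUNT.
latest <- with(my.mapping, ave(DATE, ACCOUNT, FUN = max))

# Keep only the rows whose DATE equals their account's latest date.
kept <- my.mapping[my.mapping$DATE == latest, ]
print(kept)
```

Note that ties are kept: both A2 rows dated 2007-05-01 survive, alongside the single A1 row dated 2010-01-01.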
#------------------
library(rbenchmark)

# acc and dt are column names given as strings, so index with [[ ]]
ave.method <- function(df, acc, dt) {
   df[df[[dt]] == ave(df[[dt]], df[[acc]], FUN = max), ]
}

merge.method <- function(df, acc, dt) {
   a  <- tapply(df[[dt]], df[[acc]], max)            # group maxima
   a  <- data.frame(ACCOUNT = names(a), DT = as.vector(a))
   df <- merge(x = df, y = a, by.x = acc, by.y = "ACCOUNT")
   df <- cbind(df, TAKE = df[[dt]] == df$DT)         # flag rows at the maximum
   df[df$TAKE, ]
}

benchmark(
   rep = ave.method(airquality, "Month", "Day"),
   pat = merge.method(airquality, "Month", "Day"),
   replications = 1000,
   order = c('replications', 'elapsed'))
#-----------------

  test replications elapsed relative user.self sys.self user.child sys.child
1  rep         1000   2.523 1.000000     2.512    0.018          0         0
2  pat         1000   7.847 3.110186     7.773    0.092          0         0

It does give the same answers when tested on airquality, though. That says something for it, I suppose. (Had you offered a sensible test dataset in your first posting, I would have offered a solution using your column names, but as it was I figured you should have been able to make the mappings.)
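One way to check that claim is to compare the rows each approach selects. The sketch below is only illustrative: the helper names ave.rows, merge.rows and key are made up for this example, and both helpers assume the grouping and value columns are passed as strings.

```r
# Both helpers take column names as strings (an assumption about intent)
# and return the rows that hold each group's maximum value.
ave.rows <- function(df, acc, dt) {
  df[df[[dt]] == ave(df[[dt]], df[[acc]], FUN = max), ]
}

merge.rows <- function(df, acc, dt) {
  mx  <- tapply(df[[dt]], df[[acc]], max)
  mx  <- data.frame(KEY = names(mx), MX = as.vector(mx))
  out <- merge(df, mx, by.x = acc, by.y = "KEY")
  out[out[[dt]] == out$MX, names(df)]   # drop the helper columns again
}

r1 <- ave.rows(airquality, "Month", "Day")
r2 <- merge.rows(airquality, "Month", "Day")

# merge() may reorder rows and reset row names, so compare sorted key columns.
key  <- function(d) d[order(d$Month, d$Day), c("Month", "Day")]
same <- identical(unname(as.matrix(key(r1))), unname(as.matrix(key(r2))))
print(same)
```

Comparing only the sorted Month/Day pairs keeps the check independent of row order and row names, which differ between the two approaches.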



--
David.

Ali



David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
