>>>>> David Winsemius <dwinsem...@comcast.net> >>>>> on Fri, 24 Dec 2010 11:47:05 -0500 writes:
> On Dec 24, 2010, at 11:04 AM, David Winsemius wrote: >> >> On Dec 24, 2010, at 8:45 AM, Ali Salekfard wrote: >> >>> Hi all, >>> >>> I'm new to the list but have benfited from it quite extensively. >>> Straight to >>> my rather strange question: >>> >>> I have a data frame that contains mapping rules in this way: >>> >>> ACCOUNT, RULE COLUMNS, Effective Date >>> >>> >>> The dataframe comes from a database that stores all dates. What I >>> would like >>> to do is to create a data frame with only the most recent rule for >>> each >>> account. In traditional programming languages I would loop through >>> each >>> account find the most recent rule(s) and fill up my updated data >>> frame. >>> >>> Does anyone have any better idea to use R's magic (Its syntax is >>> still >>> magical to me) for this problem? >> >> It's going to remain magic until you start thinking about what is >> needed. In this case the need is for a good understanding of the >> structure of the data object and the str function is the usual way >> to examine such AND to then communicate with the list. Read the >> Posting Guide again and the references it cites, please. >> >>> >> >> Here would have been my first attempt, assuming a dataframe named >> dfrm: >> #make sure the most recent is on top >> dfrm <- dfrm[ order(dfrm["Effective Date"], decreasing=TRUE), ] >> # then pull the first record within ACCOUNT >> tapply(dfrm, dfrm$ACCOUNT , FUN= "[", 1 , ) >> >> >>> By the way the list of rules is quite extensive (144643 lines to be >>> precise), and there are usually 1-3 most recent rules (rows) for each >>> account. >> >> That is a bit different than the initial problem statement in which >> you asked for the "only the most recent" within each account. How >> are we supposed to get 3 _most_ recent rules? I think you are >> expecting us to read your mind regarding how you are thinking about >> this problem and pull all the records with the maximum date within >> an account. >> >> Perhaps this effort to create a logical vector would be in the right >> direction: >> >> dfrm[ ave(dfrm["Effective Date"], dfrm[ , "ACCOUNT"], function(x) x >> == max(x), ] >> >> It should pull all records for which the Effective Date is equal to >> the maximum within ACCOUNT. It is going to depend on whether >> "Effective Date" of of a class that can be properly compared with >> max(). Both Date and character representations of dates in standard >> y-m-d form would qualify. Other date formats might not: >> > max("01-02-2011", "02-01-2010") >> [1] "02-01-2010" >> > When I used the strategy on the airquality dataset I do not get the > results I expected, but a modification did succeed: >> airquality[ airquality$Day == ave(airquality$Day, airquality$Month, > FUN=function(x){ max(x)} ), ] > Ozone Solar.R Wind Temp Month Day > 31 37 279 7.4 76 5 31 > 61 NA 138 8.0 83 6 30 > 92 59 254 9.2 81 7 31 > 123 85 188 6.3 94 8 31 > 153 20 223 11.5 68 9 30 Hmm, yes, but " FUN = function(x) { max(x) } " is so ugly that it hurts my R-eyes. Just use 'FUN = max' .. please .. and as we are in making things more readable, I'd like to propose using with() in these cases --> > airquality[with(airquality, Day == ave(Day, Month, FUN=max)),] Ozone Solar.R Wind Temp Month Day 31 37 279 7.4 76 5 31 61 NA 138 8.0 83 6 30 92 59 254 9.2 81 7 31 123 85 188 6.3 94 8 31 153 20 223 11.5 68 9 30 Regards, Martin Maechler, ETH Zurich > I do suspect it requires that the dataframe be sorted to get the > joint conditions lined up correctly. The earlier method should have > used an as.logical() wrapper and would then not have needed pre- > sorting the dataframe, so try instead: > frm[ as.logical(ave(dfrm["Effective Date"], dfrm[ , "ACCOUNT"], > function(x) x == max(x)), ] >> >> >> -- >> David Winsemius, MD >> West Hartford, CT ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.