Hi Don,

Yes, I am error checking a dataset produced by a query. Most likely a problem with the query but wanted to assess the problem first.
BTW Arun provided another solution which is similar to yours but uses the function ave instead:

testSeq[!!(with(testSeq,ave(YoS,ID,FUN=function(x) any(c(0,diff(x))>1)))),]

I appreciate your response on this.

Dan

-----Original Message-----
From: MacQueen, Don
Sent: Thursday, November 21, 2013 3:58 PM
To: Lopez, Dan; R help (r-help@r-project.org)
Subject: Re: [R] How do I identify non-sequential data?

Dan,

Does this do it?

## where dt is the data
tmp <- split(dt, dt$ID)
foo <- lapply(tmp, function(x) any(diff(x$YoS) > 1))
foo <- data.frame( ID=names(foo), gap=unlist(foo))

Note that I ignored dept.

Little hard to see how YoS can increase by more than one when the year increases by only one ... unless this is a search for erroneous data.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 11/21/13 3:32 PM, "Lopez, Dan" <lopez...@llnl.gov> wrote:

>Hi R Experts,
>
>About the data:
>My data consists of people (ID) with years of service (YoS) for each
>year. An ID can appear multiple times.
>The data is sorted by ID then by Year.
>
>Problem:
>I need to extract ID data with non-sequential YoS rows. For example
>below that would be all rows for ID 33 and 16 since they have a
>non-sequential YoS.
>To accomplish this I figured I could create a column called 'CheckVal'
>that takes current row YoS minus previous row YoS. The first instance
>for each ID will be 0. 'CheckVal' in the below data set was created in Excel.
>I want to know how to do this in R.
>Is there a package I can use or specific function or set of functions I
>can use to accomplish this?
>
>#My data looks like:
>> testSeq
>
>   ID Year YoS CheckVal dept
>1  12 2010 1.1      0.0    A
>2  12 2011 2.1      1.0    A
>3  44 2009 1.4      0.0    C
>4  44 2010 2.4      1.0    C
>5  44 2011 3.4      1.0    B
>6  33 2009 2.3      0.0    A
>7  33 2010 4.4      2.1    A
>8  16 2009 1.6      0.0    B
>9  16 2010 2.6      1.0    B
>10 16 2011 5.6      3.0    C
>11 16 2012 6.6      1.0    A
>
>#here is dput of data for R
>structure(list(ID = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16,
>16), Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009,
>2010, 2011, 2012), YoS = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4,
>1.6, 2.6, 5.6, 6.6), CheckVal = c(0, 1, 0, 1, 1, 0, 2.1, 0, 1,
>3, 1), dept = structure(c(1L, 1L, 3L, 3L, 2L, 1L, 1L, 2L, 2L,
>3L, 1L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("ID",
>"Year", "YoS", "CheckVal", "dept"), row.names = c(NA, 11L), class = "data.frame")
>
>Dan
>Workforce Analyst
>LLNL
>
>        [[alternative HTML version deleted]]
>
>______________________________________________
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
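
For reference, the 'CheckVal' column described in the original question (current-row YoS minus previous-row YoS, with 0 for the first row of each ID) can be sketched in base R with ave and diff, in the same spirit as the two replies above. This is only a minimal sketch under stated assumptions: the data frame is rebuilt by hand from the dput in the post, the names tol, hasGap and nonSequential are made up for illustration, and the small tolerance is an added safeguard against floating-point noise that neither reply used.

```r
# Rebuild the example data from the dput() in the original post.
# CheckVal is left out here because it is recomputed below.
testSeq <- data.frame(
  ID   = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS  = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6),
  dept = factor(c("A", "A", "C", "C", "B", "A", "A", "B", "B", "C", "A"))
)

# Assumption (not in the thread): a small tolerance so that a difference
# that is 1 up to rounding error is not reported as a gap.
tol <- 1e-8

# Current-row YoS minus previous-row YoS within each ID;
# the first row of each ID gets 0.
testSeq$CheckVal <- ave(testSeq$YoS, testSeq$ID,
                        FUN = function(x) c(0, diff(x)))

# TRUE for every row of an ID that has at least one jump larger than 1.
# ave() returns numeric 0/1 here, so compare against 1.
hasGap <- ave(testSeq$CheckVal, testSeq$ID,
              FUN = function(x) any(x > 1 + tol)) == 1

# All rows belonging to IDs with a non-sequential YoS.
nonSequential <- testSeq[hasGap, ]
print(nonSequential)
```

On the example data this keeps rows 6 through 11, i.e. all rows for ID 33 and ID 16, which is what the question asked for. It assumes, as stated in the post, that the data are already sorted by ID and then by Year.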