Hi,
You may also try:

set.seed(425)
##your code
tmp <- data.frame(....
#####
tmp1 <- tmp
str(tmp1)
#'data.frame':    1000 obs. of  3 variables:
# $ X1: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
# $ X2: Factor w/ 127 levels "1","10","100",..: 1 1 1 1 1 1 1 1 2 2 ...
# $ X3: Factor w/ 56 levels "01.01.1990","01.01.1991",..: 1 21 17 37 33 51 48 10 11 45
#...
tmp1 <- tmp1[with(tmp1,order(X2, as.Date(X3, "%d.%m.%Y"))),]
tmp2 <- tmp1[with(tmp1,!ave(as.numeric(as.character(X1)),X2, FUN=function(x) cumsum(cumsum(x)) >1 )),]
###checking results with Jim's method
tmp2New <- tmp2
tmp2New$X3 <- as.Date(tmp2New$X3, "%d.%m.%Y")
identical(tmp2New,newtmp) ##Jim's result
#[1] TRUE

A.K.


On Saturday, April 26, 2014 12:07 AM, Jim Lemon <j...@bitwrit.com.au> wrote:
On 04/26/2014 12:42 PM, Jennifer Sabatier wrote:
> So, I know that's a confusing Subject header.
>
> Here's similar data:
>
>
> tmp<- data.frame(matrix(
>                         c(rbinom(1000, 1, .03),
>                           array(1:127, c(1000,1)),
>                           array(format(seq(ISOdate(1990,1,1), by='month',
> length=56), format='%d.%m.%Y'), c(1000,1))),
>                         ncol=3))
> tmp<- tmp[with(tmp, order(X2, X3)), ]
> table(tmp$X1)
>
>
> X1 is the variable of interest - disease status.  It's a survival-type of
> variable, where you are 0 until you become 1.
> X2 is the person ID variable.
> X3 is the clinic date (here it's monthly, just for example...but in my real
> data it's a bit more complicated - definitely not equally spaced nor the
> same number of visits to the clinic per ID.).
>
> Some people stay X1 = 0 for all clinic visits.  Only a small proportion
> become X1=1.
>
> However, the data has errors I need to clean off.  Once someone becomes
> X1=1 they should have no more rows in the dataset.  These are data entry
> errors.
>
> In my data I have people who continue to have rows in the data.  Sometimes
> the rows show X1=0 and sometimes X1=1.  Sometimes there's just one more row
> and sometimes there are many more rows.
>
> How can I go through, find the first X1 = 1, and then delete any rows after
> that, for each value of X2?
>
> Thanks!
>
> Jen
>

Hi Jen,
This might do what you want:

tmp$X3<-as.Date(tmp$X3,"%d.%m.%Y")
tmp<-tmp[order(tmp$X2,tmp$X3),]
first<-TRUE
for(patno in unique(tmp$X2)) {
  cat(patno,"\n")
  tmpbit<-tmp[tmp$X2 == patno,]
  firstone<-which(tmpbit$X1 == 1)[1]
  cat(firstone,"\n")
  if(is.na(firstone)) firstone<-dim(tmpbit)[1]
  newtmpbit<-tmpbit[1:firstone,]
  if(first) {
   newtmp<-newtmpbit
   first<-FALSE
  }
  else newtmp<-rbind(newtmp,newtmpbit)
}

Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
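
For anyone reading this thread later: the cumsum(cumsum(x)) > 1 test in the first reply may be easier to follow on a handful of rows. Below is a minimal sketch, not anything posted by the original authors. The toy data frame and the helper name drop_after_first_event are made up for illustration; the columns follow the X1/X2/X3 layout from the question, except that X1 is kept numeric and X3 is already a Date, so no factor conversion is needed.

```r
# Toy data, invented for illustration: three IDs with monthly visits.
toy <- data.frame(
  X1 = c(0, 0, 1, 0, 1,   0, 0, 0,   1, 1),
  X2 = c(1, 1, 1, 1, 1,   2, 2, 2,   3, 3),
  X3 = as.Date(c("1990-01-01", "1990-02-01", "1990-03-01",
                 "1990-04-01", "1990-05-01",
                 "1990-01-01", "1990-02-01", "1990-03-01",
                 "1990-01-01", "1990-02-01"))
)

# Hypothetical helper (not from the thread): for each ID, keep rows up to
# and including the first X1 == 1 and drop every later row.
drop_after_first_event <- function(d) {
  # Sort by ID, then by visit date, so "after" means later in time.
  d <- d[order(d$X2, d$X3), ]
  # Within one ID, cumsum(x) is 0 before the first 1 and >= 1 from then on,
  # so cumsum(cumsum(x)) is 0 before, exactly 1 at the first 1, and >= 2 on
  # every later row. ave() returns the result as 0/1 in the original row order.
  after <- ave(d$X1, d$X2, FUN = function(x) cumsum(cumsum(x)) > 1)
  d[!after, ]
}

cleaned <- drop_after_first_event(toy)
print(cleaned)
```

With this toy input, ID 1 keeps its first three visits, ID 2 (never X1 = 1) keeps all three, and ID 3 keeps only its first visit, so 7 of the 10 rows remain.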