Re: [R] Within ID variable delete all rows after reaching a specific value

Jeff Newmiller Sat, 26 Apr 2014 00:35:24 -0700

Jennifer:

a) Don't post in HTML... read the Posting Guide.

b) Don't make data frames by first making matrices... you rarely create what you think you are creating. In your case, your code creates a bunch of factor columns... use the str() function to verify that your data are sensible before analyzing it.


c) The ave and cumsum functions are useful here:

tmp <- data.frame( X1 = rbinom( 1000, 1, .03 )
                 , X2 = array( 1:127, c(1000,1) )
                 , X3 = array( format( seq( ISOdate(1990,1,1)
                                          , by='month'
                                          , length=56 )
                                     , format='%d.%m.%Y')
                             , c( 1000, 1 ) ) )
tmp$X3 <- as.Date( tmp$X3, format='%d.%m.%Y' )  # so X3 sorts by date, not as text
tmp <- tmp[ with( tmp, order( X2, X3 ) ), ]
tmp2 <- subset( tmp
              , 1 >= ave( X1
                        , X2
                        , FUN=function( x ) {
                            cumsum( cumsum( x ) )
                          } ) ) )

which generates a vector of increasing values once the first nonzero value is found in each group, and then only keeps the rows for which those increasing values are zero or one.
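
To see why the double cumsum works, it can help to trace it by hand on a small example. The sketch below uses a hypothetical toy data frame (the names toy, id, status and flag are made up for illustration and are not from the thread), and it assumes the rows are already in visit order within each ID:

```r
# Toy data: three IDs, rows already in visit order within each ID.
toy <- data.frame(
  id     = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  status = c(0, 1, 0, 1, 0, 0, 0, 1, 1)
)

# Within each id, the inner cumsum counts the 1s seen so far. The outer
# cumsum therefore stays at 0 before the first 1, equals 1 on the row of
# the first 1, and is 2 or more on every later row.
toy$flag <- ave(toy$status, toy$id,
                FUN = function(x) cumsum(cumsum(x)))

# Keep rows up to and including the first 1 in each id.
kept <- toy[toy$flag <= 1, ]

print(toy)
print(kept)
```

For id 1 the flag column comes out as 0, 1, 2, 4, so only the first two rows survive; id 2 never reaches 1 and keeps all of its rows.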


On Sat, 26 Apr 2014, Jim Lemon wrote:

On 04/26/2014 12:42 PM, Jennifer Sabatier wrote:

So, I know that's a confusing Subject header.

Here's similar data:


tmp<- data.frame(matrix(
                         c(rbinom(1000, 1, .03),
                           array(1:127, c(1000,1)),
                           array(format(seq(ISOdate(1990,1,1), by='month',
length=56), format='%d.%m.%Y'), c(1000,1))),
                         ncol=3))
tmp<- tmp[with(tmp, order(X2, X3)), ]
table(tmp$X1)
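
A quick way to check what a matrix-based construction like the one above actually produces is str(), as the reply suggests. The following is only a minimal sketch on made-up values (via_matrix and direct are hypothetical names): because a matrix holds a single type, mixing numbers and strings in c() turns everything into text before the data frame is created.

```r
# Building through matrix(): c() of numbers and strings coerces every
# value to character before the data frame exists.
via_matrix <- data.frame(matrix(c(1:3, c("a", "b", "c")), ncol = 2))

# Building the columns directly keeps each column's own type.
direct <- data.frame(X1 = 1:3, X2 = c("a", "b", "c"))

str(via_matrix)   # X1 is no longer numeric
str(direct)       # X1 is integer
```

In R versions before 4.0.0 the default stringsAsFactors = TRUE turns those text columns into factors; from 4.0.0 onwards they stay character. Either way X1 is not numeric.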


X1 is the variable of interest - disease status.  It's a survival-type of
variable, where you are 0 until you become 1.
X2 is the person ID variable.
X3 is the clinic date (here it's monthly, just for example...but in my real
data it's a bit more complicated - definitely not equally spaced nor the
same number of visits to the clinic per ID.).

Some people stay X1 = 0 for all clinic visits.  Only a small proportion
become X1=1.

However, the data has errors I need to clean off.  Once someone becomes
X1=1 they should have no more rows in the dataset.  These are data entry
errors.

In my data I have people who continue to have rows in the data.  Sometimes
the rows show X1=0 and sometimes X1=1.  Sometimes there's just one more row
and sometimes there are many more rows.

How can I go through, find the first X1 = 1, and then delete any rows after
that, for each value of X2?

Thanks!

Jen

Hi Jen,
This might do what you want:

tmp$X3<-as.Date(tmp$X3,"%d.%m.%Y")
tmp<-tmp[order(tmp$X2,tmp$X3),]
first<-TRUE
for(patno in unique(tmp$X2)) {
 cat(patno,"\n")
 # rows for this patient, already in date order
 tmpbit<-tmp[tmp$X2 == patno,]
 # index of the first X1 == 1 (NA if there is none)
 firstone<-which(tmpbit$X1 == 1)[1]
 cat(firstone,"\n")
 # no X1 == 1 for this patient: keep every row
 if(is.na(firstone)) firstone<-dim(tmpbit)[1]
 newtmpbit<-tmpbit[1:firstone,]
 if(first) {
  newtmp<-newtmpbit
  first<-FALSE
 }
 else newtmp<-rbind(newtmp,newtmpbit)
}

Jim
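
The as.Date() conversion at the top of the loop matters because dates stored as text in %d.%m.%Y form sort by day and month first, not chronologically, and "the first X1 == 1" only makes sense once the rows are in true date order. A small sketch with three made-up dates (the vector d is hypothetical):

```r
# Three dates written as text in day.month.year form.
d <- c("01.02.1991", "01.01.1992", "01.12.1990")

# Sorting the strings compares characters left to right, so the month
# decides the order before the year is ever looked at.
text_order <- sort(d)

# Converting to Date first gives chronological order.
date_order <- sort(as.Date(d, format = "%d.%m.%Y"))

print(text_order)
print(format(date_order))
```

Here the text sort puts 01.01.1992 first, while the Date sort correctly starts with December 1990.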

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
