Background: Our research group collected data from students via the web about their drinking habits (alcohol) over the last 90 days. As you might guess, some students seem to have lost interest partway through and filled in only part of the information. Unfortunately, the survey was programmed to "pre-populate" the fields with zeroes (to make it easier for students to complete).

Obviously, when we see a stretch of zeroes, we have no idea whether this is "true" data or not, but we'd like to at least do some sensitivity analyses by dropping "trailing zeroes" (i.e., cases where there are non-zero responses for some stretch of the data that then "flat line" into all zeroes through the end of the time period).

I've included a toy dataset below.

Basically, we have the data in the "long" format, and what I'd like to do is subset the data.frame by deleting the rows at the end of a person's data that are all zeroes. In a nutshell: starting at the end of a person's data (which, below, would be time = 10) and working backwards, select the rows that are continuously zero, stopping at the first non-zero.

With the toy data, this would be the last 6 rows of ids #10 and #8 (for example). I can begin to think about how I might do this via grep/regexp but am a bit stumped about how to translate that to this type of data.

Any thoughts appreciated.

cheers, Dave

### toy dataset
set.seed(123)
toy.df <- data.frame(id = factor(rep(1:10, each = 10)),
                     time = rep(1:10, 10),
                     dv = rnbinom(100, mu = 0.5, size = 100))
toy.df

library(lattice)

xyplot(dv ~ time | id, data = toy.df, type = c("g","l"))
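
To make the rule described above concrete, here is a minimal base R sketch of one way it might be written. It is only an illustration, not a tested solution for the real survey data: the helper name drop_trailing_zeros and the small check.df data frame are hypothetical, and the sketch assumes columns named id, time and dv as in toy.df. Note that under this rule a person whose responses are all zeroes is dropped entirely, which may or may not be what is wanted.

```r
# Hypothetical helper (an illustration only): drop each person's trailing
# run of zeroes, keeping everything up to their last non-zero dv.
# Assumes columns named id, time, dv, as in toy.df.
drop_trailing_zeros <- function(df) {
  # sort by id, then time, so "trailing" means "latest in time"
  df <- df[order(df$id, df$time), ]
  # for each row, count the non-zero dv values from that row to the end
  # of the person's series; a count of 0 marks a trailing zero
  nonzero_ahead <- ave(as.numeric(df$dv != 0), df$id,
                       FUN = function(x) rev(cumsum(rev(x))))
  df[nonzero_ahead > 0, , drop = FALSE]
}

# Small hand-made check: person "a" ends in two zeroes, person "b" ends
# on a non-zero value, and person "c" is all zeroes
check.df <- data.frame(id   = factor(rep(c("a", "b", "c"), each = 4)),
                       time = rep(1:4, 3),
                       dv   = c(0, 1, 0, 0,  0, 2, 0, 3,  0, 0, 0, 0))
trimmed.df <- drop_trailing_zeros(check.df)
```

In check.df, the leading and interior zeroes are kept and only the final flat-lined stretch is removed. On the toy data the same call would be drop_trailing_zeros(toy.df), and the result could be passed to the xyplot call in place of toy.df.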

--
Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datk...@u.washington.edu