Hi,

Welcome to R and the help list!
On Mon, Nov 1, 2010 at 12:34 PM, blurg <ian.jh...@gmail.com> wrote:
>
> I have a data set similar to the set below, where 1 and 2 indicate test
> results and 0 indicates time points in between where there are no test
> results. I would like to assign the time points leading up to a test
> result the value of that test result.
>
> What I have:   What I want:
> 1              1
> 0              1
> 0              1
> 0              1
> 1              1
> 0              2
> 0              2
> 2              2
> 0              1
> 0              1
> 1              1
> 0              2
> 2              2
>
> I have attempted methods creating a data.frame of the breaks/changes in
> values from 0 to 1 or to 2.
> x<-c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
> x1 <- which(diff(x) == 1)
> x2 <- which(diff(x) == 2)

## Function that *I think* does what you want
myfun <- function(x) {
  dat <- rle(x)
  i <- which(dat$values == 0)
  dat$lengths[i + 1] <- with(dat, lengths[i + 1] + lengths[i])
  return(with(dat, rep(values[-i], lengths[-i])))
}

## Three test pieces of data
x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
y <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
z <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,0)

## your example, works
myfun(x)
## test case 2 (begins with a number), works
myfun(y)
## test case 3 (ends with 0), fails
myfun(z)

So, if things work the way I think they do, that function should do what you need as long as the last value is not 0. That kind of makes sense: what value would those final time points be assigned anyway?

Side note: I created a sample vector with 10 million elements, and it took about 9 seconds to run it through my function.

@list members: I welcome someone checking my work; I'm uneasy about whether a couple of aspects generalize properly.

>
> Whatever the solution, it can't be entered by hand due to the size of the
> dataset (>10 million and change). Any ideas? This is my first time posting
> to this forum and I am relatively new to R, so please don't flame me too
> hard.
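Because the post gives both the input and the desired output, one quick sanity check is to run `myfun()` on that 13-row example and compare. The sketch below copies `myfun()` unchanged from the reply; `have` and `want` are just hypothetical names for the two columns of the post typed in as vectors. Printing `rle(have)` shows why the trick works: every run of 0s is immediately followed by the run holding the next test result, so its length can be folded into that run.

```r
## myfun() copied unchanged from the reply above
myfun <- function(x) {
  dat <- rle(x)
  i <- which(dat$values == 0)
  dat$lengths[i + 1] <- with(dat, lengths[i + 1] + lengths[i])
  return(with(dat, rep(values[-i], lengths[-i])))
}

## "What I have" and "What I want" columns from the original post
have <- c(1, 0, 0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 2)
want <- c(1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2)

## Run-length encoding: each 0-run is followed by a non-zero run
print(rle(have))

## Compare against the desired output; stops with an error if they differ
stopifnot(identical(myfun(have), want))
```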
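On the generalization worry: besides the trailing-zero failure, one more edge case seems worth checking. If `x` contains no zeros at all, `i` is `integer(0)`, and `values[-i]` then selects nothing, so `myfun()` returns an empty vector. A possible alternative, sketched below under the assumption that `x` has no `NA`s, is a "next observation carried backward" fill built from `cumsum()` on the reversed vector. This is a different technique from `myfun()`, not a patch of it, and `fill_back()` and its `trailing` argument are hypothetical names.

```r
## Hypothetical helper (not from the thread): each 0 takes the value of
## the next non-zero entry; 0s after the last result get `trailing`.
## Assumes x contains no NA values.
fill_back <- function(x, trailing = NA) {
  is_result <- x != 0
  results <- x[is_result]               # the test results, in order
  ## k[p] = number of results at or after position p
  k <- rev(cumsum(rev(is_result)))
  ## The next result at or after p is results[length(results) - k[p] + 1];
  ## when k[p] == 0 that index is out of range and gives NA
  out <- results[length(results) - k + 1]
  out[k == 0] <- trailing               # trailing 0s with no later result
  out
}

x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
z <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,0)

fill_back(x)            # same fill as myfun(x)
fill_back(z)            # last four positions come back as NA, no error
fill_back(c(1, 2, 1))   # no zeros: returned unchanged
```

Leaving the trailing points as `NA` rather than inventing a value follows the reply's point that there is no later result to assign them; passing `trailing = 0` would keep them as 0 instead. Everything here is vectorized (`rev()`, `cumsum()`, one subset), so it should scale to the >10 million element case, though I have not timed it.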
Although this list can certainly be tough at times, for your peace of mind: as far as I am concerned, you did pretty much everything right. You described your problem, included a small set of sample data that was easily read into R (for future reference, if you have a more complex object that is not as easy to recreate, dput() will save you and us trouble), and even showed what you tried. Finally, your explanation gave both sample data AND the desired outcome. This gives us a "gold standard" to test our code against, rather than hoping our results match what you described. I am always thrilled when I'm not left re-reading a paragraph-long English explanation of something that could be shown nicely with a few numbers.

> Desperate times call for desperate measures.

As long as you have put forth some effort trying to solve it yourself and taken the time to help us answer your question (as you clearly did here), the help list should not be a desperate measure :)

Cheers,

Josh

> Thanks.
> --
> View this message in context:
> http://r.789695.n4.nabble.com/foreloop-aggregating-time-series-data-into-groups-tp3022667p3022667.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/