To help with the clarification, I renamed 'col1' to 'year' and 'col2' to 'origDat'. With that said...
The reason the second 'block' of 1's (four consecutive 1's appearing in DF$origDat[11:14]) is preserved is because they are only separated by a total of 1 year (1998 in DF$year) from a larger group of consecutive 1's (years 1999 through 2002). Because the first block of 1's are separated from from any other block of ones by at least 2 years, which I have deemed to be too large of a gap in data (0's are a surrogate for missing data), the 1's appearing in DF$year[3:6] should be reset to 0. I modified the script based on David's suggestion of rle (I was previously unaware of it) to that shown below, and it works for all three example DF's provided at the top of the script. That is, after running the script with any of the first 3 DF's provided, the data in DF$finalDat (as compared to DF$origDat) is reflective of what I'm after. HOWEVER, the use of nested while loops and if statements strikes me as antithetical to elegant R scripting. Second, my script, as currently constituted, has a significant bug in that the rules I've set forth are not completely satisfied. If DF4 is used (uncomment the line: "DF <- DF4") the blocks of 1's at the beginning and end of DF$origDat are preserved, whereas the middle (and largest continuous) block of 1's appearing in the middle of DF$origDat are reset to 0. Thus, I think I'm in need of a more elegant way of pursuing this problem...should anyone be so inclined to offer of additional thoughts. The (semi-) working script using rle is: DF <- data.frame(year=rep(1991:2004, each=2), origDat=c(0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1)) #DF <- data.frame(year=rep(1991:2004, each=2), # origDat=c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1)) #DF <- data.frame(year=rep(1991:2004, each=2), # origDat=c(1,1,1,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1)) # An example that doesn't work DF4 <- data.frame(year=rep(1991:2004, each=2), origDat=c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1)) #DF <- DF4 DF$inc <- c(1, abs(diff(DF$origDat))) DF$cumsum <- cumsum(DF$inc) ex1 <- aggregate(year ~ cumsum, data=DF, function(x) length(unique(x))) names(ex1) <- c('cumsum','isl') tmp1a <- merge(DF, ex1, by="cumsum", all.x=TRUE) tmp1a$isl2 <- (-1*tmp1a$origDat) * tmp1a$isl tmp1a$isl2[tmp1a$isl2==0] <- tmp1a$isl[tmp1a$isl2==0] tmp1a$isl2 <- -1 * tmp1a$isl2 DF$grpng <- tmp1a$isl2 runlen <- data.frame(cumsum = seq(1:length(rle(DF$grpng)$lengths)), len = rle(DF$grpng)$lengths, val = rle(DF$grpng)$values) i <- 1 while(i <= nrow(runlen)){ if(runlen[i,'val'] >= 2){ # As long as a '-2' or smaller doesn't follow, # then the current group of data is NOT # too 'distant' from other data and should be # preserved. Otherwise, the current grp of # 1's should be reset to 0 j <- i + 1 while(j <= nrow(runlen)){ if(runlen[j,'val'] <= -2){ # If code enters here, then swich the sign of 'val' to # effectively inactivate this block of 1's runlen[i,'val'] <- -1 * runlen[i,'val'] } #print(paste0("j: ",as.character(j))) j <- j + 1 } } else if (runlen[i,'val'] > 0 & runlen[i,'val'] < 2){ # If the script enters here, then the current group of data # doesn't meet the minimum continuous length requirement of # 2 or more years (in this example a check of >0 & <2 seems # silly, but in the real-world dataset 2 will be replaced with # a much larger example. runlen[i,'val'] <- -1 * runlen[i,'val'] } #print(paste0("i: ",as.character(i))) i <- i + 1 } runlen$finalDat <- ifelse(runlen$val < 0, 0, 1) DF <- merge(DF, runlen, by = 'cumsum', all.x = TRUE) DF [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.