Here is an answer to part 1: > binary<-c(1,1,1,0,0,0,1,1,1,0,0) > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2) > start<-c(12,17,18,20,25,36,12,15,16,17,19) > Table<-cbind(Chromosome,start,binary) > # determine where the start/end of each group is > # use indices since the size is large > startEnd <- lapply(split(seq(nrow(Table)) + , list(Table[, "Chromosome"], Table[, 'binary']) + , drop = TRUE + ) + , function(.indx){ + se <- range(.indx) + c(Chromosome2 = unname(Table[se[1L], "Chromosome"]) + , position_start = unname(Table[se[1L], 'start']) + , position_end = unname(Table[se[2L], 'start']) + , binary2 = unname(Table[se[1L], 'binary']) + ) + }) > do.call(rbind, startEnd) Chromosome2 position_start position_end binary2 1.0 1 20 36 0 2.0 2 17 19 0 1.1 1 12 18 1 2.1 2 12 16 1 > >
On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 <nielsvande...@live.be> wrote: > My question is twofold. > > Part 1: > My data looks like this: > > (example set, real data has 2*10^6 rows) > binary<-c(1,1,1,0,0,0,1,1,1,0,0) > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2) > start<-c(12,17,18,20,25,36,12,15,16,17,19) > Table<-cbind(Chromosome,start,binary) > Chromosome start binary > [1,] 1 12 1 > [2,] 1 17 1 > [3,] 1 18 1 > [4,] 1 20 0 > [5,] 1 25 0 > [6,] 1 36 0 > [7,] 2 12 1 > [8,] 2 15 1 > [9,] 2 16 1 > [10,] 2 17 0 > [11,] 2 19 0 > > As output I need a shortlist for each binary block: giving me the starting > and ending position of each block. > Which for these example would look like this: > Chromosome2 position_start position_end binary2 > [1,] 1 12 18 1 > [2,] 1 20 36 0 > [3,] 2 12 16 1 > [4,] 2 17 19 0 > > Part 2: > Based on the output of part 1, I need to assign the binary to rows of > another data set. If the position value in this second data set falls in one > of the blocks defined in the shortlist made in part1,the binary value of the > shortlist should be assigned to an extra column for this row. This would > look something like this: > Chromosome3 position Value binary3 > [1,] "1" "12" "a" "1" > [2,] "1" "13" "b" "1" > [3,] "1" "14" "c" "1" > [4,] "1" "15" "d" "1" > [5,] "1" "16" "e" "1" > [6,] "1" "18" "f" "1" > [7,] "1" "20" "g" "0" > [8,] "1" "21" "h" "0" > [9,] "1" "22" "i" "0" > [10,] "1" "23" "j" "0" > [11,] "1" "25" "k" "0" > [12,] "1" "35" "l" "0" > [13,] "2" "12" "m" "1" > [14,] "2" "13" "n" "1" > [15,] "2" "14" "o" "1" > [16,] "2" "15" "p" "1" > [17,] "2" "16" "q" "1" > [18,] "2" "17" "s" "0" > [19,] "2" "18" "d" "0" > [20,] "2" "19" "f" "0" > > > Many thanks in advance, > > Niels > > -- > View this message in context: > http://r.789695.n4.nabble.com/Record-row-values-every-time-the-binary-value-in-a-collumn-changes-tp3462496p3462496.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.