> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of jim holtman > Sent: Wednesday, April 20, 2011 9:59 AM > To: baboon2010 > Cc: r-help@r-project.org > Subject: Re: [R] 'Record' row values every time the binary > value in acollumn changes > > Here is an answer to part 1: > > > binary<-c(1,1,1,0,0,0,1,1,1,0,0) > > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2) > > start<-c(12,17,18,20,25,36,12,15,16,17,19) > > Table<-cbind(Chromosome,start,binary) > > # determine where the start/end of each group is > > # use indices since the size is large > > startEnd <- lapply(split(seq(nrow(Table)) > + , list(Table[, "Chromosome"], Table[, > 'binary']) > + , drop = TRUE > + ) > + , function(.indx){ > + se <- range(.indx) > + c(Chromosome2 = unname(Table[se[1L], "Chromosome"]) > + , position_start = unname(Table[se[1L], 'start']) > + , position_end = unname(Table[se[2L], 'start']) > + , binary2 = unname(Table[se[1L], 'binary']) > + ) > + }) > > do.call(rbind, startEnd) > Chromosome2 position_start position_end binary2 > 1.0 1 20 36 0 > 2.0 2 17 19 0 > 1.1 1 12 18 1 > 2.1 2 12 16 1
The following will likely be quicker way to find where a column changes values than that lapply() when there are lots of rows: f1 <- function (Table) { isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)]) isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE) with(data.frame(Table), { first <- isFirstInRun(binary) last <- isLastInRun(binary) cbind(Chromosome2 = Chromosome[first], position_start = start[first], position_end = start[last], binary2 = binary[first]) }) } E.g., > f1(Table) Chromosome2 position_start position_end binary2 [1,] 1 12 18 1 [2,] 1 20 36 0 [3,] 2 12 16 1 [4,] 2 17 19 0 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > > > > > > > On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 > <nielsvande...@live.be> wrote: > > My question is twofold. > > > > Part 1: > > My data looks like this: > > > > (example set, real data has 2*10^6 rows) > > binary<-c(1,1,1,0,0,0,1,1,1,0,0) > > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2) > > start<-c(12,17,18,20,25,36,12,15,16,17,19) > > Table<-cbind(Chromosome,start,binary) > > Chromosome start binary > > [1,] 1 12 1 > > [2,] 1 17 1 > > [3,] 1 18 1 > > [4,] 1 20 0 > > [5,] 1 25 0 > > [6,] 1 36 0 > > [7,] 2 12 1 > > [8,] 2 15 1 > > [9,] 2 16 1 > > [10,] 2 17 0 > > [11,] 2 19 0 > > > > As output I need a shortlist for each binary block: giving > me the starting > > and ending position of each block. > > Which for these example would look like this: > > Chromosome2 position_start position_end binary2 > > [1,] 1 12 18 1 > > [2,] 1 20 36 0 > > [3,] 2 12 16 1 > > [4,] 2 17 19 0 > > > > Part 2: > > Based on the output of part 1, I need to assign the binary > to rows of > > another data set. If the position value in this second data > set falls in one > > of the blocks defined in the shortlist made in part1,the > binary value of the > > shortlist should be assigned to an extra column for this > row. This would > > look something like this: > > Chromosome3 position Value binary3 > > [1,] "1" "12" "a" "1" > > [2,] "1" "13" "b" "1" > > [3,] "1" "14" "c" "1" > > [4,] "1" "15" "d" "1" > > [5,] "1" "16" "e" "1" > > [6,] "1" "18" "f" "1" > > [7,] "1" "20" "g" "0" > > [8,] "1" "21" "h" "0" > > [9,] "1" "22" "i" "0" > > [10,] "1" "23" "j" "0" > > [11,] "1" "25" "k" "0" > > [12,] "1" "35" "l" "0" > > [13,] "2" "12" "m" "1" > > [14,] "2" "13" "n" "1" > > [15,] "2" "14" "o" "1" > > [16,] "2" "15" "p" "1" > > [17,] "2" "16" "q" "1" > > [18,] "2" "17" "s" "0" > > [19,] "2" "18" "d" "0" > > [20,] "2" "19" "f" "0" > > > > > > Many thanks in advance, > > > > Niels > > > > -- > > View this message in context: > http://r.789695.n4.nabble.com/Record-row-values-every-time-the -binary-value-in-a-collumn-changes-tp3462496p3462496.html > > Sent from the R help mailing list archive at Nabble.com. > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > -- > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve? > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.