Here's one way to do part 1:
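# run-length encode the binary column; each run is one block
rr = rle(Table[,'binary'])
# row index just past the end of each run
cc = cumsum(rr$lengths)+1
# first row of each run (drop the index that falls past the last row)
thestarts = c(1,cc[cc<=nrow(Table)])
# last row of each run
theends = cc-1
# note: runs are taken on 'binary' alone, so a run that continues across a
# chromosome boundary is not split there
answer = cbind(Table[thestarts,'Chromosome'],Table[thestarts,'start'],
               Table[theends,'start'],rr$values)
answer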
[,1] [,2] [,3] [,4]
[1,] 1 12 18 1
[2,] 1 20 36 0
[3,] 2 12 16 1
[4,] 2 17 19 0
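Because rle() above looks only at the binary column, a block at the end of one chromosome and a block at the start of the next would be merged into one row if they share the same binary value. A minimal sketch that avoids this, assuming the Table layout from the question, runs rle() on a combined Chromosome/binary key instead. block_table() is a hypothetical helper name, not something posted in the thread:

```r
# block_table() is a hypothetical helper (not from the thread).
# A new block starts wherever either Chromosome or binary changes.
block_table <- function(tab) {
  key <- paste(tab[, "Chromosome"], tab[, "binary"])
  runs <- rle(key)
  ends <- cumsum(runs$lengths)          # last row of each run
  starts <- ends - runs$lengths + 1     # first row of each run
  cbind(Chromosome2    = tab[starts, "Chromosome"],
        position_start = tab[starts, "start"],
        position_end   = tab[ends, "start"],
        binary2        = tab[starts, "binary"])
}

# Example data from the question
binary <- c(1,1,1,0,0,0,1,1,1,0,0)
Chromosome <- c(1,1,1,1,1,1,2,2,2,2,2)
start <- c(12,17,18,20,25,36,12,15,16,17,19)
Table <- cbind(Chromosome, start, binary)
block_table(Table)
```

On the example data this gives the same four blocks as above, now with the column names from the question; the difference only shows up when a binary run crosses a chromosome boundary.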
If I understand you correctly, here's a way to do part 2:
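# second data set: chromosome in column 1, position in column 2
Next = matrix(c(rep(1,12),rep(2,8),
                c(12,13,14,15,16,18,20,21,22,23,25,35,12,13,14,15,16,17,18,19)),ncol=2)
# for each row, return the binary value (column 4) of the block on the same
# chromosome whose start..end range contains the position; a position outside
# every block gives numeric(0), and apply() then returns a list instead
apply(Next,1,function(x)answer[answer[,1]==x[1] & x[2] >= answer[,2] &
                               x[2] <= answer[,3],4])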
[1] 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0
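The apply() call scans every block for every row, which may get slow with 2*10^6 rows. As an alternative sketch (a swapped-in technique, not what was posted in the thread), base R's findInterval() can find, per chromosome, the last block that starts at or before each position; the position is then checked against that block's end. assign_binary() is a hypothetical helper; it assumes blocks do not overlap within a chromosome and returns NA for positions that fall in no block:

```r
# assign_binary() is a hypothetical helper (not from the thread).
# blocks: matrix with columns chromosome, block start, block end, binary value
# chrom, pos: chromosome and position of each row of the second data set
# Assumes blocks do not overlap within a chromosome.
assign_binary <- function(blocks, chrom, pos) {
  out <- rep(NA_real_, length(pos))        # NA = position lies in no block
  for (ch in unique(chrom)) {
    b <- blocks[blocks[, 1] == ch, , drop = FALSE]
    if (nrow(b) == 0) next                 # no blocks on this chromosome
    b <- b[order(b[, 2]), , drop = FALSE]  # findInterval() needs sorted starts
    idx <- which(chrom == ch)
    k <- findInterval(pos[idx], b[, 2])    # last block starting at or before pos
    hit <- k > 0
    hit[hit] <- pos[idx][hit] <= b[k[hit], 3]  # ...and pos must not pass its end
    out[idx[hit]] <- b[k[hit], 4]
  }
  out
}

# Usage with the part 1 result and the second data set from the thread
answer <- cbind(c(1, 1, 2, 2), c(12, 20, 12, 17), c(18, 36, 16, 19), c(1, 0, 1, 0))
Next <- matrix(c(rep(1, 12), rep(2, 8),
                 12, 13, 14, 15, 16, 18, 20, 21, 22, 23, 25, 35,
                 12, 13, 14, 15, 16, 17, 18, 19), ncol = 2)
assign_binary(answer, Next[, 1], Next[, 2])
# 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0
```

The loop runs once per chromosome rather than once per row, and always returns a plain numeric vector, even when some positions fall in gaps between blocks.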
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu
On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 <nielsvande...@live.be> wrote:
My question is twofold.
Part 1:
My data looks like this:
(This is an example set; the real data has 2*10^6 rows.)
binary<-c(1,1,1,0,0,0,1,1,1,0,0)
Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
start<-c(12,17,18,20,25,36,12,15,16,17,19)
Table<-cbind(Chromosome,start,binary)
Chromosome start binary
[1,] 1 12 1
[2,] 1 17 1
[3,] 1 18 1
[4,] 1 20 0
[5,] 1 25 0
[6,] 1 36 0
[7,] 2 12 1
[8,] 2 15 1
[9,] 2 16 1
[10,] 2 17 0
[11,] 2 19 0
As output I need a short list with one row per binary block, giving the
starting and ending position of each block.
For this example it would look like this:
Chromosome2 position_start position_end binary2
[1,] 1 12 18 1
[2,] 1 20 36 0
[3,] 2 12 16 1
[4,] 2 17 19 0
Part 2:
Based on the output of part 1, I need to assign the binary value to the rows
of another data set. If the position of a row in this second data set falls
within one of the blocks defined in the short list from part 1, the binary
value of that block should be assigned to an extra column for this row. This
would look something like this:
Chromosome3 position Value binary3
[1,] "1" "12" "a" "1"
[2,] "1" "13" "b" "1"
[3,] "1" "14" "c" "1"
[4,] "1" "15" "d" "1"
[5,] "1" "16" "e" "1"
[6,] "1" "18" "f" "1"
[7,] "1" "20" "g" "0"
[8,] "1" "21" "h" "0"
[9,] "1" "22" "i" "0"
[10,] "1" "23" "j" "0"
[11,] "1" "25" "k" "0"
[12,] "1" "35" "l" "0"
[13,] "2" "12" "m" "1"
[14,] "2" "13" "n" "1"
[15,] "2" "14" "o" "1"
[16,] "2" "15" "p" "1"
[17,] "2" "16" "q" "1"
[18,] "2" "17" "s" "0"
[19,] "2" "18" "d" "0"
[20,] "2" "19" "f" "0"
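The quoted entries in the table above suggest a character matrix: cbind() with the character Value column turns every column into character. A sketch under the assumption that a data.frame is acceptable for the second data set keeps position numeric and loops over the few short-list blocks rather than over every row. The names shortlist and second are hypothetical:

```r
# Part 1 short list, as in the question (hypothetical name: shortlist)
shortlist <- data.frame(Chromosome2    = c(1, 1, 2, 2),
                        position_start = c(12, 20, 12, 17),
                        position_end   = c(18, 36, 16, 19),
                        binary2        = c(1, 0, 1, 0))

# Second data set from the question (hypothetical name: second);
# a data.frame keeps position numeric while Value stays character
second <- data.frame(
  Chromosome3 = rep(c(1, 2), c(12, 8)),
  position = c(12, 13, 14, 15, 16, 18, 20, 21, 22, 23, 25, 35,
               12, 13, 14, 15, 16, 17, 18, 19),
  Value = c(letters[1:17], "s", "d", "f"),
  stringsAsFactors = FALSE)

# Start with NA so positions outside every block stay unassigned
second$binary3 <- NA_real_
# Loop over the (few) blocks rather than the (many) rows
for (i in seq_len(nrow(shortlist))) {
  in_block <- second$Chromosome3 == shortlist$Chromosome2[i] &
              second$position >= shortlist$position_start[i] &
              second$position <= shortlist$position_end[i]
  second$binary3[in_block] <- shortlist$binary2[i]
}
second
```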
Many thanks in advance,
Niels
--
View this message in context:
http://r.789695.n4.nabble.com/Record-row-values-every-time-the-binary-value-in-a-collumn-changes-tp3462496p3462496.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?