Hi: Here are a couple more options using packages plyr and data.table. The labels in the second part are changed because they didn't make sense in a 2M line file (well, mine may not either, but it's a start). You can always change them to something more pertinent.
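A caveat on the summaries below: grouping by chromosome and binary gives one row per (chromosome, binary) pair. That matches the example, where each binary value forms a single contiguous run per chromosome, but in a 2M-row file the same value will probably recur in several separate blocks. A minimal base-R sketch that numbers the runs first, assuming the rows are already sorted by chromosome and start (the names `new_block`, `block` and `blocks` are illustrative and do not come from the original question):

```r
# Example data from the original question
binary     <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0)
Chromosome <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
start      <- c(12, 17, 18, 20, 25, 36, 12, 15, 16, 17, 19)
Table <- data.frame(chromosome = Chromosome, start, binary)

# A new block begins on the first row and wherever chromosome or binary
# differs from the previous row (assumes rows sorted by chromosome, start)
n <- nrow(Table)
new_block <- c(TRUE,
               Table$chromosome[-1] != Table$chromosome[-n] |
               Table$binary[-1]     != Table$binary[-n])
Table$block <- cumsum(new_block)   # run id: 1, 2, 3, ...

# One row per run: first chromosome/binary value, min and max of start
blocks <- data.frame(
  chromosome     = as.vector(tapply(Table$chromosome, Table$block, function(x) x[1])),
  position_start = as.vector(tapply(Table$start,      Table$block, min)),
  position_end   = as.vector(tapply(Table$start,      Table$block, max)),
  binary         = as.vector(tapply(Table$binary,     Table$block, function(x) x[1]))
)
blocks
```

The same run id could also be used as the grouping variable in the ddply() or data.table calls in place of binary.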

# Question 1:
Table <- data.frame(binary, chromosome = Chromosome, start)

library(plyr)
(df <- ddply(Table, .(chromosome, binary), summarise,
             position_start = min(start), position_end = max(start)))
  chromosome binary position_start position_end
1          1      0             20           36
2          1      1             12           18
3          2      0             17           19
4          2      1             12           16

library(data.table)
dTable <- data.table(Table, key = 'chromosome, binary')
(dt <- dTable[, list(position_start = min(start), position_end = max(start)),
              by = 'chromosome, binary'])
     chromosome binary position_start position_end
[1,]          1      0             20           36
[2,]          1      1             12           18
[3,]          2      0             17           19
[4,]          2      1             12           16

## Question 2:
For plyr, it's easy to write a function that takes a generic input
data frame (in this case, a single line) and then outputs a data frame
with positions and labels.

tfun <- function(df) {
    diff <- with(df, position_end - position_start + 1)
    position <- with(df, seq(position_start, position_end))
    value <- paste(df$chromosome, df$binary, letters[1:diff], sep = '.')
    data.frame(chromosome = df$chromosome, position, value, binary = df$binary)
}

# Then:
> ddply(df, .(chromosome, binary), tfun)
   chromosome position value binary
1           1       20 1.0.a      0
2           1       21 1.0.b      0
3           1       22 1.0.c      0
4           1       23 1.0.d      0
5           1       24 1.0.e      0
6           1       25 1.0.f      0
7           1       26 1.0.g      0
8           1       27 1.0.h      0
9           1       28 1.0.i      0
10          1       29 1.0.j      0
11          1       30 1.0.k      0
12          1       31 1.0.l      0
13          1       32 1.0.m      0
14          1       33 1.0.n      0
15          1       34 1.0.o      0
16          1       35 1.0.p      0
17          1       36 1.0.q      0
18          1       12 1.1.a      1
19          1       13 1.1.b      1
20          1       14 1.1.c      1
21          1       15 1.1.d      1
22          1       16 1.1.e      1
23          1       17 1.1.f      1
24          1       18 1.1.g      1
25          2       17 2.0.a      0
26          2       18 2.0.b      0
27          2       19 2.0.c      0
28          2       12 2.1.a      1
29          2       13 2.1.b      1
30          2       14 2.1.c      1
31          2       15 2.1.d      1
32          2       16 2.1.e      1

# For data.table, one can apply the internals of tfun directly:
dt[, list(chromosome = chromosome,
          position = seq(position_start, position_end),
          value = paste(chromosome, binary,
                        letters[1:(position_end - position_start + 1)], sep = '.'),
          binary = binary),
   by = 'chromosome, binary']
   chromosome binary chromosome.1 position value binary.1
            1      0            1       20 1.0.a        0
            1      0            1       21 1.0.b        0
            1      0            1       22 1.0.c        0
            1      0            1       23 1.0.d        0
            1      0            1       24 1.0.e        0
            1      0            1       25 1.0.f        0
            1      0            1       26 1.0.g        0
            1      0            1       27 1.0.h        0
            1      0            1       28 1.0.i        0
            1      0            1       29 1.0.j        0
            1      0            1       30 1.0.k        0
            1      0            1       31 1.0.l        0
            1      0            1       32 1.0.m        0
            1      0            1       33 1.0.n        0
            1      0            1       34 1.0.o        0
            1      0            1       35 1.0.p        0
            1      0            1       36 1.0.q        0
            1      1            1       12 1.1.a        1
            1      1            1       13 1.1.b        1
            1      1            1       14 1.1.c        1
            1      1            1       15 1.1.d        1
            1      1            1       16 1.1.e        1
            1      1            1       17 1.1.f        1
            1      1            1       18 1.1.g        1
            2      0            2       17 2.0.a        0
            2      0            2       18 2.0.b        0
            2      0            2       19 2.0.c        0
            2      1            2       12 2.1.a        1
            2      1            2       13 2.1.b        1
            2      1            2       14 2.1.c        1
            2      1            2       15 2.1.d        1
            2      1            2       16 2.1.e        1
cn chromosome binary   chromosome position value   binary

HTH,
Dennis

On Wed, Apr 20, 2011 at 2:01 AM, baboon2010 <nielsvande...@live.be> wrote:
> My question is twofold.
>
> Part 1:
> My data looks like this:
>
> (example set, real data has 2*10^6 rows)
> binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> start<-c(12,17,18,20,25,36,12,15,16,17,19)
> Table<-cbind(Chromosome,start,binary)
>       Chromosome start binary
>  [1,]          1    12      1
>  [2,]          1    17      1
>  [3,]          1    18      1
>  [4,]          1    20      0
>  [5,]          1    25      0
>  [6,]          1    36      0
>  [7,]          2    12      1
>  [8,]          2    15      1
>  [9,]          2    16      1
> [10,]          2    17      0
> [11,]          2    19      0
>
> As output I need a shortlist for each binary block: giving me the starting
> and ending position of each block.
> Which for these example would look like this:
>      Chromosome2 position_start position_end binary2
> [1,]           1             12           18       1
> [2,]           1             20           36       0
> [3,]           2             12           16       1
> [4,]           2             17           19       0
>
> Part 2:
> Based on the output of part 1, I need to assign the binary to rows of
> another data set. If the position value in this second data set falls in one
> of the blocks defined in the shortlist made in part1,the binary value of the
> shortlist should be assigned to an extra column for this row.
> This would look something like this:
>       Chromosome3 position Value binary3
>  [1,] "1"         "12"     "a"   "1"
>  [2,] "1"         "13"     "b"   "1"
>  [3,] "1"         "14"     "c"   "1"
>  [4,] "1"         "15"     "d"   "1"
>  [5,] "1"         "16"     "e"   "1"
>  [6,] "1"         "18"     "f"   "1"
>  [7,] "1"         "20"     "g"   "0"
>  [8,] "1"         "21"     "h"   "0"
>  [9,] "1"         "22"     "i"   "0"
> [10,] "1"         "23"     "j"   "0"
> [11,] "1"         "25"     "k"   "0"
> [12,] "1"         "35"     "l"   "0"
> [13,] "2"         "12"     "m"   "1"
> [14,] "2"         "13"     "n"   "1"
> [15,] "2"         "14"     "o"   "1"
> [16,] "2"         "15"     "p"   "1"
> [17,] "2"         "16"     "q"   "1"
> [18,] "2"         "17"     "s"   "0"
> [19,] "2"         "18"     "d"   "0"
> [20,] "2"         "19"     "f"   "0"
>
>
> Many thanks in advance,
>
> Niels
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Record-row-values-every-time-the-binary-value-in-a-collumn-changes-tp3462496p3462496.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>