Here is one approach:

x <- read.table(textConnection("Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0"), header = TRUE, as.is = TRUE)
closeAllConnections()

result <- lapply(c('sample1', 'sample2'), function(.samp){
    # split the rows wherever the sample value changes
    .grps <- split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))

    # combine the list of data frames (one row per group)
    .range <- do.call(rbind, lapply(.grps, function(.set){
        # create a data frame of the results
        data.frame(Sample = .samp
            , Chr = .set$Chr[1L]
            , Start = min(.set$start)
            , End = max(.set$end)
            , Values = .set[[.samp]][1L]
            , Probes = nrow(.set)
        )
    }))
})
# put the list of data frames together
result <- do.call(rbind, result)
result

which prints:

    Sample  Chr    Start      End Values Probes
0  sample1 chr2  9896633 14404502   0.00      4
1  sample1 chr2 14421718 16048724  -0.43      4
2  sample1 chr2 37491676 37703009   0.00      2
01 sample2 chr2  9896633  9896690   0.00      2
11 sample2 chr2 14314039 16048724  -0.35      6
21 sample2 chr2 37491676 37703009   0.00      2
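The code above hard-codes the two sample names, and it only splits where the value changes, so two adjacent runs with the same value on different chromosomes would be merged (the question says Chr can take other values). Below is a minimal sketch that handles both. It assumes every column other than Chr, start and end is a sample column; summarise_runs() is a hypothetical helper name, and it uses rle() on a pasted "Chr value" key instead of split(). Values are compared through their text form, which is fine for numbers read from a file but is an assumption worth checking for computed values.

```r
# Hypothetical helper (not from the original thread): one row per run of
# identical values, for every sample column; a run also ends where Chr changes.
# Assumption: all columns except Chr, start and end are sample columns.
summarise_runs <- function(x,
                           samples = setdiff(names(x), c("Chr", "start", "end"))) {
  pieces <- lapply(samples, function(samp) {
    # rle() on "Chr value" strings: a new run starts when either part changes
    r     <- rle(paste(x$Chr, x[[samp]]))
    last  <- cumsum(r$lengths)          # last row of each run
    first <- last - r$lengths + 1L      # first row of each run
    data.frame(Sample = samp,
               Chr    = x$Chr[first],
               Start  = mapply(function(i, j) min(x$start[i:j]), first, last),
               End    = mapply(function(i, j) max(x$end[i:j]), first, last),
               Values = x[[samp]][first],
               Probes = r$lengths,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)         # stack the per-sample data frames
  rownames(out) <- NULL                 # plain 1..n row names
  out
}

x <- read.table(text = "Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0", header = TRUE, stringsAsFactors = FALSE)

result <- summarise_runs(x)
print(result)
```

On this data it gives the same six rows as the output above. With 150 sample columns the call is unchanged; pass samples = to restrict it to a subset.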
On Mon, Sep 26, 2011 at 10:30 AM, sujitha <virith...@gmail.com> wrote:
> Hi group,
>
> This is what my test file looks like:
> Chr start end sample1 sample2
> chr2 9896633 9896683 0 0
> chr2 9896639 9896690 0 0
> chr2 14314039 14314098 0 -0.35
> chr2 14404467 14404502 0 -0.35
> chr2 14421718 14421777 -0.43 -0.35
> chr2 16031710 16031769 -0.43 -0.35
> chr2 16036178 16036237 -0.43 -0.35
> chr2 16048665 16048724 -0.43 -0.35
> chr2 37491676 37491735 0 0
> chr2 37702947 37703009 0 0
>
> This is the output that I am expecting:
> Sample Chr Start End Values Probes
> sample1 chr2 9896633 14404502 0 4
> sample1 chr2 14421718 16048724 -0.43 4
> sample1 chr2 37491676 37703009 0 2
> sample2 chr2 9896633 9896690 0 2
> sample2 chr2 14314039 16048724 -0.35 6
> sample2 chr2 37491676 37703009 0 2
>
> Here Chr is the same on every row, but it can take other values as well, so
> it should be the unique Chr within each run of identical values. Start is the
> smallest start in a run of identical values (the first 4 rows for sample1),
> End is the largest end in that run, Values is the value shared by the run,
> and Probes is the number of rows in the run.
>
> Code:
> m <- read.table("test.txt", sep='\t', header=TRUE,
>                 colClasses=c('character','integer','integer','numeric','numeric'))
> # reading the test file
> s <- data.frame(c(rle(m$sample1)[[2]], rle(m$sample2)[[2]]),
>                 c(rle(m$sample1)[[1]], rle(m$sample2)[[1]]))
> # to get the last 2 columns
> names(s) = c("Values","Probes")
> G = 1
> for(i in 1:length(s$Probes)){
>     if(G==1){
>         first <- unique(m$Chr[G:s$Probes[i]])
>         second <- min(m$start[G:s$Probes[i]])
>         third <- max(m$end[G:s$Probes[i]])
>         c <- cbind(first, second, third, s$Values[i], s$Probes[i])
>         print(c)
>         G = (G + s$Probes[i])
>     } else if((G-1) < length(m$sample1)) {
>         first <- unique(m$Chr[G:(G+s$Probes[i]-1)])
>         second <- min(m$start[G:(G+s$Probes[i]-1)])
>         third <- max(m$end[G:(G+s$Probes[i]-1)])
>         c <- cbind(first, second, third, s$Values[i], s$Probes[i])
>         print(c)
>         G = (G + s$Probes[i])
>     } else {
>         G = 1
>         first <- unique(m$Chr[G:s$Probes[i]])
>         second <- min(m$start[G:s$Probes[i]])
>         third <- max(m$end[G:s$Probes[i]])
>         c <- cbind(first, second, third, s$Values[i], s$Probes[i])
>         print(c)
>         G = (G + s$Probes[i])
>     }
> }
>
> So the output is:
>      first  second     third
> [1,] "chr2" "9896633"  "14404502" "0"     "4"
>      first  second     third
> [1,] "chr2" "14421718" "16048724" "-0.43" "4"
>      first  second     third
> [1,] "chr2" "37491676" "37703009" "0"     "2"
>      first  second     third
> [1,] "chr2" "9896633"  "9896690"  "0"     "2"
>      first  second     third
> [1,] "chr2" "14314039" "16048724" "-0.35" "6"
>      first  second     third
> [1,] "chr2" "37491676" "37703009" "0"     "2"
>
> I get almost the required output, but I need 3 modifications to this code:
> 1) This is only a small part of the file (with 2 samples); my actual file
> has 150 samples. How do I write the rle calls for all of them?
> 2) How do I store all the computed c values in a data frame (here I am
> just printing them)?
> 3) How do I include the sample name in the output?
>
> Waiting for your reply,
> Thanks,
> Suji
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/How-to-Store-the-executed-values-in-a-dataframe-rle-function-tp3843944p3843944.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
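If you would rather keep the rle() loop from the question, its three open points (all samples, a data frame instead of printed matrices, the sample name) can be handled by looping over the sample columns, restarting the row counter G for each sample, and collecting one single-row data frame per run in a list that is bound together once at the end. This is a sketch under assumptions: the sample columns are taken to be all columns after Chr, start and end, and, like the original loop, it does not split a run where Chr changes.

```r
# Sketch: the question's rle() loop reorganised to (1) cover every sample
# column, (2) store each run instead of printing it, (3) record the sample name.
m <- read.table(text = "Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0", header = TRUE, stringsAsFactors = FALSE)

samples <- names(m)[-(1:3)]   # assumption: sample columns follow Chr, start, end
out <- list()                 # one single-row data frame per run
for (samp in samples) {
  r <- rle(m[[samp]])         # runs of identical values in this sample
  G <- 1                      # first row of the current run; restarts per sample
  for (k in seq_along(r$lengths)) {
    rows <- G:(G + r$lengths[k] - 1)
    out[[length(out) + 1]] <- data.frame(
      Sample = samp,
      Chr    = m$Chr[G],
      Start  = min(m$start[rows]),
      End    = max(m$end[rows]),
      Values = r$values[k],
      Probes = r$lengths[k],
      stringsAsFactors = FALSE)
    G <- G + r$lengths[k]     # move to the first row of the next run
  }
}
result <- do.call(rbind, out) # bind all rows once, at the end
print(result)
```

Binding once at the end avoids growing a data frame inside the loop, which gets slow as the number of samples and runs increases.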