Thanks, it does work for the sample data. When I use it on my actual data, it throws this error:

Error in data.frame(Sample = .samp, Chr = .set$Chr[1L], Start = min(.set$Start),  :
  arguments imply differing number of rows: 1, 0

I am not able to understand why I am getting this.

Waiting for your reply,
Thanks,
Suji

On Wed, Sep 28, 2011 at 4:15 PM, jim holtman <jholt...@gmail.com> wrote:
> I only used textConnection for the sample data.  Just put your file
> name in the read.table; e.g.,
>
> > x<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'))
>
> as you have in your email.  I used 'x' in my code, so I replaced your
> 'm' with 'x'.
>
> Try it and see if it works; no reason it shouldn't.
>
>
>
> On Wed, Sep 28, 2011 at 3:03 PM, viritha k <virith...@gmail.com> wrote:
> > Hi Jim,
> > Thanks for the reply, ok but I don't want to use textConnection and paste
> > each line but want the input to be read from a file like
> >
> > > m<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric').
> > So how do I incorporate that in your code?
> > Thanks,
> > Suji
> > On Wed, Sep 28, 2011 at 2:40 PM, jim holtman <jholt...@gmail.com> wrote:
> >>
> >> The solution that I sent will handle the 150 different samples; just
> >> list the column names in the argument to the top 'lapply'.  You don't
> >> need the 'rle' in my approach.
> >>
> >> On Wed, Sep 28, 2011 at 2:13 PM, viritha k <virith...@gmail.com> wrote:
> >> > Hi,
> >> > This is the code that I wrote for 3 samples:
> >> > code:
> >> >
> >> >> m<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric'))
> >> >>
> >> >> s<-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]]))
> >> >
> >> >> names(s)=c("Values","Probes")
> >> >>
> >> >> c<-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE)
> >> >> G=1
> >> >> n=4
> >> >
> >> >> for(i in 1:length(s$Probes)){
> >> >
> >> > + if(G==1){c[i,1]<-names(m[n])
> >> > + c[i,2]<-unique(m$Chr[G:s$Probes[i]])
> >> > + c[i,3]<-min(m$Start[G:s$Probes[i]])
> >> > + c[i,4]<-max(m$End[G:s$Probes[i]])
> >> > + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> >> >
> >> > + G=(G+s$Probes[i])}
> >> > + else if((G-1) < length(m$Sample1)) {
> >> >
> >> > + c[i,1]<-names(m[n])
> >> > + c[i,2]<-unique(m$Chr[G:(G+s$Probes[i]-1)])
> >> > + c[i,3]<-min(m$Start[G:(G+s$Probes[i]-1)])
> >> > + c[i,4]<-max(m$End[G:(G+s$Probes[i]-1)])
> >> > + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> >> >
> >> > + G=(G+s$Probes[i])}
> >> > + else {
> >> > + G=1
> >> >
> >> > + n=n+1
> >> > + c[i,1]<-names(m[n])
> >> > + c[i,2]<-unique(m$Chr[G:s$Probes[i]])
> >> > + c[i,3]<-min(m$Start[G:s$Probes[i]])
> >> > + c[i,4]<-max(m$End[G:s$Probes[i]])
> >> > + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> >> >
> >> > + G=(G+s$Probes[i])}}
> >> >
> >> >> c
> >> >
> >> >    Sample  Chr     Start      End Values Probes
> >> >
> >> >  1 Sample1 chr2  9896633 14404502      0      4
> >> >  2 Sample1 chr2 14421718 16048724  -0.43      4
> >> >  3 Sample1 chr2 37491676 37703009      0      2
> >> >  4 Sample2 chr2  9896633  9896690      0      2
> >> >  5 Sample2 chr2 14314039 16048724  -0.35      6
> >> >  6 Sample2 chr2 37491676 37703009      0      2
> >> >  7 Sample3 chr2  9896633 14314098      0      3
> >> >  8 Sample3 chr2 14404467 16031769   0.32      3
> >> >  9 Sample3 chr2 16036178 37491735   0.45      3
> >> > 10 Sample3 chr2 37702947 37703009      0      1
> >> >
> >> >
> >> > The problem that I am facing is expanding the rle function for values
> >> > and probes.
> >> > Definitely your code looks simpler, but I would like to read the file by
> >> > just giving the name of the file as written in my code, because my
> >> > original file contains 150 samples. But how do I use lapply or the rle
> >> > function for 150 such samples, if my file contains 150 samples similar
> >> > to sample1 and sample2?
> >> >
> >> > waiting for your reply,
> >> > Thanks,
> >> > Suji
> >> >
> >> > On Wed, Sep 28, 2011 at 11:37 AM, jim holtman <jholt...@gmail.com> wrote:
> >> >>
> >> >> Here is one approach:
> >> >>
> >> >> > x <- read.table(textConnection("Chr start end sample1 sample2
> >> >> + chr2 9896633 9896683 0 0
> >> >> + chr2 9896639 9896690 0 0
> >> >> + chr2 14314039 14314098 0 -0.35
> >> >> + chr2 14404467 14404502 0 -0.35
> >> >> + chr2 14421718 14421777 -0.43 -0.35
> >> >> + chr2 16031710 16031769 -0.43 -0.35
> >> >> + chr2 16036178 16036237 -0.43 -0.35
> >> >> + chr2 16048665 16048724 -0.43 -0.35
> >> >> + chr2 37491676 37491735 0 0
> >> >> + chr2 37702947 37703009 0 0"), header = TRUE, as.is = TRUE)
> >> >> > closeAllConnections()
> >> >> >
> >> >> > result <- lapply(c('sample1', 'sample2'), function(.samp){
> >> >> +     # split by breaks in the values
> >> >> +     .grps <- split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
> >> >> +
> >> >> +     # combine the list of dataframes
> >> >> +     .range <- do.call(rbind, lapply(.grps, function(.set){
> >> >> +         # create a dataframe of the results
> >> >> +         data.frame(Sample = .samp
> >> >> +             , Chr = .set$Chr[1L]
> >> >> +             , Start = min(.set$start)
> >> >> +             , End = max(.set$end)
> >> >> +             , Values = .set[[.samp]][1L]
> >> >> +             , Probes = nrow(.set)
> >> >> +             )
> >> >> +     }))
> >> >> + })
> >> >> > # put the list of dataframes together
> >> >> > result <- do.call(rbind, result)
> >> >> > result
> >> >>     Sample  Chr    Start      End Values Probes
> >> >> 0  sample1 chr2  9896633 14404502   0.00      4
> >> >> 1  sample1 chr2 14421718 16048724  -0.43      4
> >> >> 2  sample1 chr2 37491676 37703009   0.00      2
> >> >> 01 sample2 chr2  9896633  9896690   0.00      2
> >> >> 11 sample2 chr2 14314039 16048724  -0.35      6
> >> >> 21 sample2 chr2 37491676 37703009   0.00      2
> >> >> >
> >> >>
> >> >>
> >> >> On Mon, Sep 26, 2011 at 10:30 AM, sujitha <virith...@gmail.com> wrote:
> >> >> > Hi group,
> >> >> >
> >> >> > This is how my test file looks like:
> >> >> > Chr start end sample1 sample2
> >> >> > chr2 9896633 9896683 0 0
> >> >> > chr2 9896639 9896690 0 0
> >> >> > chr2 14314039 14314098 0 -0.35
> >> >> > chr2 14404467 14404502 0 -0.35
> >> >> > chr2 14421718 14421777 -0.43 -0.35
> >> >> > chr2 16031710 16031769 -0.43 -0.35
> >> >> > chr2 16036178 16036237 -0.43 -0.35
> >> >> > chr2 16048665 16048724 -0.43 -0.35
> >> >> > chr2 37491676 37491735 0 0
> >> >> > chr2 37702947 37703009 0 0
> >> >> >
> >> >> > This is the output that I am expecting:
> >> >> > Sample  Chr  Start    End      Values Probes
> >> >> > sample1 chr2 9896633  14404502 0      4
> >> >> > sample1 chr2 14421718 16048724 -0.43  4
> >> >> > sample1 chr2 37491676 37703001 0      2
> >> >> > sample2 chr2 9896633  9896690  0      2
> >> >> > sample2 chr2 14314039 16048724 -0.35  6
> >> >> > sample2 chr2 37491676 37703009 0      2
> >> >> >
> >> >> > Here the Chr value is the same but can be any other value as well, so
> >> >> > unique among the similar values. The Start for the first line would be
> >> >> > the least value until values are similar (4), then the End would be the
> >> >> > highest value. The Values is the unique value among the common values
> >> >> > and Probes is the number of similar values.
> >> >> >
> >> >> > Code:
> >> >> >> m<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'))
> >> >> > #reading the test file
> >> >> >> s<-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]]))
> >> >> > # to get the last 2 columns
> >> >> >> names(s)=c("Values","Probes")
> >> >> >> G=1
> >> >> >> for(i in 1:length(s$Probes)){
> >> >> > + if(G==1){first<-unique(m$Chr[G:s$Probes[i]])
> >> >> > + second<-min(m$Start[G:s$Probes[i]])
> >> >> > + third<-max(m$End[G:s$Probes[i]])
> >> >> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
> >> >> > + print (c)
> >> >> > + G=(G+s$Probes[i])}
> >> >> > + else if((G-1) < length(m$Sample1)) {
> >> >> > + first<-unique(m$Chr[G:(G+s$Probes[i]-1)])
> >> >> > + second<-min(m$Start[G:(G+s$Probes[i]-1)])
> >> >> > + third<-max(m$End[G:(G+s$Probes[i]-1)])
> >> >> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
> >> >> > + print (c)
> >> >> > + G=(G+s$Probes[i])}
> >> >> > + else {
> >> >> > + G=1
> >> >> > + first<-unique(m$Chr[G:s$Probes[i]])
> >> >> > + second<-min(m$Start[G:s$Probes[i]])
> >> >> > + third<-max(m$End[G:s$Probes[i]])
> >> >> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
> >> >> > + print (c)
> >> >> > + G=(G+s$Probes[i])}
> >> >> > + }
> >> >> > so the output is:
> >> >> >      first  second     third
> >> >> > [1,] "chr2" "9896633"  "14404502" "0"     "4"
> >> >> >      first  second     third
> >> >> > [1,] "chr2" "14421718" "16048724" "-0.43" "4"
> >> >> >      first  second     third
> >> >> > [1,] "chr2" "37491676" "37703009" "0"     "2"
> >> >> >      first  second     third
> >> >> > [1,] "chr2" "9896633"  "9896690"  "0"     "2"
> >> >> >      first  second     third
> >> >> > [1,] "chr2" "14314039" "16048724" "-0.35" "6"
> >> >> >      first  second     third
> >> >> > [1,] "chr2" "37491676" "37703009" "0"     "2"
> >> >> >
> >> >> > I get almost the required output but just need 3 modifications to
> >> >> > this code:
> >> >> > 1) Since this is just a small part of the file (with 2 samples), but my
> >> >> > actual file has 150 samples, so how do I write rle function for that?
> >> >> > 2) How do I store all the executed c values as a dataframe (here I am
> >> >> > just printing the values)?
> >> >> > 3) How do I include sample name in execution?
> >> >> > Waiting for your reply,
> >> >> > Thanks,
> >> >> > Suji
> >> >> >
> >> >> >
> >> >> > --
> >> >> > View this message in context:
> >> >> > http://r.789695.n4.nabble.com/How-to-Store-the-executed-values-in-a-dataframe-rle-function-tp3843944p3843944.html
> >> >> > Sent from the R help mailing list archive at Nabble.com.
> >> >> >
> >> >> > ______________________________________________
> >> >> > R-help@r-project.org mailing list
> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> > PLEASE do read the posting guide
> >> >> > http://www.R-project.org/posting-guide.html
> >> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >> >
> >> >>
> >> >> --
> >> >> Jim Holtman
> >> >> Data Munger Guru
> >> >>
> >> >> What is the problem that you are trying to solve?
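
A note on the error reported at the top of this thread: data.frame() raises "arguments imply differing number of rows: 1, 0" when one of its arguments has length zero. In the quoted lapply code, that would happen if a column referenced inside the function does not exist under exactly that name, because `$` and `[[` return NULL for a missing column and NULL[1L] is still NULL. The sketch below is only a guess at the cause, not a confirmed diagnosis. It assumes the headers of the real file differ in case or spelling from the names used in the code (R names are case-sensitive). The helper name summarise_runs, its arguments, and the toy data are hypothetical and not part of the original posts.

```r
# Toy data with lower-case 'start'/'end', as in the sample file (assumption).
x <- data.frame(
  Chr     = rep("chr2", 4),
  start   = c(9896633L, 9896639L, 14314039L, 14404467L),
  end     = c(9896683L, 9896690L, 14314098L, 14404502L),
  sample1 = c(0, 0, -0.43, -0.43),
  stringsAsFactors = FALSE
)

# A misspelled or wrongly-cased column name yields NULL (length 0) ...
is.null(x$chr)   # 'chr' is not 'Chr'

# ... and data.frame() cannot combine a length-1 and a length-0 argument.
msg <- tryCatch(
  data.frame(Sample = "sample1", Chr = x$chr[1L]),
  error = function(e) conditionMessage(e)
)
msg

# Hypothetical helper: verify the needed columns first, then summarise runs
# using the same split/cumsum/diff idea as the quoted code.
summarise_runs <- function(x, samples,
                           chr = "Chr", start = "start", end = "end") {
  missing_cols <- setdiff(c(chr, start, end, samples), names(x))
  if (length(missing_cols) > 0L) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  out <- lapply(samples, function(.samp) {
    # start a new group whenever the sample value changes
    .grps <- split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
    do.call(rbind, lapply(.grps, function(.set) {
      data.frame(Sample = .samp,
                 Chr    = .set[[chr]][1L],
                 Start  = min(.set[[start]]),
                 End    = max(.set[[end]]),
                 Values = .set[[.samp]][1L],
                 Probes = nrow(.set),
                 stringsAsFactors = FALSE)
    }))
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Use every column from the 4th onward as a sample column (assumes the
# layout Chr, start, end, then samples, as in the thread).
res <- summarise_runs(x, names(x)[4:ncol(x)])
res
```

Comparing names(x) against the names used in the function is probably the quickest check on the real file. Passing names(x)[4:ncol(x)] instead of typing each sample name would also cover the 150-sample case, provided the first three columns are Chr, start and end.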