Re: [R] How to Store the executed values in a dataframe & rle function

viritha k Thu, 29 Sep 2011 13:19:09 -0700

Thanks, it does work for the sample data.
When I use it on my actual data, it throws this error:
Error in data.frame(Sample = .samp, Chr = .set$Chr[1L], Start =
min(.set$Start),  :
  arguments imply differing number of rows: 1, 0
I do not understand why I am getting this.
Waiting for your reply,
Thanks,
Suji
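
A likely cause, for anyone hitting the same error: R column names are case-sensitive, so if a name used in the code (a sample name passed to lapply, or 'Chr') does not match the header of the real file exactly, subsetting returns NULL, and NULL[1L] has length 0. data.frame() then sees one argument with 1 row and another with 0 rows. The sketch below reproduces this on a made-up two-row data frame; the data frame 'x' and its column names are assumptions for illustration, not the actual file.

```r
# Toy data; the column names here are assumptions, not the real file's header.
x <- data.frame(Chr = c("chr2", "chr2"),
                Start = c(9896633L, 9896639L),
                End = c(9896683L, 9896690L),
                Sample1 = c(0, 0),
                stringsAsFactors = FALSE)

# Asking for a column that does not exist (wrong case) yields NULL,
# and NULL[1L] is still NULL, i.e. a zero-length value.
.samp <- "sample1"
missing_value <- x[[.samp]][1L]
length(missing_value)   # 0

# Passing that zero-length value to data.frame() raises the error.
err <- tryCatch(
    data.frame(Sample = .samp, Chr = x$Chr[1L], Values = x[[.samp]][1L]),
    error = function(e) conditionMessage(e))
print(err)

# Check the names used in the code against the file before running lapply.
wanted <- c("Chr", "Start", "End", .samp)
not_found <- setdiff(wanted, names(x))
print(not_found)
```

If 'not_found' is not empty on the real data, correcting those names (or the header) is the first thing to try; names(x) shows exactly what read.table produced.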
On Wed, Sep 28, 2011 at 4:15 PM, jim holtman <jholt...@gmail.com> wrote:


> I only used textConnection for the sample data.  Just put your file
> name in the read.table; e.g.,
>
>
> x<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'))
>
> as you have in your email.  I used 'x' in my code, so I replaced your
> 'm' with 'x'.
>
> Try it and see if it works; no reason it shouldn't.
>
>
>
> On Wed, Sep 28, 2011 at 3:03 PM, viritha k <virith...@gmail.com> wrote:
> > Hi Jim,
> > Thanks for the reply. OK, but I don't want to use textConnection and paste
> > each line; I want the input to be read from a file, like
> >
> > m<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'))
> > So how do I incorporate that in your code?
> > Thanks,
> > Suji
> > On Wed, Sep 28, 2011 at 2:40 PM, jim holtman <jholt...@gmail.com> wrote:
> >>
> >> The solution that I sent will handle the 150 different samples; just
> >> list the column names in the argument to the top 'lapply'.  You don't
> >> need the 'rle' in my approach.
> >>
> >> On Wed, Sep 28, 2011 at 2:13 PM, viritha k <virith...@gmail.com> wrote:
> >> > Hi,
> >> > This is the code that I wrote for 3 samples:
> >> > code:
> >> >> m<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric'))
> >> >> s<-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]]))
> >> >
> >> >> names(s)=c("Values","Probes")
> >> >> c<-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE)
> >> >> G=1
> >> >> n=4
> >> >
> >> >> for(i in 1:length(s$Probes)){
> >> >
> >> > + if(G==1){c[i,1]<-names(m[n])
> >> > + c[i,2]<-unique(m$Chr[G:s$Probes[i]])
> >> > + c[i,3]<-min(m$Start[G:s$Probes[i]])
> >> > + c[i,4]<-max(m$End[G:s$Probes[i]])
> >> > + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> >> >
> >> > + G=(G+s$Probes[i])}
> >> > + else if((G-1) < length(m$Sample1)) {
> >> >
> >> > + c[i,1]<-names(m[n])
> >> > + c[i,2]<-unique(m$Chr[G:(G+s$Probes[i]-1)])
> >> > + c[i,3]<-min(m$Start[G:(G+s$Probes[i]-1)])
> >> > + c[i,4]<-max(m$End[G:(G+s$Probes[i]-1)])
> >> > + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> >> >
> >> > + G=(G+s$Probes[i])}
> >> > + else {
> >> > + G=1
> >> >
> >> > + n=n+1
> >> > +  c[i,1]<-names(m[n])
> >> > + c[i,2]<-unique(m$Chr[G:s$Probes[i]])
> >> > + c[i,3]<-min(m$Start[G:s$Probes[i]])
> >> > + c[i,4]<-max(m$End[G:s$Probes[i]])
> >> > + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> >> >
> >> > + G=(G+s$Probes[i])}}
> >> >
> >> >> c
> >> >
> >> >     Sample  Chr    Start      End Values Probes
> >> >
> >> > 1  Sample1 chr2  9896633 14404502      0      4
> >> > 2  Sample1 chr2 14421718 16048724  -0.43      4
> >> > 3  Sample1 chr2 37491676 37703009      0      2
> >> > 4  Sample2 chr2  9896633  9896690      0      2
> >> > 5  Sample2 chr2 14314039 16048724  -0.35      6
> >> > 6  Sample2 chr2 37491676 37703009      0      2
> >> > 7  Sample3 chr2  9896633 14314098      0      3
> >> > 8  Sample3 chr2 14404467 16031769   0.32      3
> >> > 9  Sample3 chr2 16036178 37491735   0.45      3
> >> > 10 Sample3 chr2 37702947 37703009      0      1
> >> >
> >> >
> >> > The problem that I am facing is expanding the rle function for values
> >> > and probes.
> >> > Definitely your code looks simpler, but I would like to read the file by
> >> > just giving the name of the file, as written in my code, because my
> >> > original file contains 150 samples. How do I use lapply or the rle
> >> > function for 150 such samples, similar to Sample1 and Sample2?
> >> >
> >> > waiting for your reply,
> >> > Thanks,
> >> > Suji
> >> >
> >> > On Wed, Sep 28, 2011 at 11:37 AM, jim holtman <jholt...@gmail.com> wrote:
> >> >>
> >> >> Here is one approach:
> >> >>
> >> >> > x <- read.table(textConnection("Chr start end sample1 sample2
> >> >> + chr2 9896633 9896683 0 0
> >> >> + chr2 9896639 9896690 0 0
> >> >> + chr2 14314039 14314098 0 -0.35
> >> >> + chr2 14404467 14404502 0 -0.35
> >> >> + chr2 14421718 14421777 -0.43 -0.35
> >> >> + chr2 16031710 16031769 -0.43 -0.35
> >> >> + chr2 16036178 16036237 -0.43 -0.35
> >> >> + chr2 16048665 16048724 -0.43 -0.35
> >> >> + chr2 37491676 37491735 0 0
> >> >> + chr2 37702947 37703009 0 0"), header = TRUE, as.is = TRUE)
> >> >> > closeAllConnections()
> >> >> >
> >> >> > result <- lapply(c('sample1', 'sample2'), function(.samp){
> >> >> +     # split by breaks in the values
> >> >> +     .grps <- split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
> >> >> +
> >> >> +     # combine the list of dataframes
> >> >> +     .range <- do.call(rbind, lapply(.grps, function(.set){
> >> >> +         # create a dataframe of the results
> >> >> +         data.frame(Sample = .samp
> >> >> +                    , Chr = .set$Chr[1L]
> >> >> +                    , Start = min(.set$start)
> >> >> +                    , End = max(.set$end)
> >> >> +                    , Values = .set[[.samp]][1L]
> >> >> +                    , Probes = nrow(.set)
> >> >> +                    )
> >> >> +         }))
> >> >> +     })
> >> >> > # put the list of dataframes together
> >> >> > result <- do.call(rbind, result)
> >> >> > result
> >> >>    Sample  Chr    Start      End Values Probes
> >> >> 0  sample1 chr2  9896633 14404502   0.00      4
> >> >> 1  sample1 chr2 14421718 16048724  -0.43      4
> >> >> 2  sample1 chr2 37491676 37703009   0.00      2
> >> >> 01 sample2 chr2  9896633  9896690   0.00      2
> >> >> 11 sample2 chr2 14314039 16048724  -0.35      6
> >> >> 21 sample2 chr2 37491676 37703009   0.00      2
> >> >> >
> >> >>
> >> >>
> >> >> On Mon, Sep 26, 2011 at 10:30 AM, sujitha <virith...@gmail.com> wrote:
> >> >> > Hi group,
> >> >> >
> >> >> > This is what my test file looks like:
> >> >> > Chr start end sample1 sample2
> >> >> > chr2 9896633 9896683 0 0
> >> >> > chr2 9896639 9896690 0 0
> >> >> > chr2 14314039 14314098 0 -0.35
> >> >> > chr2 14404467 14404502 0 -0.35
> >> >> > chr2 14421718 14421777 -0.43 -0.35
> >> >> > chr2 16031710 16031769 -0.43 -0.35
> >> >> > chr2 16036178 16036237 -0.43 -0.35
> >> >> > chr2 16048665 16048724 -0.43 -0.35
> >> >> > chr2 37491676 37491735 0 0
> >> >> > chr2 37702947 37703009 0 0
> >> >> >
> >> >> > This is the output that I am expecting:
> >> >> > Sample Chr Start End Values Probes
> >> >> > sample1 chr2 9896633 14404502 0 4
> >> >> > sample1 chr2 14421718 16048724 -0.43 4
> >> >> > sample1 chr2 37491676 37703009 0 2
> >> >> > sample2 chr2 9896633 9896690 0 2
> >> >> > sample2  chr2 14314039 16048724 -0.35 6
> >> >> > sample2 chr2 37491676 37703009 0 2
> >> >> >
> >> >> > Here the Chr value is the same, but it can be any other value as well,
> >> >> > so take the unique value among the similar rows. The Start for the
> >> >> > first line is the lowest value across the run of identical values
> >> >> > (4 rows), and the End is the highest value. Values is the unique value
> >> >> > within the run, and Probes is the number of rows in the run.
> >> >> >
> >> >> > Code:
> >> >> >> m<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'))
> >> >> > #reading the test file
> >> >> >> s<-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]]))
> >> >> > # to get the last 2 columns
> >> >> >> names(s)=c("Values","Probes")
> >> >> >> G=1
> >> >> >> for(i in 1:length(s$Probes)){
> >> >> > + if(G==1){first<-unique(m$Chr[G:s$Probes[i]])
> >> >> > + second<-min(m$Start[G:s$Probes[i]])
> >> >> > + third<-max(m$End[G:s$Probes[i]])
> >> >> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
> >> >> > + print (c)
> >> >> > + G=(G+s$Probes[i])}
> >> >> > + else if((G-1) < length(m$Sample1)) {
> >> >> > + first<-unique(m$Chr[G:(G+s$Probes[i]-1)])
> >> >> > + second<-min(m$Start[G:(G+s$Probes[i]-1)])
> >> >> > + third<-max(m$End[G:(G+s$Probes[i]-1)])
> >> >> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
> >> >> > + print (c)
> >> >> > + G=(G+s$Probes[i])}
> >> >> > + else {
> >> >> > + G=1
> >> >> > + first<-unique(m$Chr[G:s$Probes[i]])
> >> >> > + second<-min(m$Start[G:s$Probes[i]])
> >> >> > + third<-max(m$End[G:s$Probes[i]])
> >> >> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
> >> >> > + print (c)
> >> >> > + G=(G+s$Probes[i])}
> >> >> > + }
> >> >> > so the output is:
> >> >> >     first  second    third
> >> >> > [1,] "chr2" "9896633" "14404502" "0" "4"
> >> >> >     first  second     third
> >> >> > [1,] "chr2" "14421718" "16048724" "-0.43" "4"
> >> >> >     first  second     third
> >> >> > [1,] "chr2" "37491676" "37703009" "0" "2"
> >> >> >     first  second    third
> >> >> > [1,] "chr2" "9896633" "9896690" "0" "2"
> >> >> >     first  second     third
> >> >> > [1,] "chr2" "14314039" "16048724" "-0.35" "6"
> >> >> >     first  second     third
> >> >> > [1,] "chr2" "37491676" "37703009" "0" "2"
> >> >> >
> >> >> > I get almost the required output but need 3 modifications to this code:
> >> >> > 1) This is just a small part of the file (with 2 samples), but my
> >> >> > actual file has 150 samples, so how do I write the rle function for that?
> >> >> > 2) How do I store all the executed c values as a dataframe (here I am
> >> >> > just printing the values)?
> >> >> > 3) How do I include the sample name in the execution?
> >> >> > Waiting for your reply ,
> >> >> > Thanks,
> >> >> > Suji
> >> >> >
> >> >> >
> >> >> > --
> >> >> > View this message in context:
> >> >> >
> >> >> >
> http://r.789695.n4.nabble.com/How-to-Store-the-executed-values-in-a-dataframe-rle-function-tp3843944p3843944.html
> >> >> > Sent from the R help mailing list archive at Nabble.com.
> >> >> >
> >> >> > ______________________________________________
> >> >> > R-help@r-project.org mailing list
> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> > PLEASE do read the posting guide
> >> >> > http://www.R-project.org/posting-guide.html
> >> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jim Holtman
> >> >> Data Munger Guru
> >> >>
> >> >> What is the problem that you are trying to solve?
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Jim Holtman
> >> Data Munger Guru
> >>
> >> What is the problem that you are trying to solve?
> >
> >
>
>
>
> --
>  Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>
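
For reference, the rle idea from the original question can also be extended to any number of sample columns. This is a minimal sketch, not the split/cumsum code quoted above: it assumes the first three columns are named Chr, start and end, and that every remaining column is a sample. The helper name 'summarise_runs' is hypothetical. Like the quoted code, it does not break runs at chromosome boundaries.

```r
# Sample data copied from the thread; the real file would be read by name.
x <- read.table(text = "Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0", header = TRUE, as.is = TRUE)

# Hypothetical helper: one output row per run of identical values in 'samp'.
summarise_runs <- function(x, samp) {
    r <- rle(x[[samp]])
    last <- cumsum(r$lengths)        # last row index of each run
    first <- last - r$lengths + 1L   # first row index of each run
    data.frame(Sample = samp,
               Chr = x$Chr[first],
               Start = mapply(function(a, b) min(x$start[a:b]), first, last),
               End = mapply(function(a, b) max(x$end[a:b]), first, last),
               Values = r$values,
               Probes = r$lengths,
               stringsAsFactors = FALSE)
}

# Every column that is not a coordinate column is treated as a sample.
samples <- setdiff(names(x), c("Chr", "start", "end"))
result <- do.call(rbind, lapply(samples, summarise_runs, x = x))
rownames(result) <- NULL
result
```

Taking the sample names from names(x) avoids typing 150 names by hand and also sidesteps case mismatches between the code and the file header.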

