Re: [R] Need some hint on faster data manipulation.

Kenn Konstabel Sat, 17 May 2008 13:30:43 -0700

Could it be this:

foo<-tapply(d$tt, d$v, min)
data.frame(v=names(foo), tt=foo)
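
A slightly fuller sketch of the same idea, using the sample data from the original question. It assumes only the two columns v and tt, so that the group minimum alone identifies the wanted row; the names res and res2 are mine, for illustration. The aggregate() line is an alternative that should give the same table in current versions of R.

```r
# Sample data from the original question.
v  <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
d  <- data.frame(v, tt)

# Group minimum via tapply(); the result is a named 1-d array.
foo <- tapply(d$tt, d$v, min)

# as.vector() drops the array attributes and row.names = NULL
# gives plain 1..n row names instead of the group labels.
res <- data.frame(v = names(foo), tt = as.vector(foo), row.names = NULL)

# Alternative: the formula interface of aggregate().
res2 <- aggregate(tt ~ v, data = d, FUN = min)
```

If the data frame had further columns that must be carried along, the minimum alone would not be enough and a row-selecting approach such as which.min() per group would be needed instead.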


On Sat, May 17, 2008 at 10:56 PM, jim holtman <[EMAIL PROTECTED]> wrote:

> Is this what you want:
>
> > v<-c(rep("v1",3), rep("v2",4), rep("v3",2),"v4",rep("v5",6))
> > tt<-c(1,2,3,3,1,2,3,4,5,2,7,9,2,3,1,4)
> > d<-data.frame(v,tt)
> > do.call(rbind, lapply(split(d, d$v), function(x){
> +     x[which.min(x$tt),]
> + }))
>    v tt
> v1 v1  1
> v2 v2  1
> v3 v3  4
> v4 v4  2
> v5 v5  1
> >
> >
>
>
> On Sat, May 17, 2008 at 3:48 PM, souvik banerjee <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > I am facing a problem in data manipulation. Suppose a data frame
> > contains two columns. The first column consists of repeated character
> > values and the second consists of numerical values. The problem is to
> > create a new data frame containing, for each unique value of the first
> > column, the row with the minimum second-column entry. For example, if
> > "d" is the data frame created with the following R code
> >
> >
> >            v<-c(rep("v1",3), rep("v2",4), rep("v3",2),"v4",rep("v5",6))
> >
> >            tt<-c(1,2,3,3,1,2,3,4,5,2,7,9,2,3,1,4)
> >            d<-data.frame(v,tt)
> >
> > then the answer would be
> >
> >             v  tt
> >            v1   1
> >            v2   1
> >            v3   4
> >            v4   2
> >            v5   1
> >
> > I have written a small piece of R code, given below, that does the job
> > (assuming "d" to be the initial data frame)
> >
> >            b<-data.frame(NULL)
> >            i<-1
> >            x<-d[1,]
> >            while(i<dim(d)[1])
> >            {
> >                if(length(unique(x[,1]))==1)
> >                {
> >                    x<-rbind(x,d[i+1,])
> >                    i=i+1
> >                }
> >                if(length(unique(x[,1]))>1)
> >                {
> >                    y<-x[1:(nrow(x)-1),]
> >                    z<-which(y[,2]==min(y[,2]))
> >                    b<-rbind(b,y[z,])
> >                    x<-d[i,]
> >                }
> >            }
> >            z<-which(x[,2]==min(x[,2]))
> >            b<-rbind(b,x[z,])
> >            b
> >
> > The code works and gives me the desired result, but the problem is
> > that I have to repeat this procedure for many data frames, and nearly
> > all of the data frames contain approximately 15,000 repeated characters
> > with more than 12,500 unique characters. Running the above code in a
> > loop takes a considerable amount of time.
> > Can anybody suggest a faster approach?
> >
> > Regards
> >
> >  Souvik Bandyopadhyay
> > Research Fellow,
> > Dept Of Statistics
> > Calcutta University
> >
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
