Re: [R] Need some hint on faster data manipulation.

jim holtman Sat, 17 May 2008 12:57:46 -0700

Is this what you want?

> v <- c(rep("v1",3), rep("v2",4), rep("v3",2), "v4", rep("v5",6))
> tt <- c(1,2,3,3,1,2,3,4,5,2,7,9,2,3,1,4)
> d <- data.frame(v, tt)
> do.call(rbind, lapply(split(d, d$v), function(x){
+     x[which.min(x$tt),]
+ }))
    v tt
v1 v1  1
v2 v2  1
v3 v3  4
v4 v4  2
v5 v5  1
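
With around 12,500 groups, splitting into that many small data frames and
rbind-ing them back together may still be slow. A possible vectorised
alternative, offered only as a sketch: sort by group and value, then keep
the first row of each group. The names "o" and "res" are illustrative and
not from the original post, and like which.min() this keeps only one row
per group when the minimum is tied.

```r
# Same example data as in the original post.
v  <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
d  <- data.frame(v, tt)

# Sort by group, then by value within each group, so that the first
# row of every group holds that group's minimum.
o <- d[order(d$v, d$tt), ]

# Keep only the first row encountered for each value of v.
res <- o[!duplicated(o$v), ]
rownames(res) <- NULL  # drop the leftover original row numbers
res
```

If only the minimum values are needed and not the full rows,
tapply(d$tt, d$v, min) is another option.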

On Sat, May 17, 2008 at 3:48 PM, souvik banerjee <[EMAIL PROTECTED]> wrote:

> Hi,
> I am facing a problem in data manipulation. Suppose a data frame
> contains two columns. The first column consists of repeated character
> values and the second of numerical values. The problem is to create a
> new data frame containing, for each unique value in the first column,
> the row with the minimum second-column entry. For example, if "d" is the
> data frame created with the following R code
>
>     v<-c(rep("v1",3), rep("v2",4), rep("v3",2),"v4",rep("v5",6))
>     tt<-c(1,2,3,3,1,2,3,4,5,2,7,9,2,3,1,4)
>     d<-data.frame(v,tt)
>
> then the answer would be
>
>      v  tt
>     v1   1
>     v2   1
>     v3   4
>     v4   2
>     v5   1
>
> I have written a small piece of R code, given below, that does the job
> (assuming "d" to be the initial data frame)
>
>     b<-data.frame(NULL)
>     i<-1
>     x<-d[1,]
>     while(i<dim(d)[1])
>     {
>         if(length(unique(x[,1]))==1)
>         {
>             x<-rbind(x,d[i+1,])
>             i=i+1
>         }
>         if(length(unique(x[,1]))>1)
>         {
>             y<-x[1:(nrow(x)-1),]
>             z<-which(y[,2]==min(y[,2]))
>             b<-rbind(b,y[z,])
>             x<-d[i,]
>         }
>     }
>     z<-which(x[,2]==min(x[,2]))
>     b<-rbind(b,x[z,])
>     b
>
> The code works and gives me the desired result, but the problem is that
> I have to repeat this procedure for many data frames, and nearly all of
> them contain approximately 15,000 repeated character entries, of which
> more than 12,500 are unique. Using the above code in a loop takes a
> considerable amount of time.
> Can anybody suggest a faster approach?
>
> Regards
>
>  Souvik Bandyopadhyay
> Research Fellow,
> Dept Of Statistics
> Calcutta University
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

