[R] Need some hint on faster data manipulation.

souvik banerjee Sat, 17 May 2008 12:49:59 -0700

Hi,
I am facing a problem in data manipulation. Suppose a data frame contains
two columns. The first column consists of character values, some of them
repeated, and the second consists of numerical values. The problem is to
create a new data frame containing, for each unique value in the first
column, the row with the minimum entry in the second column. For example,
if "d" is the data frame created with the following R code

            v <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
            tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
            d <- data.frame(v, tt)

then the answer would be


                          v         tt
                         v1         1
                         v2         1
                         v3         4
                         v4         2
                         v5         1
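
As a cross-check on the target result, the per-group minimum can be sketched with base R's aggregate(). This is only an illustration of the intended output, not the method used in the code further down; the name "mins" is made up for the example.

```r
# Rebuild the example data frame from the post
v  <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
d  <- data.frame(v, tt)

# One row per unique value of v, holding the smallest tt in that group
# ("mins" is a hypothetical name chosen for this sketch)
mins <- aggregate(tt ~ v, data = d, FUN = min)
mins
```

The result should contain the same five rows as the table above, with columns named v and tt.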



I have written a short piece of R code, given below, that does the job
(assuming "d" to be the initial data frame)



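            b <- data.frame(NULL)   # result: rows with the minimum tt for each v
            i <- 1
            x <- d[1, ]             # rows collected so far for the current value of v
            while (i < dim(d)[1])
            {
                # Still within one group: append the next row of d
                if (length(unique(x[, 1])) == 1)
                {
                    x <- rbind(x, d[i + 1, ])
                    i <- i + 1
                }
                # The row just appended starts a new group: close the previous one
                if (length(unique(x[, 1])) > 1)
                {
                    y <- x[1:(nrow(x) - 1), ]
                    z <- which(y[, 2] == min(y[, 2]))
                    b <- rbind(b, y[z, ])
                    x <- d[i, ]
                }
            }
            # Handle the final group
            z <- which(x[, 2] == min(x[, 2]))
            b <- rbind(b, x[z, ])
            b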



The code works properly and gives me the desired result, but the problem is
that I have to repeat this procedure for many data frames, and nearly all of
them contain approximately 15,000 entries in the first column, more than
12,500 of them unique. Running the above code in a loop takes a considerable
amount of time.
Can anybody suggest a faster approach?
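
One loop-free direction that might be worth timing is to sort once and then keep the first row of each group. The sketch below uses only base R (order() and duplicated()). The data frame "big" and its values are synthetic stand-ins, sized on the assumption that each real data frame has about 15,000 rows and 12,500 unique values in the first column.

```r
# Synthetic stand-in for one real data frame; names and sizes are
# assumptions based on the description (15,000 rows, 12,500 unique values)
set.seed(1)
ids <- paste0("id", seq_len(12500))
big <- data.frame(v  = c(ids, sample(ids, 2500, replace = TRUE)),
                  tt = sample(1:100, 15000, replace = TRUE),
                  stringsAsFactors = FALSE)

# Sort by group and then by value, so each group's minimum comes first
o <- big[order(big$v, big$tt), ]

# Keep only the first row of every group
res <- o[!duplicated(o$v), ]
```

Two differences from the loop are worth noting. This sketch returns exactly one row per unique value, whereas the loop keeps every row that ties for the minimum. It also does not depend on equal values of the first column being adjacent in the data frame.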

Regards

 Souvik Bandyopadhyay
Research Fellow,
Dept Of Statistics
Calcutta University


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
