Hi, I am facing a problem with data manipulation. Suppose a data frame has two columns: the first contains character values, many of them repeated, and the second contains numeric values. I want to create a new data frame that, for each unique value in the first column, keeps the row with the minimum value in the second column. For example, if "d" is the data frame created with the following R code
v <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
d <- data.frame(v, tt)

then the answer would be

   v tt
  v1  1
  v2  1
  v3  4
  v4  2
  v5  1

I have written the small piece of R code below, which does the job (assuming "d" is the initial data frame):

b <- data.frame(NULL)  # collects the result rows
i <- 1
x <- d[1, ]            # rows of the current group seen so far
# assumes rows with the same value in column 1 are contiguous
while (i < dim(d)[1]) {
  # still inside one group: append the next row
  if (length(unique(x[, 1])) == 1) {
    x <- rbind(x, d[i + 1, ])
    i <- i + 1
  }
  # the appended row starts a new group: store the minimum row(s) of the
  # previous group and restart from the current row
  if (length(unique(x[, 1])) > 1) {
    y <- x[1:(nrow(x) - 1), ]
    z <- which(y[, 2] == min(y[, 2]))
    b <- rbind(b, y[z, ])
    x <- d[i, ]
  }
}
# handle the last group
z <- which(x[, 2] == min(x[, 2]))
b <- rbind(b, x[z, ])
b

The code works and gives the desired result. The problem is that I have to repeat this procedure for many data frames, and nearly all of them contain approximately 15,000 rows, with more than 12,500 unique values in the first column. Running the above code in a loop takes a considerable amount of time. Can anybody suggest a faster approach?

Regards
Souvik Bandyopadhyay
Research Fellow, Dept. of Statistics
Calcutta University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
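One possible faster approach to the question above, offered as a sketch rather than a tested answer on the poster's real data: the loop grows `x` and `b` with rbind() one row at a time, which is a likely cause of the slowdown. Base R's ave() can instead compute each group's minimum for every row in a single vectorized pass, after which one logical subset picks out the wanted rows. The helper name `min_rows` and the list `dfs` are hypothetical, and the sketch assumes the second column has no missing values. Like the loop, it keeps all rows tied for a group's minimum; unlike the loop, it does not require rows of the same group to be adjacent.

```r
# Example data from the question
v  <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
d  <- data.frame(v, tt)

# Hypothetical helper: for each value of column 1, keep the row(s) whose
# column-2 value equals that group's minimum (tied rows are all kept).
# Assumes column 2 contains no NA values.
min_rows <- function(df) {
  # ave() returns, for every row, the minimum of column 2 within its
  # column-1 group, computed without rbind() inside a loop.
  group_min <- ave(df[[2]], df[[1]], FUN = min)
  out <- df[df[[2]] == group_min, , drop = FALSE]
  rownames(out) <- NULL
  out
}

b <- min_rows(d)
print(b)

# Repeating the step for many data frames: 'dfs' is a hypothetical list
# standing in for the poster's data frames.
dfs <- list(d, data.frame(v = c("a", "b", "a"), tt = c(5, 3, 2)))
results <- lapply(dfs, min_rows)

# If exactly one row per group is enough, aggregate() returns the group
# minima directly, sorted by v.
agg <- aggregate(tt ~ v, data = d, FUN = min)
print(agg)
```

The ave() version keeps rows in their original order, while aggregate() sorts the groups and collapses ties into a single row per group.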