Hi,

This seemed to be faster than the other two methods:

vec1<- as.character(rep(dat1[,1],each=(ncol(dat1)-1)))
vec2<- as.character(unlist(t(dat1[,-1])))
vec3<- rep(rep(c(TRUE,FALSE),c(1,(ncol(dat1)-2))),nrow(dat1))
dat2<-data.frame(DSYSRTKY=vec1,CODE=vec2,PRIMAIRY=vec3,stringsAsFactors=FALSE)
dat3<- dat2[dat2[,2]!="",]
row.names(dat3)<- 1:nrow(dat3)
dat3New<-within(dat3,{ID<-row.names(dat3)})[,c(4,1:3)]

#Out1## Output dataset
Out1$PRIMAIRY<- as.logical(Out1$PRIMAIRY)
identical(Out1,dat3New)
#[1] TRUE

#Speed test
indx<-rep(1:nrow(dat1),6e4)
dat2<- dat1[indx,]
system.time({
  vec1<- as.character(rep(dat2[,1],each=(ncol(dat2)-1)))
  vec2<- as.character(unlist(t(dat2[,-1])))
  vec3<- rep(rep(c(TRUE,FALSE),c(1,(ncol(dat2)-2))),nrow(dat2))
  dat4<-data.frame(DSYSRTKY=vec1,CODE=vec2,PRIMAIRY=vec3,stringsAsFactors=FALSE)
  dat5<- dat4[dat4[,2]!="",]
  row.names(dat5)<- 1:nrow(dat5)
  dat5New<-within(dat5,{ID<-row.names(dat5)})[,c(4,1:3)]
})
#   user  system elapsed
# 12.620   0.684  13.333
dim(dat5New)
#[1] 2880000       4

A.K.


Hi Arun,

The second method is indeed much faster. It ran quickly on my 600,000-row file.
I still have 2 bigger files where processing becomes an issue, even though I have plenty of memory (32 GB), at the second statement:

res2<-reshape(dat2,idvar="newCol",varying=list(2:26),direction="long")

Would data.table also take less memory? Speeding things up would be good as well. How would I do it?

I think splitting the data frame before processing it, and combining the pieces afterwards, might also be an option. Any ideas on that?

Regards Dirk


----- Original Message -----
From: arun <smartpink...@yahoo.com>
To: R help <r-help@r-project.org>
Cc:
Sent: Wednesday, August 14, 2013 10:39 AM
Subject: Re: [R] Create rows for columns in dataframe

Hi,

I tried the second method on a bigger dataset.
This is what I get:

indx<-rep(1:nrow(dat1),6e4)
dat2<- dat1[indx,]
system.time({
  vec1<- paste(dat2[,1],dat2[,2],colnames(dat2)[2],sep=".")
  res2<-reshape(dat2,idvar="newCol",varying=list(2:26),direction="long")
  res3<-res2[order(res2[,4]),]
  res4<- res3[res3[,3]!="",-4]
  vec2<-paste(res4[,1],res4[,3],paste0("C",res4[,2]),sep=".")
  res4$PRIMAIRY<-vec2%in%vec1
  row.names(res4)<-1:nrow(res4)
  res4$ID<- row.names(res4)
  res4[,c(1,3)]<- lapply(res4[,c(1,3)],as.character)
  res5<-res4[,c(5,1,3,4)]
  colnames(res5)[3]<-"CODE"
})
#   user  system elapsed
#144.672   2.072 147.034
#reshape() step is taking most of the time
dim(res5)
#[1] 2880000       4

#Comparing this to the first method on a smaller subset of dat2.
dat2New<- dat2[1:3e4,]
system.time({
  res1<-do.call(rbind,lapply(seq_len(nrow(dat2New)),function(i) {
    x1<-as.character(unlist(dat2New[i,-1]))
    CODE<-x1[x1!=""]
    PRIMAIRY<-x1[x1!=""]==head(x1,1)
    DSYSRTKY=as.numeric(as.character(dat2[i,1]))
    data.frame(DSYSRTKY,CODE,PRIMAIRY,stringsAsFactors=FALSE)
  }))
  res1$ID<- row.names(res1)
  res2<-res1[,c(4,1:3)]
})
#   user  system elapsed
#166.452  15.752 182.643

nrow(dat2)-nrow(dat2New)
#[1] 330000

You might also try library(data.table). It should be faster.

A.K.


----- Original Message -----
From: Dark <i...@software-solutions.nl>
To: r-help@r-project.org
Cc:
Sent: Wednesday, August 14, 2013 5:41 AM
Subject: Re: [R] Create rows for columns in dataframe

Hi A.K,

Thanks for your great help. I'm now running your first suggestion on a 600,000-row sample, after verifying that it works on a smaller sample. It has now been running for 40 minutes.

Which method do you think will be faster?

Regards Derk

--
View this message in context: http://r.789695.n4.nabble.com/Create-rows-for-columns-in-dataframe-tp4673607p4673704.html
Sent from the R help mailing list archive at Nabble.com.
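
On the idea raised in this thread of splitting the data frame and combining the pieces afterwards: one way it might look is to wrap the vectorized wide-to-long step in a function and apply it to blocks of rows. This is only a sketch under assumptions. The helper name to_long, the chunk_size value, and the toy dat1 (an ID column followed by three code columns) are made up for illustration; the real data is not shown in the thread.

```r
# Hypothetical toy data: an ID column followed by code columns,
# with "" marking an empty slot.
dat1 <- data.frame(DSYSRTKY = c(101, 102, 103),
                   C1 = c("A1", "B1", "C1"),
                   C2 = c("A2", "",   "C2"),
                   C3 = c("",   "",   "C3"),
                   stringsAsFactors = FALSE)

# Vectorized wide-to-long step, wrapped in a function
# (to_long is an illustrative name, not from the thread).
to_long <- function(d) {
  k <- ncol(d) - 1                       # number of code columns
  out <- data.frame(
    DSYSRTKY = as.character(rep(d[, 1], each = k)),
    CODE     = as.character(unlist(t(d[, -1]))),
    # the first code column of every row is flagged as primary
    PRIMAIRY = rep(rep(c(TRUE, FALSE), c(1, k - 1)), nrow(d)),
    stringsAsFactors = FALSE)
  out[out$CODE != "", ]                  # drop empty slots
}

# Process the rows in blocks and combine the pieces afterwards.
chunk_size <- 2                          # rows per block (assumed value)
grp    <- ceiling(seq_len(nrow(dat1)) / chunk_size)
pieces <- lapply(split(dat1, grp), to_long)
res    <- do.call(rbind, pieces)
row.names(res) <- NULL
res$ID <- as.character(seq_len(nrow(res)))
res    <- res[, c("ID", "DSYSRTKY", "CODE", "PRIMAIRY")]
```

Because each block is converted independently, only one block's intermediate vectors exist at a time. Whether that helps in practice depends on where the memory actually goes, and it has not been timed here.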

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.