Dear All, I have a following for-loop code which is basically intended to read in many excel files (each file has many columns and rows) in a directory and extract the some rows and columns out of each file and then combine them together into a dataframe. I use for loop which can do the work but quite slow. How to make it faster using lapply function ? Thanks in advance!
temp.df<-c() # create an empty list to store the extracted result from each excel file inside for-loop for (i in list.files()) { # loop through each excel file in the directory temp<-read_xlsx(i,sheet=1,range=cell_cols(c(1,30,38:42))) # from package "readxl" to read in excel file temp<-temp[grep("^geneA$|^geneB$|^geneC$",temp$Id),] # extract rows based on temp$id names(temp)<-gsub("^.*] ","",names(temp)) # clean up column names temp.df<-append(temp.df, list(as.data.frame(temp))) # change the dataframe to list, so it can be append to list. if (i == list.files()[length(list.files())]){ # if it is last excel file, then combine all the rows in the list into a dataframe because they all have same column names temp.df.all<-do.call("rbind",temp.df) write_xlsx(temp.df.all, path="output.xlsx") # write_xlsx from package writexl. } } *Stephen* [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.