I neglected to include my test case,

> df <- data.frame(x=1:(10^7))
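
A rough way to see the effect described in the quoted message below, without applying the patch, is to compare dimnames() and names() on a data frame that has automatic row names. This is only a sketch under stated assumptions: the 10^5-row size is chosen here purely so it runs quickly (it is not the 10^7-row test case), the loop count of 20 is arbitrary, and the timings depend on the machine and R version, so no expected values are given.

```r
# Smaller than the 10^7-row test case so the sketch runs quickly
# (size chosen for illustration only).
df <- data.frame(x = 1:(10^5))

# Automatic row names are stored compactly; .row_names_info() reports
# them as a negative row count.
n_internal <- .row_names_info(df)

# dimnames() on a data frame returns the row names as a character vector.
d <- dimnames(df)
is_char <- is.character(d[[1]])

# names() returns the column names without touching the row names.
cols <- names(df)

# Rough cost comparison; results vary by machine and R version.
t_dimnames <- system.time(for (i in 1:20) dimnames(df))
t_names <- system.time(for (i in 1:20) names(df))
print(rbind(dimnames = t_dimnames, names = t_names)[, 1:3])
```

With row.names=FALSE the character row names are never written, which is why skipping their creation looks attractive.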

Martin

Martin Morgan <[EMAIL PROTECTED]> writes:

> write.table with large data frames takes quite a long time
>
>> system.time({
> +     write.table(df, '/tmp/dftest.txt', row.names=FALSE)
> + }, gcFirst=TRUE)
>    user  system elapsed
>  97.302   1.532  98.837
>
> A reason is because dimnames is always called, causing 'anonymous' row
> names to be created as character vectors. Avoiding this in
> src/library/utils, along the lines of
>
> Index: write.table.R
> ===================================================================
> --- write.table.R (revision 44717)
> +++ write.table.R (working copy)
> @@ -27,13 +27,18 @@
>
>      if(!is.data.frame(x) && !is.matrix(x)) x <- data.frame(x)
>
> +    makeRownames <- is.logical(row.names) && !is.na(row.names) &&
> +        row.names==TRUE
> +    makeColnames <- is.logical(col.names) && !is.na(col.names) &&
> +        col.names==TRUE
>      if(is.matrix(x)) {
>          ## fix up dimnames as as.data.frame would
>          p <- ncol(x)
>          d <- dimnames(x)
>          if(is.null(d)) d <- list(NULL, NULL)
> -        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
> -        if(is.null(d[[2]]) && p > 0) d[[2]] <- paste("V", 1:p, sep="")
> +        if (is.null(d[[1]]) && makeRownames) d[[1]] <- seq_len(nrow(x))
> +        if(is.null(d[[2]]) && p > 0 && makeColnames)
> +            d[[2]] <- paste("V", 1:p, sep="")
>          if(is.logical(quote) && quote)
>              quote <- if(is.character(x)) seq_len(p) else numeric(0)
>      } else {
> @@ -53,8 +58,8 @@
>                  quote <- ord[quote]; quote <- quote[quote > 0]
>              }
>          }
> -        d <- dimnames(x)
> -        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
> +        d <- list(if (makeRownames==TRUE) row.names(x) else NULL,
> +                  if (makeColnames==TRUE) names(x) else NULL)
>          p <- ncol(x)
>      }
>      nocols <- p==0
>
> improves performance at least in proportion to nrow(x):
>
>> system.time({
> +     write.table(df, '/tmp/dftest1.txt', row.names=FALSE)
> + }, gcFirst=TRUE)
>    user  system elapsed
>   8.132   0.608   8.899
>
> Martin
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel