On Jan 2, 2008, at 6:05 PM, Talbot Katz wrote:

> Hi.
>
> I have a matrix stored in a large, tab-delimited flat file. The
> first row contains column names. Because the matrix is symmetric,
> the file has lower triangular format, so the second row contains
> one number, the third row two numbers, etc. In general, row k+1
> contains k numbers; the matrix has 3000 rows, so the file has 3001
> rows. The file has variable length records, so each row ends with
> its last piece of data. I read in the file and produced the full
> symmetric matrix as follows:
>
>> mana01 <- scan( file = "C:/mat.dat", sep = "\t", nlines = 1, what = "character" )
>> Read 3000 items
>> nco <- length( mana01 )
>> malt <- matrix( 0, nrow = nco, ncol = nco )
>> colnames( malt ) <- mana01
>> rownames( malt ) <- mana01
>> for ( i in 1:3000 ) { malt[ i, (1:i) ] <- scan( file = "C:/mat.dat", skip = i, n = i, quiet = TRUE ) }
>> mat <- malt + t( malt ) - diag( diag( malt ) )
>
> The for loop took a couple of hours to complete. I suspect there's
> a much faster way to do this. Any suggestions? Thanks!

I saw Jim's reply just after I had written a solution of my own, so here is my take on it. The key thing, as Jim mentioned, is not to call scan once per row, since every call with skip = i reads the file again from the top, but to read the whole file once and then process it in memory. I read the lines, used strsplit to get a list with one character vector per line, and then used sapply after extending each row with the right number of zeros. Not sure which of the two is faster.

nms <- scan("~/Desktop/testing.txt", sep="\t", nlines=1, what=character(0))
# read the remaining rows as a vector of lines
x <- scan("~/Desktop/testing.txt", sep="\n", skip=1, what=character(0))
splt <- strsplit(x, "\t")  # split at the tabs
nr <- length(nms)
# extend each row by the right number of zeros
splt <- sapply(splt, function(x) c(as.numeric(x), rep(0, nr - length(x))))
# note: sapply binds the padded rows as columns, so splt holds the
# transpose of the lower triangle stored in the file

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
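
For anyone who wants to try the read-once approach end to end, here is a minimal sketch on a made-up 4 x 4 example. The temporary file, the column names a to d, and the values are all assumptions for illustration. It swaps in readLines and vapply for the scan and sapply calls above, and the last step is the same mirroring Talbot used in his original post. It is not a benchmark, and it has not been timed against the 3000-row file.

```r
# Hypothetical toy file in the same layout: a header row, then row k+1
# holding k tab-separated numbers (lower triangle, variable-length rows).
path <- tempfile(fileext = ".dat")
writeLines(c("a\tb\tc\td",
             "1",
             "2\t3",
             "4\t5\t6",
             "7\t8\t9\t10"), path)

# Read the whole file once instead of rescanning it for every row.
lines <- readLines(path)
nms   <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]   # column names
rows  <- strsplit(lines[-1], "\t", fixed = TRUE)       # one vector per row
nr    <- length(nms)

# Pad each row with zeros to length nr. vapply returns the padded rows
# as columns, so `upper` is the upper triangle (transpose of the file).
upper <- vapply(rows,
                function(r) c(as.numeric(r), rep(0, nr - length(r))),
                numeric(nr))

# Mirror across the diagonal; the diagonal is counted twice, so subtract it.
mat <- upper + t(upper) - diag(diag(upper))
dimnames(mat) <- list(nms, nms)
```

Because the padded rows come back as columns, no extra transpose is needed before mirroring: adding t(upper) fills in the lower triangle directly.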