Hello fellow R users,

I am trying to read a space-separated text file with 6.9 million rows and 26 columns into R using ff. When I specify small values for first.rows, next.rows and nrows, the file is read with no issue. However, when I specify a larger next.rows value and omit nrows in order to read the entire file, I keep getting errors. Please see the code below.
I am trying to do this on an m1.large EC2 machine running R with 14.8 GB of memory. I haven't been able to read the entire dataset into memory using traditional read.table. Given the error message, I am not sure whether I need to specify further parameters.

Thank you,

Marck Vaisman
ma...@vaisman.us
http://www.linkedin.com/in/marckvaisman
http://twitter.com/#!/wahalulu

> results.five <- read.table("./results/results.txt",
+                            header = F, nrows = 5)  # read 5 lines for structure
> classes <- sapply(results.five, class)  # to specify colClasses
> classes
       V1        V2        V3        V4        V5        V6        V7        V8
"integer"  "factor" "integer" "integer" "integer" "integer" "integer" "numeric"
       V9       V10       V11       V12       V13       V14       V15       V16
"numeric" "numeric" "numeric" "integer" "numeric" "numeric" "numeric" "numeric"
      V17       V18       V19       V20       V21       V22       V23       V24
"integer" "numeric" "numeric" "numeric" "numeric"  "factor" "numeric" "numeric"
      V25       V26
"numeric" "numeric"
> library(ff)
> results.ff <- read.table.ffdf(file = "./results/results.txt",
+                               header = F,
+                               colClasses = classes,
+                               first.rows = 1000,
+                               next.rows = 1000,
+                               nrows = 10000)
> dim(results.ff)
[1] 10000    26
> results.ff <- read.table.ffdf(file = "./results/results.txt",
+                               header = F,
+                               colClasses = classes,
+                               first.rows = 10000,
+                               next.rows = 100000)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '3e+05'
> rff <- read.table.ffdf(file = "./results/results.txt",
+                        header = F,
+                        colClasses = classes,
+                        first.rows = 10000,
+                        next.rows = 100000)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '3e+05'
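One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: somewhere past the first 10000 rows, a column that was classified as "integer" from the five-row sample contains the text 3e+05 (300000 written in scientific notation), and scan() refuses to parse that text as an integer. The sketch below reproduces this situation with plain read.table on a made-up three-row dataset and shows one hedged workaround, which is to declare integer-looking columns as "numeric". The toy data and the names lines, head.rows, classes.safe and full are illustrative only and do not come from the real file.

```r
# Toy stand-in for the real file: column 1 looks like an integer in the
# first rows, but a later row stores 300000 as the text "3e+05".
lines <- c("1 a 0.5", "2 b 1.5", "3e+05 c 2.5")

# Inferring classes from only the first two rows gives "integer" for V1.
head.rows <- read.table(text = lines, header = FALSE, nrows = 2)
classes <- sapply(head.rows, class)

# Reading all rows with those classes fails once scan() reaches "3e+05";
# the error message is captured instead of stopping the script.
res <- tryCatch(
  read.table(text = lines, header = FALSE, colClasses = classes),
  error = function(e) conditionMessage(e)
)
print(res)

# Possible workaround (an assumption, not a verified fix for the real data):
# read every integer-looking column as numeric, which accepts "3e+05".
classes.safe <- ifelse(classes == "integer", "numeric", classes)
full <- read.table(text = lines, header = FALSE, colClasses = classes.safe)
str(full)
```

If this reading is right, passing a similarly adjusted classes vector as colClasses to read.table.ffdf might let the full read go through, at the cost of storing those columns as doubles rather than integers. It would also explain why the small read succeeds: the offending value would simply not occur within the first 10000 rows.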