Hello fellow R users,

I am trying to read a 6.9-million-row, space-delimited text file with 26
columns into R using the ff package. When I specify small values for
first.rows, next.rows and nrows, the file is read with no issue. However, when
I specify larger next.rows values and omit nrows to read the entire file, I
keep getting errors. Please see the code below.

I am trying to do this on an m1.large EC2 machine running R with 14.8 GB of
memory. I haven't been able to read the entire dataset into memory using
traditional read.table.
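
For scale, here is a rough back-of-envelope estimate of how large the parsed
table itself would be. It assumes, as an upper bound, that every one of the 26
columns is stored as an 8-byte double (integer and factor columns actually use
4 bytes per value) and ignores factor levels and object overhead; n_rows and
n_cols are just illustrative names for the figures quoted above.

```r
# Upper-bound size of the parsed table, assuming all 26 columns are
# stored as 8-byte doubles (integer/factor columns would use 4 bytes).
n_rows <- 6.9e6
n_cols <- 26
bytes  <- n_rows * n_cols * 8
gib    <- bytes / 2^30
cat(sprintf("~%.2f GiB\n", gib))  # prints ~1.34 GiB
```

If the estimate holds, the final data should fit comfortably in 14.8 GB, which
suggests the trouble with read.table lies in its intermediate parsing work
rather than in the size of the result.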

Given the error message, I am not sure whether I need to specify further
parameters.

Thank you,
Marck Vaisman

ma...@vaisman.us
http://www.linkedin.com/in/marckvaisman
http://twitter.com/#!/wahalulu

> results.five <- read.table("./results/results.txt",
+                          header = F, nrows = 5)   # read 5 lines for structure
> classes <- sapply(results.five, class)   # to specify colClasses
> classes
       V1        V2        V3        V4        V5        V6        V7        V8
"integer"  "factor" "integer" "integer" "integer" "integer" "integer" "numeric"
       V9       V10       V11       V12       V13       V14       V15       V16
"numeric" "numeric" "numeric" "integer" "numeric" "numeric" "numeric" "numeric"
      V17       V18       V19       V20       V21       V22       V23       V24
"integer" "numeric" "numeric" "numeric" "numeric"  "factor" "numeric" "numeric"
      V25       V26
"numeric" "numeric"
> library(ff)
> results.ff <- read.table.ffdf(file = "./results/results.txt",
+                                     header = F,
+                                     colClasses = classes,
+                                     first.rows = 1000,
+                                     next.rows = 1000,
+                                     nrows = 10000)
> dim(results.ff)
[1] 10000    26
> results.ff <- read.table.ffdf(file = "./results/results.txt",
+                                     header = F,
+                                     colClasses = classes,
+                                     first.rows = 10000,
+                                     next.rows = 100000)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '3e+05'
> rff <- read.table.ffdf(file = "./results/results.txt",
+                                     header = F,
+                                     colClasses = classes,
+                                     first.rows = 10000,
+                                     next.rows = 100000)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '3e+05'
>
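
The message comes from scan(): it means that a field in a column declared
"integer" contained the text "3e+05", which scan() cannot parse as an integer.
Because classes was inferred from only five rows, one plausible (unconfirmed)
explanation is that somewhere past row 10,000 an integer-looking column holds a
value such as 300000 written in scientific notation. The self-contained sketch
below reproduces the same error on a hypothetical two-row file (its contents
are made up for illustration) and shows that declaring the column "numeric"
avoids it.

```r
# Hypothetical sample file: the third column holds whole numbers, but the
# second row stores 300000 in scientific notation ("3e+05").
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 a 100", "2 b 3e+05"), tmp)

# Infer column classes from the first row only, mimicking nrows = 5 above;
# V3 comes out as "integer".
first <- read.table(tmp, header = FALSE, nrows = 1)
classes <- sapply(first, class)

# Reading the whole file with those classes fails inside scan().
err <- tryCatch(read.table(tmp, header = FALSE, colClasses = classes),
                error = function(e) conditionMessage(e))
print(err)

# Declaring every integer column as numeric lets "3e+05" parse as 300000.
classes[classes == "integer"] <- "numeric"
full <- read.table(tmp, header = FALSE, colClasses = classes)
print(full$V3)

unlink(tmp)
```

Applied to the session above, the corresponding step would be to run
classes[classes == "integer"] <- "numeric" before calling read.table.ffdf();
this is a guess at the cause rather than a confirmed fix, and it doubles the
storage of those columns (8 bytes per value instead of 4).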


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
