If you can write your data out as separate binary files, one per column,
then another possibility is the ff package. You can link those binary
columns into R by defining an ffdf data frame: the columns are memory-mapped,
so you can access just the parts you need without importing everything first.
This is much faster than a csv import and also works for files that are too
large to import at once. If all your columns have the same storage.mode
(vmode in ff), a further alternative is to write all your data into one single
binary matrix in row-major order (because your program can write that row by
row) and link the file into R as a single ff_matrix.

Since ffdf in ff is new, I give a mini-tutorial below.
Let me know how that works for you.

Kind regards


Jens Oehlschlägel




library(ff)

# Create example csv
fnam <- "/tmp/example.csv"
write.csv(data.frame(a=1:9, b=1:9+0.1), file=fnam, row.names=FALSE)

# Create example binary files on disk.
# Reading csv into ffdf actually stores
# each column as a binary file on disk.
# Using a pattern outside fftempdir automatically sets finalizer="close"
# and thus makes those binary files permanent.
path <- "/tmp/example_"
x <- read.csv.ffdf(file=fnam, ff_args=list(pattern=path))
close(x)

# Note that a standard ffdf is made up column by column from simple ff objects.
# More complex mappings from ff objects into ffdf are possible,
# but let's keep it simple for now.
p <- physical(x)
p

# Now let's just create an ffdf from existing binary files.
# Step one: create an ff object for each binary file (without reading them).

# Note that because we open ff files outside fftempdir,
# the default finalizer is "close", not "delete",
# so the files will not be deleted on finalization.
# The files are opened for memory mapping, but not read.
ffcols <- vector("list", length(p))
for (i in seq_along(p)){
  ffcols[[i]] <- ff(filename=filename(p[[i]]), vmode=vmode(p[[i]]))
}
ffcols

# step two: bundle several ff objects into one ffdf data.frame 
# (still without reading data)
ffdafr <- ffdf(a=ffcols[[1]], b=ffcols[[2]])

# now reading rows from this will return a standard data.frame 
# (and only read the required rows)
ffdafr[1:4,]
ffdafr[5:9,]
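
# A minimal sketch (not part of the steps above): the column files need not
# come from ff. This assumes an external program writes one headerless file
# per column in native byte order, 4-byte integers for "a" and 8-byte doubles
# for "b"; here writeBin() stands in for that program.
# The file names are hypothetical.
ext_a <- "/tmp/example_ext_a.bin"
ext_b <- "/tmp/example_ext_b.bin"
con <- file(ext_a, "wb"); writeBin(1:9, con); close(con)
con <- file(ext_b, "wb"); writeBin(1:9 + 0.1, con); close(con)

# Link both files (length is derived from the file size) and bundle them
extdf <- ffdf(a=ff(filename=ext_a, vmode="integer"),
              b=ff(filename=ext_b, vmode="double"))
ext_rows <- extdf[1:4,]
ext_rows
close(extdf)
rm(extdf)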


# As an alternative, create an example binary
# (double) matrix in row-major order
y <- as.ff(t(ffdafr[,]), filename="/tmp/example_single_matrix.ff")

# Again we can link this existing binary file.
# if we know the size of the matrix we can do
z <- ff(filename=filename(y), vmode="double", dim=c(9,2), dimorder=c(2,1))
z
rm(z)

# If we only know the number of columns we can do
z <- ff(filename=filename(y), vmode="double")
# and set dim later
dim(z) <- c(length(z)/2, 2)
# Note that so far we have interpreted the file in column-major order
z
# To interpret the file in row-major order we set dimorder
# (a generalization for n-way arrays)
dimorder(z) <- c(2,1)
z
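
# A minimal sketch of the same idea with a file that ff did not create.
# It assumes the external program appends each row as two native-endian
# 8-byte doubles with no header; writeBin() stands in for that program
# and the file name is hypothetical.
ext_m <- "/tmp/example_ext_matrix.bin"
con <- file(ext_m, "wb")
for (i in 1:9) writeBin(c(i, i + 0.1), con)  # one row per call
close(con)

# Link the file as a 9 x 2 matrix stored in row-major order
w <- ff(filename=ext_m, vmode="double", dim=c(9,2), dimorder=c(2,1))
w_rows <- w[1:3,]
w_rows
close(w)
rm(w)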


# removing the ff objects will trigger finalizer 
# at next garbage collection
rm(x, ffcols, ffdafr, y, z)
gc()

# since we carefully selected the "close" finalizer, 
# the files still exist
dir(path="/tmp", pattern="example_")

# now remove them physically
unlink(file.path("/tmp", dir(path="/tmp", pattern="example_")))


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
