Just wanted to start a discussion on whether R could ship with more
appropriate GC parameters. Right now, loading the recommended package
Matrix leads to:
> library(Matrix)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1076796 57.6    1368491 73.1  1198505 64.1
Vcells 1671329 12.
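In the meantime, the startup triggers are already tunable per-session through the mechanisms documented in ?Memory; a minimal sketch (the sizes below are illustrative, not proposed defaults):

## Raise the initial GC trigger sizes at startup (see ?Memory);
## the values here are illustrative only:
##   R --min-nsize=2M --min-vsize=64M
## or equivalently via environment variables read at startup:
##   R_NSIZE=2M R_VSIZE=64M R
library(Matrix)
gc()   # the "gc trigger" column now starts correspondingly higher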
On Thu, Jan 15, 2015 at 11:08 AM, Simon Urbanek wrote:
> In addition to the major points that others made: if you care about speed,
> don't use compression. With today's fast disks it's an order of magnitude
> slower to use compression:
>
>> d=lapply(1:10, function(x) as.integer(rnorm(1e7)))
>> system.time(saveRDS(d, file="test.rds.gz"))
>>    user  system elapsed
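For anyone reproducing this, the uncompressed counterpart of the benchmark needs only the compress argument of saveRDS(); a minimal sketch:

d <- lapply(1:10, function(x) as.integer(rnorm(1e7)))
system.time(saveRDS(d, file = "test.rds.gz"))                 # default: gzip
system.time(saveRDS(d, file = "test.rds", compress = FALSE))  # no compression
file.info(c("test.rds.gz", "test.rds"))$size                  # the size trade-off
unlink(c("test.rds.gz", "test.rds"))                          # clean up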
On Thu, 15 Jan 2015, Christian Sigg wrote:
> Given a large data.frame, a function trains a series of models by looping over
> two steps:
> 1. Create a model-specific subset of the complete training data
> 2. Train a model on the subset data
> The function returns a list of trained models which are later used for
> prediction on test data.
> Due to h
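A runnable sketch of that two-step pattern (big_df, the fold column and lm() are stand-ins for the poster's actual data and model; only the loop structure mirrors the post):

big_df <- data.frame(y = rnorm(1000), x = rnorm(1000),
                     fold = rep(1:5, each = 200))
models <- vector("list", 5)
for (i in seq_along(models)) {
  d_i <- big_df[big_df$fold == i, , drop = FALSE]  # step 1: model-specific subset
  models[[i]] <- lm(y ~ x, data = d_i)             # step 2: train on the subset
  rm(d_i)                                          # free the subset before the next pass
}

Freeing each subset before the next iteration keeps at most one copy of the working data alive alongside big_df, which is often the point of structuring the loop this way.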
On 01/15/2015 01:45 PM, Stewart Morris wrote:
> Hi,
> I am dealing with very large datasets and it takes a long time to save a
> workspace image.
> The options to save compressed data are: "gzip", "bzip2" or "xz", the
> default being gzip. I wonder if it's possible to include the pbzip2
> (http://compression.ca/pbzip2/) algorithm as an option when
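Until such an option exists, an external parallel compressor can already be plugged in through a pipe() connection, since save() and load() both accept connections; a sketch assuming pbzip2 is installed and on the PATH:

## Write: R serializes uncompressed, pbzip2 compresses in parallel.
con <- pipe("pbzip2 -c > workspace.RData.bz2", "wb")
save(list = ls(all.names = TRUE), file = con)
close(con)

## Read it back later:
con <- pipe("pbzip2 -dc workspace.RData.bz2", "rb")
load(con)
close(con)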
On 15/01/2015 12:45, Stewart Morris wrote:
> Hi,
> I am dealing with very large datasets and it takes a long time to save a
> workspace image.

Sounds like bad practice on your part ... saving images is not
recommended for careful work.
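For what it's worth, the usual alternative to whole-image saves is to persist only the objects that matter with saveRDS()/readRDS() (fit is a placeholder name here):

saveRDS(fit, file = "fit.rds")   # save one object explicitly
fit <- readRDS("fit.rds")        # restore it by assignment later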