[Rd] default min-v/nsize parameters

2015-01-15 Thread Michael Lawrence
Just wanted to start a discussion on whether R could ship with more appropriate GC parameters. Right now, loading the recommended package Matrix leads to:

> library(Matrix)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1076796 57.6    1368491 73.1  1198505 64.1
Vcells 1671329 12.
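A minimal sketch of the per-session knobs that already exist (the values below are illustrative assumptions, not a proposed default): the --min-nsize/--min-vsize startup flags and the R_NSIZE/R_VSIZE environment variables, both documented in ?Memory.

## Start R with a larger floor for the cons-cell and vector heaps,
## so loading a package such as Matrix does not force several early GCs:
##
##   R --min-nsize=2M --min-vsize=64M
##   # or, via the environment:
##   R_NSIZE=2000000 R_VSIZE=64000000 R

## With a higher floor, gc() then reports triggers at or above the requested sizes:
library(Matrix)
gc()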

Re: [Rd] Request to speed up save()

2015-01-15 Thread Nathan Kurz
On Thu, Jan 15, 2015 at 11:08 AM, Simon Urbanek wrote:
> In addition to the major points that others made: if you care about speed,
> don't use compression. With today's fast disks it's an order of magnitude
> slower to use compression:
>
>> d=lapply(1:10, function(x) as.integer(rnorm(1e7)))
>>

Re: [Rd] Closing over Garbage

2015-01-15 Thread luke-tierney
On Thu, 15 Jan 2015, Christian Sigg wrote: Given a large data.frame, a function trains a series of models by looping over two steps: 1. Create a model-specific subset of the complete training data 2. Train a model on the subset data The function returns a list of trained models which are late

Re: [Rd] Request to speed up save()

2015-01-15 Thread Simon Urbanek
In addition to the major points that others made: if you care about speed, don't use compression. With today's fast disks it's an order of magnitude slower to use compression:

> d=lapply(1:10, function(x) as.integer(rnorm(1e7)))
> system.time(saveRDS(d, file="test.rds.gz"))
   user  system elaps
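To try the comparison locally, a minimal sketch (file names are illustrative): saveRDS() takes compress = FALSE, which skips the gzip step that dominates the run time above.

d <- lapply(1:10, function(x) as.integer(rnorm(1e7)))          # ~400 MB of integers
system.time(saveRDS(d, file = "test.rds.gz"))                  # default gzip compression
system.time(saveRDS(d, file = "test.rds", compress = FALSE))   # raw serialization only
file.info(c("test.rds.gz", "test.rds"))$size                   # the speed/size trade-off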

Re: [Rd] Request to speed up save()

2015-01-15 Thread Dénes Tóth
On 01/15/2015 01:45 PM, Stewart Morris wrote: Hi, I am dealing with very large datasets and it takes a long time to save a workspace image. The options to save compressed data are: "gzip", "bzip2" or "xz", the default being gzip. I wonder if it's possible to include the pbzip2 (http://compres

Re: [Rd] Request to speed up save()

2015-01-15 Thread Prof Brian Ripley
On 15/01/2015 12:45, Stewart Morris wrote:
> Hi, I am dealing with very large datasets and it takes a long time to save a workspace image.

Sounds like bad practice on your part ... saving images is not recommended for careful work.

> The options to save compressed data are: "gzip", "bzip2" or
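One alternative often suggested in place of whole-workspace images is to serialize only the objects that are expensive to recompute; a minimal sketch (object names are hypothetical):

fit <- lm(mpg ~ wt, data = mtcars)   # stand-in for an expensive result
saveRDS(fit, "fit.rds")              # save just this object, not the workspace
## later, in a fresh session:
fit <- readRDS("fit.rds")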

[Rd] Closing over Garbage

2015-01-15 Thread Christian Sigg
Given a large data.frame, a function trains a series of models by looping over two steps: 1. Create a model-specific subset of the complete training data 2. Train a model on the subset data The function returns a list of trained models which are later used for prediction on test data. Due to h
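The retention pattern can be reproduced in a few lines; a minimal sketch (function, column and object names are made up, with lm() standing in for the real training step): the formula created inside the training function keeps that function's evaluation frame, and hence the per-model subset, reachable.

train_one <- function(full_data, keep_rows) {
  subset_data <- full_data[keep_rows, ]   # step 1: model-specific subset
  lm(y ~ x, data = subset_data)           # step 2: train on the subset
}

full <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
fit  <- train_one(full, 1:5e5)

## The formula 'y ~ x' was created inside train_one(), so its environment is
## train_one()'s evaluation frame, which still references subset_data (and
## full_data) long after train_one() has returned:
env <- environment(formula(fit))
print(object.size(env$subset_data), units = "Mb")   # the subset is still alive

## A common workaround is to drop such references before returning the model,
## e.g. rm(list = "subset_data", envir = env), or to reset the formula's environment.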

[Rd] Request to speed up save()

2015-01-15 Thread Stewart Morris
Hi, I am dealing with very large datasets and it takes a long time to save a workspace image. The options to save compressed data are: "gzip", "bzip2" or "xz", the default being gzip. I wonder if it's possible to include the pbzip2 (http://compression.ca/pbzip2/) algorithm as an option when
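Whether or not pbzip2 becomes a built-in option, one workaround along these lines is to let save() write an uncompressed stream through a pipe() connection into an external parallel compressor; a hedged sketch, assuming pbzip2 is on the PATH and understands the usual bzip2-style -c/-d flags:

## Compress the workspace image with an external parallel compressor.
## save() applies no compression of its own when writing to a connection.
con <- pipe("pbzip2 -c > image.RData.bz2", "wb")
save(list = ls(), file = con, envir = globalenv())
close(con)

## Restore it through the matching decompression pipe:
con <- pipe("pbzip2 -dc image.RData.bz2", "rb")
load(con)
close(con)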