[R] Efficiently parallelize across columns of a data.table

2016-08-20 Thread Rebecca Payne
Makes sense. Thanks for the clear explanation.

Rebecca

On Friday, August 19, 2016, Peter Langfelder wrote:
> Last time I looked (admittedly a few years back), on unix-alikes
> (which you seem to be using, based on your use of top),
> foreach/doParallel used forking. This means each worker gets

Re: [R] Efficiently parallelize across columns of a data.table

2016-08-19 Thread Peter Langfelder
Last time I looked (admittedly a few years back), on unix-alikes (which you seem to be using, based on your use of top), foreach/doParallel used forking. This means each worker gets a copy of the entire R session, __but__ modern operating systems do not actually copy on spawn, they only copy on wri
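
For anyone finding this thread later, here is a minimal sketch of the forking behaviour described above. It uses parallel::mclapply from base R rather than foreach/doParallel, as a stand-in that also forks on unix-alikes. The data (a plain data.frame named dt, not a data.table) and the per-column function col_summary are hypothetical and are not taken from the original question.

```r
library(parallel)  # ships with base R

# Hypothetical stand-in data: a plain data.frame, not the poster's data.table.
set.seed(1)
dt <- data.frame(a = rnorm(1000), b = rnorm(1000), c = rnorm(1000))

# Hypothetical per-column task: it only reads one column and never modifies dt.
col_summary <- function(col_name) mean(dt[[col_name]])

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L

# On unix-alikes each worker is a fork of this session. The workers read dt
# through pages shared with the parent; the OS copies a page only if it is
# written to.
res <- mclapply(names(dt), col_summary, mc.cores = n_cores)
names(res) <- names(dt)

# Serial version of the same computation, for comparison.
serial <- lapply(names(dt), col_summary)
names(serial) <- names(dt)
```

Because the workers only read dt, the per-process memory shown by tools such as top may look as large as the parent's even though most of it is shared. If a worker modified the table, the touched pages would be copied for that worker.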

[R] Efficiently parallelize across columns of a data.table

2016-08-19 Thread Rebecca Payne
I am trying to parallelize a task across columns of a data.table using foreach and doParallel. My data is large relative to my system memory (about 40%) so I'm avoiding making any copies of the full table. The function I am parallelizing is pretty simple, taking as input a fixed set of columns and