Re: [R] Problem parallelizing across cores

Jeff Newmiller Wed, 28 Aug 2019 17:10:55 -0700

Your first option is always to compute the results serially. Parallel computing 
is worth considering only when the computation time is long compared to session 
startup overhead and data I/O. Start by laying out your independent units of 
work as a sequence, then allocate segments of that sequence to your workers, 
each of which processes its segment serially, so that the startup cost is paid 
once per worker rather than once per call... so yes, you absolutely can use a 
method that starts new instances of R ("SNOW"). On Linux you also have forking, 
which has lower overhead, but not zero, so exactly the same logic applies. But 
if you cut your serial computations into pieces that are too short, the 
overhead dominates and you cannot make good use of the available processing 
resources, as you have already observed.
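
A minimal sketch of that layout with the base R parallel package, not a 
definitive implementation. The function f, the stop_words vector and the 
input strings are made-up stand-ins for your real analysis, data and 
packages; the worker count of 2 is only for illustration (you would 
presumably use 8 on your instance).

```r
library(parallel)

# Hypothetical stand-in for the real analysis function f(string):
# counts the words in s that are not in stop_words.
f <- function(s) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  sum(!words %in% stop_words)   # stop_words lives on each worker
}

# Made-up work units, laid out as one sequence.
inputs <- c("the cat sat", "a dog ran off", "the end", "rain")

# Start the worker R sessions once; they stay alive until stopCluster().
cl <- makeCluster(2)

# One-time setup on every worker: this is where the library() calls and
# the data loading would go, so that cost is paid once per worker.
clusterEvalQ(cl, {
  stop_words <- c("the", "a")   # illustrative data only
  NULL
})

# parLapply() splits 'inputs' into one contiguous segment per worker;
# each worker then runs f over its segment serially.
results <- parLapply(cl, inputs, f)

stopCluster(cl)
```

The cluster object can be reused for any number of parLapply() calls before 
stopCluster(), so the setup in clusterEvalQ() is not repeated. On Linux, 
mclapply() from the same package forks the current session instead, so 
packages and data already loaded in the parent are visible to the children.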

However, the lack of a reproducible example is a strong indicator that you are 
not really asking a question about R... so do some reading and focus your next 
question on R or the base R parallel package, per the Posting Guide. (Do read 
that... posting HTML is a good way for your message to get scrambled before we 
see it.) Wide-ranging discussions of computer science and HPC hardware 
constraints are off topic here.

On August 28, 2019 11:06:57 AM PDT, James Spottiswoode 
<james.spottiswo...@gmail.com> wrote:
>Hi All,
>
>I have a piece of well optimized R code for doing text analysis running
>under Linux on an AWS instance.  The code first loads a number of packages
>and some needed data and the actual analysis is done by a function called,
>say, f(string).  I would like to parallelize calling this function across
>the 8 cores of the instance to increase throughput.  I have looked at the
>packages doParallel and future but am not clear how to do this.  Any method
>that brings up an R instance when the function is called will not work for
>me as the time to load the packages and data is comparable to the execution
>time of the function leading to no speed up.  Therefore I need to keep a
>number of instances of the R code running continuously so that the data
>loading only occurs once when the R processes are first started and
>thereafter the function f(string) is ready to run in each instance.  I hope
>I have put this clearly.
>
>I’d much appreciate any suggestions.  Thanks in advance,
>
>James Spottiswoode
>
>
>--
>
>       [[alternative HTML version deleted]]
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.
