A summary for reference: the new detectCores() for Windows in R-devel
seems to be working both for logical and physical cores on systems with
>64 logical processors (thanks to Arun for testing!). If this feature
matters to you, particularly if you use an older version of Windows
and/or a system with >64 logical processors, it would be nice if you
could test it and report any problems.

As I mentioned earlier, in older versions of R one can use "wmic" as a
workaround to detect the number of processors on systems with >64
logical processors (with appropriate error handling added as needed):

# detectCores()
out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1",
                    grep("[0-9]+[ \t]*", out, value=TRUE))))

# detectCores(logical=FALSE)
out <- system("wmic cpu get numberofcores", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1",
                    grep("[0-9]+[ \t]*", out, value=TRUE))))

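The error handling mentioned above could look roughly like the sketch
below. The helper names parse_wmic_count() and wmic_count() are made up
for illustration and are not part of R. The sketch assumes that "wmic"
prints one count per CPU package on a line of its own, as in the outputs
reported in this thread, and it returns NA when "wmic" cannot be run or
prints nothing usable.

```r
# Hypothetical helper (not part of R): sum the per-package counts printed
# by "wmic cpu get <property>", ignoring the header and blank lines.
parse_wmic_count <- function(lines) {
  # Keep only lines made of digits plus optional blanks / carriage return.
  hits <- grep("^[ \t]*[0-9]+[ \t\r]*$", lines, value = TRUE)
  if (length(hits) == 0L) return(NA_integer_)
  sum(as.integer(gsub("[^0-9]", "", hits)))
}

# Hypothetical wrapper: run wmic and fall back to NA if it is missing
# or fails, instead of stopping with an error.
wmic_count <- function(property = "numberoflogicalprocessors") {
  out <- tryCatch(
    suppressWarnings(system(paste("wmic cpu get", property), intern = TRUE)),
    error = function(e) character(0)
  )
  parse_wmic_count(out)
}
```

wmic_count() would then stand in for detectCores(), and
wmic_count("numberofcores") for detectCores(logical=FALSE).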
The remaining problem with using >64 processors on Windows turned out
to be a bug in socket communication, which Luke Tierney has debugged
and fixed in R-devel.
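
To check a given build for the hang reported below, a sketch like the
following could be used. It uses the parallel package's PSOCK cluster
rather than snow/doSNOW, and time_sleep_tasks() and sleep_task() are
hypothetical names chosen for illustration. The helper only measures
elapsed time; it is not a confirmed reproduction of the original bug.

```r
library(parallel)

# Hypothetical task: sleep for a fixed time; 'i' is the task index.
sleep_task <- function(i, seconds) Sys.sleep(seconds)

# Hypothetical helper: run n_tasks sleeps on a PSOCK cluster with
# n_workers workers and return the elapsed wall-clock time in seconds.
time_sleep_tasks <- function(n_workers, n_tasks, seconds = 0.5) {
  cl <- makePSOCKcluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  # clusterApply() hands tasks to the workers in turn, so with
  # n_tasks <= n_workers every task runs on a worker of its own.
  timing <- system.time(
    clusterApply(cl, seq_len(n_tasks), sleep_task, seconds)
  )
  timing[["elapsed"]]
}
```

On a system with >64 logical processors one would call, for example,
time_sleep_tasks(65, 65) and check that it returns instead of hanging.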
Tomas
On 08/29/2018 12:42 PM, Srinivasan, Arunkumar wrote:
> Dear Tomas, thank you very much. I installed r-devel r75201 and tested.
>
> The machine with 88 cores has NUMA disabled. It therefore has 2 processor
> groups with 64 and 24 processors, respectively.
>
> require(parallel)
> detectCores()
> # [1] 88
>
> This is great!
>
> Then I went on to test with a simple 'foreach()' loop. I started with 64
> processors (the maximum for 1 processor group) and ran a simple function
> that sleeps for 0.5s.
>
> require(snow)
> require(doSNOW)
> require(foreach)
>
> cl <- makeCluster(64L, "SOCK")
> registerDoSNOW(cl)
> system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
> # user system elapsed
> # 0.06 0.00 0.64
> system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
> # user system elapsed
> # 0.03 0.01 1.04
> stopCluster(cl)
>
> With a cluster of 64 processors and loop running with 64 iterations, it
> completed in ~.5s (0.64), and with 65 iterations, it took ~1s as expected.
>
> cl <- makeCluster(65L, "SOCK")
> registerDoSNOW(cl)
> system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
> # user system elapsed
> # 0.03 0.02 0.61
> system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
> # Timing stopped at: 0.08 0 293
> stopCluster(cl)
>
> However, when I increased the cluster to 65 processors, a loop with 64
> iterations seemed to complete as expected, but using all 65 processors to
> loop over 65 iterations did not complete. I stopped it after ~5 minutes.
> The same happens with a cluster started with any number of processors
> between 65 and 88. It seems to me that we are still not able to use >64
> processors at the same time, even though detectCores() now returns the
> right count.
>
> I'd appreciate your thoughts on this.
>
> Best,
> Arun.
>
> -----Original Message-----
> From: Tomas Kalibera <[email protected]>
> Sent: 27 August 2018 19:43
> To: Srinivasan, Arunkumar <[email protected]>;
> [email protected]
> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA is
> enabled or disabled
>
> Dear Arun,
>
> thank you for checking the workaround scripts.
>
> I've modified detectCores() to use GetLogicalProcessorInformationEx. It is in
> revision 75198 of R-devel, could you please test it on your machines? For a
> binary, you can wait until the R-devel snapshot build gets to at least this
> svn revision.
>
> Thanks for the link to the processor groups documentation. I don't have a
> machine to test this on, but I would hope that snow clusters (e.g.
> PSOCK) should work fine on systems with >64 logical processors as they spawn
> new processes (not just threads). Note that FORK clusters are not supported
> on Windows.
>
> Thanks
> Tomas
>
> On 08/21/2018 02:53 PM, Srinivasan, Arunkumar wrote:
>> Dear Tomas, thank you for looking into this. Here's the output:
>>
>> # number of logical processors - what detectCores() should return
>> out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
>> [1] "NumberOfLogicalProcessors \r" "22 \r" "22 \r"
>> [4] "20 \r" "22 \r" "\r"
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE)))) # [1] 86
>>
>> [I've asked the IT team to understand why one of the values is 20 instead of
>> 22].
>>
>> # number of cores - what detectCores(FALSE) should return
>> out <- system("wmic cpu get numberofcores", intern=TRUE)
>> [1] "NumberOfCores \r" "22 \r" "22 \r" "20 \r" "22 \r"
>> [6] "\r"
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE)))) # [1] 86
>>
>> [Currently hyperthreading is disabled. So this output being identical to the
>> previous output makes sense].
>>
>> system("wmic computersystem get numberofprocessors")
>> NumberOfProcessors
>> 4
>>
>> In addition, I'd also bring to your attention this documentation on
>> processor groups:
>> https://docs.microsoft.com/en-us/windows/desktop/ProcThread/processor-groups
>> It explains how one should go about running a process on multiple groups
>> (which seems to be different from NUMA). All this seems overly complicated
>> just to allow a process to use all cores by default TBH.
>>
>> Here's a project on Github 'fio' where the issue of running a process on
>> more than 1 processor group has come up -
>> https://github.com/axboe/fio/issues/527 and is addressed -
>> https://github.com/axboe/fio/blob/c479640d6208236744f0562b1e79535eec290e2b/os/os-windows-7.h
>> . I am not sure though if this is entirely relevant since we would be
>> forking new processes in R instead of allowing a single process to use all
>> cores. Apologies if this is utterly irrelevant.
>>
>> Thank you,
>> Arun.
>>
>> From: Tomas Kalibera <[email protected]>
>> Sent: 21 August 2018 11:50
>> To: Srinivasan, Arunkumar <[email protected]>;
>> [email protected]
>> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA
>> is enabled or disabled
>>
>> Dear Arun,
>>
>> thank you for the report. I agree with the analysis, detectCores() will only
>> report logical processors in the NUMA group in which R is running. I don't
>> have a system to test on, could you please check these workarounds for me on
>> your systems?
>>
>> # number of logical processors - what detectCores() should return
>> out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE))))
>>
>> # number of cores - what detectCores(FALSE) should return
>> out <- system("wmic cpu get numberofcores", intern=TRUE)
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE))))
>>
>> # number of physical processors - as a sanity check
>>
>> system("wmic computersystem get numberofprocessors")
>>
>> Thanks,
>> Tomas
>>
>> On 08/17/2018 05:11 PM, Srinivasan, Arunkumar wrote:
>> Dear R-devel list,
>>
>> R's detectCores() function internally calls the "ncpus" function to get
>> the total number of logical processors. However, this does not seem to
>> take NUMA into account on Windows machines.
>>
>> On a machine having 48 processors (24 cores) in total and Windows Server
>> 2012 installed, if NUMA is enabled and has 2 nodes (node 0 and node 1 each
>> having 24 CPUs), then R's detectCores() only detects 24 instead of the total
>> 48. If NUMA is disabled, detectCores() returns 48.
>>
>> Similarly, on a machine with 88 cores (176 processors) and Windows Server
>> 2012, detectCores() with NUMA disabled only returns the maximum value of 64.
>> If NUMA is enabled with 4 nodes (44 processors each), then detectCores()
>> will only return 44. This is particularly limiting since we cannot get to
>> use all processors by enabling/disabling NUMA in this case.
>>
>> We think this is because R's ncpus.c file uses
>> "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION"
>> (https://msdn.microsoft.com/en-us/library/windows/desktop/ms683194(v=vs.85).aspx)
>> instead of "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX"
>> (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx).
>> Specifically, quoting from the first link:
>>
>> "On systems with more than 64 logical processors, the
>> GetLogicalProcessorInformation function retrieves logical processor
>> information about processors in the processor group
>> (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405503(v=vs.85).aspx)
>> to which the calling thread is currently assigned. Use the
>> GetLogicalProcessorInformationEx function
>> (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx)
>> to retrieve information about processors in all processor groups on the
>> system."
>>
>> Therefore, it might be possible to get the right count of total processors
>> even with NUMA enabled by using "GetLogicalProcessorInformationEx". It'd be
>> nice to know what you think.
>>
>> Thank you very much,
>> Arun.
>>
>> --
>> Arun Srinivasan
>> Analyst, Millennium Management LLC
>> 50 Berkeley Street | London, W1J 8HD
>>