A summary for reference: the new detectCores() for Windows in R-devel 
seems to be working for both logical and physical cores on systems with 
more than 64 logical processors (thanks to Arun for testing!). If this 
feature is important to you, particularly if you use an older version of 
Windows and/or a system with >64 logical processors, it would be nice if 
you could test it and report any problems.

As I mentioned earlier, in older versions of R one can use "wmic" as a 
workaround to detect the number of processors on systems with >64 
logical processors (with appropriate error handling added as needed):

# detectCores()
out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, 
value=TRUE))))

# detectCores(logical=FALSE)
out <- system("wmic cpu get numberofcores", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, 
value=TRUE))))
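
A minimal sketch of how the workaround above might be wrapped with basic 
error handling. The helper names parse_wmic_count() and wmic_cores() are 
hypothetical (they are not part of R), and returning NA when "wmic" is 
missing or prints nothing usable is an assumption about what a caller 
would want:

```r
# Hypothetical helper: sum the per-socket counts printed by wmic.
# 'out' is the character vector returned by system(..., intern = TRUE).
parse_wmic_count <- function(out) {
  # Keep only lines made of a number plus optional surrounding whitespace
  # (this drops the header line and the trailing "\r" line).
  vals <- grep("^[[:space:]]*[0-9]+[[:space:]]*$", out, value = TRUE)
  if (length(vals) == 0L) return(NA_integer_)
  sum(as.integer(trimws(vals)))
}

# Hypothetical wrapper: NA if wmic cannot be run or yields no numbers.
wmic_cores <- function(logical = TRUE) {
  field <- if (logical) "numberoflogicalprocessors" else "numberofcores"
  out <- tryCatch(
    suppressWarnings(system(paste("wmic cpu get", field), intern = TRUE)),
    error = function(e) character(0)
  )
  parse_wmic_count(out)
}

# Parsing the output Arun reported earlier in this thread:
sample_out <- c("NumberOfLogicalProcessors  \r", "22  \r", "22  \r",
                "20  \r", "22  \r", "\r")
parse_wmic_count(sample_out)  # 86
```

Keeping the parsing separate from the system() call means it can be 
checked on any platform, not only on Windows.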

The remaining problem with running using >64 processors on Windows 
turned out to be due to a bug in sockets communication, debugged and 
fixed in R-devel by Luke Tierney.

Tomas

On 08/29/2018 12:42 PM, Srinivasan, Arunkumar wrote:
> Dear Tomas, thank you very much. I installed r-devel r75201 and tested.
>
> The machine with 88 cores has NUMA disabled. It therefore has 2 processor 
> groups with 64 and 24 processors each.
>
> require(parallel)
> detectCores()
> # [1] 88
>
> This is great!
>
> Then I went on to test with a simple 'foreach()' loop. I started with 64 
> processors (max limit of 1 processor group). I ran with a simple function of 
> 0.5s sleep.
>
> require(snow)
> require(doSNOW)
> require(foreach)
>
> cl <- makeCluster(64L, "SOCK")
> registerDoSNOW(cl)
> system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
> #    user  system elapsed
> #    0.06    0.00    0.64
> system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
> #    user  system elapsed
> #    0.03    0.01    1.04
> stopCluster(cl)
>
> With a cluster of 64 processors and a loop of 64 iterations, it 
> completed in ~0.5s (0.64s), and with 65 iterations it took ~1s, as expected.
>   
> cl <- makeCluster(65L, "SOCK")
> registerDoSNOW(cl)
> system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
> #    user  system elapsed
> #    0.03    0.02    0.61
> system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
> # Timing stopped at: 0.08 0 293
> stopCluster(cl)
>
> However, when I increased the cluster to 65 processors, a loop with 64 
> iterations seemed to complete as expected, but using all 65 processors to loop 
> over 65 iterations did not complete. I stopped it after ~5 minutes. The 
> same happens with a cluster started with any number between 65 and 88. It 
> seems to me that we are still not able to use >64 processors all at the 
> same time, even though detectCores() now returns the right count.
>
> I'd appreciate your thoughts on this.
>
> Best,
> Arun.
>
> -----Original Message-----
> From: Tomas Kalibera <tomas.kalib...@gmail.com>
> Sent: 27 August 2018 19:43
> To: Srinivasan, Arunkumar <arunkumar.sriniva...@uk.mlp.com>; 
> r-devel@r-project.org
> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA is 
> enabled or disabled
>
> Dear Arun,
>
> thank you for checking the workaround scripts.
>
> I've modified detectCores() to use GetLogicalProcessorInformationEx. It is in 
> revision 75198 of R-devel, could you please test it on your machines? For a 
> binary, you can wait until the R-devel snapshot build gets to at least this 
> svn revision.
>
> Thanks for the link to the processor groups documentation. I don't have a 
> machine to test this on, but I would hope that snow clusters (e.g.
> PSOCK) should work fine on systems with >64 logical processors as they spawn 
> new processes (not just threads). Note that FORK clusters are not supported 
> on Windows.
>
> Thanks
> Tomas
>
> On 08/21/2018 02:53 PM, Srinivasan, Arunkumar wrote:
>> Dear Tomas, thank you for looking into this. Here's the output:
>>
>> # number of logical processors - what detectCores() should return
>> out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
>> [1] "NumberOfLogicalProcessors  \r" "22                         \r" "22                         \r"
>> [4] "20                         \r" "22                         \r" "\r"
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE))))
>> # [1] 86
>>
>> [I've asked the IT team to understand why one of the values is 20 instead of 
>> 22].
>>
>> # number of cores - what detectCores(FALSE) should return
>> out <- system("wmic cpu get numberofcores", intern=TRUE)
>> [1] "NumberOfCores  \r" "22             \r" "22             \r" "20             \r" "22             \r"
>> [6] "\r"
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE))))
>> # [1] 86
>>
>> [Currently hyperthreading is disabled. So this output being identical to the 
>> previous output makes sense].
>>
>> system("wmic computersystem get numberofprocessors")
>> NumberOfProcessors
>> 4
>>
>> In addition, I'd like to bring to your attention this documentation on 
>> processor groups: 
>> https://docs.microsoft.com/en-us/windows/desktop/ProcThread/processor-groups 
>> It explains how to make a process run on multiple groups (which seem to be 
>> different from NUMA nodes). All this seems overly complicated just to allow 
>> a process to use all cores by default, TBH.
>>
>> Here's a project on GitHub, 'fio', where the issue of running a process on 
>> more than 1 processor group has come up -  
>> https://github.com/axboe/fio/issues/527 and is addressed - 
>> https://github.com/axboe/fio/blob/c479640d6208236744f0562b1e79535eec290e2b/os/os-windows-7.h
>> I am not sure, though, whether this is entirely relevant, since we would be 
>> spawning new processes in R instead of allowing a single process to use all 
>> cores. Apologies if this is utterly irrelevant.
>>
>> Thank you,
>> Arun.
>>
>> From: Tomas Kalibera <tomas.kalib...@gmail.com>
>> Sent: 21 August 2018 11:50
>> To: Srinivasan, Arunkumar <arunkumar.sriniva...@uk.mlp.com>;
>> r-devel@r-project.org
>> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA
>> is enabled or disabled
>>
>> Dear Arun,
>>
>> thank you for the report. I agree with the analysis, detectCores() will only 
>> report logical processors in the NUMA group in which R is running. I don't 
>> have a system to test on, could you please check these workarounds for me on 
>> your systems?
>>
>> # number of logical processors - what detectCores() should return
>> out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE))))
>>
>> # number of cores - what detectCores(FALSE) should return
>> out <- system("wmic cpu get numberofcores", intern=TRUE)
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE))))
>>
>> # number of physical processors - as a sanity check
>>
>> system("wmic computersystem get numberofprocessors")
>>
>> Thanks,
>> Tomas
>>
>> On 08/17/2018 05:11 PM, Srinivasan, Arunkumar wrote:
>> Dear R-devel list,
>>
>> R's detectCores() function internally calls the "ncpus" function to get the 
>> total number of logical processors. However, this does not seem to take NUMA 
>> into account on Windows machines.
>>
>> On a machine with 48 processors (24 cores) in total and Windows Server 
>> 2012 installed, if NUMA is enabled with 2 nodes (node 0 and node 1, each 
>> having 24 CPUs), then R's detectCores() only detects 24 instead of the total 
>> 48. If NUMA is disabled, detectCores() returns 48.
>>
>> Similarly, on a machine with 88 cores (176 processors) and Windows Server 
>> 2012, detectCores() with NUMA disabled only returns the maximum value of 64. 
>> If NUMA is enabled with 4 nodes (44 processors each), then detectCores() 
>> only returns 44. This is particularly limiting, since in this case we cannot 
>> get to use all processors either by enabling or by disabling NUMA.
>>
>> We think this is because R's ncpus.c file uses 
>> "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION" 
>> (https://msdn.microsoft.com/en-us/library/windows/desktop/ms683194(v=vs.85).aspx)
>>  instead of "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX" 
>> (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx).
>>  Specifically, quoting from the first link:
>>
>> "On systems with more than 64 logical processors, the 
>> GetLogicalProcessorInformation function retrieves logical processor 
>> information about processors in the processor group 
>> (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405503(v=vs.85).aspx) 
>> to which the calling thread is currently assigned. Use the 
>> GetLogicalProcessorInformationEx function 
>> (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx) 
>> to retrieve information about processors in all processor groups 
>> on the system."
>>
>> Therefore, it might be possible to get the right count of total processors 
>> even with NUMA enabled by using "GetLogicalProcessorInformationEx". It'd be 
>> nice to know what you think.
>>
>> Thank you very much,
>> Arun.
>>
>> --
>> Arun Srinivasan
>> Analyst, Millennium Management LLC
>> 50 Berkeley Street | London, W1J 8HD
>>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel