When you saw low performance on the host with 2 GPU cards, did you try setting 
task affinity so that the two CUDA tasks occupy different CPU cores?
I found this strategy helpful with early versions of the ATI AstroPulse app. It 
required a lot of CPU in addition to the GPU, and when 2 GPU app tasks were 
active they sometimes competed for the same CPU core, even though all the other 
CPU cores were occupied only by idle-priority CPU tasks. This looks like a 
Windows scheduler peculiarity: it seems unable to do priority-based scheduling 
across CPU cores, only between processes running on the same core. I've seen 
such behavior on Vista x86.
Also, a few s...@home project participants reported that raising the GPU app 
priority above below normal (actually, above that of the boinc.exe/boincmgr.exe 
processes, that is, above normal) speeds up the CUDA app. Perhaps sometimes 
(especially on hosts with several GPU cards installed) the GPU app process 
competes with BOINC itself for CPU cycles.

In both cases, GPU app performance improved without reserving a complete CPU 
core for the GPU app.
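
To illustrate the affinity idea, here is a minimal Python sketch that only 
computes one single-core affinity bitmask per GPU task so that no two tasks 
share a core. The function name is hypothetical and the choice of the 
lowest-numbered cores is an assumption. Actually applying a mask is 
platform-specific (on Windows, for example, via SetProcessAffinityMask) and 
is not shown here.

```python
def disjoint_affinity_masks(num_gpu_tasks, num_cores):
    """Return one affinity bitmask per GPU task, each on a different core.

    Hypothetical helper for illustration; bit i set means "may run on core i".
    """
    if num_gpu_tasks > num_cores:
        raise ValueError("more GPU tasks than CPU cores")
    # Assumption: simply take the lowest-numbered cores, one per task.
    return [1 << core for core in range(num_gpu_tasks)]


# Two CUDA tasks on a four-core host, as in the case discussed here.
masks = disjoint_affinity_masks(num_gpu_tasks=2, num_cores=4)
for task, mask in enumerate(masks):
    print(f"GPU task {task}: affinity mask {mask:#06b}")
```

The masks never overlap, so the two GPU tasks cannot end up competing for the 
same core, while the CPU tasks remain free to run anywhere.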


----- Original Message ----- 
From: Oliver Bock 
To: David Anderson ; boinc_dev ; Boinc Projects 
Sent: Monday, December 20, 2010 2:12 PM
Subject: [boinc_dev] CUDA task scheduling


Hi everyone,

We just deployed a new CUDA application (called BRP3) as part of the
einst...@home project. This app uses up to roughly 75% of a GPU and 3-30% of
a CPU, depending on the GPU model/performance. Thus our scheduler
currently issues these tasks with the following settings:

hu.avg_ncpus = 0.2
hu.ncudas = 1

Please note that BOINC (e.g. sched/sched_customize) revision 22832 is
used in this case.

The problem is that with the settings above BOINC starts CUDA tasks in
addition to CPU tasks that already occupy all existing CPU cores. This
means on a system having four CPU cores and two CUDA devices, four CPU
tasks and two CUDA tasks are launched. Although this behavior is
intended, it doesn't really work out for us because the performance of
the CUDA tasks is degraded significantly - GPU usage goes down to less
than 10%, increasing the runtime by the same factor. Although the CUDA
tasks run with slightly higher priority (below normal on Windows) than
the CPU tasks (low on Windows) they are limited by the already
fully-occupied CPU cores which are still required for up to 30% of the
computation.
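
The numbers above can be put together in a small back-of-the-envelope
calculation. This Python sketch is only a simplified model built from the
figures in this message (four cores, two CUDA tasks, avg_ncpus = 0.2, up to
30% CPU per CUDA task, GPU usage dropping from about 75% to under 10%). It is
not BOINC's scheduling logic, and the variable names are made up for
illustration.

```python
cores = 4
cpu_tasks = 4            # one CPU task per core
cuda_tasks = 2           # one per CUDA device
avg_ncpus = 0.2          # CPU fraction declared per CUDA task
peak_cpu_per_cuda = 0.3  # upper end of the observed 3-30% range

# CPU the CUDA tasks declare versus what they may need at peak.
declared = cuda_tasks * avg_ncpus
needed = cuda_tasks * peak_cpu_per_cuda

# Total demand when all CPU tasks and CUDA tasks run together.
total_demand = cpu_tasks + needed
oversubscription = total_demand - cores

# Assumption: runtime scales inversely with GPU usage.
min_slowdown = 0.75 / 0.10

print(f"declared CPU for CUDA tasks: {declared:.1f} cores")
print(f"peak CPU demand: {total_demand:.1f} on {cores} cores "
      f"(over by {oversubscription:.1f})")
print(f"runtime increase: at least {min_slowdown:.1f}x")
```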

Since we couldn't yet release a Linux or Mac OS version we don't know
whether this is a Windows time-slicing issue or not. Are there any other
projects running CUDA tasks in a comparable way?

The only workaround in sight would be to acquire a full CPU core once
again but that's certainly not ideal.

Any ideas are welcome!


Cheers,
Oliver
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.