When you see low performance on a host with 2 GPU cards, did you try setting task affinity so that the two CUDA tasks occupy different CPU cores? I found this strategy helpful with early versions of the ATI AstroPulse app. It required a lot of CPU in addition to the GPU, and when 2 tasks were active for the GPU app they sometimes competed for the same CPU core, even when all other CPU cores were occupied only by idle-priority CPU tasks. This looks like a Windows scheduler peculiarity: it apparently can't do priority-based scheduling across CPU cores, only between processes running on the same core. I've seen such behavior on Vista x86. Also, a few s...@home project participants reported that raising the GPU app priority above below normal (actually, above the boinc.exe/boincmgr.exe processes, that is, above normal) speeds up the CUDA app. Perhaps sometimes (especially on hosts with multiple GPU cards installed) the GPU app process competes with BOINC itself for CPU cycles.
In both cases listed above, GPU app performance improved without reserving a complete CPU core for the GPU app.

----- Original Message -----
From: Oliver Bock
To: David Anderson ; boinc_dev ; Boinc Projects
Sent: Monday, December 20, 2010 2:12 PM
Subject: [boinc_dev] CUDA task scheduling

Hi everyone,

We just deployed a new CUDA application (called BRP3) as part of the einst...@home project. This app uses up to roughly 75% of a GPU and 3-30% of a CPU, depending on the GPU model/performance. Thus our scheduler currently issues these tasks with the following settings:

hu.avg_ncpus = 0.2
hu.ncudas = 1

Please note that BOINC (e.g. sched/sched_customize) revision 22832 is used in this case.

The problem is that with the settings above, BOINC starts CUDA tasks in addition to CPU tasks that already occupy all existing CPU cores. This means that on a system with four CPU cores and two CUDA devices, four CPU tasks and two CUDA tasks are launched. Although this behavior is intended, it doesn't really work out for us, because the performance of the CUDA tasks is degraded significantly: GPU usage drops to less than 10%, increasing the runtime by the same factor. Although the CUDA tasks run with slightly higher priority (below normal on Windows) than the CPU tasks (low on Windows), they are limited by the already fully occupied CPU cores, which they still need for up to 30% of the computation. Since we haven't yet released a Linux or Mac OS version, we don't know whether this is a Windows time-slicing issue or not.

Are there any other projects running CUDA tasks in a comparable way? The only workaround in sight would be to acquire a full CPU core again, but that's certainly not ideal. Any ideas are welcome!

Cheers,
Oliver

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
