On 01/29/16 10:18, Jakub Jelinek wrote:
> On Thu, Jan 28, 2016 at 10:38:51AM -0500, Nathan Sidwell wrote:
>> This patch adds default compute dimension handling. Users rarely specify
>> compute dimensions, expecting the toolchain to DTRT. More savvy users would
>> like to specify global defaults. This patch permits both.
> Isn't it better to be able to override the defaults on the library side?
> I mean, when somebody is compiling the code, often he doesn't know the
> exact properties of the hw it will be run on; if he does, I think it is
> better to specify them explicitly in the code. But if he doesn't, one just
> has to hope libgomp will figure out the best defaults.
> So, wouldn't it be better to add some env var that would allow one to control
> this instead?
You have anticipated part 2 of this patch, which would allow a default to be
deferred to runtime in the manner you describe.
Generally, one can know at compile time the upper bound on workers (it's part of
the chip specification), but the number of physical gangs depends on the
accelerator card. (That is true for PTX and IIUC for other GPGPUs too.) So, you
may want to defer num gangs to runtime -- but of course then you lose constant
folding opportunities.
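
As a rough illustration of that trade-off, here is a minimal sketch in C. It
assumes an OpenACC-capable compiler (without one the pragma is ignored and the
loop runs on the host). The names NUM_WORKERS, MY_ACC_NUM_GANGS, parse_dim,
gangs_from_env and scale are all hypothetical and are not part of the patch or
of libgomp. The worker count is a compile-time constant, so the compiler can
fold it; the gang count is only known once the program runs.

```c
#include <stdlib.h>

/* Hypothetical compile-time bound on workers, e.g. taken from the chip
   specification.  Being a constant, it can be folded by the compiler.  */
#define NUM_WORKERS 32

/* Parse a positive decimal dimension from S.  Return FALLBACK when S is
   NULL, empty, not a number, or not strictly positive.  */
static int
parse_dim (const char *s, int fallback)
{
  char *end;
  long v;

  if (s == NULL || *s == '\0')
    return fallback;
  v = strtol (s, &end, 10);
  if (end == s || *end != '\0' || v <= 0 || v > 65535)
    return fallback;
  return (int) v;
}

/* Read the gang count at run time.  MY_ACC_NUM_GANGS is a made-up
   environment variable name used only for this sketch.  */
static int
gangs_from_env (int fallback)
{
  return parse_dim (getenv ("MY_ACC_NUM_GANGS"), fallback);
}

/* Scale X[0..N-1] by A.  GANGS is a run-time value, so nothing that
   depends on it can be constant-folded; NUM_WORKERS can be.  */
static void
scale (float *x, int n, float a, int gangs)
{
#pragma acc parallel loop num_gangs(gangs) num_workers(NUM_WORKERS) copy(x[0:n])
  for (int i = 0; i < n; i++)
    x[i] *= a;
}
```

A caller would then write something like scale (buf, n, 2.0f,
gangs_from_env (1)), picking the fallback to suit the target.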
nathan