On Tue, Jul 07, 2015 at 10:12:56AM -0400, Nathan Sidwell wrote:
> On 07/07/15 05:54, Jakub Jelinek wrote:
> >On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote:
>
> >How does this interact with
> >#pragma acc routine {gang,worker,vector,seq} ?
> >Or is that something to be added later on?
>
> That is to be added later on.  I suspect such routines will trivially work,
> as they'll be marked up with the loop head/tail functions and levels builtin
> (the latter might need a bit of reworking).  What will need additional work
> at that point is the callers of routines -- they're typically called from a
> foo-single mode, but need to get all threads into the called function.  I'm
> thinking each call site will look like a mini-loop[*] surrounded by a
> head/tail marker.  (all that can be done in the device-side compiler once
> real call sites are known.)
Wouldn't function attributes be better for that case, and just use the
internal functions for the case when the mode is being changed in the middle
of a function?

I agree that fork/join might be less confusing.

BTW, where do you plan to lower the internal functions for non-PTX?
Doing it in RTL mach reorg is too late for those; we shouldn't be writing it
for each single target, as for non-PTX (perhaps non-HSA) I bet the behavior
is the same.

	Jakub