On Wed, Oct 21, 2015 at 03:49:08PM -0400, Nathan Sidwell wrote:
> This patch is the device-specific half of the previous patch. It processes
> the partition head & tail markers and loop abstraction functions inserted
> during omp lowering.
>
> In the oacc_device_lower pass we scan the CFG reconstructing the set of
> nested loops demarked by IFN_UNIQUE (HEAD_MARK) & IFN_UNIQUE (TAIL_MARK)
> functions. The HEAD_MARK function provides the loop partition information
> provided by the user. Once constructed we can iterate over that structure
> checking partitioning consistency (for instance an inner loop must use a
> dimension 'inside' an outer loop). We also assign specific partitioning axes
> here. Partitioning updates the parameters of the IFN_LOOP and IFN_FORK/JOIN
> functions appropriately.
>
> Once partitioning has been determined, we iterate over the CFG scanning for
> the marker, fork/join and loop functions. The marker functions are deleted,
> the fork & join functions are conditionally deleted (using the target hook
> of patch 3), and the loop function is expanded into code calculating the
> loop parameters depending on how the loop has been partitioned. This uses
> the OACC_DIM_POS and OACC_DIM_SIZE builtins included in patch 7.
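
To make the expansion step concrete, here is a minimal sketch in plain C of what a partitioned loop could look like once the loop function has been expanded. This is only an illustration under assumptions: partition_sum is a hypothetical name, the pos and size parameters merely stand in for the values the OACC_DIM_POS and OACC_DIM_SIZE builtins would yield, and the striding scheme is assumed rather than taken from the patch.

```c
/* Hypothetical sketch: sum the iterations of a partitioned loop that one
   execution unit would run.  'pos' and 'size' stand in for the values the
   OACC_DIM_POS and OACC_DIM_SIZE builtins would return; the striding
   scheme is an assumption, not taken from the patch.  */
long
partition_sum (const int *a, int n, int pos, int size)
{
  long sum = 0;

  /* Unit 'pos' handles iterations pos, pos + size, pos + 2 * size, ...  */
  for (int i = pos; i < n; i += size)
    sum += a[i];
  return sum;
}
```

With pos == 0 and size == 1 the sketch degenerates to an ordinary sequential loop over all n elements.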

So, how do you expand the OACC loops on non-PTX devices (host, or say
Xeon Phi)?  Do you drop the IFNs and replace them with normal loops?
I don't see anything that would set the various flags that, e.g., OpenMP
#pragma omp simd or Cilk+ #pragma simd sets, such as loop->safelen and
loop->force_vectorize, or perhaps set loop->simduid and promote some vars
to simduid arrays, if that is relevant to OpenACC.
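
For reference, this is the kind of source construct I mean on the OpenMP side. It is a minimal sketch only: saxpy_simd is a hypothetical function, and the safelen value of 8 is arbitrary. Without -fopenmp or -fopenmp-simd the pragma is simply ignored and the loop runs as plain C.

```c
/* Hypothetical example: y[i] += a * x[i] for 0 <= i < n.
   The simd pragma asks the compiler to vectorize the loop; the safelen
   clause promises that no two iterations executed concurrently are
   8 or more apart in the iteration space.  */
void
saxpy_simd (float a, const float *x, float *y, int n)
{
#pragma omp simd safelen(8)
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}
```

The question is whether the OpenACC lowering leaves equivalent information on the loop when the target is not PTX.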

	Jakub