omp workshare (PR35423) & beginner questions

2008-04-09 Thread Vasilis Liaskovitis
Hello,

I am a beginner interested in learning gcc internals and contributing
to the community.

I have started implementing PR35423 - omp workshare in the fortran
front-end. I have some questions - any guidance and suggestions are
welcome:

- For scalar assignments, I wrap them in an OMP_SINGLE construct.
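
As a C/OpenMP illustration of the intended semantics (not actual gfortran
output; the names are made up), a scalar assignment inside a workshare
region should be executed by exactly one thread:

#include <omp.h>

double a, b, c;

void
scalar_in_workshare (void)
{
#pragma omp parallel
  {
#pragma omp single    /* one thread does the assignment; implicit barrier */
    a = b + c;
  }
}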

- Array/subarray assignments: for assignments handled by the
scalarizer, I now create an OMP_FOR loop instead of a LOOP_EXPR for
the outermost scalarized loop. This achieves worksharing at the
outermost loop level.
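
In C/OpenMP terms this corresponds to something like the sketch below
(illustration only; the function and names are mine, and it assumes the
function is called from within a parallel region):

void
array_assignment (double *a, const double *b, const double *c, int n)
{
  int i;

  /* Orphaned worksharing loop: iterations of the outermost scalarized
     loop are divided among the threads of the enclosing parallel.  */
#pragma omp for
  for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];
}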

Some array assignments are handled by functions (e.g.
gfc_build_memcpy_call generates calls to memcpy). For these, I believe
we need to divide the arrays into chunks and have each thread call the
builtin function on its own chunk. For example, if we have the
following call in a parallel workshare construct:

memcpy(dst, src, len)

I generate this pseudocode:

{
  numthreads = omp_get_num_threads();
  chunksize = len / numthreads;
  /* round up so that numthreads chunks cover all len bytes */
  chunksize = chunksize + (len != chunksize * numthreads);
}

#pragma omp for
for (i = 0; i < numthreads; i++) {
  mysrc = src + i * chunksize;
  mydst = dst + i * chunksize;
  mylen = min(chunksize, len - i * chunksize);
  memcpy(mydst, mysrc, mylen);
}

If you have a suggestion to implement this in a simpler way, let me know.

The above code executes in parallel in every thread. Alternatively, the
first block above can be wrapped in an omp single, but the numthreads
and chunksize variables would then have to be declared shared instead
of private. All the variables above are private by default, since they
are declared inside a parallel construct.
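
For concreteness, here is a self-contained C sketch of that omp_single
variant (the function name, the off variable and the bounds guard are
my additions, not part of the pseudocode above):

#include <omp.h>
#include <string.h>

static void
memcpy_workshared (char *dst, const char *src, long len)
{
  long numthreads, chunksize;   /* declared outside the parallel: shared */

#pragma omp parallel shared (numthreads, chunksize)
  {
    long i;

#pragma omp single
    {
      numthreads = omp_get_num_threads ();
      chunksize = len / numthreads;
      chunksize += (len != chunksize * numthreads);  /* ceiling division */
    }  /* implicit barrier: all threads now see the shared values */

#pragma omp for
    for (i = 0; i < numthreads; i++)
      {
        long off = i * chunksize;
        if (off < len)   /* skip threads whose chunk starts past the end */
          memcpy (dst + off, src + off,
                  chunksize < len - off ? chunksize : len - off);
      }
  }
}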

How can I set the scoping for a specific variable in a given
omp for construct? Is the following correct for making a variable shared?

tmp = build_omp_clause (OMP_CLAUSE_SHARED);
OMP_CLAUSE_DECL (tmp) = variable;
omp_clauses = gfc_trans_add_clause (tmp, omp_clauses);

- I still need to implement worksharing for the array reduction
operators (e.g. SUM, ALL, MAXLOC, etc.). For these, I think a
combination of OMP_FOR/OMP_SINGLE or an OMP_FOR with a reduction clause
is needed. I will also try to work on WHERE and FORALL statements.


I am also interested in the GOMP 3 implementation and performance
issues. If there are unclaimed issues suitable for newcomers, please
share them or update http://gcc.gnu.org/wiki/openmp. Can someone
elaborate on the "Fine tune the auto scheduling feature for parallel
loops" issue?

Also, in the beginner projects
(http://gcc.gnu.org/projects/beginner.html), under "optimizer
improvements", it would be good to know which of these projects are
still relevant and which are stale or obsolete. Middle-end or
x86/x86_64 back-end issues suitable for beginners would also be of
interest.

thanks,

- Vasilis


Re: omp workshare (PR35423) & beginner questions

2008-04-20 Thread Vasilis Liaskovitis
Hi,

Thanks for the help. Some more questions:

1) I am trying to workshare reduction operators, currently working on
SUM.

      INTEGER N
      REAL AA(N), MYSUM
!$OMP PARALLEL
!$OMP WORKSHARE
      MYSUM = SUM(AA)
!$OMP END WORKSHARE
!$OMP END PARALLEL

To compute SUM, the scalarizer creates a temporary variable (let's call
it val2) for accumulating the sum.

In order to workshare the sum, I am attempting to create an OMP_FOR
loop with an omp reduction clause for the temporary val2. In pseudocode
this would be:

!$OMP DO REDUCTION(+:val2)
      DO I = 1, N
        val2 = val2 + AA(I)
      END DO

The problem is that I get an error from the gimplifier: "reduction
variable val.2 is private in outer context". I think this is because the
parallel region assumes val2 is a private variable.
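
For reference, here is the shape that does gimplify cleanly when
written directly in C with OpenMP (aa, n and val2 stand in for the
gfortran-generated temporaries):

float
workshared_sum (const float *aa, int n)
{
  float val2 = 0.0f;  /* must not be private in the outer context */
  int i;

#pragma omp parallel shared (val2)
  {
#pragma omp for reduction (+:val2)
    for (i = 0; i < n; i++)
      val2 = val2 + aa[i];
  }
  return val2;
}

If val2 is instead private in the enclosing parallel, this is exactly
the situation the gimplifier rejects with the diagnostic above.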

I have tried creating an extra omp clause shared for val2

sharedreduction = build_omp_clause(OMP_CLAUSE_SHARED);
OMP_CLAUSE_DECL(sharedreduction) = reduction_variable;

where reduction_variable is the tree node for val2. I am attaching this
clause to the clauses of the OMP_PARALLEL construct.

Doing this breaks the following assertion in gimplify.c:omp_add_variable:

  /* The only combination of data sharing classes we should see is
     FIRSTPRIVATE and LASTPRIVATE.  */
  nflags = n->value | flags;
  gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
              == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE));

I think this happens because val2 is first added with GOVD_SHARED |
GOVD_EXPLICIT flags because of my shared clause, and later re-added
(from the default parallel construct handling?) with GOVD_LOCAL |
GOVD_SEEN attributes.

Ignoring this, another assertion breaks in expr.c:

  /* Variables inherited from containing functions should have
     been lowered by this point.  */
  context = decl_function_context (exp);
  gcc_assert (!context
              || context == current_function_decl
              || TREE_STATIC (exp)
              /* ??? C++ creates functions that are not TREE_STATIC.  */
              || TREE_CODE (exp) == FUNCTION_DECL);

I guess val2 is not lowered properly? Ignoring this assertion triggers
an RTL error (a DImode value assigned to an SFmode operand), so
something is definitely wrong.
Do I need to attach val2's tree declaration somewhere else?

2) Again for the reduction operators, I would subsequently do the
scalar assignment MYSUM = val2 in one thread, using omp single. Is
there a better way? I don't think I can use the program-defined MYSUM
as the reduction variable inside the sum loop, because the rhs needs to
be evaluated before the lhs is assigned.
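
In C terms, the copy-out I have in mind would sit right after the
reduction loop inside the same parallel region (a sketch only; mysum
stands for the program variable MYSUM):

#pragma omp parallel shared (val2)
  {
#pragma omp for reduction (+:val2)
    for (i = 0; i < n; i++)
      val2 = val2 + aa[i];

#pragma omp single          /* one thread performs MYSUM = val2 */
    mysum = val2;
  }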

3) gfc_check_dependency seems to be an appropriate helper function for
the dependence analysis between the statements of the workshare. If you
have other suggestions, let me know.

thanks,

- Vasilis

On Mon, Apr 14, 2008 at 6:47 AM, Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> Hi!
>
>
>  On Wed, Apr 09, 2008 at 11:29:24PM -0500, Vasilis Liaskovitis wrote:
>  > I am a beginner interested in learning gcc internals and contributing
>  > to the community.
>
>  Thanks for showing interest in this area!
>
>
>  > I have started implementing PR35423 - omp workshare in the fortran
>  > front-end. I have some questions - any guidance and suggestions are
>  > welcome:
>  >
>  > - For scalar assignments, I wrap them in an OMP_SINGLE construct.
>
>  Yes, though if there are a couple of adjacent scalar assignments which don't
>  involve function calls and won't take too long to execute, you want
>  to put them all into one OMP_SINGLE.  If the assignments may take long
>  because of function calls and there are several such assignments adjacent,
>  you can use OMP_SECTIONS.
>
>  Furthermore, for all statements, not just the scalar ones, you want to
>  do dependency analysis between all the statements within !$omp workshare,
>  and make OMP_SINGLE, OMP_FOR or OMP_SECTIONS and add OMP_CLAUSE_NOWAIT
>  to them where no barrier is needed.
>
>
>  > - Array/subarray assignments: for assignments handled by the
>  > scalarizer, I now create an OMP_FOR loop instead of a LOOP_EXPR for
>  > the outermost scalarized loop. This achieves worksharing at the
>  > outermost loop level.
>
>  Yes, though on gomp-3_0-branch you could actually use a collapsed OMP_FOR
>  loop too.  Just bear in mind that for best performance, at least with
>  static OMP_FOR scheduling, ideally the same memory (part of the array in
>  this case) is accessed by the same thread, as it is then in that CPU's
>  caches.  Of course that's not always possible, but if it can be done,
>  gfortran should try that.
>
>
>  > Some array assignments are handled by functions (e.g.
>  > gfc_build_memcpy_call generates calls to memcpy). For these, I believe
>  > we need to divide the