From the last GCC Summit we learned about (Fortran) Frontend Based
Optimizations from Toon, and it was suggested to tackle one issue by
promoting arrays to a first-class middle-end type.  Discussions with
Sebastian also revealed that this makes sense to ease the analysis
done by the into-GRAPHITE transformation, by not dropping information
that cannot be re-computed.


For example, Fortran has "load-before-store" semantics on its
vector operations: if the same array cell is accessed twice, it is
first read, then written, i.e. updated.  The scalarizer has to use
temporary arrays to ensure these semantics, so for example for
"A = A + B" with A and B arrays, the
scalarizer creates a temporary buffer that holds the intermediate
result of the addition before updating the array A: i.e. "T = A + B"
followed by "A = T".  This decompression of the memory use is easy to
perform (see the array privatization techniques for parallelization),
but the inverse operation, the compression, is much more
difficult, as it has to analyze the uses of T in IPA mode: before
removing the initialization of T it has to ensure that there are no
other uses of T.
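
To make this concrete, here is a minimal C sketch of the decompressed
form the scalarizer has to emit (the function name and loop shape are
illustrative only, not actual gfortran output):

  /* Illustrative sketch only: the scalarized form of "A = A + B"
     with a temporary T guaranteeing load-before-store semantics.  */
  void scalarized_add (int n, double *a, const double *b)
  {
    double *t = __builtin_alloca (n * sizeof (double));
    int i;
    /* T = A + B: every load from A happens before any store to A.  */
    for (i = 0; i < n; i++)
      t[i] = a[i] + b[i];
    /* A = T: copy the intermediate result back into A.  */
    for (i = 0; i < n; i++)
      a[i] = t[i];
  }

The compression mentioned above would have to prove that t has no
other uses before fusing the two loops back into a single in-place
update.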

Another motivating point is that vector constructs should survive
until the point where GCC knows and uses information about the
vector target machine, with the vectorizer (as part of the
GRAPHITE framework) creating regular (optimized) scalar IL from
the array operations.


So this is a proposal to make this happen: allow statements and
expressions that operate on (multidimensional) arrays.

The first and maybe most important issue is that temporaries required
either by design issues of the IL or by optimizations are not going
to be cheap.  The second issue is the question of whether (yes, I'd
say) and when (very early, I'd say) we want to lower statements /
expressions on arrays and generate loop constructs for them.


On the first issue, how to represent array expressions in the IL, we
can take the pragmatic approach for now and merely try to make the
expressions survive until they get picked up by GRAPHITE (or other
to-be-invented special optimizers), while still allowing scalar
optimizers to operate on scalar parts of the expressions.  To avoid
creating temporaries of array type, which the gimplifier in its
current form cannot do for variable-length types, we do not create
full GIMPLE form of array expressions, but retain GENERIC expressions
on the RHS.  With the very experimental prototype patch (which was
done just to experiment with forms of the IL) we would for example
end up with

  float a[16];
  float b[16];
  float c[16];

  tmp = 0.0;
  a.1 = &a;
  b.2 = &b;
  c.3 = &c;
  n.4 = n; 
  D.1558 = (long unsigned int) n.4;
  D.1559 = D.1558 * 4;
  *a.1 = WITH_SIZE_EXPR <(*b.2 + *c.3) + tmp, D.1559>;
  D.1560 = a[0];
  return D.1560;

for this whole-array expression C source:

  float test5 (int n)
  {
    float a[16];
    float b[16];
    float c[16];
    float tmp = 0.0;
    *(float (*)[n])&a = *(float (*)[n])&b + *(float (*)[n])&c + tmp;
    return a[0];
  }

Ignore the WITH_SIZE_EXPR here; the point is that we have two
operations and three operands on the RHS: two arrays and one scalar.
CCP happily propagates 0.0 into the expression, and if you do not
run SRA (which ICEs) you survive until expand, which of course has
no idea how to deal with this expression ;)

So we think this approach is sound if these expressions do not survive
for too long (that is, only until after out-of-GRAPHITE).


A different approach would be to allow only whole arrays as gimple
values and proceed with the usual gimplification of expressions.  To
avoid having to allocate memory for the array temporaries, we could
represent them as pointers to (not yet) allocated storage, like
for the above case

   float (* D.29)[n] = (float (*)[n])&a;
   float (* D.30)[n] = (float (*)[n])&b;
   float (* D.31)[n] = (float (*)[n])&c;
   float (* D.42)[n];
   D.42 = __builtin_vla_tmp_alloc (n * 4);
   *D.42 = *D.30 + *D.31;
   *D.29 = *D.42 + tmp;
   __builtin_vla_tmp_free (D.42);

where we would still have dereferences of pointers to arrays as
gimple values as well.  This approach would also accommodate inserted
temporaries that are required for language-semantics reasons.


On the second issue, we are thinking of leveraging GRAPHITE, which
needs to create scalar code out of its intermediate representation
anyway, to do the lowering to scalar code.  In particular, we do not
want to force expand to deal with the whole-array expressions.  But
this is obviously the issue we can most easily postpone until we can
do experiments.


There are two more aspects that need consideration.  First, whether,
and if so what kind of, tree codes we want to use for the operators.
It would be possible to re-use PLUS_EXPR and MINUS_EXPR in general
and use MULT_EXPR and RDIV_EXPR for mixed array / scalar operands.
But in the end I don't think this is desirable, and it is certainly
not sufficient when thinking of matrix multiplication, contraction
and expansion.  So a sound approach would be to introduce an
ARRAY_EXPR and use sub-codes to distinguish the operations.  The
alternative is to use builtin functions, but that ends up with nested
function calls in the IL, which I think should be avoided, if only
for aesthetic reasons.
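
As an illustration of the ARRAY_EXPR variant (a hypothetical sketch
only; neither this tree code nor these sub-codes exist in GCC):

  /* In tree.def one could add a single expression code carrying a
     sub-code operand:

       DEFTREECODE (ARRAY_EXPR, "array_expr", tcc_expression, 3)

     with operand 0 selecting the operation: */
  enum array_expr_subcode
  {
    ARRAY_PLUS_SUBCODE,    /* element-wise addition */
    ARRAY_MINUS_SUBCODE,   /* element-wise subtraction */
    ARRAY_SCALE_SUBCODE,   /* mixed scalar / array multiplication */
    ARRAY_MATMUL_SUBCODE   /* matrix multiplication */
  };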


The second aspect is that with the current experimental patch (and
possibly with the final design) the array shape is encoded in the
type of the access, rather than using an explicit descriptor object
as the Fortran frontend does right now.  This has the advantage of
less IL (and no alias conflicts with such an object) and probably
makes the analysis phase easier.  But it raises the issue that
the middle-end as of today cannot express array shapes with strides
not equal to one; still, this should be an easy extension of the
current array type description.
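
For illustration (the [n:2] notation below is made up, not proposed
syntax), consider what is and is not expressible today:

  void stride_example (int n, float *buf)
  {
    /* A unit-stride view is expressible as an ARRAY_TYPE today,
       as in the casts used throughout this mail.  */
    float (*unit)[n] = (float (*)[n])buf;
    (void) unit;
    /* A view of every second element, buf[0], buf[2], ..., is not;
       it would need a stride field on ARRAY_TYPE, spelled here in
       the hypothetical notation float (*)[n:2].  */
  }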


Then there is the question of the semantics of an array expression.
The simplest and most useful one is IMHO to make the iteration order
undefined, that is, make statements that have overlapping LHS and RHS
undefined and require the frontend to introduce temporaries where
language semantics demand them.

The other semantics to follow could be the Fortran load-before-store
semantics, which would move all the dependency checking code out of
the frontend into the middle-end.

But we can also easily have both semantics in parallel by creating
two different assignment operators: one that states there is no
need for temporaries to follow load-before-store semantics, and one
stating the opposite.  Obviously the second can be lowered to the
first one at some point.
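
A minimal example of where the two semantics diverge (using the
whole-array assignment extension from the testcases below; the caller
must provide n + 1 elements):

  void overlap (double *p, int n)
  {
    /* LHS and RHS overlap, shifted by one element.  With undefined
       iteration order this statement is invalid and the frontend
       must introduce a temporary; with load-before-store semantics
       it behaves like a memmove, and the middle-end must either
       prove a temporary unnecessary or insert one itself.  */
    *(double (*)[n])p = *(double (*)[n])(p + 1);
  }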


So, to sum up all of the above we propose to:

  - Allow expressions on whole arrays.
  - Iteration order for evaluating an expression is undefined.
  - There will be both load-before-store and undefined load/store
    ordering assignment statements.
  - The IL will have whole arrays, including pointer dereferences
    of whole arrays, as gimple values.
  - The IL will have fake / intermediate VLA temporaries as pointers
    to (deferred) allocated memory.
  - The IL will be lowered to GIMPLE loops using GRAPHITE.
  - Expressions on arrays and mixed scalar/array operations will
    use separate tree codes possibly sub-coding a common parent code.
  - ARRAY_TYPEs will be extended to specify a stride.

We do not (yet) propose language extensions to expose the middle-end
capabilities to frontends other than Fortran.  (No, the prototype
patch is _not_ a language extension proposal ;))  But making this
feature available using builtin functions that would be lowered to
the array IL by the gimplifier is probably easy.
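
For example (purely hypothetical; __builtin_array_add does not
exist), a C frontend could expose the feature like this:

  void builtin_example (int n, float *a, float *b, float *c)
  {
    /* Hypothetical builtin that the gimplifier would lower to
       *(float (*)[n])a = *(float (*)[n])b + *(float (*)[n])c;  */
    __builtin_array_add (a, b, c, n);
  }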

Find the experimental patch below, with the limitations that it
only allows binary + as an operator and that it implements GENERIC
RHSs instead of the VLA-temporaries approach.  All testcases will ICE
at least at the point of expansion; SRA is another ICEr.
Otherwise-working testcases are appended below, before the actual
patch.

Thanks,
Richard.


void test0 (double *p, int pn0, /* int pstride0 == 0, */
                       int pn1, int pstride1,
                       int pn2 /*, int pstride2 unused, */)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1 /* + pn0 * pstride0 */][pn0])p)
        = 0.0;
}
void test1 (double *p, int pn0, int pn1, int pstride1, int pn2,
            double *q, int qn0, int qn1, int qstride1, int qn2)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
        = *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q);
}
void test2 (double *p, int pn0, int pn1, int pstride1, int pn2,
            double *q, int qn0, int qn1, int qstride1, int qn2)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
        = *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q) + 1.0;
}
void test3 (double *p, int pn0, int pn1, int pstride1, int pn2,
            double *q, int qn0, int qn1, int qstride1, int qn2)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
        += *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q);
}
void test4 (double *p, int pn0, int pn1, int pstride1, int pn2,
            double *q, int qn0, int qn1, int qstride1, int qn2)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
        += *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q) + 1.0;
}
float test5 (int n)
{
  float a[16];
  float b[16];
  float c[16];
  float tmp = 0.0;
  *(float (*)[n])&a = *(float (*)[n])&b + *(float (*)[n])&c + tmp;
  return a[0];
}


Index: gcc/fold-const.c
===================================================================
*** gcc/fold-const.c    (revision 130435)
--- gcc/fold-const.c    (working copy)
*************** fold_convert (tree type, tree arg)
*** 2487,2492 ****
--- 2487,2504 ----
    if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (orig))
      return fold_build1 (NOP_EXPR, type, arg);
  
+   /* XXX  With VLA types we would need to do more complicated
+      matching if a conversion is possible or not.  */
+   if (TREE_CODE (type) == ARRAY_TYPE
+       && TREE_CODE (orig) == ARRAY_TYPE)
+     return fold_build1 (NOP_EXPR, type, arg);
+ 
+   /* XXX  Hack alert!  We for sure need extra tree codes for
+      mixed-mode arguments.  */
+   if (TREE_CODE (type) == ARRAY_TYPE
+       && TREE_CODE (orig) == REAL_TYPE)
+     return arg;
+ 
    switch (TREE_CODE (type))
      {
      case INTEGER_TYPE: case ENUMERAL_TYPE: case BOOLEAN_TYPE:
Index: gcc/tree-gimple.c
===================================================================
*** gcc/tree-gimple.c   (revision 130435)
--- gcc/tree-gimple.c   (working copy)
*************** is_gimple_non_addressable (tree t)
*** 382,387 ****
--- 382,394 ----
  bool
  is_gimple_val (tree t)
  {
+   /* Allow array operands in expressions.  */
+   if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE
+       /*&& (is_gimple_variable (t)
+         || (TREE_CODE (t) == INDIRECT_REF
+             && is_gimple_variable (TREE_OPERAND (t, 0))))*/)
+     return true;
+ 
    /* Make loads from volatiles and memory vars explicit.  */
    if (is_gimple_variable (t)
        && is_gimple_reg_type (TREE_TYPE (t))
Index: gcc/c-typeck.c
===================================================================
*** gcc/c-typeck.c      (revision 130435)
--- gcc/c-typeck.c      (working copy)
*************** default_function_array_conversion (struc
*** 1663,1668 ****
--- 1663,1674 ----
        if (TREE_NO_WARNING (orig_exp))
          TREE_NO_WARNING (exp.value) = 1;
  
+       /* XXX */
+       if (TREE_CODE (exp.value) == INDIRECT_REF
+           && (TREE_CODE (TREE_OPERAND (exp.value, 0)) == NOP_EXPR
+               || TREE_CODE (TREE_OPERAND (exp.value, 0)) == CONVERT_EXPR))
+         return exp;
+ 
        lvalue_array_p = !not_lvalue && lvalue_p (exp.value);
        if (!flag_isoc99 && !lvalue_array_p)
          {
*************** convert_for_assignment (tree type, tree 
*** 4368,4373 ****
--- 4374,4398 ----
      }
    else if (codel == BOOLEAN_TYPE && coder == POINTER_TYPE)
      return convert (type, rhs);
+   /* XXX  For middle-end operations on arrays we handle ARRAY vs. REAL
+      types.  This needs to be appropriately extended to handle any (valid)
+      scalar type.  */
+   else if (codel == ARRAY_TYPE
+          && coder == REAL_TYPE)
+     {
+       tree etype = type;
+       do {
+       etype = TREE_TYPE (etype);
+       } while (TREE_CODE (etype) == ARRAY_TYPE);
+       return convert (etype, rhs);
+     }
+   else if (codel == ARRAY_TYPE
+          && coder == ARRAY_TYPE)
+     {
+       /* XXX  No automatic scalar conversion with this assignment.
+        XXX  No check for matching layout.  */
+       return rhs;
+     }
  
    switch (errtype)
      {
*************** build_binary_op (enum tree_code code, tr
*** 7962,7967 ****
--- 7987,8002 ----
        return pointer_int_sum (PLUS_EXPR, op0, op1);
        else if (code1 == POINTER_TYPE && code0 == INTEGER_TYPE)
        return pointer_int_sum (PLUS_EXPR, op1, op0);
+       else if ((code0 == ARRAY_TYPE && code1 == REAL_TYPE)
+              || (code0 == REAL_TYPE && code1 == ARRAY_TYPE)
+              || (code0 == ARRAY_TYPE && code1 == ARRAY_TYPE))
+       {
+         /* XXX  Check compatibility.  */
+         if (code0 == REAL_TYPE)
+           return build2 (PLUS_EXPR, type1, op1, op0);
+         else
+           return build2 (PLUS_EXPR, type0, op0, op1);
+       }
        else
        common = 1;
        break;
Index: gcc/c-parser.c
===================================================================
*** gcc/c-parser.c      (revision 130435)
--- gcc/c-parser.c      (working copy)
*************** c_parser_expr_no_commas (c_parser *parse
*** 4431,4437 ****
      }
    c_parser_consume_token (parser);
    rhs = c_parser_expr_no_commas (parser, NULL);
!   rhs = default_function_array_conversion (rhs);
    ret.value = build_modify_expr (lhs.value, code, rhs.value);
    if (code == NOP_EXPR)
      ret.original_code = MODIFY_EXPR;
--- 4431,4439 ----
      }
    c_parser_consume_token (parser);
    rhs = c_parser_expr_no_commas (parser, NULL);
!   /* XXX */
!   if (TREE_CODE (TREE_TYPE (lhs.value)) != ARRAY_TYPE)
!     rhs = default_function_array_conversion (rhs);
    ret.value = build_modify_expr (lhs.value, code, rhs.value);
    if (code == NOP_EXPR)
      ret.original_code = MODIFY_EXPR;
