> From the last GCC Summit we learned about (Fortran) Frontend Based
> Optimizations from Toon and it was suggested to tackle one issue by
> promoting arrays to a first-class middle-end type.  Discussion with
> Sebastian also reveals that this makes sense to ease the analysis
> of the into-GRAPHITE transformation by not dropping information that
> cannot be re-computed.
>
> ...
Maybe something like the infrastructure you propose could also
facilitate supporting vectorization into "virtual vectors" (vectors
whose size is unknown), as raised in this thread:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00687.html - i.e. to
vectorize without knowing any target-specific info, including vector
sizes.  (Or is there a different way to express such a thing?)

This would also let us split the vectorizer into a target-independent
part that uses the vector size as a parameter, followed by a
target-dependent part that materializes the virtual vectors into
actual vectors as supported by the target (including alignment
handling, etc.).  The target-dependent part would be invoked when the
target machine is known at compile time.  For cases when you use
dynamic/JIT compilation (e.g. targets like
http://gcc.gnu.org/projects/cli.html) the target-dependent part would
take place during JIT compilation later on.

> Another point for the motivation is that vector constructs should
> survive till the point where GCC knows/uses information about the
> vector target machine, using the vectorizer (as part of the
> GRAPHITE framework) to create regular (optimized) scalar IL from
> the array operations.

(Just to clarify - when you say "using the vectorizer to create
regular scalar IL" from whole-array constructs, you mean a "scalar IL"
that also includes the kind of operations on vector types we have
today, right?)

Until such time as the vectorizer is incorporated into GRAPHITE, the
vectorizer could already benefit from preserving whole-array
operations (by avoiding the introduction of redundant
dependences/order-constraints).  However, the early lowering you
require means that we can't keep the array expressions around until
vectorization in its current location.  Is that a strict restriction?
Can this IL extension live longer?

thanks,
dorit

> So this is a proposal to make this happen, allow statements and
> expressions that operate on (multidimensional) arrays.
> The first and maybe most important issue is that temporaries required
> by either design issues of the IL or by optimizations are not going
> to be cheap.  The second issue is the question of if (yes, I'd say)
> and when (very early, I'd say) we want to lower statements /
> expressions on arrays and generate loop constructs for them.
>
>
> On the first issue, how to represent array expressions in the IL, we
> can take the pragmatic approach for now and merely try to make the
> expressions survive until they get picked up by GRAPHITE (or other
> to-be-invented special optimizers) but allow scalar optimizers to
> operate on scalar parts of the expressions.  To avoid creating
> temporaries of array type, which the gimplifier in its current form
> cannot do for variable-length types, we do not create full GIMPLE
> form of array expressions, but retain GENERIC expressions in the RHS.
> With the very experimental prototype patch (which was just done to
> experiment with forms of the IL) we would for example end up with
>
>   float a[16];
>   float b[16];
>   float c[16];
>
>   tmp = 0.0;
>   a.1 = &a;
>   b.2 = &b;
>   c.3 = &c;
>   n.4 = n;
>   D.1558 = (long unsigned int) n.4;
>   D.1559 = D.1558 * 4;
>   *a.1 = WITH_SIZE_EXPR <(*b.2 + *c.3) + tmp, D.1559>;
>   D.1560 = a[0];
>   return D.1560;
>
> for the whole-array expression C source
>
>   float test5 (int n)
>   {
>     float a[16];
>     float b[16];
>     float c[16];
>     float tmp = 0.0;
>     *(float (*)[n])&a = *(float (*)[n])&b + *(float (*)[n])&c + tmp;
>     return a[0];
>   }
>
> Ignore the WITH_SIZE_EXPR here; the point is that we have two
> operations and three operands on the RHS: two arrays, one scalar.
> CCP happily propagates 0.0 into the expression, and if you do not
> run SRA (which ICEs) you survive until expand, which of course has
> no idea how to deal with this expression ;)
>
> So we think this approach is sound if these expressions do not
> survive for too long (that is, only until after out-of-GRAPHITE).
> A different approach would be to only allow whole arrays as gimple
> values and proceed with the usual gimplification of expressions.  To
> avoid having to allocate memory for the array temporaries we could
> represent them as pointers to (not yet) allocated storage, like for
> the above case
>
>   float (* D.29)[n] = (float (*)[n])&a;
>   float (* D.30)[n] = (float (*)[n])&b;
>   float (* D.31)[n] = (float (*)[n])&c;
>   float (* D.42)[n];
>   D.42 = __builtin_vla_tmp_alloc (n * 4);
>   *D.42 = *D.30 + *D.31;
>   *D.29 = *D.42 + tmp;
>   __builtin_vla_tmp_free (D.42);
>
> where we would still have dereferences of pointers to arrays as
> gimple values as well.  This approach would also fit inserted
> temporaries that are required for language semantic reasons.
>
>
> On the second issue we are thinking of leveraging GRAPHITE, which
> needs to create scalar code out of its intermediate representation
> anyway, to do the lowering to scalar code.  Especially we do not
> want to force expand to deal with the whole-array expressions.  But
> this is obviously the issue which we can most easily postpone until
> we can do experiments.
>
>
> There are two more aspects that need consideration.  First, whether,
> and if so what kind of, tree codes we want to use for the operators.
> It would be possible to re-use PLUS_EXPR and MINUS_EXPR in general
> and use MULT_EXPR and RDIV_EXPR for mixed array / scalar operands.
> But in the end I don't think this is desirable, and it is certainly
> not sufficient thinking of matrix multiplication, contraction and
> expansions.  So a sound approach would be to introduce an ARRAY_EXPR
> and use sub-codes to distinguish the operations.  The alternative is
> to use builtin functions, but that ends up with nested function
> calls in the IL, which I think should be avoided, if only for
> aesthetic reasons.
> The second aspect is that with the current experimental patch (and
> possibly with the final design) the array shape is encoded in the
> type of the access rather than using an explicit descriptor object
> as the Fortran frontend does right now.  This has the advantage of
> less IL (and fewer alias conflicts with such an object) and probably
> makes the analysis phase easier.  But it raises the issue that the
> middle-end as of today cannot express array shapes with strides not
> equal to one - still, this should be an easy extension of the
> current array type description.
>
>
> Then there is the question of the semantics of an array expression.
> The simplest and most useful one is IMHO to make the iteration order
> undefined, that is, make statements that have overlapping LHS and
> RHS undefined and require the frontend to introduce temporaries
> where language semantics request that.
>
> The other semantics to follow could be the Fortran load-before-store
> semantics, which would move all the dependency checking code out of
> the frontend into the middle-end.
>
> But we can also easily have both semantics in parallel by creating
> two different assignment operators: one stating that there is no
> need for temporaries to follow load-before-store semantics, and one
> stating the opposite.  Obviously the second can be lowered to the
> first one at some point.
>
>
> So, to sum up all of the above, we propose to:
>
> - Allow expressions on whole arrays.
> - Iteration order for evaluating an expression is undefined.
> - There will be both load-before-store and undefined load/store
>   ordering assignment statements.
> - The IL will have whole arrays, including pointer dereferences
>   to whole arrays, as gimple values.
> - The IL will have fake / intermediate VLA temporaries as pointers
>   to (deferred) allocated memory.
> - The IL will be lowered to GIMPLE loops using GRAPHITE.
> - Expressions on arrays and mixed scalar/array operations will
>   use separate tree codes, possibly sub-coding a common parent code.
> - ARRAY_TYPEs will be extended to specify a stride.
>
> We do not (yet) propose language extensions to leverage the
> middle-end capabilities in frontends other than Fortran.  But making
> this feature available using builtin functions that would be lowered
> to the array IL by the gimplifier is probably easy.  (No, the
> prototype patch is _not_ a language extension proposal ;))
>
> Find the experimentation patch below, with the limitation that it
> only allows binary + as operator and that it implements the GENERIC
> RHS approach instead of the VLA temporaries approach.  All testcases
> will ICE at least at the point of expansion; SRA ICEs as well.
> Otherwise working testcases are appended below, before the actual
> patch.
>
> Thanks,
> Richard.
>
>
> void test0 (double *p, int pn0, /* int pstride0 == 0, */
>             int pn1, int pstride1,
>             int pn2 /*, int pstride2 unused, */)
> {
>   *((double (*)[pn2 + pn1 * pstride1][pn1 /* + pn0 * pstride0 */][pn0])p)
>     = 0.0;
> }
> void test1 (double *p, int pn0, int pn1, int pstride1, int pn2,
>             double *q, int qn0, int qn1, int qstride1, int qn2)
> {
>   *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
>     = *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q);
> }
> void test2 (double *p, int pn0, int pn1, int pstride1, int pn2,
>             double *q, int qn0, int qn1, int qstride1, int qn2)
> {
>   *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
>     = *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q) + 1.0;
> }
> void test3 (double *p, int pn0, int pn1, int pstride1, int pn2,
>             double *q, int qn0, int qn1, int qstride1, int qn2)
> {
>   *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
>     += *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q);
> }
> void test4 (double *p, int pn0, int pn1, int pstride1, int pn2,
>             double *q, int qn0, int qn1, int qstride1, int qn2)
> {
>   *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
>     += *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q) + 1.0;
> }
> float test5 (int n)
> {
>   float a[16];
>   float b[16];
>   float c[16];
>   float tmp = 0.0;
>   *(float (*)[n])&a = *(float (*)[n])&b + *(float (*)[n])&c + tmp;
>   return a[0];
> }
>
>
> Index: gcc/fold-const.c
> ===================================================================
> *** gcc/fold-const.c	(revision 130435)
> --- gcc/fold-const.c	(working copy)
> *************** fold_convert (tree type, tree arg)
> *** 2487,2492 ****
> --- 2487,2504 ----
>     if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (orig))
>       return fold_build1 (NOP_EXPR, type, arg);
>
> +   /* XXX With VLA types we would need to do more complicated
> +      matching if a conversion is possible or not.  */
> +   if (TREE_CODE (type) == ARRAY_TYPE
> +       && TREE_CODE (orig) == ARRAY_TYPE)
> +     return fold_build1 (NOP_EXPR, type, arg);
> +
> +   /* XXX Hack alert!  We for sure need extra tree codes for
> +      mixed-mode arguments.  */
> +   if (TREE_CODE (type) == ARRAY_TYPE
> +       && TREE_CODE (orig) == REAL_TYPE)
> +     return arg;
> +
>     switch (TREE_CODE (type))
>       {
>       case INTEGER_TYPE: case ENUMERAL_TYPE: case BOOLEAN_TYPE:
> Index: gcc/tree-gimple.c
> ===================================================================
> *** gcc/tree-gimple.c	(revision 130435)
> --- gcc/tree-gimple.c	(working copy)
> *************** is_gimple_non_addressable (tree t)
> *** 382,387 ****
> --- 382,394 ----
>   bool
>   is_gimple_val (tree t)
>   {
> +   /* Allow array operands in expressions.  */
> +   if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE
> +       /*&& (is_gimple_variable (t)
> +            || (TREE_CODE (t) == INDIRECT_REF
> +                && is_gimple_variable (TREE_OPERAND (t, 0))))*/)
> +     return true;
> +
>     /* Make loads from volatiles and memory vars explicit.  */
>     if (is_gimple_variable (t)
>         && is_gimple_reg_type (TREE_TYPE (t))
> Index: gcc/c-typeck.c
> ===================================================================
> *** gcc/c-typeck.c	(revision 130435)
> --- gcc/c-typeck.c	(working copy)
> *************** default_function_array_conversion (struc
> *** 1663,1668 ****
> --- 1663,1674 ----
>     if (TREE_NO_WARNING (orig_exp))
>       TREE_NO_WARNING (exp.value) = 1;
>
> +   /* XXX */
> +   if (TREE_CODE (exp.value) == INDIRECT_REF
> +       && (TREE_CODE (TREE_OPERAND (exp.value, 0)) == NOP_EXPR
> +           || TREE_CODE (TREE_OPERAND (exp.value, 0)) == CONVERT_EXPR))
> +     return exp;
> +
>     lvalue_array_p = !not_lvalue && lvalue_p (exp.value);
>     if (!flag_isoc99 && !lvalue_array_p)
>       {
> *************** convert_for_assignment (tree type, tree
> *** 4368,4373 ****
> --- 4374,4398 ----
>       }
>     else if (codel == BOOLEAN_TYPE && coder == POINTER_TYPE)
>       return convert (type, rhs);
> +   /* XXX For middle-end operations on arrays we handle ARRAY vs. REAL
> +      types.  This needs to be appropriately extended to handle any
> +      (valid) scalar type.  */
> +   else if (codel == ARRAY_TYPE
> +            && coder == REAL_TYPE)
> +     {
> +       tree etype = type;
> +       do {
> +         etype = TREE_TYPE (etype);
> +       } while (TREE_CODE (etype) == ARRAY_TYPE);
> +       return convert (etype, rhs);
> +     }
> +   else if (codel == ARRAY_TYPE
> +            && coder == ARRAY_TYPE)
> +     {
> +       /* XXX No automatic scalar conversion with this assignment.
> +          XXX No check for matching layout.  */
> +       return rhs;
> +     }
>
>     switch (errtype)
>       {
> *************** build_binary_op (enum tree_code code, tr
> *** 7962,7967 ****
> --- 7987,8002 ----
>       return pointer_int_sum (PLUS_EXPR, op0, op1);
>     else if (code1 == POINTER_TYPE && code0 == INTEGER_TYPE)
>       return pointer_int_sum (PLUS_EXPR, op1, op0);
> +   else if ((code0 == ARRAY_TYPE && code1 == REAL_TYPE)
> +            || (code0 == REAL_TYPE && code1 == ARRAY_TYPE)
> +            || (code0 == ARRAY_TYPE && code1 == ARRAY_TYPE))
> +     {
> +       /* XXX Check compatibility.  */
> +       if (code0 == REAL_TYPE)
> +         return build2 (PLUS_EXPR, type1, op1, op0);
> +       else
> +         return build2 (PLUS_EXPR, type0, op0, op1);
> +     }
>     else
>       common = 1;
>     break;
> Index: gcc/c-parser.c
> ===================================================================
> *** gcc/c-parser.c	(revision 130435)
> --- gcc/c-parser.c	(working copy)
> *************** c_parser_expr_no_commas (c_parser *parse
> *** 4431,4437 ****
>       }
>     c_parser_consume_token (parser);
>     rhs = c_parser_expr_no_commas (parser, NULL);
> !   rhs = default_function_array_conversion (rhs);
>     ret.value = build_modify_expr (lhs.value, code, rhs.value);
>     if (code == NOP_EXPR)
>       ret.original_code = MODIFY_EXPR;
> --- 4431,4439 ----
>       }
>     c_parser_consume_token (parser);
>     rhs = c_parser_expr_no_commas (parser, NULL);
> !   /* XXX */
> !   if (TREE_CODE (TREE_TYPE (lhs.value)) != ARRAY_TYPE)
> !     rhs = default_function_array_conversion (rhs);
>     ret.value = build_modify_expr (lhs.value, code, rhs.value);
>     if (code == NOP_EXPR)
>       ret.original_code = MODIFY_EXPR;