From the last GCC Summit we learned about (Fortran) frontend-based optimizations from Toon, and it was suggested to tackle one issue by promoting arrays to a first-class middle-end type. Discussion with Sebastian also revealed that this makes sense to ease the analysis of the into-GRAPHITE transformation, by not dropping information that cannot be re-computed.
For example, Fortran has "load-before-store" semantics on its vector operations: if the same array cell is accessed twice, it is first read, then written, i.e. updated. The scalarizer has to use temporary arrays to ensure these semantics. So, for example, for "A = A + B" with A and B arrays, the scalarizer creates a temporary buffer that holds the intermediate results of the addition before updating the array A, i.e. "T = A + B" followed by "A = T". This decompression of the memory use is easy to perform (see the array privatization techniques for parallelization), but the inverse operation, the compression, is much more difficult, as it has to analyze the uses of T in IPA mode: before removing the initialization of T it has to ensure that there are no other uses of T.

Another point of motivation is that vector constructs should survive until the point where GCC knows about and uses information about the vector target machine, with the vectorizer (as part of the GRAPHITE framework) creating regular (optimized) scalar IL from the array operations.

So this is a proposal to make this happen: allow statements and expressions that operate on (multidimensional) arrays. The first, and maybe most important, issue is that temporaries required either by design issues of the IL or by optimizations are not going to be cheap. The second issue is the question of whether (yes, I'd say) and when (very early, I'd say) we want to lower statements / expressions on arrays and generate loop constructs for them.

On the first issue, how to represent array expressions in the IL, we can take the pragmatic approach for now and merely try to make the expressions survive until they get picked up by GRAPHITE (or other to-be-invented special optimizers), but allow scalar optimizers to operate on scalar parts of the expressions. To avoid creating temporaries of array type, which the gimplifier in its current form cannot do for variable-length types, we do not create full GIMPLE form of array expressions, but retain GENERIC expressions on the RHS. With the very experimental prototype patch (which was only done to experiment with forms of the IL) we would, for example, end up with

  float a[16];
  float b[16];
  float c[16];
  tmp = 0.0;
  a.1 = &a;
  b.2 = &b;
  c.3 = &c;
  n.4 = n;
  D.1558 = (long unsigned int) n.4;
  D.1559 = D.1558 * 4;
  *a.1 = WITH_SIZE_EXPR <(*b.2 + *c.3) + tmp, D.1559>;
  D.1560 = a[0];
  return D.1560;

for the whole-array expression in the C source

  float test5 (int n)
  {
    float a[16];
    float b[16];
    float c[16];
    float tmp = 0.0;
    *(float (*)[n])&a = *(float (*)[n])&b + *(float (*)[n])&c + tmp;
    return a[0];
  }

Ignore the WITH_SIZE_EXPR here; the point is that we have two operations and three operands on the RHS: two arrays and one scalar. CCP happily propagates 0.0 into the expression, and if you do not run SRA (which ICEs) you survive until expand, which of course has no idea how to deal with this expression ;) So we think this approach is sound if these expressions do not survive for too long (that is, only until after out-of-GRAPHITE). A different approach would be to only allow whole arrays as gimple values and proceed with the usual gimplification of expressions.
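As an aside, here is a sketch of what that alternative could look like for the test5 expression, with each intermediate result materialized in an array temporary (the T.* names are hypothetical; this is not what the prototype patch emits):

  /* Hypothetical fully gimplified form of the RHS; T.1 and T.2
     are invented VLA-typed array temporaries, one per operation.  */
  T.1 = *b.2 + *c.3;    /* array + array */
  T.2 = T.1 + tmp;      /* array + scalar */
  *a.1 = T.2;

This keeps the statements in regular GIMPLE form, but at the price of one array temporary per operation, which is exactly the cost the first issue above is concerned about. How such temporaries could be materialized without eagerly allocating storage is what the next paragraph addresses.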
To avoid having to allocate memory for the array temporaries, we could represent them as pointers to (not yet) allocated storage, as in, for the above case,

  float (* D.29)[n] = (float (*)[n])&a;
  float (* D.30)[n] = (float (*)[n])&b;
  float (* D.31)[n] = (float (*)[n])&c;
  float (* D.42)[n];
  D.42 = __builtin_vla_tmp_alloc (n * 4);
  *D.42 = *D.30 + *D.31;
  *D.29 = *D.42 + tmp;
  __builtin_vla_tmp_free (D.42);

where we would still have dereferences of pointers to arrays as gimple values as well. This approach would also accommodate inserted temporaries that are required for language-semantics reasons.

On the second issue, we are thinking of leveraging GRAPHITE, which needs to create scalar code out of its intermediate representation anyway, to do the lowering to scalar code. In particular, we do not want to force expand to deal with the whole-array expressions. But this is obviously the issue we can most easily postpone until we can do experiments.

There are two more aspects that need consideration. The first is whether, and if yes what kind of, tree codes we want to use for the operators. It would be possible to re-use PLUS_EXPR and MINUS_EXPR in general and use MULT_EXPR and RDIV_EXPR for mixed array / scalar operands. But in the end I don't think this is desirable, and it is certainly not sufficient once you think of matrix multiplication, contraction and expansion. So a sound approach would be to introduce an ARRAY_EXPR and use sub-codes to distinguish the operations. The alternative is to use builtin functions, but that ends up with nested function calls in the IL, which I think should be avoided, if only for aesthetic reasons.

The second aspect is that with the current experimental patch (and possibly with the final design) the array shape is encoded in the type of the access, rather than in an explicit descriptor object as the Fortran frontend uses right now. This has the advantage of less IL (and fewer alias conflicts with such an object) and probably makes the analysis phase easier. But it raises the issue that the middle-end as of today cannot express array shapes with strides not equal to one - still, this should be an easy extension of the current array type description.

Then there is the question of the semantics of an array expression. The simplest and most useful one is IMHO to leave the iteration order undefined, that is, to make statements that have overlapping LHS and RHS undefined and to require the frontend to introduce temporaries where language semantics request them. The other semantics to follow could be the Fortran load-before-store semantics, which would move all the dependency-checking code out of the frontend into the middle-end. But we can also easily have both semantics in parallel by creating two different assignment operators: one stating that there is no need for temporaries to follow load-before-store semantics, and one stating the opposite. Obviously the second can be lowered to the first at some point.

So, to sum up all of the above, we propose to:

 - Allow expressions on whole arrays.
 - Leave the iteration order for evaluating an expression undefined.
 - Provide both load-before-store and undefined load/store ordering
   assignment statements.
 - Have whole arrays, including pointer dereferences to whole arrays,
   as gimple values in the IL.
 - Represent fake / intermediate VLA temporaries in the IL as pointers
   to (deferred) allocated memory.
 - Lower the IL to GIMPLE loops using GRAPHITE.
 - Use separate tree codes, possibly sub-codes of a common parent code,
   for expressions on arrays and mixed scalar/array operations.
 - Extend ARRAY_TYPEs to specify a stride.
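To make the difference between the two assignment semantics concrete, here is a hand-written C sketch (purely illustrative, not produced by the patch; the function name and temporary are invented) of how a load-before-store array assignment with overlapping LHS and RHS could be lowered by materializing a temporary:

  /* Load-before-store lowering of the (Fortran-style) statement
     a(2:n+1) = a(1:n) + b(1:n), where LHS and RHS overlap: all RHS
     elements must be read before any LHS element is written.
     Assumes a has at least n + 1 elements.  */
  void
  lower_load_before_store (float *a, float *b, int n)
  {
    float t[n];  /* inserted temporary, cf. the VLA temporaries above */
    int i;

    /* Evaluate the whole RHS first ...  */
    for (i = 0; i < n; i++)
      t[i] = a[i] + b[i];

    /* ... then perform the stores; once the temporary exists the
       store order no longer matters.  */
    for (i = 0; i < n; i++)
      a[i + 1] = t[i];
  }

With the undefined-ordering assignment a single fused loop would be emitted instead; deciding when that is safe is exactly the dependency-checking work that would move from the Fortran frontend into the middle-end.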
We do not (yet) propose language extensions to make the middle-end capabilities available to frontends other than Fortran. But making this feature available through builtin functions that would be lowered to the array IL by the gimplifier is probably easy. (No, the prototype patch is _not_ a language extension proposal ;))

Find the experimental patch below, with the limitations that it only supports binary + as the operator and that it implements the GENERIC-RHS approach instead of the VLA-temporaries approach. All testcases will ICE at the point of expansion at the latest; SRA is another source of ICEs. Otherwise-working testcases are appended below, before the actual patch.

Thanks,
Richard.

void test0 (double *p, int pn0, /* int pstride0 == 0, */
            int pn1, int pstride1,
            int pn2 /*, int pstride2 unused, */)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1 /* + pn0 * pstride0 */][pn0])p)
    = 0.0;
}

void test1 (double *p, int pn0, int pn1, int pstride1, int pn2,
            double *q, int qn0, int qn1, int qstride1, int qn2)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
    = *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q);
}

void test2 (double *p, int pn0, int pn1, int pstride1, int pn2,
            double *q, int qn0, int qn1, int qstride1, int qn2)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
    = *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q) + 1.0;
}

void test3 (double *p, int pn0, int pn1, int pstride1, int pn2,
            double *q, int qn0, int qn1, int qstride1, int qn2)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
    += *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q);
}

void test4 (double *p, int pn0, int pn1, int pstride1, int pn2,
            double *q, int qn0, int qn1, int qstride1, int qn2)
{
  *((double (*)[pn2 + pn1 * pstride1][pn1][pn0])p)
    += *((double (*)[qn2 + qn1 * qstride1][qn1][qn0])q) + 1.0;
}

float test5 (int n)
{
  float a[16];
  float b[16];
  float c[16];
  float tmp = 0.0;
  *(float (*)[n])&a = *(float (*)[n])&b + *(float (*)[n])&c + tmp;
  return a[0];
}

Index: gcc/fold-const.c
===================================================================
*** gcc/fold-const.c	(revision 130435)
--- gcc/fold-const.c	(working copy)
*************** fold_convert (tree type, tree arg)
*** 2487,2492 ****
--- 2487,2504 ----
    if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (orig))
      return fold_build1 (NOP_EXPR, type, arg);
  
+   /* XXX With VLA types we would need to do more complicated
+      matching if a conversion is possible or not.  */
+   if (TREE_CODE (type) == ARRAY_TYPE
+       && TREE_CODE (orig) == ARRAY_TYPE)
+     return fold_build1 (NOP_EXPR, type, arg);
+ 
+   /* XXX Hack alert!  We for sure need extra tree codes for
+      mixed-mode arguments.  */
+   if (TREE_CODE (type) == ARRAY_TYPE
+       && TREE_CODE (orig) == REAL_TYPE)
+     return arg;
+ 
    switch (TREE_CODE (type))
      {
      case INTEGER_TYPE: case ENUMERAL_TYPE: case BOOLEAN_TYPE:
Index: gcc/tree-gimple.c
===================================================================
*** gcc/tree-gimple.c	(revision 130435)
--- gcc/tree-gimple.c	(working copy)
*************** is_gimple_non_addressable (tree t)
*** 382,387 ****
--- 382,394 ----
  bool
  is_gimple_val (tree t)
  {
+   /* Allow array operands in expressions.  */
+   if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE
+       /*&& (is_gimple_variable (t)
+            || (TREE_CODE (t) == INDIRECT_REF
+                && is_gimple_variable (TREE_OPERAND (t, 0))))*/)
+     return true;
+ 
    /* Make loads from volatiles and memory vars explicit.  */
    if (is_gimple_variable (t)
        && is_gimple_reg_type (TREE_TYPE (t))
Index: gcc/c-typeck.c
===================================================================
*** gcc/c-typeck.c	(revision 130435)
--- gcc/c-typeck.c	(working copy)
*************** default_function_array_conversion (struc
*** 1663,1668 ****
--- 1663,1674 ----
    if (TREE_NO_WARNING (orig_exp))
      TREE_NO_WARNING (exp.value) = 1;
  
+   /* XXX */
+   if (TREE_CODE (exp.value) == INDIRECT_REF
+       && (TREE_CODE (TREE_OPERAND (exp.value, 0)) == NOP_EXPR
+           || TREE_CODE (TREE_OPERAND (exp.value, 0)) == CONVERT_EXPR))
+     return exp;
+ 
    lvalue_array_p = !not_lvalue && lvalue_p (exp.value);
    if (!flag_isoc99 && !lvalue_array_p)
      {
*************** convert_for_assignment (tree type, tree
*** 4368,4373 ****
--- 4374,4398 ----
      }
    else if (codel == BOOLEAN_TYPE && coder == POINTER_TYPE)
      return convert (type, rhs);
+   /* XXX For middle-end operations on arrays we handle ARRAY vs. REAL
+      types.  This needs to be appropriately extended to handle any
+      (valid) scalar type.  */
+   else if (codel == ARRAY_TYPE
+            && coder == REAL_TYPE)
+     {
+       tree etype = type;
+       do {
+         etype = TREE_TYPE (etype);
+       } while (TREE_CODE (etype) == ARRAY_TYPE);
+       return convert (etype, rhs);
+     }
+   else if (codel == ARRAY_TYPE
+            && coder == ARRAY_TYPE)
+     {
+       /* XXX No automatic scalar conversion with this assignment.
+          XXX No check for matching layout.  */
+       return rhs;
+     }
  
    switch (errtype)
      {
*************** build_binary_op (enum tree_code code, tr
*** 7962,7967 ****
--- 7987,8002 ----
        return pointer_int_sum (PLUS_EXPR, op0, op1);
      else if (code1 == POINTER_TYPE && code0 == INTEGER_TYPE)
        return pointer_int_sum (PLUS_EXPR, op1, op0);
+     else if ((code0 == ARRAY_TYPE && code1 == REAL_TYPE)
+              || (code0 == REAL_TYPE && code1 == ARRAY_TYPE)
+              || (code0 == ARRAY_TYPE && code1 == ARRAY_TYPE))
+       {
+         /* XXX Check compatibility.  */
+         if (code0 == REAL_TYPE)
+           return build2 (PLUS_EXPR, type1, op1, op0);
+         else
+           return build2 (PLUS_EXPR, type0, op0, op1);
+       }
      else
        common = 1;
      break;
Index: gcc/c-parser.c
===================================================================
*** gcc/c-parser.c	(revision 130435)
--- gcc/c-parser.c	(working copy)
*************** c_parser_expr_no_commas (c_parser *parse
*** 4431,4437 ****
        }
    c_parser_consume_token (parser);
    rhs = c_parser_expr_no_commas (parser, NULL);
!   rhs = default_function_array_conversion (rhs);
    ret.value = build_modify_expr (lhs.value, code, rhs.value);
    if (code == NOP_EXPR)
      ret.original_code = MODIFY_EXPR;
--- 4431,4439 ----
      }
    c_parser_consume_token (parser);
    rhs = c_parser_expr_no_commas (parser, NULL);
!   /* XXX */
!   if (TREE_CODE (TREE_TYPE (lhs.value)) != ARRAY_TYPE)
!     rhs = default_function_array_conversion (rhs);
    ret.value = build_modify_expr (lhs.value, code, rhs.value);
    if (code == NOP_EXPR)
      ret.original_code = MODIFY_EXPR;
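As a closing illustration of the proposed lowering, the scalar code that out-of-GRAPHITE would eventually have to generate for the whole-array statement in test5 might look as follows. This is a hand-written sketch under the undefined-ordering semantics (the function name is invented, and like test5 it assumes n <= 16; since a does not overlap b or c, no temporary is needed):

  float test5_lowered (int n)
  {
    float a[16];
    float b[16];
    float c[16];
    float tmp = 0.0;
    int i;

    /* The whole-array statement
         *(float (*)[n])&a = *(float (*)[n])&b + *(float (*)[n])&c + tmp;
       lowered to a single scalar loop.  */
    for (i = 0; i < n; i++)
      a[i] = b[i] + c[i] + tmp;

    return a[0];
  }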