Re: Tree CONSTRUCTORs and Ada array indices
On Wed, Oct 14, 2015 at 5:57 PM, Eric Botcazou wrote: >> I'm having fun with arrays in Ada ;) and wondering if anyone can tell me >> what's right here. > > No surprise, arrays are probably the base types for which there is the largest > gap between Ada and the C family of languages, for which GCC was written. > You can see in the E_Array_Subtype case of gnat_to_gnu_entity the hoops we > have to jump through to make them work. > >> In GCC this gets represented as an ARRAY_TYPE, whose TYPE_DOMAIN is a 32-bit >> signed integer type; the TYPE_MAX_VALUE of this is 1 (as a size_int), the >> TYPE_MIN_VALUE of this is -1 as a size_int, i.e. 4294967295. I believe >> using (unsigned 32-bit) size_int for min/max is correct (?) despite the >> domain being a signed type. > > Historically it was required that TYPE_DOMAIN be a subtype of 'sizetype', see > build_index_type. The Ada compiler was originally using a signed 'sizetype' > because of this (all the other compilers were using an unsigned one) but a > couple of changes (POINTER_PLUS_EXPR and LTO) essentially forced a uniform > 'sizetype' across languages and the Ada compiler wasn't the winner. ;-) > >> An array of this type is then initialized with a CONSTRUCTOR. The >> CONSTRUCTOR has three elements, with INTEGER_CST indices, being in order >> 2^32-1, 0, 1 (all of sizetype). >> >> I substitute the CONSTRUCTOR into an ARRAY_REF [4294967295]{lb: >> 4294967295 sz: 2}, i.e. that picks the element with index (sizetype)2^32-1. >> >> fold() in fold-const.c then fails to fold, because it does binary search for >> 2^32-1 through the constructor elements, and first checks against the >> middle CONSTRUCTOR_ELT, which has index 0. > > IIRC I wrote this code (before the signedness change). > >> So, where's the bug here? Should all these indices have sizetype? Should >> fold-const.c's binary search respect the TYPE_DOMAIN of the type of the >> CONSTRUCTOR being searched? (Lots of build_int_cst to reinterpret the >> CONSTRUCTOR_ELT indices, and/or the index being searched for, in said >> TYPE_DOMAIN ?) > > According to Richard, we should now be able to set TYPE_DOMAIN to something > else than a subtype of 'sizetype' but this is a mid-term goal and may require > some substantial work. The short-term solution is probably to do nothing, > arrays with negative indices are not very common in real life even in Ada. C/C++ empty arrays (int a[0]) now use a signed TYPE_DOMAIN with bounds [0, -1] for example (that was my original motivation to make hopefully all of the middle-end work with ssizetype TYPE_DOMAINs at least). The tree.c range-type build helpers accept all integral types now (but most FEs use sizetype or ssizetype). ISTR also doing some Ada adjustments but they may have been dumped for "interesting" changes to stor-layout.c instead: /* ??? When it is obvious that the range is signed represent it using ssizetype. */ if (TREE_CODE (lb) == INTEGER_CST && TREE_CODE (ub) == INTEGER_CST && TYPE_UNSIGNED (TREE_TYPE (lb)) && tree_int_cst_lt (ub, lb)) { lb = wide_int_to_tree (ssizetype, offset_int::from (lb, SIGNED)); ub = wide_int_to_tree (ssizetype, offset_int::from (ub, SIGNED)); } length = fold_convert (sizetype, size_binop (PLUS_EXPR, build_int_cst (TREE_TYPE (lb), 1), size_binop (MINUS_EXPR, ub, lb))); } /* ??? We have no way to distinguish a null-sized array from an array spanning the whole sizetype range, so we arbitrarily decide that [0, -1] is the only valid representation. */ if (integer_zerop (length) && TREE_OVERFLOW (length) && integer_zerop (lb)) length = size_zero_node; this probably tries to deal with the situation you are running into. Richard. > -- > Eric Botcazou
Prototype implementation: Improving effectiveness and generality of auto-vectorization
Hi Richard, This is with reference to our discussion at GNU Tools Cauldron 2015 regarding my talk titled "Improving the effectiveness and generality of GCC auto-vectorization." We, at Imaginations Technologies, have further worked on finalizing the algorithms for transformations to achieve efficient target-aware reordering instruction selection. We have implemented the prototype in python to demonstrate the capabilities of the algorithm and verify the claims made in the presentation. I am attaching the prototype along with the documented algorithm for review purposes. We would be starting with design and implementation of the same in GCC and would be glad to receive comments, feedback and suggestions. - Thanks and regards, Sameera Deshpande Introduction Current methods of auto-vectorization take the vectorization decision through a generic analysis which is almost target independent (except for target features such as vector sizes). However, the instruction selection mechanism is largely adhoc in the sense that the ideas used for instruction selection for a target may not carry over seamlessly to some other target. This has led to proliferation of target specific code in the generic part of compiler frameworks such as GCC. We propose a generic approach for reordering-instruction selection which is based on the insight that effective data reordering for vectorization can be defined in terms of a set of well defined primitive operations. Interestingly, the same set of operations can be used to define target instructions almost for all targets that we know of. Thus these operations form the basic building blocks of reorderings for effective vectorization. Primitive operations Let V_N denote a vector of N scalar elements. A collection of k V_N vectors is denoted by V_N , ...k , V_N We propose the following operations: - Concatenation_k^N : V_N x ... x k V_N --> V_kN : CONCAT_k^N (V_N , ...k , V_N) concatenates k vectors of size N to produce a single vector of size kN. - Split_k,i^kN : V_kN --> V_N . Split operation is the inverse of concatenation. SPLT_k,i^kN ( V_kN ) splits a vector of size kN into k vectors of size N each and returns the i th vector. - Inerleave_k^N : V_N x ...k x V_N --> V_kN . ILV_k^N ( V_N, ...k, V_N ) creates a vector of size kN such that ith element of output vector comes from i/kth element from i%kth vector. - Extract_k,i^kN : V_kN --> V_N. Extract operation is inverse of interleave operation. EXTR_k,i^kN (V_kN) divides the input vector into N parts and creates k vectors of size N by selecting jth element from each part, and returning ith vector of size N . - permute^N : V_N x I_M --> V_N. Permute operation takes vector of size N and integer order of size M. It divides vector N into N/M vectors of size M, and permutes each subvector with permute order I_M. Phase structure of Algorithm The algorithm for target-aware reordering-instruction selection for efficient autovectorization is divided into following phases: Phase 1 : Create tree for each loop per destination - Implemented by Sameera Deshpande Phase 2 : k-arity reduction/promotion - Implemented by Sameera Deshpande Phase 3 : Top-down distribution of p-node subtree - Implemented by Sameera Deshpande Phase 4 : Optimizations - Implemented by Sameera Deshpande Phase 5 : Bottom-up commoning of p-node subtree - Implemented by Sameera Deshpande - Cost computation of p-tree - Implemented by Sameera Deshpande Phase 6 : Vector-size reduction - Implemented by Sameera Deshpande Phase 4(again): Optimization - Implemented by Sameera Deshpande Phase 7: Cost computation - Store min_cost_tree. - Implemented by Prachi Godbole Phase 8: Target specific code generation for min_cost_tree - Implemented by Prachi Godbole Input loop == To enable auto-vectorization using this algorithm, the loop structure should be as follows: Loop with k statements reading from S_1 , S_2 , ... S_m and writing to destination D, where stmt j is of form D[k * i + d_j ] = op_j (S_1 [k_1 * i + c_j1 ], S_2 [k_2 * i + c_j2 ], ... S_m [k_m * i + c_jm ]) such that – All statements stmt_1 , ... stmt_k in the loop body are independent and vectorizable for target with vector size VS. – Iteration count N > VS. – Induction variable i for the loop is not altered by stmt_1 , stmt_2 ... stmt_k . – All S_1 , ... S_m and D are of same type. – Indices for all S and D have same multiplicative index. If different, normalize by taking LCM, and duplicate the statements appropriately by adjusting the offset. – c_jp < k_p FOR(i = 0 ... N − 1) { stmt 1 : D [k * i + d_1 ] = op_1 (S_1 [k_1 * i + c_11 ], S_2 [k_2 * i + c_12 ], . . . S_m [k_m * i + c_1m ]) stmt 2 : D [k * i + d_2 ] = op_2 (S_1 [k_1 * i + c_21 ], S_2 [k_2 * i + c_22 ], . . . S_m [k_m * i + c_2m ]) ... stmt k : D [k * i + d_k ] = op_k (S_1 [k_1 * i + c_k1 ], S_2 [k_2 * i + c_k2 ], .
Re: Tree CONSTRUCTORs and Ada array indices
> C/C++ empty arrays (int a[0]) now use a signed TYPE_DOMAIN with bounds [0, > -1] for example (that was my original motivation to make hopefully all of > the middle-end > work with ssizetype TYPE_DOMAINs at least). The tree.c range-type build > helpers accept all integral types now (but most FEs use sizetype or > ssizetype). Using ssizetype in Ada would already be a net progress, I'll investigate. > ISTR also doing some Ada adjustments but they may have been dumped for > "interesting" changes to stor-layout.c instead: > > /* ??? When it is obvious that the range is signed >represent it using ssizetype. */ > if (TREE_CODE (lb) == INTEGER_CST > && TREE_CODE (ub) == INTEGER_CST > && TYPE_UNSIGNED (TREE_TYPE (lb)) > && tree_int_cst_lt (ub, lb)) > { > lb = wide_int_to_tree (ssizetype, >offset_int::from (lb, SIGNED)); > ub = wide_int_to_tree (ssizetype, >offset_int::from (ub, SIGNED)); > } > length > = fold_convert (sizetype, > size_binop (PLUS_EXPR, > build_int_cst (TREE_TYPE (lb), > 1), size_binop (MINUS_EXPR, ub, lb))); } > > /* ??? We have no way to distinguish a null-sized array from an >array spanning the whole sizetype range, so we arbitrarily >decide that [0, -1] is the only valid representation. */ > if (integer_zerop (length) > && TREE_OVERFLOW (length) > && integer_zerop (lb)) > length = size_zero_node; IIRC the first one is yours and the second one is mine. :-) But, yes, they clearly should go and be replaced by changes in gigi. -- Eric Botcazou
Support for targets with widths other than 2^x (Power of 2)
Hi, We here at the Computer Systems Group of the Technische Universität Darmstadt, Germany are working on an SoC-Kit called SpartanMC. We are currently using our own port of GCC to compile for it, but would be willing to get our target upstream and maintain it. The SpartanMC architecture poses a few additional difficulties as it is 18 Bits wide. Our investigations have shown that GCC is on principle capable of that (it is working right now, although with several limitations). The main problems are posed by code optimizations that are only applicable for 2^x values such as ... & (obj_align - 1); or d_int.lshift (BITS_PER_UNIT_LOG) Until now, every bug or crash we experienced was caused by such implementations that can very well be rewritten to have correct results no matter the actual value. Also, the current codebase does not seem to be unified in this and already mixes general implementations and ones that only work for powers of 2. Our question now is, whether or not compatibility with our 18 bit architecture and others is something you desire GCC to be capable of in the long term. It would make it much easier for us to use GCC and allow third parties to make much easier use of our work. Further details on SpartanMC: = It is an open source SoC-Kit aimed at running on several FPGAs from various manufacturers with high resource efficiency, which is why it utilizes the 18 bit wide underlying architecture of most FPGAs. In contrast to exisiting SoC-Kits it is completely open source and fully customizable and cross platform instead of targeted only at one specific brand. Even more information can be found on our projects website: spartanmc.de Thanks and regards, Ramon Wirsch
Re: Support for targets with widths other than 2^x (Power of 2)
On Thu, Oct 15, 2015 at 02:34:06PM +0200, Ramon Wirsch wrote: > The SpartanMC architecture poses a few additional difficulties as it > is 18 Bits wide. > Our investigations have shown that GCC is on principle capable of that > (it is working right now, although with several limitations). The main > problems are posed by code optimizations that are only applicable for > 2^x values such as > > ... & (obj_align - 1); > or > d_int.lshift (BITS_PER_UNIT_LOG) > > Until now, every bug or crash we experienced was caused by such > implementations that can very well be rewritten to have correct > results no matter the actual value. Patches welcome :-) > Also, the current codebase does not seem to be unified in this and > already mixes general implementations and ones that only work for > powers of 2. > > Our question now is, whether or not compatibility with our 18 bit > architecture and others is something you desire GCC to be capable of > in the long term. If it is not a big maintenance burden (and you say it isn't), I don't see why we wouldn't want to support this. The flip side is that there also needs to be constant maintenance from "your" side (i.e., people testing with interesting word lengths), to prevent things from regressing -- many people will just not consider the possibility of non-power-of-two word lengths. Do you already have patches for this separated out? If you post those people can more clearly see what the actual impact is. Segher