date:20151015

Re: Tree CONSTRUCTORs and Ada array indices

2015-10-15 Thread Richard Biener

On Wed, Oct 14, 2015 at 5:57 PM, Eric Botcazou  wrote:
>> I'm having fun with arrays in Ada ;) and wondering if anyone can tell me
>> what's right here.
>
> No surprise, arrays are probably the base types for which there is the largest
> gap between Ada and the C family of languages, for which GCC was written.
> You can see in the E_Array_Subtype case of gnat_to_gnu_entity the hoops we
> have to jump through to make them work.
>
>> In GCC this gets represented as an ARRAY_TYPE, whose TYPE_DOMAIN is a 32-bit
>> signed integer type; the TYPE_MAX_VALUE of this is 1 (as a size_int), the
>> TYPE_MIN_VALUE of this is -1 as a size_int, i.e. 4294967295. I believe
>> using (unsigned 32-bit) size_int for min/max is correct (?) despite the
>> domain being a signed type.
>
> Historically it was required that TYPE_DOMAIN be a subtype of 'sizetype', see
> build_index_type.  The Ada compiler was originally using a signed 'sizetype'
> because of this (all the other compilers were using an unsigned one) but a
> couple of changes (POINTER_PLUS_EXPR and LTO) essentially forced a uniform
> 'sizetype' across languages and the Ada compiler wasn't the winner. ;-)
>
>> An array of this type is then initialized with a CONSTRUCTOR. The
>> CONSTRUCTOR has three elements, with INTEGER_CST indices, being in order
>> 2^32-1, 0, 1 (all of sizetype).
>>
>> I substitute the CONSTRUCTOR into an ARRAY_REF [4294967295]{lb:
>> 4294967295 sz: 2}, i.e. that picks the element with index (sizetype)2^32-1.
>>
>> fold() in fold-const.c then fails to fold, because it does binary search for
>> 2^32-1 through the constructor elements, and first checks against the
>> middle CONSTRUCTOR_ELT, which has index 0.
>
> IIRC I wrote this code (before the signedness change).
>
>> So, where's the bug here? Should all these indices have sizetype? Should
>> fold-const.c's binary search respect the TYPE_DOMAIN of the type of the
>> CONSTRUCTOR being searched? (Lots of build_int_cst to reinterpret the
>> CONSTRUCTOR_ELT indices, and/or the index being searched for, in said
>> TYPE_DOMAIN ?)
>
> According to Richard, we should now be able to set TYPE_DOMAIN to something
> else than a subtype of 'sizetype' but this is a mid-term goal and may require
> some substantial work.  The short-term solution is probably to do nothing,
> arrays with negative indices are not very common in real life even in Ada.

C/C++ empty arrays (int a[0]) now use a signed TYPE_DOMAIN with bounds [0, -1]
for example (that was my original motivation to make hopefully all of
the middle-end
work with ssizetype TYPE_DOMAINs at least).  The tree.c range-type build helpers
accept all integral types now (but most FEs use sizetype or ssizetype).

ISTR also doing some Ada adjustments but they may have been dumped for
"interesting" changes to stor-layout.c instead:

/* ???  When it is obvious that the range is signed
   represent it using ssizetype.  */
if (TREE_CODE (lb) == INTEGER_CST
&& TREE_CODE (ub) == INTEGER_CST
&& TYPE_UNSIGNED (TREE_TYPE (lb))
&& tree_int_cst_lt (ub, lb))
  {
lb = wide_int_to_tree (ssizetype,
   offset_int::from (lb, SIGNED));
ub = wide_int_to_tree (ssizetype,
   offset_int::from (ub, SIGNED));
  }
length
  = fold_convert (sizetype,
  size_binop (PLUS_EXPR,
  build_int_cst (TREE_TYPE (lb), 1),
  size_binop (MINUS_EXPR, ub, lb)));
  }

/* ??? We have no way to distinguish a null-sized array from an
   array spanning the whole sizetype range, so we arbitrarily
   decide that [0, -1] is the only valid representation.  */
if (integer_zerop (length)
&& TREE_OVERFLOW (length)
&& integer_zerop (lb))
  length = size_zero_node;

this probably tries to deal with the situation you are running into.

Richard.

> --
> Eric Botcazou

Prototype implementation: Improving effectiveness and generality of auto-vectorization

2015-10-15 Thread sameera


Hi Richard,

This is with reference to our discussion at GNU Tools Cauldron 2015 regarding my talk titled "Improving the effectiveness and generality of GCC 
auto-vectorization." We, at Imaginations Technologies, have further worked on finalizing the algorithms for transformations to achieve efficient 
target-aware reordering instruction selection. We have implemented the prototype in python to demonstrate the capabilities of the algorithm and verify 
the claims made in the presentation.


I am attaching the prototype along with the documented algorithm for review purposes. We would be starting with design and implementation of the same 
in GCC and would be glad to receive comments, feedback and suggestions.


- Thanks and regards,
  Sameera Deshpande
Introduction

Current methods of auto-vectorization take the vectorization decision through a 
generic analysis which is almost target independent
(except for target features such as vector sizes). However, the instruction 
selection mechanism is largely adhoc in the sense that the
ideas used for instruction selection for a target may not carry over seamlessly 
to some other target. This has led to proliferation of
target specific code in the generic part of compiler frameworks such as GCC.

We propose a generic approach for reordering-instruction selection which is 
based on the insight that effective data reordering for
vectorization can be defined in terms of a set of well defined primitive 
operations. Interestingly, the same set of operations can be 
used to define target instructions almost for all targets that we know of. Thus 
these operations form the basic building blocks of 
reorderings for effective vectorization.

Primitive operations

Let V_N denote a vector of N scalar elements. A collection of k V_N vectors is 
denoted by V_N , ...k , V_N
We propose the following operations:
- Concatenation_k^N : V_N x ... x k V_N --> V_kN : CONCAT_k^N (V_N , ...k , 
V_N) concatenates k vectors of size N to produce a
single vector of size kN.

- Split_k,i^kN : V_kN --> V_N . Split operation is the inverse of 
concatenation. SPLT_k,i^kN ( V_kN ) splits a vector of size kN into
k vectors of size N each and returns the i th vector.

- Inerleave_k^N : V_N x ...k x V_N --> V_kN . ILV_k^N ( V_N, ...k, V_N ) 
creates a vector of size kN such that ith element of output
  vector comes from i/kth element from i%kth vector.

- Extract_k,i^kN : V_kN --> V_N. Extract operation is inverse of interleave 
operation. EXTR_k,i^kN (V_kN) divides the input vector
 into N parts and creates k vectors of size N by selecting jth element from 
each part, and returning ith vector of size N .

- permute^N : V_N x I_M --> V_N. Permute operation takes vector of size N and 
integer order of size M. It divides vector N into N/M
  vectors of size M, and permutes each subvector with permute order I_M.

Phase structure of Algorithm

The algorithm for target-aware reordering-instruction selection for efficient 
autovectorization is divided into following phases:
Phase 1 : Create tree for each loop per destination - Implemented by Sameera 
Deshpande
Phase 2 : k-arity reduction/promotion  - Implemented by Sameera Deshpande
Phase 3 : Top-down distribution of p-node subtree  - Implemented by Sameera 
Deshpande
Phase 4 : Optimizations  - Implemented by Sameera Deshpande
Phase 5 : Bottom-up commoning of p-node subtree  - Implemented by Sameera 
Deshpande
  - Cost computation of p-tree  - Implemented by Sameera Deshpande
Phase 6 : Vector-size reduction  - Implemented by Sameera Deshpande
Phase 4(again): Optimization  - Implemented by Sameera Deshpande
Phase 7: Cost computation - Store min_cost_tree.  - Implemented by Prachi 
Godbole
Phase 8: Target specific code generation for min_cost_tree  - Implemented by 
Prachi Godbole

Input loop
==
To enable auto-vectorization using this algorithm, the loop structure should be 
as follows:
Loop with k statements reading from S_1 , S_2 , ... S_m and writing to 
destination D, where stmt j is of form
D[k * i + d_j ] = op_j (S_1 [k_1 * i + c_j1 ], S_2 [k_2 * i + c_j2 ], ... S_m 
[k_m * i + c_jm ])
such that

– All statements stmt_1 , ... stmt_k in the loop body are independent and 
vectorizable for target with vector size VS.
– Iteration count N > VS.
– Induction variable i for the loop is not altered by stmt_1 , stmt_2 ... 
stmt_k .
– All S_1 , ... S_m and D are of same type.
– Indices for all S and D have same multiplicative index. If different, 
normalize by taking LCM, and duplicate the statements 
  appropriately by adjusting the offset.
– c_jp < k_p


FOR(i = 0 ... N − 1)
{
  stmt 1 : D [k * i + d_1 ] = op_1 (S_1 [k_1 * i + c_11 ], S_2 [k_2 * i + c_12 
], . . . S_m [k_m * i + c_1m ])
  stmt 2 : D [k * i + d_2 ] = op_2 (S_1 [k_1 * i + c_21 ], S_2 [k_2 * i + c_22 
], . . . S_m [k_m * i + c_2m ])
  ...
  stmt k : D [k * i + d_k ] = op_k (S_1 [k_1 * i + c_k1 ], S_2 [k_2 * i + c_k2 
], .

Re: Tree CONSTRUCTORs and Ada array indices

2015-10-15 Thread Eric Botcazou

> C/C++ empty arrays (int a[0]) now use a signed TYPE_DOMAIN with bounds [0,
> -1] for example (that was my original motivation to make hopefully all of
> the middle-end
> work with ssizetype TYPE_DOMAINs at least).  The tree.c range-type build
> helpers accept all integral types now (but most FEs use sizetype or
> ssizetype).

Using ssizetype in Ada would already be a net progress, I'll investigate.

> ISTR also doing some Ada adjustments but they may have been dumped for
> "interesting" changes to stor-layout.c instead:
> 
> /* ???  When it is obvious that the range is signed
>represent it using ssizetype.  */
> if (TREE_CODE (lb) == INTEGER_CST
> && TREE_CODE (ub) == INTEGER_CST
> && TYPE_UNSIGNED (TREE_TYPE (lb))
> && tree_int_cst_lt (ub, lb))
>   {
> lb = wide_int_to_tree (ssizetype,
>offset_int::from (lb, SIGNED));
> ub = wide_int_to_tree (ssizetype,
>offset_int::from (ub, SIGNED));
>   }
> length
>   = fold_convert (sizetype,
>   size_binop (PLUS_EXPR,
>   build_int_cst (TREE_TYPE (lb),
> 1), size_binop (MINUS_EXPR, ub, lb))); }
> 
> /* ??? We have no way to distinguish a null-sized array from an
>array spanning the whole sizetype range, so we arbitrarily
>decide that [0, -1] is the only valid representation.  */
> if (integer_zerop (length)
> && TREE_OVERFLOW (length)
> && integer_zerop (lb))
>   length = size_zero_node;

IIRC the first one is yours and the second one is mine. :-)  But, yes, they 
clearly should go and be replaced by changes in gigi.

-- 
Eric Botcazou

Support for targets with widths other than 2^x (Power of 2)

2015-10-15 Thread Ramon Wirsch

Hi,

We here at the Computer Systems Group of the Technische Universität
Darmstadt, Germany are working on an SoC-Kit called SpartanMC.

We are currently using our own port of GCC to compile for it, but
would be willing to get our target upstream and maintain it.

The SpartanMC architecture poses a few additional difficulties as it
is 18 Bits wide.
Our investigations have shown that GCC is on principle capable of that
(it is working right now, although with several limitations). The main
problems are posed by code optimizations that are only applicable for
2^x values such as

... & (obj_align - 1);
 or
d_int.lshift (BITS_PER_UNIT_LOG)

Until now, every bug or crash we experienced was caused by such
implementations that can very well be rewritten to have correct
results no matter the actual value.
Also, the current codebase does not seem to be unified in this and
already mixes general implementations and ones that only work for
powers of 2.

Our question now is, whether or not compatibility with our 18 bit
architecture and others is something you desire GCC to be capable of
in the long term.
It would make it much easier for us to use GCC and allow third parties
to make much easier use of our work.


Further details on SpartanMC:
=
It is an open source SoC-Kit aimed at running on several FPGAs from
various manufacturers with high resource efficiency, which is why it
utilizes the 18 bit wide underlying architecture of most FPGAs. In
contrast to exisiting SoC-Kits it is completely open source and fully
customizable and cross platform instead of targeted only at one
specific brand.


Even more information can be found on our projects website: spartanmc.de

Thanks and regards,
Ramon Wirsch

Re: Support for targets with widths other than 2^x (Power of 2)

2015-10-15 Thread Segher Boessenkool

On Thu, Oct 15, 2015 at 02:34:06PM +0200, Ramon Wirsch wrote:
> The SpartanMC architecture poses a few additional difficulties as it
> is 18 Bits wide.
> Our investigations have shown that GCC is on principle capable of that
> (it is working right now, although with several limitations). The main
> problems are posed by code optimizations that are only applicable for
> 2^x values such as
> 
> ... & (obj_align - 1);
>  or
> d_int.lshift (BITS_PER_UNIT_LOG)
> 
> Until now, every bug or crash we experienced was caused by such
> implementations that can very well be rewritten to have correct
> results no matter the actual value.

Patches welcome :-)

> Also, the current codebase does not seem to be unified in this and
> already mixes general implementations and ones that only work for
> powers of 2.
> 
> Our question now is, whether or not compatibility with our 18 bit
> architecture and others is something you desire GCC to be capable of
> in the long term.

If it is not a big maintenance burden (and you say it isn't), I don't
see why we wouldn't want to support this.

The flip side is that there also needs to be constant maintenance from
"your" side (i.e., people testing with interesting word lengths), to
prevent things from regressing -- many people will just not consider
the possibility of non-power-of-two word lengths.

Do you already have patches for this separated out?  If you post those
people can more clearly see what the actual impact is.

Segher

Re: Tree CONSTRUCTORs and Ada array indices

Prototype implementation: Improving effectiveness and generality of auto-vectorization

Re: Tree CONSTRUCTORs and Ada array indices

Support for targets with widths other than 2^x (Power of 2)

Re: Support for targets with widths other than 2^x (Power of 2)

5 matches

Site Navigation

Mail list logo

Footer information