RE: Gimplify ICE in gnat.dg/array18.adb

2014-06-06 Thread BELBACHIR Selim
Hi,


The patch modifies gcc/ada/gcc-interface/trans.c this way:

   /* First, create the temporary for the return value if we need it: for a
-     variable-sized return type if there is no target or if this is slice,
-     because the gimplifier doesn't support these cases; or for a function
-     with copy-in/copy-out parameters if there is no target, because we'll
-     need to preserve the return value before copying back the parameters.
-     This must be done before we push a new binding level around the call
-     as we will pop it before copying the return value.  */
+     variable-sized return type if there is no target and this is not an
+     object declaration, or else there is a target and it is a slice or an
+     array with fixed size, as the gimplifier doesn't handle these cases;
+     otherwise for a function with copy-in/copy-out parameters if there is
+     no target, because we need to preserve the return value before copying
+     back the parameters.  This must be done before we push a binding level
+     around the call as we will pop it before copying the return value.  */
   if (function_call
       && ((TREE_CODE (TYPE_SIZE (gnu_result_type)) != INTEGER_CST
-          && (!gnu_target || TREE_CODE (gnu_target) == ARRAY_RANGE_REF))
+          && ((!gnu_target
+               && Nkind (Parent (gnat_node)) != N_Object_Declaration)
+              || (gnu_target
+                  && (TREE_CODE (gnu_target) == ARRAY_RANGE_REF
+                      || (TREE_CODE (TREE_TYPE (gnu_target)) == ARRAY_TYPE
+                          && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target)))
+                             == INTEGER_CST)))))
           || (!gnu_target && TYPE_CI_CO_LIST (gnu_subprog_type))))



So I used gdb to compare the trees 'gnu_target' and 'gnu_result_type' in my
port and in the x86_64 port.

X86_64 gnu_target:

[gdb dump mangled in the archive: it showed a VAR_DECL 'a' (array18.adb line 7)
whose type is an ARRAY_TYPE of QI components with an INTEGER_CST size]

My gnu_target:

[gdb dump mangled in the archive: it showed a VAR_DECL 'a' (array18.adb line 7,
mode SI) whose type is a RECORD_TYPE containing a QI field]

Strangely, my VAR_DECL for 'a' is a RECORD_TYPE and not an ARRAY_TYPE, so the
'if' condition is false (and true on x86_64).
I looked in my backend for something that would transform an ARRAY_TYPE into a
RECORD_TYPE, but I did not find anything.



X86_64 gnu_result_type:

[gdb dump mangled in the archive: an ARRAY_TYPE in BLK mode whose size is not
an INTEGER_CST but a gimplified expression involving an object from
array18_pkg.ads line 5]

My gnu_result_type:

[gdb dump mangled in the archive: likewise an ARRAY_TYPE in BLK mode with a
non-constant, gimplified size; the dump is cut off in the archive]
Re: Gimplify ICE in gnat.dg/array18.adb

2014-06-06 Thread Eric Botcazou
> strangely my var_decl for 'a' is a record and not an array_type so the 'if'
> condition is false (and true on X86_64) I looked for somewhere in my
> backend something that would transform an array_type into a record_type but
> I did not find anything.

The comment should clearly state the intent of the change though and how to 
adjust it to your needs.

-- 
Eric Botcazou


Re: [GSoC] decision tree first steps

2014-06-06 Thread Prathamesh Kulkarni
On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener wrote:
> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni wrote:
>> I have few questions regarding genmatch:
>>
>> a) Why is 4 hard-coded here: ?
>> in write_nary_simplifiers:
>>  fprintf (f, "  tree captures[4] = {};\n");
>
> Magic number (this must be big enough for all cases ...).  Honestly
> this should be improved (but requires another scan over the matcher IL
> to figure out the max N used in @N).
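(A sketch of the extra scan over the matcher IL suggested here: compute the
largest @N used by a pattern so the generated captures array can be sized
exactly. The node layout below is an assumption for illustration, not the
prototype's actual classes.)

#include <vector>
#include <algorithm>

/* Minimal stand-in for a match-IL node: capture_index is the N of @N for
   captures and -1 for plain expressions; ops are the children.  */
struct match_op
{
  int capture_index;
  std::vector<match_op *> ops;
};

/* Walk the pattern and return how many capture slots it needs.  */
static unsigned
num_captures (const match_op *op)
{
  unsigned n = op->capture_index >= 0 ? op->capture_index + 1 : 0;
  for (const match_op *child : op->ops)
    n = std::max (n, num_captures (child));
  return n;
}

/* The generator could then emit
     fprintf (f, "  tree captures[%u] = {};\n", num_captures (match));
   instead of hard-coding 4.  */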
>
>> b) Should we add syntax for a symbol to denote multiple operators ?
>> For exampleim in simplify_rotate:
>> (X << CNT1) OP (X >> CNT2) with OP being +, |, ^  (CNT1 + CNT2 ==
>> bitsize of type of X).
>
> Something to enhance the IL with, yes.  I'd say we support
>
> (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR)
>
> thus,
>
> (define_op  op...)
>
>> c) Remove for parsing capture in parse_expr since we reject outermost
>> captured expressions ?
>
> but parse_expr is also used for inner expressions, no?
>
> (plus (minus@2 @0 @1) @3)
>
> should still work
>
>> d) I am not able to follow this comment in match.pd:
>> /* Patterns required to avoid SCCVN testsuite regressions.  */
>>
>> /* (x >> 31) & 1 -> (x >> 31).  Folding in fold-const is more
>>complicated here, it does
>>  Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
>>  (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>>  if the new mask might be further optimized.  */
>> (match_and_simplify
>>   (bit_and (rshift@0 @1 INTEGER_CST_P@2) integer_onep)
>>   if (compare_tree_int (@2, TYPE_PRECISION (TREE_TYPE (@1)) - 1) == 0)
>>   @0)
>
> The comment is literally copied from the case I extracted the
> (simplified) variant from fold-const.c.  See lines 11961-12056 in 
> fold-const.c.
> It'll be a challenge to implement the equivalent in a pattern ;)
>
>>
>> Decision Tree.
>> I have tried to come up with a prototype for decision tree (patch 
>> attached).
>> For simplicity, it handles patterns involving only unary operators and
>> no predicates
>> and returns false when the pattern fails to match (no goto to match
>> another pattern).
>> I meant to post it only for illustration, and I have not really paid
>> attention to code quality (bad formatting, memory leaks, etc.).
>>
>> * Basic Idea
>> A pattern consists of following parts: match, ifexpr and result.
>> Let's call (ifexpr, result) the "simplification" operand.
>> The common prefix between different match operands would be represented
>> by same nodes in the decision tree.
>>
>> Example:
>> (negate (bit_not @0))
>> S1
>>
>> (negate (negate @0))
>> S2
>>
>> S1, S2 denote simplifications for the above patterns respectively.
>>
>> The decision tree would look something like
>> (that's the way it gets constructed with the patch):
>>
>> dummy/root
>> |
>>NEGATE_EXPR
>>  /  \
>>  BIT_NOT   NEGATE_EXPR
>>| |
>>  @0 @0
>>| |
>>  S1  S2
>>
>> a) The children of an internal node are number of decisions that
>> can be taken at that node. In the above case it's 2 for outer NEGATE_EXPR.
>> b) Simplification operand represents leaves of the decision tree
>> c) Instead of having list of heads, I have added one dummy node,
>> and heads become children of these dummy root node.
>> d) Code-gen for non-simplification operands involves generating,
>> "matching" code and for simplification operands involves generating
>> "transform" code
>>
>> * Overall Flow:
>> I guess we would build the decision tree from the AST.
>> So the flow would be like:
>> source -> struct simplify (match, ifexpr, result) -> decision tree -> c code.
>>
>> Something like (in main):
>> decision_tree dt;
>> while (there is another pattern)
>> {
>>   simplify *s = parse_match_and_simplify ();
>>   insert s into decision tree;
>> };
>> So parsing routines are concerned with parsing and building the AST 
>> (operand),
>> and not with the decision tree. Is that fine ?
>
> Yes, that's good.
>
>> * Representation of decision tree.
>> A decision tree would need a way to represent language constructs
>> like capture, predicate, etc. so in some ways it would be similar to AST.
>> In the patch, I have created the following hierarchy:
>> dt_operand: represents a general base "operand" in the decision tree
>> dt_expr: for representing expression. Expression contains operation
>> to be performed (e_operation).
>> dt_capture: analogous to capture.
>> dt_simplify: for representing "simplification" operand.
>> simplification consists of ifexpr and result
>> dt_head: to represent "dummy" root. Maybe a separate class is not needed.
>>
>> * Constructing decision tree from AST
>> The algorithm I have used is similar to inserting a string into a trie, as
>> outlined here: http://en.wikipedia.org/wiki/Trie
>> The difference shall be to traverse AST depth-first rath
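For illustration, a self-contained sketch of the trie-style insertion
described above. The prototype uses the dt_operand/dt_expr/dt_capture/
dt_simplify hierarchy listed earlier; here everything is collapsed into a
single node type, and all names are made up:

#include <string>
#include <vector>

struct simplify;                     /* match + ifexpr + result, as above */

/* One AST operand of a pattern: an operation like "NEGATE_EXPR" with
   children, or a capture like "@0" with none.  */
struct operand
{
  std::string code;
  std::vector<operand *> ops;
};

/* One decision-tree node; the prototype's subclasses are collapsed into a
   single type here for brevity.  */
struct dt_node
{
  std::string code;                  /* decision taken at this node */
  std::vector<dt_node *> kids;       /* possible next decisions */
  simplify *leaf = nullptr;          /* non-null on "simplification" leaves */
};

/* Find or create the child of PARENT taking the decision CODE, so that
   patterns sharing a prefix share nodes (trie insertion).  */
static dt_node *
insert_child (dt_node *parent, const std::string &code)
{
  for (dt_node *kid : parent->kids)
    if (kid->code == code)
      return kid;
  dt_node *kid = new dt_node;
  kid->code = code;
  parent->kids.push_back (kid);
  return kid;
}

/* Pre-order (depth-first) walk over the match AST, descending in the
   decision tree as we go; returns the node reached.  */
static dt_node *
insert_operand (dt_node *pos, const operand *op)
{
  pos = insert_child (pos, op->code);
  for (const operand *child : op->ops)
    pos = insert_operand (pos, child);
  return pos;
}

/* Insert one pattern: its match AST, then a leaf for its simplification.  */
void
decision_tree_insert (dt_node *root, const operand *match, simplify *s)
{
  dt_node *leaf = new dt_node;
  leaf->code = "[simplify]";
  leaf->leaf = s;
  insert_operand (root, match)->kids.push_back (leaf);
}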

RE: Gimplify ICE in gnat.dg/array18.adb

2014-06-06 Thread BELBACHIR Selim
The intent of the change is clear.

The strange thing concerns the variable "A", declared as "A : String (1 ..
1);", used as "A := F;", and displayed in the tree (gnu_target) as a
RECORD_TYPE instead of an ARRAY_TYPE.


The test that decides whether a temporary for the return value is needed only
checks for ARRAY_RANGE_REF and ARRAY_TYPE, not for RECORD_TYPE:

  TREE_CODE (gnu_target) == ARRAY_RANGE_REF
  || (TREE_CODE (TREE_TYPE (gnu_target)) == ARRAY_TYPE
      && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target))) == INTEGER_CST)


I don't know why in my case fixed-size Strings are RECORD_TYPE and not
ARRAY_TYPE. Maybe it's normal and I should extend the if condition with
RECORD_TYPE:

  TREE_CODE (gnu_target) == ARRAY_RANGE_REF
  || (TREE_CODE (TREE_TYPE (gnu_target)) == ARRAY_TYPE
      && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target))) == INTEGER_CST)
  || (TREE_CODE (TREE_TYPE (gnu_target)) == RECORD_TYPE
      && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target))) == INTEGER_CST)
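For what it's worth, an untested sketch of how the extended test could be
factored out (the helper name is made up). Whether the RECORD_TYPE seen here
is the padding record that gigi wraps around some fixed-size objects (see
maybe_pad_type in gcc-interface/utils.c) is only an assumption; if it is,
checking TYPE_IS_PADDING_P rather than accepting any fixed-size record would
be tighter:

  /* Sketch only: does assigning the call result to GNU_TARGET require a
     temporary because the gimplifier cannot handle the store directly?  */
  static bool
  gnu_target_needs_return_temp (tree gnu_target)
  {
    /* A slice always needs the temporary.  */
    if (TREE_CODE (gnu_target) == ARRAY_RANGE_REF)
      return true;

    /* So does a target that is an array (or, per the suggestion above, a
       record wrapping one) with fixed size.  */
    tree gnu_type = TREE_TYPE (gnu_target);
    if ((TREE_CODE (gnu_type) == ARRAY_TYPE
         || TREE_CODE (gnu_type) == RECORD_TYPE)
        && TREE_CODE (TYPE_SIZE (gnu_type)) == INTEGER_CST)
      return true;

    return false;
  }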


-----Original Message-----
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On behalf of Eric
Botcazou
Sent: Friday, June 6, 2014 10:20
To: BELBACHIR Selim
Cc: gcc@gcc.gnu.org
Subject: Re: Gimplify ICE in gnat.dg/array18.adb

> strangely my var_decl for 'a' is a record and not an array_type so the 'if'
> condition is false (and true on X86_64) I looked for somewhere in 
> my backend something that would transform an array_type into a 
> record_type but I did not find anything.

The comment should clearly state the intent of the change though and how to 
adjust it to your needs.

--
Eric Botcazou


Re: [GSoC] decision tree first steps

2014-06-06 Thread Richard Biener
On Fri, Jun 6, 2014 at 11:02 AM, Prathamesh Kulkarni wrote:
> On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener wrote:
>> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni wrote:
>>> I have few questions regarding genmatch:
>>>
>>> a) Why is 4 hard-coded here: ?
>>> in write_nary_simplifiers:
>>>  fprintf (f, "  tree captures[4] = {};\n");
>>
>> Magic number (this must be big enough for all cases ...).  Honestly
>> this should be improved (but requires another scan over the matcher IL
>> to figure out the max N used in @N).
>>
>>> b) Should we add syntax for a symbol to denote multiple operators ?
>>> For example, in simplify_rotate:
>>> (X << CNT1) OP (X >> CNT2) with OP being +, |, ^  (CNT1 + CNT2 ==
>>> bitsize of type of X).
>>
>> Something to enhance the IL with, yes.  I'd say we support
>>
>> (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR)
>>
>> thus,
>>
>> (define_op  op...)
>>
>>> c) Remove for parsing capture in parse_expr since we reject outermost
>>> captured expressions ?
>>
>> but parse_expr is also used for inner expressions, no?
>>
>> (plus (minus@2 @0 @1) @3)
>>
>> should still work
>>
>>> d) I am not able to follow this comment in match.pd:
>>> /* Patterns required to avoid SCCVN testsuite regressions.  */
>>>
>>> /* (x >> 31) & 1 -> (x >> 31).  Folding in fold-const is more
>>>complicated here, it does
>>>  Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
>>>  (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>>>  if the new mask might be further optimized.  */
>>> (match_and_simplify
>>>   (bit_and (rshift@0 @1 INTEGER_CST_P@2) integer_onep)
>>>   if (compare_tree_int (@2, TYPE_PRECISION (TREE_TYPE (@1)) - 1) == 0)
>>>   @0)
>>
>> The comment is literally copied from the case I extracted the
>> (simplified) variant from fold-const.c.  See lines 11961-12056 in 
>> fold-const.c.
>> It'll be a challenge to implement the equivalent in a pattern ;)
>>
>>>
>>> Decision Tree.
>>> I have tried to come up with a prototype for decision tree (patch 
>>> attached).
>>> For simplicity, it handles patterns involving only unary operators and
>>> no predicates
>>> and returns false when the pattern fails to match (no goto to match
>>> another pattern).
>>> I meant to post it only for illustration, and I have not really paid
>>> attention to code quality (bad formatting, memory leaks, etc.).
>>>
>>> * Basic Idea
>>> A pattern consists of following parts: match, ifexpr and result.
>>> Let's call (ifexpr, result) the "simplification" operand.
>>> The common prefix between different match operands would be represented
>>> by same nodes in the decision tree.
>>>
>>> Example:
>>> (negate (bit_not @0))
>>> S1
>>>
>>> (negate (negate @0))
>>> S2
>>>
>>> S1, S2 denote simplifications for the above patterns respectively.
>>>
>>> The decision tree would look something like
>>> (that's the way it gets constructed with the patch):
>>>
>>> dummy/root
>>> |
>>>NEGATE_EXPR
>>>  /  \
>>>  BIT_NOT   NEGATE_EXPR
>>>| |
>>>  @0 @0
>>>| |
>>>  S1  S2
>>>
>>> a) The children of an internal node are number of decisions that
>>> can be taken at that node. In the above case it's 2 for outer NEGATE_EXPR.
>>> b) Simplification operand represents leaves of the decision tree
>>> c) Instead of having list of heads, I have added one dummy node,
>>> and heads become children of these dummy root node.
>>> d) Code-gen for non-simplification operands involves generating,
>>> "matching" code and for simplification operands involves generating
>>> "transform" code
>>>
>>> * Overall Flow:
>>> I guess we would build the decision tree from the AST.
>>> So the flow would be like:
>>> source -> struct simplify (match, ifexpr, result) -> decision tree -> c 
>>> code.
>>>
>>> Something like (in main):
>>> decision_tree dt;
>>> while (there is another pattern)
>>> {
>>>   simplify *s = parse_match_and_simplify ();
>>>   insert s into decision tree;
>>> };
>>> So parsing routines are concerned with parsing and building the AST 
>>> (operand),
>>> and not with the decision tree. Is that fine ?
>>
>> Yes, that's good.
>>
>>> * Representation of decision tree.
>>> A decision tree would need a way to represent language constructs
>>> like capture, predicate, etc. so in some ways it would be similar to AST.
>>> In the patch, I have created the following hierarchy:
>>> dt_operand: represents a general base "operand" in the decision tree
>>> dt_expr: for representing expression. Expression contains operation
>>> to be performed (e_operation).
>>> dt_capture: analogous to capture.
>>> dt_simplify: for representing "simplification" operand.
>>> simplification consists of ifexpr and result
>>> dt_head: to represent "dummy" root. Maybe a separate class is not needed.
>>>
>>> * Constructing de

Re: [GSoC] decision tree first steps

2014-06-06 Thread Richard Biener
On Fri, Jun 6, 2014 at 12:02 PM, Richard Biener wrote:
> On Fri, Jun 6, 2014 at 11:02 AM, Prathamesh Kulkarni wrote:
>> On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener wrote:
>>> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni wrote:
 I have few questions regarding genmatch:

 a) Why is 4 hard-coded here: ?
 in write_nary_simplifiers:
  fprintf (f, "  tree captures[4] = {};\n");
>>>
>>> Magic number (this must be big enough for all cases ...).  Honestly
>>> this should be improved (but requires another scan over the matcher IL
>>> to figure out the max N used in @N).
>>>
 b) Should we add syntax for a symbol to denote multiple operators ?
 For example, in simplify_rotate:
 (X << CNT1) OP (X >> CNT2) with OP being +, |, ^  (CNT1 + CNT2 ==
 bitsize of type of X).
>>>
>>> Something to enhance the IL with, yes.  I'd say we support
>>>
>>> (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR)
>>>
>>> thus,
>>>
>>> (define_op  op...)
>>>
 c) Remove for parsing capture in parse_expr since we reject outermost
 captured expressions ?
>>>
>>> but parse_expr is also used for inner expressions, no?
>>>
>>> (plus (minus@2 @0 @1) @3)
>>>
>>> should still work
>>>
 d) I am not able to follow this comment in match.pd:
 /* Patterns required to avoid SCCVN testsuite regressions.  */

 /* (x >> 31) & 1 -> (x >> 31).  Folding in fold-const is more
complicated here, it does
  Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
  (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
  if the new mask might be further optimized.  */
 (match_and_simplify
   (bit_and (rshift@0 @1 INTEGER_CST_P@2) integer_onep)
   if (compare_tree_int (@2, TYPE_PRECISION (TREE_TYPE (@1)) - 1) == 0)
   @0)
>>>
>>> The comment is literally copied from the case I extracted the
>>> (simplified) variant from fold-const.c.  See lines 11961-12056 in 
>>> fold-const.c.
>>> It'll be a challenge to implement the equivalent in a pattern ;)
>>>

 Decision Tree.
 I have tried to come up with a prototype for decision tree (patch 
 attached).
 For simplicity, it handles patterns involving only unary operators and
 no predicates
 and returns false when the pattern fails to match (no goto to match
 another pattern).
 I meant to post it only for illustration, and I have not really paid
 attention to code quality (bad formatting, memory leaks, etc.).

 * Basic Idea
 A pattern consists of following parts: match, ifexpr and result.
 Let's call (ifexpr, result) the "simplification" operand.
 The common prefix between different match operands would be represented
 by same nodes in the decision tree.

 Example:
 (negate (bit_not @0))
 S1

 (negate (negate @0))
 S2

 S1, S2 denote simplifications for the above patterns respectively.

 The decision tree would look something like
 (that's the way it gets constructed with the patch):

 dummy/root
 |
NEGATE_EXPR
  /  \
  BIT_NOT   NEGATE_EXPR
| |
  @0 @0
| |
  S1  S2

 a) The children of an internal node are number of decisions that
 can be taken at that node. In the above case it's 2 for outer NEGATE_EXPR.
 b) Simplification operand represents leaves of the decision tree
 c) Instead of having list of heads, I have added one dummy node,
 and heads become children of these dummy root node.
 d) Code-gen for non-simplification operands involves generating,
 "matching" code and for simplification operands involves generating
 "transform" code

 * Overall Flow:
 I guess we would build the decision tree from the AST.
 So the flow would be like:
 source -> struct simplify (match, ifexpr, result) -> decision tree -> c 
 code.

 Something like (in main):
 decision_tree dt;
 while (there is another pattern)
 {
   simplify *s = parse_match_and_simplify ();
   insert s into decision tree;
 };
 So parsing routines are concerned with parsing and building the AST 
 (operand),
 and not with the decision tree. Is that fine ?
>>>
>>> Yes, that's good.
>>>
 * Representation of decision tree.
 A decision tree would need a way to represent language constructs
 like capture, predicate, etc. so in some ways it would be similar to AST.
 In the patch, I have created the following hierarchy:
 dt_operand: represents a general base "operand" in the decision tree
 dt_expr: for representing expression. Expression contains operation
 to be performed (e_operation).
 dt_capture: analogous to capture.
 dt_simplify: for 

RE: gnat.dg test: div_no_warning.adb

2014-06-06 Thread BELBACHIR Selim
Hi,

I've noticed that the Constraint_Error warning produced by
gcc/testsuite/gnat.dg/div_no_warning.adb disappears if the target runtime
contains:

"Configurable_Run_Time : constant Boolean := False;"


* x86 native gnat contains Configurable_Run_Time := False ==> no warning
* Cross gnat port without full runtime contains Configurable_Run_Time := True  
==> warning raised


When I read the description of "Configurable_Run_Time_On_Target : Boolean :=
False;" in gcc/ada/targparm.ads (which may not be complete), I don't see any
relation to the raising of this particular constraint error.

Can someone explain what the full impact of the Configurable_Run_Time boolean
in system.ads is, and what the relation is between it and the "constant
folding for short-circuit control forms" tested by
gcc/testsuite/gnat.dg/div_no_warning.adb?

Regards,

Selim






Reducing Register Pressure based on Instruction Scheduling and Register Allocator!!

2014-06-06 Thread Ajit Kumar Agarwal
Hello All:

I have been looking further into reducing register pressure through register
allocation and instruction scheduling. The following observations are based
on the existing papers on reducing register pressure through scheduling.

Have the following approaches to reducing register pressure already been
implemented in GCC?

Minimum register usage through better Instruction scheduling

Instruction scheduling plays an important role in increasing or decreasing
register pressure. If instruction scheduling is done before register
allocation, register pressure will increase, whereas if register allocation
is done before instruction scheduling, false antidependences will be
created. For an out-of-order superscalar processor, the scheduling should be
done in such a way that register pressure decreases. Achieving ILP increases
register pressure, but on a superscalar processor register pressure matters,
so instruction scheduling for ILP should also take register pressure into
account.

Govindarajan proposed a scheme where list scheduling is modified to take
the number of available registers as a constraint, along with latency. From
the data dependency graph of the given basic block, chains are created using
a depth-first search, and one register is allocated to each chain. The idea
is a baton-passing scheme: a def is used by the next instruction, whose def
is in turn used by the instruction after that, so the def and its uses get
one register and the same register is reused for the next def and use,
giving one register per chain. The main challenge is to assign the same
register to different chains if the chains don't overlap.

List scheduling is modified to take a list of available colors as a
parameter; "release" is the release of available colors. When a node of the
DAG is removed from the ready queue, the live ranges that terminate at that
node are released and added back as available colors. This ensures that
register pressure is not increased by the schedule, which is derived from a
graph-coloring register allocator.
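As an illustration of the scheme described above (not code from GCC or from
the paper), a rough sketch of list scheduling with an available-register
("color") budget: nodes that would start more live ranges than there are
colors left are deferred until colors are released.

#include <queue>
#include <vector>

struct dag_node
{
  int priority;                   /* e.g. critical-path length */
  int colors_needed;              /* live ranges started by this node */
  int colors_released;            /* live ranges that terminate here */
  std::vector<dag_node *> succs;  /* data-dependence successors */
  int unscheduled_preds;          /* predecessors not yet scheduled */
};

struct by_priority
{
  bool operator() (const dag_node *a, const dag_node *b) const
  { return a->priority < b->priority; }
};

/* Schedule the DAG whose roots are given, staying within the register
   budget except when nothing else is ready (which would mean a spill).  */
static void
schedule (const std::vector<dag_node *> &roots, int available_colors,
          std::vector<dag_node *> &order)
{
  std::priority_queue<dag_node *, std::vector<dag_node *>, by_priority> ready;
  for (dag_node *r : roots)
    ready.push (r);

  std::vector<dag_node *> deferred;
  while (!ready.empty ())
    {
      dag_node *n = ready.top ();
      ready.pop ();

      /* Defer a node that would blow the register budget, unless it is
         the only candidate left.  */
      if (n->colors_needed > available_colors && !ready.empty ())
        {
          deferred.push_back (n);
          continue;
        }

      available_colors += n->colors_released - n->colors_needed;
      order.push_back (n);

      /* Newly ready successors and previously deferred nodes go back on
         the ready queue now that colors may have been released.  */
      for (dag_node *s : n->succs)
        if (--s->unscheduled_preds == 0)
          ready.push (s);
      for (dag_node *d : deferred)
        ready.push (d);
      deferred.clear ();
    }
}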

Register pressure sensitivity based Instruction scheduling

A superscalar OOO processor does dynamic instruction scheduling using an
out-of-order instruction window and register renaming to achieve ILP. The
main challenge is to compute an ILP schedule and then, from that schedule,
generate the instruction sequence with sensitivity to registers; this step
is called linearization. The ILP schedule is partitioned into groups:

1. If the selected instruction is s, all the instructions scheduled before s
form one group.

2. All the instructions scheduled after s form another group.

The linearization algorithm uses a can test and a must can test. The can
test: if the selected instruction is s, then all the instructions in group 1
that are not yet scheduled but are in the ready list are generated within
W-1 slots. If the selected instruction s is generated at index i, all the
instructions scheduled before s are generated by index i+W-1, where W is the
size of the register window. The must can test: if the selected instruction
is generated at index i, then all the instructions scheduled after s are
generated by index i+W-1. If the instructions scheduled after s are not in
the window, the ILP won't be achieved.

The linearization algorithm first checks the can test; if it is satisfied,
the instruction is generated. If not, it is checked with the must can test,
and if that is satisfied it is generated.

The register pressure should not be increased during linearization, as there
is a chance of register pressure increasing during scheduling. To achieve
this, a priority is assigned to each instruction in the ready list. If all
the uses of the variables in the selected instruction are already scheduled,
the priority is 2. If an instruction depends on i, and i depends on another
instruction, the live range is too long, so the priority is lower and is
assigned 1. From the ready list, the instruction with the highest priority
is selected, which ensures that register pressure does not increase.
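A small sketch of the priority rule just described (illustrative only): ready
instructions whose source operands all see their last use get priority 2, the
others get priority 1, and the ready list is consulted in priority order.

#include <vector>

struct ready_insn
{
  bool kills_all_operands;   /* every value it reads dies here */
  int latency_rank;          /* tie-breaker from the usual ILP heuristic */
};

/* Priority 2 when scheduling the insn cannot increase the number of live
   values, priority 1 when it keeps a long chain of values live.  */
static int
pressure_priority (const ready_insn &insn)
{
  return insn.kills_all_operands ? 2 : 1;
}

/* Pick from the ready list: highest pressure priority first, then the
   normal latency-based ordering.  */
static const ready_insn *
pick_ready (const std::vector<ready_insn> &ready)
{
  const ready_insn *best = nullptr;
  for (const ready_insn &insn : ready)
    if (!best
        || pressure_priority (insn) > pressure_priority (*best)
        || (pressure_priority (insn) == pressure_priority (*best)
            && insn.latency_rank > best->latency_rank))
      best = &insn;
  return best;
}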

There are approaches where, if the register pressure in a basic block is
high, the instruction scheduler is run after register allocation, and
otherwise it is run before the register allocator. This is why GCC has an
instruction scheduler both before and after the register allocator.

Integrated Code Scheduling and Register Allocation with minimum register 
pressure

Integrated code scheduling uses a prepass and a postpass instruction
scheduler. The prepass scheduler increases the overuse of the registers and th

Re: Reducing Register Pressure based on Instruction Scheduling and Register Allocator!!

2014-06-06 Thread Vladimir Makarov

On 2014-06-06, 10:48 AM, Ajit Kumar Agarwal wrote:

> Hello All:
>
> I have been looking further into reducing register pressure through
> register allocation and instruction scheduling. The following observations
> are based on the existing papers on reducing register pressure through
> scheduling.
>
> Have the following approaches to reducing register pressure already been
> implemented in GCC?
>
> Minimum register usage through better Instruction scheduling
>
> Instruction scheduling plays an important role in increasing or decreasing
> register pressure. If instruction scheduling is done before register
> allocation, register pressure will increase, whereas if register allocation
> is done before instruction scheduling, false antidependences will be
> created. For an out-of-order superscalar processor, the scheduling should
> be done in such a way that register pressure decreases. Achieving ILP
> increases register pressure, but on a superscalar processor register
> pressure matters, so instruction scheduling for ILP should also take
> register pressure into account.



Scheduling for OOO is not so important.  What is important is software
pipelining.  An OOO processor can look through a few branches, but it cannot
look through many of them.  Quite the opposite, SP can look through all
iterations.  Therefore, as far as I know, the Intel compiler had no insn
scheduler at all until Atom was introduced, but it has had SP for a long time.


I recently implemented live-range shrinkage (resulting in a register
pressure decrease) on the insn-scheduler base.  It has a small effect on
x86/x86-64 (and requires a lot of additional compile time) and therefore it
is not switched on by default.  I guess one reason for that is that register
pressure before the 1st insn scheduling is already close to minimal.  That is
my observation, which could turn out to be wrong after a more thorough
investigation of a big code base.



> Govindarajan proposed a scheme where list scheduling is modified to take
> the number of available registers as a constraint, along with latency. From
> the data dependency graph of the given basic block, chains are created
> using a depth-first search, and one register is allocated to each chain.
> The idea is a baton-passing scheme: a def is used by the next instruction,
> whose def is in turn used by the instruction after that, so the def and its
> uses get one register and the same register is reused for the next def and
> use, giving one register per chain. The main challenge is to assign the
> same register to different chains if the chains don't overlap.
>
> List scheduling is modified to take a list of available colors as a
> parameter; "release" is the release of available colors. When a node of the
> DAG is removed from the ready queue, the live ranges that terminate at that
> node are released and added back as available colors. This ensures that
> register pressure is not increased by the schedule, which is derived from a
> graph-coloring register allocator.



I think it is too complicated and too compile-time consuming.  Compile time
is also a big factor for GCC.  For example, GCC has a global and a local RA.
The result would probably be better if we used a classical iterative global
RA, but it would be a disaster from a compile-time perspective (besides being
a much more complicated RA).



> Register pressure sensitivity based Instruction scheduling
>
> A superscalar OOO processor does dynamic instruction scheduling using an
> out-of-order instruction window and register renaming to achieve ILP. The
> main challenge is to compute an ILP schedule and then, from that schedule,
> generate the instruction sequence with sensitivity to registers; this step
> is called linearization. The ILP schedule is partitioned into groups:
>
> 1. If the selected instruction is s, all the instructions scheduled before
> s form one group.
>
> 2. All the instructions scheduled after s form another group.
>
> The linearization algorithm uses a can test and a must can test. The can
> test: if the selected instruction is s, then all the instructions in group
> 1 that are not yet scheduled but are in the ready list are generated within
> W-1 slots. If the selected instruction s is generated at index i, all the
> instructions scheduled before s are generated by index i+W-1, where W is
> the size of the register window. The must can test: if the selected
> instruction is generated at index i, then all the instructions scheduled
> after s are generated by index i+W-1. If the instructions scheduled after s
> are not in the window, the ILP won't be achieved.
>
> The linearization algorithm first checks the can test; if it is satisfied,
> the instruction is generated. If not, it is checked with the must can test,
> and if that is satisfied it is generated.
>
> The register pressure should

What is "fnspec function type attribute"?

2014-06-06 Thread FX
In fortran/trans-decl.c, we have a comment above the code building function 
decls, saying:

>The SPEC parameter specifies the function argument and return type
>specification according to the fnspec function type attribute.  */

I was away from GCC development for some time, so this is news to me. The 
syntax is not immediately clear, and neither a Google search nor a grep of the 
trunk’s numerous .texi files reveals any information. I’m creating new decls, 
what am I to do with it?

FX

Re: What is "fnspec function type attribute"?

2014-06-06 Thread Steve Kargl
On Fri, Jun 06, 2014 at 10:07:37PM +0200, FX wrote:
> In fortran/trans-decl.c, we have a comment above the code building function 
> decls, saying:
> 
>>The SPEC parameter specifies the function argument and return type
>>specification according to the fnspec function type attribute.  */
> 
> I was away from GCC development for some time, so this is news to me. The 
> syntax is not immediately clear, and neither a Google search nor a grep of 
> the trunk's numerous .texi files reveals any information. I'm creating new
> decls, what am I to do with it?
> 

That comment was introduced by richi.

2010-05-10  Richard Guenther  

* trans-decl.c (gfc_build_library_function_decl): Split out
worker to ...
(build_library_function_decl_1): ... this new function.
Set a fnspec attribute if a specification was provided.
(gfc_build_library_function_decl_with_spec): New function.
(gfc_build_intrinsic_function_decls): Annotate internal_pack
and internal_unpack.

You may want to ping him on IRC.

-- 
Steve


Re: What is "fnspec function type attribute"?

2014-06-06 Thread Marc Glisse

On Fri, 6 Jun 2014, FX wrote:


> In fortran/trans-decl.c, we have a comment above the code building function
> decls, saying:
>
>    The SPEC parameter specifies the function argument and return type
>    specification according to the fnspec function type attribute.  */
>
> I was away from GCC development for some time, so this is news to me. The
> syntax is not immediately clear, and neither a Google search nor a grep of
> the trunk's numerous .texi files reveals any information. I'm creating new
> decls, what am I to do with it?


You can look at the 2 functions in gimple.c that use gimple_call_fnspec, 
and refer to tree-core.h for the meaning of EAF_*, etc. A string like 
"2x." means:
'2': the first letter is about the return, here we are returning the 
second argument

'x': the first argument is ignored
'.': not saying anything about the second argument.
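To tie this back to the question about creating new decls: a hedged sketch of
attaching such a spec in the Fortran front end, using the
gfc_build_library_function_decl_with_spec function from the ChangeLog quoted
earlier. The helper name and argument types are made up, the exact signature
should be checked against trans-decl.c, and the spec string "2x." is just the
example above reused.

  /* Hypothetical decl: a library helper taking an integer and a string and
     returning its second argument while ignoring its first.  With the
     fnspec string "2x." the middle end can then derive return/argument
     flags via gimple_call_return_flags / gimple_call_arg_flags.  */
  tree gfor_fndecl_my_helper
    = gfc_build_library_function_decl_with_spec (
        get_identifier (PREFIX ("my_helper")),   /* made-up name */
        "2x.",                                   /* the fnspec string */
        pchar_type_node,                         /* return type */
        2, gfc_int4_type_node, pchar_type_node); /* two arguments */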

--
Marc Glisse