RE: Gimplify ICE in gnat.dg/array18.adb
Hi,

The patch modifies gcc/ada/gcc-interface/trans.c this way:

   /* First, create the temporary for the return value if we need it: for a
-     variable-sized return type if there is no target or if this is slice,
-     because the gimplifier doesn't support these cases; or for a function
-     with copy-in/copy-out parameters if there is no target, because we'll
-     need to preserve the return value before copying back the parameters.
-     This must be done before we push a new binding level around the call
-     as we will pop it before copying the return value.  */
+     variable-sized return type if there is no target and this is not an
+     object declaration, or else there is a target and it is a slice or an
+     array with fixed size, as the gimplifier doesn't handle these cases;
+     otherwise for a function with copy-in/copy-out parameters if there is
+     no target, because we need to preserve the return value before copying
+     back the parameters.  This must be done before we push a binding level
+     around the call as we will pop it before copying the return value.  */
   if (function_call
       && ((TREE_CODE (TYPE_SIZE (gnu_result_type)) != INTEGER_CST
-           && (!gnu_target || TREE_CODE (gnu_target) == ARRAY_RANGE_REF))
+           && ((!gnu_target
+                && Nkind (Parent (gnat_node)) != N_Object_Declaration)
+               || (gnu_target
+                   && (TREE_CODE (gnu_target) == ARRAY_RANGE_REF
+                       || (TREE_CODE (TREE_TYPE (gnu_target)) == ARRAY_TYPE
+                           && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target)))
+                              == INTEGER_CST)
           || (!gnu_target && TYPE_CI_CO_LIST (gnu_subprog_type

So I used gdb to compare the trees 'gnu_target' and 'gnu_result_type' in my
port and in the x86_64 port.

X86_64 gnu_target:
[debug_tree dump elided: a VAR_DECL for the object at array18.adb line 7,
col 4, whose type is an ARRAY_TYPE of QImode components with constant size]

My gnu_target:
[debug_tree dump elided: a VAR_DECL for the object at array18.adb line 7,
col 4, whose type is a RECORD_TYPE containing a QImode field]

Strangely, my var_decl for 'a' is a record and not an array_type, so the
'if' condition is false (and true on X86_64).  I looked for something in my
backend that would transform an array_type into a record_type, but I did
not find anything.

X86_64 gnu_result_type:
[debug_tree dump elided: a variable-sized (BLK) array type whose TYPE_SIZE
is a computed expression rather than an INTEGER_CST]

My gnu_result_type:
[debug_tree dump elided: a similar variable-sized array type; the dump is
truncated]
Re: Gimplify ICE in gnat.dg/array18.adb
> strangely my var_decl for 'a' is a record and not an array_type so the 'if'
> condition is false (and true on X86_64) I looked for somewhere in my
> backend something that would transform an array_type into a record_type but
> I did not find anything.

The comment should clearly state the intent of the change though and how to
adjust it to your needs.

-- 
Eric Botcazou
Re: [GSoC] decision tree first steps
On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener wrote:
> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni wrote:
>> I have a few questions regarding genmatch:
>>
>> a) Why is 4 hard-coded here?
>> in write_nary_simplifiers:
>>   fprintf (f, " tree captures[4] = {};\n");
>
> Magic number (this must be big enough for all cases ...).  Honestly
> this should be improved (but requires another scan over the matcher IL
> to figure out the max N used in @N).
>
>> b) Should we add syntax for a symbol to denote multiple operators?
>> For example in simplify_rotate:
>> (X << CNT1) OP (X >> CNT2) with OP being +, |, ^ (CNT1 + CNT2 ==
>> bitsize of type of X).
>
> Something to enhance the IL with, yes.  I'd say we support
>
> (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR)
>
> thus,
>
> (define_op op...)
>
>> c) Remove for parsing capture in parse_expr since we reject outermost
>> captured expressions?
>
> But parse_expr is also used for inner expressions, no?
>
> (plus (minus@2 @0 @1) @3)
>
> should still work.
>
>> d) I am not able to follow this comment in match.pd:
>> /* Patterns required to avoid SCCVN testsuite regressions.  */
>>
>> /* (x >> 31) & 1 -> (x >> 31).  Folding in fold-const is more
>>    complicated here, it does
>>      Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
>>      (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>>    if the new mask might be further optimized.  */
>> (match_and_simplify
>>   (bit_and (rshift@0 @1 INTEGER_CST_P@2) integer_onep)
>>   if (compare_tree_int (@2, TYPE_PRECISION (TREE_TYPE (@1)) - 1) == 0)
>>   @0)
>
> The comment is literally copied from the case I extracted the
> (simplified) variant from in fold-const.c.  See lines 11961-12056 in
> fold-const.c.  It'll be a challenge to implement the equivalent in a
> pattern ;)
>
>> Decision Tree.
>> I have tried to come up with a prototype for the decision tree (patch
>> attached).
>> For simplicity, it handles patterns involving only unary operators and
>> no predicates, and returns false when the pattern fails to match (no
>> goto to match another pattern).
>> I meant to post it only for illustration, and I have not really paid
>> attention to code quality (bad formatting, memory leaks, etc.).
>>
>> * Basic Idea
>> A pattern consists of the following parts: match, ifexpr and result.
>> Let's call this the "simplification" operand.
>> The common prefix between different match operands would be represented
>> by the same nodes in the decision tree.
>>
>> Example:
>> (negate (bit_not @0))
>> S1
>>
>> (negate (negate @0))
>> S2
>>
>> S1, S2 denote the simplifications for the above patterns respectively.
>>
>> The decision tree would look something like this
>> (that's the way it gets constructed with the patch):
>>
>>          dummy/root
>>              |
>>         NEGATE_EXPR
>>         /         \
>>     BIT_NOT    NEGATE_EXPR
>>        |            |
>>       @0           @0
>>        |            |
>>       S1           S2
>>
>> a) The children of an internal node are the decisions that can be
>> taken at that node.  In the above case it's 2 for the outer NEGATE_EXPR.
>> b) Simplification operands are the leaves of the decision tree.
>> c) Instead of having a list of heads, I have added one dummy node,
>> and the heads become children of this dummy root node.
>> d) Code-gen for non-simplification operands involves generating
>> "matching" code, and for simplification operands generating
>> "transform" code.
>>
>> * Overall Flow:
>> I guess we would build the decision tree from the AST.
>> So the flow would be like:
>> source -> struct simplify (match, ifexpr, result) -> decision tree -> C code.
>>
>> Something like (in main):
>>   decision_tree dt;
>>   while (there is another pattern)
>>     {
>>       simplify *s = parse_match_and_simplify ();
>>       insert s into decision tree;
>>     };
>> So parsing routines are concerned with parsing and building the AST
>> (operand), and not with the decision tree.  Is that fine?
>
> Yes, that's good.
>
>> * Representation of decision tree.
>> A decision tree would need a way to represent language constructs
>> like capture, predicate, etc., so in some ways it would be similar to
>> the AST.
>> In the patch, I have created the following hierarchy:
>> dt_operand: represents a general base "operand" in the decision tree
>> dt_expr: for representing an expression.  An expression contains the
>> operation to be performed (e_operation).
>> dt_capture: analogous to capture.
>> dt_simplify: for representing the "simplification" operand.  A
>> simplification consists of ifexpr and result.
>> dt_head: to represent the "dummy" root.  Maybe a separate class is not
>> needed.
>>
>> * Constructing decision tree from AST
>> The algorithm I have used is similar to inserting a string into a trie,
>> outlined here: http://en.wikipedia.org/wiki/Trie
>> The difference shall be to traverse the AST depth-first rath
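The trie-style insertion sketched above is easiest to see in code.  Below is
a minimal, self-contained C++ sketch of the idea: each pattern AST is walked
depth-first and inserted into a decision tree whose nodes are shared across
patterns with a common prefix.  All class and function names here (ast_node,
dt_node, insert_pattern) are hypothetical illustrations and do not match the
names used in genmatch or in the attached patch.

  #include <iostream>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  struct ast_node {
    std::string op;                                  // e.g. "NEGATE_EXPR" or "@0"
    std::vector<std::unique_ptr<ast_node>> kids;
  };

  struct dt_node {
    std::string label;                               // operator, capture or simplification id
    std::vector<std::unique_ptr<dt_node>> kids;

    /* Return the child labelled L, creating it if it does not exist yet;
       this is what makes common prefixes share nodes.  */
    dt_node *get_or_add (const std::string &l)
    {
      for (auto &k : kids)
        if (k->label == l)
          return k.get ();
      kids.push_back (std::unique_ptr<dt_node> (new dt_node {l, {}}));
      return kids.back ().get ();
    }
  };

  /* Flatten a pattern AST depth-first (pre-order), like spelling out the
     string to be inserted into a trie.  */
  static void
  preorder (const ast_node &n, std::vector<std::string> &out)
  {
    out.push_back (n.op);
    for (const auto &k : n.kids)
      preorder (*k, out);
  }

  /* Insert one pattern; the leaf records which simplification fires.  */
  static void
  insert_pattern (dt_node *root, const ast_node &pattern,
                  const std::string &simplify_id)
  {
    std::vector<std::string> seq;
    preorder (pattern, seq);
    dt_node *cur = root;
    for (const auto &l : seq)
      cur = cur->get_or_add (l);
    cur->get_or_add (simplify_id);
  }

  static void
  dump (const dt_node *n, int indent)
  {
    std::cout << std::string (indent, ' ') << n->label << "\n";
    for (const auto &k : n->kids)
      dump (k.get (), indent + 2);
  }

  /* Convenience builder for unary chains such as (negate (bit_not @0)).  */
  static std::unique_ptr<ast_node>
  mk (std::string op, std::unique_ptr<ast_node> kid = nullptr)
  {
    std::unique_ptr<ast_node> n (new ast_node {std::move (op), {}});
    if (kid)
      n->kids.push_back (std::move (kid));
    return n;
  }

  int
  main ()
  {
    dt_node root {"root", {}};
    insert_pattern (&root, *mk ("NEGATE_EXPR", mk ("BIT_NOT_EXPR", mk ("@0"))), "S1");
    insert_pattern (&root, *mk ("NEGATE_EXPR", mk ("NEGATE_EXPR", mk ("@0"))), "S2");
    dump (&root, 0);   /* the outer NEGATE_EXPR node is shared, as in the diagram */
    return 0;
  }

Running this prints a tree with a single shared NEGATE_EXPR child under the
root and two branches below it ending in S1 and S2, mirroring the ASCII
diagram in the message above.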
RE: Gimplify ICE in gnat.dg/array18.adb
The intent of the patch change is clear.  The strange thing concerns the
variable "A", declared this way:

   A : String (1 .. 1);

used this way:

   A := F;

and displayed in the tree (gnu_target) as a RECORD_TYPE instead of an
ARRAY_TYPE.  The test that decides whether a temporary for the return value
is needed only checks for ARRAY_RANGE_REF and ARRAY_TYPE, not RECORD_TYPE:

   TREE_CODE (gnu_target) == ARRAY_RANGE_REF
   || (TREE_CODE (TREE_TYPE (gnu_target)) == ARRAY_TYPE
       && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target))) == INTEGER_CST)

I don't know why in my case fixed Strings are RECORD_TYPE and not
ARRAY_TYPE.  Maybe it's normal and I should extend the if condition with
RECORD_TYPE:

   TREE_CODE (gnu_target) == ARRAY_RANGE_REF
   || (TREE_CODE (TREE_TYPE (gnu_target)) == ARRAY_TYPE
       && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target))) == INTEGER_CST)
   || (TREE_CODE (TREE_TYPE (gnu_target)) == RECORD_TYPE
       && TREE_CODE (TYPE_SIZE (TREE_TYPE (gnu_target))) == INTEGER_CST)

-----Original Message-----
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Eric Botcazou
Sent: Friday, June 6, 2014 10:20
To: BELBACHIR Selim
Cc: gcc@gcc.gnu.org
Subject: Re: Gimplify ICE in gnat.dg/array18.adb

> strangely my var_decl for 'a' is a record and not an array_type so the 'if'
> condition is false (and true on X86_64) I looked for somewhere in
> my backend something that would transform an array_type into a
> record_type but I did not find anything.

The comment should clearly state the intent of the change though and how to
adjust it to your needs.

-- 
Eric Botcazou
Re: [GSoC] decision tree first steps
On Fri, Jun 6, 2014 at 11:02 AM, Prathamesh Kulkarni wrote:
> On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener wrote:
>> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni wrote:

[quoted text identical to the exchange above; truncated before any new text]
Re: [GSoC] decision tree first steps
On Fri, Jun 6, 2014 at 12:02 PM, Richard Biener wrote:
> On Fri, Jun 6, 2014 at 11:02 AM, Prathamesh Kulkarni wrote:
>> On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener wrote:
>>> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni wrote:

[quoted text identical to the exchange above; truncated before any new text]
RE: gnat.dg test: div_no_warning.adb
Hi,

I've noticed that the Constraint_Error warning produced by
gcc/testsuite/gnat.dg/div_no_warning.adb disappears if the target runtime
contains:

   Configurable_Run_Time : constant Boolean := False;

* x86 native gnat contains Configurable_Run_Time := False  ==> no warning
* Cross gnat port without full runtime contains Configurable_Run_Time := True  ==> warning raised

When I read the description of "Configurable_Run_Time_On_Target : Boolean := False;"
in gcc/ada/targparm.ads (which may not be complete), I don't see any
relation with this particular Constraint_Error being raised.

Can someone explain what the full impact of the Configurable_Run_Time
boolean in system.ads is, and what the relation is between it and the
"constant folding for short-circuit control forms" tested by
gcc/testsuite/gnat.dg/div_no_warning.adb?

Regards,

Selim
Reducing Register Pressure based on Instruction Scheduling and Register Allocator!!
Hello All:

I was looking further into reducing register pressure through register
allocation and instruction scheduling, and the following observations are
based on existing papers on reducing register pressure via scheduling.
Have the following approaches to reducing register pressure already been
implemented in GCC?

Minimum register usage through better instruction scheduling

Instruction scheduling plays an important role in increasing or decreasing
register pressure.  If instruction scheduling is done before register
allocation, register pressure tends to increase, whereas if register
allocation is done before instruction scheduling, false antidependences are
created.  For an out-of-order superscalar processor the scheduling should be
done in such a way that register pressure decreases.  Achieving ILP tends to
increase register pressure, but on a superscalar processor register pressure
matters, so instruction scheduling for ILP should also take register
pressure into account.

Govindarajan proposed a scheme where list scheduling is modified to take the
available registers as one of the constraints, along with latency, when
scheduling instructions.  From the data dependence graph of a basic block,
chains are created using depth-first search.  One register is allocated for
each chain, in a baton-passing fashion: a def is used in the next
instruction, and that instruction's def is used in the instruction after it.
A given register is assigned to a def and its use, and the same register is
reused for the next def and use, so each chain needs only one register.  The
main challenge is to assign the same register to different chains when the
chains do not overlap.  List scheduling is modified to take the list of
available colors as a parameter, and a release operation returns colors to
the available set: when a node of the DAG is removed from the ready queue,
the live ranges that terminate at that node are released and added back as
available colors.  This ensures that register pressure is not increased by
the schedule, an idea derived from graph-coloring register allocation.

Register-pressure-sensitive instruction scheduling

A superscalar OOO processor does dynamic instruction scheduling, out of
order within an instruction window, with register renaming to achieve ILP.
The main challenge is to compute an ILP schedule and then, from that
schedule, generate an instruction sequence that is sensitive to register
usage; this step is called linearization.  The ILP schedule is partitioned
into groups:

1. If the selected instruction is s, all the instructions scheduled before s
form one group.
2. All the instructions scheduled after s form another group.

The linearization algorithm uses a can test and a must can test.  For the
can test: if the selected instruction is s, then all the instructions of
Group 1 that are not yet scheduled but are in the ready list are generated
with W-1; if the selected instruction s is generated at index i, all the
instructions scheduled before s are generated by index i+W-1, where W is the
size of the register window.  For the must can test: if the selected
instruction is generated at index i, then all the instructions scheduled
after s are generated by index i+W-1.  If the instructions scheduled after s
do not fit in the window, the ILP won't be achieved.

The linearization algorithm first checks the can test; if it is satisfied,
the instruction is generated.  If it is not satisfied, the instruction is
checked with the must can test, and if that is satisfied it is generated.
Register pressure should not be increased during linearization, since there
is a chance that register pressure increased during scheduling.  To achieve
this, a priority is assigned to each instruction in the ready list.  If, for
the selected instruction, all uses of its variables are already scheduled,
the priority is assigned as 2.  If an instruction depends on i and i depends
on another instruction, the live range is too long, so the priority is lower
and assigned as 1.  From the ready list the instruction with the highest
priority is selected, which ensures that register pressure does not
increase.  A sketch of this idea in code follows this message.

There are approaches where, if register pressure in a basic block is high,
instruction scheduling is phased after register allocation, and otherwise
instruction scheduling is done before the register allocator.  This is why
GCC has an instruction scheduler both before and after the register
allocator.

Integrated code scheduling and register allocation with minimum register
pressure

Integrated code scheduling has a prepass and a postpass instruction
scheduler.  The prepass scheduler increases the overuse of the registers
and th
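To make the pressure-aware scheduling idea above concrete, here is a small,
self-contained C++ sketch (not GCC code, and not the cited algorithms
verbatim): among the ready instructions it prefers the one whose operands'
live ranges end at that point, so scheduling releases registers instead of
only creating new values.  The instruction representation and all names are
invented for illustration.

  #include <algorithm>
  #include <iostream>
  #include <string>
  #include <vector>

  struct insn {
    std::string name;
    std::vector<int> uses;   // value numbers read
    int def;                 // value number written (-1 if none)
    std::vector<int> deps;   // indices of instructions that must run first
  };

  int main ()
  {
    // v0 = load a; v1 = load b; v2 = v0 + v1; v3 = load c; v4 = v2 * v3
    std::vector<insn> insns = {
      {"load a", {},     0, {}},
      {"load b", {},     1, {}},
      {"add",    {0, 1}, 2, {0, 1}},
      {"load c", {},     3, {}},
      {"mul",    {2, 3}, 4, {2, 3}},
    };

    // Count the remaining uses of each value so we know when a live range dies.
    std::vector<int> remaining_uses (5, 0);
    for (const insn &i : insns)
      for (int v : i.uses)
        ++remaining_uses[v];

    std::vector<bool> scheduled (insns.size (), false);
    int pressure = 0, max_pressure = 0;

    for (size_t step = 0; step < insns.size (); ++step)
      {
        // Ready = unscheduled instructions whose dependences are all done.
        std::vector<int> ready;
        for (size_t i = 0; i < insns.size (); ++i)
          if (!scheduled[i]
              && std::all_of (insns[i].deps.begin (), insns[i].deps.end (),
                              [&] (int d) { return scheduled[d]; }))
            ready.push_back ((int) i);

        // Pressure delta: +1 for a new def, -1 for every operand whose
        // last use this is.  Pick the ready insn with the smallest delta.
        auto delta = [&] (int i) {
          int d = insns[i].def >= 0 ? 1 : 0;
          for (int v : insns[i].uses)
            if (remaining_uses[v] == 1)
              --d;
          return d;
        };
        int best = ready[0];
        for (int i : ready)
          if (delta (i) < delta (best))
            best = i;

        pressure += delta (best);
        max_pressure = std::max (max_pressure, pressure);
        scheduled[best] = true;
        for (int v : insns[best].uses)
          --remaining_uses[v];
        std::cout << insns[best].name << "  (pressure " << pressure << ")\n";
      }
    std::cout << "max pressure: " << max_pressure << "\n";
    return 0;
  }

With this heuristic the "add" is scheduled as soon as it becomes ready,
before the third load, keeping the maximum pressure at 2 values instead of
the 3 a purely latency-driven order of the loads would produce.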
Re: Reducing Register Pressure based on Instruction Scheduling and Register Allocator!!
On 2014-06-06, 10:48 AM, Ajit Kumar Agarwal wrote:
> Hello All:
> I was looking further the aspect of reducing register pressure based on
> Register Allocation and Instruction Scheduling [...]
>
> Minimum register usage through better Instruction scheduling
>
> Instruction scheduling play an important role increase or decrease of
> register pressure. [...] To achieve the decrease of register pressure the
> instruction scheduling for ILP should also consider the register pressure.

Scheduling for OOO is not so important.  What is important is Software
Pipelining.  An OOO processor can look through a few branches but it cannot
look through many of them.  Quite the opposite, SP can look through all
iterations.  Therefore, as far as I know, the Intel compiler had no insn
scheduler at all until the introduction of Atom, but the compiler has had SP
for a long time.

I recently implemented live-range shrinkage (resulting in a register
pressure decrease) on the insn-scheduler base.  It has a small effect on
x86/x86-64 (and requires a lot of additional compile time) and therefore it
is not switched on by default.  I guess one reason for that is that register
pressure before the 1st insn scheduling is already close to minimal.  That
is my observation, which could be wrong after a more thorough investigation
of a big code base.

> Govindarajan proposed a scheme where the list scheduling is modified to
> have one of the constraint's as available register along with latency to
> perform the instruction scheduling. [...] This ensures the register
> pressure will not increase by the schedule which was derived from Graph
> coloring register allocator.

I think it is too complicated and compile-time consuming.  Compile time is
also a big factor for GCC.  For example, GCC has global and local RA.  The
result probably would be better if we used a classical iterative global RA.
But it would be a disaster from a compile-time perspective (besides an
overly complicated RA).

> Register pressure sensitivity based Instruction scheduling
> [...]
What is "fnspec function type attribute"?
In fortran/trans-decl.c, we have a comment above the code building function
decls, saying:

>  The SPEC parameter specifies the function argument and return type
>  specification according to the fnspec function type attribute.  */

I was away from GCC development for some time, so this is news to me.  The
syntax is not immediately clear, and neither a Google search nor a grep of
the trunk’s numerous .texi files reveals any information.  I’m creating new
decls, what am I to do with it?

FX
Re: What is "fnspec function type attribute"?
On Fri, Jun 06, 2014 at 10:07:37PM +0200, FX wrote:
> In fortran/trans-decl.c, we have a comment above the code building function
> decls, saying:
>
>>  The SPEC parameter specifies the function argument and return type
>>  specification according to the fnspec function type attribute.  */
>
> I was away from GCC development for some time, so this is news to me.  The
> syntax is not immediately clear, and neither a Google search nor a grep of
> the trunk's numerous .texi files reveals any information.  I'm creating new
> decls, what am I to do with it?

That comment was introduced by richi.

2010-05-10  Richard Guenther

	* trans-decl.c (gfc_build_library_function_decl): Split out
	worker to ...
	(build_library_function_decl_1): ... this new function.
	Set a fnspec attribute if a specification was provided.
	(gfc_build_library_function_decl_with_spec): New function.
	(gfc_build_intrinsic_function_decls): Annotate internal_pack
	and internal_unpack.

You may want to ping him on IRC.

-- 
Steve
Re: What is "fnspec function type attribute"?
On Fri, 6 Jun 2014, FX wrote:

> In fortran/trans-decl.c, we have a comment above the code building function
> decls, saying:
>
>  The SPEC parameter specifies the function argument and return type
>  specification according to the fnspec function type attribute.  */
>
> I was away from GCC development for some time, so this is news to me.  The
> syntax is not immediately clear, and neither a Google search nor a grep of
> the trunk’s numerous .texi files reveals any information.  I’m creating new
> decls, what am I to do with it?

You can look at the 2 functions in gimple.c that use gimple_call_fnspec, and
refer to tree-core.h for the meaning of EAF_*, etc.

A string like "2x." means:
'2': the first letter is about the return value; here we are returning the
     second argument.
'x': the first argument is ignored.
'.': not saying anything about the second argument.

-- 
Marc Glisse
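As an illustration of the string format Marc describes (and only of the
three letters he mentions), here is a small, self-contained C++ sketch that
decodes such a string and prints its meaning.  It is not GCC's
gimple_call_fnspec machinery, and real fnspec strings support more letters
than are handled here.

  #include <cctype>
  #include <iostream>
  #include <string>

  /* Decode a fnspec-style string of the shape described above: the first
     character describes the return value, each following character
     describes one argument (1-based, so spec[1] is the first argument).  */
  static void
  explain_fnspec (const std::string &spec)
  {
    if (spec.empty ())
      return;

    char ret = spec[0];
    if (std::isdigit ((unsigned char) ret))
      std::cout << "return value: argument " << ret << " is returned\n";
    else
      std::cout << "return value: '" << ret << "' (meaning not covered here)\n";

    for (size_t i = 1; i < spec.size (); ++i)
      {
        std::cout << "argument " << i << ": ";
        switch (spec[i])
          {
          case 'x':
            std::cout << "ignored\n";
            break;
          case '.':
            std::cout << "nothing is claimed about it\n";
            break;
          default:
            std::cout << "'" << spec[i] << "' (meaning not covered here)\n";
            break;
          }
      }
  }

  int
  main ()
  {
    explain_fnspec ("2x.");   // the example string from the message above
    return 0;
  }

For "2x." this prints that the second argument is returned, the first
argument is ignored, and nothing is claimed about the second argument,
matching the reading given above.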