On Wed, Jun 11, 2014 at 10:51 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, Jun 10, 2014 at 1:57 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Tue, Jun 10, 2014 at 1:06 PM, Prathamesh Kulkarni >> <bilbotheelffri...@gmail.com> wrote: >>> On Fri, Jun 6, 2014 at 7:18 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Fri, Jun 6, 2014 at 12:02 PM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Fri, Jun 6, 2014 at 11:02 AM, Prathamesh Kulkarni >>>>> <bilbotheelffri...@gmail.com> wrote: >>>>>> On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener >>>>>> <richard.guent...@gmail.com> wrote: >>>>>>> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni >>>>>>> <bilbotheelffri...@gmail.com> wrote: >>>>>>>> I have few questions regarding genmatch: >>>>>>>> >>>>>>>> a) Why is 4 hard-coded here: ? >>>>>>>> in write_nary_simplifiers: >>>>>>>> fprintf (f, " tree captures[4] = {};\n"); >>>>>>> >>>>>>> Magic number (this must be big enough for all cases ...). Honestly >>>>>>> this should be improved (but requires another scan over the matcher IL >>>>>>> to figure out the max N used in @N). >>>>>>> >>>>>>>> b) Should we add syntax for a symbol to denote multiple operators ? >>>>>>>> For exampleim in simplify_rotate: >>>>>>>> (X << CNT1) OP (X >> CNT2) with OP being +, |, ^ (CNT1 + CNT2 == >>>>>>>> bitsize of type of X). >>>>>>> >>>>>>> Something to enhance the IL with, yes. I'd say we support >>>>>>> >>>>>>> (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR) >>>>>>> >>>>>>> thus, >>>>>>> >>>>>>> (define_op <operator-name> op...) >>>>>>> >>>>>>>> c) Remove for parsing capture in parse_expr since we reject outermost >>>>>>>> captured expressions ? >>>>>>> >>>>>>> but parse_expr is also used for inner expressions, no? >>>>>>> >>>>>>> (plus (minus@2 @0 @1) @3) >>>>>>> >>>>>>> should still work >>>>>>> >>>>>>>> d) I am not able to follow this comment in match.pd: >>>>>>>> /* Patterns required to avoid SCCVN testsuite regressions. */ >>>>>>>> >>>>>>>> /* (x >> 31) & 1 -> (x >> 31). Folding in fold-const is more >>>>>>>> complicated here, it does >>>>>>>> Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1)) >>>>>>>> (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1)) >>>>>>>> if the new mask might be further optimized. */ >>>>>>>> (match_and_simplify >>>>>>>> (bit_and (rshift@0 @1 INTEGER_CST_P@2) integer_onep) >>>>>>>> if (compare_tree_int (@2, TYPE_PRECISION (TREE_TYPE (@1)) - 1) == 0) >>>>>>>> @0) >>>>>>> >>>>>>> The comment is literally copied from the case I extracted the >>>>>>> (simplified) variant from fold-const.c. See lines 11961-12056 in >>>>>>> fold-const.c. >>>>>>> It'll be a challenge to implement the equivalent in a pattern ;) >>>>>>> >>>>>>>> >>>>>>>> Decision Tree. >>>>>>>> I have tried to come up with a prototype for decision tree (patch >>>>>>>> attached). >>>>>>>> For simplicity, it handles patterns involving only unary operators and >>>>>>>> no predicates >>>>>>>> and returns false when the pattern fails to match (no goto to match >>>>>>>> another pattern). >>>>>>>> I meant to post it only for illustration, and I have not really paid >>>>>>>> attention to code quality (bad formatting, memory leaks, etc.). >>>>>>>> >>>>>>>> * Basic Idea >>>>>>>> A pattern consists of following parts: match, ifexpr and result. >>>>>>>> Let's call <ifexpr, result> as "simplification" operand. >>>>>>>> The common prefix between different match operands would be represented >>>>>>>> by same nodes in the decision tree. >>>>>>>> >>>>>>>> Example: >>>>>>>> (negate (bit_not @0)) >>>>>>>> S1 >>>>>>>> >>>>>>>> (negate (negate @0)) >>>>>>>> S2 >>>>>>>> >>>>>>>> S1, S2 denote simplifications for the above patterns respectively. >>>>>>>> >>>>>>>> The decision tree would look something like >>>>>>>> (that's the way it gets constructed with the patch): >>>>>>>> >>>>>>>> dummy/root >>>>>>>> | >>>>>>>> NEGATE_EXPR >>>>>>>> / \ >>>>>>>> BIT_NOT NEGATE_EXPR >>>>>>>> | | >>>>>>>> @0 @0 >>>>>>>> | | >>>>>>>> S1 S2 >>>>>>>> >>>>>>>> a) The children of an internal node are number of decisions that >>>>>>>> can be taken at that node. In the above case it's 2 for outer >>>>>>>> NEGATE_EXPR. >>>>>>>> b) Simplification operand represents leaves of the decision tree >>>>>>>> c) Instead of having list of heads, I have added one dummy node, >>>>>>>> and heads become children of these dummy root node. >>>>>>>> d) Code-gen for non-simplification operands involves generating, >>>>>>>> "matching" code and for simplification operands involves generating >>>>>>>> "transform" code >>>>>>>> >>>>>>>> * Overall Flow: >>>>>>>> I guess we would build the decision tree from the AST. >>>>>>>> So the flow would be like: >>>>>>>> source -> struct simplify (match, ifexpr, result) -> decision tree -> >>>>>>>> c code. >>>>>>>> >>>>>>>> Something like (in main): >>>>>>>> decision_tree dt; >>>>>>>> while (there is another pattern) >>>>>>>> { >>>>>>>> simplify *s = parse_match_and_simplify (); >>>>>>>> insert s into decision tree; >>>>>>>> }; >>>>>>>> So parsing routines are concerned with parsing and building the AST >>>>>>>> (operand), >>>>>>>> and not with the decision tree. Is that fine ? >>>>>>> >>>>>>> Yes, that's good. >>>>>>> >>>>>>>> * Representation of decision tree. >>>>>>>> A decision tree would need a way to represent language constructs >>>>>>>> like capture, predicate, etc. so in some ways it would be similar to >>>>>>>> AST. >>>>>>>> It >>>>>>>> In the patch, I have created the following heirarchy: >>>>>>>> dt_operand: represents a general base "operand" of in decision tree >>>>>>>> dt_expr: for representing expression. Expression contains operation >>>>>>>> to be performed (e_operation). >>>>>>>> dt_capture: analogous to capture. >>>>>>>> dt_simplify: for representing "simplification" operand. >>>>>>>> simplification consists of ifexpr and result >>>>>>>> dt_head: to represent "dummy" root. Maybe a separate class is not >>>>>>>> needed. >>>>>>>> >>>>>>>> * Constructing decision tree from AST >>>>>>>> The algorithm i have used is similar to inserting string in a trie >>>>>>>> outlined here: http://en.wikipedia.org/wiki/Trie >>>>>>>> The difference shall be to traverse AST depth-first rather than >>>>>>>> traversing the string. >>>>>>>> Apart from that I guess it would be the same (naturally find and >>>>>>>> compare operations would be different). >>>>>>>> I haven't given much thought about this. Currently, I construct >>>>>>>> decision tree for only patterns with unary operators in an >>>>>>>> ugly way by "flattening" the AST by walking it in pre-order and >>>>>>>> storing the nodes in vector in the order they are visited >>>>>>>> >>>>>>>> So to construct decision tree from the following AST: >>>>>>>> negate >>>>>>>> | >>>>>>>> bit_not >>>>>>>> | >>>>>>>> @0 >>>>>>>> >>>>>>>> AST is flattened by walking in preorder (by walk_operand_preorder) and >>>>>>>> stored >>>>>>>> as: [ expr, expr, capture ] >>>>>>>> and this vector is used to construct decision tree. >>>>>>>> I did it as a "quick and dirty" way, it has to be changed. >>>>>>>> And it won't work for patterns with n-ary operators. >>>>>>>> >>>>>>>> We should visit each node of AST in preorder, and add that node >>>>>>>> during traversal to decision tree. I am not yet clear on way of doing >>>>>>>> that. >>>>>>>> >>>>>>>> * Comparing operands >>>>>>>> How do we compare operands ? We shall need to do this while inserting >>>>>>>> in decision tree, since if the operand is already inserted we do not >>>>>>>> create >>>>>>>> a new node. >>>>>>>> In the patch (cmp_operand), I have used following rules: >>>>>>>> a) if types not equal, they are clearly not equal. >>>>>>>> b) if type is expr, compare operation->op->id >>>>>>>> c) if type is capture, compare the number. >>>>>>>> d) for predicate, compare on ident ? >>>>>>>> >>>>>>>> * Representing patterns with n-ary operators. >>>>>>>> Consider following two match operands: >>>>>>>> (plus (minus @0 @1) @1) >>>>>>>> (plus (negate @0) @1) >>>>>>>> >>>>>>>> In decision tree, it would be represented as: >>>>>>>> >>>>>>>> *********plus********* >>>>>>>> / \ / \ >>>>>>>> minus @1 negate @1 >>>>>>>> / \ | >>>>>>>> @0 @1 @0 >>>>>>>> >>>>>>>> plus has got 4 children - (minus @0 @1), @1, (negate @0), @1. >>>>>>>> However (minus @0 @1) and @1 are not separate "decisions", but >>>>>>>> children of plus within the same expression. >>>>>>>> >>>>>>>> For calculating number of decisions at an expression node: >>>>>>>> number of decisions = number of children / arity of operator. >>>>>>>> in the above case, number of decision = 4 / 2 = 2 >>>>>>>> >>>>>>>> For other cases, for instance capture node (the children would be >>>>>>>> simplification operand), >>>>>>>> number of decisions = number of children. >>>>>>> >>>>>>> Hmm. I think it should be only two children from the plus, >>>>>>> as pre-order for the first pattern is for example >>>>>>> [plus minus @0 @1 @1] so you'd have >>>>>>> >>>>>>> plus >>>>>>> / \ >>>>>>> minus negate >>>>>>> | | >>>>>>> @0 @0 >>>>>>> | | >>>>>>> @1 @1 >>>>>>> | >>>>>>> @1 >>>>>>> >>>>>>> instead? >>>>>>> >>>>>>> Note that you probably have to deal with non-matching capture >>>>>>> IDs by re-writing them on-the-fly. That is, the two >>>>>>> pre-oder traversals [plus minus @1 ...] and [plus minus @0 ...] >>>>>>> should commonize with renaming the non-matching capture >>>>>>> somehow. >>>>>> Um, could you please elaborate on that (non-matching capture ?), >>>>>> I am not sure if I understood it fully, thanks. >>>>> >>>>> I'm worried about >>>>> >>>>> (plus (minus@1 @2 @3) @2) >>>>> >>>>> vs. >>>>> >>>>> (plus (minus @1 @2) @1) >>>>> >>>>> (just made-up examples, of course). Here we'd like to >>>>> commonize the path checking for (plus (minus ...) ... but >>>>> of course we can't easily process the captures at that point >>>>> as the first has two captures (it captures the minus itself while >>>>> the 2nd doesn't) and the captures that are at the same position >>>>> use different indexes. To finally match the minus we have to >>>>> verify that in one case @2 == @2 and in the other case that >>>>> @1 == @1. >>>>> >>>>> Re-numbering capture indexes in pre-order would improve things >>>>> for the case where capture positions are in the same places but >>>>> the indexes do not match. It won't handle the above case >>>>> of course. >>> Thanks, I understand it now -:) >> >> I suppose that it would be best (in the end) to not insert the >> captures into the decision tree separately but leave them >> "combined" and then only after the decision tree is fully built >> decide on re-numbering. And treat those that are required for >> the matching process differently (those that assert that two >> ops are equal). >> >> I think we can defer the issue to later. >> >>>>> >>>>>> I have attached patch for patterns with binary operators (in principle >>>>>> should work for n-ary operators, >>>>>> but have only tested upto 2), that creates decision tree for >>>>>> patterns (excluding predicates, and multiple matching). I don't intend >>>>>> to commit this (contains many hacks!). >>>>>> >>>>>> * Constructing decision tree from AST >>>>>> We can define our decision tree to store prefixes of preorder >>>>>> traversals of diffferent AST. >>>>>> Insertion happens as follows (decision_tree::insert) >>>>>> a) Get preorder traversal of AST into vector. >>>>>> b) Insert AST nodes from vector into decision tree. >>>>>> Currently, I obtain preorder traversal into a vector. We can later >>>>>> change it to compute >>>>>> next preorder successor lazily. >>>>> >>>>> Just make the code easy to understand, optimizing it should be >>>>> secondary. >>>>> >>>>>> * Code-gen >>>>>> For n-ary operators, the patch generates code as follows: >>>>>> if (code == <expr code>) >>>>>> { >>>>>> match operand 0 >>>>>> match operand 1 >>>>>> .... >>>>>> match operand n-1 >>>>>> transform >>>>>> return true; >>>>>> } >>>>>> >>>>>> * Adding parent, level, pos fields to AST (operand). >>>>>> One change (hack), I have made to code-gen, is naming of operands that >>>>>> are >>>>>> assigned to gimple_assign_rhs in code-gen of expressions. >>>>>> I am not sure if this is really needed. >>>>>> >>>>>> in AST, we have following representation for expressions: >>>>>> expr- >>>>>> / \ >>>>>> left right >>>>>> operand operand >>>>>> >>>>>> code-gen off AST (expr::gen_gimple_match) produces code like: >>>>>> { >>>>>> tree op = gimple_assign_rhs1 (def_stmt); >>>>>> // code-gen for left operand >>>>>> } >>>>>> { >>>>>> tree op = gimple_assign_rhs2 (def_stmt); >>>>>> // code-gen for right operand >>>>>> } >>>>>> >>>>>> We can do this since, in expr::gen_gimple_match, we call >>>>>> left_operand->gen_gimple_match, >>>>>> come back to expr::gen_gimple_match (after >>>>>> left_operand->gen_gimple_match() returns), and >>>>>> then call right_operand->gen_gimple_match(). >>>>>> >>>>>> However in decision tree, the expression gets represented as follows: >>>>>> expr >>>>>> | >>>>>> left-operand >>>>>> | >>>>>> right-operand >>>>>> >>>>>> dt_expr::gen_gimple calls left_operand->gen_gimple, which calls >>>>>> right_operand->gen_gimple, >>>>>> so we return to dt_expr::gen_gimple, after code is generated for the >>>>>> whole subtree of expr. >>>>>> >>>>>> So I assign operands at the start before generating code for the subtree: >>>>>> tree op00 = gimple_assign_rhs1 (def_stmt); >>>>>> tree op10 = gimple_assign_rhs2 (def_stmt); >>>>>> <code for left-operand> >>>>>> <code for right-operand> >>>>>> each of the operands know their positions, so they know which operand >>>>>> to use (get_op_name()). >>>>>> the names are generated as: >>>>>> op<position><level>, position = index of operand in it's parent's >>>>>> expr's vector (vec<operand *> ops). >>>>>> level: level of AST, at which the operand is stored. >>>>>> >>>>>> For this I added three fields to operand (unsigned pos, and unsigned >>>>>> level, operand *parent), and accordingly >>>>>> made changes to expr::append_op. In a way this is abusing the AST. >>>>>> This probably works (works with test-cases i tried), but I don't like it >>>>>> much. >>>>> >>>>> The naming is ok, but indeed it doesn't feel very clean. Somehow this >>>>> belongs to the decision tree instead... (but then you need to lookup >>>>> the decision tree operand info from code-gen which happens off the >>>>> AST...). >>>>> >>>>>> * Order of matching. >>>>>> Currently the order of matching operands is strictly 0, 1, .. n - 1 >>>>>> for n-ary operator. >>>>>> However this might not be the best choice. >>>>>> >>>>>> For example, consider following patterns: >>>>>> (operator op1 op) S1 >>>>>> (operator op2 op) S2 >>>>>> >>>>>> Both have common operator, and 1st common operand, they differ in 0th >>>>>> operand. >>>>>> With patch, the code generated is: >>>>>> if (code == <expr-code>) >>>>>> { >>>>>> match op1 >>>>>> match op >>>>>> S1 >>>>>> >>>>>> match op2 >>>>>> match op >>>>>> S2 >>>>>> } >>>>>> >>>>>> A better ordering would be to match the 1st operand and then >>>>>> respective 0th operands of both patterns: >>>>>> >>>>>> if (code == <expr-code> >>>>>> { >>>>>> match op >>>>>> match op1 >>>>>> S1 >>>>>> match op0 >>>>>> S2 >>>>>> } >>>>>> >>>>>> This complicates insertion into decision tree. >>>>> >>>>> I'd say we leave "optimizing" the decision tree to a later stage - it's a >>>>> global >>>>> optimization problem after all (I believe you can't do optimally with just >>>>> looking at the present decision tree and the AST you want to insert). >>>>> >>>>>> * hack: only one generated gimple_match_and_simplify function >>>>>> In the patch, I generate code for gimple_match_and_simplify with 3 >>>>>> operands version (op0, op1, op2) >>>>>> and the other two (unary and binary versions), call it with NULL_TREE >>>>>> for extra operands. >>>>>> I will soon change that. >>>>>> >>>>>> * Testing patterns. >>>>>> I think I have written most of the test-cases wrongly in match-2.c >>>>>> For example this test-case doesn't match >>>>>> (i haven't checked in "fix testsuite fallout" patch, so this might not >>>>>> be true >>>>>> anymore) >>>>>> >>>>>> (match_and_simplify >>>>>> plus (minus @0 @1) @1) >>>>>> @0) >>>>>> >>>>>> for this test-case: >>>>>> /* (x - y) + y -> x */ >>>>>> int f6(int x, int y) >>>>>> { >>>>>> int t1 = x - y; >>>>>> return t1 + y; >>>>>> } >>>>>> /* { dg-final { scan-tree-dump "gimple_match_and_simplified to >>>>>> \[^\n\r\]*= x_\\d\+\\(D\\)" "forwprop1" } } */ >>>>>> >>>>>> I get following output (tree-forwprop-details): >>>>>> >>>>>> ;; Function f6 (f, funcdef_no=0, decl_uid=1744, symbol_order=0) >>>>>> >>>>>> gimple_match_and_simplified to _4 = y_2(D) + t1_3; >>>>>> f (int x, int y) >>>>>> { >>>>>> int t1; >>>>>> int _4; >>>>>> >>>>>> <bb 2>: >>>>>> t1_3 = x_1(D) - y_2(D); >>>>>> _4 = x_1(D); >>>>>> return _4; >>>>>> >>>>>> } >>>>>> >>>>>> Shouldn't it show: gimple_match_and_simplified to _4 = x_1(D) instead ? >>>>> >>>>> Yeah. I suppose you hit the issue that we don't commutate patterns, >>>>> so try with a commutated pattern inserted. >>>>> >>>>>> The issue is this test-case fails in isolation, however it passes when >>>>>> it is placed with other >>>>>> test cases that PASS and have same regex in scan-tree-dump as this >>>>>> test-case: >>>>>> scan-tree-dump "gimple_match_and_simplified to \[^\n\r\]*= >>>>>> x_\\d\+\\(D\\)" >>>>>> >>>>>> Summary: >>>>>> I think we need to (at-least) deal with following problems for decision >>>>>> tree: >>>>> >>>>> I have to leave for a while now, I'll come back later and have a detailed >>>>> look at the patch to try answer the questions below. >>>> >>>> Now for the remainder. >>>> >>>>>> a) Representation and construction of decision tree from AST - >>>>>> We do this by storing prefixes of preorder traversals of AST in >>>>>> decision tree. Do we finalize on that ? >>>> >>>> Yes. >>>> >>>>>> b) Order of operand matching >>>> >>>> I think that's the same as a) - the visiting order of the AST determines >>>> the order of operand matching. >>>> >>>>>> c) Generating patterns for built-in functions (I guess this will be >>>>>> similar to expr-gen). >>>> >>>> It should really be transparent. >>>> >>>>>> d) Multiple matching patterns >>>> >>>> The decision tree phase leaves us with a sequence of >>>> >>>> if (if-expr of pattern1) >>>> transform-pattern1 >>>> else (if-expr of pattern2) >>>> transform-pattern2 >>>> ... >>>> >>>> etc. thus that part should be basically unchanged (only the match >>>> part is driven by a decision tree). >>>> >>>>>> e) Commutative ops >>>> >>>> I'd do these as a pass over the AST of each pattern, duplicating >>>> patterns. >>>> >>>> Btw, I see you mirror the AST structure with dt_{node,operand,expr,...}. >>>> It seems to me that you can simply stick an AST operand * into >>>> dt_node and be done with just having dt_node? That is, the AST >>>> operand refered to contains all necessary information already. >>>> >>>>>> >>>>>> Pattern Enhancements: >>>>>> Some of these were already discussed, I am jotting them here: >>>>>> a) Symbol for multiple operators >>>>>> b) grouping patterns by ifexpr >>>>>> c) should we have MATCH_FAILED or FAIL to explicitly denote failure in >>>>>> manual transform (c_expr) ? >>>>>> d) making operators to be commutative. >>>> >>>> I'd say d) would be really nice to have. Either with a grammar >>>> extension to say which expression should have commutation applied >>>> to (for example: (plus! (minus @1 @2) @2) would say that the >>>> operands of plus should be commutated), or with a simple heuristic. >>>> I'd say explicitly specifying is better. >>>> >>>> a) will probably be needed to avoid pattern explosion. But similar to d) >>>> I'd do it as a pre-pass on the AST, duplicating the patterns. It's >>>> also a good question how to define its semantic. The mode-iterators >>>> that the machine description supports would suggest we do instead >>>> sth like >>>> >>>> (for OP in (plus minus mult) >>>> (match_and_simplify >>>> (OP @1 integer_zerop) >>>> ....)) >>>> >>>> c) should be easy, I'm going to try sth like that - also setting >>>> temporaries >>>> from the ifexpr and re-using them in the result like >>>> >>>> (match_and_simplify >>>> (plus @1 @2) >>>> if (INTEGRAL_TYPE_P (TREE_TYPE (@2)) >>>> && find_better_ops (&@3, &@4, @1, @2)) >>>> (plus @3 @4)) >>>> >>>> which also requires us to scan the AST for the highest capture index >>>> (a good thing anyway, given my hack to statically set the size of >>>> the captures array ;)) >>>> >>>> b) I'm not sure it's worth the trouble (apart from doing less typing >>>> in match.pd). We're probably going to have GENERIC defined >>>> when creating the generic routines and GIMPLE when creating >>>> the gimple routines so we can easily disable patterns for one or >>>> the other by doing >>>> >>>> #ifdef GIMPLE >>>> (match_and_simplify >>>> ...) >>>> #endif >>>> >>>>>> Once the decision tree is done, I can try working on these. >>>>>> Till Sunday, I will attempt to have another prototype, that covers >>>>>> most of the patterns in match.pd >>>>>> (except for cond_expr, and those requiring GENERIC support), and >>>>>> accompanying correct test-case >>>>>> for each pattern. And next-week start working on a fair patch. >>>> >>>> Sounds good. >>> I have attached a patch, that fires most of the patterns from match.pd >>> except for those that include >>> non-matching capture or require GENERIC support, and added a test-case >>> match-decision-tree.c >>> (fires around 33 patterns). >>> >>> * Representation of decision tree: >>> I clubbed dt_capture, dt_predicate, dt_expr into dt_operand, and added >>> new gen_gimple_match() function in AST >>> to generates matching code. dt_operand contains operand *op as member, >>> and merely calls op->gen_gimple_match. >>> (that's the intent anyway, in the patch, i needed to put some hacks to >>> get it working). >>> In the patch, i have kept: dt_node, dt_operand (dt_node), dt_simplify >>> (dt_node), to separate matching operand >>> from simplification (ifexpr, result). >>> >>> * Constructing decision tree from AST >>> Same as previous patch, stores prefixes of preorder traversals of AST. >>> decision_tree::print() pretty-prints the decision tree. >>> >>> I hope we won't need to change much up-to this point in the patch. >>> >>> * Code-gen >>> code gen of (operator op0 op1 ..opN) simplify takes the following form: >>> if (code == <expr code>) >>> { >>> match op0 >>> match op1 >>> ... >>> match opN >>> simplify >>> return true; >>> } >>> >>> a) Scope of temporaries (the ones that are assigned to gimple_assign_rhsX): >>> for code-gen of expressions off the AST, we create a temporary: >>> >>> { >>> tree op = gimple_assign_rhsX (); >>> generate code for X'th operand >>> } >>> >>> In the decision tree, an expression can only directly access it's 1st >>> operand (since it's preorder traversal). >>> So I decided to first create all temporaries and then generate code >>> for operands: >>> >>> { >>> tree op0 = gimple_assign_rhs1 (); >>> tree op1 = gimple_assign_rhs2 (); >>> tree opX-1 = gimple_assign_rhsX (); >>> >>> code for matching 0th operand >>> code for matching 1st operand >>> ... >>> code for matching Nth operand >>> } >>> >>> Unfortunately, this creates a problem if the operand itself is an >>> expression, >>> since it's temporaries would also be named op0, op1 ... they would >>> shadow the outer op0, op1 .. temporaries. >>> So we need distinct names for each temporary (for all temps within one >>> pattern). >>> The temporary name is created by: >>> op<position of child operand><level of tree>; >>> and each child of AST knows it's position and level, so it can >>> correctly access it's temp. >>> This requires adding members operand *parent, unsigned pos, unsigned >>> level to struct operand. >>> I have done it this way in the patch, however it doesn't feel clean, I >>> will try to see if we can change this. >>> >>> b) Short-ranged gotos >>> consider: (negate @0) S1 >>> The code is generated as (with the patch): >>> >>> if (code == PLUS) >>> { >>> if (!captures [0]) >>> { >>> captures[0] = op0; >>> goto L0; >>> } >>> else if (captures[0] == op0) >>> { >>> L0: >>> // simplification code S1 >>> return true; >>> } >>> } >>> >>> Instead of: >>> if (!captures[0]) >>> captures[0] = op0; >>> else if (captures[0] != op0) >>> goto L0; >>> >>> // Simplification code S1 >>> return true; >>> >>> L0: // match next pattern >>> >>> Generating code the former way is more easier, since location of label >>> immediately follows >>> capture code (if (!captures[0]) ...). In the latter case, I probably >>> need to store label name in the sibling's node in >>> the decision tree and then retrieve that. >>> Similarly for valueize (expr::gen_valueize). >> >> But then why not generate >> >> if (!captures[0] >> || captures[0] == op0) >> { >> captures[0] = op0; >> // simplification code S1 >> return true; >> } >> >> and go without label and goto? Or did I miss something? >> >>> c) Shared captures between all patterns: >>> In the patch all patterns share the capture array tree captures[4] = {} >>> (since all match patterns get interleaved in generated code off decision >>> tree). >>> This requires extra code to be generated to make sure captures are >>> correctly restored for another pattern. > > Btw, I see that you back-track the decision tree. That is, for > > root > |--operand: MINUS_EXPR > |----operand: PLUS_EXPR > |------operand: @0 > |--------operand: @1 > |----------operand: @0 > |------------simplify > |----------operand: @1 > |------------simplify > > you do > > if (code == MINUS_EXPR) > { > tree captures[4] = {}; > if (TREE_CODE (op0) == SSA_NAME) > { > gimple def_stmt = SSA_NAME_DEF_STMT (op0); > if (is_gimple_assign (def_stmt) && gimple_assign_rhs_code > (def_stmt) == PLUS_EXPR) > { > tree op01 = gimple_assign_rhs1 (def_stmt); > if (valueize && TREE_CODE (op01) == SSA_NAME) > op01 = valueize (op01); > else > goto L0; > if (op01) > { > L0: > tree op11 = gimple_assign_rhs2 (def_stmt); > if (valueize && TREE_CODE (op11) == SSA_NAME) > op11 = valueize (op11); > else > goto L1; > if (op11) > { > L1: > tree captures_0 = captures[0]; > if (!captures[0]) > { > captures[0] = op01; > goto L2; > } > else if (captures[0] == op01) > { > L2: > tree captures_1 = captures[1]; > if (!captures[1]) > { > captures[1] = op11; > goto L3; > } > else if (captures[1] == op11) > { > L3: > tree captures_2 = captures[0]; > if (!captures[0]) > { > captures[0] = op1; > goto L4; > } > else if (captures[0] == op1) > { > L4: > res_ops[0] = captures[1]; > *res_code = TREE_CODE (res_ops[0]); > return true; > } > captures [0] = captures_2; > tree captures_3 = captures[1]; > if (!captures[1]) > { > captures[1] = op1; > goto L5; > } > else if (captures[1] == op1) > { > L5: > res_ops[0] = captures[0]; > *res_code = TREE_CODE (res_ops[0]); > return true; > } > captures [1] = captures_3; > } > captures [1] = captures_1; > } > captures [0] = captures_0; > } > } > } > } > > but if you processed all children of a decision tree node and didn't > hit one of the simplifies (which return true) then you know nothing > else will match and you can just return false. That early out seems > to be missing completely and we fall through processing all siblings > of the parents. > > But maybe I am missing something? I see that if I make capture > IDs not matching that we'd miss to detect things then (as a > "first" capture always "matches"). > > root > |--operand: MINUS_EXPR > |----operand: PLUS_EXPR > |------operand: @0 > |--------operand: @1 > |----------operand: @0 > |------------simplify > |------operand: @1 > |--------operand: @0 > |----------operand: @0 > |------------simplify > > So I suppose we really have to distinguish "first" captures from > "matching" captures, basically not treating the "first" ones as > nodes of the decision tree and only populating the capture > array once we hit the code generation part of a pattern (well, > or the if-expr).
It's probably a good idea to pre-parse the AST to classify captures and matches differently. Given for example (plus @0 (minus@0 @1 @2)) this would say (well, if we support this), that the plus has two "same" operands (on GIMPLE the "same" valueized SSA name) and that this operand is defined by a (minus @1 @2). But with the current operator ordering in the decision tree we'd meet the no-op capture first and the match in an odd position. In the RTL machine description the matches are more explicit like (plus (match @0) (minus@0 @1 @2)) so maybe we want to auto-detect those and rewrite the AST. Just distinguish between capture-def and capture-ref, where expression captures are always -def but leafs may be -refs as well. In the decision tree builder we'd "ignore" -defs and only make -refs explicit (and the -refs would need to "materialize" the -defs just like the ifexpr or transform stage). So I'd say add a flag to struct capture telling whether it's a ref and compute that via an AST traversal (that should then error for invalid refs). In the decision tree I'd keep a vector of operand * and record a name of a temporary the operand can be accessed via (in case any of the AST operands represented by the decision tree node has a capture associated). You'd still have "empty" leaf captures that would then only record the value into a temporary. Eventually you need a back-ref to the decision tree node for each capture when you insert a new AST traversal so you can decide whether a match (a capture-ref) refers to the same decision tree node or not. Richard. > There are also opportunities to optimize the generated code, but > basic correctness first I guess. > > I suppose we have to work a bit on the capture stuff. > > Btw, you can easily play with the code generator by doing inside > the gcc build directory > > obj/gcc> build/genmatch test.pd > test.c > > with a small test.pd. I used the following for the above examples: > > (match_and_simplify > (MINUS_EXPR (PLUS_EXPR @0 @1) @0) > @1) > (match_and_simplify > (MINUS_EXPR (PLUS_EXPR @1 @0) @0) > @1) > > Richard. > >>> I will change this to have capture per pattern >>> tree captures1[4] = {}; // for pattern-1 >>> tree captures2[4] = {}; >> >> Hmm, is this the matching captures issue I mentioned? Btw, I see >> you do >> >> +void >> +walk_operand_preorder(vec<operand *>& operands, operand *op) >> +{ >> + if (op->type == operand::OP_CAPTURE || op->type == >> operand::OP_PREDICATE || op->type == operand::OP_C_EXPR) >> + { >> + operands.safe_push (op); >> + return; >> + } >> >> but that leaves captured expressions as a single operand? >> >> (plus (minus@1 @2 @3) @2) >> >> would have a decision tree >> >> plus -> minus -> @2 >> >> correct? >> >>> d) Matching multiple patterns. >>> Code for patterns with same match, but different transforms is >>> generated as follows: >>> code for match operand. >>> if (if-expr of pattern-1) >>> { >>> code for result of pattern-1 >>> return true; >>> } >>> if (if-expr of pattern-2) >>> { >>> code for result of pattern-2 >>> return true; >>> } >> >> good. >> >>> ... >>> Should we emit a warning for patterns that have same match operand but >>> no if-expr and no manual transform ? >>> >>> for eg: >>> (match_and_simplify >>> (plus (minus @0 @1) @1) >>> @0 >>> >>> (match_and_simplify >>> (plus (minus @0 @1) @1) >>> @1 // just for illustration >>> >>> Since the matching is ambiguous (same match, no if-expr, no manual >>> transform). >> >> Yeah, I suppose we should. >> >>> we are left to choose between result of pattern-1 and result of pattern-2. >>> We can emit warning and choose result of pattern-1 (first-match rule >>> as in flex). >>> >>> e) Non-matching captures: >>> Haven't thought about this yet. >>> >>> * Should we add "negative" predicates that match only if the predicate >>> fails ? >>> for eg: >>> (match_and_simplify >>> trunc_mod integer_zerop@0 !integer_zerop) >>> @0) >> >> well, there could simply be a integer_not_zerop predicate. >> >>> * Testing >>> Sorry to bring this up again, but I am still not clear what regex to >>> write in scan-tree-dump. >>> >>> Suppose we have these two patterns in match.pd: >>> /* (x + y) - y -> x */ >>> (match_and_simplify >>> (minus (plus @0 @1) @1) >>> @0) >>> >>> /* (x - y) + y -> x */ >>> (match_and_simplify >>> (plus (minus @0 @1) @1) >>> @0) >>> scan-tree-dump "gimple_match_and_simplified to \[^\n\r\]*= x_\\d\+\\(D\\)" >>> >>> I have following test-cases: >>> int f1(int x, int y) >>> { >>> int t1 = x + y; >>> return t1 - y; >>> } >>> scan-tree-dump "gimple_match_and_simplified to \[^\n\r\]*= x_\\d\+\\(D\\)" >>> >>> int f2(int x, int y) >>> { >>> int t1 = x - y; >>> return t1 + y; >>> } >>> scan-tree-dump "gimple_match_and_simplified to \[^\n\r\]*= x_\\d\+\\(D\\)" >>> >>> both the test-cases have same regex. >>> Tested in isolation f1 passes (first pattern if fired) and f2 fails >>> (second pattern doesn't fire, it does after >>> adding it's commutative variant, but that's irrelevant for this case). >>> >>> However when both test-cases are put in one file both the test cases PASS. >>> I think that's because both of them have same regex: \[^\n\r\]*= >>> x_\\d\+\\(D\\) >>> so in f1 and f2's regex both match the dump for f1 function in >>> forwprop dump file: >>> "gimple_match_and_simplified to \[^\n\r\]*= x_\\d\+\\(D\\) >>> >>> As a quick hack i rewrote f1 and f2 as: >>> int f1(int x, int y) >>> { >>> int t1 = x + y; >>> int f1_val = t1 - y; >>> return f1_val; >>> } >>> scan-tree-dump "gimple_match_and_simplified to f1_val_\\d\+ = >>> x_\\d\+\\(D\\)" >>> >>> int f2(int x, int y) >>> { >>> int t1 = x - y; >>> int f2_val = t1 + y; >>> return f2_val; >>> } >>> scan-tree-dump "gimple_match_and_simplified to f2_val_\\d\+ = >>> x_\\d\+\\(D\\)" >>> so both f1 and f2's scan-tree-dump have different regexes. >>> and f2's regex does not match dump of f1's function. >>> This matches all patterns in match-decision-tree.c however this is not >>> ideal, >>> since it does not check for matching dump across newlines. >>> Could you suggest a better way ? >> >> There isn't a good better way (the others explicitely do _not_ match against >> a newline - see the ^ in the \[\] group). Well, apart from splitting >> the testcase >> into multiple files of course. >> >> Richard. >> >>> Thanks and Regards, >>> Prathamesh >>>> >>>> Thanks, >>>> Richard. >>>> >>>>>> Thanks and Regards, >>>>>> Prathamesh >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> * Code generation. >>>>>>>> Code shall be generated by walking the decision tree. >>>>>>>> The way it's constructed, there's no difference between code generation >>>>>>>> for "matching" and code generation for "transform". For >>>>>>>> non-simplificaton >>>>>>>> operands, "matching" code is generated, and for "simplification" >>>>>>>> operands, >>>>>>>> "transform" code is generated. The tree shall be walked twice, >>>>>>>> once to generate GIMPLE code and second time for GENERIC. >>>>>>>> For simplicity, I currently return false whenever there's a fail in >>>>>>>> match, >>>>>>>> instead of trying to match the next pattern. >>>>>>>> >>>>>>>> Code-gen for capture - same as capture::gen_gimple_match. >>>>>>>> >>>>>>>> Code-gen for predicate - I haven't added support for predicate in >>>>>>>> decision tree yet, but I guess that would be the same as >>>>>>>> predicate::gen_gimple_match >>>>>>>> >>>>>>>> Code-gen for expr. >>>>>>>> There are two types of code-gen for expr. >>>>>>>> The patch generates following code: >>>>>>>> Type 1 - expr is child of root node. >>>>>>>> the only code that gets generated is (in decision_tree::gen_gimple): >>>>>>>> if (code == <expr code>) >>>>>>>> { >>>>>>>> tree captures[4] = {} >>>>>>>> <generated code for children> >>>>>>>> } >>>>>>>> This is similar to generating matching code in write_nary_simplifiers. >>>>>>>> >>>>>>>> Type 2 - expr_1 is a child of another expr_0 node. >>>>>>>> The code gets generated as follows (dt_expr::gen_gimple): >>>>>>>> { >>>>>>>> gimple def_stmt = SSA_NAME_DEF_STMT (op); >>>>>>>> if (is_gimple_assign (def_stmt) >>>>>>>> && gimple_assign_rhs_code (def_stmt) == <expr_1-code>) >>>>>>>> { >>>>>>>> tree op = gimple_assign_rhs1 (def_stmt); >>>>>>>> if (valueize && TREE_CODE (op) == SSA_NAME) >>>>>>>> { >>>>>>>> op = valueize (op); >>>>>>>> if (!op) return false; >>>>>>>> } >>>>>>>> <code-gen for children of expr_1 node> >>>>>>>> } >>>>>>>> >>>>>>>> Example: >>>>>>>> (negate (negate @0)) >>>>>>>> S1 >>>>>>>> >>>>>>>> (negate (bit_not @0)) >>>>>>>> S2 >>>>>>>> >>>>>>>> decision tree: >>>>>>>> >>>>>>>> dummy/root >>>>>>>> | >>>>>>>> NEGATE_EXPR >>>>>>>> / \ >>>>>>>> BIT_NOT NEGATE_EXPR >>>>>>>> | | >>>>>>>> @0 @0 >>>>>>>> | | >>>>>>>> S1 S2 >>>>>>>> >>>>>>>> // code-gen for NEGATE_EXPR (child of root): >>>>>>>> if (code == NEGATE_EXPR) >>>>>>>> { >>>>>>>> tree captures[4] = {}; >>>>>>>> // code gen for BIT_NOT_EXPR >>>>>>>> { >>>>>>>> gimple def_stmt = SSA_NAME_DEF_STMT (op0); >>>>>>>> if (is_gimple_assign (def_stmt) >>>>>>>> && gimple_assign_rhs_code (def_stmt) == BIT_NOT_EXPR) >>>>>>>> { >>>>>>>> tree op = gimple_assign_rhs1 (def_stmt); >>>>>>>> if (valueize && TREE_CODE (op) == SSA_NAME) >>>>>>>> { >>>>>>>> op = valueize (op); >>>>>>>> if (!op) return false; >>>>>>>> } >>>>>>>> >>>>>>>> // code-gen for @0, child of BIT_NOT_EXPR >>>>>>>> if (!captures[0]) >>>>>>>> captures[0] = op; >>>>>>>> else if (captures[0] != op) >>>>>>>> return false; >>>>>>>> >>>>>>>> // code-gen for S1, child of @0 >>>>>>>> < same as code generated by .gen_gimple_transform > >>>>>>>> return true; >>>>>>>> } >>>>>>>> // code gen for inner NEGATE_EXPR >>>>>>>> { >>>>>>>> gimple def_stmt = SSA_NAME_DEF_STMT (op0); >>>>>>>> if (is_gimple_assign (def_stmt) >>>>>>>> && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR) >>>>>>>> <rest similar to the BIT_NOT case> >>>>>>>> } >>>>>>>> >>>>>>>> The following gets duplicated with the patch: >>>>>>>> { >>>>>>>> gimple_def_stmt = SSA_NAME_DEF_STMT (op0); >>>>>>>> if (TREE_CODE (op0) != SSA_NAME) >>>>>>>> return false; >>>>>>>> if (is_gimple_assign (def_stmt) >>>>>>>> && gimple_assign_rhs_code (def_stmt) == <expr-code>) >>>>>>>> ... >>>>>>>> } >>>>>>>> >>>>>>>> Improving code-gen for expr: >>>>>>>> "gimple def_stmt = ..." and "if (TREE_CODE (op0)" get duplicated, >>>>>>>> while they could be factored out, similar to this: >>>>>>>> >>>>>>>> { >>>>>>>> gimple def_stmt = SSA_NAME_DEF_STMT (op0); >>>>>>>> if (TREE_CODE (op0) != SSA_NAME) >>>>>>>> return false; >>>>>>>> if (!is_gimple_assign (def_stmt)) >>>>>>>> return false; >>>>>>>> if (gimple_assign_rhs_code (def_stmt) == BIT_NOT_EXPR) >>>>>>>> { >>>>>>>> // code-gen for BIT_NOT_EXPR subtree >>>>>>>> } >>>>>>>> else if (gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR) >>>>>>>> { >>>>>>>> // code-gen for NEGATE_EXPR subtree >>>>>>>> } >>>>>>>> >>>>>>>> For factoring "gimple def_stmt ..." and "if (TREE_CODE (op0) != >>>>>>>> SSA_NAME" >>>>>>>> we could have that generated at the parent of expr's node rather than >>>>>>>> at expr. However that would be incorrect for cases where not all >>>>>>>> children >>>>>>>> of a node are expressions: >>>>>>>> >>>>>>>> Example: >>>>>>>> // patterns only for illustration >>>>>>>> (negate (bit_not @0)) >>>>>>>> (negate @0) >>>>>>>> >>>>>>>> root >>>>>>>> | >>>>>>>> negate >>>>>>>> / \ >>>>>>>> bit_not @0 >>>>>>>> | >>>>>>>> @0 >>>>>>>> >>>>>>>> we cannot have the above code generated at negate, >>>>>>>> since it's not applicable negate's 2nd child (@0). >>>>>>>> >>>>>>>> This can be done by grouping together children that are expressions. >>>>>>>> However the patch does not do that. >>>>>>>> >>>>>>>> * Code-gen for simplification operand >>>>>>>> This involves code-gen for ifexpr and result of pattern. >>>>>>>> Calls gen_gimple_transform of ifexpr and result >>>>>>>> (dt_simplify::gen_gimple) >>>>>>>> So this is really code-gen off AST >>>>>>> >>>>>>> Right (modulo replacing captures with their replacements). >>>>>>> >>>>>>>> * Matching multiple patterns >>>>>>>> A pattern has following parts: match, ifexpr and result. >>>>>>>> If pattern fails in match operand, I guess we can safely return false ? >>>>>>>> We "club" together patterns that have same match operand, >>>>>>>> and use goto, if one of them fails in their (ifexpr/result) and then >>>>>>>> goto the >>>>>>>> (ifexpr/result) of the next operand. >>>>>>>> >>>>>>>> Example: >>>>>>>> /* x & 0 -> 0 */ >>>>>>>> (match_and_simplify >>>>>>>> (bit_and @0 @1) >>>>>>>> if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && (@1 == integer_zero_node)) >>>>>>>> { integer_zero_node; }) >>>>>>>> >>>>>>>> /* x & -1 -> x */ >>>>>>>> (match_and_simplify >>>>>>>> (bit_and @0 @1) >>>>>>>> if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >>>>>>>> && (@1 == integer_minus_one_node) >>>>>>>> @0) >>>>>>>> >>>>>>>> For both patterns match is same. >>>>>>>> >>>>>>>> Decision Tree: >>>>>>>> bit_and >>>>>>>> / \ >>>>>>>> @0 @1 >>>>>>>> | >>>>>>>> S1, S2 >>>>>>> >>>>>>> I think it's worth adding a diagnostic to genmach whenever exactly >>>>>>> same matches appear. But I'd say generally we'd support this >>>>>>> by testing the ifexpr of the next pattern. >>>>>>> >>>>>>>> S1 represents <ifexpr, result> of pattern-1, and S2 represents >>>>>>>> <ifexpr, result> >>>>>>>> of pattern-2 respectively. >>>>>>>> S1, S2 would be stored as children of @1 (the last operand of n-ary >>>>>>>> operator), >>>>>>>> in dt_operand::kids vector. >>>>>>>> >>>>>>>> The code would be generated as: >>>>>>>> >>>>>>>> matching code. >>>>>>>> if (! pattern-1 ifexpr condition) >>>>>>>> goto simplify2; // next pattern with the same "match" operand. >>>>>>>> transform code for pattern 1 >>>>>>>> >>>>>>>> simplify2: >>>>>>>> if (! pattern-2 ifexpr condition) >>>>>>>> return false; // last pattern >>>>>>>> transform code for pattern 2. >>>>>>>> >>>>>>>> If matching itself fails, that is neither of the decisions get matched, >>>>>>>> I believe we can then return false as it cannot match any other >>>>>>>> pattern ? >>>>>>>> >>>>>>>> * patterns needing hacks like cond_expr or GENERIC support >>>>>>>> I haven't given thought to this yet. I suppose we can look to handle >>>>>>>> these after adding support for GENERIC. Instead of generating GENERIC >>>>>>>> matching in >>>>>>>> gimple_match_and_simplify, could we then call >>>>>>>> generic_match_and_simplify from >>>>>>>> within gimple_match_and_simplify ? >>>>>>> >>>>>>> Yes (that's what's currently done btw). >>>>>>> >>>>>>>> * Tests >>>>>>>> The patch transformed the following patterns: >>>>>>>> >>>>>>>> (match_and_simplify >>>>>>>> (negate (bit_not @0)) >>>>>>>> if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) >>>>>>>> (plus @0 { build_int_cst (TREE_TYPE (@0)), 1); })) >>>>>>>> >>>>>>>> (match_and_simplify >>>>>>>> (negate (negate @0)) >>>>>>>> @0) >>>>>>>> >>>>>>>> I have attached test-case I tried it with (negate.c) >>>>>>>> >>>>>>>> * Conclusion >>>>>>>> Does it sound reasonable ? I am going to be re-writing the >>>>>>>> decision tree from scratch, but is the basic idea fine ? Or should we >>>>>>>> take a different approach ? >>>>>>>> >>>>>>>> Thanks and Regards, >>>>>>>> Prathamesh