Re: [PEG] and-predicate with slightly different behavior?

Ondřej Bílka Sat, 11 Dec 2010 01:47:21 -0800

exp <- digit '*' digit EOF -> {(lambda (x y) (* x y))}
This looks like a step backwards.
How is it different from old yacc?
exp <- digit '*' digit EOF -> {$$ = (* $1 $3)}


My approach is different. I make the tree structure explicit in the rule by
binding names to its sub-expressions, as in this example:

tree = '(' number:value tree?:left tree?:right ')' result(Tree)

result(Tree) creates a Tree object and sets its value, left, and right fields
to the corresponding bound values.
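
A minimal sketch of how such a rule could behave, written as a hand-coded
recursive-descent parser in Python. This is an illustration only, not the
implementation behind the rule above: the function names, the skipping of
spaces between tokens, and the definition of number as a run of digits are
all assumptions. The third binding is spelled "right" here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Tree:
    value: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def skip_spaces(text: str, pos: int) -> int:
    # Assumption: spaces between tokens are insignificant.
    while pos < len(text) and text[pos] == " ":
        pos += 1
    return pos


def parse_number(text: str, pos: int) -> Optional[Tuple[int, int]]:
    # Assumption: number is one or more decimal digits.
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end


def parse_tree(text: str, pos: int = 0) -> Optional[Tuple[Tree, int]]:
    """tree = '(' number:value tree?:left tree?:right ')' result(Tree)"""
    pos = skip_spaces(text, pos)
    if pos >= len(text) or text[pos] != "(":
        return None
    number = parse_number(text, skip_spaces(text, pos + 1))
    if number is None:
        return None
    # Each name:binding in the rule becomes an entry in this dict.
    bindings = {"value": number[0]}
    pos = number[1]
    for name in ("left", "right"):
        sub = parse_tree(text, pos)  # tree? : optional, never fails the rule
        if sub is None:
            bindings[name] = None
        else:
            bindings[name], pos = sub
    pos = skip_spaces(text, pos)
    if pos >= len(text) or text[pos] != ")":
        return None
    # result(Tree): build the object from the bound names.
    return Tree(**bindings), pos + 1
```

The parentheses are matched and consumed but never bound, so they leave no
trace in the resulting object; only the named parts do.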

On Fri, Dec 10, 2010 at 01:47:59PM +1300, Peter Cashin wrote:
>    Mathias:
>    A good plan to separate out the:
>    - grammar operators
>    - parse tree building
>    - parser action code
>    I agree with you, but in my case I want to have the grammar language
>    totally independent of the implementation programming language.
>    So the grammar has no parser action code, although this can be added later
>    in a particular programming language implementation. Parboiled has much
>    closer integration with the programming language, which has advantages,
>    but I really want grammar specifications that are totally independent.
>    So now for the grammar operators, and parse tree building annotations. For
>    many years I kept these separate: tree building annotations in the rule
>    head (ie they annotate the rule name) or definition syntax (= for interior
>    nodes, : for leaf nodes or terminals). The right hand side of the rule had
>    the grammar expression body with grammar operations only. One more head
>    notation allows pruning the tree.
>    This works fine, and maybe I should have stuck to that separation, but I
>    introduced the `x prefix notation because I found that it was an advantage
>    to be able to see the children nodes that would be generated by looking at
>    the parent rule alone, and not having to refer to the children rule
>    definitions to see how they are annotated.
>    I have found that about half of many grammars turn out to be leaf nodes,
>    so the ":" rule definition notation has a huge payoff in tree size and
>    performance.
>    Because of this the `x rule is not vital, but over time I found it was
>    convenient, and a good trade-off for me.
>    Cheers,
>    Peter.
> 
>    On Fri, Dec 10, 2010 at 11:44 AM, Mathias <[1][email protected]>
>    wrote:
> 
>      Over the course of developing parboiled (PEG parsing engine for Java and
>      Scala using internal DSLs for grammar specification) I found that it's
>      best to clearly separate the following three things:
>      - grammar operators
>      - parse tree building
>      - parser action code
> 
>      The operators, together with the rules they are applied to, define the
>      language your grammar recognizes. Nothing more, nothing less.
>      Whether your engine builds a parse tree for input of that language or
>      whether it doesn't should likely not be specified at the operator level,
>      but in some other way.
>      In "parboiled for Java" operators and rules are modeled as method calls
>      whereas parse tree building is controlled via annotations on those
>      methods. You can selectively enable or disable the creation of parse
>      tree nodes per grammar rule, which allows you to tweak parse tree
>      building exactly to your needs.
> 
>      The interface between the parsing engine and custom parser action code
>      is the thing that has changed the most over the course of parboiled's
>      life cycle so far.
>      Initially parser action code had to access the parse tree nodes in order
>      to get to matched input. Creating a custom object structure during the
>      parsing run (e.g. an AST) was done by decorating the parse tree. So the
>      parse tree was the "work bench" of the parsing process.
>      Over time this heavy centering around the parse tree turned out to have
>      two problems:
> 
>      1. Low Performance.
>      Always having to create a parse tree even when all you care about is the
>      AST is just wasteful. Especially since the parse tree can contain a huge
>      number of nodes for larger inputs (sometimes more nodes than input
>      characters).
> 
>      2. Less room for automatic optimizations.
>      The structure of the parse tree is dictated by the structure of the
>      grammar rules. If your parser action code is built under the assumption
>      that the parse tree has a given structure there is little leeway for the
>      parsing engine to apply automatic rule optimizations. Decoupling the
>      parser action code from the parse tree opens up the possibility to apply
>      all kinds of automatic grammar tweaking before running the parser the
>      first time. The engine might decide to completely change the rule
>      structure, as long as all changes do not change the recognized language
>      and are transparent to the parser action code.
> 
>      Currently parboiled implements the interface between the parsing engine
>      and the action code in the following way:
>      1. Actions can appear anywhere in a rule.
>      2. Actions can access the matched input text of the sub rule immediately
>      preceding the action but not of any other rule (so there is no sub rule
>      labeling required).
>      3. For working with custom objects (e.g. AST nodes) the engine provides
>      a "Value Stack", which is a simple stack structure that serves as a fast
>      work bench for a parsing run. Actions can push objects onto this stack,
>      pop them off, swap them around, and so on.
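
A rough sketch of the value-stack idea described above, in Python. It is not
parboiled's API: ValueStack, match_digit and parse_mulexp are hypothetical
names, and the rule is the mulexp example from the original question. Each
action sees only the text matched by the sub rule just before it, and the
'*' and EOF sub rules have no action, so they contribute nothing.

```python
class ValueStack:
    """A plain stack used as the work bench during a parsing run."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def swap(self):
        # Exchange the two topmost values.
        self._items[-1], self._items[-2] = self._items[-2], self._items[-1]


def match_digit(text, pos):
    """Return the matched digit text at pos, or None if there is no match."""
    if pos < len(text) and text[pos].isdigit():
        return text[pos]
    return None


def parse_mulexp(text):
    """mulexp <- digit '*' digit EOF, with actions placed inside the rule."""
    stack = ValueStack()
    pos = 0

    match = match_digit(text, pos)
    if match is None:
        return None
    pos += len(match)
    stack.push(int(match))  # action: uses the text of the preceding sub rule

    if not text.startswith("*", pos):  # no action, nothing pushed
        return None
    pos += 1

    match = match_digit(text, pos)
    if match is None:
        return None
    pos += len(match)
    stack.push(int(match))  # action

    if pos != len(text):  # EOF: no action, nothing pushed
        return None

    # Final action: combine the two values left on the stack.
    right = stack.pop()
    left = stack.pop()
    stack.push(left * right)
    return stack.pop()
```

For example, parse_mulexp("6*7") returns 42, and the final action never has
to name the '*' or EOF sub rules.
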
> 
>      This solution completely decouples the parse tree from everything else.
>      You can enable or disable parse tree building without any effect on the
>      rest of the parser. There is no need for addressing sub rules in action
>      expressions and given a somewhat efficient value stack implementation
>      the whole thing is quite fast. Additionally, in "parboiled for Scala",
>      the action code with its manipulations of the value stack can be
>      statically type-checked at compile time, which is a huge plus.
> 
>      In case you are interested in more details or broader explanations, the
>      parboiled documentation is quite complete.
> 
>      Cheers,
>      Mathias
> 
>      ---
>      [2][email protected]
>      [3]http://www.parboiled.org
>      On 09.12.2010, at 21:01, Alan Post wrote:
> 
>      > I'm working on my PEG parser, in particular the interface between
>      > the parse tree and the code one can attach to productions that
>      > are executed on a successful parse.
>      >
>      > I've arranged for the two predicate operations, & and !, to not add
>      > any output to the parse tree.  That means that the following
>      > production:
>      >
>      >  rule <- &a !b "c"
>      >
>      > Produces the same parse tree as:
>      >
>      >  rule <- "c"
>      >
>      > Internally, this means that I recognize that the sequence operator
>      > (which contains the productions '&a', '!b', and '"c"' in this
>      > example) is being called with predicates in every position but one,
>      > and rather than returning a list containing that single element,
>      > I return just the single element.
>      >
>      > As I've been doing this, I've found that I want a new operator similar
>      > to '&'.  '&' matches the production it is attached to, but it does not
>      > advance the position of the input buffer.
>      >
>      > I'd like an operator that matches the production it is attached to,
>      > advances the input buffer, but doesn't add anything to the parse
>      > tree.
>      >
>      > Here's an example:
>      >
>      >  mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))}
>      >
>      > the mulexp production is a sequence of four other rules, but only
>      > two of them are needed by the associated code.  It would be nice
>      > if I could write the code rule like it is above, rather than say
>      > this:
>      >
>      >  (lambda (x op y EOF) (* x y))
>      >
>      > Having to account for all the rules in the sequence, but really
>      > only caring about two of them.  Here is the example rewritten
>      > with '^' expressing "match the rule, advance the input, but don't
>      > modify the parse tree":
>      >
>      >  mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))}
>      >
>      > Before I go inventing syntax for this use case, will you tell me if
>      > this is already being done with other parsers?  Have any of you had
>      > this problem and already solved it, and if so, what approach did you
>      > take?
>      >
>      > -Alan
>      > --
>      > .i ko djuno fi le do sevzi
>      >
>      > _______________________________________________
>      > PEG mailing list
>      > [4][email protected]
>      > [5]https://lists.csail.mit.edu/mailman/listinfo/peg
> 
> References
> 
>    Visible links
>    1. mailto:[email protected]
>    2. mailto:[email protected]
>    3. http://www.parboiled.org/
>    4. mailto:[email protected]
>    5. https://lists.csail.mit.edu/mailman/listinfo/peg
>    6. mailto:[email protected]
>    7. https://lists.csail.mit.edu/mailman/listinfo/peg


-- 

Plasma conduit breach
