Hello. The project for which I wrote the Dylan PEG parser library
required an error recovery technique short of giving up. After
wracking my brain, I came up with the following.
The technique requires grammar attributes and a new PEG operation
which I called "skip." The skip operation on a production p is exactly
like parsing p itself, except that if the parse succeeds, the
cumulative error state is not altered. Normally, the error state
returned by parsing p1 / p2 is the "why couldn't I parse more"
information for p1 and p2 (assuming p1 fails). However, parsing p1 /
skip(p2) only includes the information for p1.
Error recovery requires three specialized productions. The first
production acts as a checkpoint. After a production that you expect
might fail, the checkpoint production ensures that the subsequent
input is what you expect it to be. If it is not what you expect it to
be, recovery is required. Recovery is performed by the second
production; it employs some heuristic to skip ahead to where
productions that follow the failing production might match, i.e. the
next known syntax boundary, and returns a semantic value understood to
mean "I skipped stuff." The third production combines the two.
For example, in this production (if production is the right term):
checked-type <= type &followers-parser / skipper-parser
"Checked-type" is the third production mentioned, "followers-parser"
is the first, and "skipper-parser" is the second.
An expression may appear in several different context and be followed
by different followers in each context. This is where the grammar
attributes come in. The expression-followers and expression-recovery
productions vary according to context. For example, a statement
production might read as follows (with 'v' signifying an inherited
attribute and '+' signifying an array concatenation):
declaration <= (assignment / type-declaration) SEMICOLON
v followers = [ SEMICOLON, EOF ]
v skipper = til-semicolon
assignment <= variable EQUAL expression OF checked-type
type-declaration <= DEFTYPE name checked-type ARRAY?
v followers = [ ARRAY ] + followers
til-semicolon <= (!SEMICOLON any-char)* SEMICOLON
Here, the context in which the type is parsed is a declaration, but
specifically, either an assignment or a type declaration. The
"followers" and "skipper" attributes contain an array of productions
or a single production, respectively. In an assignment, "followers" is
either a semicolon or the end of file, but in a type declaration,
"followers" is the word "ARRAY" or a semicolon or the end of file. In
either case, if the expected input is not present, the parser skips to
the next semicolon.
The "followers-parser" and "skipper-parser" are user-defined functions
that refer to the corresponding attribute. "Followers-parser" succeeds
if any element of "followers" matches and "skipper-parser" performs
the "skip" operation on the "skipper" production and returns a special
value signifying that something was skipped.
The main issue with this technique is specific to my PEG parser
library, which is that the library does not allow attributes to be set
for a specific part of the grammar rule. That is, it would be nice to
have something like this:
assignment <= {v followers=[EQUAL]} checked-variable EQUAL {v
followers=[OF]} checked-expression OF {v followers=followers} checked-
type
Comments?
_______________________________________________
PEG mailing list
[email protected]
https://lists.csail.mit.edu/mailman/listinfo/peg