Currently the spec, as described in sweet.g, expects some form of
"preprocessor".
Perhaps we can actually concretely define such a preprocessor for the
core parser?
Here's my proposal:
  (preprocessor
    port
    neoteric-read ; so we can use Scheme read while experimenting with it
    (lambda (get-token)
      (let ((token (get-token)))
        (do-whatever-with token))))
The token returned by (get-token) is one of the following forms:
INITIAL_INDENT_WITH_BANG
INITIAL_INDENT_NO_BANG
INDENT
DEDENT
BADDENT
SUBLIST
GROUP_SPLICE
RESTART_BEGIN
RESTART_END
EOF
hspace
comment_eol
scomment
(n-expr ,<datum>)
The logic of preprocessor's get-token function is this:
We keep track of a stack of indentations (indent-stack). We also keep
track of whether we just recently consumed a newline (line-start), and
a count of pending dedents (pending-dedents). Initially,
indent-stack is '(), line-start is #t, and pending-dedents is 0.
get-token promises to only use peek-char and read-char (i.e. one
character lookahead).
If pending-dedents is non-zero, decrement it and return 'DEDENT.
If the next character is an eof-object, check if indent-stack is '(). If
it is, return 'EOF. Otherwise, count the number of items in the
indent-stack, set pending-dedents to that length minus 1, clear the
indent-stack (so that a later call at EOF can return 'EOF), and return
'DEDENT.
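A rough sketch of the pending-dedents and EOF rules, in Python for easy experimentation. The class and method names are made up for illustration, and it assumes indent-stack is emptied at the moment the dedents are queued, so that EOF is eventually returned:

```python
class DedentState:
    """Hypothetical holder for indent-stack and pending-dedents."""

    def __init__(self, indent_stack):
        self.indent_stack = list(indent_stack)
        self.pending_dedents = 0

    def token_at_eof(self):
        """Token that get-token would return when the port is at EOF."""
        # Queued dedents are always emitted first.
        if self.pending_dedents > 0:
            self.pending_dedents -= 1
            return "DEDENT"
        if not self.indent_stack:
            return "EOF"
        # Close every open level: one DEDENT now, the rest queued.
        self.pending_dedents = len(self.indent_stack) - 1
        self.indent_stack.clear()  # assumption: the stack is emptied here
        return "DEDENT"
```

With three levels on the stack, repeated calls give three DEDENT tokens and then EOF.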
When a ; or newline is found, consume until newline, set line-start to
#t, and return 'comment_eol.
If at line-start, and (not (null? indent-stack)), clear line-start to
#f, consume indent characters (space, tab, !) and then:
- if first non-indent character is ";" or newline, consume until
newline, set line-start to #t, and return 'comment_eol.
- otherwise, update the indent-stack as needed:
- - If the current indent is incompatible with the top-most indent,
return 'BADDENT.
- - If the current indent is greater than the top-most indent, push it
on the indent-stack and return 'INDENT.
- - If the current indent is the same as the top-most indent, recurse
into (get-token) [or, if the BNF uses SAME, return 'SAME].
- - If the current indent is less than the top-most indent, pop off
indent-stack items (counting the number of pop-offs) until the stack
top is equal to or less than the current indent. If the stack top is
less, we got a bad indent and return 'BADDENT; if the stack top is
equal, record the number of pop-offs minus 1 into pending-dedents and
return 'DEDENT. An empty indent-stack is equivalent to "" for this
handling.
(the expectation is that BADDENT will always be an error)
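The indent-stack update above could be sketched roughly as follows (Python, for easy experimentation). The function name is hypothetical, and two things are assumptions on my part: indents are kept as the literal strings of indent characters, and "compatible" means one indent is a prefix of the other, with "greater" meaning strictly longer:

```python
def update_indent(indent_stack, current):
    """Return (token, pending_dedents); mutates indent_stack in place.

    token is None for the "same indent" case, where get-token would
    simply recurse (or return 'SAME if the BNF uses it).
    """
    top = indent_stack[-1] if indent_stack else ""  # empty stack acts as ""
    # Assumption: compatible means one indent is a prefix of the other.
    if not (current.startswith(top) or top.startswith(current)):
        return "BADDENT", 0
    if len(current) > len(top):
        indent_stack.append(current)
        return "INDENT", 0
    if current == top:
        return None, 0
    # current is shorter: pop until the top is no longer than current.
    pops = 0
    while indent_stack and len(indent_stack[-1]) > len(current):
        indent_stack.pop()
        pops += 1
    top = indent_stack[-1] if indent_stack else ""
    if top != current:
        return "BADDENT", 0  # we skipped past the level we wanted
    return "DEDENT", pops - 1
```

The second element of the result is what would be stored into pending-dedents.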
If at line-start, indent-stack is empty, and the first character is an
indent character (space, tab, !), clear line-start to #f and consume
indent characters. This is the "initial-indent" case - there is no
indent-stack yet - so return 'INITIAL_INDENT_WITH_BANG or
'INITIAL_INDENT_NO_BANG as appropriate.
(the expectation is that INITIAL_INDENT_* will stop token processing,
i.e. get-token will not be called any more; in the
INITIAL_INDENT_NO_BANG case it's expected that the caller will use the
ordinary Scheme read on the port)
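For the initial-indent case, something like the following sketch may help (Python, with io.StringIO standing in for the port). peek_char and read_initial_indent are hypothetical names; the point is only that one character of lookahead is enough:

```python
import io

INDENT_CHARS = " \t!"

def peek_char(port):
    """One-character lookahead; returns "" at end of input."""
    pos = port.tell()
    ch = port.read(1)
    port.seek(pos)
    return ch

def read_initial_indent(port):
    """Consume leading indent characters and classify the indent."""
    saw_bang = False
    # The explicit "" check matters: "" in INDENT_CHARS is True in Python.
    while peek_char(port) != "" and peek_char(port) in INDENT_CHARS:
        if port.read(1) == "!":
            saw_bang = True
    if saw_bang:
        return "INITIAL_INDENT_WITH_BANG"
    return "INITIAL_INDENT_NO_BANG"
```

After the call, the port is positioned at the first non-indent character, so the caller can hand it straight to an ordinary read.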
If the character is a horizontal space, consume it and return 'hspace.
If the character is a "{" or "(" or "[", then return `(n-expr
,(neoteric-read port))
[TODO: #-handling.]
Otherwise, call neoteric-read. If it returns $, return 'SUBLIST; if it
returns \\, return 'GROUP_SPLICE. For <* and *>, we may need to have an
indent-stack-stack, and additional state for the extra tokens that
RESTART_END requires. If it's not one of the special symbols, return
`(n-expr ,<datum>).
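The mapping from neoteric-read's result to a token might look like this sketch (Python). Symbol is a made-up stand-in for Scheme symbols, so that the symbol $ and the string "$" stay distinct. <* and *> are deliberately left out of the table, since their handling needs extra state that is still open:

```python
class Symbol(str):
    """Hypothetical stand-in for a Scheme symbol."""

# Only the two symbols whose handling is settled; <* and *> are omitted.
SPECIAL_TOKENS = {
    "$": "SUBLIST",
    "\\\\": "GROUP_SPLICE",  # the two-character symbol \\
}

def classify_datum(datum):
    """Map a datum from the reader to a preprocessor token."""
    if isinstance(datum, Symbol) and datum in SPECIAL_TOKENS:
        return SPECIAL_TOKENS[datum]
    # Everything else is wrapped, mirroring `(n-expr ,<datum>).
    return ("n-expr", datum)
```

Note that a string datum "$" is wrapped as an n-expr rather than treated as SUBLIST.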
--
Assumptions:
1. neoteric-read will not consume any whitespace or newlines after
it. In particular, if neoteric-read is given "foo bar", it will
return 'foo and leave the port at " bar", including the space before
bar.
2. BADDENT and INITIAL_INDENT_* will not cause get-token to get called again.
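To make assumption 1 concrete, here is a toy reader (Python) that only handles bare symbols. It is not neoteric-read, just a hypothetical stand-in showing the port state the preprocessor relies on: the delimiter after the datum is peeked at but never consumed.

```python
import io

def peek_char(port):
    """One-character lookahead; returns "" at end of input."""
    pos = port.tell()
    ch = port.read(1)
    port.seek(pos)
    return ch

def read_bare_symbol(port):
    """Read characters up to, but not including, the next delimiter."""
    chars = []
    while True:
        ch = peek_char(port)
        if ch == "" or ch.isspace() or ch in "()[]{}":
            break  # leave the delimiter on the port
        chars.append(port.read(1))
    return "".join(chars)

port = io.StringIO("foo bar")
datum = read_bare_symbol(port)  # "foo"
rest = port.read()              # " bar", leading space still there
```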
--
What do you think?
Sincerely,
AmkG
_______________________________________________
Readable-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/readable-discuss