I'm excited to see you interested in participating in GSoC again, Tyler. Thanks for sending in this proposal! The application deadline is on Friday (Hint to all other GSoC students who haven't submitted yet!)
The project you've picked here is very big. In fact, I would suggest it is far too large for a single student to complete over the summer. I don't doubt that you've got some MAD SKILLZ, but this is a hell of a lot of work for anybody. It's especially daunting for a student who hasn't had much prior exposure to lexing with finite automatons or parsing with LALR generators. I recommend you pick one of the two. Either create a lex-like lexer which breaks a string up into a sequence of tokens, or create an LALR parser-generator which operates on a sequence of input tokens. Doing both is way too much work and is a recipe for failure. I suggest you pick one or the other (both would be equally valuable), and focus a proposal on that. --Andrew Whitworth On Tue, Apr 5, 2011 at 1:33 AM, Tyler Curtis <[email protected]> wrote: > Here's a draft of my proposal for a LALR Parser Generator and Lexical > Analyzer Generator for Parrot. I'd welcome any feedback or questions. > > Personal Information > > Name: Tyler Curtis > Email: [email protected] > IRC: tcurtis > > LALR Parser and Lexer Generators for Parrot > > Abstract > > An LALR parser generator and a lexical analyzer generator will be > developed that will convert declarative specifications into Winxed > code to parse and lexically analyzer input. > > Benefits to the Parrot VM and Open Source Community > > Parrot currently has a very convenient to use set of tools for the > development of compilers. However, the Perl 6 grammars used by PCT > have some performance disadvantages compared to LALR(1) grammars, when > they can be expressed as LALR(1) grammars. They are also less familiar > to users of conventional parsing tools, such as lex and yacc. The > creation of more conventional parsing tools will make it easier for > compiler writers to apply their prior knowledge to targetting Parrot. > They could even potentially easily adapt portions of their existing > parsers to use the developed tools. LALR(1) parser generators also > make expressing some grammars easier. It will also be provider a more > language-agnostic option for compiler development on Parrot. > > Deliverables > > A LALR parser generator executable which translates a grammar > specification into a parser in Winxed. > A lexical analyzer generator to tokenize the input to the parser. > Some example grammars. > A test suite. > > Project Details > > LALR(Look-Ahead Left-to-Right) parsers are a variation of > LR(Left-to-Right) parsers which require much smaller parsing tables. > They merge states from an LR parsing table with identical sets of > first components. They take time linear in the size of their input to > run, which is quite fast. However, they are also very complicated. > Writing them by hand is not practical. Instead, a number of LALR > parser generators have been written, which convert a declarative > specification of an LALR grammar (possibly including semantic actions > to build a parse tree or perform other actions in the process of > parsing) into a LALR parser. > > This is a rather convenient way of writing parsers. There are LALR > parser generators producing parsers in a great number of programming > languages now. Parrot has a compiler toolkit which uses Perl 6 regexes > to express a language's grammar; however, Parsing expression grammar > (PEGs, which are analogous to Perl 6 regexes) parsers require either > more memory or more time to parse input than LALR parsers. They also > do not handle left recursive grammars as well as LALR parsers. > > My project would consist of developing a lexical analyzer generator > and parser generator that generate lexical analyzers and parsers, > respectively, as Parrot classes. The parsers will present an interface > similar to HLL::Grammar objects(with parse and parsefile methods). > Instead of embedding semantic actions into the grammar specification, > the parse/parsefile methods will accept an optional "actions" named > parameter as HLL::Grammar does. The lexer generator would create > classes with an analogous interface, also using an action parameter. > > Some sources I intend to read about the theory of LALR(1) parsing: > The Dragon Book > Parsing Techniques - A Practical Guide > "Practical LR9k) Translators" by Frank DeRemer. > "Efficient Computation of LALR(1) Look-Ahead Sets" by Frank > DeRemer > and Thomas Pennello. > "On the Translation of Languages from Left to Right" by Donald > Knuth. > > Project Schedule > > The portion of the schedule prior to June 11 is intended to be less > demanding in time because I will still be in classes until then. > > April 8 - May 22: Study the theory behind LALR(1) parsing and > deterministic finite automata (for the lexical analyzer generator). > Look at the code of existing LALR parser generators. Begin designing > the lexer and grammar specification languages. > > May 23 - May 31: Write a parser for the lexical analyzer > specifications. > > May 31 - June 11: Implement generation of a lexical analyzer from a > parsed specification. > > June 12 - June 18: Write documentation and tests for the lexical > analyzer generator. > > June 19 - June 25: Further design and research for the parser > generator, including specifying the schedule of the rest of the > summerer in more detail. > > June 26 - July 2: Implement the parser for the grammar specifications. > > July 3 - July 10: Begin work on the parser generator. > > July 11 - July 15: Work on mid-term evaluation. > > July 16 - August 6: Complete implementation of the parser generator. > > August 7 - August 22: Write more tests and documentation for the > parser generator. > > August 22: Hard "pencils down" date. > > August 22 - August 26: Work on final evaluation. > > References and Likely Mentors > > I have spoken with Whiteknight, and he is interested in mentoring. > > License > > Artistic License 2.0 > > Bio > > I am a first-year student at the University of Chicago studying > Computer Science and Mathematics. I did Google Summer of Code last > year for Parrot. Development tools, particularly compilers, parsers, > and object systems, are my primary interests in Computer Science. I do > not have much previous knowledge of the theory of LALR parsing, but, > if I am accepted, I plan to spend the remaining time before GSoC > begins studying it. > > Eligibility: I am an eligible student. > > -- > Tyler Curtis > _______________________________________________ > http://lists.parrot.org/mailman/listinfo/parrot-dev > _______________________________________________ http://lists.parrot.org/mailman/listinfo/parrot-dev
