branch: master commit bf49fb6f5d9067ea41f502782df53770957f28e5 Author: Junpeng Qiu <qjpchm...@gmail.com> Commit: Junpeng Qiu <qjpchm...@gmail.com>
Upate README --- README.org | 258 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 258 insertions(+) diff --git a/README.org b/README.org index b4f8e2c..b439c25 100644 --- a/README.org +++ b/README.org @@ -1,3 +1,261 @@ #+TITLE: parsec.el A parser combinator library for Emacs Lisp similar to Haskell's Parsec library. + +* Overview + +This work is based on [[https://github.com/jwiegley/][John Wiegley]]'s [[https://github.com/jwiegley/emacs-pl][emacs-pl]]. The original [[https://github.com/jwiegley/emacs-pl][emacs-pl]] is awesome, +but I found following problems when I tried to use it: + +- It only contains a very limited set of combinators +- Some of its functions (combinators) have different behaviors than their + Haskell counterparts +- It can't show error messages when parsing fails + +So I decided to make a new library on top of it. This library, however, contains +most of the parser combinators in =Text.Parsec.Combinator=, which should be +enough in most use cases. Of course more combinators can be added if necessary! +Most of the parser combinators have the same behavior as their Haskell +counterparts. =parsec.el= also comes with a simple error handling mechanism so +that it can display an error message showing how the parser fails. + +So we can + +- use these parser combinators to write parsers easily from scratch in Emacs + Lisp like what we can do in Haskell +- port existing Haskell program using Parsec to its equivalent Emacs Lisp + program easily + +* Parsing Functions & Parser Combinators + + We compare the functions and macros defined in this library with their Haskell + counterparts, assuming you're already familiar with Haskell's Parsec. If you + don't have any experience with parser combinators, look at the docstrings of + these functions and macros and try them to see the results! They are really + easy to learn and use! + +** Basic Parsing Functions + These parsing functions are used as the basic building block for a parser. By + default, their return value is a string. + + | parsec.el | Haskell's Parsec | Usage | + |------------------------+------------------+-------------------------------------------------------| + | parsec-ch | char | parse a character | + | parsec-any-ch | anyChar | parse an arbitrary character | + | parsec-satisfy | satisfy | parse a character satisfying a predicate | + | parsec-newline | newline | parse '\n' | + | parsec-crlf | crlf | parse '\r\n' | + | parsec-eol | eol | parse newline or CRLF | + | parsec-eof, parsec-eob | eof | parse end of file | + | parsec-eol-or-eof | *N/A* | parse EOL or EOL | + | parsec-re | *N/A* | parse using a regular expression | + | parsec-one-of | oneOf | parse one of the characters | + | parsec-none-of | noneOf | parse any character other than the supplied ones | + | parsec-str | *N/A* | parse a string but consume input only when successful | + | parsec-string | string | parse a string and consume input for partial matches | + | parsec-num | *N/A* | parse a number | + | parsec-letter | letter | parse a letter | + | parsec-digit | digit | parse a digit | + + Note: + - =parsec-str= and =parsec-string= are different. =parsec-string= behaves the + same as =string= in Haskell, and =parsec-str= is more like combining + =string= and =try= in Haskell. + - Use the power of regular expressions provided by =parsec-re= and simplify the parser! + +** Parser Combinators + These combinators can be used to combine different parsers. + + | parsec.el | Haskell's Parsec | Usage | + |---------------------------+------------------+--------------------------------------------------------------| + | parsec-or | choice | try the parsers until one succeeds | + | parsec-try | try | try parser and consume no input when an error occurs | + | parsec-with-error-message | <?> (similar) | use the new error message when an error occurs | + | parsec-many | many | apply the parser zero or more times | + | parsec-many1 | many1 | apply the parser one or more times | + | parsec-many-till | manyTill | apply parser zero or more times until end succeeds | + | parsec-until | *N/A* | parse until end succeeds | + | parsec-not-followed-by | notFollowedBy | succeed when the parser fails | + | parsec-endby | endby | apply parser zero or more times, separated and ended by end | + | parsec-sepby | sepby | apply parser zero or more times, separated by sep | + | parsec-between | between | apply parser between open and close | + | parsec-count | count | apply parser n times | + | parsec-option | option | apply parser, if it fails, return opt | + | parsec-optional | *N/A* | apply parser zero or one time and return the result | + | parsec-optional* | optional | apply parser zero or one time and discard the result | + | parsec-optional-maybe | optionMaybe | apply parser zero or one time and return the result in Maybe | + + Note: + - =parsec-or= can also be used to replace =<|>=. + - =parsec-with-error-message= is slightly different from =<?>=. It will + replace the error message even when the input is consumed. + - By default, =parsec-many-till= behaves as Haskell's =manyTill=. However, + =parsec-many-till= and =parsec-until= can accept an optional argument to + specify which part(s) to be returned. You can use =:both= or =:end= as the + optional argument to change the default behavior. See the docstrings for + more information. + +** Parser Utilities + These utilities can be used together with parser combinators to build a + parser and ease the translation process if you're trying to port an existing + Haskell program. + + | parsec.el | Haskell's Parsec | Usage | + |----------------------------------+------------------+---------------------------------------------------------| + | parsec-and | do block | try all parsers and return the last result | + | parsec-return | do block | try all parsers and return the first result | + | parsec-ensure | *N/A* | quit the parsing when an error occurs | + | parsec-ensure-with-error-message | *N/A* | quit the parsing when an error occurs with new message | + | parsec-collect | sequence | try all parsers and collect the results into a list | + | parsec-collect* | *N/A* | try all parsers and collect non-nil results into a list | + | parsec-start | parse | entry point | + | parsec-parse | parse | entry point (same as parsec-start) | + | parsec-with-input | parse | perform parsers on input | + | parsec-from-maybe | fromMaybe | retrieve value from Maybe | + | parsec-maybe-p | *N/A* | is a Maybe value or not | + | parsec-query | *N/A* | change the parser's return value | + +** Variants that Return a String + + By default, the macros/functions that return multiple values will put the + values into a list. These macros/functions are: + - =parsec-many= + - =parsec-many1= + - =parsec-many-till= + - =parsec-until= + - =parsec-count= + - =parsec-collect= and =parsec-collect*= + + They all have a variant that returns a string by concatenating the results in + the list: + - =parsec-many-as-string= + - =parsec-many1-as-string= + - =parsec-many-till-as-string= + - =parsec-until-as-string= + - =parsec-collect-as-string= + + These variants accept the same arguments. The only difference is the return + value. In most cases I found myself using these variants instead of the + original versions that return a list. + +* Code Examples + Some very simple examples are given here. You can see many code examples in + the test files in this GitHub repo. + + The following code extract the "hello" from the comment: + #+BEGIN_SRC elisp + (parsec-with-input "/* hello */" + (parsec-string "/*") + (parsec-many-till-as-string (parsec-any-ch) + (parsec-try + (parsec-string "*/")))) + #+END_SRC + + THe equivalent Haskell program: + #+BEGIN_SRC haskell + import Text.Parsec + + main :: IO () + main = print $ parse p "" "/* hello */" + where + p = do string "/*" + manyTill anyChar (try (string "*/")) + #+END_SRC + + The following code returns the "aeiou" before "end": + #+BEGIN_SRC elisp + (parsec-with-input "if aeiou end" + (parsec-str "if ") + (parsec-return + (parsec-many-as-string (parsec-one-of ?a ?e ?i ?o ?u)) + (parsec-str " end"))) + #+END_SRC + +* Parser Examples + I translate some Haskell Parsec examples into Emacs Lisp using =parsec.el=. + You can see from these examples that it is very easy to write parsers using + =parsec.el=, and if you know haskell, you can see that basically I just + translate the Haskell into Emacs Lisp one by one because most of them are just + the same! + + You can find five examples under the =examples/= directory. + + Three of the examples are taken from the chapter [[http://book.realworldhaskell.org/read/using-parsec.html][Using Parsec]] in the book of + [[http://book.realworldhaskell.org/read/][Real World Haskell]]: + - =simple-csv-parser.el=: a simple csv parser with no support for quoted cells + - =full-csv-parser.el=: a full csv parser + - =url-str-parser.el=: parser parameters in URL + + =pjson.el= is a translation of Haskell's [[https://hackage.haskell.org/package/json-0.9.1/docs/src/Text-JSON-Parsec.html][json library using Parsec]]. + + =scheme.el= is a much simplified Scheme parser based on [[https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/][Write Yourself a + Scheme in 48 Hours]]. + + They're really simple but you can see how this library works! + +* Change the Return Values using =parsec-query= + Parsing has side-effects such as forwarding the current point. In the original + [[https://github.com/jwiegley/emacs-pl][emacs-pl]], you can specify some optional arguments to some parsing functions + (=pl-ch=, =pl-re= etc.) to change the return values. In =parsec.el=, these + functions don't have such a behavior. Instead, we provide a unified interface + =parsec-query=, which accepts any parser, and changes the return value of the + parser. + + You can speicify following arguments: + #+BEGIN_EXAMPLE + :beg --> return the point before applying the PARSER + :end --> return the point after applying the PARSER + :nil --> return nil + :groups N --> return Nth group for `parsec-re'." + #+END_EXAMPLE + + So instead of returning "b" as the result, the following code returns 2: + #+BEGIN_SRC elisp + (parsec-with-input "ab" + (parsec-ch ?a) + (parsec-query (parsec-ch ?b) :beg)) + #+END_SRC + + Returning a point means that you can also incorporate =parsec.el= with Emacs + Lisp functions that can operate on points/regions, such as =goto-char= and + =kill-region=. + +* Error Messages + + =parsec.el= implements a simple error handling mechanism. When an error + happens, it will show how the parser fails. + + For example, the following code fails: + #+BEGIN_SRC elisp + (parsec-with-input "aac" + (parsec-count 2 (parsec-ch ?a)) + (parsec-ch ?b)) + #+END_SRC + + The return value is: + #+BEGIN_SRC elisp + (parsec-error . "Found \"c\" -> Expected \"b\"") + #+END_SRC + + This also works when parser combinators fail: + #+BEGIN_SRC elisp + (parsec-with-input "a" + (parsec-or (parsec-ch ?b) + (parsec-ch ?c))) + #+END_SRC + + The return value is: + #+BEGIN_SRC elisp + (parsec-error . "None of the parsers succeeds: + Found \"a\" -> Expected \"c\" + Found \"a\" -> Expected \"b\"") + #+END_SRC + + If an error occurs, the return value is a cons cell that contains the error + message in its =cdr=. Compared to Haskell's Parsec, it's really simple, but at + least the error message could tell us some information. Yeah, not perfect but + usable. + +* Acknowledgement + - Daan Leijen for Haskell's Parsec + - [[https://github.com/jwiegley/][John Wiegley]] for [[https://github.com/jwiegley/emacs-pl][emacs-pl]]