branch: elpa/clojure-ts-mode
commit ad5af674ec109aa61d9f7aacef87aa4cc98d2b07
Author: dannyfreeman <danny@dfreeman.email>
Commit: dannyfreeman <danny@dfreeman.email>

    Start documenting the design
    
    Still some work to do. Ideally this will help potential contributors get
    started.
---
 doc/design.md | 138 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)

diff --git a/doc/design.md b/doc/design.md
new file mode 100644
index 0000000000..00790f6fcf
--- /dev/null
+++ b/doc/design.md
@@ -0,0 +1,138 @@
+# Design of clojure-ts-mode
+
+This document is still a work in progress.
+
+Clojure-ts-mode is based on the tree-sitter-clojure grammar.
+
+If you want to contribute to clojure-ts-mode, it is recommended that you familiarize yourself with how tree-sitter works.
+The official documentation is a great place to start: 
https://tree-sitter.github.io/tree-sitter/
+
+In short:
+Tree-sitter is a tool that generates parser libraries for programming 
languages, and provides an API for interacting with those parsers.
+The generated parsers can create abstract syntax trees from source code text.
+The nodes of those trees are defined by the grammar.
+Emacs can use these generated parsers to provide major modes with things like 
syntax highlighting, indentation, navigation, structural editing, and many 
other things.
+
+## Important Definitions
+
+- Parser: A dynamic library compiled from C source code that is generated by 
the tree-sitter tool. A parser reads source code for a particular language and 
produces a syntax tree.
+- Grammar: The rules that define how a parser will create the syntax tree for a language. The grammar is written in JavaScript. Tree-sitter tooling consumes the grammar as input and outputs C source, which can be compiled into a parser.
+- Syntax Tree: A tree data structure composed of syntax nodes that represents some source code text.
+- Syntax Node: A node in a syntax tree. It represents some subset of a source code text. Each node has a type, defined by the grammar used to produce it. Some common node types represent language constructs like strings, integers, and operators.
+
+## tree-sitter-clojure
+
+Clojure-ts-mode uses the tree-sitter-clojure grammar, which can be found at 
https://github.com/sogaiu/tree-sitter-clojure
+The tree-sitter-clojure grammar provides very basic, low-level nodes that try to match Clojure's very light syntax.
+
+There are nodes to represent:
+- Symbols (sym_lit)
+    - Contain (sym_ns) and (sym_name) nodes
+- Keywords (kwd_lit)
+    - Contain (kwd_ns) and (kwd_name) nodes
+- Strings (str_lit)
+- Chars (char_lit)
+- Nil (nil_lit)
+- Booleans (bool_lit)
+- Numbers (num_lit)
+- Comments (comment, dis_expr)
+    - dis_expr is the `#_` discard expression
+- Lists (list_lit)
+- Vectors (vec_lit)
+- Maps (map_lit)
+- Sets (set_lit)
+
+There are also nodes to represent metadata, which appear on `meta:` child 
fields of the nodes the metadata is defined on.
+For example, a simple vector with metadata defined on it like so
+
+```clojure
+^:has-metadata [1]
+```
+
+will produce a parse tree like so
+
+```
+(vec_lit
+  meta: (meta_lit 
+          value: (kwd_lit name: (kwd_name)))
+  value: (num_lit))
+```
+
+The best way to learn more about the tree-sitter-clojure grammar is to read the [grammar.js file from the tree-sitter-clojure repository](https://github.com/sogaiu/tree-sitter-clojure/blob/master/grammar.js "grammar.js").
+
+### Clojure Syntax, not Clojure Semantics
+
+An important observation that anyone familiar with popular tree-sitter 
grammars may have picked up on is that there are no nodes representing things 
like functions, macros, types, and other semantic concepts.
+Representing the semantics of Clojure in a tree-sitter grammar is much more difficult than it is for traditional languages, which do not use macros as heavily as Clojure and other lisps do.
+Understanding what an expression represents in Clojure source code requires macro-expansion of that code.
+Macro-expansion requires a runtime, and tree-sitter does not, and never will, have access to a Clojure runtime.
+Additionally, tree-sitter never looks back on what it has parsed, only forward, considering what is directly ahead of it. So even if it could identify a macro like `myspecialdef`, it would forget about it as soon as it moved past the declaring `defmacro` node.
+Another way to think about this: tree-sitter is designed to be fast and good enough for tooling to implement syntax highlighting, indentation, and other editing conveniences. It is not meant for interpretation or execution.
+
+#### Example 1: False Negative Function Classification
+
+Consider the following macro
+
+```clojure
+(defmacro defn2 [sym args & body]
+  `(defn ~sym ~args ~@body))
+
+(defn2 dog [] "bark")
+```
+
+
+This macro lets the caller define a function, but a hypothetical tree-sitter-clojure semantic grammar might just see a function call where a variable `dog` is passed as an argument.
+How should tree-sitter know that `dog` should be highlighted like a function? It would have to evaluate the `defn2` macro to understand that.
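+
+The expansion that tree-sitter cannot perform is easy to see at a REPL, where a Clojure runtime is available.
+The following is a minimal sketch that repeats the `defn2` definition from above and asks the runtime for its expansion:
+
+```clojure
+;; This needs a running Clojure runtime, which tree-sitter does not have.
+(defmacro defn2 [sym args & body]
+  `(defn ~sym ~args ~@body))
+
+;; Expand the call once, without evaluating the result.
+(macroexpand-1 '(defn2 dog [] "bark"))
+;; => (clojure.core/defn dog [] "bark")
+```
+
+Only after expansion does it become visible that `dog` names a function.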
+
+#### Example 2: False Positive Function Classification
+
+```clojure
+(defmacro no-defn [body]
+  (if (= 'defn (first body))
+    (rest body)
+    body))
+(defn foo [& rest] 1)
+(no-defn (defn foo [] 2))
+```
+
+The last expression above evaluates to 1, and the following
+
+```
+(foo)
+```
+
+evaluates to 1.
+
+How is tree-sitter supposed to understand that the `(defn foo [] 2)` in the expression `(no-defn (defn foo [] 2))` is not a function declaration? It would have to evaluate the `no-defn` macro.
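+
+Expanding the call by hand shows why. A minimal sketch, assuming a REPL and repeating the `no-defn` definition from above:
+
+```clojure
+(defmacro no-defn [body]
+  (if (= 'defn (first body))
+    (rest body)
+    body))
+
+;; The macro drops the leading `defn` symbol from its argument.
+(macroexpand-1 '(no-defn (defn foo [] 2)))
+;; => (foo [] 2)
+```
+
+The result is a call to the existing variadic `foo` with the arguments `[]` and `2`, which returns 1 without redefining anything.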
+
+#### Syntax and Semantics: Conclusions
+
+While these examples are silly, they illustrate the issue with encoding 
semantics into the tree-sitter-clojure grammar.
+If we tried to make the grammar understand functions, macros, types, and other semantic elements, it would end up giving false positives and negatives in the parse tree.
+While this is an inevitability for simple static analysis of Clojure code, tree-sitter-clojure chooses to avoid making these kinds of mistakes altogether.
+Instead, it is up to the Emacs Lisp code and other consumers of the tree-sitter-clojure grammar to make decisions about the semantic meaning of Clojure code.
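+
+As a rough illustration of what such a consumer-side decision can look like, the sketch below models syntax nodes as plain Clojure maps and applies a naming heuristic.
+The map shape, the `definition-name` function, and the "starts with `def`" rule are all assumptions made for this example; they are not the tree-sitter API and not necessarily what clojure-ts-mode does.
+
+```clojure
+(require '[clojure.string :as str])
+
+;; Hypothetical stand-in for the syntax tree of `(defn2 dog [] "bark")`.
+;; Real nodes come from the parser; these maps only mimic their shape.
+(def example-node
+  {:type "list_lit"
+   :children [{:type "sym_lit" :text "defn2"}
+              {:type "sym_lit" :text "dog"}
+              {:type "vec_lit" :children []}
+              {:type "str_lit" :text "\"bark\""}]})
+
+(defn definition-name
+  "Return the text of the second child when NODE is a list whose first
+  two children are symbols and whose first symbol starts with `def`.
+  Return nil otherwise. This is a guess based on naming, not semantics."
+  [node]
+  (let [[head target] (:children node)]
+    (when (and (= "list_lit" (:type node))
+               (= "sym_lit" (:type head))
+               (= "sym_lit" (:type target))
+               (str/starts-with? (:text head) "def"))
+      (:text target))))
+
+(definition-name example-node)
+;; => "dog"
+```
+
+A heuristic like this gets `defn2` right but would still report the inner `(defn foo [] 2)` of the `no-defn` example as a definition. Because it lives in the consumer rather than in the grammar, each tool can choose its own trade-off.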
+
+This decision for tree-sitter-clojure to consider only syntax and not semantics has pros and cons.
+Some of the upsides (a non-exhaustive list):
+
+- No semantic false positives or negatives in the parse tree.
+- Simple grammar to maintain, with fewer nodes and rules.
+- Small, fast grammar (with a small set of grammar rules, tree-sitter-clojure has one of the smallest binaries and fastest grammars in widespread use).
+- Stability: the grammar changes infrequently and is very stable for downstream consumers.
+
+And the primary downside: Semantics must be (re)-implemented in tools that 
consume the grammar. While this results in more work for tooling authors, the 
tools that use the grammar are easier to change than the grammar itself. The 
inaccurate nature of statically interpreting Clojure semantics means that not 
every decision made for the grammar would meet the needs of the various grammar 
consumers. This would lead to bugs and feature requests. Nearly all changes to 
the grammar will result i [...]
+
+#### Further Reading
+
+https://github.com/sogaiu/tree-sitter-clojure/blob/master/doc/scope.md
+
+## Syntax Highlighting
+
+TODO
+
+## Indentation
+
+TODO
+
+## Semantic Interpretation in clojure-ts-mode
+
+TODO: demonstrate how clojure-ts-mode creates semantic meaning from a given 
syntax tree. Show examples of how new semantic meaning can be added (with 
highlighting, indentation, etc).
