branch: elpa/clojure-ts-mode
commit 2875629cbb4cfa1b289c69345d615b6c492ef6a6
Author: Bozhidar Batsov <bozhi...@batsov.dev>
Commit: Bozhidar Batsov <bozhi...@batsov.dev>

    Improve a bit the design doc
---
 doc/design.md | 78 ++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 45 insertions(+), 33 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index 0d2df9c550..8afeaffee7 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -4,47 +4,50 @@ This document is still a work in progress.
 
 Clojure-ts-mode is based on the tree-sitter-clojure grammar.
 
-If you want to contribute to clojure-ts-mode, it is recommend that you familiarize yourself with how tree-sitter works.
-The official documentation is a great place to start: https://tree-sitter.github.io/tree-sitter/
-These guides for Emacs tree-sitter development are also useful
-- https://casouri.github.io/note/2023/tree-sitter-starter-guide/index.html
+If you want to contribute to clojure-ts-mode, it is recommended that you familiarize yourself with how Tree-sitter works.
+The official documentation is a great place to start: <https://tree-sitter.github.io/tree-sitter/>
+These guides for Emacs Tree-sitter development are also useful
+
+- <https://casouri.github.io/note/2023/tree-sitter-starter-guide/index.html>
 - `Developing major modes with tree-sitter` (From the Emacs 29+ Manual, `C-h i`, search for `tree-sitter`)
 
 In short:
-Tree-sitter is a tool that generates parser libraries for programming languages, and provides an API for interacting with those parsers.
-The generated parsers can create syntax trees from source code text.
-The nodes of those trees are defined by the grammar.
-Emacs can use these generated parsers to provide major modes with things like syntax highlighting, indentation, navigation, structural editing, and many other things.
+
+- Tree-sitter is a tool that generates parser libraries for programming languages, and provides an API for interacting with those parsers.
+- The generated parsers can create syntax trees from source code text.
+- The nodes of those trees are defined by the grammar.
+- Emacs can use these generated parsers to provide major modes with things like syntax highlighting, indentation, navigation, structural editing, and many other things.
 
 ## Important Definitions
 
-- Parser: A dynamic library compiled from C source code that is generated by the tree-sitter tool. A parser reads source code for a particular language and produces a syntax tree.
-- Grammar: The rules that define how a parser will create the syntax tree for a language. The grammar is written in javascript. Tree-sitter tooling consumes the grammar as input and outputs C source (which can be compiled into a parser)
+- Parser: A dynamic library compiled from C source code that is generated by the Tree-sitter tool. A parser reads source code for a particular language and produces a syntax tree.
+- Grammar: The rules that define how a parser will create the syntax tree for a language. The grammar is written in JavaScript. Tree-sitter tooling consumes the grammar as input and outputs C source (which can be compiled into a parser)
 - Syntax Tree: a tree data structure comprised of syntax nodes that represents some source code text.
-    - Concrete Syntax Tree: Syntax trees that contain nodes for every token in the source code, including things likes brackets and parentheses. Tree-sitter creates Concrete Syntax Trees.
-    - Abstract Syntax Tree: A syntax tree with less important details removed. An AST may contain a node for a list, but not individual parentheses. Tree-sitter does not create Abstract Syntax Trees.
+  - Concrete Syntax Tree: Syntax trees that contain nodes for every token in the source code, including things like brackets and parentheses. Tree-sitter creates Concrete Syntax Trees.
+  - Abstract Syntax Tree: A syntax tree with less important details removed. An AST may contain a node for a list, but not individual parentheses. Tree-sitter does not create Abstract Syntax Trees.
 - Syntax Node: A node in a syntax tree. It represents some subset of a source code text. Each node has a type, defined by the grammar used to produce it. Some common node types represent language constructs like strings, integers, operators.
-    - Named Syntax Node: A node that can be identified by a name given to it in the tree-sitter Grammar. In clojure-ts-mode, `list_lit` is a named node for lists.
-    - Anonymous Syntax Node: A node that cannot be identified by a name. In the Grammar these are identified by simple strings, not by complex Grammar rules. In clojure-ts-mode, `"("` and `")"` are anonymous nodes.
+  - Named Syntax Node: A node that can be identified by a name given to it in the Tree-sitter Grammar. In clojure-ts-mode, `list_lit` is a named node for lists.
+  - Anonymous Syntax Node: A node that cannot be identified by a name. In the Grammar these are identified by simple strings, not by complex Grammar rules. In clojure-ts-mode, `"("` and `")"` are anonymous nodes.
 - Font Locking: What Emacs calls "Syntax Highlighting".
 
 ## tree-sitter-clojure
 
-Clojure-ts-mode uses the tree-sitter-clojure grammar, which can be found at https://github.com/sogaiu/tree-sitter-clojure
-The clojure-ts-mode grammar provides very basic, low level nodes that try to match clojure's very light syntax.
+Clojure-ts-mode uses the tree-sitter-clojure grammar, which can be found at <https://github.com/sogaiu/tree-sitter-clojure>
+The clojure-ts-mode grammar provides very basic, low level nodes that try to match Clojure's very light syntax.
 
 There are nodes to represent:
+
 - Symbols (sym_lit)
-    - Contain (sym_ns) and (sym_name) nodes
+  - Contain (sym_ns) and (sym_name) nodes
 - Keywords (kwd_lit)
-    - Contain (kwd_ns) and (kw_name) nodes
+  - Contain (kwd_ns) and (kwd_name) nodes
 - Strings (str_lit)
 - Chars (char_lit)
 - Nil (nil_lit)
 - Booleans (bool_lit)
 - Numbers (num_lit)
 - Comments (comment, dis_expr)
-    - dis_expr is the `#_` discard expression
+  - dis_expr is the `#_` discard expression
 - Lists (list_lit)
 - Vectors (vec_lit)
 - Maps (map_lit)
@@ -61,7 +64,7 @@ will produce a parse tree like so
 
 ```
 (vec_lit
-  meta: (meta_lit 
+  meta: (meta_lit
           value: (kwd_lit name: (kwd_name)))
   value: (num_lit))
 ```
@@ -70,12 +73,12 @@ The best place to learn more about the tree-sitter-clojure grammar is to read th
 
 ### Clojure Syntax, not Clojure Semantics
 
-An important observation that anyone familiar with popular tree-sitter grammars may have picked up on is that there are no nodes representing things like functions, macros, types, and other semantic concepts.
-Representing the semantics of Clojure in a tree-sitter grammar is much more difficult than traditional languages that do not use macros heavily like Clojure and other lisps.
-To understand what an expression represents in Clojure source code requires macro-expansion of the source code. 
-Macro-expansion requires a runtime, and tree-sitter does not have access to a Clojure runtime and will never have access to a Clojure runtime.
-Additionally tree-sitter never looks back on what it has parsed, only forward, considering what is directly ahead of it. So even if it could identify a macro like `myspecialdef` it would forget about it as soon as it moved passed the declaring `defmacro` node.
-Another way to think about this: tree-sitter is designed to be fast and good-enough for tooling to implement syntax highlighting, indentation, and other editing conveniences. It is not meant for interpreting and execution.
+An important observation that anyone familiar with popular Tree-sitter grammars may have picked up on is that there are no nodes representing things like functions, macros, types, and other semantic concepts.
+Representing the semantics of Clojure in a Tree-sitter grammar is much more difficult than traditional languages that do not use macros heavily like Clojure and other lisps.
+To understand what an expression represents in Clojure source code requires macro-expansion of the source code.
+Macro-expansion requires a runtime, and Tree-sitter does not have access to a Clojure runtime and will never have access to a Clojure runtime.
+Additionally, Tree-sitter never looks back on what it has parsed, only forward, considering what is directly ahead of it. So even if it could identify a macro like `myspecialdef` it would forget about it as soon as it moved past the declaring `defmacro` node.
+Another way to think about this: Tree-sitter is designed to be fast and good-enough for tooling to implement syntax highlighting, indentation, and other editing conveniences. It is not meant for interpreting and execution.
 
 #### Example 1: False Negative Function Classification
 
@@ -88,9 +91,8 @@ Consider the following macro
 (defn2 dog [] "bark")
 ```
 
-
 This macro lets the caller define a function, but a hypothetical tree-sitter-clojure semantic grammar might just see a function call where a variable dog is passed as an argument.
-How should tree-sitter know that `dog` should be highlighted like function? It would have to evaluate the `defn2` macro to understand that.
+How should Tree-sitter know that `dog` should be highlighted like a function? It would have to evaluate the `defn2` macro to understand that.
 
 #### Example 2: False Positive Function Classification
 
@@ -105,13 +107,13 @@ How should tree-sitter know that `dog` should be highlighted like function? It w
 
 evaluates to 1, and the following
 
-```
+```clojure
 (foo)
 ```
 
 evaluates to 1.
 
-How is tree-sitter supposed to understand that `(defn foo [] 2)` of the expression `(no-defn (defn foo [] 2))` is not a function declaration? It would have to evaluate the `no-defn` macro.
+How is Tree-sitter supposed to understand that `(defn foo [] 2)` of the expression `(no-defn (defn foo [] 2))` is not a function declaration? It would have to evaluate the `no-defn` macro.
 
 #### Syntax and Semantics: Conclusions
 
@@ -122,17 +124,27 @@ Instead, it is up to the emacs-lisp code and other consumers of the tree-sitter-
 
 There are some pros and cons of this decision for tree-sitter-clojure to only consider syntax and not semantics.
 Some of the (non-exhaustive) upsides:
+
 - No semantic false positives or negatives in the parse tree.
 - Simple grammar to maintain with fewer nodes and rules
 - Small, fast grammar (with a small set of grammar rules, tree-sitter-clojure has one of the smallest binaries and fastest grammars in widespread use)
 - Stability: the grammar changes infrequently and is very stable for downstream consumers
 
-And the primary downside: Semantics must be (re)-implemented in tools that consume the grammar. While this results in more work for tooling authors, the tools that use the grammar are easier to change than the grammar itself. The inaccurate nature of statically interpreting Clojure semantics means that not every decision made for the grammar would meet the needs of the various grammar consumers. This would lead to bugs and feature requests. Nearly all changes to the grammar will result i [...]
+And the primary downside: Semantics must be (re)-implemented in tools that
+consume the grammar. While this results in more work for tooling authors, the
+tools that use the grammar are easier to change than the grammar itself. The
+inaccurate nature of statically interpreting Clojure semantics means that not
+every decision made for the grammar would meet the needs of the various grammar
+consumers. This would lead to bugs and feature requests. Nearly all changes to
+the grammar will result in some sort of breakages to its consumers, so changes
+are best avoided once the grammar has stabilized. Therefore avoiding these
+semantic interpretations in the grammar is one of the best ways to minimize
+changes in the grammar.
 
 #### Further Reading
 
-- https://github.com/sogaiu/tree-sitter-clojure/blob/master/doc/scope.md
-- https://tree-sitter.github.io/tree-sitter/using-parsers#named-vs-anonymous-nodes
+- <https://github.com/sogaiu/tree-sitter-clojure/blob/master/doc/scope.md>
+- <https://tree-sitter.github.io/tree-sitter/using-parsers#named-vs-anonymous-nodes>
 
 ## Syntax Highlighting
 

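The two examples in the patched document (`defn2` and `no-defn`) can be illustrated with a toy model. The sketch below is written in Python purely for illustration. It does not use the Tree-sitter API or any clojure-ts-mode code; the helper names (`sym`, `lst`, `vec`, `function_names`) and the `DEFINING_SYMBOLS` set are hypothetical. It only borrows the node type names from the design doc to show how a consumer that classifies by syntax alone produces a false negative and a false positive, and why the fix lives in the consumer rather than in the grammar.

```python
# Toy stand-in for a syntax-only tree: each node is a (type, payload) tuple.
# The type names (list_lit, sym_lit, vec_lit, str_lit, num_lit) follow the
# design doc; everything else here is a hypothetical illustration.

def sym(name):
    return ("sym_lit", name)

def lst(*children):
    return ("list_lit", list(children))

def vec(*children):
    return ("vec_lit", list(children))

# Hypothetical consumer-side configuration: heads treated as "defines a function".
DEFINING_SYMBOLS = {"defn"}

def function_names(node, defining=DEFINING_SYMBOLS):
    """Return names a purely syntactic consumer would highlight as functions."""
    kind, payload = node
    if kind not in ("list_lit", "vec_lit"):
        return []  # leaf nodes carry no children to inspect
    names = []
    if (kind == "list_lit" and len(payload) >= 2
            and payload[0][0] == "sym_lit" and payload[0][1] in defining
            and payload[1][0] == "sym_lit"):
        names.append(payload[1][1])
    for child in payload:
        names.extend(function_names(child, defining))
    return names

# (defn2 dog [] "bark")
defn2_form = lst(sym("defn2"), sym("dog"), vec(), ("str_lit", "bark"))
# (no-defn (defn foo [] 2))
no_defn_form = lst(sym("no-defn"),
                   lst(sym("defn"), sym("foo"), vec(), ("num_lit", "2")))

print(function_names(defn2_form))    # false negative: dog is missed
print(function_names(no_defn_form))  # false positive: foo is reported
# The consumer, not the grammar, can be taught about defn2:
print(function_names(defn2_form, {"defn", "defn2"}))
```

Adding `defn2` to the consumer's set changes the result without touching the tree, which mirrors the trade-off the document describes: tools that consume the grammar are easier to change than the grammar itself.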