I've started playing around with writing a grammar for Cython. As well as formally defining the language (in particular, with respect to Python), this should allow us to eventually move to using parser generators rather than ad-hoc hand-written code, and be useful to external tools (IDEs, linters, syntax highlighters, etc.).

I've posted what I have at https://github.com/robertwb/cython/tree/grammar , in particular

https://github.com/robertwb/cython/blob/grammar/Cython/Parser/Grammar
https://github.com/robertwb/cython/compare/robertwb:3aa9056943f83a68cc9d9335f8a9c81e9a6f3f91...363bc162fd626203f832a33b6c736ff8b10f6086#diff-8b69afcfc588fde3d763f2ec670e42c2L1

Nothing beyond generating the raw parse tree is hooked up yet, and even that requires an extra directive (formal_grammar=True). Building and using the parser requires a source-built Python (it looks for the pgen artifact Python uses to compile the grammar). We may or may not want to stick with this approach long term (though if we do, we might ship the generated .c files). This parse tree is still pretty low-level; we might also want to create something like Python.asdl to give us a (closer) AST.

The grammar isn't complete, but should cover most of the language (over 3/4 of the test suite passes, and that explores a lot of the corner cases). The most notable omissions stem from using Python's lexer: there is no token for '?', nor for the additional string literal prefixes and int literal suffixes. Also, as the lexer clearly doesn't understand includes, these are handled by inlining in a preparsing step (which messes with line numbers).

The grammar could be tightened up as well. For example, this grammar doesn't distinguish between valid pxd vs. pyx constructs, and allows cdef statements within if statements (or even normal class declarations).

I tried to restrict the existing grammar as little as possible; in particular, the only new illegal identifiers are 'cdef', 'ctypedef' and (for ambiguity reasons) 'new' and 'sizeof'. Also, the "cython" keywords may not be used for identifiers that might be typed (e.g. "def foo(int): ..." is not allowed).

The most significant departure from the existing "grammar" is that rather than using C-style declarators, cdef declarations are of the form "cdef [type] name".
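To make the type-first form concrete, here is a minimal Python sketch that rewrites the simplest C-style declarations into it. The helper name to_type_first and its regular expressions are hypothetical, written for illustration only; they are not taken from the branch, and they only handle a single-word type followed by plain names that all carry the same stars or the same array dimensions.

```python
import re

# Hypothetical patterns for illustration; not part of the grammar branch.
# "cdef <single-word type> <declarators>", where the type must be followed
# by whitespace or a '*' so that it is never split in the middle of a word.
_CDEF = re.compile(
    r"^(?P<indent>\s*)cdef\s+(?P<type>[A-Za-z_]\w*)(?=[\s*])\s*(?P<rest>.+?)\s*$"
)
# One C-style declarator: optional stars, a name, optional array dimensions.
_DECL = re.compile(
    r"^(?P<stars>\**)\s*(?P<name>[A-Za-z_]\w*)\s*(?P<dims>(?:\[\w*\])*)$"
)


def to_type_first(line):
    """Rewrite e.g. 'cdef double *a, *b' as 'cdef double* a, b'.

    Expects one line without its trailing newline. Anything the sketch
    does not understand is returned unchanged.
    """
    m = _CDEF.match(line)
    if m is None:
        return line
    decls = [_DECL.match(part.strip()) for part in m.group("rest").split(",")]
    if any(d is None for d in decls):
        return line  # initialisers, functions, multi-word types, comments...
    if any(d.group("stars") and d.group("dims") for d in decls):
        return line  # pointer/array combinations are left alone
    suffixes = {d.group("stars") + d.group("dims") for d in decls}
    if len(suffixes) != 1:
        return line  # mixed declarators would need separate statements
    names = ", ".join(d.group("name") for d in decls)
    return f"{m.group('indent')}cdef {m.group('type')}{suffixes.pop()} {names}"
```

Mixed declarations such as "cdef double *a, b" are returned unchanged here, since in the type-first form they would have to be split into two statements.
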
Thus one writes the (already legal) "cdef double[3] loc", and "cdef double* a, b" declares two pointers. How to handle function pointers is still up in the air, but I wouldn't be opposed to moving to a new syntax (e.g. "(double, void*) -> double", inspired by Py3) for those. The grammar also disallows empty "declarators" for parameters of function declarations (though we could consider adding this back).

I think this would also be a nice chance to simplify the grammar, so there are some intentional omissions. Most notably, there are several modifiers (e.g. the __cdecl, __stdcall, __fastcall callspecs, maybe inline, maybe even "with nogil", and the "cdef class Foo [object object_struct_name, type type_object_name ]" spec for external classes) that would make more sense as decorators. This would be backwards incompatible, but these are not commonly used features, fair warning could be given (or even translators provided; I included a sed script to deal with the common case of declarators mentioned above), and I think the simplification could be worth it.

Thoughts?

- Robert

_______________________________________________
cython-devel mailing list
cython-devel@python.org
https://mail.python.org/mailman/listinfo/cython-devel