Endilll wrote:

I've been staring at `cindex.py` for the better part of yesterday and today. 
I'll try to describe the status quo as I understand it first, to make sure all 
three of us are on the same page, then I'll describe the direction I think we 
should take.

Status quo
------------
1. I believe we're looking at three principal levels here: (1) `libclang` 
shared library, which exports functions; (2) function signatures of those 
exported functions described via `ctypes` (`functionList`), and (3) user-facing 
cindex API implemented in terms of said functions.
2. `libclang` itself is considered a bedrock for `cindex.py`. Not much to 
discuss here.
3. `ctypes` function signatures are required to describe how to call exported 
functions. Typical entry in `functionList` looks like the following: 
`("clang_isConstQualifiedType", [Type], bool)`, where function name is followed 
by a list of parameter types, which in turn is followed by return type. This 
information is used to fill in `argtypes` and `restype` attributes of the 
imported function object respectively. (@DeinAlptraum so `functionList` is 
there to stay one way or another.)
4. cindex API is typically a rather thin layer on top of `ctypes` function 
signatures, which provides convenient classes and methods.
5. There are two quirks to how values are returned from `libclang` into Python 
realm. The first one is about structural types. When exported function returns 
such a type, on `ctypes` side we need to declare a Python class that derives 
from `ctypes.Structure`, and fill in `_fields_` member describing the layout. 
Then this type can be used to initialize `restype` attribute of imported 
function object. When this function is called, under the hood `ctypes` 
constructs the Python object according to layout and returns it.
```python
class SourceLocation(Structure):
    _fields_ = [("ptr_data", c_void_p * 2), ("int_data", c_uint)]
```
6. The second quirk is about conversions of returned objects. If we need to 
return something else than what is specified in `restype`, e.g. a string, we 
need to perform a conversion. Conveniently, imported function objects has 
`errcheck` attribute that accepts a function. It is expected to perform checks 
(and raise an exception if they don't pass), but more importantly for us, what 
this function returns is what the call to imported function object returns, so 
it can be used as a conversion function as well (such usage is supported by the 
official documentation). This is what the 4th member of entries in 
`functionList` used for: `("clang_getFileName", [File], _CXString, 
_CXString.from_result)`.

Recent developments
------------------------
7. I believe our type annotation efforts up until now (mostly @DeinAlptraum 
patches) have been focused on cindex API.
8. One of the problems we've encountered while doing that is that imported 
function we rely on are not typed. This is not an issue when passing arguments 
(typed -> untyped is fine), but is a problem for the values we return (untyped 
-> typed make type checkers complain). This is the reason why we have so many 
`# type: ignore [no-any-return]` comments on `return` statements in `cindex.py`.
9. It seems that `ctypes` does not care about type hints, despite the fact that 
function signatures are specified. This is quite unfortunate, and leads to lack 
of type annotations for the layer of imported functions. I wasn't able to find 
any widely deployed solution for this.
10. It seems that the approach @TsXor is pursuing is to fill in the gap between 
`ctypes` and type hinting by introducing machinery that generates type hints 
for imported functions from their signatures (remember, we need to specify them 
anyway for `ctypes` to understand how to call them). This is a cleaner approach 
to resolve the issue with untyped returns than silencing type checkers with `# 
type: ignore [no-any-return]`.

Proposed direction
---------------------
11. I think adding type hints has two primary goals: enable users to do type 
checking, and provide more information to autocompletion engines. Specifically, 
cindex API is what needs to be annotated for users to get those benefits. I'm 
not sure why anyone would use "raw" imported function when there's much more 
convenient API in the same module.
12. I think our "current" approach supported by @DeinAlptraum work allows us to 
achieve both those primary goals.
13. I'm sympathetic to the "new" approach @TsXor is taking, but when I look at 
it for the standpoint of the primary goals, I don't see significant 
improvements that would justify the significant complexity it has over 
"current" approach.
14. `ctyped` is a useful work, but not for Clang, because we don't have enough 
maintainers experienced in this particular corner of Python. The problem it 
solves should be solved upstream, in `ctypes` or `typing` modules of the Python 
standard library. Once there is an upstream solution, we can reconsider the 
implementation approach of our type hints.
15. We shouldn't rely `errcheck` when we can help it. I find this to be a bit 
too much magic when one looks at the implementation of cindex API trying to 
understand what happens between `libclang` and cindex API.
16. Relying less on `errcheck` should help with contentious `# type: ignore 
[no-any-return]` comments sprinkled all over the place.
```python
# bad
def get_template_argument_kind(self, num: int) -> TemplateArgumentKind:
    return conf.lib.clang_Cursor_getTemplateArgumentKind(self, num)  # type: 
ignore [no-any-return]

# good
def get_template_argument_kind(self, num: int) -> TemplateArgumentKind:
    return 
TemplateArgumentKind.from_id(conf.lib.clang_Cursor_getTemplateArgumentKind(self,
 num))
```
17. Hopefully the similar transformation can help us when structural types are 
returned. In this case, it's not `errcheck` that works, but `_fields_` and 
internal `ctypes` machinery. I'm a bit surprised that `ctypes` doesn't do as 
much for us, because this seems rather trivial to implement.
```python
# bad
def start(self) -> SourceLocation:
    return conf.lib.clang_getRangeStart(self)  # type: ignore [no-any-return]

# good
def start(self) -> SourceLocation:
    return SourceLocation(conf.lib.clang_getRangeStart(self))
```
18. In #101784 I saw more fixes than `ctyped` and things around that. You're 
encouraged to submit them separately.

I hope this long write-up help us to get on the same page, and find a way 
forward.
CC @AaronBallman hopefully this is accessible enough for you to wrap your head 
around if you ever wanted to know how our Python binding work. Maybe we can 
turn this into documentation.

https://github.com/llvm/llvm-project/pull/101941
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to