> One integer is actually not enough to assign IDs.

Actually, disregard this particular problem. I think we could simply stop
assigning IDs if we reach the overflow limit and call it a day,
since you would need a truly horrendous file to reach 4,294,967,295 AST nodes
(I ran some tests to check this).
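For illustration, here is a minimal sketch of the "stop assigning and call it
a day" idea in Python (the real counter would live in the C parser; the
function name is hypothetical):

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295 -- the unsigned int ceiling

def next_node_id(current):
    # Saturating allocator: once the counter hits the 32-bit limit,
    # keep returning the limit instead of overflowing or erroring out.
    if current >= UINT32_MAX:
        return UINT32_MAX
    return current + 1
```

Nodes past the limit would simply share the sentinel ID and lose precise
location info, which is an acceptable tradeoff for pathological files.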

On Tue, 18 May 2021 at 13:25, Pablo Galindo Salgado <pablog...@gmail.com>
wrote:

> Yet another problem that I found:
>
> One integer is actually not enough to assign IDs. One unsigned integer can
> cover 4,294,967,295 AST nodes, but it is technically possible
> to have more than that in a single file. While in PEP 657 we are tracking
> offsets that are normally very low (typically < 100) or end lines
> that are easily compressible (as end lines are normally equal to the start
> of the line), node IDs can be arbitrarily big. What is worse:
> the parser creates some AST nodes that it throws away if a rule fails
> (that's how PEG parsers work), so there will be a lot of IDs that
> never get assigned (and the parser doesn't have an easy way to know how
> many IDs it has thrown away). This forces us either to use something
> bigger than an integer (probably a size_t) or to deal with overflow.
>
> On Tue, 18 May 2021 at 10:23, Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> Hmmm, actually another problem with this approach:
>>
>> Nodes are created and modified after the optimization pass, so the AST
>> produced by the parser is not enough to reconstruct the actual
>> information; we would also need to run the optimization passes. But
>> unfortunately, this is (by design) not done in the Python API (to preserve
>> all possible information about the original code), so this complicates
>> the API for getting this quite a lot, as `ast.parse` is not enough to
>> reproduce the tree the compiler actually used.
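The mismatch described above can be seen directly: `ast.parse` keeps the
original, unoptimized tree, while the bytecode compiler folds constants
away. A small demonstration:

```python
import ast

# ast.parse deliberately preserves the source structure:
# "1 + 2" stays a BinOp node, not a folded constant.
tree = ast.parse("1 + 2", mode="eval")
print(type(tree.body).__name__)  # BinOp

# The compiler's optimization pass, by contrast, constant-folds
# the expression, so the code object only contains the result 3.
code = compile("1 + 2", "<example>", "eval")
print(3 in code.co_consts)  # True
```

So the tree that `ast.parse` returns is not the tree the bytecode was
generated from, which is exactly the reconstruction problem.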
>>
>> On Tue, 18 May 2021 at 08:53, Pablo Galindo Salgado <pablog...@gmail.com>
>> wrote:
>>
>>> Hi Nathaniel,
>>>
>>> Thanks a lot for your suggestion! I like the idea, although I still think
>>> it is more complex than our current proposal; on the other hand, it allows
>>> for much richer results, so I'm quite excited to try it out. We are going
>>> to explore it with a prototype, and if we are convinced we
>>> will update the PEP.
>>>
>>> One thing I'm not a fan of is that tools that want to leverage this
>>> information (note that our PEP also offers this info to tools as an important
>>> aspect of it) will need to reparse the file AND search for the AST node, which
>>> also involves transforming the full tree into Python objects, which is even
>>> slower. This is unfortunately a worse API than just getting the full location
>>> given an instruction offset. Also, while displaying tracebacks may be a
>>> good scenario for the speed tradeoff, it may not be for other tools.
>>> Finally, if no file is available, all this information is lost,
>>> although one could argue that it is then not extremely useful...
>>>
>>> Regards from sunny London,
>>> Pablo Galindo Salgado
>>>
>>> On Tue, 18 May 2021, 01:43 Nathaniel Smith, <n...@pobox.com> wrote:
>>>
>>>> On Mon, May 17, 2021 at 6:18 AM Mark Shannon <m...@hotpy.org> wrote:
>>>> > 2. Repeated binary operations on the same line.
>>>> >
>>>> > A single location can also be clearer when all the code is on one
>>>> line.
>>>> >
>>>> > i1 + i2 + s1
>>>> >
>>>> > PEP 657:
>>>> >
>>>> > i1 + i2 + s1
>>>> > ^^^^^^^^^^^^
>>>> >
>>>> > Using a single location:
>>>> >
>>>> > i1 + i2 + s1
>>>> >          ^
>>>>
>>>> It's true this case is a bit confusing with the whole operation span
>>>> highlighted, but I'm not sure the single-location version is much better. I
>>>> feel like a Really Good UI would, like, highlight the two operands in
>>>> different colors or something, or at least separately underline the two
>>>> items whose types are incompatible:
>>>>
>>>> TypeError: unsupported operand type(s) for +: 'int' + 'str':
>>>> i1 + i2 + s1
>>>> ^^^^^^^   ~~
>>>>
>>>> More generally, these error messages are the kind of thing where the UI
>>>> can always be tweaked to improve further, and those tweaks can make good
>>>> use of any rich source information that's available.
>>>>
>>>> So, here's another option to consider:
>>>>
>>>> - When parsing, assign each AST node a unique, deterministic id (e.g.
>>>> sequentially across the AST tree from top-to-bottom, left-to-right).
>>>> - For each bytecode offset, store the corresponding AST node id in an
>>>> lnotab-like table
>>>> - When displaying a traceback, we already need to go find and read the
>>>> original .py file to print source code at all. Re-parse it, and use the ids
>>>> to find the original AST node, in context with full structure. Let the
>>>> traceback formatter do whatever clever stuff it wants with this info.
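As a rough sketch of how deterministic IDs could work at the Python level
(this is illustrative only, not how CPython would implement it internally):

```python
import ast

def assign_ids(tree):
    # ast.walk visits nodes in a fixed, deterministic order
    # (breadth-first, left-to-right), so reparsing identical source
    # reproduces identical ids.  A compact per-bytecode table then
    # only needs to store one small integer per instruction.
    return {i: node for i, node in enumerate(ast.walk(tree))}

source = "x = a + b\ny = f(x)\n"
compile_time = assign_ids(ast.parse(source))
traceback_time = assign_ids(ast.parse(source))  # reparse later

# A stored id resolves to the same node, with full structural context
# (parents, siblings, precise offsets) available to the formatter.
node = traceback_time[1]
print(type(node).__name__, node.lineno)
```

The traceback formatter can then walk up or around the recovered node to
build arbitrarily rich displays, as suggested above.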
>>>>
>>>> Of course if the .py and .pyc files don't match, this might produce
>>>> gibberish. We already have that problem with showing source lines, but it
>>>> might be even more confusing if we get some random unrelated AST node. This
>>>> could be avoided by storing some kind of hash in the code object, so that
>>>> we can validate the .py file we find hasn't changed (sha512 if we're
>>>> feeling fancy, crc32 if we want to save space, either way is probably 
>>>> fine).
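A sketch of the validation idea, assuming the fingerprint is stored in the
code object at compile time (the helper name is hypothetical):

```python
import hashlib
import zlib

def source_fingerprint(source_bytes, fancy=True):
    # Fingerprint the source so the traceback printer can verify the
    # .py file on disk still matches what was compiled into the .pyc.
    if fancy:
        return hashlib.sha512(source_bytes).digest()  # 64 bytes
    return zlib.crc32(source_bytes)  # 4 bytes: cheap, weaker guarantee

compiled = source_fingerprint(b"x = 1\n", fancy=False)
on_disk = source_fingerprint(b"x = 1  # edited\n", fancy=False)

# On mismatch, fall back to plain line-based display instead of
# resolving AST node ids against an unrelated tree.
print(compiled != on_disk)
```

Either hash works for the "has this file changed?" check; sha512 just makes
accidental collisions a non-issue at the cost of 60 extra bytes.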
>>>>
>>>> This would make traceback printing more expensive, but only if you want
>>>> the fancy features, and traceback printing is already expensive (it does
>>>> file I/O!). Usually by the time you're rendering a traceback it's more
>>>> important to optimize for human time than CPU time. It would take less
>>>> memory than PEP 657, and the same as Mark's proposal (both only track one
>>>> extra integer per bytecode offset). And it would allow for arbitrarily rich
>>>> traceback display.
>>>>
>>>> (I guess in theory you could make this even cheaper by using it to
>>>> replace lnotab, instead of extending it. But I think keeping lnotab around
>>>> is a good idea, as a fallback for cases where you can't find the original
>>>> source but still want some hint at location information.)
>>>>
>>>> -n
>>>>
>>>> --
>>>> Nathaniel J. Smith -- https://vorpus.org
>>>> _______________________________________________
>>>> Python-Dev mailing list -- python-dev@python.org
>>>> To unsubscribe send an email to python-dev-le...@python.org
>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>> Message archived at
>>>> https://mail.python.org/archives/list/python-dev@python.org/message/BUXFOSAEBXLIHH432PKBCXOGXUAHQIVP/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>
>>>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TROO2AWN2TQBQN73OUNDG7RL6HONISNN/
Code of Conduct: http://python.org/psf/codeofconduct/