> One integer is actually not enough to assign IDs.

Actually, disregard this particular problem: I think we could simply stop
assigning IDs once we reach the overflow limit and call it a day, since you
would need a truly horrendous file to reach 4,294,967,295 AST nodes (I ran
some tests to check this).
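As a concrete sketch of the "stop assigning at the limit" idea: it amounts to a saturating counter. All names below are invented for illustration (CPython's parser is C, not Python, and this is not code from the PEP):

```python
# Hypothetical saturating node-ID counter: once the unsigned 32-bit
# limit is reached, we stop handing out real IDs instead of wrapping.

UINT32_MAX = 2**32 - 1  # 4,294,967,295
NO_NODE_ID = UINT32_MAX  # sentinel meaning "no usable ID assigned"


def next_node_id(last_id: int) -> int:
    """Return the next node ID, saturating instead of overflowing."""
    return last_id + 1 if last_id + 1 < UINT32_MAX else NO_NODE_ID
```

The trade-off is that nodes past the limit all share the sentinel and lose fine-grained location info, which is acceptable if such files are vanishingly rare.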
On Tue, 18 May 2021 at 13:25, Pablo Galindo Salgado <pablog...@gmail.com> wrote:

> Yet another problem that I found:
>
> One integer is actually not enough to assign IDs. One unsigned integer
> can cover 4,294,967,295 AST nodes, but it is technically possible to have
> more than that in a single file. While in PEP 657 we are tracking offsets
> that are normally very low (typically < 100) or end lines that are easily
> compressible (as end lines are normally equal to the start of the line),
> node IDs can be arbitrarily big. What is worse: the parser creates some
> AST nodes that it throws away if the parse fails (that's how PEG parsers
> work), so there will be a lot of IDs that never get assigned (and the
> parser doesn't have an easy way to know how many IDs it has thrown away).
> This forces us either to use something bigger than an integer (probably a
> size_t) or to deal with overflow.
>
> On Tue, 18 May 2021 at 10:23, Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> Hummmm, actually another problem with this approach:
>>
>> Nodes are created and modified after the optimization pass, so the AST
>> produced by the parser is not enough to reconstruct the actual
>> information; we would also need to run the optimization passes.
>> Unfortunately, this is (by design) not done in the Python API (to
>> preserve all possible information about the original code), so this
>> complicates the API quite a lot, as `ast.parse` is not enough to get the
>> original tree.
>>
>> On Tue, 18 May 2021 at 08:53, Pablo Galindo Salgado <pablog...@gmail.com>
>> wrote:
>>
>>> Hi Nathaniel,
>>>
>>> Thanks a lot for your suggestion! I like the idea, although I still
>>> think it is more complex than our current proposal; on the other hand,
>>> it allows for much richer results, so I'm quite excited to try it out.
>>> We are going to explore it with a prototype, and if we are convinced we
>>> will update the PEP.
>>>
>>> One thing I'm not a fan of is that tools that want to leverage this
>>> information (note that our PEP also offers this info to tools as an
>>> important aspect of it) will need to reparse the file AND search for
>>> the AST node, which also involves transforming the full tree to Python
>>> objects, which is even slower. This is unfortunately a worse API than
>>> just getting the full location given an instruction offset. Also,
>>> while displaying tracebacks may be a good scenario for the speed
>>> tradeoff, it may not be for other tools. Finally, if there is no file
>>> available, all this information is lost, although one could argue that
>>> then it is not extremely useful...
>>>
>>> Regards from sunny London,
>>> Pablo Galindo Salgado
>>>
>>> On Tue, 18 May 2021, 01:43 Nathaniel Smith, <n...@pobox.com> wrote:
>>>
>>>> On Mon, May 17, 2021 at 6:18 AM Mark Shannon <m...@hotpy.org> wrote:
>>>> > 2. Repeated binary operations on the same line.
>>>> >
>>>> > A single location can also be clearer when all the code is on one
>>>> > line.
>>>> >
>>>> >     i1 + i2 + s1
>>>> >
>>>> > PEP 657:
>>>> >
>>>> >     i1 + i2 + s1
>>>> >     ^^^^^^^^^^^^
>>>> >
>>>> > Using a single location:
>>>> >
>>>> >     i1 + i2 + s1
>>>> >             ^
>>>>
>>>> It's true this case is a bit confusing with the whole operation span
>>>> highlighted, but I'm not sure the single-location version is much
>>>> better. I feel like a Really Good UI would highlight the two operands
>>>> in different colors or something, or at least underline the two
>>>> separate items whose types are incompatible separately:
>>>>
>>>>     TypeError: unsupported operand type(s) for +: 'int' + 'str':
>>>>     i1 + i2 + s1
>>>>     ^^^^^^^   ~~
>>>>
>>>> More generally, these error messages are the kind of thing where the
>>>> UI can always be tweaked to improve further, and those tweaks can
>>>> make good use of any rich source information that's available.
>>>>
>>>> So, here's another option to consider:
>>>>
>>>> - When parsing, assign each AST node a unique, deterministic id (e.g.
>>>>   sequentially across the AST tree from top-to-bottom, left-to-right).
>>>> - For each bytecode offset, store the corresponding AST node id in an
>>>>   lnotab-like table.
>>>> - When displaying a traceback, we already need to go find and read the
>>>>   original .py file to print source code at all. Re-parse it, and use
>>>>   the ids to find the original AST node, in context with full
>>>>   structure. Let the traceback formatter do whatever clever stuff it
>>>>   wants with this info.
>>>>
>>>> Of course, if the .py and .pyc files don't match, this might produce
>>>> gibberish. We already have that problem with showing source lines, but
>>>> it might be even more confusing if we get some random unrelated AST
>>>> node. This could be avoided by storing some kind of hash in the code
>>>> object, so that we can validate that the .py file we find hasn't
>>>> changed (sha512 if we're feeling fancy, crc32 if we want to save
>>>> space; either way is probably fine).
>>>>
>>>> This would make traceback printing more expensive, but only if you
>>>> want the fancy features, and traceback printing is already expensive
>>>> (it does file I/O!). Usually by the time you're rendering a traceback,
>>>> it's more important to optimize for human time than CPU time. It would
>>>> take less memory than PEP 657 and the same as Mark's proposal (both
>>>> only track one extra integer per bytecode offset). And it would allow
>>>> for arbitrarily rich traceback display.
>>>>
>>>> (I guess in theory you could make this even cheaper by using it to
>>>> replace lnotab instead of extending it. But I think keeping lnotab
>>>> around is a good idea, as a fallback for cases where you can't find
>>>> the original source but still want some hint at location information.)
>>>>
>>>> -n
>>>>
>>>> --
>>>> Nathaniel J. Smith -- https://vorpus.org
>>>> _______________________________________________
>>>> Python-Dev mailing list -- python-dev@python.org
>>>> To unsubscribe send an email to python-dev-le...@python.org
>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>> Message archived at
>>>> https://mail.python.org/archives/list/python-dev@python.org/message/BUXFOSAEBXLIHH432PKBCXOGXUAHQIVP/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
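The deterministic-ID scheme discussed in this thread can be prototyped in pure Python with the standard `ast` module. This is only a hedged sketch of the idea, not the PEP's implementation; the helper names (`assign_node_ids`, `find_node`, `source_hash`) are invented for illustration, and a pre-order walk over `ast.iter_child_nodes` stands in for "top-to-bottom, left-to-right" numbering:

```python
from __future__ import annotations

import ast
import hashlib


def assign_node_ids(tree: ast.AST) -> dict[int, ast.AST]:
    """Number AST nodes deterministically in pre-order
    (top-to-bottom, left-to-right); return an id -> node mapping."""
    ids: dict[int, ast.AST] = {}

    def visit(node: ast.AST) -> None:
        ids[len(ids)] = node
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return ids


def source_hash(source: str) -> str:
    """Hash a code object could carry to detect a changed .py file."""
    return hashlib.sha512(source.encode()).hexdigest()


def find_node(source: str, node_id: int, expected: str) -> ast.AST | None:
    """Re-parse the source and recover the node for node_id, refusing
    to answer if the source no longer matches the stored hash."""
    if source_hash(source) != expected:
        return None  # .py and .pyc disagree; don't return a random node
    return assign_node_ids(ast.parse(source)).get(node_id)
```

For `"i1 + i2 + s1"`, id 0 is the `Module`, id 1 the `Expr`, id 2 the outer `BinOp`, so a traceback formatter holding id 2 plus the hash could recover the full expression node with its structure intact.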
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/TROO2AWN2TQBQN73OUNDG7RL6HONISNN/
Code of Conduct: http://python.org/psf/codeofconduct/
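The optimization-pass mismatch raised earlier in the thread (`ast.parse` is not enough to get the tree the compiler actually worked on) can be observed directly with only the standard library. A minimal demonstration, standard CPython behavior rather than anything defined by the PEP:

```python
import ast
import dis

source = "x = 1 + 2"

# ast.parse preserves the original structure: a BinOp over two constants.
tree = ast.parse(source)
assert isinstance(tree.body[0].value, ast.BinOp)

# The compiler's optimization passes fold the expression before emitting
# bytecode, so the code object only knows about the constant 3 -- the
# BinOp node has no counterpart in the compiled result.
code = compile(source, "<demo>", "exec")
assert 3 in code.co_consts

dis.dis(code)  # the disassembly loads 3 directly; no add instruction
```

This is why any ID scheme tied to the optimized tree cannot be reconstructed from `ast.parse` output alone.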