[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Antoine Pitrou
On Sat, 8 May 2021 02:58:40 +
Neil Schemenauer  wrote:
> On 2021-05-07, Pablo Galindo Salgado wrote:
> > Technically the main concern may be the size of the unmarshalled
> > pyc files in memory, more than the storage size on disk.
> 
> It would be cool if we could mmap the pyc files and have the VM run
> code without an unmarshal step.

What happens if another process mutates or truncates the file while the
CPython VM is executing code from the mapped file?  Crash?

> Instead, could we dump out the pyc data in a format similar to Cap'n
> Proto?  That way no unmarshal is needed.

How do you freeze PyObjects in Cap'n Proto so that no unmarshal is
needed when loading them?

> The benefit would be faster startup times.  The unmarshal step is
> costly.

How costly? Do we have numbers?

> It would mostly solve the concern about these larger
> linenum/colnum tables.  We would only load that data into memory if
> the table is accessed.

Memory-mapped files are accessed with page granularity (4 kB on x86), so
I'm not sure it's that simple.  You would have to make sure to store
those tables in separate sections distant from the actual code areas.
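
To make the page-granularity point concrete, here is a minimal sketch, assuming a made-up two-section file layout (this is not the real pyc format, and `build_file` / `read_tables` are hypothetical helpers). The "tables" section is padded so that it starts on a page boundary, which means reading the code section alone never needs the pages that hold the tables.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # page size reported by the OS (commonly 4096)


def build_file(code: bytes, tables: bytes):
    """Write `code`, pad to a page boundary, then write `tables`.

    Returns (path, table_offset). The layout is hypothetical, not the pyc format.
    """
    padding = -len(code) % PAGE
    table_offset = len(code) + padding
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(code + b"\0" * padding + tables)
    return path, table_offset


def read_tables(path: str, table_offset: int, size: int) -> bytes:
    """Map the file read-only and copy out only the table bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[table_offset:table_offset + size]


path, offset = build_file(b"BYTECODE" * 100, b"COLUMN-TABLE")
try:
    print(offset % PAGE, read_tables(path, offset, 12))
finally:
    os.remove(path)
```

The padding costs disk space, so whether this trade-off pays off would have to be measured.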

Regards

Antoine.


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/R57VMAURJIA3DZKMRTBK35CTMDS5JCDZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Antoine Pitrou
On Fri, 7 May 2021 23:20:55 +0100
Irit Katriel via Python-Dev  wrote:
> On Fri, May 7, 2021 at 10:52 PM Pablo Galindo Salgado 
> wrote:
> 
> >
> > The cost of this is having the start column number and end column number
> > information for every bytecode instruction
> >  
> 
> 
> Is it really every instruction? Or only those that can raise exceptions?

I think almost any instruction can be interrupted with
KeyboardInterrupt (or any other asynchronously-raised exception).

Regards

Antoine.


___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/HMX327TD72DOTCAE2TGJRKFHF4H4ZWEC/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Antoine Pitrou


You can certainly get fancy and apply delta encoding + entropy
compression, such as done in Parquet, a high-performance data storage
format:
https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5

(the linked paper from Lemire and Boytsov gives a lot of ideas)

But it would be weird to apply such a level of engineering when we never
bothered compressing docstrings.

Regards

Antoine.



On Fri, 7 May 2021 23:30:46 +0100
Pablo Galindo Salgado  wrote:
> This is actually a very good point. The only disadvantage is that it
> complicates the parsing a bit and we lose the possibility of indexing
> the table by instruction offset.
> 
> On Fri, 7 May 2021 at 23:01, Larry Hastings  wrote:
> 
> > On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
> >
> > Given that column numbers are not very big compared with line numbers, we
> > plan to store these as unsigned chars
> > or unsigned shorts. We ran some experiments over the standard library and
> > we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28 MB and the
> > extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> > extra size is 0.44 MB).
> >
> > One of the disadvantages of using chars is that we can only report columns
> > from 1 to 255 so if an error happens in a column
> > bigger than that then we would have to exclude it (and not show the
> > highlighting) for that frame. Unsigned short will allow
> > the values to go from 0 to 65535.
> >
> > Are lnotab entries required to be a fixed size?  If not:
> >
> > if column < 255:
> >     lnotab.write_one_byte(column)
> > else:
> >     lnotab.write_one_byte(255)
> >     lnotab.write_two_bytes(column)
> >
> >
> > I might even write four bytes instead of two in the latter case,
> >
> >
> > */arry*
> > ___
> > Message archived at
> > https://mail.python.org/archives/list/python-dev@python.org/message/B3SFCZPXIKGO3LM6UJVSJXFIRAZH2R26/
> >  
> 
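
The escape-byte scheme quoted above can be sketched as a runnable round trip. This is only an illustration under stated assumptions: `encode_columns` and `decode_columns` are hypothetical names, the two-byte value is stored little-endian, and none of this is the actual lnotab format. It also shows the drawback Pablo mentions: the decoder has to walk the bytes sequentially, so entries can no longer be indexed directly by instruction offset.

```python
import struct

ESCAPE = 255  # marker byte meaning "a two-byte column value follows"


def encode_columns(columns):
    """Encode each column in one byte, or in three bytes if it is >= 255."""
    out = bytearray()
    for column in columns:
        if column < ESCAPE:
            out.append(column)
        else:
            out.append(ESCAPE)
            out += struct.pack("<H", column)  # unsigned 16-bit, little-endian
    return bytes(out)


def decode_columns(data):
    """Inverse of encode_columns; must scan from the start of the data."""
    columns = []
    i = 0
    while i < len(data):
        if data[i] < ESCAPE:
            columns.append(data[i])
            i += 1
        else:
            (column,) = struct.unpack_from("<H", data, i + 1)
            columns.append(column)
            i += 3
    return columns


columns = [0, 4, 79, 254, 255, 300, 65535]
encoded = encode_columns(columns)
print(len(encoded), decode_columns(encoded) == columns)
```

Since most columns fall well below 255, the common case stays at one byte per entry, at the price of variable-width records.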



___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/UOCHN5ZY3ERPNWOCO2SJRTCDTEYMYVD7/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Ammar Askar
I really like this idea, Nathaniel.

We already have a section considering lazy-loading column information in the
draft PEP but obviously it has a pretty high implementation complexity since it
necessitates a change in the pyc format and the importlib machinery.

Long term this might be the way to go for column information and line
information to alleviate the memory burden.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/XS435QQOBWWQNU2FY6RVLA4YUXJCN7XF/


[Python-Dev] a name for the ExceptHandler.type when it is a literal tuple of types

2021-05-08 Thread Thomas Grainger
That's this bit:

```
except (A, B):
       ^^^^^^
```

bpo-43149 currently calls it an "exception group", but that conflicts with PEP 
654 -- Exception Groups and except*

```

   >>> try:
   ...   pass
   ... except A, B:
   ...   pass
   Traceback (most recent call last):
   SyntaxError: exception group must be parenthesized
```

some alternatives:

exception classinfo must be parenthesized (classinfo so named from the 
parameter to issubclass)
exception sequence must be parenthesized

see also:

- https://github.com/python/cpython/pull/24467#discussion_r628756347
- https://www.python.org/dev/peps/pep-0654/
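
For reference, the node in question can be inspected with the standard `ast` module. This is a small sketch; the names `A` and `B` are placeholders and are never evaluated, since the source is only parsed.

```python
import ast

source = "try:\n    pass\nexcept (A, B):\n    pass\n"

# The first statement is the try block; its first handler is the ExceptHandler.
handler = ast.parse(source).body[0].handlers[0]
handler_type = handler.type

# For a parenthesized list of exception types, .type is an ast.Tuple of names.
names = [element.id for element in handler_type.elts]
print(type(handler_type).__name__, names)
```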
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/HSN6ESRB4BD6IUIPKLMNP4TPBQPWHBFK/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Devin Jeanpierre
> What are people's thoughts on the feature?

I'm +1, this level of detail in the bytecode is very useful. My main
interest is actually from the AST though. :) In order to be in the
bytecode, one assumes it must first be in the AST. That information is
incredibly useful for refactoring tools like https://github.com/ssbr/refex
(n.b. author=me) or https://github.com/gristlabs/asttokens (which refex
builds on). Currently, asttokens actually attempts to re-discover that kind
of information after the fact, which is error-prone and difficult.

This could also be useful for finer-grained code coverage tracking and/or
debugging. One can actually imagine highlighting the spans of code which
were only partially executed: e.g. if only x() were ever executed in "x()
and y()". Ned Batchelder once did wild hacks in this space, and maybe this
proposal could lead in the future to something non-hacky?
https://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html
I say "in the future" because it doesn't just automatically work, since as
I understand it, coverage currently doesn't track spans, but lines hit by
the line-based debugger. Something else is needed to be able to track which
spans were hit rather than which lines, and it may be similarly hacky if
it's isolated to coveragepy. If, for example, enough were exposed to let a
debugger skip to bytecode for the next different (sub) span, then this
would be useful for both coverage and actual debugging as you step through
an expression. This is probably way out of scope for your PEP, but even so,
the feature may be laying some useful groundwork here.

-- Devin

On Fri, May 7, 2021 at 2:52 PM Pablo Galindo Salgado 
wrote:

> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion
> about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated to the instructions that are failing. For example:
>
> Traceback (most recent call last):
>   File "test.py", line 14, in <module>
>     lel3(x)
>     ^^^^^^^
>   File "test.py", line 12, in lel3
>     return lel2(x) / 23
>            ^^^^^^^
>   File "test.py", line 9, in lel2
>     return 25 + lel(x) + lel(x)
>                 ^^^^^^
>   File "test.py", line 6, in lel
>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>                          ^^^^^^^^^^^^^^^^^^^^^
> TypeError: 'NoneType' object is not subscriptable
>
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44 MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Unfortunately these numbers are not easily compressible, as every
> instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag
> or when using -O, to allow users to opt out. However, these numbers can
> be quite useful to other tools such as coverage measuring tools, tracers,
> and profilers, and adding conditional logic in many places would
> complicate the implementation considerably and potentially reduce the
> usability of those tools, so we prefer not to have the conditional logic.
> We believe this extra cost is very much worth the better error reporting,
> but we understand and respect other points of view.
>
> Does anyone see a better way to encode this information **without
> complicating the implementation a lot**? What are people's thoughts on the
> feature?
>
> Thanks in advance,
>
> Regards from cloudy London,
> Pablo Galindo Salgado
>
> ___
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/DB3RTYBF2BXTY6ZHP3Z4DXCRWPJIQUFD/

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Jelle Zijlstra
On Sat, 8 May 2021 at 10:00, Devin Jeanpierre wrote:

> > What are people's thoughts on the feature?
>
> I'm +1, this level of detail in the bytecode is very useful. My main
> interest is actually from the AST though. :) In order to be in the
> bytecode, one assumes it must first be in the AST. That information is
> incredibly useful for refactoring tools like https://github.com/ssbr/refex
> (n.b. author=me) or https://github.com/gristlabs/asttokens (which refex
> builds on). Currently, asttokens actually attempts to re-discover that kind
> of information after the fact, which is error-prone and difficult.
>
The AST already has column offsets (
https://docs.python.org/3.10/library/ast.html#ast.AST.col_offset).
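
As a quick illustration of what the AST already exposes, the sketch below parses a one-line statement and reads the position attributes of the subscript expression. It assumes Python 3.8 or later, where `end_lineno` and `end_col_offset` are available; the source string is an arbitrary example.

```python
import ast

source = "result = foo(a)['z']"
tree = ast.parse(source)

# The right-hand side of the assignment is an ast.Subscript node.
node = tree.body[0].value

# Lines are 1-based; column offsets are 0-based UTF-8 byte offsets.
span = (node.lineno, node.col_offset, node.end_lineno, node.end_col_offset)
segment = ast.get_source_segment(source, node)
print(span, segment)
```

The proposal under discussion is about carrying this kind of span through to each bytecode instruction, which the AST positions alone do not provide at runtime.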



[Python-Dev] Re: a name for the ExceptHandler.type when it is a literal tuple of types

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 8:54 AM Thomas Grainger  wrote:

> That's this bit:
>
> ```
> except (A, B):
>        ^^^^^^
> ```
>
> bpo-43149 currently calls it an "exception group", but that conflicts with
> PEP 654 -- Exception Groups and except*
>
> ```
>
>    >>> try:
>    ...   pass
>    ... except A, B:
>    ...   pass
>    Traceback (most recent call last):
>    SyntaxError: exception group must be parenthesized
> ```
>
> some alternatives:
>
> exception classinfo must be parenthesized (classinfo so named from the
> parameter to issubclass)
> exception sequence must be parenthesized
>
> see also:
>
> - https://github.com/python/cpython/pull/24467#discussion_r628756347
> - https://www.python.org/dev/peps/pep-0654/


Given it requires ()s it is probably better to call it an "exception
sequence" or even go fully to "exception tuple" in order to avoid confusion
and tie in with the other meanings of the required syntax.

-gps
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/Y7SPN4WSXXFPAZITS2PMF2PRSVX3H5SE/


[Python-Dev] Re: a name for the ExceptHandler.type when it is a literal tuple of types

2021-05-08 Thread Thomas Grainger
There's a PR to use "SyntaxError: multiple exception types must be
parenthesized"

https://github.com/python/cpython/pull/25996

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/2ZHLIOVQG27EUJLXQYMTQUB6Z67MNJ4I/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Brett Cannon
On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado 
wrote:

> Although we were originally not sympathetic with it, we may need to offer
> an opt-out mechanism for those users that care about the impact of the
> overhead of the new data in pyc files
> and in in-memory code objects, as was suggested by some folks (Thomas, Yury,
> and others). For this, we could propose that the functionality will be
> deactivated along with the extra
> information when Python is executed in optimized mode (``python -O``) and
> therefore pyo files will not have the overhead associated with the extra
> required data.
>

Just to be clear, .pyo files have not existed for a while:
https://www.python.org/dev/peps/pep-0488/.


> Notice that Python
> already strips docstrings in this mode so it would be "aligned" with the
> current mechanism of optimized mode.
>

This only kicks in at the -OO level.


>
> Although this complicates the implementation, it certainly is still much
> easier than dealing with compression (and more useful for those that don't
> want the feature). Notice that we also
> expect pessimistic results from compression as offsets would be quite
> random (although predominantly in the range 10 - 120).
>

I personally prefer the idea of dropping the data with -OO since if you're
stripping out docstrings you're already hurting introspection capabilities
in the name of memory. Or one could go as far as to introduce -Os to do -OO
plus dropping this extra data.

As for .pyc file size, I personally wouldn't worry about it. If someone is
that space-constrained they either aren't using .pyc files or are only
shipping a single set of .pyc files under -OO and skipping source code. And
.pyc files are an implementation detail of CPython so there shouldn't be
too much of a concern for other interpreters.

-Brett


>
> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado 
> wrote:
>
>> One last note for clarity: that's the increase of size in the stdlib, the
>> increase of size
>> for pyc files goes from 28.471296MB to 34.750464MB, which is an increase
>> of 22%.
>>
>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado 
>> wrote:
>>
>>> Some update on the numbers. We have made a draft implementation to
>>> corroborate the
>>> numbers with some more realistic tests, and it seems that our original
>>> calculations were wrong.
>>> The actual increase in size is quite a bit bigger than previously advertised:
>>>
>>> Using bytes object to encode the final object and marshalling that to
>>> disk (so using uint8_t) as the underlying
>>> type:
>>>
>>> BEFORE:
>>>
>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>> ❯ du -h Lib -c --max-depth=0
>>> 70M Lib
>>> 70M total
>>>
>>> AFTER:
>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>> ❯ du -h Lib -c --max-depth=0
>>> 76M Lib
>>> 76M total
>>>
>>> So that's an increase of 8.56 % over the original value. This is storing
>>> the start offset and end offset with no compression
>>> whatsoever.

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
Hi Brett,

> Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.


Whoops, my bad, I wanted to refer to the pyc files that are generated
with -OO, which have the "opt-2" tag in their names.

> This only kicks in at the -OO level.


I will correct the PEP so it reflects this more exactly.

> I personally prefer the idea of dropping the data with -OO since if you're
> stripping out docstrings you're already hurting introspection capabilities
> in the name of memory. Or one could go as far as to introduce -Os to do -OO
> plus dropping this extra data.


This is indeed the plan, sorry for the confusion. The opt-out mechanism is
using -OO, precisely as we are already dropping other data.

Thanks for the clarifications!
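
For anyone who wants to check the existing behaviour at each optimization level, here is a small sketch using the `optimize` argument of the built-in `compile()`, which mirrors -O (1) and -OO (2). The helper name `docstring_at` and the sample function are made up for the example.

```python
SOURCE = '''
def f():
    """Example docstring."""
    return 1
'''


def docstring_at(optimize: int):
    """Compile SOURCE at the given optimization level and return f.__doc__."""
    namespace = {}
    code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["f"].__doc__


for level in (0, 1, 2):
    # Docstrings survive levels 0 and 1 and are dropped only at level 2.
    print(level, repr(docstring_at(level)))
```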




[Python-Dev] Re: a name for the ExceptHandler.type when it is a literal tuple of types

2021-05-08 Thread Guido van Rossum
I propose “exception tuple”, since syntactically and semantically it must
be a tuple. (Same as for isinstance() and issubclass().)

-- 
--Guido (mobile)
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/62WTPW24K3XNOYJNTBC6DWBGZHHKF2L5/


[Python-Dev] Re: a name for the ExceptHandler.type when it is a literal tuple of types

2021-05-08 Thread Thomas Grainger
Would it be possible to drop the requirement that multiple exception types
are parenthesized? Is it only ambiguous with the old Python2 syntax?

On Sat, 8 May 2021, 20:15 Guido van Rossum,  wrote:

> I propose “exception tuple”, since syntactically and semantically it must
> be a tuple. (Same as for isinstance() and issubclass().)
>
> On Sat, May 8, 2021 at 05:52 Thomas Grainger  wrote:
>
>> That's this bit:
>>
>> ```
>> except (A, B):
>>^^
>> ```
>>
>> bpo-43149 currently calls it an "exception group", but that conflicts
>> with PEP 654 -- Exception Groups and except*
>>
>> ```
>> >>> try:
>> ...   pass
>> ... except A, B:
>> ...   pass
>> Traceback (most recent call last):
>> SyntaxError: exception group must be parenthesized
>> ```
>>
>> some alternatives:
>>
>> exception classinfo must be parenthesized (classinfo so named from the
>> parameter to issubclass)
>> exception sequence must be parenthesized
>>
>> see also:
>>
>> - https://github.com/python/cpython/pull/24467#discussion_r628756347
>> - https://www.python.org/dev/peps/pep-0654/
>>
> --
> --Guido (mobile)
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3EW75R7I4IYIBKKVPBW333ZBYMPB5YGR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: a name for the ExceptHandler.type when it is a literal tuple of types

2021-05-08 Thread Guido van Rossum
That’s a discussion for another day.

On Sat, May 8, 2021 at 09:17 Thomas Grainger  wrote:

> Would it be possible to drop the requirement that multiple exception types
> are parenthesized? Is it only ambiguous with the old Python2 syntax?
>
> On Sat, 8 May 2021, 20:15 Guido van Rossum,  wrote:
>
>> I propose “exception tuple”, since syntactically and semantically it must
>> be a tuple. (Same as for isinstance() and issubclass().)
>>
>> On Sat, May 8, 2021 at 05:52 Thomas Grainger  wrote:
>>
>>> That's this bit:
>>>
>>> ```
>>> except (A, B):
>>>        ^^^^^^
>>> ```
>>>
>>> bpo-43149 currently calls it an "exception group", but that conflicts
>>> with PEP 654 -- Exception Groups and except*
>>>
>>> ```
>>> >>> try:
>>> ...   pass
>>> ... except A, B:
>>> ...   pass
>>> Traceback (most recent call last):
>>> SyntaxError: exception group must be parenthesized
>>> ```
>>>
>>> some alternatives:
>>>
>>> exception classinfo must be parenthesized (classinfo so named from the
>>> parameter to issubclass)
>>> exception sequence must be parenthesized
>>>
>>> see also:
>>>
>>> - https://github.com/python/cpython/pull/24467#discussion_r628756347
>>> - https://www.python.org/dev/peps/pep-0654/
>>>
>> --
>> --Guido (mobile)
>>
> --
--Guido (mobile)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZYKTQJDZN3T63ZAUZZ7BLFGYOJVNOSPJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado 
wrote:

> Hi Brett,
>
>> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
>
> Whoops, my bad, I wanted to refer to the pyc files that are generated
> with -OO, which have the "opt-2" prefix.
>
>> This only kicks in at the -OO level.
>
> I will correct the PEP so it reflects this more exactly.
>
>> I personally prefer the idea of dropping the data with -OO since if you're
>> stripping out docstrings you're already hurting introspection capabilities
>> in the name of memory. Or one could go as far as to introduce -Os to do -OO
>> plus dropping this extra data.
>
> This is indeed the plan, sorry for the confusion. The opt-out mechanism is
> using -OO, precisely as we are already dropping other data.
>

We can't piggyback on -OO as the only way to disable this; it needs to
have an option of its own.  -OO is unusable because code that relies on
docstrings as application data, such as http://www.dabeaz.com/ply/ply.html,
exists.
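
A minimal sketch of that failure mode, using only the built-in compile() with its `optimize` argument to stand in for running under -OO. The rule function `t_NUMBER` and the helper `load_rule_doc` are hypothetical, written in the style of tools that read patterns from `__doc__`.

```python
# Hypothetical module source: a rule function whose docstring carries data
# (here a regular expression), as docstring-driven tools expect.
SOURCE = r'''
def t_NUMBER(t):
    r"\d+"
    return t
'''


def load_rule_doc(optimize):
    """Compile SOURCE at the given optimization level and return the rule's __doc__."""
    namespace = {}
    code = compile(SOURCE, "<rules>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["t_NUMBER"].__doc__


print(load_rule_doc(0))  # the pattern is available
print(load_rule_doc(2))  # docstrings are stripped at level 2, as with -OO
```

Any library that treats `__doc__` as input sees None at level 2, which is why one such dependency anywhere in an application rules out -OO.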

-gps


>
> Thanks for the clarifications!
>
>
>
> On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:
>
>>
>>
>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado 
>> wrote:
>>
>>> Although we were originally not sympathetic with it, we may need to
>>> offer an opt-out mechanism for those users that care about the impact of
>>> the overhead of the new data in pyc files
>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>> Yury, and others). For this, we could propose that the functionality will
>>> be deactivated along with the extra
>>> information when Python is executed in optimized mode (``python -O``)
>>> and therefore pyo files will not have the overhead associated with the
>>> extra required data.
>>>
>>
>> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
>>
>>
>>> Notice that Python
>>> already strips docstrings in this mode so it would be "aligned" with
>>> the current mechanism of optimized mode.
>>>
>>
>> This only kicks in at the -OO level.
>>
>>
>>>
>>> Although this complicates the implementation, it certainly is still much
>>> easier than dealing with compression (and more useful for those that don't
>>> want the feature). Notice that we also
>>> expect pessimistic results from compression as offsets would be quite
>>> random (although predominantly in the range 10 - 120).
>>>
>>
>> I personally prefer the idea of dropping the data with -OO since if
>> you're stripping out docstrings you're already hurting introspection
>> capabilities in the name of memory. Or one could go as far as to introduce
>> -Os to do -OO plus dropping this extra data.
>>
>> As for .pyc file size, I personally wouldn't worry about it. If someone
>> is that space-constrained they either aren't using .pyc files or are only
>> shipping a single set of .pyc files under -OO and skipping source code. And
>> .pyc files are an implementation detail of CPython so there  shouldn't be
>> too much of a concern for other interpreters.
>>
>> -Brett
>>
>>
>>>
>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado 
>>> wrote:
>>>
>>>> One last note for clarity: that's the increase of size in the stdlib,
>>>> the increase of size
>>>> for pyc files goes from 28.471296MB to 34.750464MB, which is an
>>>> increase of 22%.
>>>>
>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado
>>>> wrote:
>>>>
>>>>> Some update on the numbers. We have made a draft implementation to
>>>>> corroborate the
>>>>> numbers with some more realistic tests and it seems that our original
>>>>> calculations were wrong.
>>>>> The actual increase in size is quite a bit bigger than previously advertised:
>>>>>
>>>>> Using a bytes object to encode the final object and marshalling that to
>>>>> disk (so using uint8_t) as the underlying
>>>>> type:
>>>>>
>>>>> BEFORE:
>>>>>
>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>> ❯ du -h Lib -c --max-depth=0
>>>>> 70M Lib
>>>>> 70M total
>>>>>
>>>>> AFTER:
>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>> ❯ du -h Lib -c --max-depth=0
>>>>> 76M Lib
>>>>> 76M total
>>>>>
>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>> storing the start offset and end offset with no compression
>>>>> whatsoever.
>>>>>
>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>> pablog...@gmail.com> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> We are preparing a PEP and we would like to start some early
>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>
>>>>>> The work we are preparing is to allow the interpreter to produce more
>>>>>> fine-grained error messages, pointing to
>>>>>> the source associated to the instructions that are failing. For
>>>>>> example:
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>   File "test.py", line 14, in <module>
>>>>>>     lel3(x)
>>>>>>     ^^^
>>>>>>   File "test.py", line 12, in lel3
>>>>>>

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> We can't piggy back on -OO as the only way to disable this, it needs to
> have an option of its own.  -OO is unusable as code that relies on
> "doc"strings as application data such as http://www.dabeaz.com/ply/ply.html
> exists.

-OO is the only sensible way to disable the data. There are two things to
disable:

* The data in pyc files
* Printing the exception highlighting

Printing the exception highlighting can be disabled via a combination of an
environment variable and a -X option, but collecting the data can only be
disabled by -OO. The reason is that this data ends up in pyc files, so when
the data is not there, a different kind of pyc file needs to be produced, and
I really don't want to have another pyc file extension just to deactivate
this. Note also that a configure-time variable won't work, because it will
cause crashes when reading pyc files produced by an interpreter compiled
without the flag.
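
For reference, a short sketch of how the existing optimization levels already map to distinct pyc file names (the PEP 488 scheme), using importlib.util.cache_from_source(). The helper `pyc_names` and the path `example.py` are made up for illustration; nothing here is specific to the proposed column data.

```python
import importlib.util


def pyc_names(source_path):
    """Map each optimization level to the cached pyc path CPython would use."""
    return {
        "default": importlib.util.cache_from_source(source_path, optimization=""),
        "-O": importlib.util.cache_from_source(source_path, optimization=1),
        "-OO": importlib.util.cache_from_source(source_path, optimization=2),
    }


for level, path in pyc_names("example.py").items():
    print(level, path)
```

Tying the data to -OO reuses the "opt-2" name, whereas an independent switch would need some other way to tell the two kinds of file apart.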


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado 
wrote:

> > We can't piggy back on -OO as the only way to disable this, it needs to
> have an option of its own.  -OO is unusable as code that relies on
> "doc"strings as application data such as
> http://www.dabeaz.com/ply/ply.html exists.
>
> -OO is the only sensible way to disable the data. There are two things to
> disable:
>

nit: I wouldn't choose the word "sensible" given that -OO is already
fundamentally unusable without knowing if any code in your entire
transitive dependencies might depend on the presence of docstrings...


>
> * The data in pyc files
> * Printing the exception highlighting
>
> Printing the exception highlighting can be disabled via combo of
> environment variable / -X option but collecting the data can only be
> disabled by -OO. The reason is that this will end in pyc files
> so when the data is not there, a different kind of pyc files need to be
> produced and I really don't want to have another set of pyc file extension
> just to deactivate this. Notice that also a configure
> time variable won't work because it will cause crashes when reading pyc
> files produced by the interpreter compiled without the flag.
>

I don't think the optional existence of column number information needs a
different kind of pyc file.  Just a flag in a pyc file's header at most.
It isn't a new type of file.
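
To make "a flag in a pyc file's header" concrete, here is a rough sketch that reads the existing header: a 4-byte magic number followed by a 4-byte little-endian flags word (today used for the hash-based pyc bits from PEP 552). The helper `read_pyc_header` is hypothetical, and a bit meaning "column data present" is only an assumption about where such a flag could live, not an existing feature.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Return (magic, flags) from the start of a pyc file."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)  # magic, flags, then 8 bytes of mtime/size or hash
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])
    return magic, flags


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(tmp, "example.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    magic, flags = read_pyc_header(pyc)

print(magic == importlib.util.MAGIC_NUMBER, flags)
```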



[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> I don't think the optional existence of column number information needs a
> different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.

That could work, but in my personal opinion, I would prefer not to do that
as it complicates things and I think it is overkill.


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> That could work, but in my personal opinion, I would prefer not to do
> that as it complicates things and I think is overkill.

Let me expand on this:

I recognize the problem that -OO can be quite unusable if some of your
dependencies depend on docstrings, and that it would be good to separate
this from that option, but I am afraid of the following:

- New APIs in the marshal module and other places to pass down whether or
not to read/write the extra information.
- Complication of the pyc format with more entries in the header.
- Complication of the implementation.

Given that reasons to deactivate this option exist, but I expect them
to be very rare, I would prefer to maximize simplicity and maintainability.


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Ethan Furman

On 5/8/21 1:31 PM, Pablo Galindo Salgado wrote:
>> We can't piggy back on -OO as the only way to disable this, it needs to
>> have an option of its own.  -OO is unusable as code that relies on "doc"
>> strings as application data such as http://www.dabeaz.com/ply/ply.html
>> exists.
>
> -OO is the only sensible way to disable the data. There are two things to
> disable:
>
> * The data in pyc files
> * Printing the exception highlighting

Why not put it in -O instead?  Then -O means lose asserts and lose fine-grained
tracebacks, while -OO continues to also strip out doc strings.
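
A small sketch of what the two existing levels do today, using compile() with `optimize` in place of the -O and -OO command-line switches. The function `check` and the helpers `build_check` and `survives_bad_input` are invented for this illustration.

```python
SOURCE = '''
def check(x):
    """Validate x."""
    assert x > 0, "x must be positive"
    return x
'''


def build_check(optimize):
    """Compile SOURCE at the given optimization level and return check()."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["check"]


def survives_bad_input(optimize):
    """True if check(-1) returns instead of raising AssertionError."""
    try:
        build_check(optimize)(-1)
    except AssertionError:
        return False
    return True


for level in (0, 1, 2):
    fn = build_check(level)
    # level 1 drops the assert but keeps the docstring; level 2 drops both
    print(level, survives_bad_input(level), fn.__doc__)
```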


--
~Ethan~
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BEE4BGUZHXBTVDPOW5R4DC3S463XC3EJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> Why not put it in -O instead?  Then -O means lose asserts and lose
> fine-grained tracebacks, while -OO continues to also
> strip out doc strings.

What if someone wants to keep asserts but does not want the extra data?

On Sat, 8 May 2021 at 22:05, Ethan Furman  wrote:

> On 5/8/21 1:31 PM, Pablo Galindo Salgado wrote:
> >> We can't piggy back on -OO as the only way to disable this, it needs to
> >> have an option of its own.  -OO is unusable as code that relies on "doc"
> >> strings as application data such as http://www.dabeaz.com/ply/ply.html
> >> exists.
> >
> > -OO is the only sensible way to disable the data. There are two things to
> > disable:
> >
> > * The data in pyc files
> > * Printing the exception highlighting
>
> Why not put it in -O instead?  Then -O means lose asserts and lose
> fine-grained tracebacks, while -OO continues to also
> strip out doc strings.
>
> --
> ~Ethan~
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WK4KXZPOSWYMI3C5AILQCEYVZRCDFL7N/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
> I don't think the optional existence of column number information needs a
> different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.

Greg, what do you think of this: instead of omitting the data from the pyc
file with -OO, or adding a header entry to decide whether to read/write it,
we place None in the field? That way we can leverage the option that we
intend to add for deactivating the display of the new traceback information
to also reduce the data in the pyc files. The only problem is that there will
still be a tiny bit of overhead: an extra object per code object (None), but
that's much, much better than something that scales with the number of
instructions :)

What's your opinion on this?
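
A rough sketch of the size argument, under stated assumptions: the position field is modelled as a bytes object holding two uint8 column offsets per instruction when enabled, and as None when disabled. The helper `table_cost` is hypothetical and only measures what marshal writes for such a field, not the real code object layout.

```python
import marshal


def table_cost(num_instructions, enabled):
    """Marshalled size in bytes of a per-code-object position field.

    Assumption for illustration: two uint8 column offsets per instruction
    when enabled, and a single None when disabled.
    """
    field = bytes(2 * num_instructions) if enabled else None
    return len(marshal.dumps(field))


for n in (10, 100, 1000):
    # the enabled cost grows with n; the None placeholder stays constant
    print(n, table_cost(n, True), table_cost(n, False))
```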


On Sat, 8 May 2021 at 21:45, Gregory P. Smith  wrote:

>
> On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado 
> wrote:
>
>> > We can't piggy back on -OO as the only way to disable this, it needs
>> to have an option of its own.  -OO is unusable as code that relies on
>> "doc"strings as application data such as
>> http://www.dabeaz.com/ply/ply.html exists.
>>
>> -OO is the only sensible way to disable the data. There are two things to
>> disable:
>>
>
> nit: I wouldn't choose the word "sensible" given that -OO is already
> fundamentally unusable without knowing if any code in your entire
> transitive dependencies might depend on the presence of docstrings...
>
>
>>
>> * The data in pyc files
>> * Printing the exception highlighting
>>
>> Printing the exception highlighting can be disabled via combo of
>> environment variable / -X option but collecting the data can only be
>> disabled by -OO. The reason is that this will end in pyc files
>> so when the data is not there, a different kind of pyc files need to be
>> produced and I really don't want to have another set of pyc file extension
>> just to deactivate this. Notice that also a configure
>> time variable won't work because it will cause crashes when reading pyc
>> files produced by the interpreter compiled without the flag.
>>
>
> I don't think the optional existence of column number information needs a
> different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.
>
>
>> On Sat, 8 May 2021 at 21:13, Gregory P. Smith  wrote:
>>
>>>
>>>
>>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
>>>> Hi Brett,
>>>>
>>>>> Just to be clear, .pyo files have not existed for a while:
>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>
>>>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>>>> with -OO, which have the "opt-2" prefix.
>>>>
>>>>> This only kicks in at the -OO level.
>>>>
>>>> I will correct the PEP so it reflects this more exactly.
>>>>
>>>>> I personally prefer the idea of dropping the data with -OO since if
>>>>> you're stripping out docstrings you're already hurting introspection
>>>>> capabilities in the name of memory. Or one could go as far as to introduce
>>>>> -Os to do -OO plus dropping this extra data.
>>>>
>>>> This is indeed the plan, sorry for the confusion. The opt-out mechanism
>>>> is using -OO, precisely as we are already dropping other data.
>>>>
>>>
>>> We can't piggy back on -OO as the only way to disable this, it needs to
>>> have an option of its own.  -OO is unusable as code that relies on
>>> "doc"strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html exists.
>>>
>>> -gps
>>>
>>>

>>>> Thanks for the clarifications!
>>>>
>>>> On Sat, 8 May 2021 at 19:41, Brett Cannon  wrote:
>>>>
>>>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>>>> pablog...@gmail.com> wrote:
>>>>>
>>>>>> Although we were originally not sympathetic with it, we may need to
>>>>>> offer an opt-out mechanism for those users that care about the impact of
>>>>>> the overhead of the new data in pyc files
>>>>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>>>>> Yury, and others). For this, we could propose that the functionality will
>>>>>> be deactivated along with the extra
>>>>>> information when Python is executed in optimized mode (``python -O``)
>>>>>> and therefore pyo files will not have the overhead associated with the
>>>>>> extra required data.
>>>>>>
>>>>>
>>>>> Just to be clear, .pyo files have not existed for a while:
>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>>
>>>>>
>>>>>> Notice that Python
>>>>>> already strips docstrings in this mode so it would be "aligned" with
>>>>>> the current mechanism of optimized mode.
>>>>>>
>>>>>
>>>>> This only kicks in at the -OO level.
>>>>>
>>>>>
>>>>>>
>>>>>> Although this complicates the implementation, it certainly is still
>>>>>> much easier than dealing with compression (and more useful for those that
>>>>>> don't want the feature). Notice that we also
>>>>>> expect pessimistic results from compression as offsets would be quite
>>>>>> random (although predominantly in the range 10 - 120).
>>>>>>
>>>>>>

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Jonathan Goble
On Sat, May 8, 2021 at 5:08 PM Pablo Galindo Salgado 
wrote:

> > Why not put it in -O instead?  Then -O means lose asserts and lose
> > fine-grained tracebacks, while -OO continues to also
> > strip out doc strings.
>
> What if someone wants to keep asserts but does not want the extra data?
>

What if I want to keep asserts and docstrings but don't want the extra data?

Or actually, consider this. I *need* to keep asserts (because rightly or
wrongly, I have a dependency, or my own code, that relies on them), but I
*don't* want docstrings (because they're huge and I don't want the overhead
in production), and I *don't* want the extra data in production either.

Now what?

I think what this illustrates is that the entire concept of optimizations
in Python needs a complete rethink. It's already fundamentally broken for
someone who wants to keep asserts but remove docstrings. Adding a third
layer to this is a perfect opportunity to reconsider the whole paradigm.
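
For reference, the current coupling can be reproduced with the built-in
compile(), whose optimize argument mirrors the command-line levels (0 for no
flag, 1 for -O, 2 for -OO). This is only a sketch; the function f and the
helper build are made up for the demonstration.

```python
SRC = '''
def f(x):
    """Return x unchanged."""
    assert x > 0, "x must be positive"
    return x
'''


def build(level):
    # Compile and run the demo module at the given optimization level,
    # then hand back the function it defines.
    namespace = {}
    exec(compile(SRC, "<demo>", "exec", optimize=level), namespace)
    return namespace["f"]


for level in (0, 1, 2):
    f = build(level)
    try:
        f(-1)
        asserts = "stripped"
    except AssertionError:
        asserts = "kept"
    print(level, "asserts", asserts, "| docstring:", f.__doc__)
```

No level in that loop keeps the assert while dropping the docstring, which is
exactly the combination described above.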

I'm getting off-topic here, and this should probably be a thread of its
own, but perhaps what we should introduce is a compiler directive, similar
to future statements but not that, that one can place at the top of a
source file to tell the compiler "this file depends on asserts, don't
optimize them out". Same for each thing that can be optimized that has a
runtime behavior effect, including docstrings. This would be minimally
disruptive since we can then stay at only two optimization levels and put
column info at whichever level we feel makes sense, but (provided the
compiler directives are used properly) information a particular file
requires to function correctly will never be removed from that file even if
the process-wide optimization level calls for it. I see no reason code with
asserts in one file and optimized code without asserts in another file
can't interact, and no reason code with docstrings and optimized code
without docstrings can't interact. Soft keywords would make this compiler
directive much easier, as it doesn't have to be shoehorned into the import
syntax (to suggest a bikeshed color, perhaps "retain asserts, docstrings"?)
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QV7LVUKWC72XA23NBZMFA573V7HPU72Y/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 2:09 PM Pablo Galindo Salgado 
wrote:

> > Why not put it in -O instead?  Then -O means lose asserts and lose
> > fine-grained tracebacks, while -OO continues to also
> > strip out doc strings.
>
> What if someone wants to keep asserts but does not want the extra data?
>

Exactly my theme.  Our existing -O and -OO already don't serve all user
needs.  (I've witnessed people who need asserts but don't want docstrings
wasting RAM jump through hacky hoops to do that.)  Complicating these
options more by combining additional actions on them doesn't help.

The reason we have -O and -OO generate their own special opt-1 and opt-2
pyc files is because both of those change the generated bytecode and
overall flow of the program by omitting instructions and data.  Code using
those will run slightly faster as there are fewer instructions.
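
The separate file names can be inspected with
importlib.util.cache_from_source(); a quick sketch, where "spam.py" is just a
placeholder path that does not need to exist:

```python
import importlib.util
import os

# An empty optimization string means "no opt tag"; 1 and 2 match -O and -OO.
plain = importlib.util.cache_from_source("spam.py", optimization="")
opt1 = importlib.util.cache_from_source("spam.py", optimization=1)
opt2 = importlib.util.cache_from_source("spam.py", optimization=2)

for path in (plain, opt1, opt2):
    print(os.path.basename(path))
```

The exact names depend on the running interpreter's cache tag, but the
optimized variants carry an "opt-1" or "opt-2" component that the plain one
lacks.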

The change we're talking about here doesn't do that.  It just adds
additional metadata to whatever instructions are generated.  So it doesn't
feel -O related.

While some people aren't going to like the overhead, I'm happy not offering
the choice.

> Greg, what do you think if instead of not writing it to the pyc file with
> -OO or adding a header entry to decide to read/write, we place None in the
> field? That way we can
> leverage the option that we intend to add to deactivate displaying the
> traceback new information to reduce the data in the pyc files. The only
> problem
> is that there will be still a tiny bit of overhead: an extra object per
> code object (None), but that's much much better than something that scales
> with the
> number of instructions :)
>
> What's your opinion on this?

I don't understand the pyc structure enough to comment on how that works,
but that sounds fine as a way to store less data if these are stored as a
side table rather than intermingled with each instruction itself.  *If
anyone even cares about storing less data.*  I would not activate
generation of that in py_compile and compileall based on the -X flag to
disable display of tracebacks though.  A flag changing a setting of the
current runtime regarding traceback printing detail level should not change
the metadata in pyc files it emits.  I realize -O and -OO behave this way,
but I don't view those as a great example.  Since we're not writing new
uniquely named pyc files, I suggest making this an explicit option for
py_compile and compileall if we're going to support generation of pyc files
without column data at all.

I'm unclear on what the specific goals are with all of these option
possibilities.

Who non-hypothetically cares about a 22% pyc file size increase?  I don't
think we should be concerned.  I'm in favor of always writing them and the
20% size increase that results in.  If pyc size is an issue that should be
its own separate enhancement PEP.  When it comes to pyc files there is more
data we may want to store in the future for performance reasons - I don't
see them shrinking without an independent effort.

Caring about additional data retained in memory at runtime makes more sense
to me as RAM cost is much greater than storage cost and is paid repeatedly
per process.  Storing an additional reference to None on code objects where
a column information table would otherwise go is perfectly fine.  That can
be a -X style interpreter startup option.  It isn't something that needs to
be impacted by the pyc files.  Pass that option to the interpreter, and it
just discards column info tables on code objects after loading them or
compiling them.  If people want to optimize for a shared pyc situation with
memory mapping techniques, that is also something that should be a separate
enhancement PEP and not involved here.  People writing code to use the
column information should always check it for None first; that'd be
something we document with the new feature.
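
A minimal sketch of that "check for None first" rule. The table layout (a
sequence of (start, end) pairs indexed by instruction) and the name describe
are assumptions made for the example, not a proposed API.

```python
def describe(lineno, column_table, index):
    """Format a location, falling back to the line number alone."""
    if column_table is None:
        # Column data was discarded or never generated for this code object.
        return f"line {lineno}"
    start, end = column_table[index]
    return f"line {lineno}, columns {start}-{end}"


print(describe(3, None, 0))
print(describe(3, [(4, 9), (12, 20)], 1))
```

Consumers written this way keep working whether or not the interpreter was
started with the option that discards the tables.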

-gps


>
> On Sat, 8 May 2021 at 22:05, Ethan Furman  wrote:
>
>> On 5/8/21 1:31 PM, Pablo Galindo Salgado wrote:
>>  >> We can't piggy back on -OO as the only way to disable this, it needs
>> to
>>  >> have an option of its own.  -OO is unusable as code that relies on
>> "doc"
>>  >> strings as application data such as
>> http://www.dabeaz.com/ply/ply.html
>>  >> exists.
>>  >
>>  > -OO is the only sensible way to disable the data. There are two things
>> to disable:
>>  >
>>  > * The data in pyc files
>>  > * Printing the exception highlighting
>>
>> Why not put it in -O instead?  Then -O means lose asserts and lose
>> fine-grained tracebacks, while -OO continues to also
>> strip out doc strings.
>>
>> --
>> ~Ethan~
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/BEE4BGUZHXBTVDPOW5R4DC3S463XC3EJ/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> 

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Gregory P. Smith
On Sat, May 8, 2021 at 2:40 PM Jonathan Goble  wrote:

> On Sat, May 8, 2021 at 5:08 PM Pablo Galindo Salgado 
> wrote:
>
>> > Why not put it in -O instead?  Then -O means lose asserts and lose
>> > fine-grained tracebacks, while -OO continues to also
>> > strip out doc strings.
>>
>> What if someone wants to keep asserts but does not want the extra data?
>>
>
> What if I want to keep asserts and docstrings but don't want the extra
> data?
>
> Or actually, consider this. I *need* to keep asserts (because rightly or
> wrongly, I have a dependency, or my own code, that relies on them), but I
> *don't* want docstrings (because they're huge and I don't want the overhead
> in production), and I *don't* want the extra data in production either.
>
> Now what?
>
> I think what this illustrates is that the entire concept of optimizations
> in Python needs a complete rethink. It's already fundamentally broken for
> someone who wants to keep asserts but remove docstrings. Adding a third
> layer to this is a perfect opportunity to reconsider the whole paradigm.
>

Reconsidering "the whole paradigm" is always possible, but is a much larger
effort. It should not be something that blocks this enhancement from
happening.

We have discussed the -O mess before, on list and at summits and sprints.
-OO and the __pycache__ and longer .pyc names and versioned names were
among the results of that.  But we opted not to make life even more
complicated by growing the test matrix of possible generated bytecode
even larger.

> I'm getting off-topic here, and this should probably be a thread of its
> own, but perhaps what we should introduce is a compiler directive, similar
> to future statements but not that, that one can place at the top of a
> source file to tell the compiler "this file depends on asserts, don't
> optimize them out". Same for each thing that can be optimized that has a
> runtime behavior effect, including docstrings.
>

This idea has merit.  Worth keeping in mind for the future.  But agreed,
this goes beyond this thread's topic so I'll leave it at that.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PCZGEWFIPS2YPMJWTILVANJYT6VWS27B/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Pablo Galindo Salgado
Thanks Greg for the great, detailed response.

I think I now understand your proposal better, and I think it is a good
idea that I would like to explore. I have some questions:

* One problem I see is that it will make the constructor of the code
object depend on global options in the interpreter. Someone using the C-API
and passing down that attribute will be surprised to find that it was
modified by a global. I am not saying it is bad, but I can see some problems
with that.

* The alternative is to modify all calls to the code object constructor.
This is easy to do in the compiler because code objects are constructed
very close to where the metadata is created, but it is going to be a pain in
other places, because there the code objects are constructed in places where
we would either need new APIs or need to hide global state as in the
previous point.

* Another alternative is to walk the graph and strip the fields but that's
going to have a performance impact.
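
To make the third option concrete, here is a sketch of such a walk in pure
Python. Nested code objects live in co_consts, so the traversal has to
rebuild every level. The transform callback stands in for the actual
stripping step, since the column field does not exist yet; rebuild and
record are names invented for this example.

```python
import types


def rebuild(code, transform):
    """Apply `transform` to `code` and to every code object nested in it."""
    consts = tuple(
        rebuild(c, transform) if isinstance(c, types.CodeType) else c
        for c in code.co_consts
    )
    return transform(code.replace(co_consts=consts))


SRC = '''
def outer():
    def inner():
        return 42
    return inner
'''

visited = []


def record(code):
    # Placeholder for the real step, which would return a copy of `code`
    # without its column table.
    visited.append(code.co_name)
    return code


module_code = rebuild(compile(SRC, "<demo>", "exec"), record)
print(visited)
```

Every code object is copied once, so the cost grows with the number of
functions, classes and comprehensions loaded.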

I think that if we decide to offer an opt out, this is actually one of the
best options but I am still slightly concerned about the extra complexity,
potential new APIs and maintainability.



On Sat, 8 May 2021, 22:55 Gregory P. Smith,  wrote:

>
>
> On Sat, May 8, 2021 at 2:09 PM Pablo Galindo Salgado 
> wrote:
>
>> > Why not put it in -O instead?  Then -O means lose asserts and lose
>> > fine-grained tracebacks, while -OO continues to also
>> > strip out doc strings.
>>
>> What if someone wants to keep asserts but does not want the extra data?
>>
>
> exactly my theme.  our existing -O and -OO already don't serve all user
> needs.  (I've witnessed people who need asserts but don't want docstrings
> wasting ram jump through hacky hoops to do that).  Complicating these
> options more by combining additional actions on them doesn't help.
>
> The reason we have -O and -OO generate their own special opt-1 and opt-2
> pyc files is because both of those change the generated bytecode and
> overall flow of the program by omitting instructions and data.  code using
> those will run slightly faster as there are fewer instructions.
>
> The change we're talking about here doesn't do that.  It just adds
> additional metadata to whatever instructions are generated.  So it doesn't
> feel -O related.
>
> While some people aren't going to like the overhead, I'm happy not
> offering the choice.
>
> > Greg, what do you think if instead of not writing it to the pyc file
> with -OO or adding a header entry to decide to read/write, we place None in
> the field? That way we can
> > leverage the option that we intend to add to deactivate displaying the
> traceback new information to reduce the data in the pyc files. The only
> problem
> > is that there will be still a tiny bit of overhead: an extra object per
> code object (None), but that's much much better than something that scales
> with the
> > number of instructions :)
> >
> > What's your opinion on this?
>
> I don't understand the pyc structure enough to comment on how that works,
> but that sounds fine from a way to store less data if these are stored as a
> side table rather than intermingled with each instruction itself.  *If
> anyone even cares about storing less data.*  I would not activate
> generation of that in py_compile and compileall based on the -X flag to
> disable display of tracebacks though.  A flag changing a setting of the
> current runtime regarding traceback printing detail level should not change
> the metadata in pyc files it emits.  I realize -O and -OO behave this way,
> but I don't view those as a great example. We're not writing new uniquely
> named pyc files, I suggest making this an explicit option for py_compile
> and compileall if we're going to support generation of pyc files without
> column data at all.
>
> I'm unclear on what the specific goals are with all of these option
> possibilities.
>
> Who non-hypothetically cares about a 22% pyc file size increase?  I don't
> think we should be concerned.  I'm in favor of always writing them and the
> 20% size increase that results in.  If pyc size is an issue that should be
> its own separate enhancement PEP.  When it comes to pyc files there is more
> data we may want to store in the future for performance reasons - I don't
> see them shrinking without an independent effort.
>
> Caring about additional data retained in memory at runtime makes more
> sense to me as ram cost is much greater than storage cost and is paid
> repeatedly per process.  Storing an additional reference to None on code
> objects where a column information table would otherwise go is perfectly
> fine.  That can be a -X style interpreter startup option.  It isn't
> something that needs to be impacted by the pyc files.  Pass that option to
> the interpreter, and it
> just discards column info tables on code objects after loading them or
> compiling them.  If people want to optimize for a shared pyc situation with
> memory mapping techniques, that is also something that should be a separate
> enhancement PEP and n

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Brett Cannon
On Sat, May 8, 2021 at 2:59 PM Gregory P. Smith  wrote:

>
>
> On Sat, May 8, 2021 at 2:09 PM Pablo Galindo Salgado 
> wrote:
>
>> > Why not put it in -O instead?  Then -O means lose asserts and lose
>> > fine-grained tracebacks, while -OO continues to also
>> > strip out doc strings.
>>
>> What if someone wants to keep asserts but does not want the extra data?
>>
>
> exactly my theme.  our existing -O and -OO already don't serve all user
> needs.  (I've witnessed people who need asserts but don't want docstrings
> wasting ram jump through hacky hoops to do that).  Complicating these
> options more by combining additional actions on them doesn't help.
>
> The reason we have -O and -OO generate their own special opt-1 and opt-2
> pyc files is because both of those change the generated bytecode and
> overall flow of the program by omitting instructions and data.  code using
> those will run slightly faster as there are fewer instructions.
>
> The change we're talking about here doesn't do that.  It just adds
> additional metadata to whatever instructions are generated.  So it doesn't
> feel -O related.
>

While I'm the opposite. 😄 Metadata that is not necessary for CPython to
function and whose primary driver is better exception tracebacks totally
falls into the same camp as "I don't need docstrings" to me.


>
> While some people aren't going to like the overhead, I'm happy not
> offering the choice.
>
> > Greg, what do you think if instead of not writing it to the pyc file
> with -OO or adding a header entry to decide to read/write, we place None in
> the field? That way we can
> > leverage the option that we intend to add to deactivate displaying the
> traceback new information to reduce the data in the pyc files. The only
> problem
> > is that there will be still a tiny bit of overhead: an extra object per
> code object (None), but that's much much better than something that scales
> with the
> > number of instructions :)
> >
> > What's your opinion on this?
>
> I don't understand the pyc structure enough to comment on how that works,
>

Code to read a .pyc file and use it:
https://github.com/python/cpython/blob/a0bd9e9c11f5f52c7ddd19144c8230da016b53c6/Lib/importlib/_bootstrap_external.py#L951-L1015
(I'd explain more but it is the weekend and I technically shouldn't be
reading this thread 😉).
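
A very rough sketch of what that code does, for the curious: a 16-byte header
(magic number, flags, then either mtime and size or a source hash, per PEP
552) followed by a marshalled code object. load_pyc and the temporary spam.py
module are made up for illustration, and the validation that importlib
performs is mostly skipped.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile


def load_pyc(path):
    """Return (flags, code) from a .pyc written by this interpreter."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was written by a different Python version")
    flags = int.from_bytes(data[4:8], "little")
    # Bytes 8-16 hold either mtime and size or a source hash; ignored here.
    return flags, marshal.loads(data[16:])


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "spam.py")
    with open(src, "w") as f:
        f.write("ANSWER = 6 * 7\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "spam.pyc"))
    flags, code = load_pyc(pyc)

namespace = {}
exec(code, namespace)
print(flags, namespace["ANSWER"])
```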

-Brett


> but that sounds fine from a way to store less data if these are stored as
> a side table rather than intermingled with each instruction itself.  *If
> anyone even cares about storing less data.*  I would not activate
> generation of that in py_compile and compileall based on the -X flag to
> disable display of tracebacks though.  A flag changing a setting of the
> current runtime regarding traceback printing detail level should not change
> the metadata in pyc files it emits.  I realize -O and -OO behave this way,
> but I don't view those as a great example. We're not writing new uniquely
> named pyc files, I suggest making this an explicit option for py_compile
> and compileall if we're going to support generation of pyc files without
> column data at all.
>
> I'm unclear on what the specific goals are with all of these option
> possibilities.
>
> Who non-hypothetically cares about a 22% pyc file size increase?  I don't
> think we should be concerned.  I'm in favor of always writing them and the
> 20% size increase that results in.  If pyc size is an issue that should be
> its own separate enhancement PEP.  When it comes to pyc files there is more
> data we may want to store in the future for performance reasons - I don't
> see them shrinking without an independent effort.
>
> Caring about additional data retained in memory at runtime makes more
> sense to me as ram cost is much greater than storage cost and is paid
> repeatedly per process.  Storing an additional reference to None on code
> objects where a column information table would otherwise go is perfectly
> fine.  That can be a -X style interpreter startup option.  It isn't
> something that needs to be impacted by the pyc files.  Pass that option to
> the interpreter, and it
> just discards column info tables on code objects after loading them or
> compiling them.  If people want to optimize for a shared pyc situation with
> memory mapping techniques, that is also something that should be a separate
> enhancement PEP and not involved here.  People writing code to use the
> column information should always check it for None first, that'd be
> something we document with the new feature.
>
> -gps
>
>
>>
>> On Sat, 8 May 2021 at 22:05, Ethan Furman  wrote:
>>
>>> On 5/8/21 1:31 PM, Pablo Galindo Salgado wrote:
>>>  >> We can't piggy back on -OO as the only way to disable this, it needs
>>> to
>>>  >> have an option of its own.  -OO is unusable as code that relies on
>>> "doc"
>>>  >> strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html
>>>  >> exists.
>>>  >
>>>  > -OO is the only sensible way to disable the data. There are two
>>> things to disable:
>

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Steven D'Aprano
Hi Chris,

On Fri, May 07, 2021 at 07:13:16PM -0700, Chris Jerdonek wrote:

> I'm not sure why you're sounding so negative. Pablo asked for ideas in his
> first message to the list:

I know that Pablo asked for ideas, but that doesn't mean that we are 
obliged to agree with every idea. This is a discussion list which 
means we discuss ideas, both to agree and disagree.

I don't think I'm being negative. I'm very positive about this proposal, 
and I don't want to see it get bogged down with bike-shedding about the 
precise compression/encoding algorithm used.

If Pablo, or any other volunteer such as yourself, wants to go down that 
track to investigate the data distribution, I'm not going to tell them 
that they must not. Go for it! But I'd rather not make this a mandatory 
prerequisite for the PEP.


[...]
> my reply wasn't about the pyc files on disk but about their representation
> in memory, which Pablo later said may be the main concern. So it's not
> compression algorithms like LZ4 so much as a method of encoding.

Okay, thanks for the clarification.


-- 
Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4UQH3R44ZOBBKOAAY2KV2PKDCMMSGRQN/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Jim J. Jewett
Antoine Pitrou wrote:
> On Sat, 8 May 2021 02:58:40 +
> Neil Schemenauer nas-pyt...@arctrix.com wrote:

> > It would be cool if we could mmap the pyc files and have the VM run
> > code without an unmarshal step.
> What happens if another process mutates or truncates the file while the
> CPython VM is executing code from the mapped file?  Crash?

Why would this be any different than whatever happens now?  Just because it is 
easier for another process to get (exclusive) access to the file if there is a 
longer delay between loading the first part of the file and going back for the 
docstrings and lnotab?

-jJ
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ONMS26WLPIT35H5VX4Z6STPYWSXXBQVJ/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-08 Thread Richard Damon
On 5/8/21 10:16 PM, Jim J. Jewett wrote:
> Antoine Pitrou wrote:
>> On Sat, 8 May 2021 02:58:40 +
>> Neil Schemenauer nas-pyt...@arctrix.com wrote:
>>> It would be cool if we could mmap the pyc files and have the VM run
>>> code without an unmarshal step.
>> What happens if another process mutates or truncates the file while the
>> CPython VM is executing code from the mapped file?  Crash?
> Why would this be any different than whatever happens now?  Just because it 
> is easier for another process to get (exclusive) access to the file if there 
> is a longer delay between loading the first part of the file and going back 
> for the docstrings and lnotab?
>
> -jJ

I think the issue being pointed out is that currently, when Python opens
the .pyc file for reading and keeps the file handle open, that will
block any other process from opening the file for writing, and thus
can't change the contents under it. Once it is all done, it can release
the lock as it won't need to read it again.

If it mapped the file into its address space, it would need a similar
sort of lock, but would need to keep it for the FULL execution of the program,
so that no other process could change the contents behind its back. I
think normal mmapping doesn't do this, but if that sort of lock is
available, then it probably should be used.
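
A small experiment along these lines, using Python's own mmap module rather
than anything in the VM. A second file handle plays the part of the other
process; demo.bin is a throwaway file. Whether the mapped view shows the
rewritten bytes is platform dependent, so treat the printed result as an
observation on your system and not a guarantee.

```python
import mmap
import os
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.bin")
with open(path, "wb") as f:
    f.write(b"AAAA")

# Map the file read-only, as a VM running code straight from a pyc might.
reader = open(path, "rb")
view = mmap.mmap(reader.fileno(), 0, access=mmap.ACCESS_READ)
before = view[:4]

# A second handle stands in for another process rewriting the file in place.
with open(path, "r+b") as writer:
    writer.write(b"XXXX")

after = view[:4]
print(before, after)

# Release the mapping before deleting the file (required on Windows).
view.close()
reader.close()
os.remove(path)
os.rmdir(tmpdir)
```

If the second value differs from the first, the mapping took no lock that
stopped the writer, which is the situation being worried about here.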

-- 
Richard Damon

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QQHLCQC4UUWLM6HWHGOJ5SYCGBOO2LNS/