Awesome, that's what I was hoping. Accepted! Congrats and thank you very much for writing the PEP and guiding the discussion.
On Fri, Mar 20, 2015 at 4:00 PM, Brett Cannon <bcan...@gmail.com> wrote:
>
> On Fri, Mar 20, 2015 at 4:41 PM Guido van Rossum <gu...@python.org> wrote:
>
>> I am willing to be the BDFL for this PEP. I have tried to skim the recent
>> discussion (only python-dev) and I don't see much remaining controversy.
>> HOWEVER... The PEP is not clear (or at least too subtle) about the actual
>> name for optimization level 0. If I have foo.py, and I compile it three
>> times with three different optimization levels (no optimization; -O; -OO),
>> and then I look in __pycache__, would I see this:
>>
>> # (1)
>> foo.cpython-35.pyc
>> foo.cpython-35.opt-1.pyc
>> foo.cpython-35.opt-2.pyc
>>
>> Or would I see this?
>>
>> # (2)
>> foo.cpython-35.opt-0.pyc
>> foo.cpython-35.opt-1.pyc
>> foo.cpython-35.opt-2.pyc
>>
>
> #1
>
>>
>> Your lead-in ("I have decided to have the default case of no optimization
>> levels mean that the .pyc file name will have *no* optimization level
>> specified in the name and thus be just as it is today.") makes me think I
>> should expect (1), but I can't actually pinpoint where the language of the
>> PEP says this.
>>
>
> It was meant to be explained by "When no optimization level is specified,
> the pre-PEP ``.pyc`` file name will be used (i.e., no change in file name
> semantics)", but obviously it's a bit too subtle. I just updated the PEP
> with an explicit list of bytecode file name examples based on no -O, -O,
> and -OO.
>
> -Brett
>
>>
>> On Fri, Mar 20, 2015 at 11:34 AM, Brett Cannon <bcan...@gmail.com> wrote:
>>
>>> I have decided to have the default case of no optimization levels mean
>>> that the .pyc file name will have *no* optimization level specified in
>>> the name and thus be just as it is today.
>>> I made this decision due to
>>> potential backwards-compatibility issues -- although I expect them to be
>>> minute -- and to not force other implementations like PyPy to have some
>>> bogus value set since they don't have .pyo files to begin with (PyPy
>>> actually uses bytecode for -O and doesn't bother with -OO since PyPy already
>>> uses a bunch of memory when running).
>>>
>>> Since this closes out the last open issue, I need either a BDFL decision
>>> or a BDFAP to be assigned to make a decision. Guido?
>>>
>>> ======================================
>>>
>>> PEP: 488
>>> Title: Elimination of PYO files
>>> Version: $Revision$
>>> Last-Modified: $Date$
>>> Author: Brett Cannon <br...@python.org>
>>> Status: Draft
>>> Type: Standards Track
>>> Content-Type: text/x-rst
>>> Created: 20-Feb-2015
>>> Post-History:
>>>     2015-03-06
>>>     2015-03-13
>>>     2015-03-20
>>>
>>> Abstract
>>> ========
>>>
>>> This PEP proposes eliminating the concept of PYO files from Python.
>>> To continue the support of the separation of bytecode files based on
>>> their optimization level, this PEP proposes extending the PYC file
>>> name to include the optimization level in the bytecode repository
>>> directory when it's called for (i.e., the ``__pycache__`` directory).
>>>
>>>
>>> Rationale
>>> =========
>>>
>>> As of today, bytecode files come in two flavours: PYC and PYO. A PYC
>>> file is the bytecode file generated and read from when no
>>> optimization level is specified at interpreter startup (i.e., ``-O``
>>> is not specified). A PYO file represents the bytecode file that is
>>> read/written when **any** optimization level is specified (i.e., when
>>> ``-O`` **or** ``-OO`` is specified). This means that while PYC
>>> files clearly delineate the optimization level used when they were
>>> generated -- namely no optimizations beyond the peepholer -- the same
>>> is not true for PYO files.
>>> To put this in terms of optimization
>>> levels and the file extension:
>>>
>>> - 0: ``.pyc``
>>> - 1 (``-O``): ``.pyo``
>>> - 2 (``-OO``): ``.pyo``
>>>
>>> The reuse of the ``.pyo`` file extension for both level 1 and 2
>>> optimizations means that there is no clear way to tell what
>>> optimization level was used to generate the bytecode file. In terms
>>> of reading PYO files, this can lead to an interpreter using a mixture
>>> of optimization levels with its code if the user was not careful to
>>> make sure all PYO files were generated using the same optimization
>>> level (typically done by blindly deleting all PYO files and then
>>> using the `compileall` module to compile all-new PYO files [1]_).
>>> This issue is only compounded when people optimize Python code beyond
>>> what the interpreter natively supports, e.g., using the astoptimizer
>>> project [2]_.
>>>
>>> In terms of writing PYO files, the need to delete all PYO files
>>> every time one either changes the optimization level they want to use
>>> or are unsure of what optimization was used the last time PYO files
>>> were generated leads to unnecessary file churn. The change proposed
>>> by this PEP also allows for **all** optimization levels to be
>>> pre-compiled for bytecode files ahead of time, something that is
>>> currently impossible thanks to the reuse of the ``.pyo`` file
>>> extension for multiple optimization levels.
>>>
>>> As for distributing bytecode-only modules, having to distribute both
>>> ``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
>>> of code obfuscation and smaller file deployments. This means that
>>> bytecode-only modules will only load from their non-optimized
>>> ``.pyc`` file name.
>>>
>>>
>>> Proposal
>>> ========
>>>
>>> To eliminate the ambiguity that PYO files present, this PEP proposes
>>> eliminating the concept of PYO files and their accompanying ``.pyo``
>>> file extension.
>>> To allow for the optimization level to be unambiguous
>>> as well as to avoid having to regenerate optimized bytecode files
>>> needlessly in the `__pycache__` directory, the optimization level
>>> used to generate the bytecode file will be incorporated into the
>>> bytecode file name. When no optimization level is specified, the
>>> pre-PEP ``.pyc`` file name will be used (i.e., no change in file name
>>> semantics). This increases backwards-compatibility while also being
>>> more understanding of Python implementations which have no use for
>>> optimization levels (e.g., PyPy[10]_).
>>>
>>> Currently bytecode file names are created by
>>> ``importlib.util.cache_from_source()``, approximately using the
>>> following expression defined by PEP 3147 [3]_, [4]_, [5]_::
>>>
>>>     '{name}.{cache_tag}.pyc'.format(name=module_name,
>>>                                     cache_tag=sys.implementation.cache_tag)
>>>
>>> This PEP proposes to change the expression when an optimization
>>> level is specified to::
>>>
>>>     '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
>>>             name=module_name,
>>>             cache_tag=sys.implementation.cache_tag,
>>>             optimization=str(sys.flags.optimize))
>>>
>>> The "opt-" prefix was chosen so as to provide a visual separator
>>> from the cache tag. The placement of the optimization level after
>>> the cache tag was chosen to preserve lexicographic sort order of
>>> bytecode file names based on module name and cache tag which will
>>> not vary for a single interpreter. The "opt-" prefix was chosen over
>>> "o" so as to be somewhat self-documenting. The "opt-" prefix was
>>> chosen over "O" so as to not have any confusion in case "0" was the
>>> leading prefix of the optimization level.
>>>
>>> A period was chosen over a hyphen as a separator so as to distinguish
>>> clearly that the optimization level is not part of the interpreter
>>> version as specified by the cache tag. It also lends to the use of
>>> the period in the file name to delineate semantically different
>>> concepts.
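To make the naming rule quoted above concrete, here is a minimal sketch of the two format expressions from the PEP. The helper name `bytecode_file_name` and the hard-coded cache tag "cpython-35" are assumptions made for illustration; the real logic lives in `importlib.util.cache_from_source()`.

```python
def bytecode_file_name(module_name, cache_tag, optimization=""):
    """Return a bytecode file name following the PEP's two expressions.

    Illustrative helper only (not part of importlib). An empty
    ``optimization`` means "no optimization level" and yields the pre-PEP
    name; anything else is appended as ``opt-<level>``.
    """
    if optimization == "":
        return "{name}.{cache_tag}.pyc".format(
            name=module_name, cache_tag=cache_tag
        )
    return "{name}.{cache_tag}.opt-{optimization}.pyc".format(
        name=module_name, cache_tag=cache_tag, optimization=str(optimization)
    )


# The three names one module would get under no -O, -O, and -OO.
names = [bytecode_file_name("foo", "cpython-35", level) for level in ("", 1, 2)]
print(names)
```

This matches option (1) from the question at the top of the thread: the unoptimized file keeps its old name and only levels 1 and 2 gain an "opt-" component.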
>>>
>>> For example, if ``-OO`` had been passed to the interpreter then instead
>>> of ``importlib.cpython-35.pyo`` the file name would be
>>> ``importlib.cpython-35.opt-2.pyc``.
>>>
>>> It should be noted that this change in no way affects the performance
>>> of import. Since the import system looks for a single bytecode file
>>> based on the optimization level of the interpreter already and
>>> generates a new bytecode file if it doesn't exist, the introduction
>>> of potentially more bytecode files in the ``__pycache__`` directory
>>> has no effect in terms of stat calls. The interpreter will continue
>>> to look for only a single bytecode file based on the optimization
>>> level and thus no increase in stat calls will occur.
>>>
>>> The only potentially negative result of this PEP is the probable
>>> increase in the number of ``.pyc`` files and thus increase in storage
>>> use. But for platforms where this is an issue,
>>> ``sys.dont_write_bytecode`` exists to turn off bytecode generation so
>>> that it can be controlled offline.
>>>
>>>
>>> Implementation
>>> ==============
>>>
>>> importlib
>>> ---------
>>>
>>> As ``importlib.util.cache_from_source()`` is the API that exposes
>>> bytecode file paths as well as being directly used by importlib, it
>>> requires the most critical change. As of Python 3.4, the function's
>>> signature is::
>>>
>>>     importlib.util.cache_from_source(path, debug_override=None)
>>>
>>> This PEP proposes changing the signature in Python 3.5 to::
>>>
>>>     importlib.util.cache_from_source(path, debug_override=None, *,
>>>                                      optimization=None)
>>>
>>> The introduced ``optimization`` keyword-only parameter will control
>>> what optimization level is specified in the file name. If the
>>> argument is ``None`` then the current optimization level of the
>>> interpreter will be assumed (including no optimization).
>>> Any argument
>>> given for ``optimization`` will be passed to ``str()`` and must have
>>> ``str.isalnum()`` be true, else ``ValueError`` will be raised (this
>>> prevents invalid characters being used in the file name). If the
>>> empty string is passed in for ``optimization`` then the addition of
>>> the optimization will be suppressed, reverting to the file name
>>> format which predates this PEP.
>>>
>>> It is expected that beyond Python's own two optimization levels,
>>> third-party code will use a hash of optimization names to specify the
>>> optimization level, e.g.
>>> ``hashlib.sha256(','.join(['no dead code', 'const folding']).encode('utf-8')).hexdigest()``.
>>> While this might lead to long file names, it is assumed that most
>>> users never look at the contents of the __pycache__ directory and so
>>> this won't be an issue.
>>>
>>> The ``debug_override`` parameter will be deprecated. As the parameter
>>> expects a boolean, the integer value of the boolean will be used as
>>> if it had been provided as the argument to ``optimization`` (a
>>> ``None`` argument will mean the same as for ``optimization``). A
>>> deprecation warning will be raised when ``debug_override`` is given a
>>> value other than ``None``, but there are no plans for the complete
>>> removal of the parameter at this time (but removal will be no later
>>> than Python 4).
>>>
>>> The various module attributes for importlib.machinery which relate to
>>> bytecode file suffixes will be updated [7]_. The
>>> ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` will
>>> both be documented as deprecated and set to the same value as
>>> ``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES`` and
>>> ``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will be
>>> no later than Python 4).
>>>
>>> All the various finders and loaders will also be updated as necessary,
>>> but updating the previously mentioned parts of importlib should be all
>>> that is required.
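For anyone who wants to try the ``optimization`` parameter described above, the following is a small sketch. It assumes Python 3.5 or later (where `importlib.util.cache_from_source()` accepts the keyword) and an interpreter with a cache tag, such as CPython. The path `foo.py` and the optimization names are placeholders, and the cache tag in the results depends on the interpreter running the script. Note that `hashlib.sha256()` needs bytes in Python 3, hence the `.encode()` call.

```python
import hashlib
import importlib.util
import os

SOURCE = "foo.py"  # hypothetical source path; the file does not need to exist

# Empty string: suppress the optimization level (pre-PEP file name).
plain = importlib.util.cache_from_source(SOURCE, optimization="")
# Python's own two levels, as -O and -OO would select them.
level1 = importlib.util.cache_from_source(SOURCE, optimization=1)
level2 = importlib.util.cache_from_source(SOURCE, optimization=2)

# A third-party optimizer could use a hash of its optimization names.
custom_tag = hashlib.sha256(
    ",".join(["no dead code", "const folding"]).encode("utf-8")
).hexdigest()
custom = importlib.util.cache_from_source(SOURCE, optimization=custom_tag)

for path in (plain, level1, level2, custom):
    print(os.path.basename(path))

# Values whose str() is not alphanumeric are rejected with ValueError.
try:
    importlib.util.cache_from_source(SOURCE, optimization="opt-1")
except ValueError as exc:
    print("rejected:", exc)
```

A hex digest is a reasonable choice for a custom level because it is always alphanumeric and therefore passes the ``str.isalnum()`` check.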
>>>
>>>
>>> Rest of the standard library
>>> ----------------------------
>>>
>>> The various functions exposed by the ``py_compile`` and
>>> ``compileall`` modules will be updated as necessary to make sure
>>> they follow the new bytecode file name semantics [6]_, [1]_. The CLI
>>> for the ``compileall`` module will not be directly affected (the
>>> ``-b`` flag will be implicit as it will no longer generate ``.pyo``
>>> files when ``-O`` is specified).
>>>
>>>
>>> Compatibility Considerations
>>> ============================
>>>
>>> Any code directly manipulating bytecode files from Python 3.2 on
>>> will need to consider the impact of this change on their code (prior
>>> to Python 3.2 -- including all of Python 2 -- there was no
>>> __pycache__ which already necessitates bifurcating bytecode file
>>> handling support). If code was setting the ``debug_override``
>>> argument to ``importlib.util.cache_from_source()`` then care will be
>>> needed if they want the path to a bytecode file with an optimization
>>> level of 2. Otherwise only code **not** using
>>> ``importlib.util.cache_from_source()`` will need updating.
>>>
>>> As for people who distribute bytecode-only modules (i.e., use a
>>> bytecode file instead of a source file), they will have to choose
>>> which optimization level they want their bytecode files to be since
>>> distributing a ``.pyo`` file with a ``.pyc`` file will no longer be
>>> of any use. Since people typically only distribute bytecode files for
>>> code obfuscation purposes or smaller distribution size then only
>>> having to distribute a single ``.pyc`` should actually be beneficial
>>> to these use-cases. And since the magic number for bytecode files
>>> changed in Python 3.5 to support PEP 465 there is no need to support
>>> pre-existing ``.pyo`` files [8]_.
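The effect on ``py_compile`` can be checked with a short script. This is a sketch assuming Python 3.5 or later; the function name `compile_all_levels` and the module name `foo.py` are made up for the example, and the cache tag in the printed names depends on the interpreter used.

```python
import os
import py_compile
import tempfile


def compile_all_levels(source_path):
    """Compile one source file at levels 0, 1 and 2; return the file names."""
    names = []
    for level in (0, 1, 2):
        # py_compile.compile() returns the path of the bytecode file it wrote.
        cfile = py_compile.compile(source_path, doraise=True, optimize=level)
        names.append(os.path.basename(cfile))
    return names


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "foo.py")
    with open(source, "w") as f:
        f.write("assert True\n")
    names = compile_all_levels(source)
    print(names)
```

All three bytecode files coexist, which is the pre-compilation of every optimization level that the Rationale says was impossible with ``.pyo`` files.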
>>>
>>>
>>> Rejected Ideas
>>> ==============
>>>
>>> Completely dropping optimization levels from CPython
>>> ----------------------------------------------------
>>>
>>> Some have suggested that instead of accommodating the various
>>> optimization levels in CPython, we should instead drop them
>>> entirely. The argument is that significant performance gains would
>>> occur from runtime optimizations through something like a JIT and not
>>> through pre-execution bytecode optimizations.
>>>
>>> This idea is rejected for this PEP as that ignores the fact that
>>> there are people who do find the pre-existing optimization levels for
>>> CPython useful. It also assumes that no other Python interpreter
>>> would find what this PEP proposes useful.
>>>
>>>
>>> Alternative formatting of the optimization level in the file name
>>> -----------------------------------------------------------------
>>>
>>> Using the "opt-" prefix and placing the optimization level between
>>> the cache tag and file extension is not critical. All options which
>>> have been considered are:
>>>
>>> * ``importlib.cpython-35.opt-1.pyc``
>>> * ``importlib.cpython-35.opt1.pyc``
>>> * ``importlib.cpython-35.o1.pyc``
>>> * ``importlib.cpython-35.O1.pyc``
>>> * ``importlib.cpython-35.1.pyc``
>>> * ``importlib.cpython-35-O1.pyc``
>>> * ``importlib.O1.cpython-35.pyc``
>>> * ``importlib.o1.cpython-35.pyc``
>>> * ``importlib.1.cpython-35.pyc``
>>>
>>> These were initially rejected because they would change the
>>> sort order of bytecode files, could be confused with the cache tag,
>>> or were not self-documenting enough. An informal poll was taken and
>>> people clearly preferred the formatting proposed by the PEP [9]_.
>>> Since this topic is non-technical and of personal choice, the issue
>>> is considered solved.
>>>
>>>
>>> Embedding the optimization level in the bytecode metadata
>>> ---------------------------------------------------------
>>>
>>> Some have suggested that rather than embedding the optimization level
>>> of bytecode in the file name that it be included in the file's
>>> metadata instead. This would mean every interpreter had a single copy
>>> of bytecode at any time. Changing the optimization level would thus
>>> require rewriting the bytecode, but there would also only be a single
>>> file to care about.
>>>
>>> This has been rejected due to the fact that Python is often installed
>>> as a root-level application and thus modifying the bytecode file for
>>> modules in the standard library is not always possible. In this
>>> situation integrators would need to guess at what a reasonable
>>> optimization level was for users for any/all situations. By
>>> allowing multiple optimization levels to co-exist simultaneously it
>>> frees integrators from having to guess what users want and allows
>>> users to utilize the optimization level they want.
>>>
>>>
>>> References
>>> ==========
>>>
>>> .. [1] The compileall module
>>>    (https://docs.python.org/3/library/compileall.html#module-compileall)
>>>
>>> .. [2] The astoptimizer project
>>>    (https://pypi.python.org/pypi/astoptimizer)
>>>
>>> .. [3] ``importlib.util.cache_from_source()``
>>>    (https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)
>>>
>>> .. [4] Implementation of ``importlib.util.cache_from_source()`` from
>>>    CPython 3.4.3rc1
>>>    (https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437)
>>>
>>> .. [5] PEP 3147, PYC Repository Directories, Warsaw
>>>    (http://www.python.org/dev/peps/pep-3147)
>>>
>>> .. [6] The py_compile module
>>>    (https://docs.python.org/3/library/py_compile.html#module-py_compile)
>>>
>>> .. [7] The importlib.machinery module
>>>    (https://docs.python.org/3/library/importlib.html#module-importlib.machinery)
>>>
>>> .. [8] ``importlib.util.MAGIC_NUMBER``
>>>    (https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER)
>>>
>>> .. [9] Informal poll of file name format options on Google+
>>>    (https://plus.google.com/u/0/+BrettCannon/posts/fZynLNwHWGm)
>>>
>>> .. [10] The PyPy Project
>>>    (http://pypy.org/)
>>>
>>>
>>> Copyright
>>> =========
>>>
>>> This document has been placed in the public domain.
>>>
>>>
>>> ..
>>>    Local Variables:
>>>    mode: indented-text
>>>    indent-tabs-mode: nil
>>>    sentence-end-double-space: t
>>>    fill-column: 70
>>>    coding: utf-8
>>>    End:
>>>
>>>
>>> _______________________________________________
>>> Python-Dev mailing list
>>> Python-Dev@python.org
>>> https://mail.python.org/mailman/listinfo/python-dev
>>> Unsubscribe:
>>> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>>
>
--
--Guido van Rossum (python.org/~guido)