I am willing to be the BDFL for this PEP. I have tried to skim the recent discussion (only python-dev) and I don't see much remaining controversy. HOWEVER... The PEP is not clear (or at least too subtle) about the actual name for optimization level 0. If I have foo.py, and I compile it three times with three different optimization levels (no optimization; -O; -OO), and then I look in __pycache__, would I see this:
# (1)
foo.cpython-35.pyc
foo.cpython-35.opt-1.pyc
foo.cpython-35.opt-2.pyc

Or would I see this?

# (2)
foo.cpython-35.opt-0.pyc
foo.cpython-35.opt-1.pyc
foo.cpython-35.opt-2.pyc

Your lead-in ("I have decided to have the default case of no optimization levels mean that the .pyc file name will have *no* optimization level specified in the name and thus be just as it is today.") makes me think I should expect (1), but I can't actually pinpoint where the language of the PEP says this.

On Fri, Mar 20, 2015 at 11:34 AM, Brett Cannon <bcan...@gmail.com> wrote:

> I have decided to have the default case of no optimization levels mean
> that the .pyc file name will have *no* optimization level specified in
> the name and thus be just as it is today. I made this decision due to
> potential backwards-compatibility issues -- although I expect them to be
> minor -- and to not force other implementations like PyPy to have some
> bogus value set since they don't have .pyo files to begin with (PyPy
> actually uses bytecode for -O and doesn't bother with -OO since PyPy
> already uses a bunch of memory when running).
>
> Since this closes out the last open issue, I need either a BDFL decision
> or a BDFAP to be assigned to make a decision. Guido?
>
> ======================================
>
> PEP: 488
> Title: Elimination of PYO files
> Version: $Revision$
> Last-Modified: $Date$
> Author: Brett Cannon <br...@python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 20-Feb-2015
> Post-History:
>     2015-03-06
>     2015-03-13
>     2015-03-20
>
> Abstract
> ========
>
> This PEP proposes eliminating the concept of PYO files from Python.
> To continue the support of the separation of bytecode files based on
> their optimization level, this PEP proposes extending the PYC file
> name to include the optimization level in the bytecode repository
> directory when it's called for (i.e., the ``__pycache__`` directory).
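
For anyone who wants to see the candidate names without compiling anything, here is a minimal sketch. It assumes importlib.util.cache_from_source() as it eventually shipped in Python 3.5 and later, which may not match this draft word for word. The source name "foo.py" and the helper bytecode_names() are made up for illustration; passing the empty string for optimization requests the untagged name.

```python
import importlib.util
import os


def bytecode_names(source="foo.py"):
    """Return the bytecode file names for levels 0, 1 and 2.

    Only path strings are computed; no files are read or written.
    """
    names = []
    # "" asks for no optimization tag in the name; 1 and 2 correspond
    # to -O and -OO (assumption: Python 3.5+ semantics).
    for optimization in ("", 1, 2):
        path = importlib.util.cache_from_source(
            source, optimization=optimization)
        names.append(os.path.basename(path))
    return names


if __name__ == "__main__":
    for name in bytecode_names():
        print(name)
```

The exact cache tag in the output depends on the interpreter running the sketch, so the names will only read "cpython-35" on CPython 3.5.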
>
>
> Rationale
> =========
>
> As of today, bytecode files come in two flavours: PYC and PYO. A PYC
> file is the bytecode file generated and read from when no
> optimization level is specified at interpreter startup (i.e., ``-O``
> is not specified). A PYO file represents the bytecode file that is
> read/written when **any** optimization level is specified (i.e., when
> ``-O`` **or** ``-OO`` is specified). This means that while PYC
> files clearly delineate the optimization level used when they were
> generated -- namely no optimizations beyond the peepholer -- the same
> is not true for PYO files. To put this in terms of optimization
> levels and the file extension:
>
> - 0: ``.pyc``
> - 1 (``-O``): ``.pyo``
> - 2 (``-OO``): ``.pyo``
>
> The reuse of the ``.pyo`` file extension for both level 1 and 2
> optimizations means that there is no clear way to tell what
> optimization level was used to generate the bytecode file. In terms
> of reading PYO files, this can lead to an interpreter using a mixture
> of optimization levels with its code if the user was not careful to
> make sure all PYO files were generated using the same optimization
> level (typically done by blindly deleting all PYO files and then
> using the ``compileall`` module to compile all-new PYO files [1]_).
> This issue is only compounded when people optimize Python code beyond
> what the interpreter natively supports, e.g., using the astoptimizer
> project [2]_.
>
> In terms of writing PYO files, the need to delete all PYO files
> every time one either changes the optimization level they want to use
> or is unsure of what optimization was used the last time PYO files
> were generated leads to unnecessary file churn. The change proposed
> by this PEP also allows for **all** optimization levels to be
> pre-compiled for bytecode files ahead of time, something that is
> currently impossible thanks to the reuse of the ``.pyo`` file
> extension for multiple optimization levels.
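
The pre-compilation point can be tried out with a short sketch. It assumes the py_compile module as released in Python 3.5 and later, where leaving cfile unset lets the module pick the __pycache__ path for the requested level. The file "foo.py" and the helpers compile_all_levels() and demo() are hypothetical names used only for illustration.

```python
import os
import py_compile
import tempfile


def compile_all_levels(source_path):
    """Compile one source file at levels 0, 1 and 2.

    Returns the base names of the bytecode files that were written.
    """
    names = []
    for level in (0, 1, 2):
        # With cfile left unset, py_compile chooses the cache path itself
        # and returns it (assumption: Python 3.5+ behaviour).
        cfile = py_compile.compile(source_path, optimize=level, doraise=True)
        names.append(os.path.basename(cfile))
    return names


def demo():
    """Write a throwaway module and compile it at every level."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "foo.py")
        with open(source, "w") as f:
            f.write("assert True\n")
        return compile_all_levels(source)


if __name__ == "__main__":
    print(demo())
```

Because each level gets its own file name, the three compiles do not overwrite one another, which is the property the paragraph above relies on.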
>
> As for distributing bytecode-only modules, having to distribute both
> ``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
> of code obfuscation and smaller file deployments. This means that
> bytecode-only modules will only load from their non-optimized
> ``.pyc`` file name.
>
>
> Proposal
> ========
>
> To eliminate the ambiguity that PYO files present, this PEP proposes
> eliminating the concept of PYO files and their accompanying ``.pyo``
> file extension. To allow for the optimization level to be unambiguous
> as well as to avoid having to regenerate optimized bytecode files
> needlessly in the ``__pycache__`` directory, the optimization level
> used to generate the bytecode file will be incorporated into the
> bytecode file name. When no optimization level is specified, the
> pre-PEP ``.pyc`` file name will be used (i.e., no change in file name
> semantics). This increases backwards-compatibility while also being
> more understanding of Python implementations which have no use for
> optimization levels (e.g., PyPy [10]_).
>
> Currently bytecode file names are created by
> ``importlib.util.cache_from_source()``, approximately using the
> following expression defined by PEP 3147 [3]_, [4]_, [5]_::
>
>     '{name}.{cache_tag}.pyc'.format(name=module_name,
>                                     cache_tag=sys.implementation.cache_tag)
>
> This PEP proposes to change the expression when an optimization
> level is specified to::
>
>     '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
>             name=module_name,
>             cache_tag=sys.implementation.cache_tag,
>             optimization=str(sys.flags.optimize))
>
> The "opt-" prefix was chosen so as to provide a visual separator
> from the cache tag. The placement of the optimization level after
> the cache tag was chosen to preserve lexicographic sort order of
> bytecode file names based on module name and cache tag which will
> not vary for a single interpreter. The "opt-" prefix was chosen over
> "o" so as to be somewhat self-documenting.
> The "opt-" prefix was chosen over "O" so as to not have any
> confusion in case "0" was the leading prefix of the optimization
> level.
>
> A period was chosen over a hyphen as a separator so as to distinguish
> clearly that the optimization level is not part of the interpreter
> version as specified by the cache tag. It also lends to the use of
> the period in the file name to delineate semantically different
> concepts.
>
> For example, if ``-OO`` had been passed to the interpreter then instead
> of ``importlib.cpython-35.pyo`` the file name would be
> ``importlib.cpython-35.opt-2.pyc``.
>
> It should be noted that this change in no way affects the performance
> of import. Since the import system looks for a single bytecode file
> based on the optimization level of the interpreter already and
> generates a new bytecode file if it doesn't exist, the introduction
> of potentially more bytecode files in the ``__pycache__`` directory
> has no effect in terms of stat calls. The interpreter will continue
> to look for only a single bytecode file based on the optimization
> level and thus no increase in stat calls will occur.
>
> The only potentially negative result of this PEP is the probable
> increase in the number of ``.pyc`` files and thus increase in storage
> use. But for platforms where this is an issue,
> ``sys.dont_write_bytecode`` exists to turn off bytecode generation so
> that it can be controlled offline.
>
>
> Implementation
> ==============
>
> importlib
> ---------
>
> As ``importlib.util.cache_from_source()`` is the API that exposes
> bytecode file paths as well as being directly used by importlib, it
> requires the most critical change.
> As of Python 3.4, the function's signature is::
>
>     importlib.util.cache_from_source(path, debug_override=None)
>
> This PEP proposes changing the signature in Python 3.5 to::
>
>     importlib.util.cache_from_source(path, debug_override=None, *,
>                                      optimization=None)
>
> The introduced ``optimization`` keyword-only parameter will control
> what optimization level is specified in the file name. If the
> argument is ``None`` then the current optimization level of the
> interpreter will be assumed (including no optimization). Any argument
> given for ``optimization`` will be passed to ``str()`` and must have
> ``str.isalnum()`` be true, else ``ValueError`` will be raised (this
> prevents invalid characters being used in the file name). If the
> empty string is passed in for ``optimization`` then the addition of
> the optimization will be suppressed, reverting to the file name
> format which predates this PEP.
>
> It is expected that beyond Python's own two optimization levels,
> third-party code will use a hash of optimization names to specify the
> optimization level, e.g.
> ``hashlib.sha256(','.join(['no dead code', 'const folding']).encode()).hexdigest()``.
> While this might lead to long file names, it is assumed that most
> users never look at the contents of the ``__pycache__`` directory and
> so this won't be an issue.
>
> The ``debug_override`` parameter will be deprecated. As the parameter
> expects a boolean, the integer value of the boolean will be used as
> if it had been provided as the argument to ``optimization`` (a
> ``None`` argument will mean the same as for ``optimization``). A
> deprecation warning will be raised when ``debug_override`` is given a
> value other than ``None``, but there are no plans for the complete
> removal of the parameter at this time (but removal will be no later
> than Python 4).
>
> The various module attributes for importlib.machinery which relate to
> bytecode file suffixes will be updated [7]_.
> The ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES``
> attributes will both be documented as deprecated and set to the same
> value as ``BYTECODE_SUFFIXES`` (removal of
> ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` is
> not currently planned, but will be no later than Python 4).
>
> The various finders and loaders will also be updated as necessary,
> but updating the previously mentioned parts of importlib should be
> all that is required.
>
>
> Rest of the standard library
> ----------------------------
>
> The various functions exposed by the ``py_compile`` and
> ``compileall`` modules will be updated as necessary to make sure
> they follow the new bytecode file name semantics [6]_, [1]_. The CLI
> for the ``compileall`` module will not be directly affected (the
> ``-b`` flag will be implicit as it will no longer generate ``.pyo``
> files when ``-O`` is specified).
>
>
> Compatibility Considerations
> ============================
>
> Any code directly manipulating bytecode files from Python 3.2 on
> will need to consider the impact of this change on their code (prior
> to Python 3.2 -- including all of Python 2 -- there was no
> ``__pycache__`` which already necessitates bifurcating bytecode file
> handling support). If code was setting the ``debug_override``
> argument to ``importlib.util.cache_from_source()`` then care will be
> needed if they want the path to a bytecode file with an optimization
> level of 2. Otherwise only code **not** using
> ``importlib.util.cache_from_source()`` will need updating.
>
> As for people who distribute bytecode-only modules (i.e., use a
> bytecode file instead of a source file), they will have to choose
> which optimization level they want their bytecode files to be since
> distributing a ``.pyo`` file with a ``.pyc`` file will no longer be
> of any use.
> Since people typically only distribute bytecode files for code
> obfuscation purposes or smaller distribution size, only having to
> distribute a single ``.pyc`` should actually be beneficial to these
> use-cases. And since the magic number for bytecode files changed in
> Python 3.5 to support PEP 465 there is no need to support
> pre-existing ``.pyo`` files [8]_.
>
>
> Rejected Ideas
> ==============
>
> Completely dropping optimization levels from CPython
> ----------------------------------------------------
>
> Some have suggested that instead of accommodating the various
> optimization levels in CPython, we should instead drop them
> entirely. The argument is that significant performance gains would
> occur from runtime optimizations through something like a JIT and not
> through pre-execution bytecode optimizations.
>
> This idea is rejected for this PEP as that ignores the fact that
> there are people who do find the pre-existing optimization levels for
> CPython useful. It also assumes that no other Python interpreter
> would find what this PEP proposes useful.
>
>
> Alternative formatting of the optimization level in the file name
> -----------------------------------------------------------------
>
> Using the "opt-" prefix and placing the optimization level between
> the cache tag and file extension is not critical. All options which
> have been considered are:
>
> * ``importlib.cpython-35.opt-1.pyc``
> * ``importlib.cpython-35.opt1.pyc``
> * ``importlib.cpython-35.o1.pyc``
> * ``importlib.cpython-35.O1.pyc``
> * ``importlib.cpython-35.1.pyc``
> * ``importlib.cpython-35-O1.pyc``
> * ``importlib.O1.cpython-35.pyc``
> * ``importlib.o1.cpython-35.pyc``
> * ``importlib.1.cpython-35.pyc``
>
> These were initially rejected because they would change the sort
> order of bytecode files, could be ambiguous with the cache tag, or
> were not self-documenting enough.
> An informal poll was taken and people clearly preferred the
> formatting proposed by the PEP [9]_. Since this topic is
> non-technical and of personal choice, the issue is considered solved.
>
>
> Embedding the optimization level in the bytecode metadata
> ---------------------------------------------------------
>
> Some have suggested that rather than embedding the optimization level
> of bytecode in the file name that it be included in the file's
> metadata instead. This would mean every interpreter had a single copy
> of bytecode at any time. Changing the optimization level would thus
> require rewriting the bytecode, but there would also only be a single
> file to care about.
>
> This has been rejected due to the fact that Python is often installed
> as a root-level application and thus modifying the bytecode file for
> modules in the standard library is not always possible. In this
> situation integrators would need to guess at what a reasonable
> optimization level was for users for any/all situations. By
> allowing multiple optimization levels to co-exist simultaneously it
> frees integrators from having to guess what users want and allows
> users to utilize the optimization level they want.
>
>
> References
> ==========
>
> .. [1] The compileall module
>    (https://docs.python.org/3/library/compileall.html#module-compileall)
>
> .. [2] The astoptimizer project
>    (https://pypi.python.org/pypi/astoptimizer)
>
> .. [3] ``importlib.util.cache_from_source()``
>    (https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)
>
> .. [4] Implementation of ``importlib.util.cache_from_source()`` from
>    CPython 3.4.3rc1
>    (https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437)
>
> .. [5] PEP 3147, PYC Repository Directories, Warsaw
>    (http://www.python.org/dev/peps/pep-3147)
>
> .. [6] The py_compile module
>    (https://docs.python.org/3/library/py_compile.html#module-py_compile)
>
> .. [7] The importlib.machinery module
>    (https://docs.python.org/3/library/importlib.html#module-importlib.machinery)
>
> .. [8] ``importlib.util.MAGIC_NUMBER``
>    (https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER)
>
> .. [9] Informal poll of file name format options on Google+
>    (https://plus.google.com/u/0/+BrettCannon/posts/fZynLNwHWGm)
>
> .. [10] The PyPy Project
>    (http://pypy.org/)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>
>

-- 
--Guido van Rossum (python.org/~guido)