Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
Donald Stufft writes:

> This is insane. A fairly simple database query is going to "grind the PyPI
> servers into dust"? You're going to need to back up this FUD or please
> refrain from spouting it.

Never mind the "Obsoletes" information - even the more useful "Requires-Dist" information is not exposed via PyPI, even though it appears to be stored in the database. (Or if it is, please point me to where - I must have missed it.)

Even if this were to be made available, it's presumably obtained from PKG-INFO. As I understand, this data is not considered reliable - for example, pip runs egg_info on downloaded packages to get updated information when determining dependencies to be downloaded. If the Requires-Dist info in PKG-INFO can't be relied on, surely less critical information such as Obsoletes can't be relied on, either?

Regards,

Vinay Sajip

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
On Thursday, December 6, 2012 at 6:28 AM, Vinay Sajip wrote:

> Never mind the "Obsoletes" information - even the more useful "Requires-Dist"
> information is not exposed via PyPI, even though it appears to be stored in
> the database. (Or if it is, please point me to where - I must have missed it.)

Requires-Dist doesn't exist for more than a handful of packages. But PyPI exposes it via the XMLRPC API, possibly the JSON API as well.

> Even if this were to be made available, it's presumably obtained from
> PKG-INFO. As I understand, this data is not considered reliable - for
> example, pip runs egg_info on downloaded packages to get updated information
> when determining dependencies to be downloaded. If the Requires-Dist info in
> PKG-INFO can't be relied on, surely less critical information such as
> Obsoletes can't be relied on, either?

pip runs egg_info because setuptools does not write out to PKG-INFO what the dependencies are (it does write it out to a different text file though). But IIRC that text file is not guaranteed to exist in the distribution. There's also the history where pip was trying to preserve as much backwards compat with easy_install as it could, and if you used the file that egg_info writes out then you'll only get the requirements for the system that the distribution was packaged on. Any if statements that affect the dependencies won't be in effect.
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
On Thu, Dec 6, 2012 at 6:33 AM, Donald Stufft wrote:

> On Thursday, December 6, 2012 at 6:28 AM, Vinay Sajip wrote:
>
> > Never mind the "Obsoletes" information - even the more useful
> > "Requires-Dist" information is not exposed via PyPI, even though it
> > appears to be stored in the database. (Or if it is, please point me to
> > where - I must have missed it.)
>
> Requires-Dist doesn't exist for more than a handful of packages. But PyPI
> exposes it via the XMLRPC API, possibly the JSON API as well.
>
> > Even if this were to be made available, it's presumably obtained from
> > PKG-INFO. As I understand, this data is not considered reliable - for
> > example, pip runs egg_info on downloaded packages to get updated
> > information when determining dependencies to be downloaded. If the
> > Requires-Dist info in PKG-INFO can't be relied on, surely less critical
> > information such as Obsoletes can't be relied on, either?
>
> pip runs egg_info because setuptools does not write out to PKG-INFO what
> the dependencies are (it does write it out to a different text file
> though). But IIRC that text file is not guaranteed to exist in the
> distribution. There's also the history where pip was trying to preserve as
> much backwards compat with easy_install as it could, and if you used the
> file that egg_info writes out then you'll only get the requirements for the
> system that the distribution was packaged on. Any if statements that affect
> the dependencies won't be in effect.

It will be Obsoleted-By:. The "drop in replacement" requirement will be removed. The package manager will say "you are using these obsolete packages; check out these non-obsolete ones" but will not automatically pull the replacement without a Requires tag.

I will probably add the unambiguous Conflicts: tag "uninstall this other package if I am installed".

Many packages (IIRC more than half) have the pre-Metadata-1.2 equivalent of Requires-Dist:, which is the very easy to parse requires.txt. This information is not reliable because it could depend on conditions in setup.py. Someone should write a setup.py compiler that determines whether a package's requirements are conditional or not.

Environment markers (limited Python expressions at the end of Requires-Dist lines) attempt to make Requires-Dist reliable. You can execute them safely in your environment to determine whether a requirement is right for you:

    Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'

The wheel implementation makes sure all the metadata (the .dist-info directory) is at the end of the .zip archive. It's possible to read the metadata with a single HTTP partial request for the end of the archive without downloading the entire archive.
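To make the environment-marker idea concrete, here is a minimal sketch of how a tool might evaluate a marker such as `sys.platform == 'win32'` without calling `eval()`. It is not the evaluator used by any real installer: the function names (`split_requirement`, `evaluate_marker`), the small set of variables in `MARKER_VARS`, and the restriction to `==`, `!=`, `in`, `not in`, `and`, and `or` are all assumptions made for illustration.

```python
import ast
import os
import platform
import sys

# Variables a marker may reference. This short list is an assumption for the
# sketch; a real tool would follow whatever the metadata PEP specifies.
MARKER_VARS = {
    "sys.platform": sys.platform,
    "os.name": os.name,
    "platform.machine": platform.machine(),
    "python_version": "%d.%d" % sys.version_info[:2],
}


def split_requirement(line):
    """Split 'name (spec); marker' into (requirement, marker or None)."""
    requirement, _, marker = line.partition(";")
    return requirement.strip(), (marker.strip() or None)


def _value(node, env):
    """Resolve a string literal or a dotted name such as sys.platform."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
        return env["%s.%s" % (node.value.id, node.attr)]
    raise ValueError("unsupported marker term")


def _evaluate(node, env):
    """Walk the parsed expression, allowing only a few node types."""
    if isinstance(node, ast.BoolOp):
        results = [_evaluate(value, env) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = _value(node.left, env)
        right = _value(node.comparators[0], env)
        op = node.ops[0]
        if isinstance(op, ast.Eq):
            return left == right
        if isinstance(op, ast.NotEq):
            return left != right
        if isinstance(op, ast.In):
            return left in right
        if isinstance(op, ast.NotIn):
            return left not in right
    raise ValueError("unsupported marker expression")


def evaluate_marker(marker, env=None):
    """Return True if the marker holds in env (default: this interpreter)."""
    env = MARKER_VARS if env is None else env
    return _evaluate(ast.parse(marker, mode="eval").body, env)
```

Ordering comparisons are deliberately left out, because comparing version strings lexically gives wrong answers (for example "3.10" sorts before "3.9"). Passing an explicit `env` lets a tool ask whether a requirement would apply on a platform other than the one it is running on.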
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
Daniel Holth writes:

> The wheel implementation makes sure all the metadata (the .dist-info
> directory) is at the end of the .zip archive. It's possible to read the
> metadata with a single HTTP partial request for the end of the archive
> without downloading the entire archive.

Sounds good, but can you point to any example code which does this? As I understand it, for .zip files you have to read the last part of the file to get a pointer to the directory, then read that to find where each file in the archive is, then seek to a specific position to read the file contents.

Regards,

Vinay Sajip
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
On Thu, Dec 6, 2012 at 9:58 AM, Vinay Sajip wrote:

> Daniel Holth writes:
>
> > The wheel implementation makes sure all the metadata (the .dist-info
> > directory) is at the end of the .zip archive. It's possible to read the
> > metadata with a single HTTP partial request for the end of the archive
> > without downloading the entire archive.
>
> Sounds good, but can you point to any example code which does this? As I
> understand it, for .zip files you have to read the last part of the file
> to get a pointer to the directory, then read that to find where each file
> in the archive is, then seek to a specific position to read the file
> contents.

You have to make a maximum of 3 requests: one for the directory pointer, one for the directory, and one for the file you want. It's not particularly difficult to make an HTTP-backed seekable file object to pass to ZipFile() for this purpose but I don't have an example. Normally the last few k of the file will contain all 3 pieces. 8k or 16k would be a good guess.
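As a rough illustration of the "HTTP-backed seekable file object" mentioned above, the sketch below wraps a range-fetching callable in a raw file object that `zipfile.ZipFile` can read from. It is not code from the wheel project. The names `RangeFile`, `http_fetcher`, and `open_remote_zip` are hypothetical, the total size is assumed to be known already (for example from a `Content-Length` header), and the server is assumed to honor `Range` requests. The number of requests actually issued depends on the buffer size and on `zipfile` internals, so this does not guarantee the three-request bound.

```python
import io
import urllib.request
import zipfile


class RangeFile(io.RawIOBase):
    """Read-only, seekable file whose bytes come from fetch(start, end)."""

    def __init__(self, fetch, size):
        super().__init__()
        self._fetch = fetch   # callable returning the bytes in [start, end)
        self._size = size     # total length of the remote file
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return pos

    def readinto(self, buf):
        # One fetch per raw read; the BufferedReader below batches small reads.
        end = min(self._pos + len(buf), self._size)
        if end <= self._pos:
            return 0
        data = self._fetch(self._pos, end)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)


def http_fetcher(url):
    """Return a fetch(start, end) callable that issues HTTP Range requests."""
    def fetch(start, end):
        # The Range header is inclusive at both ends, hence end - 1.
        headers = {"Range": "bytes=%d-%d" % (start, end - 1)}
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            if response.status != 206:
                raise OSError("server did not honor the Range header")
            return response.read()
    return fetch


def open_remote_zip(fetch, size, buffer_size=16 * 1024):
    """Open a ZipFile over a RangeFile, reading in buffer_size chunks."""
    raw = RangeFile(fetch, size)
    return zipfile.ZipFile(io.BufferedReader(raw, buffer_size=buffer_size))
```

Usage would look something like `open_remote_zip(http_fetcher(url), size).read("pkg-1.0.dist-info/METADATA")`, where `url`, `size`, and the member name are placeholders. Keeping the transport behind a plain callable means the file object can be exercised against an in-memory archive without any network access.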
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
On 6 Dec, 2012, at 15:58, Vinay Sajip wrote:

> Daniel Holth writes:
>
>> The wheel implementation makes sure all the metadata (the .dist-info
>> directory) is at the end of the .zip archive. It's possible to read the
>> metadata with a single HTTP partial request for the end of the archive
>> without downloading the entire archive.
>
> Sounds good, but can you point to any example code which does this? As I
> understand it, for .zip files you have to read the last part of the file to
> get a pointer to the directory, then read that to find where each file in
> the archive is, then seek to a specific position to read the file contents.

Because zipfiles can be appended to other files (for example when creating a self-extracting archive) the zipfile module maintains the file offset of the start of a zipfile. The code in the stdlib doesn't appear to test that the zipfile is at a positive offset in the file, therefore with some luck the following will work:

* Download the last 10K of the archive (adjust the size to taste, it should be large enough to contain the zipfile directory and the file you are trying to read)

* Create a zipfile.ZipFile

* Read the zipfile member.

If that doesn't work you'll have to create a temporary file of the right size and place the downloaded bit at the end of that file.

BTW, another (more hacky) alternative is to place the interesting bits of dist-info at the start of the zipfile. Then you only need to download the first bit of the archive and can extract the bits you need by parsing the local file headers (zipfiles contain both a directory at the end of the zipfile and a local header stored just before the file data).

Ronald
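The fallback described above (a file of the right size with the downloaded bytes placed at the end) can be sketched as follows. This is an illustration only: `read_member_from_tail` is a hypothetical helper, and it pads in memory with `io.BytesIO`, whereas a temporary file on disk would let many filesystems store the padding sparsely. It assumes the tail already contains the end record, the whole central directory, and the member being read.

```python
import io
import zipfile


def read_member_from_tail(tail, total_size, name):
    """Read one member given only the last len(tail) bytes of an archive.

    The missing leading bytes are replaced by zero padding so that every
    offset recorded in the central directory still points at the right place.
    """
    padded = io.BytesIO()
    # Seeking past the end and writing fills the gap with zero bytes.
    padded.seek(total_size - len(tail))
    padded.write(tail)
    with zipfile.ZipFile(padded) as archive:
        # Raises zipfile.BadZipFile if the member's data lies in the padding.
        return archive.read(name)
```

Because the offsets are left untouched, this does not rely on how the zipfile module handles archives that appear to start at a negative offset. Members whose data was not downloaded fail with `zipfile.BadZipFile` rather than returning garbage, since their local header is all zeros.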
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
Daniel Holth writes:

> You have to make a maximum of 3 requests: one for the directory pointer, one
> for the directory, and one for the file you want. It's not particularly
> difficult to make an HTTP-backed seekable file object to pass to ZipFile() for
> this purpose but I don't have an example. Normally the last few k of the file
> will contain all 3 pieces. 8k or 16k would be a good guess.

I don't need an example for doing it with multiple HTTP requests. I only asked for an example because you said one could read the metadata "with a single HTTP partial request", and I couldn't see how it could always be done with a single request.

PEP 427 is silent on the subject of zip file comments in a .whl, but perhaps it shouldn't be. IIUC, the directory of the zip file *could* be more than 16K from the end of the file, due to the possible presence of a pathologically large comment in the end record.

Regards,

Vinay Sajip
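The concern about comments can be checked mechanically. In the zip format the end-of-central-directory record is 22 bytes long and is followed by a comment of up to 65535 bytes, so the record can sit up to 65557 bytes from the end of the file. The sketch below, which ignores the Zip64 extensions, searches a downloaded tail for that record and reports where the directory starts. `locate_directory` is a hypothetical helper, not part of the zipfile module.

```python
import struct

# End-of-central-directory record: signature, four 16-bit counters,
# directory size, directory offset, comment length (22 bytes in total).
EOCD = struct.Struct("<4s4H2LH")
EOCD_SIG = b"PK\x05\x06"
MAX_COMMENT = 0xFFFF


def locate_directory(tail):
    """Return (cd_offset, cd_size, comment_len), or None if not in tail.

    tail must be the final bytes of the archive. A signature only counts if
    its comment length accounts for every byte that follows the record, which
    guards against a signature that merely appears inside a comment.
    """
    pos = tail.rfind(EOCD_SIG)
    while pos != -1:
        if pos + EOCD.size <= len(tail):
            fields = EOCD.unpack_from(tail, pos)
            cd_size, cd_offset, comment_len = fields[5], fields[6], fields[7]
            if pos + EOCD.size + comment_len == len(tail):
                return cd_offset, cd_size, comment_len
        # Look for an earlier occurrence of the signature.
        pos = tail.rfind(EOCD_SIG, 0, pos)
    return None
```

If this returns `None` for a 16K tail, a client can retry with `EOCD.size + MAX_COMMENT` bytes, which is always enough to find the record in a non-Zip64 archive. Once `cd_offset` is known, `total_size - cd_offset` is the tail length needed to cover the whole directory.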
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
On Thu, Dec 6, 2012 at 11:30 AM, Vinay Sajip wrote:

> Daniel Holth writes:
>
> > You have to make a maximum of 3 requests: one for the directory pointer,
> > one for the directory, and one for the file you want. It's not
> > particularly difficult to make an HTTP-backed seekable file object to pass
> > to ZipFile() for this purpose but I don't have an example. Normally the
> > last few k of the file will contain all 3 pieces. 8k or 16k would be a
> > good guess.
>
> I don't need an example for doing it with multiple HTTP requests. I only
> asked for an example because you said one could read the metadata "with a
> single HTTP partial request", and I couldn't see how it could always be done
> with a single request.
>
> PEP 427 is mute on the subject of zip file comments in a .whl, but perhaps
> it shouldn't be. IIUC, the directory of the zip file *could* be further from
> the end of the file by more than 16K, due to the possible presence of a
> pathologically large comment in the end record.

It's just a "usually works" optimization that might be fun when bandwidth is more important than round trip times. The distance between the directory and the end of the file depends on the size of the directory. Django's is an extreme case at nearly half a meg; most are much smaller.

On many filesystems it is cheap to create a sparse file the size of the entire archive and write the partial requests into it. The OS doesn't actually store all the 0's.

The other reason wheel puts the metadata at the end is so the metadata can be re-written efficiently without re-writing the entire zipfile. The wheel project implements ZipFile.pop() which truncates the last file from a (normal) zip archive. This is especially useful when the last file is the attached digital signature.
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
On Thu, Dec 6, 2012 at 8:39 AM, Daniel Holth wrote:

> It will be Obsoleted-By:. The "drop in replacement" requirement will be
> removed. The package manager will say "you are using these obsolete
> packages; check out these non-obsolete ones" but will not automatically pull
> the replacement without a Requires tag.

Sounds fine to me.

> I will probably add the unambiguous Conflicts: tag "uninstall this other
> package if I am installed".

Please don't. See my lengthy posts from the previous PEP 345 retread discussion for why, or ask MRAB to succinctly summarize them as he did so brilliantly with the obsoletes/obsoleted-by issue. ;-)

I'll take a stab at a short version, though: a conflict (other than filename conflict) is not an installation-time property of a single project, but rather a *runtime* property of an overall system to which the projects are being installed, including configuration that is out of scope for a Python-specific installation tool to manage.

In addition, even declaring overall conflicts as a *mere shorthand* for an existing file conflict creates the possibility of stale conflict information! For example, RuleDispatch vs. PyDispatcher: at one time both provided a "dispatch" package, but if RuleDispatch declared PyDispatcher conflicting, the declaration would quickly have become outdated: PyDispatcher soon renamed its provided package to resolve the conflict.

A file-based system can both detect and resolve this conflict (or lack thereof) automatically, whereas a manual "Conflicts" notation must be maintained by the author(s) of one or both packages and removed when out of date.

In effect, a "conflicts" field actually *creates* conflicts and maintenance burdens where they did not previously exist, because even after the conflict no longer really existed, an automated tool would have prevented PyDispatcher from being installed, or, per your suggestion above, unnecessarily *uninstalled* it after a user installed RuleDispatch.

And unlike the Obsoletes->Obsoleted-By change, I do not know of any similar way to salvage the idea of a Conflicts field, without reference to some mediating authority that manages the information on behalf of an overall system into which the projects are being fitted.

But in that case, neither of the projects really owns the declaration - it's more like Zope (say) would need a list of plugins that conflict with each other, or they could declare that they conflict when activated in the same instance.

A generic Python installer, however, that doesn't know about Zope instances or Apache vhosts or Django apps or any other "environment of conflict", can't assume that *mere installation* constitutes a conflict! It doesn't know, for example, whether code from two simultaneously-installed packages will ever even be *imported* in the same process, let alone whether their specific conflicting features will be used in that process.

This effectively ensures that in general, Python installation tools can *only* rely on file-based conflicts as being denotable by project metadata -- and even then, it's better to stick with *actual* file conflicts rather than predicted ones, to avoid the type of logjam described above.

P.S. Sorry once again to drag you through all this at the last minute; I just sort of assumed you picked up where Alexis left off on the previous attempt at an update to PEP 345 and didn't pay close enough attention to earlier drafts.
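A file-based conflict check of the kind argued for above needs nothing more than the list of files each installed project owns. The following is a minimal sketch under that assumption. `file_conflicts` is a hypothetical helper, and the file lists in the usage note are invented for illustration rather than taken from the real RuleDispatch or PyDispatcher distributions.

```python
def file_conflicts(installed, candidate_files):
    """Map each installed project to the paths it shares with the candidate.

    installed maps a project name to an iterable of the relative paths it
    owns; candidate_files lists the paths the new project would install.
    Projects with no overlapping paths are left out of the result.
    """
    candidate = set(candidate_files)
    conflicts = {}
    for name, files in installed.items():
        shared = candidate & set(files)
        if shared:
            conflicts[name] = sorted(shared)
    return conflicts
```

With an installed project owning `dispatch/__init__.py` and a candidate that also ships `dispatch/__init__.py`, the function reports that one path. After the installed project renames its package to, say, `pydispatch/`, the result is empty without anyone having to edit metadata, which is the staleness argument in miniature.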
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
On Thu, Dec 6, 2012 at 9:58 AM, Vinay Sajip wrote:

> Daniel Holth writes:
>
>> The wheel implementation makes sure all the metadata (the .dist-info
>> directory) is at the end of the .zip archive. It's possible to read the
>> metadata with a single HTTP partial request for the end of the archive
>> without downloading the entire archive.
>
> Sounds good, but can you point to any example code which does this? As I
> understand it, for .zip files you have to read the last part of the file to
> get a pointer to the directory, then read that to find where each file in
> the archive is, then seek to a specific position to read the file contents.

ISTR that this is especially true for zipimport: I think it depends on a zipfile signature being present at the *end* of the file. Certainly, the standard for .exe and shell wrappers for zipfiles is to place them at the beginning of the file, rather than the end.
Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]
On Thu, Dec 6, 2012 at 1:49 AM, Toshio Kuratomi wrote:

> On Wed, Dec 05, 2012 at 07:34:41PM -0500, PJ Eby wrote:
>> On Wed, Dec 5, 2012 at 6:07 PM, Donald Stufft wrote:
>>
>> Nobody has actually proposed a better one, outside of package renaming
>> -- and that example featured an author who could just as easily have
>> used an obsoleted-by field.
>>
> How about pexpect and pexpect-u as a better example?

Perhaps you could explain? I'm not familiar with those projects.

> Note that although well-managed Linux distros attempt to control random
> forking internally, the distro package managers don't prevent people from
> installing from third parties. So Ubuntu PPAs, upstreams that provide their
> own rpms/debs, and major third party repos (for instance, rpmfusion as
> an add-on repo to Fedora) all have and sometimes (mis)use the ability to
> Obsolete packages in the base repository.

But in each of these cases, the packages are being defined *with reference to* some underlying vision of what the distro (or even "a distro") is. An Ubuntu PPA, if I understand correctly, is still *building an Ubuntu system*. Python packaging as a whole lacks such frames of reference. A forked distro is still a distro, and it's a fork *of something*. Rpmfusion is defining an enhanced Fedora, not slinging random unrelated packages about.

If there's a distro analogy to PyPI, it seems to me that something like RpmFind would be closer: it's just a free-for-all of packages, with the user needing to decide for themselves whether installing something from a foreign distro will or won't blow up their system. (E.g., because their native distro and the foreign one use a different "provides" taxonomy.)

RpmFind itself can't solve anybody's issues with conflicts or obsoletes; all it can do is search the data that's there. But unlike PyPI, RpmFind can at least tell you which vision of "a distro" a particular package was intended for. ;-)

> The ability for this class of fields to cause harm is not, to me,
> a compelling argument not to include them.

But it is absolutely not a compelling argument *to* include them, and the actual arguments for them are pretty thin on the ground. The real knockdown is that in the PyPI environment, there aren't any automated use cases that don't produce collateral damage (outside of advisories about Obsoleted-By projects).

> It could be an argument to
> explicitly tell implementers of install tools that they all have caveats
> when used with pypi and similar unpoliced community package repositories.

AFAIK, there are only a handful of curated repositories: Scipy, Enthought, and ActiveState come to mind. These are essentially "python distros", and they might certainly have reason to build policy into their metadata. I expect, however, that they would not want the *package* authors declaring their own conflicts or obsolescence, so I'm not sure how the metadata spec will help them.

Has anyone asked for their input or experience? It seems pointless to speculate on what they might or might not need for curated distribution. (I'm pretty sure Enthought has their own install tools, not sure about the other two.)

> The install tools can then choose how they wish to deal with those caveats.
> Some example strategies: choose to prompt the user as to which to install,
> choose to always treat the fields as human-informational only, mark some
> repositories as being trusted to contain packages where these fields are
> active and other repositories where the fields are ignored.

A peculiar phenomenon: every defense of these fields seems to refer almost exclusively to how the problems could be fixed or why the problems aren't that bad, rather than *how useful the fields would be* in real-world scenarios.

In some cases, the argument for the fields' safety actually runs *counter* to their usefulness, e.g., the fields aren't that bad because we could make them have a limited function or no function at all. Isn't lack of usefulness generally considered an argument for *not* including a feature? ;-)