Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread Vinay Sajip
Donald Stufft writes:

> This is insane. A fairly simple database query is going to "grind the PyPI
> servers into dust"?  You're going to need to back up this FUD or please
> refrain from spouting it.

Never mind the "Obsoletes" information - even the more useful "Requires-Dist"
information is not exposed via PyPI, even though it appears to be stored in the
database. (Or if it is, please point me to where - I must have missed it.)

Even if this were to be made available, it's presumably obtained from PKG-INFO.
As I understand, this data is not considered reliable - for example, pip runs
egg_info on downloaded packages to get updated information when determining
dependencies to be downloaded. If the Requires-Dist info in PKG-INFO can't be
relied on, surely less critical information such as Obsoletes can't be relied
on, either?

Regards,

Vinay Sajip

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread Donald Stufft
On Thursday, December 6, 2012 at 6:28 AM, Vinay Sajip wrote:
> Donald Stufft writes:
>
> Never mind the "Obsoletes" information - even the more useful "Requires-Dist"
> information is not exposed via PyPI, even though it appears to be stored in
> the database. (Or if it is, please point me to where - I must have missed it.)
> 
> 

Requires-Dist doesn't exist for more than a handful of packages. But PyPI
exposes it via the XMLRPC API, possibly the JSON api as well.
> 
> Even if this were to be made available, it's presumably obtained from
> PKG-INFO. As I understand, this data is not considered reliable - for
> example, pip runs egg_info on downloaded packages to get updated
> information when determining dependencies to be downloaded. If the
> Requires-Dist info in PKG-INFO can't be relied on, surely less critical
> information such as Obsoletes can't be relied on, either?
> 
> 

pip runs egg_info because setuptools does not write out to PKG-INFO what
the dependencies are (it does write it out to a different text file though).
But IIRC that text file is not guaranteed to exist in the distribution.
There's also the history where pip was trying to preserve as much backwards
compat with easy_install as it could, and if you used the file that egg_info
writes out then you'll only get the requirements for the system that the
distribution was packaged on. Any if statements that affect the dependencies
won't be in effect.
> 
> Regards,
> 
> Vinay Sajip
> 


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread Daniel Holth
On Thu, Dec 6, 2012 at 6:33 AM, Donald Stufft wrote:

> On Thursday, December 6, 2012 at 6:28 AM, Vinay Sajip wrote:
>
>> Donald Stufft writes:
>>
>> Never mind the "Obsoletes" information - even the more useful
>> "Requires-Dist" information is not exposed via PyPI, even though it
>> appears to be stored in the database. (Or if it is, please point me to
>> where - I must have missed it.)
>
> Requires-Dist doesn't exist for more than a handful of packages. But PyPI
> exposes it via the XMLRPC API, possibly the JSON api as well.
>
>> Even if this were to be made available, it's presumably obtained from
>> PKG-INFO. As I understand, this data is not considered reliable - for
>> example, pip runs egg_info on downloaded packages to get updated
>> information when determining dependencies to be downloaded. If the
>> Requires-Dist info in PKG-INFO can't be relied on, surely less critical
>> information such as Obsoletes can't be relied on, either?
>
> pip runs egg_info because setuptools does not write out to PKG-INFO what
> the dependencies are (it does write it out to a different text file
> though). But IIRC that text file is not guaranteed to exist in the
> distribution. There's also the history where pip was trying to preserve as
> much backwards compat with easy_install as it could, and if you used the
> file that egg_info writes out then you'll only get the requirements for
> the system that the distribution was packaged on. Any if statements that
> affect the dependencies won't be in effect.
>

It will be Obsoleted-By:. The "drop in replacement" requirement will be
removed. The package manager will say "you are using these obsolete
packages; check out these non-obsolete ones" but will not automatically
pull the replacement without a Requires tag.

I will probably add the unambiguous Conflicts: tag "uninstall this other
package if I am installed".


Many packages (IIRC more than half) have the pre-Metadata-1.2 equivalent of
Requires-Dist: which is the very easy to parse requires.txt. This
information is not reliable because it could depend on conditions in
setup.py. Someone should write a setup.py compiler that determines whether
a package's requirements are conditional or not.
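
For illustration, a minimal sketch of reading a setuptools-style requires.txt,
assuming the usual layout of one requirement per line with optional [extra]
section headers. The name parse_requires_txt and the sample requirement names
are hypothetical, and the result still only reflects whatever setup.py decided
on the machine that built the distribution.

```python
def parse_requires_txt(text):
    """Group requirement lines by section; None holds the unconditional ones."""
    sections = {None: []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # section header, e.g. an extra named "test"
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


# Made-up sample content, not taken from any real project.
sample = "foo>=1.0\nbar\n\n[test]\nbaz\n"
print(parse_requires_txt(sample))
```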


Environment markers (limited Python expressions at the end of Requires-Dist
lines) attempt to make Requires-Dist reliable. You can execute them safely
in your environment to determine whether a requirement is right for you:

Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'
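
A rough sketch of how such a marker could be evaluated without calling eval(),
assuming only string comparisons joined by and/or are allowed. The helper
names (marker_environment, requirement_applies) and the subset of variables
offered are assumptions made for this example, not the behaviour of any
particular installer.

```python
import ast
import os
import platform
import sys


def marker_environment():
    """Assumed subset of marker variables, taken from the running interpreter."""
    return {
        "sys.platform": sys.platform,
        "os.name": os.name,
        "platform.machine": platform.machine(),
        "python_version": "%d.%d" % sys.version_info[:2],
    }


def _name(node):
    # Rebuild dotted names such as sys.platform from the syntax tree.
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return _name(node.value) + "." + node.attr
    raise ValueError("unsupported marker syntax")


def _value(node, env):
    # A marker operand is either a string literal or a known variable.
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return env[_name(node)]


def _evaluate(node, env):
    if isinstance(node, ast.BoolOp):
        results = [_evaluate(value, env) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = _value(node.left, env)
        right = _value(node.comparators[0], env)
        op = node.ops[0]
        if isinstance(op, ast.Eq):
            return left == right
        if isinstance(op, ast.NotEq):
            return left != right
        if isinstance(op, ast.In):
            return left in right
        if isinstance(op, ast.NotIn):
            return left not in right
    raise ValueError("unsupported marker syntax")


def requirement_applies(line, env=None):
    """Split 'requirement; marker' and report whether the marker holds."""
    requirement, _, marker = line.partition(";")
    if env is None:
        env = marker_environment()
    if not marker.strip():
        return requirement.strip(), True
    tree = ast.parse(marker.strip(), mode="eval")
    return requirement.strip(), _evaluate(tree.body, env)


print(requirement_applies("pywin32 (>1.0); sys.platform == 'win32'"))
```

Nothing in the marker is ever executed: only the whitelisted node types are
walked, so an unexpected expression raises ValueError instead of running.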


The wheel implementation makes sure all the metadata (the .dist-info
directory) is at the end of the .zip archive. It's possible to read the
metadata with a single HTTP partial request for the end of the archive
without downloading the entire archive.


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread Vinay Sajip
Daniel Holth writes:

> The wheel implementation makes sure all the metadata (the .dist-info
> directory) is at the end of the .zip archive. It's possible to read the
> metadata with a single HTTP partial request for the end of the archive
> without downloading the entire archive.

Sounds good, but can you point to any example code which does this? As I
understand it, for .zip files you have to read the last part of the file to
get a pointer to the directory, then read that to find where each file in the
archive is, then seek to a specific position to read the file contents.

Regards,

Vinay Sajip





Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread Daniel Holth
On Thu, Dec 6, 2012 at 9:58 AM, Vinay Sajip wrote:

> Daniel Holth writes:
>
>> The wheel implementation makes sure all the metadata (the .dist-info
>> directory) is at the end of the .zip archive. It's possible to read the
>> metadata with a single HTTP partial request for the end of the archive
>> without downloading the entire archive.
>
> Sounds good, but can you point to any example code which does this? As I
> understand it, for .zip files you have to read the last part of the file
> to get a pointer to the directory, then read that to find where each file
> in the archive is, then seek to a specific position to read the file
> contents.


You have to make a maximum of 3 requests: one for the directory pointer,
one for the directory, and one for the file you want. It's not particularly
difficult to make an HTTP-backed seekable file object to pass to ZipFile()
for this purpose but I don't have an example. Normally the last few k of
the file will contain all 3 pieces. 8k or 16k would be a good guess.
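
A minimal sketch of such an HTTP-backed seekable object, under a few
assumptions: the total size is already known (say from a Content-Length
header), the server honours Range requests, and the last 16k is fetched up
front so the common case costs one request. RangeReader, http_range_fetcher
and read_member are hypothetical names made up for this sketch.

```python
import io
import urllib.request
import zipfile


def http_range_fetcher(url):
    """Return fetch(start, end) issuing one HTTP Range request per call."""
    def fetch(start, end):
        # Assumes the server answers with exactly the requested byte range.
        request = urllib.request.Request(
            url, headers={"Range": "bytes=%d-%d" % (start, end - 1)})
        with urllib.request.urlopen(request) as response:
            return response.read()
    return fetch


class RangeReader(io.RawIOBase):
    """Read-only seekable file whose bytes come from fetch(start, end)."""

    def __init__(self, size, fetch, tail=16 * 1024):
        super().__init__()
        self._size = size
        self._fetch = fetch
        self._pos = 0
        # Prefetch the end of the file, where the zip directory lives.
        self._tail_start = max(size - tail, 0)
        self._tail = fetch(self._tail_start, size)

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        return self._pos

    def readinto(self, buf):
        start = self._pos
        end = min(start + len(buf), self._size)
        if end <= start:
            return 0  # at or past end of file
        if start >= self._tail_start:
            # Served from the prefetched tail, no extra request needed.
            data = self._tail[start - self._tail_start:end - self._tail_start]
        else:
            data = self._fetch(start, end)
        buf[:len(data)] = data
        self._pos = start + len(data)
        return len(data)


def read_member(size, fetch, name):
    """Read one archive member through a RangeReader."""
    with zipfile.ZipFile(RangeReader(size, fetch)) as archive:
        return archive.read(name)
```

Any read that falls before the prefetched tail turns into a request of its
own, so an archive with a large directory degrades to several round trips
rather than failing.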


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread Ronald Oussoren

On 6 Dec, 2012, at 15:58, Vinay Sajip wrote:

> Daniel Holth writes:
>
>> The wheel implementation makes sure all the metadata (the .dist-info
>> directory) is at the end of the .zip archive. It's possible to read the
>> metadata with a single HTTP partial request for the end of the archive
>> without downloading the entire archive.
>
> Sounds good, but can you point to any example code which does this? As I
> understand it, for .zip files you have to read the last part of the file
> to get a pointer to the directory, then read that to find where each file
> in the archive is, then seek to a specific position to read the file
> contents.

Because zipfiles can be appended to other files (for example when creating a 
self-extracting archive) the zipfile module maintains the file offset of the 
start of a zipfile. The code in the stdlib doesn't appear to test that the 
zipfile is at a positive offset in the file, therefore with some luck the 
following will work: 

* Download the last 10K of the archive (adjust the size to taste, it should be 
large enough to contain the zipfile directory and the file you are trying to 
read)

* Create a zipfile.ZipFile

* Read the zipfile member.

If that doesn't work you'll have to create a temporary file of the right size 
and place the downloaded bit at the end of that file.
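
The temporary-file fallback can be sketched roughly as follows, assuming the
total archive size is known and the downloaded tail covers both the directory
and the wanted member. read_member_from_tail is a hypothetical helper name.

```python
import tempfile
import zipfile


def read_member_from_tail(tail, total_size, name):
    """Read one member given only the last len(tail) bytes of the archive."""
    with tempfile.TemporaryFile() as scratch:
        # Seeking past the end before writing leaves a hole that reads back
        # as zeros; many filesystems store such a hole sparsely.
        scratch.seek(total_size - len(tail))
        scratch.write(tail)
        with zipfile.ZipFile(scratch) as archive:
            return archive.read(name)
```

Because the tail sits at its original offset, the offsets recorded in the
directory stay valid. A member that starts before the downloaded tail reads
back as zeros, which zipfile should reject as a bad file header.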

BTW. Another (more hacky) alternative is to place the interesting bits of 
dist-info at the start of the zipfile, then you only need to download the first 
bit of the archive and can then extract the bits you need by parsing the local 
file headers (zipfiles contain both a directory at the end of the zipfile and a 
local header stored just before the file data). 
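
A small sketch of that local-header approach, assuming the members of interest
are stored or deflated and were written with their sizes in the local header
(no data descriptor). iter_leading_members is a hypothetical name.

```python
import struct
import zlib

# Fixed 30-byte local file header: signature, version, flags, method, time,
# date, crc32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def iter_leading_members(head):
    """Yield (name, data) for each complete member at the start of head."""
    pos = 0
    while pos + LOCAL_HEADER.size <= len(head):
        (sig, _version, flags, method, _time, _date, _crc,
         csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(head, pos)
        if sig != b"PK\x03\x04" or flags & 0x08:
            break  # not a local header, or sizes live in a data descriptor
        name_start = pos + LOCAL_HEADER.size
        data_start = name_start + name_len + extra_len
        data_end = data_start + csize
        if data_end > len(head):
            break  # member runs past the downloaded prefix
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = head[name_start:name_start + name_len].decode(encoding)
        raw = head[data_start:data_end]
        if method == 0:        # stored
            data = raw
        elif method == 8:      # deflated, raw stream without zlib header
            data = zlib.decompress(raw, wbits=-15)
        else:
            break              # other compression methods are not handled
        yield name, data
        pos = data_end
```

This never looks at the central directory, so it trusts the local headers
alone and skips the CRC check that zipfile would normally perform.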

Ronald
> 
> Regards,
> 
> Vinay Sajip
> 
> 
> 


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread Vinay Sajip
Daniel Holth writes:

> You have to make a maximum of 3 requests: one for the directory pointer, one
> for the directory, and one for the file you want. It's not particularly
> difficult to make an HTTP-backed seekable file object to pass to ZipFile() for
> this purpose but I don't have an example. Normally the last few k of the file
> will contain all 3 pieces. 8k or 16k would be a good guess.

I don't need an example for doing it with multiple HTTP requests. I only asked
for an example because you said one could read the metadata "with a single
HTTP partial request", and I couldn't see how it could always be done with a
single request.

PEP 427 is mute on the subject of zip file comments in a .whl, but perhaps it
shouldn't be. IIUC, the directory of the zip file *could* be further from the
end of the file by more than 16K, due to the possible presence of a
pathologically large comment in the end record.
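
To make the concern concrete, here is a sketch that looks for the end record
in a downloaded tail and reports where the directory starts. It assumes a
plain (non-zip64) archive; locate_directory and MAX_TAIL are hypothetical
names.

```python
import struct

# Fixed 22-byte end record: signature, disk numbers, entry counts,
# directory size, directory offset, comment length.
END_RECORD = struct.Struct("<4s4H2LH")
# The comment length is a 16-bit field, so the end record can begin at most
# this many bytes before the end of the file.
MAX_TAIL = END_RECORD.size + 0xFFFF


def locate_directory(tail):
    """Return (dir_offset, dir_size, comment_len), or None if not in tail."""
    idx = tail.rfind(b"PK\x05\x06")
    while idx >= 0:
        if idx + END_RECORD.size <= len(tail):
            (_sig, _disk, _dir_disk, _count_here, _count_total,
             dir_size, dir_offset, comment_len) = END_RECORD.unpack_from(tail, idx)
            # A genuine end record is followed by exactly its comment.
            if idx + END_RECORD.size + comment_len == len(tail):
                return dir_offset, dir_size, comment_len
        idx = tail.rfind(b"PK\x05\x06", 0, idx)
    return None
```

With a 20000-byte comment a 16K tail contains nothing but comment text and the
lookup returns None, whereas a tail of MAX_TAIL bytes always includes the end
record of a non-zip64 archive.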

Regards,

Vinay Sajip




Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread Daniel Holth
On Thu, Dec 6, 2012 at 11:30 AM, Vinay Sajip wrote:

> Daniel Holth writes:
>
>> You have to make a maximum of 3 requests: one for the directory pointer,
>> one for the directory, and one for the file you want. It's not
>> particularly difficult to make an HTTP-backed seekable file object to
>> pass to ZipFile() for this purpose but I don't have an example. Normally
>> the last few k of the file will contain all 3 pieces. 8k or 16k would be
>> a good guess.
>
> I don't need an example for doing it with multiple HTTP requests. I only
> asked for an example because you said one could read the metadata "with a
> single HTTP partial request", and I couldn't see how it could always be
> done with a single request.
>
> PEP 427 is mute on the subject of zip file comments in a .whl, but perhaps
> it shouldn't be. IIUC, the directory of the zip file *could* be further
> from the end of the file by more than 16K, due to the possible presence of
> a pathologically large comment in the end record.


It's just a "usually works" optimization that might be fun when bandwidth
is more important than round trip times. The distance between the directory
and the end of the file depends on the size of the directory. Django's is
an extreme case at nearly half a meg; most are much smaller.

On many filesystems it is cheap to create a sparse file the size of the
entire archive and write the partial requests into it. The OS doesn't
actually store all the 0's.

The other reason wheel puts the metadata at the end is so the metadata can
be re-written efficiently without re-writing the entire zipfile. The wheel
project implements ZipFile.pop() which truncates the last file from a
(normal) zip archive. This is especially useful when the last file is the
attached digital signature.


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread PJ Eby
On Thu, Dec 6, 2012 at 8:39 AM, Daniel Holth wrote:
> It will be Obsoleted-By:. The "drop in replacement" requirement will be
> removed. The package manager will say "you are using these obsolete
> packages; check out these non-obsolete ones" but will not automatically pull
> the replacement without a Requires tag.

Sounds fine to me.

> I will probably add the unambiguous Conflicts: tag "uninstall this other
> package if I am installed".

Please don't.  See my lengthy posts from the previous PEP 345 retread
discussion for why, or ask MRAB to succinctly summarize them as he did
so brilliantly with the obsoletes/obsoleted-by issue.  ;-)

I'll take a stab at a short version, though: a conflict (other than
filename conflict) is not an installation-time property of a single
project, but rather a *runtime* property of an overall system to which
the projects are being installed, including configuration that is out
of scope for a Python-specific installation tool to manage.  In
addition, even declaring overall conflicts as a *mere shorthand* for
an existing file conflict creates the possibility of stale conflict
information!

For example, RuleDispatch vs. PyDispatcher: at one time both provided
a "dispatch" package, but if RuleDispatch declared PyDispatcher
conflicting, the declaration would quickly have become outdated:
PyDispatcher soon renamed its provided package to resolve the
conflict.  A file-based system can both detect and resolve this
conflict (or lack thereof) automatically, whereas a manual "Conflicts"
notation must be maintained by the author(s) of one or both packages
and removed when out of date.
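
As a rough illustration of file-based detection, assuming the installer can
list the files each installed project owns (for example from its installation
record), a check could look like the sketch below. find_file_conflicts and the
file lists are hypothetical, made up for the example.

```python
def find_file_conflicts(installed, new_files):
    """Map each installed project to the files it shares with new_files."""
    new_files = set(new_files)
    conflicts = {}
    for project, files in installed.items():
        shared = new_files.intersection(files)
        if shared:
            conflicts[project] = sorted(shared)
    return conflicts


# Hypothetical file lists: both projects once shipped a "dispatch" package.
incoming = ["dispatch/__init__.py", "dispatch/functions.py"]
before = {"PyDispatcher": ["dispatch/__init__.py", "dispatch/saferef.py"]}
after = {"PyDispatcher": ["pydispatch/__init__.py", "pydispatch/saferef.py"]}

print(find_file_conflicts(before, incoming))
print(find_file_conflicts(after, incoming))
```

Once the other project renames its package the overlap disappears on its own,
with no declaration for anybody to keep up to date.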

In effect, a "conflicts" field actually *creates* conflicts and
maintenance burdens where they did not previously exist, because even
after the conflict no longer really existed, an automated tool would
have prevented PyDispatcher from being installed, or, per your
suggestion above, unnecessarily *uninstalled* it after a user
installed RuleDispatch.

And unlike the Obsoletes->Obsoleted-By change, I do not know of any
similar way to salvage the idea of a Conflicts field, without
reference to some mediating authority that manages the information on
behalf of an overall system into which the projects are being fitted.
But in that case, neither of the projects really owns the declaration
- it's more like Zope (say) would need a list of plugins that conflict
with each other, or they could declare that they conflict when
activated in the same instance.

A generic Python installer, however, that doesn't know about Zope
instances or Apache vhosts or Django apps or any other "environment of
conflict", can't assume that *mere installation* constitutes a
conflict!  It doesn't know, for example, whether code from two
simultaneously-installed packages will ever even be *imported* in the
same process, let alone whether their specific conflicting features
will be used in that process.

This effectively ensures that in general, Python installation tools
can *only* rely on file-based conflicts as being denotable by project
metadata -- and even then, it's better to stick with *actual* file
conflicts rather than predicted ones, to avoid the type of logjam
described above.


P.S. Sorry once again to drag you through all this at the last minute;
I just sort of assumed you picked up where Alexis left off on the
previous attempt at an update to PEP 345 and didn't pay close enough
attention to earlier drafts.


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread PJ Eby
On Thu, Dec 6, 2012 at 9:58 AM, Vinay Sajip wrote:
> Daniel Holth writes:
>
>> The wheel implementation makes sure all the metadata (the .dist-info
>> directory) is at the end of the .zip archive. It's possible to read the
>> metadata with a single HTTP partial request for the end of the archive
>> without downloading the entire archive.
>
> Sounds good, but can you point to any example code which does this? As I
> understand it, for .zip files you have to read the last part of the file
> to get a pointer to the directory, then read that to find where each file
> in the archive is, then seek to a specific position to read the file
> contents.

ISTR that this is especially true for zipimport: I think it depends on
a zipfile signature being present at the *end* of the file.

Certainly, the standard for .exe and shell wrappers for zipfiles is to
place them at the beginning of the file, rather than the end.


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-06 Thread PJ Eby
On Thu, Dec 6, 2012 at 1:49 AM, Toshio Kuratomi wrote:
> On Wed, Dec 05, 2012 at 07:34:41PM -0500, PJ Eby wrote:
>> On Wed, Dec 5, 2012 at 6:07 PM, Donald Stufft wrote:
>>
>> Nobody has actually proposed a better one, outside of package renaming
>> -- and that example featured an author who could just as easily have
>> used an obsoleted-by field.
>>
> How about pexpect and pexpect-u as a better example?

Perhaps you could explain?  I'm not familiar with those projects.

> Note that although well-managed Linux distros attempt to control random
> forking internally, the distro package managers don't prevent people from
> installing from third parties.  So Ubuntu PPAs, upstreams that provide their
> own rpms/debs, and major third party repos (for instance, rpmfusion as
> an add-on repo to Fedora) all have and sometimes (mis)use the ability to
> Obsolete packages in the base repository.

But in each of these cases, the packages are being defined *with
reference to* some underlying vision of what the distro (or even "a
distro") is.  An Ubuntu PPA, if I understand correctly, is still
*building an Ubuntu system*.  Python packaging as a whole lacks such
frames of reference.  A forked distro is still a distro, and it's a
fork *of something*.  Rpmfusion is defining an enhanced Fedora, not
slinging random unrelated packages about.

If there's a distro analogy to PyPI, it seems to me that something
like RpmFind would be closer: it's just a free-for-all of packages,
with the user needing to decide for themselves whether installing
something from a foreign distro will or won't blow up their system.
(E.g., because their native distro and the foreign one use a different
"provides" taxonomy.)

RpmFind itself can't solve anybody's issues with conflicts or
obsoletes; all it can do is search the data that's there.

But unlike PyPI, RpmFind can at least tell you which vision of "a
distro" a particular package was intended for.  ;-)


> The ability for this class of fields to cause harm is not, to me,
> a compelling argument not to include them.

But it is absolutely not a compelling argument *to* include them, and
the actual arguments for them are pretty thin on the ground.

The real knockdown is that in the PyPI environment, there aren't any
automated use cases that don't produce collateral damage (outside of
advisories about Obsoleted-By projects).


> It could be an argument to
> explicitly tell implementers of install tools that they all have caveats
> when used with pypi and similar unpoliced community package repositories.

AFAIK, there are only a handful of curated repositories: Scipy,
Enthought, and ActiveState come to mind.  These are essentially
"python distros", and they might certainly have reason to build policy
into their metadata.  I expect, however, that they would not want the
*package* authors declaring their own conflicts or obsolescence, so
I'm not sure how the metadata spec will help them.  Has anyone asked
for their input or experience?  It seems pointless to speculate on
what they might or might not need for curated distribution.  (I'm
pretty sure Enthought has their own install tools, not sure about the
other two.)

> The install tools can then choose how they wish to deal with those caveats.
> Some example strategies: choose to prompt the user as to which to install,
> choose to always treat the fields as human-informational only, mark some
> repositories as being trusted to contain packages where these fields are
> active and other repositories where the fields are ignored.

A peculiar phenomenon: every defense of these fields seems to refer
almost exclusively to how the problems could be fixed or why the
problems aren't that bad, rather than *how useful the fields would be*
in real-world scenarios.  In some cases, the argument for the fields'
safety actually runs *counter* to their usefulness, e.g., the fields
aren't that bad because we could make them have a limited function or
no function at all.  Isn't lack of usefulness generally considered an
argument for *not* including a feature?  ;-)