Re: [Python-Dev] [Python-ideas] Ext4 data loss

2009-03-11 Thread zooko
Would there be interest in a filetools module? Replies and  
discussion to python-ideas please.



I've been using and maintaining a few filesystem hacks for, let's  
see, almost nine years now:


http://allmydata.org/trac/pyutil/browser/pyutil/pyutil/fileutil.py

(The first version of that was probably written by Greg Smith in  
about 1999.)


I'm sure there are many other such packages.  A couple of quick  
searches of pypi turned up these two:


http://pypi.python.org/pypi/Pythonutils
http://pypi.python.org/pypi/fs

I wonder if any of them have the sort of functionality you're  
thinking of.


Regards,

Zooko
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] 3rd party developers: don't change your APIs when porting to Py3k! (but consider using ctypes)

2008-03-17 Thread zooko
I'm the maintainer of a few Python packages which wrap native C or C++
code.

At Pycon, I learned that PyPy and Jython support ctypes or have plans  
to do so in the near future.  I don't know about IronPython.

However, having CPython, PyPy, and Jython all supporting ctypes makes  
it the obvious choice for interfacing to native code in the future.

Regards,

Zooko O'Whielacronx


Re: [Python-Dev] [Distutils] PEP 365 (Adding the pkg_resources module)

2008-03-18 Thread zooko
Folks:

(By the way, it was a pleasure to meet many of you in Real Life for  
the first time at Pycon.)

Here is what I want:

1.  The standard Python build tools by default produce
machine-parseable package metadata, which can include package dependency
information with reasonably well-defined semantics.

Oh good!  I already have this, since distutils in Python >= 2.5  
produces .egg-info metadata in an easy-to-parse format.
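
To illustrate what "easy-to-parse" means here: PKG-INFO metadata is a
set of RFC 822 style "Key: value" headers, so the standard email parser
can read it.  The following is only a minimal sketch; the sample
contents (package name, version, summary) are made up for illustration.

```python
from email.parser import Parser

# Hypothetical PKG-INFO contents; the name, version and summary are invented.
SAMPLE_PKG_INFO = """\
Metadata-Version: 1.0
Name: examplepkg
Version: 0.3.1
Summary: An illustrative package
"""


def parse_pkg_info(text):
    """Return the header fields of a PKG-INFO document as a dict."""
    message = Parser().parsestr(text)
    # items() yields (field name, value) pairs in file order.
    return dict(message.items())


info = parse_pkg_info(SAMPLE_PKG_INFO)
print(info["Name"], info["Version"])
```

Fields that may legitimately repeat would need message.get_all()
instead of a plain dict; the sketch ignores that case.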

2.  This machine-parseable metadata is widely supported and  
understood by the Python community.

In retrospect, it's too bad that it isn't named ".pkg-info" instead  
of ".egg-info", in order to avoid the fraught politics around the  
concept of "eggs".  A concrete example of such a misunderstanding is  
the sad fact that many Linux distributions were in the habit of  
deleting this information from their Python packages, perhaps because  
they were under the impression that it was obviated by their  
packaging tools.  The major distributions have all stopped doing this  
now.

Unifying the created-by-default PKG-INFO files and the
created-by-default .egg-info directories would be nice, too.

3.  The standard Python library includes a tool to find and parse  
this metadata, so that I can write programmatic tests of my  
dependencies, like this:

http://allmydata.org/trac/tahoe/browser/_auto_deps.py?rev=2062

This is one of the improvements that I was anticipating from  
pkg_resources going into the standard library.
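
A programmatic dependency test of the kind meant here might look
roughly like the sketch below.  It is not the code behind the URL
above and it does not use pkg_resources; it assumes a present-day
Python whose standard library has importlib.metadata.  The helper
names, the naive version parsing, and the package versions in the
example are all assumptions for illustration.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.4.0' into (2, 4, 0); non-numeric parts are dropped (a simplification)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_dependencies(minimums, lookup=metadata.version):
    """Return {name: reason} for every requirement that is not satisfied."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_version(installed) < parse_version(minimum):
            problems[name] = "found %s, need >= %s" % (installed, minimum)
    return problems


def fake_lookup(name):
    """Stand-in for metadata.version so the example does not depend on the machine."""
    table = {"foolscap": "0.2.4"}  # hypothetical installed version
    if name not in table:
        raise metadata.PackageNotFoundError(name)
    return table[name]


report = check_dependencies({"foolscap": "0.2.5", "zfec": "1.0"}, lookup=fake_lookup)
print(report)
```

Calling check_dependencies() without the lookup argument consults the
metadata of whatever is actually installed.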

4.  The standard Python library includes a tool to find and read  
resources (other than Python modules) that came bundled in a Python  
package.

Consider, for example, these snippets of code in Nevow:

http://divmod.org/trac/browser/trunk/Nevow/setup.py?rev=13786#L10
http://divmod.org/trac/browser/trunk/Nevow/setup.py?rev=13786
http://divmod.org/trac/browser/trunk/Nevow/setup_egg.py?rev=2406

When Nevow uses pkg_resources to import its files such as
"default.css", it is able to find them at runtime, even if it is being
imported from a py2exe or py2app zip, or on other platforms where its
homegrown setup script and homegrown "find my file" function fail.
So using pkg_resources (and setuptools to install it) makes
"test_nevow" pass on all of the allmydata.org buildslaves:

http://allmydata.org/buildbot/waterfall?show_events=false
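
The idea in point 4, reading a bundled resource regardless of whether
the package sits in a directory or inside a zip, can be sketched with
the standard library's pkgutil.get_data.  This is not Nevow's code and
not pkg_resources; the package name "demopkg_res" and the file
contents are invented for the example.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def build_demo_zip(directory):
    """Create a zip archive holding a tiny package with a bundled CSS file."""
    archive = os.path.join(directory, "demo_bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demopkg_res/__init__.py", "")
        zf.writestr("demopkg_res/default.css", "body { color: black; }\n")
    return archive


workdir = tempfile.mkdtemp()
# Put the archive itself on sys.path, as one would with a zipped egg.
sys.path.insert(0, build_demo_zip(workdir))

# get_data asks the package's loader for the bytes, so the caller never
# has to compute a filesystem path that may not exist inside a zip.
css = pkgutil.get_data("demopkg_res", "default.css")
print(css.decode("ascii"))
```

A "find my file" function built on os.path.dirname(__file__) and
open() would fail here, because the resource is not a real file on
disk.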


Regards,

Zooko



Re: [Python-Dev] PEP 365 (Adding the pkg_resources module)

2008-03-19 Thread zooko
On Mar 19, 2008, at 3:23 PM, Guido van Rossum wrote:

> If other people want to chime in please do so; if this is just a
> dialog between Phillip and me I might incorrectly assume
> that nobody besides Phillip really cares.

I really care.  I've used setuptools, easy_install, eggs, and  
pkg_resources extensively for the past year or so (and contributed a  
few small patches).  There have been plenty of problems, but I find  
them to be overall useful tools.

It is a great boon to a programming community to lower the costs of  
re-using other people's code.  The Python community will benefit  
greatly once a way to do that becomes widely enough accepted to reach  
a tipping point and become ubiquitous.  Setuptools is already the de  
facto standard, but it hasn't become ubiquitous, possibly in part  
because of "egg hatred", about which more below.

I've interviewed several successful Python hackers who "hate eggs" in  
order to understand what they hate about them, and I've taken notes  
from some of these interviews.  (The list includes MvL, whose name  
was invoked earlier in this thread.)

After filtering out yer basic complaining about bugs (which  
complaints are of course legitimate, but which don't indict  
setuptools as worse than other software of comparable scope and  
maturity), their objections seem to fall into two categories:

1.  "The very notion of package dependency resolution and  
programmable or command-line installation of packages at the language  
level is a bad notion."

This can't really be the case.  If the existence of such  
functionality at the programming language level were an inherently  
bad notion, then we would be hearing some complaints from the Ruby  
folks, where the Gems system is standard and ubiquitous.  We hear no  
complaints -- only murmurs of satisfaction.  One person recently  
reported to me that while there are more packages in Python, he finds  
himself re-using other people's code more often when he works in  
Ruby, because almost all Ruby software is Gemified, but only a  
fraction of Python software is Eggified.

Often this complaint comes with the idea that eggs conflict with  
their system-level package management tools.  (These are usually  
Debian/Ubuntu users.)

Note that Ruby software is not too hard to include in operating  
system packaging schemes -- my Ubuntu Hardy apt-cache shows plenty of  
Ruby software.  A sufficiently mature and widely supported setuptools  
could actually make it easier to integrate Python software into  
Debian -- see stdeb [1].

2.  "Setuptools/eggs give me grief."

What can really be the case is that setuptools causes a host of  
small, unnecessary problems for people who prefer to do things  
differently than PJE does.  Personally, I prefer to use GNU stow, and  
setuptools causes unnecessary, but avoidable, problems for me.  Many  
people object (rightly enough) to a "./setup.py install"  
automatically fetching new software over the Internet by default.   
The fact that easy_install creates a site.py that changes the  
semantics of PYTHONPATH is probably the most widely and deservedly  
hated example of this kind of thing [2].  I could go on with a few  
other common technical complaints of this kind.

These type-2 problems can be fixed by changing setuptools or they can  
be grudgingly accepted by users, while retaining compatibility with  
the large and growing ecosystem of eggy software.  Certainly fixing  
setuptools to play better with others is a more likely path to  
success than setting out to invent a non-egg-compatible alternative.   
Such a project might never be implemented well enough to serve, and  
if it were it would probably never overtake eggs' lead in the Python
ecosystem, and if it did it would probably not turn out to be a  
better tool.

So, since you asked for my chime, I advise you to publicly bless
eggs, setuptools, and easy_install as plausible future standards and  
solicit patches which address the complaints.  For that matter,  
soliciting specific complaints would be a good start.  I've done so  
in private many times with only partial success as to the "specific"  
part.  One promising approach is to request objections in the form of  
automated tests that setuptools fails, e.g. [3].

Regards,

Zooko O'Whielacronx

[1] http://stdeb.python-hosting.com/
[2] http://www.rittau.org/blog/20070726-02
And no, PJE's suggested "trivial fix" does not satisfy the
objectors, as it can't support the use case of
"cd somepkg ; python ./setup.py install ; cd .. ; python -c 'import somepkg'".
[3] http://twistedmatrix.com/trac/ticket/2308#comment:5


Re: [Python-Dev] PEP 365 (Adding the pkg_resources module)

2008-03-20 Thread zooko
On Mar 20, 2008, at 7:44 AM, Tres Seaver wrote:

> Paul Moore wrote:

>> 4. Hard to use with limited connectivity. At work, I *only* have
>> access to the internet via Internet Explorer (MS based proxy). There
>> are workarounds, but ultimately "download an installer, then run it"
>> is a far simpler approach for me.
>
> I don't know how to make this requirement compatible with using shared
> dependencies,

We've done something like this.

The http://allmydata.org project bundles its easy_installable  
dependencies.  If you get the current trunk from our darcs repository  
[1], or get a release tarball or a snapshot tarball from [2], then it  
comes with a directory named "misc/dependencies" which has the source  
tarballs of our easy_installable dependencies.  You can browse this  
directory on the web: [3].

Therefore, if you manually satisfy the non-easy_installable  
dependencies, you can download an allmydata.org tarball, disconnect  
from the Internet (which we call "moving to a Desert Island"), and  
install it.

This is, as you say, "compatible with using shared dependencies"  
because setuptools will detect if you already have sufficiently new  
versions of some of these dependencies installed (for example, if  
they are installed in Debian packages), and then skip the step of  
installing that dependency from its source tarball.

The remaining dependencies that cannot be satisfied automatically by  
our setup.py are listed in the install.html [4].  They are:

1.  g++ >= v3.3 -- the Cygwin version of gcc/g++ works for Cygwin  
and for Windows
2. GNU make
3. Python >= v2.4.2 including development headers i.e. "Python.h"
4. Twisted >= v2.4.0 -- from the Twisted "sumo" source tarball
5. OpenSSL >= v0.9.7, including development headers
6. PyOpenSSL == v0.6
7. Crypto++ >= v5.2.1, including development headers

I am hoping that in the future Twisted (see twisted #1286 [5]) and  
pyOpenSSL will be easy_installable, and that our use of setuptools  
plugins will eventually replace our GNUmakefile and thus remove our  
dependency on GNUmake.  That will leave only g++, Python, OpenSSL,  
and Crypto++ as dependencies that a user has to manually deal with in  
order to build allmydata.org from source.

Regards,

Zooko

[1] http://allmydata.org/source/tahoe/trunk/
[2] http://allmydata.org/source/tahoe/tarballs/
[3] http://allmydata.org/trac/tahoe/browser/misc/dependencies
[4] http://allmydata.org/source/tahoe/trunk/docs/install.html
[5] http://twistedmatrix.com/trac/ticket/1286


Re: [Python-Dev] Wow, I think I actually *get* it now!

2008-03-21 Thread zooko
Phillip J. Eby wrote:
> Hm.  So it seems to me that maybe one thing that would help is a
> "Setuptools Haters' Guide To Setuptools" -- that is, *short*
> documentation specifically written for people who don't want to use
> setuptools and want to minimize its impact on their systems.


Perhaps relevant are my blog entries on how to use setuptools with  
GNU stow:

https://zooko.com/log-2007.html#d2007-04-27-distutils_or_setuptools_with_GNU_stow

https://zooko.com/log-2007.html#d2007-06-02-distutils_or_setuptools_with_GNU_stow

Regards,

Zooko


Re: [Python-Dev] PEP 365 (Adding the pkg_resources module)

2008-03-21 Thread zooko

On Mar 20, 2008, at 6:22 PM, Robert Brewer wrote:

> Phillip J. Eby wrote:
>> The other tool that would be handy to have, would be one that unpacks
>> eggs into standard distutils-style installation.
>
> Hear, hear. I'm an author of a couple libraries that need to
> interoperate with others. Of the many eggs I've downloaded over the  
> past
> year, I'd say 80%+ are never installed or even built--I just want to
> grep the source code, and using my preferred tools, not some lame Find
> command in a ZIP browser menu.

Um, isn't this tool called "unzip"?  I have done this -- accessed the  
source code -- many times, and unzip suffices.

I don't know what else would be required in order to make an egg into  
"a standard distutils-style installation".  Until PJE's comment  
above, I thought that unzip already accomplished exactly that.

Regards,

Zooko



[Python-Dev] how to easily consume just the parts of eggs that are good for you

2008-03-26 Thread zooko
Folks:

Here is a simple proposal:  make the standard Python "import"  
mechanism notice eggs on the PYTHONPATH and insert them (into the  
*same* location) on the sys.path.

This eliminates the #1 problem with eggs -- that they don't easily
work when installed into places other than your site-packages,
and that if you allow any of them to be installed on your system then
they take precedence over your non-egg packages even if you explicitly
put those other packages earlier in your PYTHONPATH.  (That latter
behavior is very disagreeable to more than a few programmers.)

This also preserves most of the value of eggs for many use cases.

This is backward-compatible with most current use cases that rely on  
eggs.

This is very likely forward-compatible with new schemes that are  
currently being cooked up and will be deployed in the future.
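
The proposal at the top of this message can be sketched as a function
over a list of path entries.  This is only an illustration under
stated assumptions: the function name is invented, eggs are recognized
purely by the ".egg" suffix, and "the same location" is taken to mean
"immediately after the directory that contains them".

```python
import os
import tempfile


def expand_eggs(path_entries):
    """Return a new path list where each directory is followed by its eggs."""
    expanded = []
    for entry in path_entries:
        expanded.append(entry)
        if os.path.isdir(entry):
            eggs = sorted(n for n in os.listdir(entry) if n.endswith(".egg"))
            expanded.extend(os.path.join(entry, egg) for egg in eggs)
    return expanded


# Two directories as they might appear in PYTHONPATH; only the second
# one contains an egg (an empty placeholder file here).
root = tempfile.mkdtemp()
first = os.path.join(root, "first")
second = os.path.join(root, "second")
os.makedirs(first)
os.makedirs(second)
egg = os.path.join(second, "somepkg-1.0-py2.5.egg")
open(egg, "w").close()

result = expand_eggs([first, second])
print(result)
```

Because the egg is placed right after the directory it was found in,
anything in "first" still takes precedence over it, which is the
ordering guarantee the message asks for.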

Regards,

Zooko



Re: [Python-Dev] how to easily consume just the parts of eggs that are good for you

2008-04-08 Thread zooko

On Mar 26, 2008, at 7:34 PM, Chris McDonough wrote:
> zooko wrote:

http://mail.python.org/pipermail/python-dev/2008-March/078243.html

>> Here is a simple proposal:  make the standard Python "import"
>> mechanism notice eggs on the PYTHONPATH and insert them (into the
>> *same* location) on the sys.path.
>> This eliminates the #1 problem with eggs -- that they don't easily
>> work when installing them into places other than your site-packages
>> and that if you allow any of them to be installed on your system
>> then they take precedence over your non-egg packages even if you
>> explicitly put those other packages earlier in your PYTHONPATH.
>> (That latter behavior is very disagreeable to more than a few
>> programmers.)
>
> Sorry if I'm out of the loop and there's some subtlety here that  
> I'm disregarding, but it doesn't appear that either of the issues  
> you mention is actually a problem with eggs.  These are instead  
> problems with how eggs get installed by easy_install (which uses  
> a .pth file to extend sys.path).  It's reasonable to put eggs on  
> the PYTHONPATH manually (e.g. sys.path.append('/path/to/some.egg'))  
> instead of using easy_install to install them.

Yes, you are missing something.  While many programmers, such as  
yourself and Lennart Regebro (who posted to this thread) find the  
current eggs system to be perfectly convenient and to Just Work, many  
others, such as Glyph Lefkowitz (who posted to a related thread) find  
them to be so annoying that they actively ensure that no eggs are  
ever allowed to touch their system.

The reasons for this latter problem are two:

1.  You can't conveniently install eggs into a non-system directory,  
such as ~/my-python-stuff.

2.  If you allow even a single egg to be installed into your  
PYTHONPATH, it will change the semantics of your PYTHONPATH.

Both of these problems are directly caused by the need for eggs to  
hack your site.py.  If Python automatically added eggs found in the  
PYTHONPATH to the sys.path, both of these problems would go away.
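
For readers unfamiliar with the mechanism under discussion, here is a
minimal sketch of how the standard site module turns a ".pth" file
into sys.path entries.  It shows only the stock behavior of
site.addsitedir, not what easy_install's generated files do; the
directory and file names are invented.

```python
import os
import site
import sys
import tempfile

# A scratch directory standing in for a site directory.
sitedir = os.path.realpath(tempfile.mkdtemp())

# An empty directory standing in for an unzipped egg.
egg_dir = os.path.join(sitedir, "somepkg-1.0.egg")
os.makedirs(egg_dir)

# A .pth file lists one path per line, relative to the site directory.
with open(os.path.join(sitedir, "demo.pth"), "w") as f:
    f.write("somepkg-1.0.egg\n")

before = len(sys.path)
site.addsitedir(sitedir)   # processes every *.pth file in sitedir
added = sys.path[before:]
print(added)
```

Directories that are merely listed in PYTHONPATH are not processed
this way, which is why an egg dropped into an arbitrary directory
such as ~/my-python-stuff is not picked up without extra help.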

I am skeptical that the current proposals to define a new database  
for installed packages will fare any better than the current eggs  
scheme does in this respect.

This issue is important to me, because the benefits of eggs grow  
superlinearly with the number of good programmers who use them.  They  
are a tool for re-using source code -- a tool for cooperation between  
programmers.  To gain the greatest benefits at this point we do not  
need to add new features to eggs, we need to make them more palatable  
to more good programmers.

I am skeptical that programmers are going to be willing to use a new
database format.  They already have a database -- their filesystem --
and they already have the tools to control it -- mv, rm, and
PYTHONPATH.  Many of them already hate the existence of the
"easy-install.pth" database file, and I don't see why a new database
file would be any different.

My proposal makes the current benefits of eggs -- clean, easy code
re-use among programmers -- more compatible with their current tools --
mv, rm, and PYTHONPATH.  It is also forward-compatible with more  
sophisticated proposals to add features like uninstall and operating  
system integration.

By the way, since I posted my proposal two weeks ago I have pointed a  
couple of Python hackers who currently refuse to use eggs at the URL:

http://mail.python.org/pipermail/python-dev/2008-March/078243.html

They both agreed that it made perfect sense.  I told one of them  
about the alternate proposal to define a new database file to contain  
a list of installed packages, and he sighed and rolled his eyes and  
said "So they are planning to reinvent apt!".

Regards,

Zooko



Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

2008-04-08 Thread zooko
On Apr 8, 2008, at 11:27 AM, Lloyd Kvam wrote:
>
> When I wear my sysadmin hat, eggs become a nuisance.
...
> As a developer, eggs are great.
...
> Fortunately, distutils includes tools like bdist_rpm so that python
> modules can be packaged for easy processing by the system package
> manager.  So once I need to switch back to a sysadmin role, I can use
> the system tools to install and track packages.

This is the same experience I have.  I rely on setuptools and eggs  
extensively in developing our software, and I use setuptools and eggs  
as the primary method of giving our source code to other programmers.

But no software is ever installed on our production servers unless  
that software is in .deb form in an apt-gettable repository, and this  
requirement is unlikely to change for the foreseeable future.

Regards,

Zooko



Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

2008-04-09 Thread zooko
On Apr 8, 2008, at 9:41 PM, Phillip J. Eby wrote:
>
> I'm curious.  Have any of you actually read PEP 262 in any detail?

I read it, though not in fine detail.

I didn't write that you are planning to reinvent apt.  I wrote that  
when programmers hear about this PEP they exclaim "They are planning  
to reinvent apt!".

That is a matter of perception and marketing -- the value that I want  
to gain from Python packages is the value of a critical mass of good  
programmers using compatible tools for code re-use.  If a lot of  
programmers hate an idea, then it doesn't matter what the details are  
-- it isn't going to provide this value to me.

I think part of our disagreement is that we are talking about two  
overlapping use cases: programmer and sysadmin (and "end user" which  
is much like sysadmin).  I am not, at this time, interested in the  
sysadmin use case.  As I've mentioned, my sysadmin needs are  
currently well satisfied by apt (and sometimes by GNU stow), and my  
fellow sysadmins with whom I work are absolutely not going to relax  
their "apt-only policy" for the foreseeable future, so I cannot use
such a tool unless it is named "apt" and written by Debian/Ubuntu.

On the other hand I am very interested in the programmer use case,  
because setuptools/easy_install already works pretty well for that,  
and we are already very close to achieving a critical mass of good  
programmers.  Recently several more packages that my project [1]  
relies on have become easy_installable -- Twisted, pywin32 (thanks to  
you, PJE), and foolscap -- and several more have had bugfixes which  
make them work better with easy_install/setuptools -- Nevow and  
zope.interface -- and there are some patches in the queue to make  
another one compatible with setuptools -- pyflakes [2, 3].

So setuptools/easy_install is already (slowly) winning.  I want to  
accelerate that process by reducing the degree to which it is  
incompatible, inconvenient, or objectionable to other programmers.

PEP 262 sounds like a non-starter to me because

1.  It appears to be backwards-incompatible with
setuptools/easy_install/eggs, thus losing all the recently gained cooperation
that I mentioned in the previous paragraph, and

2.  It defines a new database file, where I would prefer either:
a.  Doing away with database files entirely and relying on the  
filesystem alone to hold that information, or
b.  Continuing to use the current ".pth" database file format,  
possibly improved by having native support for .pth files in the  
Python import machinery.

3.  Because of #2, it triggers programmers to exclaim "They are  
planning to reinvent apt!", thus making it unlikely that the new  
proposal will recapture the cooperation that setuptools has already  
(slowly) gained.

I'm sorry, PJE -- I know it must be frustrating to you to have your  
proposal criticized on social rather than technical grounds -- but  
social benefits are what I am seeking right now.

Perhaps PEP 262 and my proposal are not actually alternatives, but  
are complementary.  I do not object to Python maintaining a database  
of installed packages for itself (although I cannot *rely* upon such  
behavior, not least because I will be maintaining backwards  
compatibility with Python 2.4 for at least the next several years,  
and with Python 2.5 for at least the next several years after that).  
What I want is for the already implemented, tested, and deployed
code re-use features of setuptools/easy_install to be more widely
accepted.  This is best and most easily achieved by fixing the two
most frequent objections to setuptools/easy_install: (1) that you
can't conveniently install into an arbitrary directory, and (2) that
it subverts the meaning of your PYTHONPATH.

Regards,

Zooko

[1] http://allmydata.org/source/tahoe/trunk/docs/install.html
[2] http://divmod.org/trac/ticket/2535
[3] http://divmod.org/trac/ticket/2048


Re: [Python-Dev] how to easily consume just the parts of eggs that are good for you

2008-04-09 Thread zooko

On Apr 8, 2008, at 4:36 PM, Greg Ewing wrote:
>
> I discovered another annoyance with eggs the other day -- it
> seems that tracebacks referring to egg-resident files contain the
> pathname of some temporary directory that existed when the egg
> was being packaged, rather than the one it actually exists in
> at run time.

Brian Warner and I discovered that issue yesterday, too.  We  
determined that if you install the egg (with easy_install or with a  
setuptools-powered ./setup.py install) in unzipped form then the  
source file names get rewritten so that your stack traces come with  
source lines.

If you have a package which requires stack traces to come with source  
lines, then you could pass "zip_safe=False" to the call to setup().
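
As a sketch of what passing that flag looks like in a setup.py: the
project name, version, and package list below are hypothetical, and
only the zip_safe keyword is the point.  The call to setup() is kept
inside a function so that the snippet can be read or imported without
triggering a build.

```python
# Hypothetical project metadata; only zip_safe matters for this example.
SETUP_KWARGS = dict(
    name="examplepkg",
    version="0.1",
    packages=["examplepkg"],
    zip_safe=False,  # ask setuptools to install the egg in unzipped form
)


def run_setup():
    """Invoke setuptools with the arguments above (requires setuptools)."""
    from setuptools import setup
    setup(**SETUP_KWARGS)
```

A real setup.py would simply call run_setup() (or setup() directly) at
the end of the file.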

I would prefer that zip_safe=False were the default and that either  
the producer or the consumer of a package had to specifically choose  
zip_safe=True in order to install eggs in zipped form.

I've opened a ticket on my setuptools trac:

http://allmydata.org/trac/setuptools/ticket/4

Regards,

Zooko



Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

2008-04-09 Thread zooko

On Apr 9, 2008, at 6:00 AM, Phillip J. Eby wrote:
>>
>> By the way, if these tools work well, they are priceless!
>
> I haven't had need to use any of them, so I don't really know.

They are easydeb [1] and stdeb [2].  The former appears to be  
incomplete and unmaintained.  The latter appears to be usable, but  
somewhat incomplete -- substantial manual labor is required to use it  
successfully, as documented by my programming partner Brian Warner in  
this ticket: [3].

Regards,

Zooko

[1] http://easy-deb.sourceforge.net/
[2] http://stdeb.python-hosting.com/
[3] http://allmydata.org/trac/tahoe/ticket/251


Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

2008-04-09 Thread zooko
On Apr 9, 2008, at 12:40 PM, Phillip J. Eby wrote:
>>
>> You are talking here about bdist_rpm and not about a tool that  
>> would take
>> a Python package distributed as an egg file and convert the egg to  
>> an rpm
>> or a deb.  Unfortunately, some Python packagers are beginning to  
>> limit
>> their focus only to egg distribution.  That creates a problem for  
>> users
>> who have native operating system package management.
>
> That is indeed a problem -- but it's a social one, not a technical
> one.  It's trivial for the publisher of an egg to change their
> command line from "setup.py bdist_egg upload" to "setup.py sdist
> bdist_egg upload", as soon as their users (politely) request that  
> they do so.

In general, it would be good if eggs came with .py files by default  
instead of .pyc files.

I've opened a ticket on my setuptools trac about this proposal:

http://allmydata.org/trac/setuptools/ticket/5 # binary eggs should  
come with .py files by default, rather than .pyc files

Regards,

Zooko


Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

2008-04-09 Thread zooko

On Apr 9, 2008, at 4:12 PM, Phillip J. Eby wrote:

>> http://allmydata.org/trac/setuptools/ticket/5 # binary eggs should
>> come with .py files by default, rather than .pyc files
>
> Filling your tracker with already-rejected proposals isn't likely  
> to encourage me to look at it, especially when I've personally  
> rejected them to you in IRC.  That goes for your ticket #4 as well.

Part of my motivation in maintaining this tracker is to take issue  
discussions from IRC, and from mailing lists, and make them more  
permanent and structured.  This part is useful even for rejected  
proposals, as an historical record that other people interested in  
those issues can consult.

I will mark those two tickets as "rejected by PJE".  Could you please  
repeat (so that I don't misrepresent you due to my faulty memory of  
our IRC discussion from more than a year ago) your reason for  
rejecting these two:

http://allmydata.org/trac/setuptools/ticket/4 (when considering  
whether to zip, err on the side of safety rather than performance)

http://allmydata.org/trac/setuptools/ticket/5 (binary eggs should  
come with .py files by default, rather than .pyc files)

You are of course welcome to log in to that trac and update those  
tickets yourself, but if you prefer not to then I will paste your  
reasons into those tickets.

Regards,

Zooko


[Python-Dev] an example of setuptools being used to good effect -- allmydata.org Tahoe

2008-04-11 Thread zooko
Folks:

I'm sorry, but I am not caught up on the current conversation about  
packaging.  I'm very busy with exciting Python development --
http://allmydata.com and http://allmydata.org .  I understand from PJE's
message that he thinks I misunderstand some things about PEP 262;  
this is entirely possible.  I intend to catch up on reading the  
emails of this conversation and to read carefully PJE's messages  
about PEP 262 in the coming days.

Meanwhile, here is the last message that I wrote before I got swamped  
with the aforementioned excitement:


On Apr 9, 2008, at 5:59 PM, Greg Ewing wrote:
> Paul Moore wrote:
>
>> I believe that Mac OS X goes for an even simpler structure -
>> applications store *everything* in the one directory, so that
>> install/uninstall is simply a directory copy/remove.
>
> Yep, and thereby cuts the whole gordian knot, throws the
> pieces on the fire and burns them. :-)
>
> Package managers have always seemed to me to be an
> excessively complex solution to a problem that needn't
> have existed in the first place.

Yes!  I love the Zen of the Mac OS X packaging approach.  The best  
install is none at all!  (Of course, I also love apt.)


> I keep hoping that someday Linux will support something
> like MacOSX application bundles and frameworks, but I
> haven't seen any sign of it yet.

We're slowly approaching this level of simplicity in packaging of the  
*source code* of Allmydata.org "Tahoe", using setuptools.

http://allmydata.org/source/tahoe/trunk/docs/install.html

The list of dependencies which are automatically resolved by  
setuptools is visible here: [1].  It currently includes zfec,  
foolscap, simplejson, pycryptopp, nevow, zope.interface, twisted, and  
pywin32.

This automatic resolution of dependencies works while fully  
preserving the user's ability to use their own packages and their own  
packaging tools.  That is:

1. If any of these dependencies are already installed, such as in a  
Debian package or if the user has installed them by hand, then  
installing Tahoe will use the already-installed versions and not  
install anything new, and

2. For any of these dependencies that are not already installed,  
installing Tahoe will *not* write these dependencies into your  
standard system directory (which is potentially a sacred place where  
only your own packaging tool or your root account is allowed to  
tread) but will instead write them into a local, newly-created  
install directory from which you can then run Tahoe.  (This part is  
similar in spirit to the Mac OS packaging technique.)

Also, this install process never downloads anything from the Internet  
at install time, per our policy [2, 3], which also happens to be a  
policy some other people have, e.g. [4, 5].

This works on all of our supported platforms, which includes Linux,  
Solaris, Windows, Cygwin, and Mac OS X.

Oh yes, we also have our buildbot [6] automatically produce Debian  
packages for edgy, feisty, etch, and gutsy.

As far as I know, all of this is accomplished without breaking any of  
the use cases traditionally associated with setuptools /  
easy_install, for example "easy_install allmydata-tahoe" works, and  
if you want "setup.py install" to install into your standard system  
directory, it will.

The reason that I am posting this is to let other programmers know  
that setuptools is actually a pretty useful tool, even if the use  
cases that you want to support are incompatible with certain  
easy_install traditions such as fetching new packages from the  
internet at buildtime or installing into your system directory.

Regards,

Zooko

P.S.  Two days ago I was able to remove twisted from the list of  
"Manual Dependencies" that people have to be aware of in order to try  
out Allmydata Tahoe from the source tarball.

I think I can safely remove pyOpenSSL now, but that remains to be  
properly tested by our buildbot.

I will be able to remove Crypto++ soon, due to the pycryptopp [7]  
library.

If I can figure out a hack to work around one of the major
frustrations of setuptools (that you can't simply run "./setup.py
install --prefix=$FOO"), and if I finish my setuptools plugin to run
Twisted trial instead of pyunit when you run "./setup.py test", then
I'll be able to remove GNU make from the dependencies.

That will leave only g++, Python, and OpenSSL as packages that a  
programmer has to manually deal with in order to try out Allmydata  
Tahoe from source.

[1] http://allmydata.org/trac/tahoe/browser/_auto_deps.py
[2] http://allmydata.org/trac/tahoe/wiki/Packaging
[3] http://allmydata.org/trac/tahoe/ticket/229
[4] http://bytes.com/forum/thread666455.html
[5] http://fedoraproject.org/wiki/Packaging/Python/Eggs
[6] http://allmydata.org/buildbot/waterfall?show_events=false
[7] http://allmydata.org/trac/pycr

[Python-Dev] shall we redefine "module" and "package"?

2008-04-30 Thread zooko

Folks:

Here's an experiment you can perform.  Round up a Python programmer  
and ask him the following three questions:


Q1.  You type "import foo" and it works.  What kind of thing is foo?

Q2.  You go to the Python package index and download something named  
"bar-1.0.0.tar.gz".  What kind of thing is bar?


Q3.  What is a "distribution"?

I'm willing to bet that you will get the following answers:

A1.  foo is a module.

A2.  bar is a package.

A3.  A distribution is a version of Linux that comes with a lot of  
Free Software.



Unfortunately these answers aren't quite right.  A "package" is  
actually a directory containing an __init__.py file, and a  
distribution is actually what you think of when you say "package" --  
a reusable package of Python code that you can, for example, get from  
the Python package index.
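A small Python 3 sketch of the distinction, assuming a standard CPython layout in which json is a directory with an __init__.py and textwrap is a single .py file.  The helper name is_package is made up for this example.

```python
import json       # assumed to be a directory with an __init__.py
import textwrap   # assumed to be a single .py file
import types


def is_package(module):
    """Return True if the imported module is a package (has __path__)."""
    return hasattr(module, "__path__")


# Both are module objects as far as "import" is concerned.
assert isinstance(json, types.ModuleType)
assert isinstance(textwrap, types.ModuleType)

print(is_package(json))      # json is a package
print(is_package(textwrap))  # textwrap is a plain module
```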


Educational efforts such as the Python tutorial and the distutils  
docs have not succeeded in training Python programmers to understand  
the terminology for these things as used by the Python implementors,  
so perhaps instead the implementors should start using the  
terminology understood by the programmers:


1.  A "module" shall henceforth be the name for either a foo.py file  
(a single-file module), or a directory with an __init__.py in it (a  
directory module).


2.  A "package" shall henceforth be the name of the thing that is  
currently called a "distribution".



Regards,

Zooko

who doesn't mind stirring up trouble on occasion...

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] shall we redefine "module" and "package"?

2008-05-01 Thread zooko

On Apr 30, 2008, at 5:11 PM, [EMAIL PROTECTED] wrote:

> I have a less disruptive counterproposal.  How about just starting
> to refer to directories (or "folders", or zip entries) with
> '__init__.py' in them as "package modules"?  A package is-a module
> anyway.

That's a good idea.

> I believe a multi-word term here would be similarly more memorable
> and precise.  A "package distribution" would include the more
> familiar term while still being specific, consistent with the old
> terminology, and correct.  Using a qualifying word is probably a
> good idea in this context anyway.  I usually say "debian package",
> "RPM", "MSI", or "tarball" unless I'm specifically talking about
> "packages for your platform",

That's a good one too.

> almost always in the phrase, "please do not use distutils to do a
> system install of Twisted, use the specific package for your
> platform".

This is a tangent, but why do you give that advice?  I typically give
people the opposite advice on how to install Twisted.

> I do, however, agree with Steve emphatically on your original
> proposal. Changing the terminology now will make billions upon
> billions of Python web pages, modules (c.f.
> twisted.python.modules.PythonModule.isPackage()) documents, and
> searchable message archives obsolete, not to mention that 90% of
> the community will probably ignore you and use the old terminology
> anyway, creating more confusion than it eliminates.


I suspect 90% of the community already uses my proposed terminology  
-- that was my original challenge to round up a Python programmer and  
find out.


But I agree that my proposal would contribute to confusion and  
disruption, and I like your counterproposals better, at least for now.


Directories, folders, or zip entries with __init__.py in them are  
"package modules", and Python packages are "package distributions".


Regards,

Zooko



Re: [Python-Dev] Proposal: add odict to collections

2008-06-15 Thread zooko

On Jun 14, 2008, at 8:26 PM, Guido van Rossum wrote:


> No, but an ordered dict happens to be a *very* common thing to need,
> for a variety of reasons. So I'm +0.5 on adding this to the
> collections module. However someone needs to contribute working code.


Here's an LRU cache that I've used a few times over the years:

http://allmydata.org/trac/pyutil/browser/pyutil/pyutil/cache.py

This is just like a dict ordered by insertion, except:

1.  That it removes the oldest entry if it grows beyond a limit.

2.  That it moves an entry to the head of the queue when has_key() is  
called on that item.


So, it would be easy to change those two behaviors in order to use  
this implementation.  There are actually three implementations in  
that file: one that is asymptotically O(1) for all operations (using  
a double-linked list woven into the values of the dict), and one that  
uses a Python list to hold the order, so it is faster for small  
enough dicts.  The third implementation is an implementation that  
someone else wrote that I included just for comparison purposes --  
the comparison showed that each of mine was better.
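For readers who want the idea in a few lines, here is a minimal Python 3 sketch of the two behaviors listed above.  It is not the code in pyutil's cache.py: it leans on collections.OrderedDict (the type this thread was proposing; move_to_end needs Python 3.2 or later), and the class name LRUCache is made up for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Insertion-ordered mapping that evicts the least recently used entry."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def __contains__(self, key):
        # Behavior 2 above: a membership test counts as a "use".
        if key in self._data:
            self._data.move_to_end(key)
            return True
        return False

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Behavior 1 above: drop the oldest entry once over the limit.
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)
        return value

    def keys(self):
        # Oldest first, most recently used last.
        return list(self._data)
```

Dropping the move_to_end calls and the eviction loop leaves a plain insertion-ordered dict, which is the change described above.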


Regards,

Zooko



Re: [Python-Dev] Proposal: add odict to collections

2008-06-15 Thread zooko

On Jun 15, 2008, at 12:20 PM, zooko wrote:

> So, it would be easy to change those two behaviors in order to use
> this implementation.  There are actually three implementations in
> that file: one that is asymptotically O(1) for all operations
> (using a double-linked list woven into the values of the dict), and
> one that uses a Python list to hold the order, so it is faster for
> small enough dicts.


P.S.  I didn't mean to fall for the common misunderstanding that hash  
table operations are O(1).  What I should have written is that my  
ordered dict technique *adds* only O(1) time to the time of the dict  
on which it is built.


As to the question of how important or common this data structure is,  
I have to admit that while I implemented this one and used it several  
times (always exclusively for LRU caching), I currently don't use it  
for anything.  Nowadays I try to avoid doing transparent caching  
(such as LRU caches are often used for) in favor of explicit  
management of the resource.


Regards,

Zooko



Re: [Python-Dev] Base-85

2008-08-11 Thread zooko

On Aug 2, 2008, at 13:58 PM, Antoine Pitrou wrote:


> Martin v. Löwis writes:
>
>> P.S. Just in case it isn't clear: I would oppose any specific proposal
>> to add this Ascii85 algorithm to the standard library. It would sound
>> like we don't have any real problems to solve.
>
> According to Wikipedia, "its main modern use is in Adobe's PostScript and
> Portable Document Format file formats".
>
> ... git ... mercurial ... bzr

It's sort of too bad about the April Fool's RFC, because now people  
tend to think that an encoding with a non-power-of-2 base is just a  
joke.


I had to overcome that when working with my programming partner, but  
he eventually decided that base-62 was indeed a useful encoding for  
our purposes.  :-)
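To show that a non-power-of-2 base is nothing exotic, here is a minimal Python 3 sketch of base-62 encoding for non-negative integers.  It is not the base62.py linked below; the alphabet order (digits, uppercase, lowercase) and the function names are assumptions made for this example.

```python
import string

# Assumed alphabet: digits, then uppercase, then lowercase (62 symbols).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def b62encode_int(num):
    """Encode a non-negative integer as a base-62 string."""
    if num < 0:
        raise ValueError("only non-negative integers are supported")
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num:
        # Peel off the least significant base-62 digit each time.
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def b62decode_int(text):
    """Decode a base-62 string produced by b62encode_int."""
    num = 0
    for ch in text:
        num = num * 62 + ALPHABET.index(ch)
    return num
```

The only difference from base-32 or base-64 is that digits cannot be read off fixed groups of bits, so the sketch uses repeated division instead.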


I've written a few ascii encoders over the years, mostly in Python,  
plus an optimized C version of base-32 (with a real live Duff's Device):


base62.py:

http://allmydata.org/source/z-base-62/trunk-hashedformat/z-base-62/base62/base62.py


base36.py:

http://allmydata.org/source/z-base-36/trunk-hashedformat/z-base-36/base36/base36.py


base32.py:

http://allmydata.org/source/z-base-32/trunk-hashedformat/base32/base32/base32.py


base32.c:

http://allmydata.org/source/z-base-32/trunk-hashedformat/base32/base32.c

Regards,

Zooko


Re: [Python-Dev] bsddb

2008-09-08 Thread zooko

On Sep 7, 2008, at 12:04 PM, Gregory P. Smith wrote:


> FWIW, many years ago in the past when I asked sleepycat about this
> (long before oracle bought them) they said that python was considered
> to be the application.  Using berkeleydb via python for a commercial
> application did not require a berkeleydb license.


They also posted a FAQ on their web site which included that
statement, specifically declaring that using BerkeleyDB via
Python for a commercial product did not require a commercial licence.


Oh, look, it is still there:

http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html


"""
Q. Do I have to pay for a Berkeley DB license to use it in my Perl or  
Python scripts?


A. No, you may use the Berkeley DB open source license at no cost.  
The Berkeley DB open source license requires that software that uses  
Berkeley DB be freely redistributable. In the case of Perl or Python,  
that software is Perl or Python, and not your scripts. Any scripts  
you write are your property, including scripts that make use of  
Berkeley DB. None of the Perl, Python or Berkeley DB licenses place  
any restrictions on what you may do with them.

"""

Regards,

Zooko
---
http://allmydata.org -- Tahoe, the Least-Authority Filesystem
http://allmydata.com -- back up all your files for $5/month



Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Zooko O'Whielacronx
Thanks for writing this PEP 383, MvL.  I recently ran into this  
problem in Python 2.x in the Tahoe project [1].  The Tahoe project  
should be considered a good use case showing what some people need.   
For example, the assumption that a file will later be written back  
into the same local filesystem (and thus luckily use the same  
encoding) from which it originally came doesn't hold for us, because  
Tahoe is used for file-sharing as well as for backup-and-restore.


One of my first conclusions in pursuing this issue is that we can  
never use the Python 2.x unicode APIs on Linux, just as we can never  
use the Python 2.x str APIs on Windows [2].  (You mentioned this  
ugliness in your PEP.)  My next conclusion was that the Linux way of  
doing encoding of filenames really sucks compared to, for example,  
the Mac OS X way.  I'm heartened to see what David Wheeler is trying  
to persuade the maintainers of Linux filesystems to improve some of  
this: [3].


My final conclusion was that we needed to have two kinds of  
workaround for the Linux suckage: first, if decoding using the  
suggested filesystem encoding fails, then we fall back to mojibake  
[4] by decoding with iso-8859-1 (or else with windows-1252 -- I'm not  
sure if it matters and I haven't yet understood if utf-8b offers  
another alternative for this case).  Second, if decoding succeeds  
using the suggested filesystem encoding on Linux, then write down the  
encoding that we used and include that with the filename.  This  
expands the size of our filenames significantly, but it is the only  
way to allow some future programmer to undo the damage of a falsely- 
successful decoding.  Here's our whole plan: [5].


Regards,

Zooko

[1] http://allmydata.org
[2] http://allmydata.org/pipermail/tahoe-dev/2009-March/001379.html # see the footnote of this message

[3] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
[4] http://en.wikipedia.org/wiki/Mojibake
[5] http://allmydata.org/trac/tahoe/ticket/534#comment:47


Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Zooko O'Whielacronx

On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:

> Are you proposing to unconditionally encode file names as
> iso8859-15, or to do so only when undecodeable bytes are encountered?


For what it is worth, what we have previously planned to do for the  
Tahoe project is the second of these -- decode using some 1-byte  
encoding such as iso-8859-1, iso-8859-15, or windows-1252 only in the  
case that attempting to decode the bytes using the local alleged  
encoding failed.


> If you switch to iso8859-15 only in the presence of undecodable
> UTF-8, then you have the same round-trip problem as the PEP: both
> b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a
> way to unambiguously recover the original file name.


Why do you say that?  It seems to work as I expected here:

>>> '\xff'.decode('iso-8859-15')
u'\xff'
>>> '\xc3\xbf'.decode('iso-8859-15')
u'\xc3\xbf'
>>>
>>>
>>>
>>> '\xff'.decode('cp1252')
u'\xff'
>>> '\xc3\xbf'.decode('cp1252')
u'\xc3\xbf'
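
The session above decodes both names with a single 1-byte codec.  The quoted concern seems to be about a different path, where the alleged encoding is UTF-8 and the 1-byte codec is used only when that fails.  A minimal Python 3 sketch of that path, with a hypothetical helper name and UTF-8 assumed as the alleged encoding:

```python
def decode_with_fallback(raw, alleged_encoding="utf-8"):
    """Decode strictly, falling back to a 1-byte encoding on failure."""
    try:
        return raw.decode(alleged_encoding)
    except UnicodeDecodeError:
        return raw.decode("iso-8859-15")


a = decode_with_fallback(b"\xff")       # invalid UTF-8, so the fallback is used
b = decode_with_fallback(b"\xc3\xbf")   # valid UTF-8 for U+00FF

# Two different byte strings end up as the same one-character name.
print(a == b)
```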

Regards,

Zooko


Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-28 Thread Zooko O'Whielacronx

On Apr 28, 2009, at 13:01 PM, Thomas Breuel wrote:

(2) Should the default UTF-8 encoder for file system operations be  
allowed to generate illegal byte sequences?


I think that's a definite no; if I set the encoding for a device to  
UTF-8, I never want Python to try to write illegal UTF-8 strings to  
my device.

...
If people really want the option of (3c), then I think encoders  
related to the file system should by default reject those strings  
as illegal because the potential problems from writing them are  
just too serious.  Printing routines and UI routines could display  
them without error (but some clear indication), of course.


For what it is worth, sometimes we have to write bytes to a POSIX  
filesystem even though those bytes are not the encoding of any string  
in the filesystem's "alleged encoding".  The reason is that it is  
common for there to be filenames which are not the encodings of  
anything in the filesystem's alleged encoding, and the user expects  
my tool (Tahoe-LAFS [1]) to copy that name to a distributed storage  
grid and then copy it back unchanged.  Even though, I re-iterate,  
that name is *not* a valid encoding of anything in the current encoding.


This doesn't argue that this behavior has to be the *default*  
behavior, but it is sometimes necessary.


It's too bad that POSIX is so far behind Mac OS X in this respect.   
(Also so far behind Windows, but I use Mac as the example to show how  
it is possible to build a better system on top of POSIX.)  Hopefully  
David Wheeler's proposals to tighten the requirements in Linux  
filesystems will catch on: [2].


Regards,

Zooko

[1] http://allmydata.org
[2] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html


Re: [Python-Dev] PEP 383 and GUI libraries

2009-04-30 Thread Zooko O'Whielacronx
', 'python-replace')" with
".decode('windows-1252')" and it works just as well.  While UTF-8b seems
like a really cool hack, and it would produce more legible results if
utf-8-encoded strings were partially corrupted, I guess I should just
use 'windows-1252' which is already implemented in Python 2 (as well as
in all other software in the world).

I guess this means that PEP 383, which I have approved of and liked so
far in this discussion, would actually not help Tahoe at all and would
in fact harm Tahoe -- I would have to remember to detect and work-around
the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python
3.

If anyone else has a concrete, real use case which would be helped by
PEP 383, I would like to hear about it.  Perhaps Tahoe can learn
something from it.

Oh, if this PEP could be extended to add a flag to each unicode object
indicating whether it was created with the python-escape handler or not,
then it would be useful to me.

Regards,

Zooko

[1] http://mail.python.org/pipermail/python-dev/2009-April/089020.html
[2] http://allmydata.org/trac/tahoe/attachment/ticket/534/fsencode.3.py


Re: [Python-Dev] PEP 383 and GUI libraries

2009-05-01 Thread Zooko O'Whielacronx
Following-up to my own post to correct a major error:


On Thu, Apr 30, 2009 at 11:44 PM, Zooko O'Whielacronx wrote:
> Folks:
>
> My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary
> binary names from the filesystem and store them so that I can regenerate
> the same byte string later, but it also requires that I *know* whether
> what I got was a valid string in the expected encoding (which might be
> utf-8) or whether it was not and I need to fall back to storing the
> bytes.

Okay, I am wrong about this.  Having a flag to remember whether I had to
fall back to the utf-8b trick is one method to implement my requirement,
but my actual requirement is this:

Requirement: either the unicode string or the bytes are faithfully
transmitted from one system to another.

That is: if you read a filename from the filesystem, and transmit that
filename to another system and use it, then there are two cases:

Requirement 1: the byte string was valid in the encoding of source
system, in which case the unicode name is faithfully transmitted
(i.e. the bytes that finally land on the target system are the result of
sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding)).

Requirement 2: the byte string was not valid in the encoding of source
system, in which case the bytes are faithfully transmitted (i.e. the
bytes that finally land on the target system are the same as the bytes
that originated in the source system).

Now I finally understand how fiendishly clever MvL's PEP 383
generalization of Markus Kuhn's utf-8b trick is!  The only thing
necessary to achieve both of those requirements above is that the
'python-escape' error handler is used on the target system .encode() as
well as on the source system .decode()!
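
A minimal Python 3 sketch of the trick, for the simple case where the source and target encodings are the same.  The handler discussed here as 'python-escape' shipped in Python 3.1 under the name 'surrogateescape', which is what the sketch uses; the sample filename is made up.

```python
raw = b"caf\xe9.txt"   # Latin-1 bytes, not valid UTF-8

# Decoding: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw)
```

The sketch says nothing about what happens when the two systems use different encodings, which is the harder part of the requirements above.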

Well, I'm going to have to let this sink in and maybe write some code to
see if I really understand it.

But if this is right, then I can do away with some of the mechanism that
I've built up, and instead:

Backport PEP 383 to Python 2.

And, document the PEP 383 trick in some generic, widely respected format
such as an Internet Draft so that I can explain to other users of the
Tahoe data (many of whom use other languages than Python) what they have
to do if they find invalid utf-8 in the data.  Oh good, I just realized
that Tahoe emits only utf-8, so all I have to do is point them to the
utf-8b documents (such as they are) and explain that to read filenames
produced by Tahoe they have to implement utf-8b.  That's really good
that they don't have to implement MvL's generalization of that trick to
other encodings, since utf-8b is already understood by some folks.


Okay, I find it surprisingly easy to make subtle errors in this encoding
stuff, so please let me know if you spot one.  Is it true that
srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
'python-escape') will always produce srcbytes ?  That is my Requirement
2.


Regards,

Zooko


Re: [Python-Dev] PEP 383 and GUI libraries

2009-05-01 Thread Zooko O'Whielacronx
Folks:

Being new to the use of gmail, I accidentally sent the following only
to MvL and not to the list.  He promptly replied with a helpful
counterexample showing that my design can suffer collisions.  :-)

Regards,

Zooko


On Fri, May 1, 2009 at 10:38 AM, "Martin v. Löwis" wrote:
>>
>> Requirement: either the unicode string or the bytes are faithfully
>> transmitted from one system to another.
>
> I don't understand this requirement very well, in particular not
> the "faithfully" part.
>
>> That is: if you read a filename from the filesystem, and transmit that
>> filename to another system and use it, then there are two cases:
>
> What do you mean by "use it"? Things like opening files? How does
> that work? In general, a file name valid on one system is invalid
> on a different system - or, at least, refers to a different file
> over there. This is independent of encodings.

Tahoe is a backup and filesharing program, so you might, for example,
execute "tahoe cp -r Motörhead tahoe:" to copy all the contents of
your "Motörhead" directory to your Tahoe filesystem.  Later you or a
friend might execute "tahoe cp -r tahoe:Motörhead ." to copy
everything from that directory within your Tahoe filesystem to your
local filesystem.  So in this case the flow of information is
local_system_1 -> Tahoe -> local_system_2.

Requirement 1 is that for each filename encountered which is a
valid encoding in local_system_1, then the resulting (unicode) name is
transmitted through the Tahoe filesystem and then written out into
local_system_2 in the expected way (i.e. just by using the Python
unicode APIs and passing the unicode object to them).

Requirement 2 is that for each filename encountered which is not a
valid encoding in local_system_1, then the original bytes are
transmitted through the Tahoe filesystem and then, if the target
system is a byte-oriented system such as Linux, the original bytes are
written into the target filesystem.  (If the target is not Linux then
mojibake! but we don't have to go into that now.)

Does that make sense?

> In all your descriptions, I'm puzzled as to where exactly you get
> the source bytes from. If you use the PEP 383 interfaces, you will
> start with character strings, not byte strings, always.

On Mac and Windows, we use the Python unicode APIs e.g.
os.listdir(u"Motörhead").  On Linux and Solaris, we use the Python
bytestring APIs e.g.
os.listdir("Motörhead".encode(sys.getfilesystemencoding())).
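
The calls above are Python 2.  As a sketch of the same split in Python 3, os.listdir returns str names for a str path and bytes names for a bytes path; the temporary directory and the file name example.txt are made up for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so there is something to list.
    open(os.path.join(tmp, "example.txt"), "w").close()

    as_text = os.listdir(tmp)                 # str path in, str names out
    as_bytes = os.listdir(os.fsencode(tmp))   # bytes path in, bytes names out

print(as_text, as_bytes)
```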

>> Okay, I find it surprisingly easy to make subtle errors in this encoding
>> stuff, so please let me know if you spot one.  Is it true that
>> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
>> 'python-escape') will always produce srcbytes ?
>
> I think you mixed up bytes and unicode here: if srcbytes is indeed
> a bytes object, then you can't apply .encode to it.

Yep, I reversed the order of encode() and decode().  However, my whole
statement was utterly wrong and shows that I still didn't fully get it
yet.  I have flip-flopped again and currently think that PEP 383 is
useless for this use case and that my original plan [1] is still the
way to go.  Please let me know if you spot a flaw in my plan or a
ridiculousity in my requirements, or if you see a way that PEP 383 can
help me.

Thank you very much.

Regards,

Zooko

[1] http://allmydata.org/trac/tahoe/ticket/534#comment:47


Re: [Python-Dev] PEP 383 and GUI libraries

2009-05-02 Thread Zooko O'Whielacronx
file.
Therefore these three requirements imply that we have to detect such
collisions and deal with them somehow. (Thanks to Martin v. Löwis for
reminding me of this.)

Possible Requirement 4 (faithful bytes if not unicode, a.k.a.
"round-tripping"): Suppose you have a directory with some files with
Japanese names, encoded using shift-jis, and some files with Russian
names, encoded using koi8-r. Suppose your locale is set to shift-jis,
and then you do "tahoe cp -r myfiles/ tahoe:". Then suppose you or
someone else does "tahoe cp -r tahoe: copy_of_myfiles/". The
"round-tripping" feature is that the files with Russian names that did
not accidentally decode cleanly with shift-jis still have the same
bytes in their names as they did in the original myfiles directory.

As I write this, I am becoming skeptical of this (faithful bytes if
not unicode, a.k.a. "round-tripping"), thanks in part to criticism
from James Knight, MvL, Thomas Breuel, and others. One reason to be
skeptical is that about a third of the Russian files will happen to
decode cleanly as shift-jis anyway, and will therefore come out as
something entirely different if the target filesystem's encoding is
something other than shift-jis. But an even worse problem -- the
show-stopper for me -- is that I don't want what Tahoe shows when you
do "tahoe ls" or view it in a web browser to differ from what it
writes out when you do "tahoe cp -r tahoe: newfiles/". So I'm ready to
reject this one.

Now about the "metadata" part which is separate from the filename
itself. I have another requirement:

Requirement 5 (no loss of information):  I don't want Tahoe to destroy
information -- every transformation should be (in principle)
reversible by some future computer-augmented archaeologist. For
example, if a bytestring decodes cleanly with the locale's suggested
encoding, and we use the resulting unicode as the filename, then we
also store the original byte string in the metadata since we don't
know if the locale's suggested encoding was good. This allows the
later invention of a tool which shows the user what the filename would
have been with other encodings and let the user choose one that makes
sense. It is important to note that this does not impose any
requirement on the *filename* itself -- all such information can be
stored in the metadata.

Okay, in light of the above four requirements and the rejection of #4,
I hereby propose to change from the previous Tahoe design [2] to the
following:

To copy an entry from a local filesystem into Tahoe:

1. On Windows or Mac read the filename with the unicode APIs.
Normalize the string with filename = unicodedata.normalize('NFC',
filename). Leave the "original_bytes" key and the "failed_decode" flag
out of the metadata.

2. On Linux or Solaris read the filename with the string APIs, and
store the result in the "original_bytes" part of the metadata. Call
sys.getfilesystemencoding() to get an alleged_encoding. Then, call
bytes.decode(alleged_encoding, 'strict') to try to get a unicode
object.

2.a. If this decoding succeeds then normalize the unicode filename
with filename = unicodedata.normalize('NFC', filename), store the
resulting filename and leave the "failed_decode" flag out of the
metadata.

2.b. If this decoding fails, then we decode it again with
bytes.decode('latin-1', 'strict'). Do not normalize it. Store the
resulting unicode object into the "filename" part, set the
"failed_decode" flag to True. This is mojibake!

3. (handling collisions)  In either case 2.a or 2.b the resulting
unicode string may already be present in the directory. If so, check
the failed_decode flags on the current entry and the new entry. If
they are both set or both unset then the new entry overwrites the old
entry -- they had the same name. If the failed_decode flags differ
then this is a case of collision -- the old entry and the new entry
had (as far as we are concerned) different names that accidentally
generated the same unicode. Alter the new entry's name, for example by
appending "~1" and then trying again and incrementing the number until
it doesn't match any extant entry.
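
Steps 2 and 3 above can be sketched in Python 3 as follows.  This is an illustration of the proposal, not Tahoe code: the function names are made up, the directory is modeled as a plain dict from filename to metadata, and step 1 (Windows and Mac) is left out.

```python
import unicodedata


def name_from_bytes(raw, alleged_encoding):
    """Steps 2, 2.a and 2.b: return (filename, metadata) for a raw name."""
    metadata = {"original_bytes": raw}
    try:
        filename = raw.decode(alleged_encoding, "strict")
    except UnicodeDecodeError:
        # 2.b: mojibake fallback, not normalized, flagged.
        filename = raw.decode("latin-1", "strict")
        metadata["failed_decode"] = True
    else:
        # 2.a: clean decode, normalize to NFC, no flag.
        filename = unicodedata.normalize("NFC", filename)
    return filename, metadata


def add_entry(directory, filename, metadata):
    """Step 3: insert into a dict-like directory, renaming on collision."""
    candidate, counter = filename, 0
    while candidate in directory:
        old_flag = directory[candidate].get("failed_decode", False)
        new_flag = metadata.get("failed_decode", False)
        if old_flag == new_flag:
            break  # same name as far as we can tell: overwrite
        counter += 1
        candidate = "%s~%d" % (filename, counter)
    directory[candidate] = metadata
    return candidate
```

With a UTF-8 locale, b'\xc3\xbf' and b'\xff' both produce the one-character name u'\xff', and the second one to arrive is stored under that name with "~1" appended.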

To copy an entry from Tahoe into a local filesystem:

Always use the Python unicode API. The original_bytes field and the
failed_decode field in the metadata are not consulted.

Now a question for python-dev people: could utf-8b or PEP 383 be
useful for requirements like the four requirements listed above?  If
not, what requirements does PEP 383 help with?  I'm sure that it can
help with the use case of "I'm doing os.listdir() and then I'm going
to turn around and use the resulting unicode objects on the same local
filesystem in the same Python process". I'm not sure that it can help
if

Re: [Python-Dev] PEP 383 and Tahoe [was: GUI libraries]

2009-05-04 Thread Zooko O'Whielacronx
Thank you for sharing your extensive knowledge of these issues, SJT.

On Sun, May 3, 2009 at 3:32 AM, Stephen J. Turnbull wrote:
> Zooko O'Whielacronx writes:
>
>  > However, it is moot because Tahoe is not a new system. It is
>  > currently at v1.4.1, has a strong policy of backwards-
>  > compatibility, and already has lots of data, lots of users, and
>  > programmers building on top of it.
>
> Cool!

Thanks!  Actually yes it is extremely cool that it really does this
encryption, erasure-encoding, capability-based access control, and
decentralized topology all in a fully functional, stable system.  If
you're interested in such stuff then you should definitely check it
out!

> Question: is there a way to negotiate versions, or better yet,
> features?

For the peer-to-peer protocol there is, but the persistent storage is
an inherently one-way communication.  A Tahoe client writes down
information, and at a later point a Tahoe client, possibly of a
different version, reads it.  There is no way for the original writer
to ask what versions or features the readers may eventually have.
But, the writer can write down optional information which will be
invisible to readers that don't know to look for it, by adding it
into the "metadata" dictionary.  For example:
http://testgrid.allmydata.org:3567/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/?t=json
renders the directory contents into json and results in this:

  "r\u00e9sum\u00e9.html": [
"filenode",
{
 "mutable": false,
 "verify_uri":
"URI:CHK-Verifier:63y4b5bziddi73jc6cmyngyqdq:5p7cxw7ofacblmctmjtgmhi6jq7g5wf77tx6befn2rjsfpedzkia:3:10:8328",
 "metadata": {
  "ctime": 1241365319.0695441,
  "mtime": 1241365319.0695441
 },
 "ro_uri": 
"URI:CHK:no2l46woyeri6xmhcrhhomgr5a:5p7cxw7ofacblmctmjtgmhi6jq7g5wf77tx6befn2rjsfpedzkia:3:10:8328",
 "size": 8328
}
   ],

A new version of Tahoe writing entries like this is constrained to
making the primary key (the filename) be a valid unicode string (if it
wants older Tahoe clients to be able to read the directory at all).
However, it is not constrained about what new keys it may add to the
"metadata" dict, which is where we propose to add the "failed_decode"
flag and the "original_bytes".
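
As a rough sketch of why this is backwards-compatible, here is how a reader might pick up the proposed keys from the json form shown above. The helper name read_decode_info is hypothetical, and the sample entry is abbreviated; a reader that knows nothing about the new keys just never asks for them.

```python
import json

def read_decode_info(entry_json):
    """Return (failed_decode, original_bytes) for one directory entry.

    Keys missing from "metadata" fall back to defaults, which is what an
    entry written by an older client would look like.
    """
    node_type, node = json.loads(entry_json)
    metadata = node.get("metadata", {})
    failed_decode = metadata.get("failed_decode", False)
    original_bytes = metadata.get("original_bytes")  # None when absent
    return failed_decode, original_bytes

# Abbreviated entry in the style of the listing above, written by an
# older client that does not store the proposed keys.
old_entry = '["filenode", {"mutable": false, "size": 8328, "metadata": {"ctime": 1241365319.0695441, "mtime": 1241365319.0695441}}]'

# The same entry as a newer client might write it (proposed key added).
new_entry = '["filenode", {"mutable": false, "size": 8328, "metadata": {"ctime": 1241365319.0695441, "failed_decode": true}}]'
```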

> Well, it's a high-dimensional problem.  Keeping track of all the
> variables is hard.

Well put.

>  That's why something like PEP 383 can be important
> to you even though it's only a partial solution; it eliminates one
> variable.

Would that it were so!  The possibility that PEP 383 could help me or
others like me is why I am trying so hard to explain what kind of help
I need.  :-)


>  > Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux
>  > system and then you inspect the files in the Tahoe filesystem,
>  > such as by examining the web interface [1] or by running
>  > "tahoe ls", either of which you could do either from the same
>  > machine where you ran "tahoe cp" or from a different machine
>  > (which could be using any operating system). We have the
>  > following requirements about what ends up in your Tahoe directory
>  > after that cp -r.
>
> Whoa! Slow down!  Where's "my" "Tahoe directory"?  Do you mean the
> directory listing?  A copy to whatever system I'm on?  The bytes that
> the Tahoe host has just loaded into a network card buffer to tell me
> about it?  The bytes on disk at the Tahoe host?  You'll find it a lot
> easier to explain things if you adopt a precise, consistent
> terminology.

Okay here's some more detail.

There exists a Tahoe directory, the bytes of which are encrypted,
erasure-coded, and spread out over multiple Tahoe servers.  (To the
servers it is utterly opaque, since it is encrypted with a symmetric
encryption key that they don't have.)  A Tahoe client has the
decryption key and it recovers the cleartext bytes.  (Note: the
internal storage format is not the json encoding shown above -- it is
a custom format -- the json format above is what is produced to be
exported through the API, and it serves as a useful example for e-mail
discussions.)  Then for each bytestring childname in the directory it
decodes it with utf-8 to get the unicode childname.

Does that all make sense?

>  > Requirement 1 (unicode):  Each filename that you see needs to be valid
>  > unicode
>
> What does "see" mean?  In directory listings?

Yes, either with "tahoe ls", with a FUSE plugin, or with the web UI.
Remove the trailing "?t=json" from the URL above to see an example.

>  Under 

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Zooko O'Whielacronx
On Tue, May 5, 2009 at 8:57 AM, Stephen J. Turnbull  wrote:
>
> 2.  The specification should state, and the discussion emphasize, that
>    strings which were produced by surrogate replacement *must not* be
>    used in data interchange with systems that do not specifically
>    accept such strings, and that this is the responsibility of the
>    application.[2]

That sounds like a useful statement to make.  How would an application
make sure that they were producing only valid unicode?  How about add
an option to os.listdir() named "errors" with default value 'utf8b'
(or 'surrogate-replace', or whatever the name is)?  Then applications
which need to produce only valid unicode strings could pass
errors=strict, errors=ignore, or errors=replace?  (If anyone really
wants behavior like Python 3.0 then we could perhaps also add a new
one just for os.listdir() named errors=skipfilename.)

My most recent plan for Tahoe, as of the letter that I sent last
night, is to emulate the behavior of Nautilus and GNU ls by using the
'replace' error handler and (emulating Nautilus) to append " (invalid
encoding)" to the end of the string.  (screenshot:
http://zooko.com/Nautilus_vs_invalid_encoding.png )

So if I could ask os.listdir to return filenames with U+FFFD in place
of undecodable characters, then I could subsequently do something
like:

for f in os.listdir(d, errors='replace'):
    if u"\ufffd" in f:
        f += " (invalid encoding)"

(On top of that I would have to check for collisions, but that's out of scope.)
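
Since os.listdir() does not currently take an "errors" argument, a similar effect can be approximated by listing the directory as bytes and decoding by hand. This is only a sketch: the helper names display_name and listdir_replaced are made up for illustration, and assuming utf-8 as the filename encoding is a simplification.

```python
import os

def display_name(raw, encoding="utf-8"):
    """Decode one filename, marking it the way Nautilus does when decoding fails."""
    name = raw.decode(encoding, "replace")
    if u"\ufffd" in name:
        name += " (invalid encoding)"
    return name

def listdir_replaced(path, encoding="utf-8"):
    """List `path` as bytes, then decode each name with the 'replace' handler."""
    return [display_name(raw, encoding) for raw in os.listdir(os.fsencode(path))]
```

Collision checking is still left out, as noted above.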

Regards,

Zooko
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] .pth files are evil

2009-05-09 Thread Zooko O'Whielacronx
.pth files are why I can't easily use GNU stow with easy_install.
If installing a Python package involved writing new files into the
filesystem, but did not require reading, updating, and re-writing any
extant files such as .pth files, then GNU stow would Just Work with
easy_install the way it Just Works with most things.

Regards,

Zooko


Re: [Python-Dev] how GNU stow is complementary rather than alternative to distutils

2009-05-10 Thread Zooko O'Whielacronx
following-up to my own post to mention one very important reason why
anyone cares:

On Sun, May 10, 2009 at 12:04 PM, Zooko Wilcox-O'Hearn  wrote:

> It is a beautiful, elegant hack because it is sooo dumb.  It is also very
> nice to use the same tool to manage packages written in any programming
> language, provided only that they can build a directory tree of the right
> shape and content.

And, you are not relying on the author of the package that you are
installing to avoid accidentally or maliciously screwing up your
system.  You're not even relying on the authors of the *build system*
(e.g. the authors of distutils or easy_install).  You are relying
*only* on GNU stow to avoid accidentally or maliciously screwing up
your system, and GNU stow is very dumb, so it is easy to understand
what it is going to do and why that isn't going to irreversibly screw
up your system.

That is: you don't run the "build yourself and install into $prefix"
step as root.  This is an important consideration for a lot of people,
who absolutely refuse on principle to ever run "sudo python
./setup.py" on a system that they care about unless they wrote the
"setup.py" script themselves.  (Likewise they refuse to run "sudo make
install" on packages written in C.)

Regards,

Zooko


[Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Zooko O'Whielacronx
Dear Pythonistas:

This issue causes serious problems.  Users occasionally get binaries built for a
compatible Linux and Python version but with a different UCS2-vs-UCS4 setting,
and those users get mysterious memory corruption errors which are hard to
diagnose.  It is possible that these situations also open up security
vulnerabilities.  A couple such instances are documented on
http://bugs.python.org/setuptools/issue78, but you can find more by googling.
I would like to get this problem fixed!

In order to help address this issue I sampled what UCS size is used by python
executables in the wild.  I instrumented a few buildslaves that are contributed
by various people to the Tahoe-LAFS project to print out their platform, python
version, and sys.maxunicode.  The full results are appended below.  maxunicode:
1114111 means that python executable was configured with --enable-unicode=ucs4,
and maxunicode: 65535 means that python executable was configured with
--enable-unicode=ucs2 or just with --enable-unicode.  The only incompatibilities
that I found are because some packagers have deliberately set UCS4 configuration
and other packagers have left the default setting.
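
For anyone who wants to check their own interpreter, the measurement amounts to something like the sketch below. The function name describe_unicode_build is invented here and the output format only loosely imitates the results listed at the end of this message. Note that on Python 3.3 and later the --enable-unicode option no longer exists and sys.maxunicode is always 1114111, so the distinction only shows up on older interpreters.

```python
import platform
import sys

def describe_unicode_build():
    """Report platform, Python version, and the unicode width of this build."""
    if sys.maxunicode == 0x10FFFF:      # 1114111
        width = "ucs4"
    elif sys.maxunicode == 0xFFFF:      # 65535
        width = "ucs2"
    else:
        width = "unknown"
    version = " ".join(sys.version.split())  # collapse the embedded newline
    return "%s: python: %s, maxunicode: %d (%s)" % (
        platform.platform(), version, sys.maxunicode, width)

print(describe_unicode_build())
```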

In the three cases where someone configured python with UCS2, one of the three
is certainly an accident (a custom-built python executable on an Ubuntu server)
and the other two just use the default instead of specifically configuring ucs2
in their configurations of Python and I suspect that they don't know the
difference and that it was an accident that they built a Python incompatible
with other distributions of their operating system.

In sum, while it would be good to add the unicode setting to the platform's ABI
(as discussed in setuptools ticket #78), it would also be good to make the
default value be UCS4 instead of UCS2.  This would fix all three of the potential
incompatibilities that I found (listed below), and once we have proper inclusion
of the unicode setting in the ABI in order to prevent the memory corruption,
defaulting to UCS4 would increase the likelihood that a binary built on one
distribution would be usable on another.

I'm sure that someone can come up with a reason why UCS2 is better than UCS4,
but I'm also sure that the benefits of compatibility outweigh any benefits of
UCS2 encoding, and that the widespread use of UCS4 demonstrates that there is
nothing fatally wrong with it, and that people who really value UCS2 encoding
more than compatibility can choose that for themselves by explicitly setting
UCS2.

Let me restate that I am not suggesting taking away anyone's options, only
making the setting for people who don't specify default to the compatible
option.
Hm, I guess that means that it should default to UCS2 on Windows and Mac and
to UCS4 on Linux and Solaris.

Regards,

Zooko

Ubuntu 6.10 "edgy" i386: python: 2.4.4c1 (#2, Mar  7 2008, 03:03:38)  [GCC 4.1.2
20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)], maxunicode: 1114111
Ubuntu 7.04 "feisty": python: 2.5.1 (r251:54863, Jul 31 2008, 22:53:39)  [GCC
4.1.2 (Ubuntu 4.1.2-0ubuntu4)], maxunicode: 1114111
Ubuntu 7.10 "gutsy" i386: python: 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)], maxunicode: 1114111
Ubuntu 8.04 "hardy" amd64: python: 2.5.2 (r252:60911, Jul 22 2009, 15:33:10)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)], maxunicode: 1114111
Ubuntu 8.04 "hardy" i386: *custom* python: 2.6 (r26:66714, Oct  2 2008,
13:40:28)  [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)], maxunicode: 65535
Ubuntu 8.04 "hardy" i386: python: 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)], maxunicode: 1114111
Ubuntu 9.04 "jaunty" amd64: *custom* python: 2.6.2 (release26-maint, Apr 19
2009, 01:58:18)  [GCC 4.3.3], maxunicode: 1114111

Debian 4.0 "etch" i386: python: 2.4.4 (#2, Oct 22 2008, 19:52:44)  [GCC 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)], maxunicode: 1114111
Debian 5.0 "lenny" i386: python: 2.5.2 (r252:60911, Jan  4 2009, 17:40:26)  [GCC
4.3.2], maxunicode: 1114111
Debian 5.0 "lenny" amd64: python: 2.5.2 (r252:60911, Jan  4 2009, 21:59:32)
[GCC 4.3.2], maxunicode: 1114111
Debian 5.0 "lenny" armv5tel: python: 2.5.2 (r252:60911, Jan  5 2009, 02:00:00)
[GCC 4.3.2], maxunicode: 1114111
Debian unstable "squeeze/sid" i386: python: 2.5.4 (r254:67916, Feb 17 2009,
20:16:45)  [GCC 4.3.3], maxunicode: 1114111

Fedora 11 "leonidas" amd64: python: 2.6 (r26:66714, Jul  4 2009, 17:37:13)  [GCC
4.4.0 20090506 (Red Hat 4.4.0-4)], maxunicode: 1114111

ArchLinux: python: 2.6.2 (r262:71600, Jul 20 2009, 02:23:30)  [GCC 4.4.0
20090630 (prerelease)], maxunicode: 65535

NetBSD 4: python: 2.5.2 (r252:60911, Mar 20 2009, 14:00:07)  [GCC 4.1.2 20060628
prerelease (NetBSD nb2 20060711)], maxunicode: 65535

OpenSolaris SunOS-5.11-i86pc-i386-32bit: python: 2.4.4 (#1, Mar 

Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Zooko O'Whielacronx
I'm sorry, I should have mentioned that I did read those archives
before I posted my letter.  That discussion was all about whether UCS2
or UCS4 is better.  I consider that question to be mostly irrelevant
to this issue, which is about compatibility for people who don't
choose to configure that setting themselves.  Platforms or people who
prefer UCS2 will continue to use it as appropriate.  UCS4 is clearly
good enough for the vast majority of Linux users, and having fewer
mysterious segfaults and potential security vulnerabilities would be
an important improvement to the user experience of Python on Linux.

I should mention that the reason I'm spending time on this right now
is that it is currently blocking me from being able to distribute
binaries of Python packages which will work for all of my Linux users.

Regards,

Zooko


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Zooko O'Whielacronx
On Sun, Sep 20, 2009 at 8:27 AM, Antoine Pitrou  wrote:
>
> What "binaries" are you talking about?

I mean extension modules with native code, which means .so shared
library files on unix.

> AFAIK, C extensions should fail loading when they have the wrong UCS2/4 
> setting.

That would be an improvement!  Unfortunately we instead get mysterious
misbehavior of the module, e.g.:

http://bugs.python.org/setuptools/msg309
http://allmydata.org/trac/tahoe/ticket/704#comment:5

> For information, all Mandriva versions I've used until now have had their
> Python's built with UCS2 (maxunicode == 65535).

Thank you for the data point.  This means that binary extension
modules built on Mandriva can't be ported to Ubuntu or vice versa.
However, is this an argument for or against changing the default
setting to UCS4?  Changing the default setting wouldn't interfere with
Mandriva's decision, right?

Regards,

Zooko


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Zooko O'Whielacronx
On Sun, Sep 20, 2009 at 8:27 AM, Antoine Pitrou  wrote:
> For information, all Mandriva versions I've used until now have had their
> Python's built with UCS2 (maxunicode == 65535).

By the way, I was investigating this, and discovered an issue on the
Mandriva tracker which suggests that they intend to switch to UCS4 in
the next release in order to avoid compatibility problems like these.
(Not because they think that UCS4 is better than UCS2.)

https://qa.mandriva.com/show_bug.cgi?id=48570

Regards,

Zooko


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-29 Thread Zooko O'Whielacronx
Dear MAL and python-dev:

I failed to explain the problem that users are having.  I will try
again, and this time I will omit my ideas about how to improve things
and just focus on describing the problem.

Some users are having trouble using Python packages containing binary
extensions on Linux.  I want to provide such binary Python packages
for Linux for the pycryptopp project
(http://allmydata.org/trac/pycryptopp ) and the zfec project
(http://allmydata.org/trac/zfec ).  I also want to make it possible
for users to install the Tahoe-LAFS project (http://allmydata.org )
without having a compiler or Python header files.  (You'd be surprised
at how often Tahoe-LAFS users try to do this on Linux.  Linux is no
longer only for people who have the knowledge and patience to compile
software themselves.)  Tahoe-LAFS also depends on many packages that
are maintained by other people and are not packaged or distributed by
me -- pyOpenSSL, simplejson, etc.

There have been several hurdles in the way that we've overcome, and no
doubt there will be more, but the current hurdle is that there are two
"formats" for Python extension modules that are used on Linux -- UCS2
and UCS4.  If a user gets a Python package containing a compiled
extension module which was built for the wrong UCS2/4 setting, he will
get mysterious (to him) "undefined symbol" errors at import time.

On Mon, Sep 28, 2009 at 2:25 AM, M.-A. Lemburg  wrote:
>
> The Python default is UCS2 for a good reason: it's a good trade-off
> between memory consumption, functionality and performance.

I'm sure you are right about this.  At some point I will try to
measure the performance implications in the context of our
application.  I don't think it will be an issue for us, as so far no
users have complained about any performance or functionality problems
that were traceable to the choice of UCS2/4.

> As already mentioned, I also don't understand how the changing
> the Python default on Linux would help your users in any way -
> if you let distutils compile your extensions, it's automatically
> going to use the right Unicode setting for you (as well as your
> users).

My users are using some Python packages built by me and some built by
others.  The binary packages they get from others could have the
incompatible UCS2/4 setting.  Also some of my users might be using a
python configured with the opposite setting of the python interpreter
that I use to build packages.

> Unfortunately, this automatic support doesn't help you when
> shipping e.g. setuptools eggs, but this is a tool problem,
> not one of Python: setuptools completely ignores the fact
> that there are two ways to build Python.

This is the setuptools/distribute issue that I mentioned:
http://bugs.python.org/setuptools/issue78 .  If that issue were solved
then if a user tried to install a specific package, for example with a
command-line like "easy_install
http://allmydata.org/source/tahoe/deps/tahoe-dep-eggs/pyOpenSSL-0.8-py2.5-linux-i686.egg",
then instead of getting an undefined symbol error at import time, they
would get an error message to the effect of "This package is not
compatible with your Python interpreter." at install time.  That would
be good because it would be less confusing to the users.
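
To make the idea concrete, an install-time check could look roughly like this. Everything here is hypothetical: neither setuptools nor distribute records a "ucs2"/"ucs4" tag for a package today (that is what issue78 asks for), and the names interpreter_ucs_tag and check_compatible are invented for the sketch.

```python
import sys

def interpreter_ucs_tag():
    """Return "ucs4" or "ucs2" for the running interpreter."""
    return "ucs4" if sys.maxunicode > 0xFFFF else "ucs2"

def check_compatible(package_ucs_tag):
    """Refuse a binary package whose (hypothetical) unicode tag does not match.

    `package_ucs_tag` stands in for metadata that a packaging tool would
    have to record at build time; no such field exists yet.
    """
    mine = interpreter_ucs_tag()
    if package_ucs_tag != mine:
        raise RuntimeError(
            "This package is not compatible with your Python interpreter "
            "(package built for %s, interpreter uses %s)."
            % (package_ucs_tag, mine))
```

The point is only that the mismatch gets reported at install time, in words the user can act on, instead of as an undefined symbol at import time.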

However, if they were using the default setuptools/distribute
dependency-satisfaction feature, e.g. because they are installing a
package and that package is marked as
"install_requires=['pyOpenSSL']", then setuptools/distribute would do
its fallback behavior in which it attempts to compile the package from
source when it can't find a compatible binary package.  This would
probably confuse the users at least as much as the undefined symbol
error currently does.

In any case, improving the tools to handle incompatible packages
nicely would not make more packages compatible.  Let's do both!
Improve tools to handle incompatible packages nicely, and encourage
everyone who compiles python on Linux to use the same UCS2/4
setting.

Thank you for your attention.

Regards,

Zooko


Re: [Python-Dev] [New-bugs-announce] [issue7064] Python 2.6.3 / setuptools 0.6c9: extension module builds fail with KeyError

2009-10-06 Thread Zooko O'Whielacronx
Here are three buildbot farms for three different projects that
exercise various features of setuptools: build, install, sdist_dsc,
bdist_egg, sdist, and various specific requirements that our projects
have, such as the "Desert Island Build" in which setuptools is not
allowed to download anything from the Internet at build time or else
it flunks the test.

http://allmydata.org/buildbot/waterfall
http://allmydata.org/buildbot-pycryptopp/waterfall
http://allmydata.org/buildbot-zfec/waterfall

I would love it if new versions of setuptools/Distribute would make
more of these tests pass on more platforms or at least avoid causing
any regressions in these tests on these platforms.

Unfortunately, we can't really deploy new versions of
setuptools/Distribute to this buildbot farm in order to re-run all the
tests, because the only way that I know of to trigger all the tests is
to make a commit to our central darcs repository for Tahoe-LAFS,
pycryptopp, or zfec, and I don't want to do that to experiment with
new versions of setuptools/Distribute.  Does anyone know how to use a
buildbot farm like this one to run tests without committing patches to
the central repository?

Regards,

Zooko


Re: [Python-Dev] a new setuptools release?

2009-10-07 Thread Zooko O'Whielacronx
+1

For a large number of people [1, 2, 3], setuptools is already a
critical part of Python.  Make it official.  Let everyone know that
future releases of Python will not break setuptools/Distribute, and
that they can rely on backwards-compatibility with the myriad existing
packages.  Make the next release of the distutils standard lib module
be Distribute.

(Perhaps some people will complain, but they can channel their energy
into improving the new distutils.)

Regards,

Zooko

[1] The majority of professional developers using Python rely on
setuptools to distribute and to use packages:
 http://tarekziade.wordpress.com/2009/03/26/packaging-survey-first-results/
[2] setuptools is one of the most downloaded packages on PyPI:
 http://pypi.python.org/pypi/setuptools
 http://blog.delaguardia.com.mx/tags/pypi
[3] about one fifth of Debian users who install python2.5 also install
python-pkg-resources:
 http://qa.debian.org/popcon.php?package=python-setuptools
 http://qa.debian.org/popcon.php?package=python2.5


[Python-Dev] please consider changing --enable-unicode default to ucs4

2009-10-07 Thread Zooko O'Whielacronx
Folks:

I accidentally sent this letter just to MAL when I intended it to
python-dev.  Please read it, as it explains why the issue I'm raising
is not just the "we should switch to ucs4 because it is better" issue
that was previously settled by GvR.  This is a current, practical
problem that is preventing people from distributing and using Python
packages with binary extension modules on Linux.

Regards,

Zooko


---------- Forwarded message ----------
From: Zooko O'Whielacronx 
Date: Sun, Sep 27, 2009 at 11:43 AM
Subject: Re: [Python-Dev] please consider changing --enable-unicode
default to ucs4
To: "M.-A. Lemburg" 


Folks:

I'm sorry, I think I didn't make my concern clear.  My users, and lots
of other users, are having a problem with incompatibility between
Python binary extension modules.  One way to improve the situation
would be if the Python devs would use their "bully pulpit" -- their
unique position as a source respected by all Linux distributions --
and say "We recommend that Linux distributions use UCS4 for
compatibility with one another".  This would not abrogate anyone's
ability to choose their preferred setting nor, as far as I can tell,
would it interfere with the ongoing development of Python.

Here are the details:

I'm the maintainer of several Python packages.  I work hard to make it
easy for users, even users who don't know anything about Python, to
use my software.  There have been many pain points in this process and
I've spent a lot of time on it for about three years now working on
packaging, including the tools such as setuptools and distutils and
the new "distribute" tool.  Python packaging has been improving during
these years -- things are looking up.

One of the remaining pain points is that I can distribute binaries of
my Python extension modules for Windows or Mac, but if I distribute a
binary Python extension module on Linux, then if the user has a
different UCS2/UCS4 setting then they won't be able to use the
extension module.  The current de facto standard for Linux is UCS4 --
it is used by Debian, Ubuntu, Fedora, RHEL, OpenSuSE, etc.  The
vast majority of Linux users in practice have UCS4, and most binary
Python modules are compiled for UCS4.

That means that a few folks will get left out.  Those folks, from my
experience, are people who built their python executable themselves
without specifying an override for the default, and the smaller Linux
distributions who insist on doing whatever upstream Python devs
recommend instead of doing whatever the other Linux distros are doing.
One of the data points that I reported was a Python interpreter that
was built locally on an Ubuntu server.  Since the person building it
didn't know to override the default setting of --enable-unicode, he
ended up with a Python interpreter built for UCS2, even though all the
Python extension modules shipped by Ubuntu were built with UCS4.

These are not isolated incidents.  The following google searches
suggest that a number of people spend time trying to figure out why
Python extension modules fail on their linux systems:

http://www.google.com/search?q=PyUnicodeUCS4_FromUnicode+undefined+symbol
http://www.google.com/search?q=+PyUnicodeUCS2_FromUnicode+undefined+symbol
http://www.google.com/search?q=_PyUnicodeUCS2_AsDefaultEncodedString+undefined+symbol

Another data point is the Mandriva Linux distribution.  It is probably
much smaller than Debian, Ubuntu, or RedHat, but it is still one of
the major, well-known distributions.  I requested of the Python
maintainer for Mandriva, Michael Scherer, that they switch from UCS2
to UCS4 in order to reduce compatibility problems like these.  His
answer as I understood it was that it is best to follow the
recommendations of the upstream Python devs by using the default
setting instead of choosing a setting for himself.

(Now we could implement a protocol which would show whether a given
Python package was compiled for UCS2 or UCS4.  That would be good.
Hopefully it would make incompatibility more explicit and
understandable to users.  Here is the ticket for that, which I am
contributing to: http://bugs.python.org/setuptools/issue78 .
However, even if we implement that feature in the distribute tool (the
successor to setuptools), users who build their own python or who use
a Linux distribution that follows upstream configuration defaults will
still be unable to use most Python packages with compiled extension
modules.)

In a message on this thread, MvL wrote:

> "For that reason I think it's also better that the configure script
> continues to default to UTF-16 -- this will give the UTF-16 support
> code the necessary exercise."
>
> This is effectively a BDFL pronouncement. Nothing has changed the
> validity of the premise of the statement, so the conclusion remains
> valid, as well.

My understanding of 

Re: [Python-Dev] a new setuptools release?

2009-10-07 Thread Zooko O'Whielacronx
Thanks for the reply, MAL.

How would we judge whether Distribute is ready for inclusion in the
Python standard lib?  Maybe if it has a few more releases, leaving a
trail of "closed: fixed" issue tickets behind it?

Regards,

Zooko


Re: [Python-Dev] Python byte-compiled and optimized code

2009-10-07 Thread Zooko O'Whielacronx
You might be interested in the new PYTHONDONTWRITEBYTECODE environment
variable supported as of Python 2.6.  I personally think it is a great
improvement.  :-)
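
A quick way to see it take effect, as a small sketch: start a child interpreter with the variable set and ask it for sys.dont_write_bytecode, the flag that the variable controls. The helper name child_dont_write_bytecode is made up for this example.

```python
import os
import subprocess
import sys

def child_dont_write_bytecode():
    """Run a child interpreter with PYTHONDONTWRITEBYTECODE set and report its flag."""
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env)
    return out.strip() == b"True"
```

Setting sys.dont_write_bytecode = True from inside a program has the same effect for imports that happen afterwards.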

Regards,

Zooko


Re: [Python-Dev] Python 2.6.4rc2

2009-10-21 Thread Zooko O'Whielacronx
Barry:

Do you know anything about this alleged regression in 2.6.3 with
regard to the __doc__ property?

https://bugs.edge.launchpad.net/ubuntu/+source/boost1.38/+bug/457688

Regards,

Zooko


Re: [Python-Dev] Possible language summit topic: buildbots

2009-10-27 Thread Zooko O'Whielacronx
Right, how do developers benefit from a buildbot?

From my experience (five large buildbots with many developers plus two
with only a couple of developers), a buildbot does little good unless
the tests are reliable and not too noisy.  "Reliable" is best achieved
by having tests be deterministic and reproducible.  "Not too noisy"
means that the builders are all green all the time (at least for a
"supported" subset of the buildslaves).

Beyond that, then I think there has to be a culture change where the
development team decides that it is really, really not okay to leave a
builder red after you turned it red, and that instead you need to
revert the patch that made it go from green to red before you do
anything else.  It has taken me a long time to acculturate to that and
I wouldn't expect most people to do it quickly or easily.

(It is interesting to think of what would happen if that policy were
automated -- any patch which caused any "supported" builder to go from
green to red would automatically be reverted.)

Also, of course, this is mostly meaningless unless the code that is
being changed by the patches is well-covered by tests.

Regards,

Zooko


Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?

2009-11-03 Thread Zooko O'Whielacronx
Folks:

I really don't want to make anyone feel bad or to criticize, but I
should mention that I have no plans to use Python 3 or to support
Python 3.  My best guess at this time is that the current projects
that I'm involved in will still require Python 2 for the foreseeable
future (let's say 5 years.  I can see 5 years into the future.), and
that as I start new projects I will probably try out interesting
alternative programming languages like Haskell, Newspeak [1],
Jacaranda [2], and other new things that appear in the coming years.

Of course, I reserve the right to change my mind and start using and
supporting Python 3.  That might happen if there is some combination
of: 1. my users start asking for it (no-one has yet), 2. my
dependencies start providing it (I use Python because it has Twisted.
Twisted requires Python 2.), 3. it becomes more possible for me to
write code which is still Python-2-compatible and also is more and
more close to being Python-3-compatible.

By the way, one significant detail which makes Python 3 less
interesting to me is [3].  Those two languages that I mentioned --
Newspeak and Jacaranda -- both have object-capability nature.  If that
issue in [3] were fixed then Python 3 would join Python 2 as a
language that can (with the CapPython extension) have
object-capability nature.

Regards,

Zooko

[1] http://bracha.org/Site/Newspeak.html
[2] http://jacaranda.org
[3] 
http://lackingrhoticity.blogspot.com/2008/09/cappython-unbound-methods-and-python-30.html
---
Your cloud storage provider does not need access to your data.
Tahoe-LAFS -- http://allmydata.org


Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?

2009-11-04 Thread Zooko O'Whielacronx
Folks:

It occurred to me to wonder why I haven't investigated how hard it
would be to make my Python packages Python-3-compatible.  That's right
-- I haven't even looked closely.  I couldn't even tell you off the
top of my head what is in Python 3 that I would have to think about
except for the new unicode regime.  I think the answer is that the
payoff is just *so* low to me at this point that it doesn't even
justify me taking 15 minutes to read "What's New In Python 3" or to
execute 2to3 on my smallest package and see what it does.

On the other hand, I'm totally committed to supporting Python 2.7,
because my customers will demand it and because I expect that it will
be easy.

So, if you guys slip in your favorite new Python 3 feature into 2.7
and add a deprecation warning for your least favorite Python 2
misfeature, then probably within about 24 months I'll have fixed all
code that uses the deprecated feature, and probably within about five
years I'll consider dropping backwards-compatibility with Python 2.6
and starting to use that new feature that you added to Python 2.7.

(I'm currently considering dropping Python 2.4 compatibility for the
next releases of most of my code.)

Regards,

Zooko


Re: [Python-Dev] mingw support?

2010-08-13 Thread Zooko O'Whielacronx
On Sat, Aug 7, 2010 at 2:14 PM, Steve Holden  wrote:
> There have certainly been demonstrations that Python can be compiled
> with mingw, but as far as I am aware what's  missing is a developer
> sufficiently motivated to integrate that build system into the
> distributions and maintain it.

There appears to be quite a lot of activity on
http://bugs.python.org/issue3871 . I find it surprising that nobody
mentioned it before on this thread. Perhaps nobody who has been
posting to this thread was aware of this activity.

Regards,

Zooko


Re: [Python-Dev] Goodbye

2010-09-23 Thread Zooko O'Whielacronx
Speaking as a frequent contributor of bug reports and an occasional
contributor of patches, I must say that I feel like the status quo of the
tracker before Mark's work was discouraging. If there is a vast
collection of abandoned tickets, it gives me the strong impression
that my attempted contributions are likely to end up in that pile. The
messages I got from the tracker due to Mark's work saying things like
"This ticket is closed due to inactivity." or "Would you be interested
in refreshing this patch?" started to get me interested in
contributing again.

Also, I would like to point out that, not having read the other
traffic that this thread alludes to, either from earlier mailing list
threads or from IRC, I don't really understand what exactly Mark did
wrong. The complaints about his behavior on this thread seem to be a
little... non-specific. Did he continue to close tickets after he was
asked not to do so? I didn't see any quotes or timestamps showing what
happened or when.

Regards,

Zooko


Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Zooko Wilcox-O'Hearn

On May 6, 2009, at 7:33 AM, Stephen J. Turnbull wrote:

> You have convinced me that the PEP should wait as well.
>
> In its current form it is incomplete and dangerous.


+1 on delaying PEP 383

I think PEP 383 is a good idea in principle, but I'm still struggling  
to understand it myself, and it seems to offer new hazards for the  
unwary programmer.


On the other hand, maybe the wary programmers are waiting for Python  
3.2 anyway.


On the gripping hand, if PEP 383 is released in Python 3.1, will that  
obligate python-dev to support it indefinitely, at least in
backwards-compatibility mode?  I'm not thinking of API compatibility as much as
data compatibility -- someone used Python 3.1 to write down some  
filenames, and now a few years later they are trying to use the  
latest and greatest Python release to read those filenames...


Regards,

Zooko


Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Zooko Wilcox-O'Hearn

On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:

> Zooko Wilcox-O'Hearn writes:
>
>> I'm not thinking of API compatibility as much as data
>> compatibility -- someone used Python 3.1 to write down some
>> filenames, and now a few years later they are trying to use the
>> latest and greatest Python release to read those filenames...
>
> Well, if the filenames are generated by Python (as opposed to read
> from an existing directory on disk), they should be regular unicode
> objects without any lone surrogates, so I don't see the
> compatibility problem.


I meant that the application reads filenames from an existing  
directory on disk, saves those filenames, and then later, using a  
future version of Python, wants to read them and use them.


I'm not saying that I know this would be a problem.  I'm saying that  
I personally can't tell whether it would be a problem or not, and the  
extensive discussions so far have not convinced me that there is  
anyone who both understands PEP 383 and considers this use case.


Many people who apparently understand encoding issues well have said  
something to the effect that there is no problem, but those people  
haven't yet managed to get through my thick skull how I would use PEP  
383 safely for this sort of use case -- the one where data generated  
by os.listdir() travels forward in time or the one where that data
travels sideways to other systems, including Windows or other systems  
that validate incoming unicode.
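
To make the "travels sideways" worry concrete, here is a minimal sketch.  It assumes the error handler under the name it carries in later Python 3 releases, "surrogateescape" (this thread calls it utf8b), and the byte string is a made-up Latin-1 filename, not data from Tahoe.  The helper strict_utf8_accepts is hypothetical and stands in for any consumer that validates incoming unicode.

```python
# Hypothetical filename bytes as a POSIX os.listdir() might see them:
# the Latin-1 encoding of "caf" + e-acute, which is not valid UTF-8.
raw = b"caf\xe9"

# Decoding with the PEP 383 handler smuggles the undecodable byte through
# as a lone surrogate instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")

# Round-tripping works as long as the same handler is used on the way out.
round_tripped = name.encode("utf-8", "surrogateescape")


def strict_utf8_accepts(text):
    """Return True if a validating consumer (strict UTF-8) would accept text."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

With these definitions, round_tripped equals raw, but strict_utf8_accepts(name) is False: the saved string cannot be written out as strict UTF-8, which is the kind of surprise a validating peer or a later reader of the stored names might hit.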


That's why I am a bit uncomfortable about PEP 383 being quickly  
implemented and deployed in Python 3.1.


By the way, much of the detailed discussion about what Tahoe requires  
and how that may or may not benefit from PEP 383 has now moved to the  
tahoe-dev mailing list:
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev .


Regards,

Zooko



Re: [Python-Dev] .pth files are evil

2009-05-10 Thread Zooko Wilcox-O'Hearn

On May 9, 2009, at 9:39 AM, P.J. Eby wrote:

> It would be really straightforward, though, for someone to
> implement an easy_install variant that does this.  Just invoke
> "easy_install -Zmaxd /some/tmpdir packagelist" to get a full set of
> unpacked .egg directories in /some/tmpdir, and then move the
> contents of the resulting .egg subdirs to the target location,
> renaming EGG-INFO subdirs to projectname-version.egg-info subdirs.


Except for the renaming part, this is exactly what GNU stow does.

> (Of course, this ignores the issue of uninstalling previous
> versions, or overwriting of conflicting files in the target -- does
> pip handle these?)


GNU stow does handle these issues.

Regards,

Zooko


[Python-Dev] how GNU stow is complementary rather than alternative to distutils

2009-05-10 Thread Zooko Wilcox-O'Hearn

On May 10, 2009, at 11:18 AM, Martin v. Löwis wrote:

> If GNU stow solves all your problems, why do you want to use
> easy_install in the first place?


That's a good question.  The answer is that there are two separate  
jobs: building executables and putting them in a directory structure  
of the appropriate shape for your system is one job, and installing  
or uninstalling that tree into your system is another.  GNU stow does  
only the latter.


The input to GNU stow is a set of executables, library files, etc.,  
in a directory tree that is of the right shape for your system.  For  
example, if you are on a Linux system, then your scripts all need to  
be in $prefix/bin/, your shared libs should be in $prefix/lib, your  
Python packages ought to be in
$prefix/lib/python$x.$y/site-packages/, etc.  GNU stow is blissfully
ignorant about all issues of building binaries, choosing where to
place files, etc. -- that's the job of the build system of the
package, e.g. "./configure --prefix=foo && make && make install" for
most C packages, or "python ./setup.py install --prefix=foo" for
Python packages using distutils (footnote 1).


Once GNU stow has the well-shaped directory which is the output of  
the build process, then it follows a very dumb, completely reversible  
(uninstallable) process of symlinking those files into the system  
directory structure.
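
The symlinking step can be sketched in a few lines of Python.  This is a toy under stated assumptions, not GNU stow's actual implementation (the real tool is more careful and can link whole directories): toy_stow, toy_unstow and StowConflict are hypothetical names, and the sketch links individual files only.  It refuses to proceed when the target already has something at a path the package wants, which is what happens when two packages both ship a file of the same name.

```python
import os


class StowConflict(Exception):
    """Raised when the target already has something at a path we need."""


def _planned_links(package_tree, target):
    # Map every file in the package tree to the same relative path in target.
    for dirpath, _dirnames, filenames in os.walk(package_tree):
        rel = os.path.relpath(dirpath, package_tree)
        for name in filenames:
            src = os.path.abspath(os.path.join(dirpath, name))
            dst = os.path.normpath(os.path.join(target, rel, name))
            yield src, dst


def toy_stow(package_tree, target):
    """Symlink a built package tree into target; give up on any conflict."""
    planned = list(_planned_links(package_tree, target))
    for _src, dst in planned:
        if os.path.lexists(dst):
            # Too dumb to resolve this, so stop before touching anything.
            raise StowConflict(dst)
    for src, dst in planned:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.symlink(src, dst)
    return [dst for _src, dst in planned]


def toy_unstow(package_tree, target):
    """Reverse toy_stow by removing only the links that point into package_tree."""
    for src, dst in _planned_links(package_tree, target):
        if os.path.islink(dst) and os.readlink(dst) == src:
            os.remove(dst)
```

Because all conflicts are checked before any link is created, a failed install leaves the target untouched, and uninstalling is just removing the links that point back into the package tree.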


It is a beautiful, elegant hack because it is sooo dumb.  It is also  
very nice to use the same tool to manage packages written in any  
programming language, provided only that they can build a directory  
tree of the right shape and content.


However, there are lots of things that it doesn't do, such as  
automatically acquiring and building dependencies, or producing  
executables for the target platform for each of your console  
scripts.  Not to mention creating a directory named
"$prefix/lib/python$x.$y/site-packages" and cp'ing your Python files into it.  That's
why you still need a build system even if you use GNU stow for an  
install-and-uninstall system.


The thing that prevents this from working with setuptools is that
setuptools creates a file named easy_install.pth during the
"python ./setup.py install --prefix=foo" step.  If you build two
different Python packages this way, they will each create an
easy_install.pth file, and then when you ask GNU stow to link the two
resulting packages into your system, it will say "You are asking me
to install two different packages which both claim that they need to
write a file named
'/usr/local/lib/python2.5/site-packages/easy_install.pth'.  I'm too
dumb to deal with this conflict, so I give up.".  If I understand
correctly, your (MvL's) suggestion that easy_install create a .pth
file named "easy_install-$PACKAGE-$VERSION.pth" instead of
"easy_install.pth" would indeed make it work with GNU stow.


Regards,

Zooko

footnote 1: Aside from the .pth file issue, the other reason that  
setuptools doesn't work for this use while distutils does is that  
setuptools tries too hard to save you from making a mistake: maybe you
don't know what you are doing if you ask it to install into a  
previously non-existent prefix dir "foo".  This one is easier to fix:  
http://bugs.python.org/setuptools/issue54 # "be more like distutils  
with regard to --prefix=" .



[Python-Dev] ctime: I don't think that word means what you think it means.

2009-06-13 Thread Zooko Wilcox-O'Hearn

The stat module uses the "st_ctime" slot to hold two kinds of values
which are semantically different and which are frequently
confused with one another.  It chooses which kind of value to put in
there based on platform -- Windows gets the file creation time and all
other platforms get the "ctime".  The only sane way to use this API is
then to switch on platform:

if platform.system() == "Windows":
    metadata["creation time"] = s.st_ctime
else:
    metadata["unix ctime"] = s.st_ctime

(That is an actual code snippet from the Allmydata-Tahoe project.)

Many or even most programmers incorrectly think that unix ctime is file
creation time, so instead of using the sane idiom above, they write the
following:

metadata["ctime"] = s.st_ctime

thus passing on the confusion to the users of their metadata, who may
not be able to tell on which platform this metadata was created.
This is the situation we have found ourselves in for the
Allmydata-Tahoe project -- we now have a bunch of "ctime" values
stored in our filesystem and no way to tell which kind they were.

More and more filesystems such as ZFS and Mac HFS+ apparently offer
creation time nowadays.

I propose the following changes:

1.  Add a "st_crtime" field which gets populated on filesystems
(Windows, ZFS, Mac) which can do so.

That is hopefully not too controversial and we could proceed to do so
even if the next proposal gets bogged down:

2.  Add a "st_unixctime" field which gets populated *only* by the unix
ctime and never by any other value (even on Windows, where the unix
ctime *is* available even though nobody cares about it), and deprecate
the hopelessly ambiguous "st_ctime" field.
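
Until something like these fields exists, the disambiguation has to live in application code.  The following is a minimal sketch, not the Tahoe implementation: unambiguous_times is a hypothetical helper, the dictionary keys are only illustrative, and it assumes that where the platform can report a creation time, os.stat() exposes it as "st_birthtime" (an attribute that is present only on some platforms and Python versions, hence the getattr).

```python
import os
import platform


def unambiguous_times(path):
    """Return stat times under keys that say which kind of time they are."""
    s = os.stat(path)
    metadata = {"mtime": s.st_mtime}
    on_windows = platform.system() == "Windows"

    # Assumption: a creation time, where available, shows up as st_birthtime.
    birth = getattr(s, "st_birthtime", None)
    if birth is not None:
        metadata["creation time"] = birth
    elif on_windows:
        # On Windows st_ctime has historically held the creation time.
        metadata["creation time"] = s.st_ctime

    if not on_windows:
        # Elsewhere st_ctime is the unix ctime (inode change time).
        metadata["unix ctime"] = s.st_ctime
    return metadata
```

The point of the sketch is only that the bare key "ctime" is never written, so a later reader of the metadata does not need to know which platform produced it.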

You may be interested in http://allmydata.org/trac/tahoe/ticket/628
("mtime" and "ctime": I don't think that word means what you think it
means.) where the Allmydata-Tahoe project is carefully unpicking the
mess we made for ourselves by confusing ctime with file-creation time.

This is ticket http://bugs.python.org/issue5720 .

Regards,

Zooko



Re: [Python-Dev] Drop the new time.wallclock() function?

2012-03-23 Thread Zooko Wilcox-O'Hearn
> I merged the two functions into one function: time.steady(strict=False).
>
> time.steady() should be monotonic most of the time, but may use a fallback.
>
> time.steady(strict=True) fails with OSError or NotImplementedError if
> reading the monotonic clock failed or if no monotonic clock is available.

If someone wants time.steady(strict=False), then why don't they just
continue to use time.time()?

I want time.steady(strict=True), and I'm glad you're providing it and
I'm willing to use it this way, although it is slightly annoying
because "time.steady(strict=True)" really means
"time.steady(i_really_mean_it=True)". Else, I would have used
"time.time()".

I am aware of a large number of use cases for a steady clock (event
scheduling, profiling, timeouts), and a large number of use cases for
an "NTP-respecting wall clock" (calendaring, displaying to a
user, timestamping). I'm not aware of any use case for "steady if
implemented, else wall-clock", and it sounds like a mistake to me.

Regards,

Zooko


Re: [Python-Dev] Drop the new time.wallclock() function?

2012-03-26 Thread Zooko Wilcox-O'Hearn
On Fri, Mar 23, 2012 at 11:27 AM, Victor Stinner
 wrote:
>
> time.steady(strict=False) is what you need to implement timeout.

No, that doesn't fit my requirements, which are about event
scheduling, profiling, and timeouts. See below for more about my
requirements.

I didn't say this explicitly enough in my previous post:

Some use cases (timeouts, event scheduling, profiling, sensing)
require a steady clock. Others (calendaring, communicating times to
users, generating times for comparison to remote hosts) require a wall
clock.
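
As a small illustration of the two families of use case, here is a sketch that uses one clock for each.  It assumes time.monotonic(), which is the name later Python releases settled on and not something this thread had agreed to; wait_until and timestamp_for_user are hypothetical helpers.

```python
import time


def wait_until(predicate, timeout, poll_interval=0.01):
    """Timeout use case: only time deltas matter, so use the steady clock."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def timestamp_for_user():
    """Display use case: the user wants local time of day, so use the wall clock."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
```

If the system clock is stepped while wait_until is polling, the deadline is unaffected, whereas timestamp_for_user follows the adjustment, which is what each caller wants.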

Now here's the kicker: each use case incurs significant risks if it
uses the wrong kind of clock.

If you're implementing event scheduling or sensing and control, and
you accidentally get a wall clock when you thought you had a steady
clock, then your program may go seriously wrong -- events may fire in
the wrong order, measurements of your sensors may be wildly incorrect.
This can lead to serious accidents. On the other hand, if you're
implementing calendaring or display of "real local time of day" to a
user, and you are using a steady clock for some reason, then you risk
displaying incorrect results to the user.

So using one kind of clock and then "falling back" to the other kind
is a choice that should be rare, explicit, and discouraged. The
provision of such a function in the standard library is an attractive
nuisance -- a thing that people naturally think that they want when
they haven't thought about it very carefully, but that is actually
dangerous.

If someone has a use case which fits the "steady or else fall back to
wall clock" pattern, I would like to learn about it.

Regards,

Zooko


Re: [Python-Dev] PEP 418: Add monotonic clock

2012-03-26 Thread Zooko Wilcox-O'Hearn
>  system_clock = wall clock time
>  monotonic_clock = always goes forward but can be adjusted
>  steady_clock = always goes forward and cannot be adjusted
>  high_resolution_clock = steady_clock || system_clock

Note that the C++ standard deprecated monotonic_clock once they
realized that there is absolutely no point in having a clock that
jumps forward but not back, and that none of the operating systems
implement such a thing -- instead they all implement a clock which
doesn't jump in either direction.

http://stackoverflow.com/questions/6777278/what-is-the-rationale-for-renaming-monotonic-clock-to-steady-clock-in-chrono

In other words, yes! +1! The C++ standards folks just went through the
process that we're now going through, and if we do it right we'll end
up at the same place they are:

http://en.cppreference.com/w/cpp/chrono/system_clock

"""
system_clock represents the system-wide real time wall clock. It may
not be monotonic: on most systems, the system time can be adjusted at
any moment. It is the only clock that has the ability to map its time
points to C time, and, therefore, to be displayed.

steady_clock: monotonic clock that will never be adjusted

high_resolution_clock: the clock with the shortest tick period available
"""

Note that we don't really have the option of providing a clock which
is "monotonic but not steady" in the sense of "can jump forward but
not back". It is a misunderstanding (doubtless due to the confusing
name "monotonic") to think that such a thing is offered by the
underlying platforms. We can choose to *call* it "monotonic",
following POSIX instead of calling it "steady", following C++.

Regards,

Zooko


Re: [Python-Dev] Drop the new time.wallclock() function?

2012-03-26 Thread Zooko Wilcox-O'Hearn
On Mon, Mar 26, 2012 at 5:07 PM, Victor Stinner
 wrote:
>>
>> If someone has a use case which fits the "steady or else fall back to wall 
>> clock" pattern, I would like to learn about it.
>
> Python 3.2 doesn't provide a monotonic clock, so most program uses 
> time.time() even if a monotonic clock would be better in some functions. For 
> these programs, you can replace time.time() by time.steady() where you need 
> to compute a time delta (e.g. compute a timeout) to avoid issues with the 
> system clock update. The idea is to improve the program without refusing to 
> start if no monotonic clock is available.

I agree that this is a reasonable use case. I think of it as basically
being a kind of backward-compatibility, for situations where an
unsteady clock is okay, and a steady clock isn't available. Twisted
faces a similar issue:

http://twistedmatrix.com/trac/ticket/2424

It might be good for use cases like this to explicitly implement the
try-and-fallback, since they might have specific needs about how it is
done. For one thing, some such uses may need to emit a warning, or
even to require the caller to explicitly override, such as refusing to
start if a steady clock isn't available unless the user specifies
"--unsteady-clock-ok".

For motivating examples, consider software written using Twisted >
12.0 or Python > 3.2 which is using a clock to drive real world
sensing and control -- measuring the position of a machine and using
time deltas to calculate the machine's velocity, in order to
automatically control the motion of the machine. For some uses, it is
okay if the measurement could, in rare cases, be drastically wrong.
For other uses, that is not an acceptable risk.

One reason I'm sensitive to this issue is that I work in the field of
security, and making the behavior dependent on the system clock
extends the "reliance set", i.e. the set of things that an attacker
could use against you. For example, if your robot depends on the
system clock for its sensing and control, and if your system clock
obeys NTP, then the set of things that an attacker could use against
you includes your NTP servers. If your robot depends instead on a
steady clock, then NTP servers are not in the reliance set.

Now, if your control platform doesn't have a steady clock, you may
choose to go ahead, while making sure that the NTP servers are
authenticated, or you may choose to disable NTP on the control
platform, etc., but that choice might need to be made explicitly by
the operator, rather than automatically by the library.
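
An explicit, operator-controlled fallback of that kind might look like the sketch below.  Everything here is an assumption for illustration: choose_clock and unsteady_clock_ok are hypothetical names, and the steady clock is looked up as time.monotonic, the name later Python releases provide, via getattr so that an interpreter without it takes the fallback path.  The time_module parameter exists only so the fallback can be exercised.

```python
import time
import warnings


def choose_clock(unsteady_clock_ok=False, time_module=time):
    """Return a steady clock, or the wall clock only if explicitly allowed."""
    steady = getattr(time_module, "monotonic", None)
    if steady is not None:
        return steady
    if not unsteady_clock_ok:
        # Refuse to start rather than silently changing the reliance set.
        raise RuntimeError(
            "no steady clock available; pass unsteady_clock_ok=True to "
            "accept the wall clock instead"
        )
    warnings.warn("falling back to the wall clock", RuntimeWarning)
    return time_module.time
```

A program could wire unsteady_clock_ok to a command-line option such as the "--unsteady-clock-ok" mentioned above, so that the decision is recorded where the operator made it.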

Regards,

Zooko


Re: [Python-Dev] PEP 418 is too divisive and confusing and should be postponed

2012-04-05 Thread Zooko Wilcox-O'Hearn
Folks:

Good job, Victor Stinner, on baking the accumulated knowledge of this
thread into PEP 418. Even though I'm very interested in the topic, I
haven't been able to digest the whole thread(s) on the list and
understand what the current collective understanding is. The detailed
PEP document helps a lot.

I think there are still some mistakes, either in our collective
understanding as reflected by the PEP, or in my own head.

For starters, I still don't understand the first, most basic thing:
what do people mean when they say "monotonic clock"? I don't
understand the current text of PEP 418 with regard to the definition
of that word.

Allow me to resort to an analogy. There is an infinitely long,
perfectly straight and flat racetrack. There is a flag that gets
dragged along it at a constant rate, with the label "REAL TIME" on the
flag. There are some runners, each with a different label on their
chest:

Runner A: a helicopter hovers over Runner A. Occasionally it picks him
up and plops him down right next to the flag. Also, he wears a headset
and listens to instructions from his coach to run a little faster or
slower, as necessary, to remain abreast of the flag.

Runner B: a helicopter hovers over Runner B. If he is behind the flag,
it will pick him up and plop him down right next to the flag. However,
if he is ahead of the flag it will not pick him up.

Runner C: no helicopter ever picks up Runner C, but he does wear a
headset and listens to instructions from his coach to run a little
faster or a little slower. His coach tells him to run a little faster
if he is behind the flag or run a little slower if he is in front of
the flag, with the goal of eventually having him right next to the
flag.

Runner D: like Runner C, he never gets picked up, but he listens to
instructions to run a little faster or a little slower. However,
instead of telling him to run faster in order to catch up to the flag,
or to run slower in order to "catch down" to the flag, his coach
instead tells him to run a little faster if he is moving slower than
the flag is moving, and to run a little slower if he is moving faster
than the flag is moving. Note that this is very different from Runner
C, in that it is not intended to cause him to eventually be right next
to the flag, and indeed if it is done right it guarantees that he will
*never* be right next to the flag, although he will be moving just as
fast as the flag is moving.

Runner E: no helicopter, no headset. He just proceeds at his own pace,
blissfully unaware of the exhortations of others.

Now: which ones of these five runners do you call "monotonic"? Which
ones do you call "steady"?

Regards,

Zooko


[Python-Dev] this is why we shouldn't call it a "monotonic clock" (was: PEP 418 is too divisive and confusing and should be postponed)

2012-04-05 Thread Zooko Wilcox-O'Hearn
On Thu, Apr 5, 2012 at 7:14 PM, Greg Ewing  wrote:
>
> This is the strict mathematical meaning of the word "monotonic", but the way 
> it's used in relation to OS clocks, it seems to mean rather more than that.

Yep. As far as I can tell, nobody has a use for an unsteady, monotonic clock.

There seem to be two groups of people:

1. Those who think that "monotonic clock" means a clock that never
goes backwards. These people are in the majority. After all, that's
what the word "monotonic" means ¹ . However, a clock which guarantees
*only* this is useless.

2. Those who think that "monotonic clock" means a clock that never
jumps, and that runs at a rate approximating the rate of real time.
This is a very useful kind of clock to have! It is what C++ now calls
a "steady clock". It is what all the major operating systems provide.

The people in class 1 are more correct, technically, and far more
numerous, but the concept from 1 is a useless concept that should be
forgotten.
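
The difference between the two concepts can be stated as two predicates over a series of clock readings taken at equal real-time intervals.  This is only a sketch of the definitions as I use them above; is_monotonic, is_steady, real_interval and tolerance are names made up for the illustration.

```python
def is_monotonic(readings):
    """Concept 1: the clock never goes backwards."""
    return all(b >= a for a, b in zip(readings, readings[1:]))


def is_steady(readings, real_interval, tolerance):
    """Concept 2: each step is close to the real time that elapsed."""
    return all(
        abs((b - a) - real_interval) <= tolerance
        for a, b in zip(readings, readings[1:])
    )
```

A clock that reads 0, 1, 2, 50, 51 at one-second intervals satisfies is_monotonic but not is_steady with a small tolerance: it never went backwards, yet any time delta computed across the jump is useless.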

So before proceeding, we should mutually agree that we have no
interest in implementing a clock of type 1. It wouldn't serve anyone's
use case (correct me if I'm wrong!) and the major operating systems
don't offer such a thing anyway.

Then, if we all agree to stop thinking about that first concept, then
we need to agree whether we're all going to use the word "monotonic
clock" to refer to the second concept, or if we're going to use a
different word (such as "steady clock") to refer to the second
concept. I would prefer the latter, as it will relieve us of the need
to repeatedly explain to newcomers: "That word doesn't mean what you
think it means.".

The main reason to use the word "monotonic clock" to refer to the
second concept is that POSIX does so, but since Mac OS X, Solaris,
Windows, and C++ have all avoided following POSIX's mistake, I think
Python should too.

Regards,

Zooko

¹ http://mathworld.wolfram.com/MonotonicSequence.html


Re: [Python-Dev] Status of packaging in 3.3

2012-06-21 Thread Zooko Wilcox-O'Hearn
On Thu, Jun 21, 2012 at 12:57 AM, Nick Coghlan  wrote:
>
> Standard assumptions about the behaviour of site and distutils cease to be 
> valid once setuptools is installed
…
> - advocacy for the "egg" format and the associated sys.path changes that 
> result for all Python programs running on a system
…
> System administrators (and developers that think like system administrators 
> when it comes to configuration management) *hate* what setuptools (and 
> setuptools based installers) can do to their systems.

I have extensive experience with this, including quite a few bug
reports and a few patches in setuptools and distribute, plus
maintaining my own fork of setuptools to build and deploy my own
projects, plus interviewing quite a few Python developers about why
they hated setuptools, plus supporting one of them who hates
setuptools even though he and I use it in a build system
(https://tahoe-lafs.org).

I believe that 80% to 90% of the hatred alluded to above is due to a
single issue: the fact that setuptools causes your Python interpreter
to disrespect the PYTHONPATH, in violation of the documentation in
http://docs.python.org/release/2.7.2/install/index.html#inst-search-path
, which says:

"""
The PYTHONPATH variable can be set to a list of paths that will be
added to the beginning of sys.path. For example, if PYTHONPATH is set
to /www/python:/opt/py, the search path will begin with
['/www/python', '/opt/py']. (Note that directories must exist in order
to be added to sys.path; the site module removes paths that don’t
exist.)
"""

Fortunately, this issue is fixable! I opened a bug report and I and
others have provided patches that make setuptools stop doing
this. This makes the above documentation true again. The negative
impact on features or backwards-compatibility doesn't seem to be
great.
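
One rough way to see whether an interpreter is honouring the documented behaviour quoted above is to look at what sits in sys.path ahead of the PYTHONPATH entries.  The helper below is a hypothetical sketch, not part of setuptools or of any patch; it compares strings exactly and so ignores any path normalisation the interpreter may apply.

```python
import os


def entries_ahead_of_pythonpath(pythonpath, sys_path):
    """Return the sys.path entries that come before the first PYTHONPATH entry."""
    wanted = [p for p in pythonpath.split(os.pathsep) if p]
    positions = [sys_path.index(p) for p in wanted if p in sys_path]
    if not positions:
        return []
    return list(sys_path[:min(positions)])
```

Calling it with os.environ.get("PYTHONPATH", "") and sys.path will normally report the script directory at the front; anything else in the result, such as .egg directories, has been placed ahead of the paths the user asked for.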

http://bugs.python.org/setuptools/issue53

Philip J. Eby provisionally approved of one of the patches, except for
some specific requirement that I didn't really understand how to fix
and that now I don't exactly remember:

http://mail.python.org/pipermail/distutils-sig/2009-January/010880.html

Regards,

Zooko