Re: [Python-Dev] [Python-ideas] Ext4 data loss
Would there be interest in a filetools module? Replies and discussion to python-ideas please. I've been using and maintaining a few filesystem hacks for, let's see, almost nine years now: http://allmydata.org/trac/pyutil/browser/pyutil/pyutil/fileutil.py (The first version of that was probably written by Greg Smith in about 1999.) I'm sure there are many other such packages. A couple of quick searches of pypi turned up these two: http://pypi.python.org/pypi/Pythonutils http://pypi.python.org/pypi/fs I wonder if any of them have the sort of functionality you're thinking of. Regards, Zooko ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3rd party developers: don't change your APIs when porting to Py3k! (but consider using ctypes)
I'm the maintainer of a few Python packages which wrap native C or C++ code. At Pycon, I learned that PyPy and Jython support ctypes or have plans to do so in the near future. I don't know about IronPython. However, having CPython, PyPy, and Jython all supporting ctypes makes it the obvious choice for interfacing to native code in the future. Regards, Zooko O'Whielacronx
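A minimal sketch of what interfacing to native code through ctypes looks like, assuming a POSIX system where the C library can be loaded. The wrapper name c_strlen is made up for illustration and is not taken from any of the packages mentioned above:

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") may return None in a
# minimal environment; CDLL(None) then exposes the symbols already loaded
# into the running process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string using C's strlen."""
    return libc.strlen(data)


length = c_strlen(b"ctypes")
```

Declaring argtypes and restype up front is optional in ctypes, but it lets ctypes check the arguments instead of guessing how to convert them.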
Re: [Python-Dev] [Distutils] PEP 365 (Adding the pkg_resources module)
Folks: (By the way, it was a pleasure to meet many of you in Real Life for the first time at Pycon.)
Here is what I want:
1. The standard Python build tools by default produce machine-parseable package metadata, which can include package dependency information with reasonably well-defined semantics. Oh good! I already have this, since distutils in Python >= 2.5 produces .egg-info metadata in an easy-to-parse format.
2. This machine-parseable metadata is widely supported and understood by the Python community. In retrospect, it's too bad that it isn't named ".pkg-info" instead of ".egg-info", in order to avoid the fraught politics around the concept of "eggs". A concrete example of such a misunderstanding is the sad fact that many Linux distributions were in the habit of deleting this information from their Python packages, perhaps because they were under the impression that it was obviated by their packaging tools. The major distributions have all stopped doing this now. Unifying the created-by-default PKG-INFO files and the created-by-default .egg-info directories would be nice, too.
3. The standard Python library includes a tool to find and parse this metadata, so that I can write programmatic tests of my dependencies, like this: http://allmydata.org/trac/tahoe/browser/_auto_deps.py?rev=2062 This is one of the improvements that I was anticipating from pkg_resources going into the standard library.
4. The standard Python library includes a tool to find and read resources (other than Python modules) that came bundled in a Python package. Consider, for example, these snippets of code in Nevow: http://divmod.org/trac/browser/trunk/Nevow/setup.py?rev=13786#L10 http://divmod.org/trac/browser/trunk/Nevow/setup.py?rev=13786 http://divmod.org/trac/browser/trunk/Nevow/setup_egg.py?rev=2406 When Nevow uses pkg_resources to import its files such as "default.css", it is able to find them at runtime, even if it is being imported from a py2exe or py2app zip, or on other platforms where its homegrown setup script and homegrown "find my file" function fail. So using pkg_resources (and setuptools to install it) makes "test_nevow" pass on all of the allmydata.org buildslaves: http://allmydata.org/buildbot/waterfall?show_events=false
Regards, Zooko
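To illustrate point 4, here is a small sketch of reading a bundled data file through the package's loader rather than through the filesystem, so that it keeps working when the package is imported from a zip. It uses the standard library's pkgutil.get_data as a stand-in for pkg_resources.resource_string so that it runs without setuptools; the package name demo_nevow_like and the CSS content are invented for the example:

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def make_zipped_package(directory):
    """Build a zip archive holding a tiny package plus a data file."""
    archive = os.path.join(directory, "demo_bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_nevow_like/__init__.py", "")
        zf.writestr("demo_nevow_like/default.css", "body { margin: 0 }\n")
    return archive


def load_resource(package, name):
    """Read a bundled resource via the package's loader, not via open()."""
    return pkgutil.get_data(package, name)


# Put the zip itself on sys.path, as a zipped egg or a py2exe bundle would be.
tmp = tempfile.mkdtemp()
sys.path.insert(0, make_zipped_package(tmp))
css = load_resource("demo_nevow_like", "default.css")
```

A homegrown "find my file" function that joins os.path.dirname(__file__) with the file name and calls open() would fail here, because the resulting path points inside the zip archive.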
Re: [Python-Dev] PEP 365 (Adding the pkg_resources module)
On Mar 19, 2008, at 3:23 PM, Guido van Rossum wrote: > If other people want to chime in please do so; if this is just a > dialog between Phillip and me I might incorrectly assume > that nobody besides Phillip really cares. I really care. I've used setuptools, easy_install, eggs, and pkg_resources extensively for the past year or so (and contributed a few small patches). There have been plenty of problems, but I find them to be overall useful tools. It is a great boon to a programming community to lower the costs of re-using other people's code. The Python community will benefit greatly once a way to do that becomes widely enough accepted to reach a tipping point and become ubiquitous. Setuptools is already the de facto standard, but it hasn't become ubiquitous, possibly in part because of "egg hatred", about which more below. I've interviewed several successful Python hackers who "hate eggs" in order to understand what they hate about them, and I've taken notes from some of these interviews. (The list includes MvL, whose name was invoked earlier in this thread.) After filtering out yer basic complaining about bugs (which complaints are of course legitimate, but which don't indict setuptools as worse than other software of comparable scope and maturity), their objections seem to fall into two categories: 1. "The very notion of package dependency resolution and programmable or command-line installation of packages at the language level is a bad notion." This can't really be the case. If the existence of such functionality at the programming language level were an inherently bad notion, then we would be hearing some complaints from the Ruby folks, where the Gems system is standard and ubiquitous. We hear no complaints -- only murmurs of satisfaction. 
One person recently reported to me that while there are more packages in Python, he finds himself re-using other people's code more often when he works in Ruby, because almost all Ruby software is Gemified, but only a fraction of Python software is Eggified. Often this complaint comes with the idea that eggs conflict with their system-level package management tools. (These are usually Debian/Ubuntu users.) Note that Ruby software is not too hard to include in operating system packaging schemes -- my Ubuntu Hardy apt-cache shows plenty of Ruby software. A sufficiently mature and widely supported setuptools could actually make it easier to integrate Python software into Debian -- see stdeb [1]. 2. "Setuptools/eggs give me grief." What can really be the case is that setuptools causes a host of small, unnecessary problems for people who prefer to do things differently than PJE does. Personally, I prefer to use GNU stow, and setuptools causes unnecessary, but avoidable, problems for me. Many people object (rightly enough) to a "./setup.py install" automatically fetching new software over the Internet by default. The fact that easy_install creates a site.py that changes the semantics of PYTHONPATH is probably the most widely and deservedly hated example of this kind of thing [2]. I could go on with a few other common technical complaints of this kind. These type-2 problems can be fixed by changing setuptools or they can be grudgingly accepted by users, while retaining compatibility with the large and growing ecosystem of eggy software. Certainly fixing setuptools to play better with others is a more likely path to success than setting out to invent a non-egg-compatible alternative. Such a project might never be implemented well enough to serve, and if it were it would probably never overtake eggs's lead in the Python ecosystem, and if it did it would probably not turn out to be a better tool. 
So, since you asked for my chime, I advise you to publicly bless eggs, setuptools, and easy_install as plausible future standards and solicit patches which address the complaints. For that matter, soliciting specific complaints would be a good start. I've done so in private many times with only partial success as to the "specific" part. One promising approach is to request objections in the form of automated tests that setuptools fails, e.g. [3].
Regards, Zooko O'Whielacronx
[1] http://stdeb.python-hosting.com/
[2] http://www.rittau.org/blog/20070726-02 And no, PJE's suggested "trivial fix" does not satisfy the objectors, as it can't support the use case of "cd somepkg ; python ./setup.py install ; cd .. ; python -c 'import somepkg'".
[3] http://twistedmatrix.com/trac/ticket/2308#comment:5
Re: [Python-Dev] PEP 365 (Adding the pkg_resources module)
On Mar 20, 2008, at 7:44 AM, Tres Seaver wrote:
> Paul Moore wrote:
>> 4. Hard to use with limited connectivity. At work, I *only* have
>> access to the internet via Internet Explorer (MS based proxy). There
>> are workarounds, but ultimately "download an installer, then run it"
>> is a far simpler approach for me.
>
> I don't know how to make this requirement compatible with using shared
> dependencies,
We've done something like this. The http://allmydata.org project bundles its easy_installable dependencies. If you get the current trunk from our darcs repository [1], or get a release tarball or a snapshot tarball from [2], then it comes with a directory named "misc/dependencies" which has the source tarballs of our easy_installable dependencies. You can browse this directory on the web: [3].
Therefore, if you manually satisfy the non-easy_installable dependencies, you can download an allmydata.org tarball, disconnect from the Internet (which we call "moving to a Desert Island"), and install it.
This is, as you say, "compatible with using shared dependencies" because setuptools will detect if you already have sufficiently new versions of some of these dependencies installed (for example, if they are installed in Debian packages), and then skip the step of installing that dependency from its source tarball.
The remaining dependencies that cannot be satisfied automatically by our setup.py are listed in the install.html [4]. They are:
1. g++ >= v3.3 -- the Cygwin version of gcc/g++ works for Cygwin and for Windows
2. GNU make
3. Python >= v2.4.2 including development headers i.e. "Python.h"
4. Twisted >= v2.4.0 -- from the Twisted "sumo" source tarball
5. OpenSSL >= v0.9.7, including development headers
6. PyOpenSSL == v0.6
7. Crypto++ >= v5.2.1, including development headers
I am hoping that in the future Twisted (see twisted #1286 [5]) and pyOpenSSL will be easy_installable, and that our use of setuptools plugins will eventually replace our GNUmakefile and thus remove our dependency on GNU make. That will leave only g++, Python, OpenSSL, and Crypto++ as dependencies that a user has to manually deal with in order to build allmydata.org from source.
Regards, Zooko
[1] http://allmydata.org/source/tahoe/trunk/
[2] http://allmydata.org/source/tahoe/tarballs/
[3] http://allmydata.org/trac/tahoe/browser/misc/dependencies
[4] http://allmydata.org/source/tahoe/trunk/docs/install.html
[5] http://twistedmatrix.com/trac/ticket/1286
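The "skip what is already installed" behaviour described above can be sketched roughly as follows. This is not setuptools' actual resolver, only an illustration of the decision it makes; the function names, the minimum versions, and the installed versions are all hypothetical, and versions are assumed to be purely numeric and dotted:

```python
def parse_version(text):
    """Turn '2.4.0' into (2, 4, 0). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def tarballs_to_install(minimums, installed):
    """Return the names whose installed version is missing or too old.

    minimums:  {name: minimum required version string}
    installed: {name: version string already present on the system}
    """
    needed = []
    for name, minimum in sorted(minimums.items()):
        have = installed.get(name)
        if have is None or parse_version(have) < parse_version(minimum):
            needed.append(name)
    return needed


# Hypothetical requirement and system state, for illustration only.
minimums = {"zfec": "1.1", "simplejson": "1.7.1", "foolscap": "0.2.5"}
installed = {"simplejson": "1.8.1", "foolscap": "0.2.4"}
needed = tarballs_to_install(minimums, installed)
```

Only the names in needed would be built from the bundled source tarballs; simplejson is left alone because the system copy is already new enough.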
Re: [Python-Dev] Wow, I think I actually *get* it now!
Phillip J. Eby wrote:
> Hm. So it seems to me that maybe one thing that would help is a
> "Setuptools Haters' Guide To Setuptools" -- that is, *short*
> documentation specifically written for people who don't want to use
> setuptools and want to minimize its impact on their systems.
Perhaps relevant are my blog entries on how to use setuptools with GNU stow:
https://zooko.com/log-2007.html#d2007-04-27-distutils_or_setuptools_with_GNU_stow
https://zooko.com/log-2007.html#d2007-06-02-distutils_or_setuptools_with_GNU_stow
Regards, Zooko
Re: [Python-Dev] PEP 365 (Adding the pkg_resources module)
On Mar 20, 2008, at 6:22 PM, Robert Brewer wrote:
> Phillip J. Eby wrote:
>> The other tool that would be handy to have, would be one that unpacks
>> eggs into standard distutils-style installation.
>
> Hear, hear. I'm an author of a couple libraries that need to
> interoperate with others. Of the many eggs I've downloaded over the
> past year, I'd say 80%+ are never installed or even built--I just want to
> grep the source code, and using my preferred tools, not some lame Find
> command in a ZIP browser menu.
Um, isn't this tool called "unzip"? I have done this -- accessed the source code -- many times, and unzip suffices. I don't know what else would be required in order to make an egg into "a standard distutils-style installation". Until PJE's comment above, I thought that unzip already accomplished exactly that.
Regards, Zooko
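Since a zipped egg is an ordinary zip archive, the "unzip, then grep" workflow can be sketched with the standard library alone. The egg built here is a stand-in created on the fly; its name, contents, and the helper function names are invented for the example:

```python
import os
import tempfile
import zipfile


def unpack_egg(egg_path, dest_dir):
    """Extract a zipped egg so ordinary tools (grep, an editor) can read it."""
    with zipfile.ZipFile(egg_path) as egg:
        egg.extractall(dest_dir)
    return dest_dir


def grep_sources(root, needle):
    """Return relative paths of .py files under root that contain needle."""
    hits = []
    for folder, _dirs, files in os.walk(root):
        for filename in files:
            if filename.endswith(".py"):
                path = os.path.join(folder, filename)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    if needle in fh.read():
                        hits.append(os.path.relpath(path, root))
    return sorted(hits)


# Build a stand-in egg so the example is self-contained.
work = tempfile.mkdtemp()
egg_path = os.path.join(work, "somepkg-1.0-py2.5.egg")
with zipfile.ZipFile(egg_path, "w") as egg:
    egg.writestr("somepkg/__init__.py", "def hello():\n    return 'hi'\n")
    egg.writestr("EGG-INFO/PKG-INFO", "Name: somepkg\nVersion: 1.0\n")

unpacked = unpack_egg(egg_path, os.path.join(work, "unpacked"))
hits = grep_sources(unpacked, "def hello")
```

On the command line the same thing is "unzip somepkg-1.0-py2.5.egg -d unpacked" followed by grep.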
[Python-Dev] how to easily consume just the parts of eggs that are good for you
Folks: Here is a simple proposal: make the standard Python "import" mechanism notice eggs on the PYTHONPATH and insert them (into the *same* location) on the sys.path.
This eliminates the #1 problem with eggs -- that they don't easily work when installing them into places other than your site-packages and that if you allow any of them to be installed on your system then they take precedence over your non-egg packages even if you explicitly put those other packages earlier in your PYTHONPATH. (That latter behavior is very disagreeable to more than a few programmers.)
This also preserves most of the value of eggs for many use cases. This is backward-compatible with most current use cases that rely on eggs. This is very likely forward-compatible with new schemes that are currently being cooked up and will be deployed in the future.
Regards, Zooko
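One possible reading of the proposal, sketched as a plain function rather than as a change to the import machinery: every directory on the search path is followed immediately by the eggs found inside it, so the order given in PYTHONPATH is preserved. The function name expand_eggs and the directory and egg names are hypothetical:

```python
import os
import tempfile


def expand_eggs(path_entries):
    """Return a new search path in which each directory is followed
    immediately by the *.egg entries found inside it."""
    expanded = []
    for entry in path_entries:
        expanded.append(entry)
        if os.path.isdir(entry):
            for name in sorted(os.listdir(entry)):
                if name.endswith(".egg"):
                    expanded.append(os.path.join(entry, name))
    return expanded


# Two stand-in PYTHONPATH directories, each holding one (empty) egg file.
base = tempfile.mkdtemp()
first = os.path.join(base, "my-python-stuff")
second = os.path.join(base, "site-packages")
for directory, egg in ((first, "foo-1.0.egg"), (second, "bar-2.0.egg")):
    os.mkdir(directory)
    open(os.path.join(directory, egg), "w").close()

search_path = expand_eggs([first, second])
```

Under this reading an egg in a later directory can never shadow a package found in an earlier one, which is the precedence complaint described above.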
Re: [Python-Dev] how to easily consume just the parts of eggs that are good for you
On Mar 26, 2008, at 7:34 PM, Chris McDonough wrote:
> zooko wrote: http://mail.python.org/pipermail/python-dev/2008-March/078243.html
>> Here is a simple proposal: make the standard Python "import"
>> mechanism notice eggs on the PYTHONPATH and insert them (into the
>> *same* location) on the sys.path.
>> This eliminates the #1 problem with eggs -- that they don't
>> easily work when installing them into places other than your
>> site-packages and that if you allow any of them to be installed on
>> your system then they take precedence over your non-egg packages
>> even if you explicitly put those other packages earlier in your
>> PYTHONPATH. (That latter behavior is very disagreeable to more
>> than a few programmers.)
>
> Sorry if I'm out of the loop and there's some subtlety here that
> I'm disregarding, but it doesn't appear that either of the issues
> you mention is actually a problem with eggs. These are instead
> problems with how eggs get installed by easy_install (which uses
> a .pth file to extend sys.path). It's reasonable to put eggs on
> the PYTHONPATH manually (e.g. sys.path.append('/path/to/some.egg'))
> instead of using easy_install to install them.
Yes, you are missing something. While many programmers, such as yourself and Lennart Regebro (who posted to this thread) find the current eggs system to be perfectly convenient and to Just Work, many others, such as Glyph Lefkowitz (who posted to a related thread) find them to be so annoying that they actively ensure that no eggs are ever allowed to touch their system. The reasons for this latter problem are two:
1. You can't conveniently install eggs into a non-system directory, such as ~/my-python-stuff.
2. If you allow even a single egg to be installed into your PYTHONPATH, it will change the semantics of your PYTHONPATH.
Both of these problems are directly caused by the need for eggs to hack your site.py.
If Python automatically added eggs found in the PYTHONPATH to the sys.path, both of these problems would go away. I am skeptical that the current proposals to define a new database for installed packages will fare any better than the current eggs scheme does in this respect.
This issue is important to me, because the benefits of eggs grow superlinearly with the number of good programmers who use them. They are a tool for re-using source code -- a tool for cooperation between programmers. To gain the greatest benefits at this point we do not need to add new features to eggs, we need to make them more palatable to more good programmers.
I am skeptical that programmers are going to be willing to use a new database format. They already have a database -- their filesystem -- and they already have the tools to control it -- mv, rm, and PYTHONPATH. Many of them already hate the existence of the "easy-install.pth" database file, and I don't see why a new database file would be any different.
My proposal makes the current benefits of eggs -- clean, easy code re-use among programmers -- more compatible with their current tools -- mv, rm, and PYTHONPATH. It is also forward-compatible with more sophisticated proposals to add features like uninstall and operating system integration.
By the way, since I posted my proposal two weeks ago I have pointed a couple of Python hackers who currently refuse to use eggs at the URL: http://mail.python.org/pipermail/python-dev/2008-March/078243.html They both agreed that it made perfect sense. I told one of them about the alternate proposal to define a new database file to contain a list of installed packages, and he sighed and rolled his eyes and said "So they are planning to reinvent apt!".
Regards, Zooko
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Apr 8, 2008, at 11:27 AM, Lloyd Kvam wrote:
>
> When I wear my sysadmin hat, eggs become a nuisance. ...
> As a developer, eggs are great. ...
> Fortunately, distutils includes tools like bdist_rpm so that python
> modules can be packaged for easy processing by the system package
> manager. So once I need to switch back to a sysadmin role, I can use
> the system tools to install and track packages.
This is the same experience I have. I rely on setuptools and eggs extensively in developing our software, and I use setuptools and eggs as the primary method of giving our source code to other programmers. But no software is ever installed on our production servers unless that software is in .deb form in an apt-gettable repository, and this requirement is unlikely to change for the foreseeable future.
Regards, Zooko
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Apr 8, 2008, at 9:41 PM, Phillip J. Eby wrote:
>
> I'm curious. Have any of you actually read PEP 262 in any detail?
I read it, though not in fine detail. I didn't write that you are planning to reinvent apt. I wrote that when programmers hear about this PEP they exclaim "They are planning to reinvent apt!". That is a matter of perception and marketing -- the value that I want to gain from Python packages is the value of a critical mass of good programmers using compatible tools for code re-use. If a lot of programmers hate an idea, then it doesn't matter what the details are -- it isn't going to provide this value to me.
I think part of our disagreement is that we are talking about two overlapping use cases: programmer and sysadmin (and "end user" which is much like sysadmin). I am not, at this time, interested in the sysadmin use case. As I've mentioned, my sysadmin needs are currently well satisfied by apt (and sometimes by GNU stow), and my fellow sysadmins with whom I work are absolutely not going to relax their "apt-only policy" for the foreseeable future, so I cannot use such a tool unless it is named "apt" and written by Debian/Ubuntu.
On the other hand I am very interested in the programmer use case, because setuptools/easy_install already works pretty well for that, and we are already very close to achieving a critical mass of good programmers. Recently several more packages that my project [1] relies on have become easy_installable -- Twisted, pywin32 (thanks to you, PJE), and foolscap -- and several more have had bugfixes which make them work better with easy_install/setuptools -- Nevow and zope.interface -- and there are some patches in the queue to make another one compatible with setuptools -- pyflakes [2, 3].
So setuptools/easy_install is already (slowly) winning. I want to accelerate that process by reducing the degree to which it is incompatible, inconvenient, or objectionable to other programmers.
PEP 262 sounds like a non-starter to me because
1. It appears to be backwards-incompatible with setuptools/easy_install/eggs, thus losing all the recently gained cooperation that I mentioned in the previous paragraph, and
2. It defines a new database file, where I would prefer either:
a. Doing away with database files entirely and relying on the filesystem alone to hold that information, or
b. Continuing to use the current ".pth" database file format, possibly improved by having native support for .pth files in the Python import machinery.
3. Because of #2, it triggers programmers to exclaim "They are planning to reinvent apt!", thus making it unlikely that the new proposal will recapture the cooperation that setuptools has already (slowly) gained.
I'm sorry, PJE -- I know it must be frustrating to you to have your proposal criticized on social rather than technical grounds -- but social benefits are what I am seeking right now.
Perhaps PEP 262 and my proposal are not actually alternatives, but are complementary. I do not object to Python maintaining a database of installed packages for itself (although I cannot *rely* upon such behavior, not least because I will be maintaining backwards compatibility with Python 2.4 for at least the next several years, and with Python 2.5 for at least the next several years after that).
What I want is for the already implemented, tested, and deployed code-re-use features of setuptools/easy_install to be more widely accepted. This is best and most easily achieved by fixing the two most frequent objections to setuptools/easy_install:
1. That you can't conveniently install into an arbitrary directory, and
2. that it subverts the meaning of your PYTHONPATH.
Regards, Zooko
[1] http://allmydata.org/source/tahoe/trunk/docs/install.html
[2] http://divmod.org/trac/ticket/2535
[3] http://divmod.org/trac/ticket/2048
Re: [Python-Dev] how to easily consume just the parts of eggs that are good for you
On Apr 8, 2008, at 4:36 PM, Greg Ewing wrote:
>
> I discovered another annoyance with eggs the other day -- it
> seems that tracebacks referring to egg-resident files contain the
> pathname of some temporary directory that existed when the egg
> was being packaged, rather than the one it actually exists in
> at run time.
Brian Warner and I discovered that issue yesterday, too. We determined that if you install the egg (with easy_install or with a setuptools-powered ./setup.py install) in unzipped form then the source file names get rewritten so that your stack traces come with source lines. If you have a package which requires stack traces to come with source lines, then you could pass "zip_safe=False" to the call to setup().
I would prefer that zip_safe=False were the default and that either the producer or the consumer of a package had to specifically choose zip_safe=True in order to install eggs in zipped form. I've opened a ticket on my setuptools trac: http://allmydata.org/trac/setuptools/ticket/4
Regards, Zooko
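For reference, a sketch of where the zip_safe flag goes. zip_safe is a real setuptools keyword; the project name, version, and package list are placeholders, and the keyword arguments are built in a helper function (a made-up name) so that the snippet can be read and run without setuptools installed:

```python
def setup_kwargs():
    """Keyword arguments for setuptools.setup(); the names are placeholders."""
    return dict(
        name="somepkg",
        version="1.0",
        packages=["somepkg"],
        # Ask setuptools to install the egg as an unzipped directory, so
        # tracebacks point at real files on disk and can show source lines.
        zip_safe=False,
    )


# In a real setup.py this would be followed by:
#     from setuptools import setup
#     setup(**setup_kwargs())
kwargs = setup_kwargs()
```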
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Apr 9, 2008, at 6:00 AM, Phillip J. Eby wrote:
>>
>> By the way, if these tools work well, they are priceless!
>
> I haven't had need to use any of them, so I don't really know.
They are easydeb [1] and stdeb [2]. The former appears to be incomplete and unmaintained. The latter appears to be usable, but somewhat incomplete -- substantial manual labor is required to use it successfully, as documented by my programming partner Brian Warner in this ticket: [3].
Regards, Zooko
[1] http://easy-deb.sourceforge.net/
[2] http://stdeb.python-hosting.com/
[3] http://allmydata.org/trac/tahoe/ticket/251
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Apr 9, 2008, at 12:40 PM, Phillip J. Eby wrote:
>>
>> You are talking here about bdist_rpm and not about a tool that would take
>> a Python package distributed as an egg file and convert the egg to an rpm
>> or a deb. Unfortunately, some Python packagers are beginning to limit
>> their focus only to egg distribution. That creates a problem for users
>> who have native operating system package management.
>
> That is indeed a problem -- but it's a social one, not a technical
> one. It's trivial for the publisher of an egg to change their
> command line from "setup.py bdist_egg upload" to "setup.py sdist
> bdist_egg upload", as soon as their users (politely) request that
> they do so.
In general, it would be good if eggs came with .py files by default instead of .pyc files. I've opened a ticket on my setuptools trac about this proposal: http://allmydata.org/trac/setuptools/ticket/5 # binary eggs should come with .py files by default, rather than .pyc files
Regards, Zooko
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Apr 9, 2008, at 4:12 PM, Phillip J. Eby wrote:
>> http://allmydata.org/trac/setuptools/ticket/5 # binary eggs should
>> come with .py files by default, rather than .pyc files
>
> Filling your tracker with already-rejected proposals isn't likely
> to encourage me to look at it, especially when I've personally
> rejected them to you in IRC. That goes for your ticket #4 as well.
Part of my motivation in maintaining this tracker is to take issue discussions from IRC, and from mailing lists, and make them more permanent and structured. This part is useful even for rejected proposals, as an historical record that other people interested in those issues can consult.
I will mark those two tickets as "rejected by PJE". Could you please repeat (so that I don't misrepresent you due to my faulty memory of our IRC discussion from more than a year ago) your reason for rejecting these two:
http://allmydata.org/trac/setuptools/ticket/4 (when considering whether to zip, err on the side of safety rather than performance)
http://allmydata.org/trac/setuptools/ticket/5 (binary eggs should come with .py files by default, rather than .pyc files)
You are of course welcome to log in to that trac and update those tickets yourself, but if you prefer not to then I will paste your reasons into those tickets.
Regards, Zooko
[Python-Dev] an example of setuptools being used to good effect -- allmydata.org Tahoe
Folks: I'm sorry, but I am not caught up on the current conversation about packaging. I'm very busy with exciting Python development -- http://allmydata.com and http://allmydata.org .
I understand from PJE's message that he thinks I misunderstand some things about PEP 262; this is entirely possible. I intend to catch up on reading the emails of this conversation and to read carefully PJE's messages about PEP 262 in the coming days.
Meanwhile, here is the last message that I wrote before I got swamped with the aforementioned excitement:
On Apr 9, 2008, at 5:59 PM, Greg Ewing wrote:
> Paul Moore wrote:
>
>> I believe that Mac OS X goes for an even simpler structure -
>> applications store *everything* in the one directory, so that
>> install/uninstall is simply a directory copy/remove.
>
> Yep, and thereby cuts the whole gordian knot, throws the
> pieces on the fire and burns them. :-)
>
> Package managers have always seemed to me to be an
> excessively complex solution to a problem that needn't
> have existed in the first place.
Yes! I love the Zen of the Mac OS X packaging approach. The best install is none at all! (Of course, I also love apt.)
> I keep hoping that someday Linux will support something
> like MacOSX application bundles and frameworks, but I
> haven't seen any sign of it yet.
We're slowly approaching this level of simplicity in packaging of the *source code* of Allmydata.org "Tahoe", using setuptools. http://allmydata.org/source/tahoe/trunk/docs/install.html
The list of dependencies which are automatically resolved by setuptools is visible here: [1]. It currently includes zfec, foolscap, simplejson, pycryptopp, nevow, zope.interface, twisted, and pywin32.
This automatic resolution of dependencies works while fully preserving the user's ability to use their own packages and their own packaging tools. That is:
1. If any of these dependencies are already installed, such as in a Debian package or if the user has installed them by hand, then installing Tahoe will use the already-installed versions and not install anything new, and
2. For any of these dependencies that are not already installed, installing Tahoe will *not* write these dependencies into your standard system directory (which is potentially a sacred place where only your own packaging tool or your root account is allowed to tread) but will instead write them into a local, newly-created install directory from which you can then run Tahoe. (This part is similar in spirit to the Mac OS packaging technique.)
Also, this install process never downloads anything from the Internet at install time, per our policy [2, 3], which also happens to be a policy some other people have, e.g. [4, 5].
This works on all of our supported platforms, which include Linux, Solaris, Windows, Cygwin, and Mac OS X. Oh yes, we also have our buildbot [6] automatically produce Debian packages for edgy, feisty, etch, and gutsy.
As far as I know, all of this is accomplished without breaking any of the use cases traditionally associated with setuptools / easy_install, for example "easy_install allmydata-tahoe" works, and if you want "setup.py install" to install into your standard system directory, it will.
The reason that I am posting this is to let other programmers know that setuptools is actually a pretty useful tool, even if the use cases that you want to support are incompatible with certain easy_install traditions such as fetching new packages from the internet at buildtime or installing into your system directory.
Regards, Zooko
P.S. Two days ago I was able to remove twisted from the list of "Manual Dependencies" that people have to be aware of in order to try out Allmydata Tahoe from the source tarball. I think I can safely remove pyOpenSSL now, but that remains to be properly tested by our buildbot.
I will be able to remove Crypto++ soon, due to the pycryptopp [7] library. If I can figure out a hack to work around one of the major frustrations of setuptools (that you can't simply run "./setup.py install --prefix=$FOO"), and if I finish my setuptools plugin to run Twisted trial instead of pyunit when you run "./setup.py test", then I'll be able to remove GNU make from the dependencies. That will leave only g++, Python, and OpenSSL as packages that a programmer has to manually deal with in order to try out Allmydata Tahoe from source.

[1] http://allmydata.org/trac/tahoe/browser/_auto_deps.py
[2] http://allmydata.org/trac/tahoe/wiki/Packaging
[3] http://allmydata.org/trac/tahoe/ticket/229
[4] http://bytes.com/forum/thread666455.html
[5] http://fedoraproject.org/wiki/Packaging/Python/Eggs
[6] http://allmydata.org/buildbot/waterfall?show_events=false
[7] http://allmydata.org/trac/pycr
[Python-Dev] shall we redefine "module" and "package"?
Folks: Here's an experiment you can perform. Round up a Python programmer and ask him the following three questions: Q1. You type "import foo" and it works. What kind of thing is foo? Q2. You go to the Python package index and download something named "bar-1.0.0.tar.gz". What kind of thing is bar? Q3. What is a "distribution"? I'm willing to bet that you will get the following answers: A1. foo is a module. A2. bar is a package. A3. A distribution is a version of Linux that comes with a lot of Free Software. Unfortunately these answers aren't quite right. A "package" is actually a directory containing an __init__.py file, and a distribution is actually what you think of when you say "package" -- a reusable package of Python code that you can, for example, get from the Python package index. Educational efforts such as the Python tutorial and the distutils docs have not succeeded in training Python programmers to understand the terminology for these things as used by the Python implementors, so perhaps instead the implementors should start using the terminology understood by the programmers: 1. A "module" shall henceforth be the name for either a foo.py file (a single-file module), or a directory with an __init__.py in it (a directory module). 2. A "package" shall henceforth be the name of the thing that is currently called a "distribution". Regards, Zooko who doesn't mind stirring up trouble on occasion... ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] shall we redefine "module" and "package"?
On Apr 30, 2008, at 5:11 PM, [EMAIL PROTECTED] wrote:

> I have a less disruptive counterproposal. How about just starting to refer to directories (or "folders", or zip entries) with '__init__.py' in them as "package modules"? A package is-a module anyway.

That's a good idea.

> I believe a multi-word term here would be similarly more memorable and precise. A "package distribution" would include the more familiar term while still being specific, consistent with the old terminology, and correct. Using a qualifying word is probably a good idea in this context anyway. I usually say "debian package", "RPM", "MSI", or "tarball" unless I'm specifically talking about "packages for your platform",

That's a good one too.

> almost always in the phrase, "please do not use distutils to do a system install of Twisted, use the specific package for your platform".

This is a tangent, but why do you give that advice? I typically give people the opposite advice on how to install Twisted.

> I do, however, agree with Steve emphatically on your original proposal. Changing the terminology now will make billions upon billions of Python web pages, modules (cf. twisted.python.modules.PythonModule.isPackage()), documents, and searchable message archives obsolete, not to mention that 90% of the community will probably ignore you and use the old terminology anyway, creating more confusion than it eliminates.

I suspect 90% of the community already uses my proposed terminology -- that was my original challenge to round up a Python programmer and find out. But I agree that my proposal would contribute to confusion and disruption, and I like your counterproposals better, at least for now. Directories, folders, or zip entries with __init__.py in them are "package modules", and Python packages are "package distributions".
Regards, Zooko
Re: [Python-Dev] Proposal: add odict to collections
On Jun 14, 2008, at 8:26 PM, Guido van Rossum wrote:

> No, but an ordered dict happens to be a *very* common thing to need, for a variety of reasons. So I'm +0.5 on adding this to the collections module. However someone needs to contribute working code.

Here's an LRU cache that I've used a few times over the years:

http://allmydata.org/trac/pyutil/browser/pyutil/pyutil/cache.py

This is just like a dict ordered by insertion, except:

1. That it removes the oldest entry if it grows beyond a limit.

2. That it moves an entry to the head of the queue when has_key() is called on that item.

So, it would be easy to change those two behaviors in order to use this implementation. There are actually three implementations in that file: one that is asymptotically O(1) for all operations (using a double-linked list woven into the values of the dict), and one that uses a Python list to hold the order, so it is faster for small enough dicts. The third implementation is an implementation that someone else wrote that I included just for comparison purposes -- the comparison showed that each of mine was better.

Regards, Zooko
Re: [Python-Dev] Proposal: add odict to collections
On Jun 15, 2008, at 12:20 PM, zooko wrote:

> So, it would be easy to change those two behaviors in order to use this implementation. There are actually three implementations in that file: one that is asymptotically O(1) for all operations (using a double-linked list woven into the values of the dict), and one that uses a Python list to hold the order, so it is faster for small enough dicts.

P.S. I didn't mean to fall for the common misunderstanding that hash table operations are O(1). What I should have written is that my ordered dict technique *adds* only O(1) time to the time of the dict on which it is built.

As to the question of how important or common this data structure is, I have to admit that while I implemented this one and used it several times (always exclusively for LRU caching), I currently don't use it for anything. Nowadays I try to avoid doing transparent caching (such as LRU caches are often used for) in favor of explicit management of the resource.

Regards, Zooko
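The behaviour described in this thread (a dict ordered by insertion that drops its oldest entry past a size limit and treats a lookup as a fresh use) can be sketched in a few lines. This is only an illustration built on collections.OrderedDict, which entered the standard library after this discussion; it is not the code in pyutil's cache.py, and the class name LRUCache and the maxsize parameter are made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical sketch: an insertion-ordered dict with LRU eviction."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def __contains__(self, key):
        # A membership test counts as a "use", like has_key() in the thread.
        if key in self._data:
            self._data.move_to_end(key)
            return True
        return False

    def __getitem__(self, key):
        value = self._data[key]  # raises KeyError if the key is absent
        self._data.move_to_end(key)
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict least recently used entries once the limit is exceeded.
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)

    def keys(self):
        # Oldest entry (the next one to be evicted) comes first.
        return list(self._data)
```

Removing the move_to_end() calls from the two lookup methods and the eviction loop from __setitem__ would leave a plain insertion-ordered dict, which is roughly the change the message says would be easy to make.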
Re: [Python-Dev] Base-85
On Aug 2, 2008, at 13:58 PM, Antoine Pitrou wrote:

> Martin v. Löwis v.loewis.de> writes:
>
>> P.S. Just in case it isn't clear: I would oppose any specific proposal to add this Ascii85 algorithm to the standard library. It would sound like we don't have any real problems to solve.
>
> According to Wikipedia, "its main modern use is in Adobe's PostScript and Portable Document Format file formats". ... git ... mercurial ... bzr

It's sort of too bad about the April Fool's RFC, because now people tend to think that an encoding with a non-power-of-2 base is just a joke. I had to overcome that when working with my programming partner, but he eventually decided that base-62 was indeed a useful encoding for our purposes. :-)

I've written a few ascii encoders over the years, mostly in Python, plus an optimized C version of base-32 (with a real live Duff's Device):

base62.py: http://allmydata.org/source/z-base-62/trunk-hashedformat/z-base-62/base62/base62.py
base36.py: http://allmydata.org/source/z-base-36/trunk-hashedformat/z-base-36/base36/base36.py
base32.py: http://allmydata.org/source/z-base-32/trunk-hashedformat/base32/base32/base32.py
base32.c: http://allmydata.org/source/z-base-32/trunk-hashedformat/base32/base32.c

Regards, Zooko
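To make the idea of a non-power-of-2 base concrete, here is a minimal sketch of a base-62 encoder for byte strings. It is not the algorithm used in the base62.py linked above; the alphabet, the function names, and the choice to treat the input as one big-endian integer with a fixed output width are all assumptions made for this illustration.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def encoded_length(num_bytes):
    """Smallest digit count that can represent every value of num_bytes bytes."""
    limit = 256 ** num_bytes
    capacity, digits = 1, 0
    while capacity < limit:
        capacity *= BASE
        digits += 1
    return digits


def b62encode(data):
    """Encode bytes as fixed-width base-62 text (leading zero bytes survive)."""
    n = int.from_bytes(data, "big")
    digits = []
    for _ in range(encoded_length(len(data))):
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def b62decode(text, num_bytes):
    """Inverse of b62encode; the caller supplies the original byte length."""
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    return n.to_bytes(num_bytes, "big")
```

Because 62 is not a power of two, the digits do not line up with bit boundaries, which is why the sketch goes through big-integer arithmetic instead of shifting fixed groups of bits as a base-32 or base-64 encoder can.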
Re: [Python-Dev] bsddb
On Sep 7, 2008, at 12:04 PM, Gregory P. Smith wrote:

> FWIW, many years ago in the past when I asked sleepycat about this (long before oracle bought them) they said that python was considered to be the application. Using berkeleydb via python for a commercial application did not require a berkeleydb license.

They also posted a FAQ on their web site which included that statement, including specifically declaring that using BerkeleyDB via Python for a commercial product did not require a commercial licence. Oh, look, it is still there:

http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html

"""
Q. Do I have to pay for a Berkeley DB license to use it in my Perl or Python scripts?

A. No, you may use the Berkeley DB open source license at no cost. The Berkeley DB open source license requires that software that uses Berkeley DB be freely redistributable. In the case of Perl or Python, that software is Perl or Python, and not your scripts. Any scripts you write are your property, including scripts that make use of Berkeley DB. None of the Perl, Python or Berkeley DB licenses place any restrictions on what you may do with them.
"""

Regards, Zooko
---
http://allmydata.org -- Tahoe, the Least-Authority Filesystem
http://allmydata.com -- back up all your files for $5/month
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Thanks for writing this PEP 383, MvL. I recently ran into this problem in Python 2.x in the Tahoe project [1]. The Tahoe project should be considered a good use case showing what some people need. For example, the assumption that a file will later be written back into the same local filesystem (and thus luckily use the same encoding) from which it originally came doesn't hold for us, because Tahoe is used for file-sharing as well as for backup-and-restore. One of my first conclusions in pursuing this issue is that we can never use the Python 2.x unicode APIs on Linux, just as we can never use the Python 2.x str APIs on Windows [2]. (You mentioned this ugliness in your PEP.) My next conclusion was that the Linux way of doing encoding of filenames really sucks compared to, for example, the Mac OS X way. I'm heartened to see what David Wheeler is trying to persuade the maintainers of Linux filesystems to improve some of this: [3]. My final conclusion was that we needed to have two kinds of workaround for the Linux suckage: first, if decoding using the suggested filesystem encoding fails, then we fall back to mojibake [4] by decoding with iso-8859-1 (or else with windows-1252 -- I'm not sure if it matters and I haven't yet understood if utf-8b offers another alternative for this case). Second, if decoding succeeds using the suggested filesystem encoding on Linux, then write down the encoding that we used and include that with the filename. This expands the size of our filenames significantly, but it is the only way to allow some future programmer to undo the damage of a falsely- successful decoding. Here's our whole plan: [5]. 
Regards, Zooko

[1] http://allmydata.org
[2] http://allmydata.org/pipermail/tahoe-dev/2009-March/001379.html # see the footnote of this message
[3] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
[4] http://en.wikipedia.org/wiki/Mojibake
[5] http://allmydata.org/trac/tahoe/ticket/534#comment:47
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:

> Are you proposing to unconditionally encode file names as iso8859-15, or to do so only when undecodeable bytes are encountered?

For what it is worth, what we have previously planned to do for the Tahoe project is the second of these -- decode using some 1-byte encoding such as iso-8859-1, iso-8859-15, or windows-1252 only in the case that attempting to decode the bytes using the local alleged encoding failed.

> If you switch to iso8859-15 only in the presence of undecodable UTF-8, then you have the same round-trip problem as the PEP: both b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a way to unambiguously recover the original file name.

Why do you say that? It seems to work as I expected here:

    >>> '\xff'.decode('iso-8859-15')
    u'\xff'
    >>> '\xc3\xbf'.decode('iso-8859-15')
    u'\xc3\xbf'
    >>> '\xff'.decode('cp1252')
    u'\xff'
    >>> '\xc3\xbf'.decode('cp1252')
    u'\xc3\xbf'

Regards, Zooko
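The session in this message decodes both byte strings with the same one-byte codec, so the two results differ. The collision in the quoted objection only appears when the one-byte codec is used as a fallback after a strict decode has failed. The following Python 3 sketch tries that; the helper name decode_with_fallback is hypothetical, and utf-8 with a latin-1 fallback is just one possible combination (iso-8859-15 and cp1252 also map byte 0xFF to U+00FF).

```python
def decode_with_fallback(raw, alleged_encoding="utf-8", fallback="latin-1"):
    """Hypothetical helper: strict decode first, one-byte codec on failure."""
    try:
        return raw.decode(alleged_encoding, "strict")
    except UnicodeDecodeError:
        return raw.decode(fallback, "strict")


valid = b"\xc3\xbf"   # well-formed UTF-8 for U+00FF
invalid = b"\xff"     # not valid UTF-8, so the fallback codec is used

# Both names come out as the same one-character string.
same = decode_with_fallback(valid) == decode_with_fallback(invalid)
```

Under these assumptions the two distinct byte strings produce the same unicode name, so the original bytes cannot be recovered from the name alone.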
Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)
On Apr 28, 2009, at 13:01 PM, Thomas Breuel wrote: (2) Should the default UTF-8 encoder for file system operations be allowed to generate illegal byte sequences? I think that's a definite no; if I set the encoding for a device to UTF-8, I never want Python to try to write illegal UTF-8 strings to my device. ... If people really want the option of (3c), then I think encoders related to the file system should by default reject those strings as illegal because the potential problems from writing them are just too serious. Printing routines and UI routines could display them without error (but some clear indication), of course. For what it is worth, sometimes we have to write bytes to a POSIX filesystem even though those bytes are not the encoding of any string in the filesystem's "alleged encoding". The reason is that it is common for there to be filenames which are not the encodings of anything in the filesystem's alleged encoding, and the user expects my tool (Tahoe-LAFS [1]) to copy that name to a distributed storage grid and then copy it back unchanged. Even though, I re-iterate, that name is *not* a valid encoding of anything in the current encoding. This doesn't argue that this behavior has to be the *default* behavior, but it is sometimes necessary. It's too bad that POSIX is so far behind Mac OS X in this respect. (Also so far behind Windows, but I use Mac as the example to show how it is possible to build a better system on top of POSIX.) Hopefully David Wheeler's proposals to tighten the requirements in Linux filesystems will catch on: [2]. Regards, Zooko [1] http://allmydata.org [2] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 383 and GUI libraries
', 'python-replace')" with ".decode('windows-1252')" and it works just as well. While UTF-8b seems like a really cool hack, and it would produce more legible results if utf-8-encoded strings were partially corrupted, I guess I should just use 'windows-1252' which is already implemented in Python 2 (as well as in all other software in the world).

I guess this means that PEP 383, which I have approved of and liked so far in this discussion, would actually not help Tahoe at all and would in fact harm Tahoe -- I would have to remember to detect and work-around the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python 3.

If anyone else has a concrete, real use case which would be helped by PEP 383, I would like to hear about it. Perhaps Tahoe can learn something from it.

Oh, if this PEP could be extended to add a flag to each unicode object indicating whether it was created with the python-escape handler or not, then it would be useful to me.

Regards, Zooko

[1] http://mail.python.org/pipermail/python-dev/2009-April/089020.html
[2] http://allmydata.org/trac/tahoe/attachment/ticket/534/fsencode.3.py
Re: [Python-Dev] PEP 383 and GUI libraries
Following-up to my own post to correct a major error: On Thu, Apr 30, 2009 at 11:44 PM, Zooko O'Whielacronx wrote: > Folks: > > My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary > binary names from the filesystem and store them so that I can regenerate > the same byte string later, but it also requires that I *know* whether > what I got was a valid string in the expected encoding (which might be > utf-8) or whether it was not and I need to fall back to storing the > bytes. Okay, I am wrong about this. Having a flag to remember whether I had to fall back to the utf-8b trick is one method to implement my requirement, but my actual requirement is this: Requirement: either the unicode string or the bytes are faithfully transmitted from one system to another. That is: if you read a filename from the filesystem, and transmit that filename to another system and use it, then there are two cases: Requirement 1: the byte string was valid in the encoding of source system, in which case the unicode name is faithfully transmitted (i.e. the bytes that finally land on the target system are the result of sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding). Requirement 2: the byte string was not valid in the encoding of source system, in which case the bytes are faithfully transmitted (i.e. the bytes that finally land on the target system are the same as the bytes that originated in the source system). Now I finally understand how fiendishly clever MvL's PEP 383 generalization of Markus Kuhn's utf-8b trick is! The only thing necessary to achieve both of those requirements above is that the 'python-escape' error handler is used on the target system .encode() as well as on the source system .decode()! Well, I'm going to have to let this sink in and maybe write some code to see if I really understand it. But if this is right, then I can do away with some of the mechanism that I've built up, and instead: Backport PEP 383 to Python 2. 
And, document the PEP 383 trick in some generic, widely respected format such as an Internet Draft so that I can explain to other users of the Tahoe data (many of whom use other languages than Python) what they have to do if they find invalid utf-8 in the data. Oh good, I just realized that Tahoe emits only utf-8, so all I have to do is point them to the utf-8b documents (such as they are) and explain that to read filenames produced by Tahoe they have to implement utf-8b. That's really good that they don't have to implement MvL's generalization of that trick to other encodings, since utf-8b is already understood by some folks. Okay, I find it surprisingly easy to make subtle errors in this encoding stuff, so please let me know if you spot one. Is it true that srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', 'python-escape') will always produce srcbytes ? That is my Requirement 2. Regards, Zooko ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
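The round trip asked about here can be tried in Python 3, where the PEP 383 handler was eventually released under the name 'surrogateescape' (the thread calls it 'python-escape' or utf-8b). This is a small sketch, not Tahoe code: the helper names and sample byte strings are invented, and it applies decode() to the bytes first and encode() to the resulting text, since bytes objects have no encode() method.

```python
def to_text(raw, encoding="utf-8"):
    # Undecodable bytes become lone surrogates U+DC80..U+DCFF.
    return raw.decode(encoding, "surrogateescape")


def to_bytes(text, encoding="utf-8"):
    # Lone surrogates produced by to_text() turn back into the original bytes.
    return text.encode(encoding, "surrogateescape")


samples = [
    b"r\xc3\xa9sum\xc3\xa9.html",  # valid UTF-8
    b"\xff",                       # never valid in UTF-8
    b"caf\xe9",                    # latin-1 bytes, invalid as UTF-8
    b"\xc3\xbf\xff",               # valid sequence followed by garbage
]
round_trips = [to_bytes(to_text(s)) == s for s in samples]
```

The round trip holds when the same encoding is used in both directions. The escaped strings contain lone surrogates, so a strict encoder rejects them, and this sketch says nothing about the case where the source and target systems use different encodings.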
Re: [Python-Dev] PEP 383 and GUI libraries
Folks: Being new to the use of gmail, I accidentally sent the following only to MvL and not to the list. He promptly replied with a helpful counterexample showing that my design can suffer collisions. :-) Regards, Zooko On Fri, May 1, 2009 at 10:38 AM, "Martin v. Löwis" wrote: >> >> Requirement: either the unicode string or the bytes are faithfully >> transmitted from one system to another. > > I don't understand this requirement very well, in particular not > the "faithfully" part. > >> That is: if you read a filename from the filesystem, and transmit that >> filename to another system and use it, then there are two cases: > > What do you mean by "use it"? Things like opening files? How does > that work? In general, a file name valid on one system is invalid > on a different system - or, at least, refers to a different file > over there. This is independent of encodings. Tahoe is a backup and filesharing program, so you might for example, execute "tahoe cp -r Motörhead tahoe:" to copy all the contents of your "Motörhead" directory to your Tahoe filesystem. Later you or a friend, might execute "tahoe cp -r tahoe:Motörhead ." to copy everything from that directory within your Tahoe filesystem to your local filesystem. So in this case the flow of information is local_system_1 -> Tahoe -> local_system_2. The Requirement 1 is that for each filename encountered which is a valid encoding in local_system_1, then the resulting (unicode) name is transmitted through the Tahoe filesystem and then written out into local_system_2 in the expected way (i.e. just by using the Python unicode APIs and passing the unicode object to them). Requirement 2 is that for each filename encountered which is not a valid encoding in local_system_1, then the original bytes are transmitted through the Tahoe filesystem and then, if the target system is a byte-oriented system such as Linux, the original bytes are written into the target filesystem. (If the target is not Linux then mojibake! 
but we don't have to go into that now.) Does that make sense? > In all your descriptions, I'm puzzled as to where exactly you get > the source bytes from. If you use the PEP 383 interfaces, you will > start with character strings, not byte strings, always. On Mac and Windows, we use the Python unicode APIs e.g. os.listdir(u"Motörhead"). On Linux and Solaris, we use the Python bytestring APIs e.g. os.listdir("Motörhead".encode(sys.getfilesystemencoding())). >> Okay, I find it surprisingly easy to make subtle errors in this encoding >> stuff, so please let me know if you spot one. Is it true that >> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', >> 'python-escape') will always produce srcbytes ? > > I think you mixed up bytes and unicode here: if srcbytes is indeed > a bytes object, then you can't apply .encode to it. Yep, I reversed the order of encode() and decode(). However, my whole statement was utterly wrong and shows that I still didn't fully get it yet. I have flip-flopped again and currently think that PEP 383 is useless for this use case and that my original plan [1] is still the way to go. Please let me know if you spot a flaw in my plan or a ridiculousity in my requirements, or if you see a way that PEP 383 can help me. Thank you very much. Regards, Zooko [1] http://allmydata.org/trac/tahoe/ticket/534#comment:47 ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 383 and GUI libraries
file. Therefore these three requirements imply that we have to detect such collisions and deal with them somehow. (Thanks to Martin v. Löwis for reminding me of this.) Possible Requirement 4 (faithful bytes if not unicode, a.k.a. "round-tripping"): Suppose you have a directory with some files with Japanese names, encoded using shift-jis, and some files with Russian names, encoded using koi8-r. Suppose your locale is set to shift-jis, and then you do "tahoe cp -r myfiles/ tahoe:". Then suppose you or someone else does "tahoe cp -r tahoe: copy_of_myfiles/". The "round-tripping" feature is that the files with Russian names that did not accidentally decode cleanly with shift-jis still have the same bytes in their names as they did in the original myfiles directory. As I write this, I am becoming skeptical of this (faithful bytes if not unicode, a.k.a. "round-tripping"), thanks in part to criticism from James Knight, MvL, Thomas Breuel, and others. One reason to be skeptical is that about a third of the Russian files will happen to decode cleanly as shift-jis anyway, and will therefore come out as something entirely different if the target filesystem's encoding is something other than shift-jis. But an even worse problem -- the show-stopper for me -- is that I don't want what Tahoe shows when you do "tahoe ls" or view it in a web browser to differ from what it writes out when you do "tahoe cp -r tahoe: newfiles/". So I'm ready to reject this one. Now about the "metadata" part which is separate from the filename itself. I have another requirement: Requirement 5 (no loss of information): I don't want Tahoe to destroy information -- every transformation should be (in principle) reversible by some future computer-augmented archaeologist. 
For example, if a bytestring decodes cleanly with the locale's suggested encoding, and we use the resulting unicode as the filename, then we also store the original byte string in the metadata since we don't know if the locale's suggested encoding was good. This allows the later invention of a tool which shows the user what the filename would have been with other encodings and let the user choose one that makes sense. It is important to note that this does not impose any requirement on the *filename* itself -- all such information can be stored in the metadata. Okay, in light of the above four requirements and the rejection of #4, I hereby propose to change from the previous Tahoe design [2] to the following: To copy an entry from a local filesystem into Tahoe: 1. On Windows or Mac read the filename with the unicode APIs. Normalize the string with filename = unicodedata.normalize('NFC', filename). Leave the "original_bytes" key and the "failed_decode" flag out of the metadata. 2. On Linux or Solaris read the filename with the string APIs, and store the result in the "original_bytes" part of the metadata. Call sys.getfilesystemencoding() to get an alleged_encoding. Then, call bytes.decode(alleged_encoding, 'strict') to try to get a unicode object. 2.a. If this decoding succeeds then normalize the unicode filename with filename = unicodedata.normalize('NFC', filename), store the resulting filename and leave the "failed_decode" flag out of the metadata. 2.b. If this decoding fails, then we decode it again with bytes.decode('latin-1', 'strict'). Do not normalize it. Store the resulting unicode object into the "filename" part, set the "failed_decode" flag to True. This is mojibake! 3. (handling collisions) In either case 2.a or 2.b the resulting unicode string may already be present in the directory. If so, check the failed_decode flags on the current entry and the new entry. 
If they are both set or both unset then the new entry overwrites the old entry -- they had the same name. If the failed_decode flags differ then this is a case of collision -- the old entry and the new entry had (as far as we are concerned) different names that accidentally generated the same unicode. Alter the new entry's name, for example by appending "~1" and then trying again and incrementing the number until it doesn't match any extant entry.

To copy an entry from Tahoe into a local filesystem: Always use the Python unicode API. The original_bytes field and the failed_decode field in the metadata are not consulted.

Now a question for python-dev people: could utf-8b or PEP 383 be useful for requirements like the four requirements listed above? If not, what requirements does PEP 383 help with? I'm sure that it can help with the use case of "I'm doing os.listdir() and then I'm going to turn around and use the resulting unicode objects on the same local filesystem in the same Python process". I'm not sure that it can help if
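Steps 2, 2.a, 2.b and 3 of the plan for copying an entry into Tahoe can be sketched as follows. This is an illustration in Python 3 (bytes and str), not Tahoe's implementation: the function names are invented, the directory is modelled as a plain dict mapping filename to metadata, step 1 (Windows and Mac) is left out, and the collision loop is only one reading of "trying again".

```python
import unicodedata


def decode_name(raw, alleged_encoding):
    """Steps 2, 2.a and 2.b: return (filename, metadata) for a raw byte name."""
    metadata = {"original_bytes": raw}
    try:
        name = raw.decode(alleged_encoding, "strict")
    except UnicodeDecodeError:
        # 2.b: fall back to latin-1 (mojibake) and do not normalize.
        name = raw.decode("latin-1", "strict")
        metadata["failed_decode"] = True
    else:
        # 2.a: the decode succeeded, so normalize to NFC.
        name = unicodedata.normalize("NFC", name)
    return name, metadata


def add_entry(directory, raw, alleged_encoding):
    """Step 3: store the entry in `directory`, renaming on a collision."""
    name, metadata = decode_name(raw, alleged_encoding)
    new_flag = metadata.get("failed_decode", False)
    candidate, counter = name, 0
    while candidate in directory:
        old_flag = directory[candidate].get("failed_decode", False)
        if old_flag == new_flag:
            break  # same name as far as we can tell: overwrite
        counter += 1
        candidate = "%s~%d" % (name, counter)
    directory[candidate] = metadata
    return candidate
```

For example, adding b"\xc3\xbf" and then b"\xff" with an alleged encoding of utf-8 yields two entries, the second one renamed with a "~1" suffix because its failed_decode flag differs from that of the existing entry.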
Re: [Python-Dev] PEP 383 and Tahoe [was: GUI libraries]
Thank you for sharing your extensive knowledge of these issues, SJT.

On Sun, May 3, 2009 at 3:32 AM, Stephen J. Turnbull wrote:
> Zooko O'Whielacronx writes:
>
> > However, it is moot because Tahoe is not a new system. It is
> > currently at v1.4.1, has a strong policy of backwards-compatibility,
> > and already has lots of data, lots of users, and
> > programmers building on top of it.
>
> Cool!

Thanks! Actually yes it is extremely cool that it really does this encryption, erasure-encoding, capability-based access control, and decentralized topology all in a fully functional, stable system. If you're interested in such stuff then you should definitely check it out!

> Question: is there a way to negotiate versions, or better yet,
> features?

For the peer-to-peer protocol there is, but the persistent storage is an inherently one-way communication. A Tahoe client writes down information, and at a later point a Tahoe client, possibly of a different version, reads it. There is no way for the original writer to ask what versions or features the readers may eventually have.

But, the writer can write down optional information which will be invisible to readers that don't know to look for it, by adding it into the "metadata" dictionary.
For example:

http://testgrid.allmydata.org:3567/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/?t=json

renders the directory contents into json and results in this:

    "r\u00e9sum\u00e9.html": [
      "filenode",
      {
        "mutable": false,
        "verify_uri": "URI:CHK-Verifier:63y4b5bziddi73jc6cmyngyqdq:5p7cxw7ofacblmctmjtgmhi6jq7g5wf77tx6befn2rjsfpedzkia:3:10:8328",
        "metadata": {
          "ctime": 1241365319.0695441,
          "mtime": 1241365319.0695441
        },
        "ro_uri": "URI:CHK:no2l46woyeri6xmhcrhhomgr5a:5p7cxw7ofacblmctmjtgmhi6jq7g5wf77tx6befn2rjsfpedzkia:3:10:8328",
        "size": 8328
      }
    ],

A new version of Tahoe writing entries like this is constrained to making the primary key (the filename) be a valid unicode string (if it wants older Tahoe clients to be able to read the directory at all). However, it is not constrained about what new keys it may add to the "metadata" dict, which is where we propose to add the "failed_decode" flag and the "original_bytes".

> Well, it's a high-dimensional problem. Keeping track of all the
> variables is hard.

Well put.

> That's why something like PEP 383 can be important
> to you even though it's only a partial solution; it eliminates one
> variable.

Would that it were so! The possibility that PEP 383 could help me or others like me is why I am trying so hard to explain what kind of help I need. :-)

> > Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux
> > system and then you inspect the files in the Tahoe filesystem,
> > such as by examining the web interface [1] or by running
> > "tahoe ls", either of which you could do either from the same
> > machine where you ran "tahoe cp" or from a different machine
> > (which could be using any operating system). We have the
> > following requirements about what ends up in your Tahoe directory
> > after that cp -r.
>
> Whoa! Slow down! Where's "my" "Tahoe directory"? Do you mean the
> directory listing? A copy to whatever system I'm on?
> The bytes that
> the Tahoe host has just loaded into a network card buffer to tell me
> about it? The bytes on disk at the Tahoe host? You'll find it a lot
> easier to explain things if you adopt a precise, consistent
> terminology.

Okay here's some more detail. There exists a Tahoe directory, the bytes of which are encrypted, erasure-coded, and spread out over multiple Tahoe servers. (To the servers it is utterly opaque, since it is encrypted with a symmetric encryption key that they don't have.) A Tahoe client has the decryption key and it recovers the cleartext bytes. (Note: the internal storage format is not the json encoding shown above -- it is a custom format -- the json format above is what is produced to be exported through the API, and it serves as a useful example for e-mail discussions.) Then for each bytestring childname in the directory it decodes it with utf-8 to get the unicode childname.

Does that all make sense?

> > Requirement 1 (unicode): Each filename that you see needs to be valid
> > unicode
>
> What does "see" mean? In directory listings?

Yes, either with "tahoe ls", with a FUSE plugin, or with the web UI. Remove the trailing "?t=json" from the URL above to see an example.

> Under
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
On Tue, May 5, 2009 at 8:57 AM, Stephen J. Turnbull wrote: > > 2. The specification should state, and the discussion emphasize, that > strings which were produced by surrogate replacement *must not* be > used in data interchange with systems that do not specifically > accept such strings, and that this is the responsibility of the > application.[2] That sounds like a useful statement to make. How would an application make sure that it was producing only valid unicode? How about adding an option to os.listdir() named "errors" with default value 'utf8b' (or 'surrogate-replace', or whatever the name is)? Then applications which need to produce only valid unicode strings could pass errors=strict, errors=ignore, or errors=replace. (If anyone really wants behavior like Python 3.0 then we could perhaps also add a new one just for os.listdir() named errors=skipfilename.) My most recent plan for Tahoe, as of the letter that I sent last night, is to emulate the behavior of Nautilus and GNU ls by using the 'replace' error handler and (emulating Nautilus) to append " (invalid encoding)" to the end of the string. (screenshot: http://zooko.com/Nautilus_vs_invalid_encoding.png ) So if I could ask os.listdir to return filenames with U+FFFD in place of undecodable characters, then I could subsequently do something like:

    for f in os.listdir(d, errors='replace'):
        if u"\ufffd" in f:
            f += " (invalid encoding)"

(On top of that I would have to check for collisions, but that's out of scope.) Regards, Zooko
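The "errors" argument to os.listdir() discussed above is a proposal, not an existing parameter. A similar effect can be sketched with calls that do exist: passing a bytes path to os.listdir() returns bytes names, which the application can then decode with whatever error handler it likes. The function names label_undecodable and listdir_labeled are made up for this example, and UTF-8 as the expected encoding is an assumption.

```python
import os


def label_undecodable(raw_names, encoding="utf-8"):
    """Decode bytes filenames, marking the ones that needed U+FFFD."""
    labeled = []
    for raw in raw_names:
        name = raw.decode(encoding, errors="replace")
        if "\ufffd" in name:
            # Emulates the Nautilus-style suffix described above.
            name += " (invalid encoding)"
        labeled.append(name)
    return labeled


def listdir_labeled(path="."):
    """List a directory, returning only valid unicode names."""
    # A bytes path makes os.listdir return bytes names.
    return label_undecodable(os.listdir(os.fsencode(path)))
```

Note that a filename which legitimately contains U+FFFD would also get the suffix, and collisions between labeled names are still not handled here.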
Re: [Python-Dev] .pth files are evil
.pth files are why I can't easily use GNU stow with easy_install. If installing a Python package involved writing new files into the filesystem, but did not require reading, updating, and re-writing any extant files such as .pth files, then GNU stow would Just Work with easy_install the way it Just Works with most things. Regards, Zooko
Re: [Python-Dev] how GNU stow is complementary rather than alternative to distutils
Following up on my own post to mention one very important reason why anyone cares: On Sun, May 10, 2009 at 12:04 PM, Zooko Wilcox-O'Hearn wrote: > It is a beautiful, elegant hack because it is sooo dumb. It is also very > nice to use the same tool to manage packages written in any programming > language, provided only that they can build a directory tree of the right > shape and content. And, you are not relying on the author of the package that you are installing to avoid accidentally or maliciously screwing up your system. You're not even relying on the authors of the *build system* (e.g. the authors of distutils or easy_install). You are relying *only* on GNU stow to avoid accidentally or maliciously screwing up your system, and GNU stow is very dumb, so it is easy to understand what it is going to do and why that isn't going to irreversibly screw up your system. That is: you don't run the "build yourself and install into $prefix" step as root. This is an important consideration for a lot of people, who absolutely refuse on principle to ever run "sudo python ./setup.py" on a system that they care about unless they wrote the "setup.py" script themselves. (Likewise they refuse to run "sudo make install" on packages written in C.) Regards, Zooko
[Python-Dev] please consider changing --enable-unicode default to ucs4
Dear Pythonistas: This issue causes serious problems. Users occasionally get binaries built for a compatible Linux and Python version but with a different UCS2-vs-UCS4 setting, and those users get mysterious memory corruption errors which are hard to diagnose. It is possible that these situations also open up security vulnerabilities. A couple such instances are documented on http://bugs.python.org/setuptools/issue78, but you can find more by googling. I would like to get this problem fixed! In order to help address this issue I sampled what UCS size is used by python executables in the wild. I instrumented a few buildslaves that are contributed by various people to the Tahoe-LAFS project to print out their platform, python version, and sys.maxunicode. The full results are appended below. maxunicode: 1114111 means that python executable was configured with --enable-unicode=ucs4, and maxunicode: 65535 means that python executable was configured with --enable-unicode=ucs2 or just with --enable-unicode . The only incompatibilities that I found are because some packagers have deliberately set UCS4 configuration and other packagers have left the default setting. In the three cases where someone configured python with UCS2, one of the three is certainly an accident (a custom-built python executable on an Ubuntu server) and the other two just use the default instead of specifically configuring ucs2 in their configurations of Python and I suspect that they don't know the difference and that it was an accident that they built a Python incompatible with other distributions of their operating system. In sum, while it would be good to add the unicode setting to the platform's ABI (as discussed in setuptools ticket #78), it would also be good to make the default value be UCS4 instead of UCS2. 
This would fix all three of the potential incompatibilities that I found (listed below), and once we have proper inclusion of the unicode setting in the ABI in order to prevent the memory corruption, defaulting to UCS4 would increase the likelihood that a binary built on one distribution would be usable on another. I'm sure that someone can come up with a reason why UCS2 is better than UCS4, but I'm also sure that the benefits of compatibility outweigh any benefits of UCS2 encoding, and that the widespread use of UCS4 demonstrates that there is nothing fatally wrong with it, and that people who really value UCS2 encoding more than compatibility can choose that for themselves by explicitly setting UCS2. Let me restate that I am not suggesting taking away anyone's options, only making the setting for people who don't specify default to the compatible option. Hm, I guess that means that it should default to UCS2 on Windows and Mac and to UCS4 on Linux and Solaris. Regards, Zooko Ubuntu 6.10 "edgy" i386: python: 2.4.4c1 (#2, Mar 7 2008, 03:03:38) [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)], maxunicode: 1114111 Ubuntu 7.04 "feisty": python: 2.5.1 (r251:54863, Jul 31 2008, 22:53:39) [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)], maxunicode: 1114111 Ubuntu 7.10 "gutsy" i386: python: 2.5.1 (r251:54863, Jul 31 2008, 23:17:40) [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)], maxunicode: 1114111 Ubuntu 8.04 "hardy" amd64: python: 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)], maxunicode: 1114111 Ubuntu 8.04 "hardy" i386: *custom* python: 2.6 (r26:66714, Oct 2 2008, 13:40:28) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)], maxunicode: 65535 Ubuntu 8.04 "hardy" i386: python: 2.5.2 (r252:60911, Jul 22 2009, 15:35:03) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)], maxunicode: 1114111 Ubuntu 9.04 "jaunty" amd64: *custom* python: 2.6.2 (release26-maint, Apr 19 2009, 01:58:18) [GCC 4.3.3], maxunicode: 1114111 Debian 4.0 "etch" i386: python: 2.4.4 (#2, Oct 
22 2008, 19:52:44) [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)], maxunicode: 1114111 Debian 5.0 "lenny" i386: python: 2.5.2 (r252:60911, Jan 4 2009, 17:40:26) [GCC 4.3.2], maxunicode: 1114111 Debian 5.0 "lenny" amd64: python: 2.5.2 (r252:60911, Jan 4 2009, 21:59:32) [GCC 4.3.2], maxunicode: 1114111 Debian 5.0 "lenny" armv5tel: python: 2.5.2 (r252:60911, Jan 5 2009, 02:00:00) [GCC 4.3.2], maxunicode: 1114111 Debian unstable "squeeze/sid" i386: python: 2.5.4 (r254:67916, Feb 17 2009, 20:16:45) [GCC 4.3.3], maxunicode: 1114111 Fedora 11 "leonidas" amd64: python: 2.6 (r26:66714, Jul 4 2009, 17:37:13) [GCC 4.4.0 20090506 (Red Hat 4.4.0-4)], maxunicode: 1114111 ArchLinux: python: 2.6.2 (r262:71600, Jul 20 2009, 02:23:30) [GCC 4.4.0 20090630 (prerelease)], maxunicode: 65535 NetBSD 4: python: 2.5.2 (r252:60911, Mar 20 2009, 14:00:07) [GCC 4.1.2 20060628 prerelease (NetBSD nb2 20060711)], maxunicode: 65535 OpenSolaris SunOS-5.11-i86pc-i386-32bit: python: 2.4.4 (#1, Mar
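For anyone who wants to add a data point to the survey above, a small sketch along these lines prints the same three facts (platform, Python version, sys.maxunicode). The function names are invented for this example and this is not the exact instrumentation used on the buildslaves. Note that since Python 3.3 (PEP 393) sys.maxunicode is always 1114111, so the distinction only shows up on older interpreters.

```python
import platform
import sys


def unicode_build(maxunicode):
    """Map a sys.maxunicode value to the configure setting it implies."""
    if maxunicode == 0x10FFFF:  # 1114111
        return "ucs4"
    if maxunicode == 0xFFFF:  # 65535
        return "ucs2"
    return "unknown"


def report():
    """Return one survey line for the running interpreter."""
    version = sys.version.replace("\n", " ")
    return "%s: python: %s, maxunicode: %d (%s)" % (
        platform.platform(), version, sys.maxunicode,
        unicode_build(sys.maxunicode))


if __name__ == "__main__":
    print(report())
```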
Re: [Python-Dev] please consider changing --enable-unicode default to ucs4
I'm sorry, I should have mentioned that I did read those archives before I posted my letter. That discussion was all about whether UCS2 or UCS4 is better. I consider that question to be mostly irrelevant to this issue, which is about compatibility for people who don't choose to configure that setting themselves. Platforms or people who prefer UCS2 will continue to use it as appropriate. UCS4 is clearly good enough for the vast majority of Linux users, and having fewer mysterious segfaults and potential security vulnerabilities would be an important improvement to the user experience of Python on Linux. I should mention that the reason I'm spending time on this right now is that it is currently blocking me from being able to distribute binaries of Python packages which will work for all of my Linux users. Regards, Zooko
Re: [Python-Dev] please consider changing --enable-unicode default to ucs4
On Sun, Sep 20, 2009 at 8:27 AM, Antoine Pitrou wrote: > > What "binaries" are you talking about? I mean extension modules with native code, which means .so shared library files on unix. > AFAIK, C extensions should fail loading when they have the wrong UCS2/4 > setting. That would be an improvement! Unfortunately we instead get mysterious misbehavior of the module, e.g.: http://bugs.python.org/setuptools/msg309 http://allmydata.org/trac/tahoe/ticket/704#comment:5 > For information, all Mandriva versions I've used until now have had their > Python's built with UCS2 (maxunicode == 65535). Thank you for the data point. This means that binary extension modules built on Mandriva can't be ported to Ubuntu or vice versa. However, is this an argument for or against changing the default setting to UCS4? Changing the default setting wouldn't interfere with Mandriva's decision, right? Regards, Zooko
Re: [Python-Dev] please consider changing --enable-unicode default to ucs4
On Sun, Sep 20, 2009 at 8:27 AM, Antoine Pitrou wrote: > For information, all Mandriva versions I've used until now have had their > Python's built with UCS2 (maxunicode == 65535). By the way, I was investigating this, and discovered an issue on the Mandriva tracker which suggests that they intend to switch to UCS4 in the next release in order to avoid compatibility problems like these. (Not because they think that UCS4 is better than UCS2.) https://qa.mandriva.com/show_bug.cgi?id=48570 Regards, Zooko
Re: [Python-Dev] please consider changing --enable-unicode default to ucs4
Dear MAL and python-dev: I failed to explain the problem that users are having. I will try again, and this time I will omit my ideas about how to improve things and just focus on describing the problem. Some users are having trouble using Python packages containing binary extensions on Linux. I want to provide such binary Python packages for Linux for the pycryptopp project (http://allmydata.org/trac/pycryptopp ) and the zfec project (http://allmydata.org/trac/zfec ). I also want to make it possible for users to install the Tahoe-LAFS project (http://allmydata.org ) without having a compiler or Python header files. (You'd be surprised at how often Tahoe-LAFS users try to do this on Linux. Linux is no longer only for people who have the knowledge and patience to compile software themselves.) Tahoe-LAFS also depends on many packages that are maintained by other people and are not packaged or distributed by me -- pyOpenSSL, simplejson, etc.. There have been several hurdles in the way that we've overcome, and no doubt there will be more, but the current hurdle is that there are two "formats" for Python extension modules that are used on Linux -- UCS2 and UCS4. If a user gets a Python package containing a compiled extension module which was built for the wrong UCS2/4 setting, he will get mysterious (to him) "undefined symbol" errors at import time. On Mon, Sep 28, 2009 at 2:25 AM, M.-A. Lemburg wrote: > > The Python default is UCS2 for a good reason: it's a good trade-off > between memory consumption, functionality and performance. I'm sure you are right about this. At some point I will try to measure the performance implications in the context of our application. I don't think it will be an issue for us, as so far no users have complained about any performance or functionality problems that were traceable to the choice of UCS2/4. 
> As already mentioned, I also don't understand how the changing > the Python default on Linux would help your users in any way - > if you let distutils compile your extensions, it's automatically > going to use the right Unicode setting for you (as well as your > users). My users are using some Python packages built by me and some built by others. The binary packages they get from others could have the incompatible UCS2/4 setting. Also some of my users might be using a python configured with the opposite setting of the python interpreter that I use to build packages. > Unfortunately, this automatic support doesn't help you when > shipping e.g. setuptools eggs, but this is a tool problem, > not one of Python: setuptools completely ignores the fact > that there are two ways to build Python. This is the setuptools/distribute issue that I mentioned: http://bugs.python.org/setuptools/issue78 . If that issue were solved then if a user tried to install a specific package, for example with a command-line like "easy_install http://allmydata.org/source/tahoe/deps/tahoe-dep-eggs/pyOpenSSL-0.8-py2.5-linux-i686.egg", then instead of getting an undefined symbol error at import time, they would get an error message to the effect of "This package is not compatible with your Python interpreter." at install time. That would be good because it would be less confusing to the users. However, if they were using the default setuptools/distribute dependency-satisfaction feature, e.g. because they are installing a package and that package is marked as "install_requires=['pyOpenSSL']", then setuptools/distribute would do its fallback behavior in which it attempts to compile the package from source when it can't find a compatible binary package. This would probably confuse the users at least as much as the undefined symbol error currently does. In any case, improving the tools to handle incompatible packages nicely would not make more packages compatible. Let's do both!
Improve tools to handle incompatible packages nicely, and encourage everyone who compiles python on Linux to use the same UCS2/4 setting. Thank you for your attention. Regards, Zooko
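Until the tools report the mismatch themselves, the "undefined symbol" error can at least be decoded by hand. The sketch below relies on the fact that the missing symbol names carry the setting in them (PyUnicodeUCS2_... or PyUnicodeUCS4_...): the extension references the symbols of the build it was compiled against, so the name in the error tells you what the extension expects. The function name extension_ucs_build and the sample error text are invented for illustration.

```python
import re

# Matches names such as PyUnicodeUCS4_FromUnicode or
# _PyUnicodeUCS2_AsDefaultEncodedString inside an error message.
_UCS_SYMBOL = re.compile(r"PyUnicodeUCS([24])_\w+")


def extension_ucs_build(error_message):
    """Guess which UCS setting an extension module was compiled for.

    Returns "ucs2", "ucs4", or None when the message does not mention
    one of the UCS-specific symbols.
    """
    match = _UCS_SYMBOL.search(error_message)
    if match is None:
        return None
    return "ucs" + match.group(1)
```

For example, a message ending in "undefined symbol: PyUnicodeUCS4_FromUnicode" would suggest an extension built for UCS4 being loaded by a UCS2 interpreter.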
Re: [Python-Dev] [New-bugs-announce] [issue7064] Python 2.6.3 / setuptools 0.6c9: extension module builds fail with KeyError
Here are three buildbot farms for three different projects that exercise various features of setuptools: build, install, sdist_dsc, bdist_egg, sdist, and various specific requirements that our projects have, such as the "Desert Island Build" in which setuptools is not allowed to download anything from the Internet at build time or else it flunks the test. http://allmydata.org/buildbot/waterfall http://allmydata.org/buildbot-pycryptopp/waterfall http://allmydata.org/buildbot-zfec/waterfall I would love it if new versions of setuptools/Distribute would make more of these tests pass on more platforms or at least avoid causing any regressions in these tests on these platforms. Unfortunately, we can't really deploy new versions of setuptools/Distribute to this buildbot farm in order to re-run all the tests, because the only way that I know of to trigger all the tests is to make a commit to our central darcs repository for Tahoe-LAFS, pycryptopp, or zfec, and I don't want to do that to experiment with new versions of setuptools/Distribute. Does anyone know how to use a buildbot farm like this one to run tests without committing patches to the central repository? Regards, Zooko
Re: [Python-Dev] a new setuptools release?
+1 For a large number of people [1, 2, 3], setuptools is already a critical part of Python. Make it official. Let everyone know that future releases of Python will not break setuptools/Distribute, and that they can rely on backwards-compatibility with the myriad existing packages. Make the next release of the distutils standard lib module be Distribute. (Perhaps some people will complain, but they can channel their energy into improving the new distutils.) Regards, Zooko [1] The majority of professional developers using Python rely on setuptools to distribute and to use packages: http://tarekziade.wordpress.com/2009/03/26/packaging-survey-first-results/ [2] setuptools is one of the most downloaded packages on PyPI: http://pypi.python.org/pypi/setuptools http://blog.delaguardia.com.mx/tags/pypi [3] about one fifth of Debian users who install python2.5 also install python-pkg-resources: http://qa.debian.org/popcon.php?package=python-setuptools http://qa.debian.org/popcon.php?package=python2.5
[Python-Dev] please consider changing --enable-unicode default to ucs4
Folks: I accidentally sent this letter just to MAL when I intended it to python-dev. Please read it, as it explains why the issue I'm raising is not just the "we should switch to ucs4 because it is better" issue that was previously settled by GvR. This is a current, practical problem that is preventing people from distributing and using Python packages with binary extension modules on Linux. Regards, Zooko -- Forwarded message ------ From: Zooko O'Whielacronx Date: Sun, Sep 27, 2009 at 11:43 AM Subject: Re: [Python-Dev] please consider changing --enable-unicode default to ucs4 To: "M.-A. Lemburg" Folks: I'm sorry, I think I didn't make my concern clear. My users, and lots of other users, are having a problem with incompatibility between Python binary extension modules. One way to improve the situation would be if the Python devs would use their "bully pulpit" -- their unique position as a source respected by all Linux distributions -- and say "We recommend that Linux distributions use UCS4 for compatibility with one another". This would not abrogate anyone's ability to choose their preferred setting nor, as far as I can tell, would it interfere with the ongoing development of Python. Here are the details: I'm the maintainer of several Python packages. I work hard to make it easy for users, even users who don't know anything about Python, to use my software. There have been many pain points in this process and I've spent a lot of time on it for about three years now working on packaging, including the tools such as setuptools and distutils and the new "distribute" tool. Python packaging has been improving during these years -- things are looking up. One of the remaining pain points is that I can distribute binaries of my Python extension modules for Windows or Mac, but if I distribute a binary Python extension module on Linux, then if the user has a different UCS2/UCS4 setting then they won't be able to use the extension module. 
The current de facto standard for Linux is UCS4 -- it is used by Debian, Ubuntu, Fedora, RHEL, OpenSuSE, etc. etc.. The vast majority of Linux users in practice have UCS4, and most binary Python modules are compiled for UCS4. That means that a few folks will get left out. Those folks, from my experience, are people who built their python executable themselves without specifying an override for the default, and the smaller Linux distributions who insist on doing whatever upstream Python devs recommend instead of doing whatever the other Linux distros are doing. One of the data points that I reported was a Python interpreter that was built locally on an Ubuntu server. Since the person building it didn't know to override the default setting of --enable-unicode, he ended up with a Python interpreter built for UCS2, even though all the Python extension modules shipped by Ubuntu were built with UCS4. These are not isolated incidents. The following google searches suggest that a number of people spend time trying to figure out why Python extension modules fail on their linux systems: http://www.google.com/search?q=PyUnicodeUCS4_FromUnicode+undefined+symbol http://www.google.com/search?q=+PyUnicodeUCS2_FromUnicode+undefined+symbol http://www.google.com/search?q=_PyUnicodeUCS2_AsDefaultEncodedString+undefined+symbol Another data point is the Mandriva Linux distribution. It is probably much smaller than Debian, Ubuntu, or RedHat, but it is still one of the major, well-known distributions. I requested of the Python maintainer for Mandriva, Michael Scherer, that they switch from UCS2 to UCS4 in order to reduce compatibility problems like these. His answer as I understood it was that it is best to follow the recommendations of the upstream Python devs by using the default setting instead of choosing a setting for himself. (Now we could implement a protocol which would show whether a given Python package was compiled for UCS2 or UCS4. That would be good. 
Hopefully it would make incompatibility more explicit and understandable to users. Here is a ticket for that -- which project I am contributing to: http://bugs.python.org/setuptools/issue78 . However, even if we implement that feature in the distribute tool (the successor to setuptools), users who build their own python or who use a Linux distribution that follows upstream configuration defaults will still be unable to use most Python packages with compiled extension modules.) In a message on this thread, MvL wrote: > "For that reason I think it's also better that the configure script > continues to default to UTF-16 -- this will give the UTF-16 support > code the necessary exercise." > > This is effectively a BDFL pronouncement. Nothing has changed the > validity of the premise of the statement, so the conclusion remains > valid, as well. My understand of
Re: [Python-Dev] a new setuptools release?
Thanks for the reply, MAL. How would we judge whether Distribute is ready for inclusion in the Python standard lib? Maybe if it has a few more releases, leaving a trail of "closed: fixed" issue tickets behind it? Regards, Zooko
Re: [Python-Dev] Python byte-compiled and optimized code
You might be interested in the new PYTHONDONTWRITEBYTECODE environment variable supported as of Python 2.6. I personally think it is a great improvement. :-) Regards, Zooko
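A quick way to see the variable take effect is to start a child interpreter with and without it and look at sys.dont_write_bytecode, the flag it controls. This is a small sketch; the helper name child_dont_write_bytecode is made up for the example.

```python
import os
import subprocess
import sys


def child_dont_write_bytecode(env_value):
    """Run a child Python and report its sys.dont_write_bytecode flag.

    env_value is the value for PYTHONDONTWRITEBYTECODE, or None to run
    the child with the variable unset.
    """
    env = dict(os.environ)
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    if env_value is not None:
        env["PYTHONDONTWRITEBYTECODE"] = env_value
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env)
    return out.decode().strip() == "True"
```

Any non-empty value enables the behavior; the same flag can also be set with the -B command-line option.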
Re: [Python-Dev] Python 2.6.4rc2
Barry: Do you know anything about this alleged regression in 2.6.3 with regard to the __doc__ property? https://bugs.edge.launchpad.net/ubuntu/+source/boost1.38/+bug/457688 Regards, Zooko
Re: [Python-Dev] Possible language summit topic: buildbots
Right, how do developers benefit from a buildbot? From my experience (five large buildbots with many developers plus two with only a couple of developers), a buildbot does little good unless the tests are reliable and not too noisy. "Reliable" is best achieved by having tests be deterministic and reproducible. "Not too noisy" means that the builders are all green all the time (at least for a "supported" subset of the buildslaves). Beyond that, I think there has to be a culture change where the development team decides that it is really, really not okay to leave a builder red after you turned it red, and that instead you need to revert the patch that made it go from green to red before you do anything else. It has taken me a long time to acculturate to that and I wouldn't expect most people to do it quickly or easily. (It is interesting to think of what would happen if that policy were automated -- any patch which caused any "supported" builder to go from green to red would automatically be reverted.) Also, of course, this is mostly meaningless unless the code that is being changed by the patches is well-covered by tests. Regards, Zooko
Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?
Folks: I really don't want to make anyone feel bad or to criticize, but I should mention that I have no plans to use Python 3 or to support Python 3. My best guess at this time is that the current projects that I'm involved in will still require Python 2 for the foreseeable future (let's say 5 years. I can see 5 years into the future.), and that as I start new projects I will probably try out interesting alternative programming languages like Haskell, Newspeak [1], Jacaranda [2], and other new things that appear in the coming years. Of course, I reserve the right to change my mind and start using and supporting Python 3. That might happen if there is some combination of: 1. my users start asking for it (no-one has yet), 2. my dependencies start providing it (I use Python because it has Twisted. Twisted requires Python 2.), 3. it becomes more possible for me to write code which is still Python-2-compatible and also is more and more close to being Python-3-compatible. By the way, one significant detail which makes Python 3 less interesting to me is [3]. Those two languages that I mentioned -- Newspeak and Jacaranda -- both have object-capability nature. If that issue in [3] were fixed then Python 3 would join Python 2 as a language that can (with the CapPython extension) have object-capability nature. Regards, Zooko [1] http://bracha.org/Site/Newspeak.html [2] http://jacaranda.org [3] http://lackingrhoticity.blogspot.com/2008/09/cappython-unbound-methods-and-python-30.html --- Your cloud storage provider does not need access to your data. Tahoe-LAFS -- http://allmydata.org
Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?
Folks: It occurred to me to wonder why I haven't investigated how hard it would be to make my Python packages Python-3-compatible. That's right -- I haven't even looked closely. I couldn't even tell you off the top of my head what is in Python 3 that I would have to think about except for the new unicode regime. I think the answer is that the payoff is just *so* low to me at this point that it doesn't even justify me taking 15 minutes to read "What's New In Python 3" or to execute 2to3 on my smallest package and see what it does. On the other hand, I'm totally committed to supporting Python 2.7, because my customers will demand it and because I expect that it will be easy. So, if you guys slip in your favorite new Python 3 feature into 2.7 and add a deprecation warning for your least favorite Python 2 misfeature, then probably within about 24 months I'll have fixed all code that uses the deprecated feature, and probably within about five years I'll consider dropping backwards-compatibility with Python 2.6 and starting to use that new feature that you added to Python 2.7. (I'm currently considering dropping Python 2.4 compatibility for the next releases of most of my code.) Regards, Zooko
Re: [Python-Dev] mingw support?
On Sat, Aug 7, 2010 at 2:14 PM, Steve Holden wrote: > There have certainly been demonstrations that Python can be compiled > with mingw, but as far as I am aware what's missing is a developer > sufficiently motivated to integrate that build system into the > distributions and maintain it. There appears to be quite a lot of activity on http://bugs.python.org/issue3871 . I find it surprising that nobody mentioned it before on this thread. Perhaps nobody who has been posting to this thread was aware of this activity. Regards, Zooko
Re: [Python-Dev] Goodbye
Speaking as a frequent contributor of bug reports and an occasional contributor of patches, I must say that I feel like the status quo of the tracker before Mark's work was discouraging. If there is a vast collection of abandoned tickets, it gives me the strong impression that my attempted contributions are likely to end up in that pile. The messages I got from the tracker due to Mark's work saying things like "This ticket is closed due to inactivity." or "Would you be interested in refreshing this patch?" started to get me interested in contributing again. Also, I would like to point out that, not having read the other traffic that this thread alludes to, either from earlier mailing list threads or from IRC, I don't really understand what exactly Mark did wrong. The complaints about his behavior on this thread seem to be a little... non-specific. Did he continue to close tickets after he was asked not to do so? I didn't see any quotes or timestamps showing what happened or when. Regards, Zooko
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
On May 6, 2009, at 7:33 AM, Stephen J. Turnbull wrote: You have convinced me that the PEP should wait as well. In its current form it is incomplete and dangerous. +1 on delaying PEP 383 I think PEP 383 is a good idea in principle, but I'm still struggling to understand it myself, and it seems to offer new hazards for the unwary programmer. On the other hand, maybe the wary programmers are waiting for Python 3.2 anyway. On the gripping hand, if PEP 383 is released in Python 3.1, will that obligate python-dev to support it indefinitely, at least in backwards-compatibility mode? I'm not thinking of API compatibility as much as data compatibility -- someone used Python 3.1 to write down some filenames, and now a few years later they are trying to use the latest and greatest Python release to read those filenames... Regards, Zooko
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:

> Zooko Wilcox-O'Hearn writes:
>
>> I'm not thinking of API compatibility as much as data compatibility -- someone used Python 3.1 to write down some filenames, and now a few years later they are trying to use the latest and greatest Python release to read those filenames...
>
> Well, if the filenames are generated by Python (as opposed to read from an existing directory on disk), they should be regular unicode objects without any lone surrogates, so I don't see the compatibility problem.

I meant that the application reads filenames from an existing directory on disk, saves those filenames, and then later, using a future version of Python, wants to read them and use them.

I'm not saying that I know this would be a problem. I'm saying that I personally can't tell whether it would be a problem or not, and the extensive discussions so far have not convinced me that there is anyone who both understands PEP 383 and considers this use case.

Many people who apparently understand encoding issues well have said something to the effect that there is no problem, but those people haven't yet managed to get through my thick skull how I would use PEP 383 safely for this sort of use case -- the one where data generated by os.listdir() travels forward in time, or the one where that data travels sideways to other systems, including Windows or other systems that validate incoming unicode.

That's why I am a bit uncomfortable about PEP 383 being quickly implemented and deployed in Python 3.1.

By the way, much of the detailed discussion about what Tahoe requires and how that may or may not benefit from PEP 383 has now moved to the tahoe-dev mailing list: http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev .

Regards, Zooko
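The "travels sideways" concern in the message above can be made concrete with a small sketch. It assumes the error handler is spelled "surrogateescape", the name under which PEP 383 eventually shipped (this thread still calls it utf8b). The filename bytes and the helper name is_portable are made up for illustration.

```python
# A made-up filename containing a byte that is not valid UTF-8.
raw = b"caf\xe9.txt"

# Decode the way PEP 383 proposes: the undecodable byte becomes a
# lone surrogate code point instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")

# On the same system the round trip back to bytes is lossless.
assert name.encode("utf-8", "surrogateescape") == raw


def is_portable(text):
    """Return True if text can be encoded as strict UTF-8.

    Hypothetical helper: a string holding lone surrogates fails here,
    which is what a system that validates incoming unicode would see.
    """
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

An application that stores such names, or sends them to another system, would probably want a check like is_portable() before the data leaves the machine that produced it.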
Re: [Python-Dev] .pth files are evil
On May 9, 2009, at 9:39 AM, P.J. Eby wrote:

> It would be really straightforward, though, for someone to implement an easy_install variant that does this. Just invoke "easy_install -Zmaxd /some/tmpdir packagelist" to get a full set of unpacked .egg directories in /some/tmpdir, and then move the contents of the resulting .egg subdirs to the target location, renaming EGG-INFO subdirs to projectname-version.egg-info subdirs.

Except for the renaming part, this is exactly what GNU stow does.

> (Of course, this ignores the issue of uninstalling previous versions, or overwriting of conflicting files in the target -- does pip handle these?)

GNU stow does handle these issues.

Regards, Zooko
[Python-Dev] how GNU stow is complementary rather than alternative to distutils
On May 10, 2009, at 11:18 AM, Martin v. Löwis wrote:

> If GNU stow solves all your problems, why do you want to use easy_install in the first place?

That's a good question. The answer is that there are two separate jobs: building executables and putting them in a directory structure of the appropriate shape for your system is one job, and installing or uninstalling that tree into your system is another. GNU stow does only the latter.

The input to GNU stow is a set of executables, library files, etc., in a directory tree that is of the right shape for your system. For example, if you are on a Linux system, then your scripts all need to be in $prefix/bin/, your shared libs should be in $prefix/lib, your Python packages ought to be in $prefix/lib/python$x.$y/site-packages/, etc. GNU stow is blissfully ignorant about all issues of building binaries, and choosing where to place files, etc. -- that's the job of the build system of the package, e.g. the "./configure --prefix=foo && make && make install" for most C packages, or the "python ./setup.py install --prefix=foo" for Python packages using distutils (footnote 1).

Once GNU stow has the well-shaped directory which is the output of the build process, then it follows a very dumb, completely reversible (uninstallable) process of symlinking those files into the system directory structure.

It is a beautiful, elegant hack because it is sooo dumb. It is also very nice to use the same tool to manage packages written in any programming language, provided only that they can build a directory tree of the right shape and content.

However, there are lots of things that it doesn't do, such as automatically acquiring and building dependencies, or producing executables for the target platform for each of your console scripts. Not to mention creating a directory named "$prefix/lib/python$x.$y/site-packages" and cp'ing your Python files into it.
That's why you still need a build system even if you use GNU stow for an install-and-uninstall system.

The thing that prevents this from working with setuptools is that setuptools creates a file named easy_install.pth during the "python ./setup.py install --prefix=foo". If you build two different Python packages this way, they will each create an easy_install.pth file, and then when you ask GNU stow to link the two resulting packages into your system, it will say "You are asking me to install two different packages which both claim that they need to write a file named '/usr/local/lib/python2.5/site-packages/easy_install.pth'. I'm too dumb to deal with this conflict, so I give up.". If I understand correctly, your (MvL's) suggestion that easy_install create a .pth file named "easy_install-$PACKAGE-$VERSION.pth" instead of "easy_install.pth" would indeed make it work with GNU stow.

Regards, Zooko

footnote 1: Aside from the .pth file issue, the other reason that setuptools doesn't work for this use while distutils does is that setuptools tries too hard to save you from making a mistake: maybe you don't know what you are doing if you ask it to install into a previously non-existent prefix dir "foo". This one is easier to fix: http://bugs.python.org/setuptools/issue54 # "be more like distutils with regard to --prefix=" .
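The conflict described above can be illustrated with a rough sketch. This is not GNU stow's actual algorithm, only a model of the one rule that matters here: two package trees may not both claim the same target path. The function name find_conflicts and the package names foo-1.0 and bar-2.0 are hypothetical; the .pth file names follow the message.

```python
from collections import defaultdict


def find_conflicts(packages):
    """Map each relative path claimed by more than one package to its claimants.

    `packages` maps a (hypothetical) package name to the relative paths
    of the files in its built, well-shaped directory tree.
    """
    owners = defaultdict(list)
    for name, paths in packages.items():
        for path in paths:
            owners[path].append(name)
    return {path: names for path, names in owners.items() if len(names) > 1}


site = "lib/python2.5/site-packages"

# Both builds write the same shared .pth file: a conflict.
shared_pth = {
    "foo-1.0": [site + "/foo.egg", site + "/easy_install.pth"],
    "bar-2.0": [site + "/bar.egg", site + "/easy_install.pth"],
}

# With one .pth file per package and version, nothing collides.
per_package_pth = {
    "foo-1.0": [site + "/foo.egg", site + "/easy_install-foo-1.0.pth"],
    "bar-2.0": [site + "/bar.egg", site + "/easy_install-bar-2.0.pth"],
}
```

Under these assumptions, find_conflicts(shared_pth) reports the shared .pth path and find_conflicts(per_package_pth) reports nothing, which is the effect the per-package naming suggestion is meant to have.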
[Python-Dev] ctime: I don't think that word means what you think it means.
The stat module uses the "st_ctime" slot to hold two kinds of values which are semantically different and which are frequently confused with one another. It chooses which kind of value to put in there based on platform -- Windows gets the file creation time and all other platforms get the "ctime". The only sane way to use this API is then to switch on platform:

    if platform.system() == "Windows":
        metadata["creation time"] = s.st_ctime
    else:
        metadata["unix ctime"] = s.st_ctime

(That is an actual code snippet from the Allmydata-Tahoe project.)

Many or even most programmers incorrectly think that unix ctime is file creation time, so instead of using the sane idiom above, they write the following:

    metadata["ctime"] = s.st_ctime

thus passing on the confusion to the users of their metadata, who may not be able to tell on which platform this metadata was created. This is the situation we have found ourselves in for the Allmydata-Tahoe project -- we now have a bunch of "ctime" values stored in our filesystem and no way to tell which kind they were.

More and more filesystems such as ZFS and Mac HFS+ apparently offer creation time nowadays.

I propose the following changes:

1. Add a "st_crtime" field which gets populated on filesystems (Windows, ZFS, Mac) which can do so. That is hopefully not too controversial and we could proceed to do so even if the next proposal gets bogged down:

2. Add a "st_unixctime" field which gets populated *only* by the unix ctime and never by any other value (even on Windows, where the unix ctime *is* available even though nobody cares about it), and deprecate the hopelessly ambiguous "st_ctime" field.

You may be interested in http://allmydata.org/trac/tahoe/ticket/628 ("mtime" and "ctime": I don't think that word means what you think it means.) where the Allmydata-Tahoe project is carefully unpicking the mess we made for ourselves by confusing ctime with file-creation time.

This is ticket http://bugs.python.org/issue5720 .
Regards, Zooko
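The platform switch in the message above can be wrapped in a small helper. This is a sketch under stated assumptions: the proposed st_crtime and st_unixctime fields are not used because they are only proposals here, the function name describe_times is hypothetical, and st_birthtime is read through getattr() because it exists only on some platforms and Python versions.

```python
import os
import platform


def describe_times(path):
    """Return file times under unambiguous keys (hypothetical helper)."""
    s = os.stat(path)
    metadata = {"mtime": s.st_mtime}

    # st_birthtime is only present on some platforms, so probe for it.
    birth = getattr(s, "st_birthtime", None)
    if birth is not None:
        metadata["creation time"] = birth

    if platform.system() == "Windows":
        # On Windows st_ctime has traditionally held the creation time.
        metadata.setdefault("creation time", s.st_ctime)
    else:
        # Everywhere else st_ctime is the unix ctime (inode change time).
        metadata["unix ctime"] = s.st_ctime
    return metadata
```

The point of the design is that no "ctime" key is ever stored, so a later reader of the metadata does not need to know which platform wrote it.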
Re: [Python-Dev] Drop the new time.wallclock() function?
> I merged the two functions into one function: time.steady(strict=False).
>
> time.steady() should be monotonic most of the time, but may use a fallback.
>
> time.steady(strict=True) fails with OSError or NotImplementedError if
> reading the monotonic clock failed or if no monotonic clock is available.

If someone wants time.steady(strict=False), then why don't they just continue to use time.time()?

I want time.steady(strict=True), and I'm glad you're providing it and I'm willing to use it this way, although it is slightly annoying because "time.steady(strict=True)" really means "time.steady(i_really_mean_it=True)". Else, I would have used "time.time()".

I am aware of a large number of use cases for a steady clock (event scheduling, profiling, timeouts), and a large number of use cases for an "NTP-respecting wall clock" (calendaring, displaying to a user, timestamping). I'm not aware of any use case for "steady if implemented, else wall-clock", and it sounds like a mistake to me.

Regards, Zooko
Re: [Python-Dev] Drop the new time.wallclock() function?
On Fri, Mar 23, 2012 at 11:27 AM, Victor Stinner wrote:

> time.steady(strict=False) is what you need to implement timeout.

No, that doesn't fit my requirements, which are about event scheduling, profiling, and timeouts. See below for more about my requirements.

I didn't say this explicitly enough in my previous post:

Some use cases (timeouts, event scheduling, profiling, sensing) require a steady clock. Others (calendaring, communicating times to users, generating times for comparison to remote hosts) require a wall clock.

Now here's the kicker: each use case incurs significant risks if it uses the wrong kind of clock.

If you're implementing event scheduling or sensing and control, and you accidentally get a wall clock when you thought you had a steady clock, then your program may go seriously wrong -- events may fire in the wrong order, measurements of your sensors may be wildly incorrect. This can lead to serious accidents. On the other hand, if you're implementing calendaring or display of "real local time of day" to a user, and you are using a steady clock for some reason, then you risk displaying incorrect results to the user.

So using one kind of clock and then "falling back" to the other kind is a choice that should be rare, explicit, and discouraged. The provision of such a function in the standard library is an attractive nuisance -- a thing that people naturally think that they want when they haven't thought about it very carefully, but that is actually dangerous.

If someone has a use case which fits the "steady or else fall back to wall clock" pattern, I would like to learn about it.

Regards, Zooko
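A timeout is one of the steady-clock use cases named above, and a minimal sketch looks like this. It assumes time.monotonic(), the name that shipped in Python 3.3 after this discussion, rather than the time.steady(strict=True) spelling debated in the thread. The helper wait_until and its parameters are hypothetical.

```python
import time


def wait_until(predicate, timeout, poll_interval=0.01):
    """Poll predicate() until it is true or `timeout` seconds have elapsed.

    The deadline is computed from a steady clock, so a change to the
    system (wall) clock cannot shorten or lengthen the wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Had the deadline been computed with time.time(), a wall-clock adjustment during the wait could make the timeout fire early or very late, which is the risk the message describes.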
Re: [Python-Dev] PEP 418: Add monotonic clock
> system_clock = wall clock time
> monotonic_clock = always goes forward but can be adjusted
> steady_clock = always goes forward and cannot be adjusted
> high_resolution_clock = steady_clock || system_clock

Note that the C++ standard deprecated monotonic_clock once they realized that there is absolutely no point in having a clock that jumps forward but not back, and that none of the operating systems implement such a thing -- instead they all implement a clock which doesn't jump in either direction.

http://stackoverflow.com/questions/6777278/what-is-the-rationale-for-renaming-monotonic-clock-to-steady-clock-in-chrono

In other words, yes! +1! The C++ standards folks just went through the process that we're now going through, and if we do it right we'll end up at the same place they are:

http://en.cppreference.com/w/cpp/chrono/system_clock

"""
system_clock represents the system-wide real time wall clock. It may not be monotonic: on most systems, the system time can be adjusted at any moment. It is the only clock that has the ability to map its time points to C time, and, therefore, to be displayed.

steady_clock: monotonic clock that will never be adjusted

high_resolution_clock: the clock with the shortest tick period available
"""

Note that we don't really have the option of providing a clock which is "monotonic but not steady" in the sense of "can jump forward but not back". It is a misunderstanding (doubtless due to the confusing name "monotonic") to think that such a thing is offered by the underlying platforms. We can choose to *call* it "monotonic", following POSIX, instead of calling it "steady", following C++.

Regards, Zooko
Re: [Python-Dev] Drop the new time.wallclock() function?
On Mon, Mar 26, 2012 at 5:07 PM, Victor Stinner wrote:

>> If someone has a use case which fits the "steady or else fall back to wall
>> clock" pattern, I would like to learn about it.
>
> Python 3.2 doesn't provide a monotonic clock, so most program uses
> time.time() even if a monotonic clock would be better in some functions. For
> these programs, you can replace time.time() by time.steady() where you need
> to compute a time delta (e.g. compute a timeout) to avoid issues with the
> system clock update. The idea is to improve the program without refusing to
> start if no monotonic clock is available.

I agree that this is a reasonable use case. I think of it as basically being a kind of backward-compatibility, for situations where an unsteady clock is okay, and a steady clock isn't available.

Twisted faces a similar issue: http://twistedmatrix.com/trac/ticket/2424

It might be good for use cases like this to explicitly implement the try-and-fallback, since they might have specific needs about how it is done. For one thing, some such uses may need to emit a warning, or even to require the caller to explicitly override, such as refusing to start if a steady clock isn't available unless the user specifies "--unsteady-clock-ok".

For motivating examples, consider software written using Twisted > 12.0 or Python > 3.2 which is using a clock to drive real world sensing and control -- measuring the position of a machine and using time deltas to calculate the machine's velocity, in order to automatically control the motion of the machine.

For some uses, it is okay if the measurement could, in rare cases, be drastically wrong. For other uses, that is not an acceptable risk.

One reason I'm sensitive to this issue is that I work in the field of security, and making the behavior dependent on the system clock extends the "reliance set", i.e. the set of things that an attacker could use against you.
For example, if your robot depends on the system clock for its sensing and control, and if your system clock obeys NTP, then the set of things that an attacker could use against you includes your NTP servers. If your robot depends instead on a steady clock, then NTP servers are not in the reliance set.

Now, if your control platform doesn't have a steady clock, you may choose to go ahead, while making sure that the NTP servers are authenticated, or you may choose to disable NTP on the control platform, etc., but that choice might need to be made explicitly by the operator, rather than automatically by the library.

Regards, Zooko
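The explicit try-and-fallback suggested above might be sketched as follows. The function choose_clock and its unsteady_clock_ok parameter are hypothetical names echoing the "--unsteady-clock-ok" idea from the message. The sketch assumes the steady clock is exposed as time.monotonic() (the name that shipped in Python 3.3) and probes for it with getattr() so that the fallback path is visible.

```python
import time
import warnings


def choose_clock(unsteady_clock_ok=False):
    """Return a zero-argument clock function, preferring a steady clock.

    The fallback to the wall clock happens only when the caller has
    explicitly allowed it, and even then it emits a warning.
    """
    steady = getattr(time, "monotonic", None)
    if steady is not None:
        return steady
    if not unsteady_clock_ok:
        raise RuntimeError(
            "no steady clock available; "
            "pass unsteady_clock_ok=True to accept the wall clock"
        )
    warnings.warn(
        "falling back to time.time(); time deltas may be wrong "
        "if the system clock is adjusted"
    )
    return time.time
```

With this shape the decision to rely on the wall clock, and therefore on whatever adjusts it, stays with the operator rather than being made silently by a library.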
Re: [Python-Dev] PEP 418 is too divisive and confusing and should be postponed
Folks:

Good job, Victor Stinner on baking the accumulated knowledge of this thread into PEP 418. Even though I'm very interested in the topic, I haven't been able to digest the whole thread(s) on the list and understand what the current collective understanding is. The detailed PEP document helps a lot.

I think there are still some mistakes, either in our collective understanding as reflected by the PEP, or in my own head.

For starters, I still don't understand the first, most basic thing: what do people mean when they say "monotonic clock"? I don't understand the current text of PEP 418 with regard to the definition of that word.

Allow me to resort to an analogy. There is an infinitely long, perfectly straight and flat racetrack. There is a flag that gets dragged along it at a constant rate, with the label "REAL TIME" on the flag. There are some runners, each with a different label on their chest:

Runner A: a helicopter hovers over Runner A. Occasionally it picks him up and plops him down right next to the flag. Also, he wears a headset and listens to instructions from his coach to run a little faster or slower, as necessary, to remain abreast of the flag.

Runner B: a helicopter hovers over Runner B. If he is behind the flag, it will pick him up and plop him down right next to the flag. However, if he is ahead of the flag it will not pick him up.

Runner C: no helicopter ever picks up Runner C, but he does wear a headset and listens to instructions from his coach to run a little faster or a little slower. His coach tells him to run a little faster if he is behind the flag or run a little slower if he is in front of the flag, with the goal of eventually having him right next to the flag.

Runner D: like Runner C, he never gets picked up, but he listens to instructions to run a little faster or a little slower. However, instead of telling him to run faster in order to catch up to the flag, or to run slower in order to "catch down" to the flag, his coach instead tells him to run a little faster if he is moving slower than the flag is moving, and to run a little slower if he is moving faster than the flag is moving. Note that this is very different from Runner C, in that it is not intended to cause him to eventually be right next to the flag, and indeed if it is done right it guarantees that he will *never* be right next to the flag, although he will be moving just as fast as the flag is moving.

Runner E: no helicopter, no headset. He just proceeds at his own pace, blissfully unaware of the exhortations of others.

Now: which ones of these five runners do you call "monotonic"? Which ones do you call "steady"?

Regards, Zooko
[Python-Dev] this is why we shouldn't call it a "monotonic clock" (was: PEP 418 is too divisive and confusing and should be postponed)
On Thu, Apr 5, 2012 at 7:14 PM, Greg Ewing wrote: > > This is the strict mathematical meaning of the word "monotonic", but the way > it's used in relation to OS clocks, it seems to mean rather more than that. Yep. As far as I can tell, nobody has a use for an unsteady, monotonic clock. There seem to be two groups of people: 1. Those who think that "monotonic clock" means a clock that never goes backwards. These people are in the majority. After all, that's what the word "monotonic" means ¹ . However, a clock which guarantees *only* this is useless. 2. Those who think that "monotonic clock" means a clock that never jumps, and that runs at a rate approximating the rate of real time. This is a very useful kind of clock to have! It is what C++ now calls a "steady clock". It is what all the major operating systems provide. The people in class 1 are more correct, technically, and far more numerous, but the concept from 1 is a useless concept that should be forgotten. So before proceeding, we should mutually agree that we have no interest in implementing a clock of type 1. It wouldn't serve anyone's use case (correct me if I'm wrong!) and the major operating systems don't offer such a thing anyway. Then, if we all agree to stop thinking about that first concept, then we need to agree whether we're all going to use the word "monotonic clock" to refer to the second concept, or if we're going to use a different word (such as "steady clock") to refer to the second concept. I would prefer the latter, as it will relieve us of the need to repeatedly explain to newcomers: "That word doesn't mean what you think it means.". The main reason to use the word "monotonic clock" to refer to the second concept is that POSIX does so, but since Mac OS X, Solaris, Windows, and C++ have all avoided following POSIX's mistake, I think Python should too. 
Regards, Zooko

¹ http://mathworld.wolfram.com/MonotonicSequence.html
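The difference between the two meanings in the message above can be stated as two small predicates over a list of clock readings. This is only an illustration: the function names, the sample data, and the 1% rate tolerance are assumptions made for the sketch, not definitions taken from POSIX, C++, or PEP 418.

```python
def is_monotonic(readings):
    """Meaning 1: the readings never go backwards."""
    return all(b >= a for a, b in zip(readings, readings[1:]))


def is_steady(readings, real_times, max_rate_error=0.01):
    """Meaning 2: the clock never jumps and advances at roughly the
    rate of real time (within an assumed relative tolerance)."""
    clock_steps = zip(readings, readings[1:])
    real_steps = zip(real_times, real_times[1:])
    for (c0, c1), (t0, t1) in zip(clock_steps, real_steps):
        elapsed = t1 - t0
        if abs((c1 - c0) - elapsed) > max_rate_error * elapsed:
            return False
    return True


real = [0, 1, 2, 3]

# Jumps forward but never back: monotonic in meaning 1 only.
jumps_forward = [0, 1, 50, 51]

# Offset from real time but advancing at the same rate: steady.
offset_but_even = [100, 101, 102, 103]
```

Here jumps_forward satisfies is_monotonic() but not is_steady(), which is the kind of clock the message argues nobody has a use for, while offset_but_even satisfies both.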
Re: [Python-Dev] Status of packaging in 3.3
On Thu, Jun 21, 2012 at 12:57 AM, Nick Coghlan wrote:

> Standard assumptions about the behaviour of site and distutils cease to be valid once setuptools is installed
> …
> - advocacy for the "egg" format and the associated sys.path changes that result for all Python programs running on a system
> …
> System administrators (and developers that think like system administrators when it comes to configuration management) *hate* what setuptools (and setuptools based installers) can do to their systems.

I have extensive experience with this, including quite a few bug reports and a few patches in setuptools and distribute, plus maintaining my own fork of setuptools to build and deploy my own projects, plus interviewing quite a few Python developers about why they hated setuptools, plus supporting one of them who hates setuptools even though he and I use it in a build system (https://tahoe-lafs.org).

I believe that 80% to 90% of the hatred alluded to above is due to a single issue: the fact that setuptools causes your Python interpreter to disrespect the PYTHONPATH, in violation of the documentation in http://docs.python.org/release/2.7.2/install/index.html#inst-search-path , which says:

"""
The PYTHONPATH variable can be set to a list of paths that will be added to the beginning of sys.path. For example, if PYTHONPATH is set to /www/python:/opt/py, the search path will begin with ['/www/python', '/opt/py']. (Note that directories must exist in order to be added to sys.path; the site module removes paths that don’t exist.)
"""

Fortunately, this issue is fixable! I opened a bug report, and I and others have provided patches that make setuptools stop behaving this way. This makes the above documentation true again. The negative impact on features or backwards-compatibility doesn't seem to be great.

http://bugs.python.org/setuptools/issue53

Philip J. Eby provisionally approved of one of the patches, except for some specific requirement that I didn't really understand how to fix and that now I don't exactly remember:

http://mail.python.org/pipermail/distutils-sig/2009-January/010880.html

Regards, Zooko
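The documented promise quoted above can be turned into a simple check. This is a sketch with assumptions: the function name entries_ahead_of_pythonpath is hypothetical, the first element of sys.path is skipped because it normally holds the script directory (or an empty string), and the egg path in the example is made up to show the kind of entry the message complains about.

```python
def entries_ahead_of_pythonpath(sys_path, pythonpath_entries):
    """Return the sys.path entries that come before the first PYTHONPATH entry.

    An empty result means the PYTHONPATH entries are at the front of the
    search path, as the quoted documentation says they should be.
    """
    wanted = set(pythonpath_entries)
    ahead = []
    for entry in sys_path[1:]:  # skip the script-directory slot
        if entry in wanted:
            break
        ahead.append(entry)
    return ahead


pythonpath = ["/www/python", "/opt/py"]

# What the documentation promises.
documented = ["", "/www/python", "/opt/py", "/usr/lib/python2.7"]

# A made-up example with an egg directory pushed in front.
shadowed = [
    "",
    "/usr/lib/python2.7/site-packages/foo-1.0.egg",
    "/www/python",
    "/opt/py",
]
```

In a real process the inputs would be sys.path and the value of os.environ.get("PYTHONPATH", "") split on os.pathsep.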