Re: [Python-Dev] Integrate BeautifulSoup into stdlib?
On Mar 4, 2009, at 9:56 AM, Chris Withers wrote:
> Vaibhav Mallya wrote:
>> We do have HTMLParser, but that doesn't handle malformed pages well, and just isn't as nice as BeautifulSoup.
>
> Interesting, given that BeautifulSoup is built on HTMLParser ;-)

I think html5lib would be a better candidate for an improved HTML parser in the stdlib than BeautifulSoup.

James
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Formatting mini-language suggestion
On Mar 11, 2009, at 9:06 PM, Nick Coghlan wrote:
> Raymond Hettinger wrote:
>> The current formatting mini-language provisions left/right/center alignment, prefixes for 0b 0x 0o, and rules on when to show the plus-sign. I think it would be far more useful to provision a simple way of specifying a thousands separator. Financial users in particular find the locale approach to be frustrating and non-obvious. Putting in a thousands separator is a common task for output destined to be read by non-programmers.
>
> +1 for the general idea. A specific syntax proposal:
>
>   [[fill]align][sign][#][0][minimumwidth][,sep][.precision][type]
>
> 'sep' is the new field that defines the thousands separator. It appears immediately before the precision specifier and starts with a leading comma. I believe this syntax is unambiguous and backwards compatible because the only other place a comma might appear (the fill field) is required to be followed by an alignment character.

You might be interested to know that in India, the commas don't come every 3 digits. In India, they come every two digits, after the first three. Thus one billion = 1,00,00,00,000. How are you gonna represent *that* in a formatting mini-language? :)

See also http://en.wikipedia.org/wiki/Indian_numbering_system

James
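To make the Indian grouping rule concrete, here is a minimal sketch of it as a plain function. The name group_indian is hypothetical and is not part of any proposal in this thread; it only illustrates "last three digits, then groups of two".

```python
def group_indian(n):
    """Format an integer with Indian-style digit grouping (hypothetical helper)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    # The last three digits form one group; everything before is split in pairs.
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.append(head[-2:])
        head = head[:-2]
    return sign + ",".join(reversed(groups)) + "," + tail


one_billion = group_indian(1000000000)
one_lakh = group_indian(100000)
```

A rule like this needs more than a single separator character in the format string, which is the point of the question above.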
Re: [Python-Dev] Formatting mini-language suggestion
On Mar 11, 2009, at 11:40 PM, Nick Coghlan wrote:
> Raymond Hettinger wrote:
>> It is not the goal to replace locale or to accommodate every possible convention. The goal is to make a common task easier for many users. The current, default use of the period as a decimal point has not proven to be a problem even though that convention is not universal. For a thousands separator, a comma is a decent choice that makes it easy to follow on with s.replace(',', '_') or somesuch.
>
> In that case, I would simplify my suggestion to:
>
>   [[fill]align][sign][#][0][minimumwidth][,][.precision][type]
>
> Addition to mini language documentation:
> The ',' option indicates that commas should be included in the output as a thousands separator. As with locales which do not use a period as the decimal point, locales which use a different convention for digit separation will need to use the locale module to obtain appropriate formatting.

This proposal has the advantage that you're not overly specifying the behavior in the format string itself.

That is: the "," option is really just indicating "please insert separators". With the current locale-ignorant implementation, that'd just mean "a comma every 3 digits". But it leaves the door open for a locale-sensitive variant of the format to be added in the future without conflicting with the instructions in the format string. (as the ability to specify an arbitrary character, or the ability to specify a comma instead of a period for the decimal point would).

I'm not against Raymond's proposal, just against doing a *bad* job of making it work in multiple locales. Locale conventions can be complex, and are going to be best represented outside the format string.

(BTW: single quote is used by printf for the grouping flag rather than comma)

James
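For reference, the simplified "," option is the form that later shipped in Python (2.7 and 3.1, via PEP 378). A short sketch of how it reads in practice, including the s.replace() follow-on mentioned in the quote; the variable names are only illustrative.

```python
amount = 1234567.891

# "," asks for separators; ".2f" is the usual precision and type.
with_commas = format(amount, ",.2f")

# Other separator characters are a post-processing step, as suggested above.
with_underscores = with_commas.replace(",", "_")

# The option combines with alignment and width like any other field.
padded = "{:>15,d}".format(1234567)
```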
Re: [Python-Dev] Possible py3k io wierdness
On Apr 5, 2009, at 6:29 AM, Antoine Pitrou wrote:
> Brian Quinlan writes:
>> I don't see why this is helpful. Could you explain why _RawIOBase.close() calling self.flush() is useful?
>
> I could not explain it for sure since I didn't write the Python version. I suppose it's so that people who only override flush() automatically get the flush-on-close behaviour.

It seems that a separate method "_internal_close" should've been defined to do the actual closing of the file, and the close() method should've been defined on the base class as "self.flush(); self._internal_close()" and never overridden.

James
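A minimal sketch of the arrangement described above, not the actual io module code. The class names are hypothetical, and the "already closed" guard and the try/finally are assumptions added for the sketch; the suggestion itself is only "self.flush(); self._internal_close()".

```python
class FileLikeBase:
    """Hypothetical base class: close() is defined once and never overridden."""

    def __init__(self):
        self.closed = False

    def flush(self):
        """Subclasses override this to push buffered data out."""

    def _internal_close(self):
        """Subclasses override this to release the underlying resource."""

    def close(self):
        if self.closed:
            return
        try:
            self.flush()
        finally:
            # Assumption: still release the resource if flush() raises.
            self._internal_close()
            self.closed = True


class RecordingFile(FileLikeBase):
    """Toy subclass that records the order of the calls it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def flush(self):
        self.events.append("flush")

    def _internal_close(self):
        self.events.append("close")


f = RecordingFile()
f.close()
f.close()  # second call is a no-op because of the guard
```

With this split, a subclass that overrides only flush() or only _internal_close() still gets flush-on-close behaviour.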
Re: [Python-Dev] Dropping bytes "support" in json
On Apr 9, 2009, at 10:38 PM, Barry Warsaw wrote:
> So, what I'm really asking is this. Let's say you agree that there are use cases for accessing a header value as either the raw encoded bytes or the decoded unicode.

As I said in the thread having nearly the same exact discussion on web-sig, except about WSGI headers...

> What should this return:
>
> >>> message['Subject']
>
> The raw bytes or the decoded unicode?

Until you write a parser for every header, you simply cannot decode to unicode. The only sane choices are:

1) raw bytes
2) parsed structured data

There's no "decoded to unicode but not parsed" option: that's doing things in the wrong order. If you RFC2047-decode the header before doing tokenization and parsing, you will just have a *broken* implementation.

Here's an example where it matters. If you decode the RFC2047 part before parsing, you'd decide that there's two recipients to the message. There aren't. "<broken@example.com>, " is the display-name of "act...@example.com", not a second recipient.

  To: =?UTF-8?B?PGJyb2tlbkBleGFtcGxlLmNvbT4sIA==?= <act...@example.com>

Here's a quote from RFC2047:

  NOTE: Decoding and display of encoded-words occurs *after* a structured field body is parsed into tokens. It is therefore possible to hide 'special' characters in encoded-words which, when displayed, will be indistinguishable from 'special' characters in the surrounding text. For this and other reasons, it is NOT generally possible to translate a message header containing 'encoded-word's to an unencoded form which can be parsed by an RFC 822 mail reader.

And another quote for good measure:

  (2) Any header field not defined as '*text' should be parsed according to the syntax rules for that header field. However, any 'word' that appears within a 'phrase' should be treated as an 'encoded-word' if it meets the syntax rules in section 2. Otherwise it should be treated as an ordinary 'word'.

Now, I suppose there's also a third possibility:

3) US-ASCII-only strings, unmolested except for doing a .decode('ascii').

That'll give you a string all right, but it's really just cheating. It's not actually a text string in any meaningful sense.

(in all this I'm assuming your question is not about the "Subject" header in particular; that is of course just unstructured text so the parse step doesn't actually do anything...).

James
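The ordering problem can be reproduced with the standard email package. This is only a sketch: it reuses the encoded-word from the example header, the address someone@example.com is a stand-in chosen for illustration, and decode_words is a hypothetical helper.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses, parseaddr

# Encoded display-name from the example above, followed by a stand-in address.
RAW_TO = "=?UTF-8?B?PGJyb2tlbkBleGFtcGxlLmNvbT4sIA==?= <someone@example.com>"


def decode_words(value):
    """RFC 2047-decode any encoded-words in value (hypothetical helper)."""
    return str(make_header(decode_header(value)))


# Wrong order: decode first, then parse. The hidden "<...>, " now looks like
# address syntax, so the parser reports two recipients.
naive = [addr for _name, addr in getaddresses([decode_words(RAW_TO)])]

# Right order: parse the raw header into tokens, then decode the display-name.
raw_name, address = parseaddr(RAW_TO)
display_name = decode_words(raw_name)
```

Parsing first yields a single recipient whose display-name merely looks like an address once decoded.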
Re: [Python-Dev] Dropping bytes "support" in json
On Apr 13, 2009, at 10:11 AM, Barry Warsaw wrote:
> The email package does not need a parser for every header, but it should provide a framework that applications (or third party libraries) can use to extend the built-in header parsers. A bare minimum for functionality requires a Content-Type parser. I think the email package should also include an address header (Originator, Destination) parser, and a Message-ID header parser. Possibly others.

Sure, that's fine...

> The default would probably be some unstructured parser for headers like Subject.

But for unknown headers, it's not a useful choice to return a "str" object. "str" is just one possible structured data representation for a header: there's no correct useful decoding of all headers into str. Of course for the "Subject" header, str is the correct result type, but that's not a default, that's explicit support for "Subject". You can't correctly decode "To" into a str, so what makes you think you can decode "X-Gabazaborph" into str?

The only useful and correct representation for unknown (or unimplemented) headers is the raw bytes.

James
Re: [Python-Dev] PEP 382: Namespace Packages
On Apr 15, 2009, at 12:15 PM, M.-A. Lemburg wrote:
> The much more common use case is that of wanting to have a base package installation with optional add-ons that live in the same logical package namespace. The PEP provides a way to solve this use case by giving both developers and users a standard at hand which they can follow without having to rely on some non-standard helpers and across Python implementations.

I'm not sure I understand what advantage your proposal gives over the current mechanism for doing this. That is, add to your __init__.py file:

  from pkgutil import extend_path
  __path__ = extend_path(__path__, __name__)

Can you describe the intended advantages over the status-quo a bit more clearly?

James
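For readers who have not used the status-quo mechanism, here is a self-contained sketch of it. The package name demo_ns_pkg, the module names, and the make_portion helper are all made up for the demonstration; two temporary directories stand in for a base install and an add-on.

```python
import importlib
import os
import sys
import tempfile

# The two lines every portion puts in its __init__.py.
INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, value):
    """Create root/package/ with an __init__.py and one module (demo helper)."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("VALUE = %r\n" % value)


base = tempfile.mkdtemp()
addon = tempfile.mkdtemp()
make_portion(base, "demo_ns_pkg", "core", "base install")
make_portion(addon, "demo_ns_pkg", "extra", "add-on")

# Both directories are on sys.path; extend_path() stitches them together.
sys.path[:0] = [base, addon]
importlib.invalidate_caches()

core = importlib.import_module("demo_ns_pkg.core")
extra = importlib.import_module("demo_ns_pkg.extra")
```

Both submodules import from the same logical package even though they live in different directories.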
Re: [Python-Dev] Issue5434: datetime.monthdelta
On Apr 16, 2009, at 5:47 PM, Antoine Pitrou wrote:
> IMHO, the question is rather what the use case is for the behaviour you are proposing. In which kind of situation is it acceptable to turn 31/2 silently into 29/2?

Essentially any situation in which you'd actually want a "next month" operation it's acceptable to do that. It's a human-interface operation, and as such, everyone (ahem) "knows what it means" to say "2 months from now", but the details don't usually have to be thought about too much. Of course when you have a computer program, you actually need to tell it what you really mean.

I do a fair amount of date calculating, and use two different kinds of "add-month":

Option 1) Add n to the month number, truncate day number to fit the month you end up in.

Option 2) As above, but with the additional caveat that if the original date is the last day of its month, the new day should also be the last day of the new month. That is: April 30th + 1 month = May 31st, instead of May 30th.

They're both useful behaviors, in different circumstances.

James
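The two options can be sketched with the standard library as follows. The function names are hypothetical; this is not the monthdelta API under discussion, just an illustration of the two behaviours.

```python
import calendar
from datetime import date


def add_months_truncate(d, n):
    """Option 1: shift the month by n, clamping the day to the target month."""
    month_index = d.year * 12 + (d.month - 1) + n
    year, month = divmod(month_index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def add_months_sticky_end(d, n):
    """Option 2: as option 1, but a month-end date stays a month-end date."""
    result = add_months_truncate(d, n)
    if d.day == calendar.monthrange(d.year, d.month)[1]:
        last_day = calendar.monthrange(result.year, result.month)[1]
        result = result.replace(day=last_day)
    return result


option1 = add_months_truncate(date(2009, 4, 30), 1)
option2 = add_months_sticky_end(date(2009, 4, 30), 1)
```

Dates that are not at a month end behave identically under both functions.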
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 22, 2009, at 2:50 AM, Martin v. Löwis wrote:
> I'm proposing the following PEP for inclusion into Python 3.1. Please comment.

+1. Even if some people still want a low-level bytes API, it's important that the easy case be easy. That is: the majority of Python applications should *just work, damnit* even with not-properly-encoded-in-current-LC_CTYPE filenames. It looks like this proposal accomplishes that, and does so in a relatively nice fashion.

James
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 24, 2009, at 8:00 AM, Paul Moore wrote:
> However, it *does* agree with the reality of Windows file systems. The fundamental problem here is that there is a strong OS disparity - for Windows, the OS uses Unicode, for POSIX, the OS uses bytes.

It's unfortunately the case that this isn't *precisely* true. Windows uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit sequences. Neither one is required by the operating system to be a proper unicode encoding. The main difference is that there is already a widely accepted way to decode an improperly-encoded 16-bit sequence with the utf-16 codec: simply leave the lone surrogates in place.

James
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 24, 2009, at 6:05 PM, Paul Moore wrote:
> - Windows systems where broken Unicode (lone surrogates or whatever) isn't involved
> - Unix systems where the user's stated filesystem encoding is correct
>
> Can you honestly say that this isn't the vast majority of real-world environments? (IIRC, you are based in Japan, so it may well be true that the likelihood of problems is a lot higher where you are than where I am - the UK - but I suspect that averaging out, things are generally as above).

In my experience, it is normal on most unix systems that some programs (mostly daemons) are running in default "POSIX" locale, others (most user programs) are running in the "en_US.utf-8" locale, and some luddite users have set themselves to "en_US.8859-1". All running on the same system.

James
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 27, 2009, at 11:35 PM, Martin v. Löwis wrote:
> No. You seem to assume that all bytes < 128 decode successfully always. I believe this assumption is wrong, in general:
>
> py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position 3-4: illegal multibyte sequence
>
> All bytes are below 128, yet it fails to decode.

Surely nobody uses iso2022 as an LC_CTYPE encoding. That's expressly forbidden by POSIX, if I'm not mistaken...and I can't see how it would work, considering that it uses all the bytes from 0x20-0x7f, including 0x2f ("/"), to represent non-ascii characters.

Hopefully it can be assumed that your locale encoding really is a non-overlapping superset of ASCII, as is required by POSIX...

I'm a bit scared at the prospect that U+DCAF could turn into "/", that just screams security vulnerability to me. So I'd like to propose that only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be encoded/decoded via the error handler.

James
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
> James Y Knight wrote:
>> Hopefully it can be assumed that your locale encoding really is a non-overlapping superset of ASCII, as is required by POSIX...
>
> Can you please point to the part of the POSIX spec that says that such overlapping is forbidden?

I can't find it...I would've thought it would be on this page:
http://opengroup.org/onlinepubs/007908775/xbd/charset.html
but it's not (at least, not obviously). That does say (effectively) that all encodings must be supersets of ASCII and use the same codepoints, though.

However, ISO-2022 being inappropriate for LC_CTYPE usage is the entire reason why EUC-JP was created, so I'm pretty sure that it is in fact inappropriate, and I cannot find any evidence of it ever being used on any system.

From http://en.wikipedia.org/wiki/EUC-JP:
"To get the EUC form of an ISO-2022 character, the most significant bit of each 7-bit byte of the original ISO 2022 codes is set (by adding 128 to each of these original 7-bit codes); this allows software to easily distinguish whether a particular byte in a character string belongs to the ISO-646 code or the ISO-2022 (EUC) code."

Also: http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html

>> I'm a bit scared at the prospect that U+DCAF could turn into "/", that just screams security vulnerability to me. So I'd like to propose that only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be encoded/decoded via the error handler.
>
> It would be actually U+DC2f that would turn into /.

Yes, I meant to say DC2F, sorry for the confusion.

> I'm happy to exclude that range from the mapping if POSIX really requires an encoding not to be overlapping with ASCII.

I think it has to be excluded from mapping in order to not introduce security issues.

However... There's also SHIFT-JIS to worry about...which apparently some people actually want to use as their default encoding, despite it being broken to do so. RedHat apparently refuses to provide it as a locale charset (due to its brokenness), and it's also not available by default on my Debian system. People do unfortunately seem to actually use it in real life.
https://bugzilla.redhat.com/show_bug.cgi?id=136290

So, I'd like to propose this:

The "python-escape" error handler when given a non-decodable byte from 0x80 to 0xFF will produce values of U+DC80 to U+DCFF. When given a non-decodable byte from 0x00 to 0x7F, it will be converted to U+0000-U+007F. On the encoding side, values from U+DC80 to U+DCFF are encoded into 0x80 to 0xFF, and all other characters are treated in whatever way the encoding would normally treat them.

This proposal obviously works for all non-overlapping ASCII supersets, where 0x00 to 0x7F always decode to U+00 to U+7F. But it also works for Shift-JIS and other similar ASCII-supersets with overlaps in trailing bytes of a multibyte sequence. So, a sequence like "\x81\xFD".decode("shift-jis", "python-escape") will turn into u"\uDC81\u00fd". Which will then properly encode back into "\x81\xFD".

The character sets this *doesn't* work for are: ebcdic code pages (obviously completely unsuitable for a locale encoding on unix), iso2022-* (covered above), and shift-jisx0213 (because it has replaced \ with yen, and - with overline).

If it's desirable to work with shift_jisx0213, a modification of the proposal can be made:

Change the second sentence to: "When given a non-decodable byte from 0x00 to 0x7F, that byte must be the second or later byte in a multibyte sequence. In such a case, the error handler will produce the encoding of that byte if it was standing alone (thus in most encodings, \x00-\x7f turn into U+00-U+7F)."

It sounds from https://bugzilla.novell.com/show_bug.cgi?id=162501 like some people do actually use shift_jisx0213, unfortunately.

James
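The mapping proposed above can be sketched as a custom codec error handler. This is an illustration only: the handler name "demo-escape" and both function names are hypothetical, the sketch is exercised with UTF-8 rather than Shift-JIS, and it is not the implementation that PEP 383 ended up with.

```python
import codecs


def escape_undecodable(exc):
    """Decode side: stray bytes 0x80-0xFF become U+DC80-U+DCFF.

    Stray bytes below 0x80 are passed through as U+0000-U+007F, as in the
    proposal above.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    chars = []
    for byte in exc.object[exc.start:exc.end]:
        if byte >= 0x80:
            chars.append(chr(0xDC00 + byte))
        else:
            chars.append(chr(byte))
    return "".join(chars), exc.end


codecs.register_error("demo-escape", escape_undecodable)


def unescape(text, encoding):
    """Encode side: U+DC80-U+DCFF turn back into bytes 0x80-0xFF.

    Everything else is encoded normally (one character at a time, which is
    slow but keeps the sketch short).
    """
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            out.append(code - 0xDC00)
        else:
            out.extend(ch.encode(encoding))
    return bytes(out)


name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
decoded = name.decode("utf-8", "demo-escape")
restored = unescape(decoded, "utf-8")
```

The round trip returns the original bytes, and no escaped code point can ever encode to an ASCII byte such as "/".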
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
> I think you are right. I have now excluded ASCII bytes from being mapped, effectively not supporting any encodings that are not ASCII compatible. Does that sound ok?

Yes. The practical upshot of this is that users who brokenly use "ja_JP.SJIS" as their locale (which, note, first requires editing some files in /var/lib/locales manually to enable its use..) may still have python not work with invalid-in-shift-jis filenames. Since that locale is widely recognized as a bad idea to use, and is not supported by any distros, it certainly doesn't bother me that it isn't 100% supported in python. It seems like the most common reason why people want to use SJIS is to make old pre-unicode apps work right in WINE -- in which case it doesn't actually affect unix python at all.

I'd personally be fine with python just declaring that the filesystem-encoding will *always* be utf-8b and ignore the locale...but I expect some other people might complain about that. Of course, application authors can decide to do that themselves by calling sys.setfilesystemencoding('utf-8b') at the start of their program.

James
Re: [Python-Dev] PEP 383 and GUI libraries
On May 1, 2009, at 9:42 PM, Zooko O'Whielacronx wrote:
> Yep, I reversed the order of encode() and decode(). However, my whole statement was utterly wrong and shows that I still didn't fully get it yet. I have flip-flopped again and currently think that PEP 383 is useless for this use case and that my original plan [1] is still the way to go. Please let me know if you spot a flaw in my plan or a ridiculousity in my requirements, or if you see a way that PEP 383 can help me.

If I were designing a new system such as this, I'd probably just go for utf8b *always*. That is, set the filesystem encoding to utf-8b. The end. All files always keep the same bytes transferring between unix systems. Thus, for the 99% of the world that uses either windows or a utf-8 locale, they get useful filenames inside tahoe. The other 1% of the world that uses something like latin-1, EUC_JP, etc. on their local system sees mojibake filenames in tahoe, but will see the same filename that they put in when they take it back out. Gnome already uses only utf-8 for filename displays for a few years now, for example, so this isn't exactly an unheard-of position to take...

But if you don't do that, then, I still don't see what purpose your requirements serve. If I have two systems: one with a UTF-8 locale, and one with a Latin-1 locale, why should transmitting filenames from system 1 to system 2 through tahoe preserve the raw bytes, but doing the reverse *not* preserve the raw bytes? (all byte-sequences are valid in latin-1, remember, so they'll all decode into unicode without error, and then be reencoded in utf-8...). This seems rather a useless behavior to me.

James
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
On May 6, 2009, at 5:39 AM, Stephen J. Turnbull wrote:
> Now, with Python's file system encoding == UTF-8 or any packed EUC, and more than a handful of Shift JIS or Big5 characters in file names, one is *almost certain* to encounter ASCII as the second byte of a multibyte sequence. PEP 383 can't handle this

Hm, I haven't tried the implementation, but I thought that what would happen is:

  '\x85a'.decode('utf-8', 'utf8b/surrogate-replace/whateveritscalled') -> u'\uDC85a'

If that indeed doesn't happen, that's certainly a defect and should be remedied.

> , but it is sure to be the most common use case for PEP 383 in East Asia.

Yes.

James
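In the form of PEP 383 that shipped with Python 3.1, the handler is named "surrogateescape", and the expectation above can be checked directly in Python 3 syntax. The variable names are only illustrative.

```python
raw = b"\x85a"

# The stray 0x85 becomes the lone surrogate U+DC85; the ASCII "a" is untouched.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("utf-8", "surrogateescape")

# Without the handler, the lone surrogate is refused rather than leaked out.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```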
Re: [Python-Dev] PEP 384: Defining a Stable ABI
On May 17, 2009, at 4:54 PM, Martin v. Löwis wrote:
> Currently, each feature release introduces a new name for the Python DLL on Windows, and may cause incompatibilities for extension modules on Unix. This PEP proposes to define a stable set of API functions which are guaranteed to be available for the lifetime of Python 3, and which will also remain binary-compatible across versions. Extension modules and applications embedding Python can work with different feature releases as long as they restrict themselves to this stable ABI.

It seems like a good ideal to strive for.

But I think this is too strong a promise. IMO it would be better to say that ABI compatibility across releases is a goal. If someone does make a change that breaks the ABI, I'd expect whoever is proposing it to put forth a fairly strong argument towards why it's a worthwhile change. But it should be possible and allowed, given the right circumstances. Because I think it's pretty much inevitable that it *will* need to happen, sometime.

(of course there will need to be ABI tests, so that any potential ABI breakages are known about when they occur)

Python is much more defined by its source language than its C extension API, so tying the python major version number to the C ABI might not be the best idea from a "marketing" standpoint. (I can see it now..."Python 4.0 major new features: we changed the C method definition struct layout incompatibly" :)

James
Re: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
On May 20, 2009, at 4:07 PM, Nick Coghlan wrote:
> Forcing developers to choose between the speed of the INCREF/DECREF macros and the proposed ABI compatibility mode for the benefit of an as yet hypothetical GIL-less CPython API implementation seems more like a way to kill adoption of the ABI compatibility mode rather than a way to encourage the use of the IncRef/Decref functions.

Indeed, and if the promise of "no-ABI-breakages-till-4.0" is removed, this would be a non-issue. Keep Py_INCREF macros in the current ABI, and then break the ABI when someone wants to remove the GIL someday. That's certainly going to be a big enough change to justify changing the ABI.

James
Re: [Python-Dev] Migration strategy for new-style string formatting [Was: Binary Operator for New-Style String Formatting]
On Jun 21, 2009, at 5:40 PM, Eric Smith wrote:
> I've basically come to accept that %-formatting can never go away, unfortunately. There are too many places where %-formatting is used, for example in logging Formatters. %-formatting either has to exist or it has to be emulated.

It'd possibly be helpful if there were builtin objects which forced the format style to be either newstyle or oldstyle, independent of whether % or format was called on it.

E.g.

  x = newstyle_formatstr("{} {} {}")
  x % (1,2,3) == x.format(1,2,3) == "1 2 3"

and perhaps, for symmetry:

  y = oldstyle_formatstr("%s %s %s")
  y.format(1,2,3) == y % (1,2,3) == "1 2 3"

This allows the format string "style" decision to be made external to the API actually calling the formatting function. Thus, it need not matter as much whether the logging API uses % or .format() internally -- that only affects the *default* behavior when a bare string is passed in.

This could allow for a controlled, staged migration towards the new format string format, with a long deprecation period for users to migrate:

1) introduce the above feature, and recommend in docs that people only ever use new-style format strings, wrapping the string in newstyle_formatstr() when necessary for passing to an API which uses % internally.

2) A long time later...deprecate str.__mod__; don't deprecate newstyle_formatstr.__mod__.

3) A while after that (maybe), remove str.__mod__ and replace all calls in Python to % (used as a formatting operator) with .format() so that the default is to use newstyle format strings for all APIs from then on.

James
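A rough sketch of what such wrapper types could look like, written as str subclasses. These are not builtins and never became builtins; the handling of dict and single non-tuple operands for % is an assumption made to mirror how % behaves on plain strings.

```python
class newstyle_formatstr(str):
    """A str whose % operator applies {}-style formatting (sketch only)."""

    def __mod__(self, args):
        if isinstance(args, dict):
            return str.format(self, **args)
        if not isinstance(args, tuple):
            args = (args,)
        return str.format(self, *args)


class oldstyle_formatstr(str):
    """A str whose .format() method applies %-style formatting (sketch only)."""

    def format(self, *args, **kwargs):
        return str.__mod__(self, kwargs if kwargs else args)


x = newstyle_formatstr("{} {} {}")
y = oldstyle_formatstr("%s %s %s")


def api_using_percent(fmt, *args):
    """Stand-in for a library call that formats with % internally."""
    return fmt % args


via_percent_api = api_using_percent(x, 1, 2, 3)
```

The caller picks the style by choosing the wrapper, and the library code in the middle does not need to change.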
Re: [Python-Dev] Remove site-packages?!? [was: [Distutils] PEP 376 - from pythonpkgmgr's point of view]
On Jul 21, 2009, at 7:38 PM, David Lyon wrote:
> When I go into python on ubuntu I see there is /usr/local/pythonX.X/lib/site-packages and I'm wondering why the hubba setuptools/distutils doesn't put packages there by default. That would solve a lot of problems. Just leave /usr/lib/pythonX.X//lib/site-packages to the O/S.

Uh guys, I'm not sure if anyone here noticed, but Debian and Ubuntu have switched to install their distribution-supplied python libraries into:

  /usr/lib/pythonX.Y/lib/dist-packages

and distutils by default will install into

  /usr/local/lib/pythonX.Y/dist-packages

starting with python 2.6.

See: http://lists.debian.org/debian-devel/2009/02/msg00431.html

Since that email says "Discussed this with Barry Warsaw and Martin v. Loewis", I'd assume this change would be more widely known in the distutils/python-dev community, but apparently not??

James
Re: [Python-Dev] Remove site-packages?!? [was: [Distutils] PEP 376 - from pythonpkgmgr's point of view]
On Jul 22, 2009, at 4:49 AM, M.-A. Lemburg wrote:
> Debian has a long history of doing this different, so it's not much of a surprise. They also apply such changes to Python packages. However, all of this is non-standard and will cause problems with tools that rely on the standard site-packages/ location. Such changes should be discouraged.

And yet, the change seems to have some strong reasoning, solves the problem discussed in this thread, and was apparently discussed and approved of by some core python developers before being implemented. It seems a bit foolish to me to thus just dismiss it as "evil debian being different"... If anything it seems like it's a failure of the Python project to make easily deployable software, compounded with a failure of communication within the python community.

James
Re: [Python-Dev] command line attachable debugger
On Jul 24, 2009, at 1:31 AM, Edward Peschko wrote:
> all,
>
> I was wondering if there was a command line python debugger that was able to attach to an existing process. I'd very much like to be able to debug over a ssh session using screen.
>
> Ed
>
> (ps - and yes, I know about winpdb, etc... that is not exactly what I'm looking for..)

Winpdb is *exactly* what you asked for, so if it's not what you're looking for you'll need to be more specific about what you want that it doesn't do...

James
Re: [Python-Dev] Decorator syntax
On Sep 2, 2009, at 6:15 AM, Rob Cliffe wrote:
> So - the syntax restriction seems not only inconsistent, but pointless; it doesn't forbid anything, but merely means we have to do it in a slightly convoluted (unPythonesque) way. So please, Guido, will you reconsider?

Indeed, it's a silly inconsistent restriction. When it was first added I too suggested that any expression be allowed after the @, rather than having a uniquely special restricted syntax. I argued from a consistency-of-grammar standpoint. But Guido was not persuaded. Good luck to you. :)

Here's some of the more relevant messages from the thread back when the @decorator feature was first introduced:

http://mail.python.org/pipermail/python-dev/2004-August/046654.html
http://mail.python.org/pipermail/python-dev/2004-August/046659.html
http://mail.python.org/pipermail/python-dev/2004-August/046675.html
http://mail.python.org/pipermail/python-dev/2004-August/046711.html
http://mail.python.org/pipermail/python-dev/2004-August/046741.html
http://mail.python.org/pipermail/python-dev/2004-August/046753.html
http://mail.python.org/pipermail/python-dev/2004-August/046818.html

James
Re: [Python-Dev] Fuzziness in io module specs
On Sep 18, 2009, at 3:55 PM, MRAB wrote:
> I think that this should be an invariant:
>
>     0 <= file pointer <= file size
>
> so the file pointer might sometimes have to be moved.
>
> As for the question of whether 'truncate' should be able to lengthen a file, the method name suggests no; if the method name were 'resize', for example, then maybe yes, zeroing the new bytes for security.

Why are you just making things up? There is a *vast* amount of precedent for how file operations should work. Python should follow that precedent and do as POSIX does unless there's a compelling reason not to. Quoting:

    If fildes refers to a regular file, the ftruncate() function shall cause the size of the file to be truncated to length. If the size of the file previously exceeded length, the extra data shall no longer be available to reads on the file. If the file previously was smaller than this size, ftruncate() shall either increase the size of the file or fail. XSI-conformant systems shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled. The value of the seek pointer shall not be modified by a call to ftruncate().

James
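
The quoted POSIX behaviour can be checked from Python. What follows is a minimal sketch, assuming a system where os.ftruncate() is allowed to grow a file (the XSI-conformant case, e.g. Linux); the helper name grow_or_shrink is made up for illustration and is not part of the io module.

```python
import os
import tempfile


def grow_or_shrink(initial, new_size):
    """Write `initial`, ftruncate the file to `new_size`, report the result.

    Returns (OS-level seek position after the call, full file contents).
    """
    with tempfile.TemporaryFile() as f:
        f.write(initial)
        f.flush()
        os.ftruncate(f.fileno(), new_size)
        # Ask the OS directly, bypassing Python's own buffering.
        position = os.lseek(f.fileno(), 0, os.SEEK_CUR)
        f.seek(0)
        return position, f.read()


grown = grow_or_shrink(b"abc", 8)      # file gets longer, zero-filled
shrunk = grow_or_shrink(b"abcdef", 2)  # file gets shorter
print(grown)
print(shrunk)
```

In both cases the reported position is where the write left it, so after shrinking it points past the end of the file; that is the situation the proposed invariant would forbid and POSIX allows.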
Re: [Python-Dev] POSIX [Fuzziness in io module specs]
On Sep 18, 2009, at 8:58 PM, Antoine Pitrou wrote:
> I'm not sure that's true. Various Unix/Linux man pages are readily available on the Internet, but they regard specific implementations, which often depart from the spec in one way or another. POSIX specs themselves don't seem to be easily reachable; you might even have to pay for them.

The POSIX specs are quite easily accessible, without payment.

I got my quote by doing:

    man 3p ftruncate

I had previously done:

    apt-get install manpages-posix-dev

to install the POSIX manpages. That package contains the POSIX standard as of 2003, which is good enough for most uses. It seems to be available here, if you don't have a Debian system:
http://www.kernel.org/pub/linux/docs/man-pages/man-pages-posix/

There's also a webpage, containing the official POSIX 2008 standard:
http://www.opengroup.org/onlinepubs/9699919799/

And to navigate to ftruncate from there, click "System Interfaces" in the left pane, "System Interfaces" in the bottom pane, and then "ftruncate" in the bottom pane.

James
Re: [Python-Dev] IO module precisions and exception hierarchy
On Sep 27, 2009, at 4:20 AM, Pascal Chambon wrote:
> Thus, at the moment IOErrors rather have the semantic of "particular case of OSError", and it's kind of confusing to have them remain in their own separate tree... Furthermore, OSErrors are often used where IOErrors would perfectly fit, eg. in low level I/O functions of the OS module. Since OSErrors and IOErrors are slightly mixed up when we deal with IO operations, maybe the easiest way to make it clearer would be to push to their limits already existing designs.

How about just making IOError = OSError, and introducing your proposed subclasses? Does the usage of IOError vs OSError have *any* useful semantics?

James
Re: [Python-Dev] PEP 3144 review.
On Sep 27, 2009, at 3:18 PM, Peter Moody wrote:
> administrators) would use it, but it's doable. what you're claiming is that my use case is invalid. that's what I claim is broken.

He's claiming your solution to address your use case is confusing, not that the use case is invalid.

> I'm not going to make ipaddr less useful (strictly removing functionality), more bulky and confusing (adding more confusingly named classes and methods) or otherwise break the library in a vain attempt to have it included in the stdlib.

If I understand correctly, the proposal for addressing the issue is to make two rather simple changes:

1) if strict=False, mask off the bits described by the netmask when creating an IPNetwork, such that the host bits are always 0.

2) add a single new function:

    def parse_net_and_addr(s):
        return (IPNetwork(s), IPAddress(s.split('/')[0]))

James
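
A rough sketch of how those two changes could behave together. It is written against the `ipaddress` module that ships with later Python 3 releases, not the ipaddr API under review in this thread; that substitution, and the sample address, are assumptions made purely for illustration.

```python
import ipaddress


def parse_net_and_addr(s):
    """Split 'address/prefix' into (network, host address).

    strict=False masks off the host bits instead of raising ValueError,
    so the network part always has its host bits set to 0.
    """
    network = ipaddress.ip_network(s, strict=False)
    address = ipaddress.ip_address(s.split("/")[0])
    return network, address


net, addr = parse_net_and_addr("192.168.1.17/24")
print(net, addr)
```

The point of returning a pair is that neither object has to pretend to be the other: the network carries no host bits, and the address carries no mask.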
Re: [Python-Dev] please consider changing --enable-unicode default to ucs4
On Sep 28, 2009, at 4:25 AM, M.-A. Lemburg wrote:
> Distributions should really not be put in charge of upstream coding design decisions.

I don't think you can blame distros for this one.

From PEP 0261:

    It is also proposed that one day --enable-unicode will just default to the width of your platforms wchar_t.

On Linux, wchar_t is 4 bytes.

If there's a consensus amongst python upstream that all the distros should be shipping Python with UCS2 unicode strings, you should reach out to them and say this, in a rather more clear fashion. Currently, most signs point towards UCS4 builds as being the better option.

Or, one might reasonably wonder why UCS-4 is an option at all, if nobody should enable it.

> People building their own Python version will usually also build their own extensions, so I don't really believe that the above scenario is very common.

I'd just like to note that I've run into this trap multiple times. I built a custom python, and expected it to work with all the existing, installed, extensions (same major version as the system install, just patched). And then had to build it again with UCS4, for it to actually work. Of course building twice isn't the end of the world, and I'm certainly used to having to twiddle build options on software to get it working, but, this *does* happen, and *is* a tiny bit irritating.

James
Re: [Python-Dev] transitioning from % to {} formatting
I'm resending a message I sent in June, since it seems the same thread has come up again, and I don't believe anybody actually responded (positively or negatively) to the suggestion back then.

http://mail.python.org/pipermail/python-dev/2009-June/090176.html

On Jun 21, 2009, at 5:40 PM, Eric Smith wrote:
> I've basically come to accept that %-formatting can never go away, unfortunately. There are too many places where %-formatting is used, for example in logging Formatters. %-formatting either has to exist or it has to be emulated.

It'd possibly be helpful if there were builtin objects which forced the format style to be either newstyle or oldstyle, independent of whether % or format was called on it.

E.g.

    x = newstyle_formatstr("{} {} {}")
    x % (1,2,3) == x.format(1,2,3) == "1 2 3"

and perhaps, for symmetry:

    y = oldstyle_formatstr("%s %s %s")
    y.format(1,2,3) == y % (1,2,3) == "1 2 3"

This allows the format string "style" decision to be made external to the API actually calling the formatting function. Thus, it need not matter as much whether the logging API uses % or .format() internally -- that only affects the *default* behavior when a bare string is passed in.

This could allow for a controlled switch towards the new format string format, with a long deprecation period for users to migrate:

1) introduce the above feature, and recommend in docs that people only ever use new-style format strings, wrapping the string in newstyle_formatstr() when necessary for passing to an API which uses % internally.

2) A long time later...deprecate str.__mod__; don't deprecate newstyle_formatstr.__mod__.

3) A while after that (maybe), remove str.__mod__ and replace all calls in Python to % (used as a formatting operator) with .format() so that the default is to use newstyle format strings for all APIs from then on.
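
One possible shape for the proposed wrapper, as a sketch only. The message names newstyle_formatstr but does not specify an implementation, so everything inside the class is an assumption: in particular the choice to use a plain class, and to expand a dict argument as keyword arguments. legacy_api is a hypothetical stand-in for any library function that applies % internally.

```python
class newstyle_formatstr:
    """Wrap a {}-style format string so that % formats it as well."""

    def __init__(self, fmt):
        self.fmt = fmt

    def format(self, *args, **kwargs):
        return self.fmt.format(*args, **kwargs)

    def __mod__(self, other):
        # Mirror the three argument shapes that str.__mod__ accepts.
        if isinstance(other, tuple):
            return self.fmt.format(*other)
        if isinstance(other, dict):
            # Assumption: a dict means named fields, as with "%(name)s".
            return self.fmt.format(**other)
        return self.fmt.format(other)


def legacy_api(template, values):
    # Hypothetical library code that only knows about the % operator.
    return template % values


x = newstyle_formatstr("{} {} {}")
positional = x % (1, 2, 3)
named = legacy_api(newstyle_formatstr("{name}: {msg}"),
                   {"name": "Python", "msg": "ok"})
unchanged = legacy_api("%s %s", (1, 2))
print(positional, named, unchanged)
```

legacy_api is untouched in both calls; only the caller decides which style of template to hand over, which is the whole point of the suggestion.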
Re: [Python-Dev] transitioning from % to {} formatting
On Sep 30, 2009, at 10:34 AM, Steven D'Aprano wrote:
>> E.g.
>>     x = newstyle_formatstr("{} {} {}")
>>     x % (1,2,3) == x.format(1,2,3) == "1 2 3"
>
> Moving along, let's suppose the newstyle_formatstr is introduced. What's the intention then? Do we go through the std lib and replace every call to (say)
>
>     somestring % args
>
> with
>
>     newstyle_formatstr(somestring) % args
>
> instead? That seems terribly pointless to me

Indeed, that *would* be terribly pointless! Actually, more than pointless, it would be broken, as you've changed the API from taking oldstyle format strings to newstyle format strings.

That is not the suggestion. The intention is to change /nearly nothing/ in the std lib, and yet allow users to use newstyle string substitution with every API.

Many Python APIs (e.g. logging) currently take a %-type formatting string. They cannot simply be changed to take a {}-type format string, because of backwards compatibility concerns. Either a new API can be added to every one of those functions/classes, or, a single API can be added to inform those places to use newstyle format strings.

>> This could allow for a controlled switch towards the new format string format, with a long deprecation period for users to migrate:
>>
>> 1) introduce the above feature, and recommend in docs that people only ever use new-style format strings, wrapping the string in newstyle_formatstr() when necessary for passing to an API which uses % internally.
>
> And how are people supposed to know what the API uses internally?

It's documented, (as it already must be, today!).

> Personally, I think your chances of getting people to write:
>
>     logging.Formatter(newstyle_formatstr("%(asctime)s - %(name)s - %(level)s - %(msg)s"))
>
> instead of
>
>     logging.Formatter("%(asctime)s - %(name)s - %(level)s - %(msg)s")

That's not my proposal.

The user could write either:

    logging.Formatter("%(asctime)s - %(name)s - %(level)s - %(msg)s")

(as always -- that can't be changed without a long deprecation period), or:

    logging.Formatter(newstyle_formatstr("{asctime} - {name} - {level} - {msg}"))

This despite the fact that logging has not been changed to use {}-style formatting internally. It should continue to call "self._fmt % record.__dict__" for backward compatibility.

That's not to say that this proposal would require no work to check the stdlib for issues. The logging module presents one: it checks if the format string contains "%(asctime)" to see if it should bother to calculate the time. That of course would need to be changed. Best would be to stick an instance which lazily generates its string representation into the dict. The other APIs mentioned on this thread (BaseHTTPServer, email.generator) will work immediately without changes, however.

James
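
One way to read the "instance which lazily generates its string representation" suggestion, as a sketch. The class name LazyAsctime, the `computed` flag and the plain dict standing in for record.__dict__ are all invented here; this is not how the logging module is implemented.

```python
import time


class LazyAsctime:
    """Formats a timestamp only when something asks for its string form."""

    def __init__(self, created):
        self.created = created
        self.computed = False  # only here so the laziness can be observed

    def __str__(self):
        self.computed = True
        return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(self.created))

    def __format__(self, spec):
        # str.format() calls __format__, so route it through __str__.
        return format(str(self), spec)


record = {"asctime": LazyAsctime(0), "name": "root", "msg": "hello"}

# This template never mentions asctime, so the time is never formatted.
plain = "%(name)s - %(msg)s" % record
skipped = not record["asctime"].computed

# Either formatting style triggers the computation on demand.
old = "%(asctime)s - %(msg)s" % record
new = "{asctime} - {msg}".format(**record)
print(plain, skipped, old, new)
```

With something like this in the record, the formatter would no longer need to inspect the template text at all, whichever style the template uses.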
Re: [Python-Dev] transitioning from % to {} formatting
On Oct 1, 2009, at 9:11 AM, Paul Moore wrote:
> This seems to me to be almost the same as the previous suggestion of having a string subclass:
>
>     class BraceFormatter(str):
>         def __mod__(self, other):
>             # Needs more magic here to cope with dict argument
>             return self.format(*other)
>
>     __ = BraceFormatter
>
>     logger.debug(__("The {0} is {1}"), "answer", 42)

I'd rather make that:

    class BraceFormatter:
        def __init__(self, s):
            self.s = s

        def __mod__(self, other):
            # Needs more magic here to cope with dict argument
            return self.s.format(*other)

    __ = BraceFormatter

That is, *not* a string subclass. Then if someone attempts to mangle it, or use it for anything but %, it fails loudly.

James
Re: [Python-Dev] transitioning from % to {} formatting
On Sep 30, 2009, at 1:01 PM, Antoine Pitrou wrote:
> Why not allow logging.Formatter to take a callable, which would in turn call the callable with keyword arguments? Therefore, you could write:
>
>     logging.Formatter("{asctime} - {name} - {level} - {msg}".format)
>
> and then:
>
>     logging.critical(name="Python", msg="Buildbots are down")
>
> All this without having to learn about a separate "compatibility wrapper object".

It's a nice idea -- but I think it's better for the wrapper (whatever form it takes) to support __mod__ so that logging.Formatter (and everything else) doesn't need to be modified to be able to know about how to use both callables and "%"ables.

Is it possible for a C function like str.format to have other methods defined on its function type?

James
Re: [Python-Dev] transitioning from % to {} formatting
On Oct 1, 2009, at 5:54 PM, Nick Coghlan wrote:
> I believe classes like fmt_braces/fmt_dollar/fmt_percent will be part of a solution, but they aren't a complete solution on their own. (Naming the three major string formatting techniques by the key symbols involved is a really good idea though)
>
> 1. It's easy to inadvertently convert them back to normal strings. If a formatting API even calls "str" on the format string then we end up with a problem (and switching to containment instead of inheritance doesn't really help, since all objects implement __str__).

Using containment instead of inheritance makes sure none of the *other* operations people do on strings will appear to work, at least (substring, contains, etc). I bet explicitly calling str() on a format string is even more rare than attempting to do those things.

> 2. They don't help with APIs that expect a percent-formatted string and do more with it than just pass it to str.__mod__ (e.g. inspecting it for particular values such as '%(asctime)s')

True, but I don't think there are many such cases in the first place, and such places can be fixed to not do that as they're found. Until they are fixed, fmt_braces will loudly fail when used with that API (assuming fmt_braces is not a subclass of str).

James
Re: [Python-Dev] transitioning from % to {} formatting
On Oct 1, 2009, at 6:19 PM, Steven Bethard wrote:
> I see how this could allow a user to supply a {}-format string to an API that accepts only %-format strings. But I still don't see the transition strategy for the API itself. That is, how does the %-format API use this to eventually switch to {}-format strings? Could someone please lay it out for me, step by step, showing what happens in each version?

Here's what I said in my first message, suggesting this change. Copy&pasted below:

I wrote:
> 1) introduce the above feature, and recommend in docs that people only ever use new-style format strings, wrapping the string in newstyle_formatstr() when necessary for passing to an API which uses % internally.
>
> 2) A long time later...deprecate str.__mod__; don't deprecate newstyle_formatstr.__mod__.
>
> 3) A while after that (maybe), remove str.__mod__ and replace all calls in Python to % (used as a formatting operator) with .format() so that the default is to use newstyle format strings for all APIs from then on.

So do (1) in 3.2. Then do (2) in 3.4, and (3) in 3.6. I skipped two versions each time because of how widely this API is used, and the likely pain that doing the transition quickly would cause. But I guess you *could* do it in one version each step.

James
Re: [Python-Dev] transitioning from % to {} formatting
On Oct 2, 2009, at 2:56 PM, Raymond Hettinger wrote:
> Do the users get any say in this? I imagine that some people are heavily invested in %-formatting.
>
> Because there has been limited uptake on {}-formatting (afaict), we still have limited experience with knowing that it is actually better, less error-prone, easier to learn/remember, etc. Outside a handful of people on this list, I have yet to see anyone adopt it as the preferred syntax.

Well, I actually think it was a pretty bad idea to introduce {} formatting, because %-formatting is well-known in many other languages, and $-formatting is used by basically all the rest. So the introduction of {}-formatting has always seemed silly to me, and I wish it had not happened.

HOWEVER, much worse than having a new, different, and strange formatting convention is having *multiple* formatting conventions arbitrarily used in different places within the language, with no rhyme or reason.

So, given that brace-formatting was added, and that it's been declared the way forward, I'd *greatly* prefer it taking over everywhere in python, instead of having to use a mixture.

James
Re: [Python-Dev] Package install failures in 2.6.3
On Oct 5, 2009, at 2:21 PM, Brett Cannon wrote:
> I should also mention this bug was not unknown. I discovered it after Distribute 0.6 was released as I always run cutting edge interpreters. Never bothered to report it until Distribute 0.6.1 was released which Tarek fixed in less than a week. I never bothered to report it for setuptools as I know it isn't maintained.
>
> It's probably in our best interest to just get people over to Distribute, let it continue to hijack setuptools, and slowly let that name fade out if it is going to continue to be unmaintained.
>
> I have to admit I find it really disheartening that we are letting an unmaintained project dictate how we fix a bug. I really hope this is a one-time deal and from this point forward we all move the community towards Distribute so we never feel pressured like this again.

Even though the bug was noticed, nobody thought that, just perhaps, breaking other software in a minor point release might be a bad idea, no matter whether it was updated in less-than-a-week, or mostly-unmaintained?

Once you have an API that you encourage people to subclass, *of course* it dictates how you can fix a bug.

James
Re: [Python-Dev] Bug 7183 and Python 2.6.4
On Oct 22, 2009, at 11:04 AM, Barry Warsaw wrote:
> On Oct 22, 2009, at 10:47 AM, Benjamin Peterson wrote:
>> 2009/10/22 Barry Warsaw:
>>> So does anybody else think bug 7183 should be a release blocker for 2.6.4 final, or is even a legitimate bug that we need to fix?
>>
>> I think it cannot hold up a release without a reproducible code snippet.
>
> It may not be reproducible in standard Python, see David's follow up to the issue. If that holds true and we can't reproduce it, I agree we should not hold up the release for this.

    >>> class Foo(property):
    ...     __slots__=[]
    ...
    >>> x=Foo()
    >>> x.__doc__ = "asdf"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'Foo' object attribute '__doc__' is read-only

You can't add arbitrary attributes to instances, since some instances don't have the slot to put them in.

Is that an equivalent demonstration to that which boost runs into? (except, it's using a C type not a python type).

James
Re: [Python-Dev] Bug 7183 and Python 2.6.4
On Oct 22, 2009, at 3:53 PM, Robert Collins wrote:
> On Thu, 2009-10-22 at 13:16 -0400, Tres Seaver wrote:
>> ...
>> That being said, I can't see this bug as a release blocker: people can either upgrade to super-current Boost, or stick with 2.6.2 until they can.
>
> That's the challenge Ubuntu faces:
> https://bugs.edge.launchpad.net/ubuntu/+source/boost1.38/+bug/457688
>
> We've just announced our Karmic RC, boost 1.40 isn't released, and python 2.6.3 doesn't work with a released boost :(

If I were running a Linux distro, I'd revert the patch in 2.6.3. And if I were running a Python release process, I'd revert that patch for python 2.6.4, and reopen the bug that it fixed, so a less-breaky patch can be made.

James
Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it
On Oct 25, 2009, at 2:50 AM, Terry Reedy wrote:
> Alex Martelli wrote:
>> Next(s) would seem good...
>
> That does not work. It has to be next(iter(s)), and that has been tried and eliminated because it is significantly slower.

But who cares about the speed of getting an arbitrary element from a set? How can it *possibly* be a problem in a real program?

If you want to optimize python, this operation is certainly not the right place to start...

James
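
For reference, the idiom under discussion looks like this; the helper name peek and the sample set are invented for the example.

```python
def peek(s):
    """Return an arbitrary element of a non-empty set without removing it."""
    return next(iter(s))


colors = {"red", "green", "blue"}
picked = peek(colors)
print(picked, len(colors))
```

Which element comes back is unspecified, and calling it on an empty set raises StopIteration.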
Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?
On Nov 2, 2009, at 6:24 PM, sstein...@gmail.com wrote:
> +1 on 2.7 being the last of the 2.x series. Enough already!

-1. (not that it matters)

> I, personally, haven't even written my first line of 3.x code, nor have I had any good reason to.

Me neither.

> If I saw the actual end of the line at 2.7, I would actually start looking for 3.x versions of my favorite tools and would be much more inclined to help push them along ASAP.

I'd probably keep using 2.7 to be able to keep using those tools, instead.

> Right now, so much that I use on a daily basis doesn't even have a 3.x roadmap, much less any sort of working implementation, that I don't see switching to 3.x ever unless the 2.x line ends, and soon!

I don't see switching to 3.x anytime soon either. But what's the rush? 2.x seems to be a fine edition of Python, why not let it keep going to 2.8 and beyond? Then you wouldn't have to switch to 3.x at all, and that'd save you a ton of work. (and save all the people you will have to convince to make a 3.x roadmap and do the port a ton of work too!)

It really sounds like you're saying that switching to 3.x isn't worth the cost to you, but you want to force people (including yourself) to do so anyways, because ...?

James
Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?
On Nov 3, 2009, at 12:06 AM, Guido van Rossum wrote:
> Though I imagine what it really needs is a "quirks mode" parser that is compatible with the HTML dialect accepted by, say, IE6. Maybe a summer of code project?

Already exists: html5lib.
http://code.google.com/p/html5lib/

Or if you want a faster (yet I think less exact) HTML parser, there's libxml2's HTML parser, via lxml:
http://codespeak.net/lxml/parsing.html#parsing-html

James
Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?
On Nov 3, 2009, at 8:55 AM, sstein...@gmail.com wrote:
> And, as you point out, if 3.x doesn't start getting the crap beat out of it in the real world sooner rather than later, we may find ourselves, collectively with a stale 2.x, an under battle-tested 3.x, and nowhere to go.

If that happens, it's not true that there's *nowhere* to go. A solution would be to discard 3.x as a failed experiment, take everything that is useful from it and port it to 2.x, and simply continue development from the last 2.x release. And from there, features can be deprecated and then removed a few releases later, as is the usual policy.

Been there, done that, on a couple other projects. It's unfortunate when you have to throw out work you've done because it failed to gain traction over the thing you tried to replace, but sometimes that's life.

James
Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it
On Nov 5, 2009, at 6:04 PM, geremy condra wrote:
> Perhaps my test is flawed in some way?

Yes: you're testing the speed of something that makes absolutely no sense to do in a tight loop, so *who the heck cares how fast any way of doing it is*!

Is this thread over yet?

James
Re: [Python-Dev] PyPI comments and ratings, *really*?
On Nov 12, 2009, at 4:11 PM, Ben Finney wrote:
> I think Jesse's point (or, if he's not willing to claim it, my point) is that, compared to the mandatory comment system, it makes much *more* sense to have a mandatory field for “URL to the BTS for this project”.

One might look at the "competition" for inspiration. Looking at CPAN: there's no "comments" feature, but there is a "CPAN RT" bug-tracker which appears to be a way for users to submit comments/problems about packages in a way common to all packages in CPAN, but distinct from upstream's bug trackers/lists/etc. I'd assume that gets emailed to the listed maintainer of the package as well as being accessible to other users, although I don't really have any idea.

e.g. http://search.cpan.org/~capttofu/DBD-mysql/lib/DBD/mysql.pm

There might be something to be said for providing users a way to provide feedback that doesn't require making accounts in a bazillion separate bugtrackers. *shrug*

James
Re: [Python-Dev] PyPI comments and ratings, *really*?
On Nov 12, 2009, at 5:23 PM, Masklinn wrote:
> On 12 Nov 2009, at 22:53 , James Y Knight wrote:
>> On Nov 12, 2009, at 4:11 PM, Ben Finney wrote:
>>> I think Jesse's point (or, if he's not willing to claim it, my point) is that, compared to the mandatory comment system, it makes much *more* sense to have a mandatory field for “URL to the BTS for this project”.
>>
>> One might look at the "competition" for inspiration. Looking at CPAN. There's no "comments" feature
>
> There is, on search.cpan.org. See http://search.cpan.org/~petdance/ack/ for instance, the link leads to http://cpanratings.perl.org/ (a pretty interesting example of the "distributed" nature of cpan in fact).

Ah, I see. I totally managed to miss that...I guess that's an interesting example of a bad web ui. :)

James
Re: [Python-Dev] Drop support for ones' complement machines?
On Dec 1, 2009, at 11:08 AM, Martin v. Löwis wrote:
>>> I'd rather prefer to explicitly list what CPython assumes about the outcome of specific operations. If this is just about &, |, ^, and ~, then it's fine with me.
>>
>> I'm not even interested in going this far:
>
> I still am: with your list of assumptions, it is unclear (to me, at least) what the consequences are. So I'd rather see an explicit list of consequences, instead of buying a pig in a poke.

I think all that needs to be defined is that conversions from unsigned to signed, and from (negative) signed to unsigned integers, have 2's complement wrapping semantics and do not affect the bit pattern in memory.

Stating it that way makes it clearer that all you're assuming is the operation of the cast operators, and it seems to me that it implies the other requirements.

James
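
The assumption about casts can be modelled in Python with the struct module, which packs signed integers in two's complement form. This is only an illustration of the semantics being described, restricted to 32-bit values for brevity; the helper names are made up, and the C code in CPython is not shown here.

```python
import struct


def as_unsigned32(n):
    """Reinterpret the bits of a signed 32-bit value as unsigned."""
    return struct.unpack("<I", struct.pack("<i", n))[0]


def as_signed32(n):
    """Reinterpret the bits of an unsigned 32-bit value as signed."""
    return struct.unpack("<i", struct.pack("<I", n))[0]


# Same four bytes in memory, two different readings of them.
same_bits = struct.pack("<i", -1) == struct.pack("<I", 0xFFFFFFFF)
wrapped = as_unsigned32(-5)
back = as_signed32(wrapped)
print(same_bits, wrapped, back)
```

The conversion is just arithmetic modulo 2**32, which is what "wrapping semantics" means here, and the round trip recovers the original value because no bits were changed along the way.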
Re: [Python-Dev] GIL required for _all_ Python calls?
On Jan 7, 2010, at 3:27 PM, Martin v. Löwis wrote:
>> I've been wondering whether it's possible to release the GIL in the regex engine during matching.
>
> I don't think that's possible. The regex engine can also operate on objects whose representation may move in memory when you don't hold the GIL (e.g. buffers that get mutated). Even if they stay in place - if their contents change, regex results may be confusing.

It seems probably worthwhile to optimize for the common case of using the regexp engine on an immutable object of type "str" or "bytes", and allow releasing the GIL in *that* case, even if you have to keep it for the general case.

James
Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM
On Jan 8, 2010, at 4:14 PM, Tres Seaver wrote:
>> I understood this proposal as a general processing guideline, not something the io library should do (but, say, a text editor).
>>
>> FWIW, I'm personally in favor of using the UTF-8 signature. If people consider them crazy talk, that may be because UTF-8 can't possibly have a byte order - hence I call it a signature, not the BOM. As a signature, I don't consider it crazy at all. There is a long tradition of having magic bytes in files (executable files, Postscript, PDF, ... - see /etc/magic). Having a magic byte sequence for plain text to denote the encoding is useful and helps reduce mojibake. This is the reason it's used on Windows: notepad would normally assume that text is in the ANSI code page, and for compatibility, it can't stop doing that. So the UTF-8 signature gives them an exit strategy.
>
> Agreed. Having that marker at the start of the file makes interop with other tools *much* easier.

Putting the BOM at the beginning of UTF-8 text files is not a good idea, it makes interop much *worse* on a unix system, not better. Without the BOM, most commands do the right thing with UTF-8 text. E.g. to concatenate two files:

    $ cat file-1 file-2 > file-3

With a BOM at the beginning of the file, it won't work right. Of course, you could modify "cat" (and every other stream processing command) to know how to consume and emit BOMs, and omit the extra one that would show up in the middle of the stream...but even that can't work; what about:

    $ (cat file-1; cat file-2) > file-3

Should the shell now know that when you run multiple commands, it should eat the BOM emitted from the second command?

Basically, using a BOM in a utf-8 file is just not a good idea: it completely ruins interop with every standard unix tool.

This is not to say that Python shouldn't have a way to read a file with a UTF-8 BOM: it just shouldn't encourage you to *write* such files.

James
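
The cat problem can be reproduced without a shell. A small sketch using Python's "utf-8-sig" codec, with byte strings standing in for file-1 and file-2 (the contents are invented for the example):

```python
import codecs

# Two "files" saved by a tool that writes the UTF-8 signature.
first = "hello\n".encode("utf-8-sig")
second = "world\n".encode("utf-8-sig")
has_bom = first.startswith(codecs.BOM_UTF8)

# What `cat file-1 file-2 > file-3` produces: a plain byte concatenation.
combined = first + second

# A BOM-aware reader strips only the leading signature...
text = combined.decode("utf-8-sig")
# ...so the second file's BOM survives as a stray U+FEFF mid-stream.
stray = text.count("\ufeff")
print(repr(text), stray)
```

Reading such files is easy enough; it is the writing side that leaves a stray character for every downstream tool to cope with.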
Re: [Python-Dev] Mailing List archive corruption?
On Jan 19, 2010, at 11:07 AM, Barry Warsaw wrote:
> On Jan 19, 2010, at 03:50 PM, Vinay Sajip wrote:
>> When I look at the mailing list archive for python-dev, I see some odd
>> stuff at the bottom of the page:
>> http://mail.python.org/pipermail/python-dev/2010-January/thread.html#95232
>> Anyone know what's happened?
>
> WTF? I think the archives were recently regenerated, so there's
> probably a fubar there. CC'ing the postmasters.

That happens if messages had unescaped "From" lines in the middle of them. No doubt, you've now broken every link anyone had ever made into the python-dev archives, because now all the article numbers are different. BTDT...unfortunately... Pipermail really is quite crappy, sigh.

Anyhow, when I did that, I went back to a backup to get the original article numbers, and edited the mbox file escaping From lines or adding additional empty messages until the newly regenerated article numbers matched the originals. I'd highly recommend going through that painful process, since I suspect a *lot* of people have links to the python-dev archive. Hope you have a backup (or can find caches on google or archive.org or something).

James
Re: [Python-Dev] Fwd: broken mailing list links in PEP(s?)
On May 5, 2010, at 8:22 AM, Barry Warsaw wrote:
> On May 5, 2010, at 7:09 AM, Oleg Broytman wrote:
>> On Wed, May 05, 2010 at 11:43:45AM +0100, Michael Foord wrote:
>>> http://mail.python.org/pipermail/python-list/2000-July/108893.html
>>> which are broken
>>
>> Pipermail's links aren't stable AFAIU. The numbering is changing over
>> time.
>
> They're only unstable if you regenerate the archives and the mbox file
> is old enough to have been a victim of a long-fixed delimiter bug.

Which is true for python-dev. And of course if you're paying attention, you can fix the mbox file (quoting "From" etc) such that it generates the same numbers as it did the first time.

James
Re: [Python-Dev] HEADS UP: Compilation risk with new GCC 4.5.0
On May 12, 2010, at 9:13 AM, Jesus Cea wrote:
> Short history: new GCC 4.5.0 (released a month ago), when compiling
> with -O3, is adding MMX/SSE instructions that require the stack to be
> aligned to 16 bytes. This is wrong, since x86 ABI only requires stack
> aligned to 4 bytes. If you compile EVERYTHING with GCC 4.5.0, you are
> safe (I guess!), but if your environment has mixed compiled code (for
> instance, the OS libraries), you can possibly "core dump". If you have
> an old compiled Python and you update libs compiled with GCC 4.5.0,
> you can crash in the process.
>
> Psyco is showing the issue, but it is not the culprit. It only leaves
> -correctly- the stack in not 16-byte alignment. But there are plenty
> of examples of crashes not related to python+psyco.
>
> Proposal: add "-fno-tree-vectorize" to compilation options for
> 2.7/3.2. Warn 2.3/2.4/2.5/2.6/3.0/3.1 users. Or warn users compiling
> with GCC 4.5.0.

While assuming the stack is 16byte aligned is undeniably an ABI-violation in GCC, at this point, it's surely simpler to just go along: the new unofficial ABI for x86 is that the stack must always be left in 16-byte alignment...

So, just change psyco to always use 16-byte-aligned stackframes. GCC has used 16byte-aligned stackframes for a heck of a long time now (so if the stack starts 16byte aligned on entry to a function it will stay that way on calls). So usually the only way people run into unaligned stacks is via hand-written assembly code or JIT compilers.

I think you'll be a lot happier just modifying Psyco than making everyone else in the world change their compiler flags.

James
Re: [Python-Dev] HEADS UP: Compilation risk with new GCC 4.5.0
On May 12, 2010, at 10:01 AM, Jesus Cea wrote:
> On 12/05/10 15:39, James Y Knight wrote:
>> While assuming the stack is 16byte aligned is undeniably an
>> ABI-violation in GCC, at this point, it's surely simpler to just go
>> along: the new unofficial ABI for x86 is that the stack must always
>> be left in 16-byte alignment...
>
> You can not rule out other software embedding python inside, or
> callbacks from foreign code. For instance, Berkeley DB library can do
> callbacks to Python code.

So? When calling callback functions, the Berkeley DB library won't un-16byte-align the stack, will it? (Assuming it's been compiled with gcc in the last 10 years)

> Not all the universe is GCC based. For instance, Solaris system
> libraries are not compiled using GCC. The world is bigger than
> Linux/GCC.

If the Solaris compilers don't use 16byte-aligned stackframes, and GCC on Solaris/x86 also assumes 16byte-aligned stacks, I guess GCC on Solaris/x86 is pretty broken indeed. But for Linux/x86, stacks have been de-facto 16byte aligned for so long, you can *almost* excuse the ABI violation as unimportant.

But anyways, psyco should keep the stackframes 16byte aligned regardless, for performance reasons: even when accessing datatypes for which unaligned access doesn't crash, it's faster when it's aligned.

James
Re: [Python-Dev] email package status in 3.X
On Jun 21, 2010, at 4:29 PM, M.-A. Lemburg wrote:
> Here's a little known fact: by changing the Python2 default encoding
> to 'undefined' (yes, that's a real codec !), you can disable all
> automatic string coercion in Python2.

I tried that once: half the stdlib stops working if you do (for example, the re module), so it's not particularly useful for checking if your own code is unicode-safe.

James
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:
> Similarly I'd expect (from experience) that a programmer using Python
> to want to take the same approach, sticking with unencoded data in
> nearly all situations.

Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having been decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.)

This means that Python3 programs can become *more* fragile in the face of random data you encounter out in the real world, rather than less fragile, which was the goal of the whole exercise.

The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :)

James
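For readers unfamiliar with the surrogateescape workaround mentioned above, here is a minimal Python 3 sketch of what it buys you. The byte string is an invented example (a path that is mostly UTF-8 with one stray 0xff byte); nothing here comes from the thread itself.

```python
# Bytes that are mostly UTF-8 but contain one invalid byte (0xff).
raw = b"/tmp/caf\xc3\xa9-\xff.txt"

# A strict decode fails on the invalid byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xff becomes U+DCFF), so the data can still be handled as str...
text = raw.decode("utf-8", errors="surrogateescape")
directory, _, filename = text.rpartition("/")

# ...and encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")
print(strict_ok, ascii(filename), round_tripped == raw)
```

The transformation in the middle (splitting off the filename) does not depend on the data having decoded cleanly, which is the situation the message describes.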
Re: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities
On Jun 22, 2010, at 5:14 PM, Craig Younkins wrote:
> I suggest rewording the documentation for the method making it more
> clear what it should and should not be used for. I would like to see
> the method changed to properly escape single-quotes, but if it is not
> changed, the documentation should explicitly say this method does not
> make input safe for inclusion in HTML.

Well, it *does* make the input safe for inclusion in HTML...in a double-quoted attribute. The docs could make it clearer that you should always use double-quotes around your attribute values when using it, though, I agree.
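The distinction between double-quoted and single-quoted attributes can be shown with a short sketch. Note the assumptions: cgi.escape no longer exists in current Python 3 releases, so escape_like_cgi below is a hypothetical stand-in that mirrors its documented behaviour with quote=True (escaping &, <, > and double quotes only). The user_input string is invented. html.escape is the current standard library function, and it also escapes single quotes by default.

```python
import html


def escape_like_cgi(s):
    """Hypothetical stand-in for cgi.escape(s, quote=True):
    escapes &, <, > and double quotes, but not single quotes."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return s.replace('"', "&quot;")


user_input = "x' onmouseover='alert(1)"
escaped = escape_like_cgi(user_input)

# Inside double quotes the value is contained: any " would be escaped.
safe_tag = '<a title="%s">' % escaped

# Inside single quotes the unescaped ' ends the attribute early.
unsafe_tag = "<a title='%s'>" % escaped

# html.escape (quote=True by default) escapes single quotes as well.
fully_escaped = html.escape(user_input)

print(safe_tag)
print(unsafe_tag)
print(fully_escaped)
```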
Re: [Python-Dev] versioned .so files for Python 3.2
On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:
> On 6/24/2010 5:09 PM, Barry Warsaw wrote:
>>> What use case does this address?
>>
>> Specifically, it's the use case where we (Debian/Ubuntu) plan on
>> installing all Python 3.x packages into
>> /usr/lib/python3/dist-packages. As of PEP 3147, we can do that
>> without collisions on the pyc files, but would still have to symlink
>> for extension module .so files, because they are always named foo.so
>> and Python 3.2's foo.so won't (modulo PEP 384) be compatible with
>> Python 3.3's foo.so.
>
> If the package has .so files that aren't compatible with other version
> of python, then what is the motivation for placing that in a shared
> location (since it can't actually be shared)

Because python looks for .so files in the same place it looks for the .py files of the same package. E.g., given a module like lxml, it contains the following files (among others):

    lxml/
    lxml/__init__.py
    lxml/__init__.pyc
    lxml/builder.py
    lxml/builder.pyc
    lxml/etree.so

And you can only put it in one place. Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.

James
Re: [Python-Dev] versioned .so files for Python 3.2
On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
> On 6/24/2010 8:23 PM, James Y Knight wrote:
>> On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:
>>> If the package has .so files that aren't compatible with other
>>> version of python, then what is the motivation for placing that in
>>> a shared location (since it can't actually be shared)
>>
>> Because python looks for .so files in the same place it looks for the
>> .py files of the same package.
>
> My suggestion was that a package that contains .so files should not be
> shared (e.g., the entire lxml package should be placed in a
> version-specific path). The motivation for this PEP was to simplify
> the installation of python packages for distros; it was not to reduce
> the number of .py files on the disk.
>
> Placing .so files together does not simplify that install process in
> any way. You will still have to handle such packages in a special way.

This is a good point, but I think still falls short of a solution. For a package like lxml, indeed you are correct. Since debian needs to build it once per version, it could just put the entire package (.py files and .so files) into a different per-python-version directory.

However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions).

In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package.

The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier.

James
Re: [Python-Dev] FHS compliance of Python installation
On Jun 26, 2010, at 4:35 PM, Matthias Klose wrote:
> On 26.06.2010 22:30, C. Titus Brown wrote:
>> On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:
>>> On 25.06.2010 02:54, Ben Finney wrote:
>>>> James Y Knight writes:
>>>>> Really, python should store the .py files in /usr/share/python/,
>>>>> the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and
>>>>> the .pyc files in /var/lib/python2.5-debug. But python doesn't
>>>>> work like that.
>>>>
>>>> +1 So who's going to draft the "Filesystem Hierarchy Standard
>>>> compliance" PEP? :-)
>>>
>>> This has nothing to do with the FHS. The FHS talks about data, not
>>> code.
>>
>> Really? It has some guidelines here for object files, etc., at least
>> as of 2004.
>> http://www.pathname.com/fhs/pub/fhs-2.3.html
>> A quick scan suggests /usr/lib is the right place to look:
>> http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA
>
> agreed for object files, but
> http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA
> explicitly states "The /usr/share hierarchy is for all read-only
> architecture independent *data* files".

I always figured the "read-only architecture independent" bit was the important part there, and "code is data". Emacs's el files go into /usr/share/emacs, for instance.

James
Re: [Python-Dev] 'hasattr' is broken by design
On Aug 24, 2010, at 10:26 AM, Benjamin Peterson wrote:
> 2010/8/24 P.J. Eby:
>> At 03:37 PM 8/24/2010 +0200, Hrvoje Niksic wrote:
>>> a) a "business" case of throwing anything other than AttributeError
>>> from __getattr__ and friends is almost certainly a bug waiting to
>>> happen, and
>>
>> FYI, best practice for __getattr__ is generally to bail with an
>> AttributeError as soon as you see double underscores in the name,
>> unless you intend to support special attributes.
>
> Unless you're in an old-style class, you shouldn't get any double
> underscore methods in __getattr__ (or __getattribute__). If you do,
> it's a bug.

Uh, did you see the message that was in response to? Maybe it should be a bug report?

    >>> class Foo(object):
    ...     def __getattr__(self, name): print "ATTR:",name
    ...     def __iter__(self): yield 1
    ...
    >>> print list(Foo())
    ATTR: __length_hint__
    [1]

James
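The "bail with an AttributeError on double underscores" practice quoted above might look roughly like the following in current Python 3. This is a sketch under stated assumptions: the Proxy class and its _target attribute are invented for illustration and do not come from the thread, and the session shown in the message was Python 2, whose special-method lookups differ from Python 3's.

```python
class Proxy:
    """Hypothetical attribute-forwarding wrapper."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called when normal lookup fails. Refuse special names
        # early so protocol probes get a plain AttributeError instead
        # of being forwarded to the wrapped object.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        return getattr(self._target, name)


p = Proxy("abc")
probe = hasattr(p, "__length_hint__")
forwarded = p.upper()
print(probe, forwarded)
```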
Re: [Python-Dev] PEP 384 status
On Aug 29, 2010, at 8:16 AM, Nick Coghlan wrote:
> However, since even platforms other than Windows aren't immune to
> version upgrades of the standard C runtime

Aren't they? I don't know of any other platform that lets you have two versions of libc linked into a single address space. Linux has had incompatible libc updates in the past, but it was not possible to use both in one program. I believe BSD works the same way.

James
Re: [Python-Dev] os.path.normcase rationale?
On Sep 24, 2010, at 10:53 AM, Paul Moore wrote:
> On 24 September 2010 15:29, Guido van Rossum wrote:
>> I don't think we should try to reimplement what the filesystem does. I
>> think we should just ask the filesystem (how exactly I haven't figured
>> out yet but I expect it will be more OS-specific than
>> filesystem-specific). It will have to be a new API -- normcase() at
>> least is *intended* to return a case-flattened name on OSes where
>> case-preserving filesystems are the default, and changing it to look
>> at the filesystem would break too much code. For a new use case we
>> need a new API.
>
> I dug into this once, and as far as I could tell, it's possible to get
> the information on Windows, but there's no way on Linux to "ask the
> filesystem". From my researches, the standard interfaces a filesystem
> has to implement on Linux don't offer any means of asking this
> question.
>
> Of course, (a) I'm no Linux expert so what do I know, and (b) it may
> well be possible to come up with a "good enough" solution by ignoring
> pathologically annoying theoretical cases.
>
> I'm happy to provide Windows code if someone needs it.
> Paul

An OSX code sketch is available here (summary: call FSPathMakeRef to get an FSRef from a path string, then FSRefMakePath to make it back into a path, which will then have the correct case). And note that it only works if the file actually exists.

http://stackoverflow.com/questions/370186/how-do-i-find-the-correct-case-of-a-filename

It would indeed be useful to have that be available in Python.

James
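As a rough illustration of the "good enough" kind of solution mentioned above, here is a portable Python sketch that recovers the on-disk spelling of a name by listing its directory. It is not the FSRef approach from the message and not an existing stdlib function: actual_case is a made-up helper, str.lower() is only an approximation of whatever case folding a real filesystem applies, and like the OSX sketch it only works if the file actually exists.

```python
import os
import tempfile


def actual_case(directory, name):
    """Return the on-disk spelling of `name` in `directory`, or None.

    Hypothetical helper: compares lowercased names, which is only an
    approximation of a filesystem's own case-folding rules.
    """
    wanted = name.lower()
    for entry in os.listdir(directory):
        if entry.lower() == wanted:
            return entry
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Create a file with mixed-case spelling, then look it up lowercased.
    open(os.path.join(tmp, "ReadMe.TXT"), "w").close()
    found = actual_case(tmp, "readme.txt")
    missing = actual_case(tmp, "nothing-here")

print(found, missing)
```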
Re: [Python-Dev] os.path.normcase rationale?
On Sep 26, 2010, at 7:36 AM, Paul Moore wrote:
> On 26 September 2010 09:01, Paul Moore wrote:
>> On 25 September 2010 23:57, Greg Ewing wrote:
>>> Paul Moore wrote:
>>>> Windows has (I believe) user definable filesystems, too, but the OS
>>>> has "get me the real filename" style calls,
>>>
>>> Does it really, though? The suggestions I've seen for doing
>>> this involve abusing the short/long filename translation
>>> machinery, and I'm not sure they're guaranteed to return the
>>> actual case rather than something that happens to work.
>>
>> There's another call available. I've been too lazy to go and look it
>> up, but I'll do so sometime today.
>
> Hmm, I can't find the one I was thinking of. GetLongFileName correctly
> sets the case of all but the final part, and FindFile can be used to
> find the last part, but that's not what I recall.
>
> GetFinalPathNameByHandle works, and is documented to do so, but (a) it
> works on an open file handle, so you need to open the file, and (b)
> it's Vista and later only...

Were you thinking of SHGetFileInfo?

http://stackoverflow.com/questions/74451/getting-actual-file-name-with-proper-casing-on-windows

James
Re: [Python-Dev] os.path.normcase rationale?
On Oct 3, 2010, at 9:18 AM, Dan Villiom Podlaski Christiansen wrote:
> A simpler alternative would probably be the F_GETPATH fcntl. An example:

That requires that you have permission to open the file (and to actually do so which might have other effects), while the File Manager's FSRef method does not. If Python adds a cross-platform function to do this canonicalization, users don't have to worry about how easy it is to invoke in pure-python...

James
Re: [Python-Dev] Distutils2 scripts
On Oct 8, 2010, at 5:24 PM, Gisle Aas wrote:
> On Oct 8, 2010, at 9:22 , Jeroen Ruigrok van der Werven wrote:
>
>> +1 from me. I sincerely dislike the Perl-esque -m stuff.
>
> As a Perl/Python guy I have to object to calling the -m stuff
> Perl-esque. This is a very Pythonish thing. In the Perl world we never
> treat modules as scripts; they are separate concepts written
> separately and installed in separate locations. There is no feature of
> perl similar to the Pythonish -m stuff.

Yes there is. -m and -M. E.g., the widely advertised perl -MCPAN -e install. It's not identical to python's -m, to be sure, but it's *similar*.

James
Re: [Python-Dev] Support for async read/write
On Oct 19, 2010, at 1:47 PM, exar...@twistedmatrix.com wrote:
> Adding more platform wrappers is always nice. Keep in mind that the
> quality of most (all?) aio_* implementations is spotty at best,
> though. On Linux, they will sometimes block (for example, if you fail
> to align buffers properly, or open a file without O_DIRECT, or if
> there are too many other aio operations active on the system at the
> time, etc).

You're thinking of the linux-specific AIO calls. Those have the properties you're describing (which makes them pretty useless for most code too), but they're completely different from the aio_* functions.

The POSIX aio_* calls don't do any of that. They aren't syscalls implemented in the kernel, they're implemented in glibc. They "simply" create a threadpool in your process to call the standard synchronous operations, and make it difficult to reliably get completion notification. Completion notification takes place via Real-Time signals (SIGEV_SIGNAL), which can be dropped if linux runs out of space in its RT-signal-queue, and when that happens you get no indication that it has occurred. You can also do completion notification via calling a function on a thread (SIGEV_THREAD), but, for that, glibc will always spawn a brand new thread for each notification, which is quite slow.

Basically: you shouldn't ever use those APIs. Especially on linux, but probably everywhere else.

So, in conclusion, I disagree that adding wrappers for these would be nice. It wouldn't. It would cause some people to think they would be useful things to call, and they would always be wrong.

James
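The "threadpool calling the standard synchronous operations" idea is easy to express directly in Python, which may be one of the simpler alternatives alluded to in this thread. The sketch below is only an illustration under assumptions: read_at is an invented helper, the temporary file and offsets are arbitrary, and concurrent.futures is used here as a convenient stdlib thread pool rather than as anything the message recommends by name.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_at(path, offset, size):
    """Plain blocking read of `size` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


# Write a small scratch file to read back from.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

try:
    # Each submit() runs an ordinary synchronous read on a worker
    # thread; the Future is the completion notification.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(read_at, path, 0, 4)
        second = pool.submit(read_at, path, 6, 4)
        results = (first.result(), second.result())
finally:
    os.remove(path)

print(results)
```

Here completion is observed by waiting on the Future, so there is no signal queue to overflow and no per-notification thread to spawn.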
Re: [Python-Dev] Support for async read/write
On Oct 19, 2010, at 6:44 PM, Martin v. Löwis wrote:
>> So, in conclusion, I disagree that adding wrappers for these would be
>> nice. It wouldn't. It would cause some people to think they would be
>> useful things to call, and they would always be wrong.
>
> We are all consenting adults. If people want to shoot themselves in
> their feet, we let them. For example, we have os.open, even though
> there is no garbage collection for file handles, and we have
> os._exit, even though it doesn't call finalizers.

There's a difference. os._exit is useful. os.open is useful. aio_* are *not* useful. For anything. If there's anything you think you want to use them for, you're wrong. It either won't work properly or it will perform worse than the simpler alternatives.

It would absolutely be a waste of time (of both the implementor of the wrapper and the poor users who stumble across them in documentation and try to use them) to bother adding wrappers to these functions for python.

James
Re: [Python-Dev] Continuing 2.x
On Oct 27, 2010, at 10:22 PM, Kristján Valur Jónsson wrote:
> Hello all.
>
> So, python 2.7 is in bugfix only mode. ‘trunk’ is off limit. So, where
> does one make improvements to the distinguished, and still very much
> alive, 2.x series of Python?
> The answer would seem to be “one doesn’t”. But must it be that way?
>
> When Morris stopped producing the Oxford III model back in ’57 in favor
> of new developments, it didn’t spell the end for it. The plant was
> sold to India and the Hindustan Ambassador continues to be developed
> and produced to this day. It even has fuel injection.
> The Morris Motor Company isn’t around anymore.
>
> So, here is my suggestion:
> Let’s move the current ‘trunk’ into /branches/afterlife-27. Open it for
> submissions from people such as myself that use 2.7 on a regular basis
> and are willing to give it some extra love. Host it there without the
> usual stringent python quality assurance, buildbot support, release
> management and all that rigmarole. Open-source it, if you will.
> Svn.python.org already plays host to some other, less official,
> projects such as stackless, so why not this?

The python community has already decided many times over that Python2 is dead and Python3 is the future. So if you want to continue maintaining Python2, that means you need to fork it. I think you'd be best off doing so on your own infrastructure: convincing the python developers to support such a thing is quite unlikely, and furthermore, completely unnecessary. Unlike the Oxford III, you don't need to be "sold" python2: it's open source, you can fork it without any official approval. So, just do it.

I wish you best of luck, though: most unofficial forks die a lonely death. But, if enough people feel like you do, it could become successful.

But I really doubt anyone else is going to want to use any python2 afterlife without stringent quality assurance, multi-platform support releases, and other rigmarole. You'd have to set up all that stuff for yourself if you hope to attract users. I can't think of any possible use for an unreliably maintained version of python2...

James
Re: [Python-Dev] On breaking modules into packages Was: [issue10199] Move Demo/turtle under Lib/
On Nov 3, 2010, at 11:25 AM, Eric Smith wrote:
> On 11/3/10 10:53 AM, Eric Smith wrote:
>
>> The problem is that there is no unittest.loader in 2.4, and
>> unittest.loader.TestLoader is the name that the 2.7 pickle creates. We
>> see this problem every time we try and move anything in the stdlib.
>
> And BTW: for me, this is the strongest reason not to break up modules
> into packages or otherwise reorganize the stdlib.

This is the strongest reason why I recommend to everyone I know that they not use pickle for storage they'd like to keep working after upgrades [not just of stdlib, but other 3rd party software or their own software]. :)

James
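The fragility discussed above comes from pickle storing classes by module and name rather than by value. A small Python 3 sketch makes this visible; fractions.Fraction is just an arbitrary stdlib class chosen for illustration, and the same-length rename to "fractionz" is an invented way of simulating a class that has moved between releases.

```python
import pickle
from fractions import Fraction

# The pickle stream refers to the class by module and name.
data = pickle.dumps(Fraction(1, 3))
names_embedded = b"fractions" in data and b"Fraction" in data

# Simulate the class having moved: rename the module inside the
# stream (same length, so the stream stays structurally valid).
stale = data.replace(b"fractions", b"fractionz")
try:
    pickle.loads(stale)
    load_failed = False
except ImportError:
    load_failed = True

# The untouched pickle still loads, as long as the name resolves.
restored = pickle.loads(data)
print(names_embedded, load_failed, restored)
```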
Re: [Python-Dev] Python-3 transition in Arch Linux
On Nov 4, 2010, at 8:43 PM, Stephen J. Turnbull wrote:
> All of the Arch users I know expect Arch to occasionally do radical
> things because they're the right things to do in the long run.

But the previous consensus (at least, as I, and presumably many other people understood it) was that python2 would remain the owner of the name "/usr/bin/python" for the indefinite future, and python3 would be invoked with /usr/bin/python3. Given that, it's not at all clear that Arch's actions are the right thing to do.

IMO, moving away from that consensus should've been brought up on python-dev rather than just one distro just doing it all alone, causing incompatibilities and annoyance. If python-dev wants python3 to inherit the name /usr/bin/python, then python2 should've been installing a binary called /usr/bin/python2 for a couple years ahead of time, and recommending that everyone use that in their #! lines, so that the switch could've been done without breaking everything...

> Sure, and Guido should have exercised the Time Machine a little harder
> so that Python 3 never needed to happen. IOW, this is the price of
> success and wide distribution.

Well, other programming languages seem to have avoided making sweeping bidirectionally-incompatible changes, despite being successful and widely distributed. But that's a whole other discussion.

James
Re: [Python-Dev] Python-3 transition in Arch Linux
On Nov 6, 2010, at 9:41 AM, Martin v. Löwis wrote:
> So I don't recall a decision that there shouldn't be a python2
> binary,

The decision to make one would have to be an active decision, since Python has never installed one before. If there should be one, then the Python Makefile should make one by default.

> nor a decision that anything is done indefinitely
> (it may be that the decision was actually just about 3.1 - changing
> it again for 3.2 would require another decision, but certainly can't
> be ruled out categorically).

When I said "indefinite", I meant "until some point in the future not yet determined", with an implied undertone of "not anytime soon".

James
Re: [Python-Dev] Continuing 2.x
On Nov 8, 2010, at 4:42 AM, Lennart Regebro wrote:
> Except for making releases that start backporting Python 3 features
> and breaking backwards compatibility gradually (which may or may not
> be a good idea) I don't see the point. There isn't much to do when it
> comes to improving the language, and there is a moratorium anyway.
> Improvements in the standard library can be more easily done in
> external libraries anyway, and then you can release the improved
> libraries for everything from Python 2.4 and forwards if you like.
>
> So it can be done, but the question is "Why?"

To keep the batteries included?

James
Re: [Python-Dev] Continuing 2.x
On Nov 8, 2010, at 6:08 PM, Lennart Regebro wrote:
> 2010/11/8 James Y Knight:
>> On Nov 8, 2010, at 4:42 AM, Lennart Regebro wrote:
>>> So it can be done, but the question is "Why?"
>>
>> To keep the batteries included?
>
> But they'll only be included in > 2.7, which won't be used much, [...]

If there was going to be an official python.org sanctioned Python 2.8 release, I'm not at all sure that'd be the case. Since there isn't going to be one, then yes, that's probably true.

James
Re: [Python-Dev] Breaking undocumented API
On Nov 10, 2010, at 8:47 AM, Michael Foord wrote:
> How about making this explicit (either pep 8 or our developer docs):
>
> If a module or package defines __all__ that authoritatively defines the
> public interface. Modules with __all__ SHOULD still respect the naming
> conventions (leading underscore for private members) to avoid confusing
> users. Modules SHOULD NOT export private members in __all__.

I don't like the idea of the authoritative definition of a public interface being defined based on __all__, because that provides users almost no warning that they're using a private API: the __all__ attribute doesn't do anything if you aren't using import *.

If there was some proposal to make it so that accessing an attribute not in __all__ did prevent or somehow warn users that they're doing something dangerous, that'd be different, but there isn't such a proposal, and I don't even know what such a proposal would look like...

On the other hand, if you make the primary mechanism to indicate privateness be a leading underscore, that's obvious to everyone.

James
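The point that __all__ only affects import * can be checked with a short, self-contained sketch. Everything named here is invented for the demonstration: the module demo_all_mod is built in memory and registered in sys.modules purely so the example does not need a file on disk.

```python
import sys
import types

# Source of a throwaway module with one name left out of __all__.
source = '''
__all__ = ["public"]

def public():
    return "public"

def helper():
    return "not in __all__"
'''

demo = types.ModuleType("demo_all_mod")
exec(source, demo.__dict__)
sys.modules["demo_all_mod"] = demo

# `import *` honours __all__: only `public` is copied over.
star_ns = {}
exec("from demo_all_mod import *", star_ns)
star_names = sorted(n for n in star_ns if not n.startswith("__"))

# An explicit import of the "private" name works with no warning.
from demo_all_mod import helper

print(star_names, helper())
```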
Re: [Python-Dev] [Python-checkins] r86441 - python/branches/py3k/Lib/test/test_nntplib.py
On Nov 13, 2010, at 7:08 AM, Antoine Pitrou wrote: > Funny, it shows that the NNTP SSL tests don't check the certificate, > then. Unsurprising, given that you need 140 lines of pretty non-obvious python code to do so... James
Re: [Python-Dev] Breaking undocumented API
On Nov 17, 2010, at 9:19 AM, Nick Coghlan wrote: > (and is a little trickier in the case of module level globals, since those > can't be deprecated properly) People keep saying this, but there have already been examples shown of how to do it. I actually think that python should include a way to do so standard -- it's a reasonable enough desire, as shown by how many times in this thread the inability to do so has been mentioned. If the existing working 3rd-party mechanisms aren't good enough for python-dev standards, come up with a new way... James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Breaking undocumented API
On Nov 17, 2010, at 10:30 AM, Guido van Rossum wrote: > On Wed, Nov 17, 2010 at 7:24 AM, James Y Knight wrote: >> On Nov 17, 2010, at 9:19 AM, Nick Coghlan wrote: >>> (and is a little trickier in the case of module level globals, since those >>> can't be deprecated properly) >> >> People keep saying this, but there have already been examples shown of how >> to do it. I actually think that python should include a way to do so >> standard -- it's a reasonable enough desire, as shown by how many times in >> this thread the inability to do so has been mentioned. If the existing >> working 3rd-party mechanisms aren't good enough for python-dev standards, >> come up with a new way... > > That's quite the distraction from the current thread though. Start > discussing it on python-ideas, or submit a code fix, or something in > between. But the hackish way that some 3rd party frameworks use > (replacing the module object with a class instance in sys.modules) is > clearly not right for the standard library (I'll explain on > python-ideas if you insist). I just don't want people to use the current lack as an excuse to simply remove module attributes without prior deprecation (or make a compatibility policy which recommends doing such a thing). I'll leave it up to the experts on this list (or python-ideas...) to determine how to implement a module-level deprecation in a way that isn't considered "hackish". (Or, if there is no such way, there's also the alternative of simply never removing module-level names.) James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Breaking undocumented API
On Nov 17, 2010, at 11:38 AM, Guido van Rossum wrote: > Deprecation doesn't *require* logging a warning or raising an > exception. You can also add a note to the docs, or if it is > undocumented, just add a comment to the code. (Though if it is in > widespread use despite being undocumented, a better way would be to > document it first -- as immediately deprecated if necessary.) > > Deprecation is in the end a way to give people advance warning about > future changes. The mechanism of the warning doesn't always have to be > implemented by the interpreter/compiler/parser or whatever other tool. Well, that's certainly a possible policy. I'd suggest that adding notes to the docs after-the-fact is a singularly ineffective way of giving people advance warning of feature removal compared to having the interpreter/compiler/parser or whatever other tool warn you. And if that's to be python's policy, when it's possible to do better, I'm disappointed. (But won't respond further, my point is made.) James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] len(chr(i)) = 2?
Why don't y'all just call them "--unichar-width=16/32"? That describes precisely what the options do, and doesn't invite any quibbling over definitions. James
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote: > Maybe Python should have used UTF-8 as its internal unicode > representation. Then people who were foolish enough to assume > one character per string item would have their programs break > rather soon under only light unicode testing. :-) You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. Instead, provide bidirectional iterators which can traverse the string by byte, codepoint, or by grapheme (that is: the set of combining characters + base character that go together, making up one thing which a human would think of as a character). James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 24, 2010, at 12:07 AM, Stephen J. Turnbull wrote: > Or you can give user programs memory indices, and enjoy the fun as > the poor developers do things like "pos += 1" which works fine on > the ASCII data they have lying around, then wonder why they get > Unicode errors when they take substrings. a) You seem to be hung up on implementation details of emacs. But yes, positions should be stored as a byte offset into the utf8 string. NOT as number of codepoints since the beginning of the string. Probably you want it to be somewhat opaque, so that you actually have to specify whether you wanted to go to +1 byte, codepoint, or grapheme. b) Those poor developers are *already* screwed if they're using pos += 1 when pos is a codepoint index and they then take a substring based on that! They will get half a character when the string contains combining characters... Pretending that "codepoints" are a useful abstraction just makes poor developers get by without doing the correct thing (incrementing to the next grapheme boundary) for a little bit longer. But once you [the language implementor] are providing correct abstractions for grapheme movement, it's just as easy to also provide an abstraction for codepoint movement, and make your low-level implementation of the iterator object be a byte-offset into a UTF8 buffer. James
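A rough sketch of the idea above, assuming Python 3: positions are byte offsets into a UTF-8 buffer, and the caller says whether to step by codepoint or by grapheme. The helper names next_codepoint and next_grapheme are hypothetical, and the grapheme step is a simplification (base character plus following combining marks); full segmentation per Unicode UAX #29 covers many more cases.

```python
import unicodedata

def next_codepoint(buf, pos):
    """Return the byte offset of the code point after the one at pos."""
    pos += 1
    # UTF-8 continuation bytes look like 0b10xxxxxx; skip over them.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

def next_grapheme(buf, pos):
    """Simplified grapheme step: one base character plus combining marks."""
    pos = next_codepoint(buf, pos)
    while pos < len(buf):
        end = next_codepoint(buf, pos)
        if unicodedata.combining(buf[pos:end].decode("utf-8")) == 0:
            break          # next code point starts a new grapheme
        pos = end          # combining mark: belongs to the current one
    return pos

text = "e\u0301x"           # 'e' + COMBINING ACUTE ACCENT, then 'x'
buf = text.encode("utf-8")  # 4 bytes: 1 for 'e', 2 for the accent, 1 for 'x'
```

Stepping by codepoint from offset 0 stops between the "e" and its accent; stepping by grapheme skips past both, so a substring taken there never contains half a character.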
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 24, 2010, at 12:07 AM, Stephen J. Turnbull wrote: > By the way, to send the ball back into your court, I have this feeling > that the demand for UTF-8 is once again driven by native English > speakers who are very shortly going to find themselves, and the data > they are most familiar with, very much in the minority. Of course the > market that benefits from UTF-8 compression will remain very large for > the immediate future, but in the grand scheme of things, most of the > world is going to prefer UTF-16 by a substantial margin. No, the demand for UTF-8 is because that's what much of the internet (and not coincidentally, unix) world has standardized on. The main pieces of software using UTF-16 (Windows, Java) started doing so before it became apparent that 16 bits wasn't enough to actually hold a unicode codepoint, so they were actually implementing UCS-2. In those days, UCS-2 was a fairly sensible choice. But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly superior. Not because it's smaller -- it's pretty much a tossup -- but because it is an ASCII superset, and thus more easily compatible with other software. That also makes it most commonly used for internet communication. (So, there's a huge advantage for using it internally as well right there: no transcoding necessary for writing your HTML output). UTF-16 is incompatible with ASCII, and furthermore, it's still a variable-width encoding, with all the same issues that causes. As such, there's really very little to be said in favor of it. If you really want a fixed-width encoding, you have to go to UTF-32, which is excessively large. UTF-32 is a losing choice, simply because of the wasted memory usage. But that's all a side issue: even if you do choose UTF-16 as your underlying encoding, you *still* need to provide iterators that work by "byte" (only now bytes are 16-bits), by codepoint, and by grapheme. 
Of course, people who implement UTF-16 (such as python, java, and windows) often pretend they're still implementing UCS-2, and don't bother even providing their users with the necessary APIs to do things correctly. Which, you can often get away with...just so long as you don't mind that you sometimes end up splitting a string in the middle of a codepoint and causing a unicode error! James
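A small illustration of the encoding claims above, assuming Python 3 and its standard codecs: UTF-8 output for ASCII text is byte-identical to ASCII, UTF-16 is not, and UTF-16 is still variable-width, so a cut between the two halves of a surrogate pair leaves data that will not decode.

```python
ascii_text = "hello"
astral = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the 16-bit range

utf8_ascii = ascii_text.encode("utf-8")       # identical to the ASCII bytes
utf16_ascii = ascii_text.encode("utf-16-le")  # two bytes per character
utf16_astral = astral.encode("utf-16-le")     # one code point, two 16-bit units

# Cutting between the two units leaves a lone surrogate, which will not decode.
try:
    utf16_astral[:2].decode("utf-16-le")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False
```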
Re: [Python-Dev] PEP 384 final review
On Nov 29, 2010, at 8:58 AM, Nick Coghlan wrote: > The http read only URLs > didn't work (no diff returned, just "svn: OPTIONS of > 'http://svn.python.org/python/branches/pep-0384': 200 OK > (http://svn.python.org)"), That was the wrong url: you should've used http://svn.python.org/projects/python/branches/pep-0384 James
Re: [Python-Dev] ICU
On Dec 1, 2010, at 11:45 PM, Alexander Belopolsky wrote: > On Tue, Nov 30, 2010 at 3:13 PM, Antoine Pitrou wrote: >> >> Oh, about ICU: >> Actually, I remember you saying that locale should ideally be replaced with a wrapper around the ICU library. >>> >>> By that, I stand - however, I have given up the hope that this will >>> happen anytime soon. >> >> Perhaps this could be made a GSOC topic. >> > > Incidentally, this may also address another Python's Achilles' heel: > the timezone support. > > http://icu-project.org/download/icutzu.html Does ICU do anything regarding timezones that datetime + pytz doesn't already do? Wouldn't it make more sense to integrate the already-existing-and-pythonic pytz into Python than to make a new wrapper based on ICU? James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] The buffer() function
On Jul 13, 2006, at 12:52 PM, Thomas Heller wrote: > IIUC, the buffer object was broken some time ago, but I think it has > been fixed. Can the 'status' of the buffer function be changed? > To quote the next question from the OP: > > "Is buffer safe to use? Is there an alternative?" > > My thinking is that it *is* safe to use, and that there is > no alternative (but imo also no alternative is needed). I believe it's safe, except when used on an array.array object. However, that's not buffer's fault, but rather a bug in the array class. The buffer interface requires that, as long as a reference to a python object is alive, pointers into its buffer will not become invalidated. Array breaks that guarantee. To fix this, array ought to make a sub-object that this guarantee _does_ hold for. And when it needs more storage, simply make a new sub-object with more storage. Then, the buffer's reference would be to the refcounted sub-object, and thus the associated memory wouldn't go away until the buffer was done with it. James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
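For comparison, a short sketch of how current CPython 3 behaves, which is a different remedy from the refcounted sub-object proposed above: while a buffer is exported from an array, any resize is refused, so pointers into the storage cannot be invalidated.

```python
from array import array

a = array("b", [1, 2, 3])
view = memoryview(a)      # exports a pointer into the array's storage

try:
    a.append(4)           # would need to resize while the view is alive
    resized = True
except BufferError:
    resized = False

view.release()
a.append(4)               # fine once no buffer is exported
```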
Re: [Python-Dev] Community buildbots
On Jul 15, 2006, at 3:15 PM, M.-A. Lemburg wrote: > Note that it also helps setting the default encoding > to 'unknown'. That way you disable the coercion of strings > to Unicode and all the places where this implicit conversion > takes place crop up, allowing you to take proper action (i.e. > explicit conversion or changing of the string to Unicode > as appropriate). I've tried that before to verify no such conversion issues occurred in Twisted, but, as the python stdlib isn't usable like that, it's hard to use it to find bugs in any other libraries. (in particular, the re module is badly broken, some other stuff was too). James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Capabilities / Restricted Execution
On Jul 16, 2006, at 5:42 AM, Scott Dial wrote: > Talin wrote: >> Scott Dial wrote: >>> Phillip J. Eby wrote: >>> A function's func_closure contains cell objects that hold the variables. These are readable if you can set the func_closure of some function of your own. If the overall plan includes the ability to restrict func_closure setting (or reading) in a restricted interpreter, then you might be okay. >>> >>> Except this function (__getattribute__) has been trapped inside of a >>> class which does not expose it as an attribute. So, you shouldn't be >>> able to get to the func_closure attribute of the __getattribute__ >>> function for an instance of the Guard class. I can't come up with >>> a way >>> to defeat this protection, at least. If you have a way, then I'd be >>> interested to hear it. >> >> I've thought of several ways to break it already. Some are >> repairable, >> I'm not sure that they all are. >> >> For example, neither of the following statements blows up: >> >> print t2.get_name.func_closure[0] >> print object.__getattribute__( t2, '__dict__' ) >> >> Still, its perhaps a useful basis for experimentation. >> >> -- Talin > > I quickly poked around it in python and realized that in 2.5 (as > opposed > to the 2.4 python I was playing in) the cell object exposes > cell_contents.. blargh. So, yes, you can defeat the protection because > the wrapped instance is exposed. > > print t2.get_name() > t2.get_name.func_closure[0].cell_contents.im_self.name = 'poop' > print t2.get_name() > > Although, your second example with using the object.__getattribute__ > doesn't seem to really be an issue. You retrieved the __dict__ for the > Guard class which is empty and is something we should not feel > concerned > about being leaked. > > Only way I see this as viable is if in "restricted" mode cell_contents > was removed from cell objects. Similarly to how function attributes aren't accessible in restricted mode. 
In older versions of python, it's always been possible to get the closure variables in non-restricted mode, via mutating func_code...

def get_closure_contents(fun):
    # Build a function with the same number of free variables as fun.
    num = len(fun.func_closure)
    vars = ["x%d" % n for n in range(num)]
    defines = ' = '.join(vars) + " = None"
    returns = ', '.join(vars)+','
    exec """
def b():
    %s
    def bb():
        return %s
    return bb
""" % (defines, returns)
    # Temporarily swap in code that simply returns every closure cell.
    old_code = fun.func_code
    fun.func_code = b().func_code
    result = fun()
    fun.func_code = old_code
    return dict(zip(old_code.co_freevars, result))

def make_secret(x,y):
    def g():
        return x*y
    return g

>>> secret = make_secret(5,7)
>>> secret()
35
>>> get_closure_contents(secret)
{'y': 7, 'x': 5}
Re: [Python-Dev] Dynamic module namespaces
On Jul 15, 2006, at 2:38 PM, Johan Dahlin wrote: > What I want to ask, is it possible to have a sanctioned way to > implement > a dynamic module/namespace in python? > > For instance, it could be implemented to allow you to replace the > __dict__ attribute in a module with a user provided object which > implements the dictionary protocol. I'd like this, as well, although my use case is different: I'd like to be able to deprecate attributes in a module. That is, if I have: foo.py: SOME_CONSTANT = 5 I'd like to be able to do something such that any time anyone accessed foo.SOME_CONSTANT, it'd emit a DeprecationWarning. James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
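One way the wish above can be sketched, assuming Python 3: put an instance of a types.ModuleType subclass into sys.modules so that reading a deprecated name goes through __getattr__ and warns. This is the module-replacement approach that other messages in this archive call hackish, not a sanctioned mechanism; the class name DeprecatingModule is hypothetical, while foo and SOME_CONSTANT come from the example in the message.

```python
import sys
import types
import warnings

class DeprecatingModule(types.ModuleType):
    """Module subclass that warns when a deprecated name is read."""
    _deprecated = {"SOME_CONSTANT": 5}

    def __getattr__(self, name):
        # Only called when normal lookup fails, so deprecated names are
        # kept out of the module __dict__ and served from here instead.
        if name in self._deprecated:
            warnings.warn("foo.%s is deprecated" % name,
                          DeprecationWarning, stacklevel=2)
            return self._deprecated[name]
        raise AttributeError(name)

foo = DeprecatingModule("foo")
sys.modules["foo"] = foo

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import foo as imported_foo
    value = imported_foo.SOME_CONSTANT
```

Python 3.7 and later also accept a module-level __getattr__ function (PEP 562), which gives the same effect without replacing the module object.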
Re: [Python-Dev] logging module broken because of locale
On Jul 18, 2006, at 1:54 PM, Martin v. Löwis wrote: > Mihai Ibanescu wrote: >> To follow up on my own email: it looks like, even though in some >> locale >> "INFO".lower() != "info" >> >> u"INFO".lower() == "info" (at least in the Turkish locale). >> >> Is that guaranteed, at least for now (for the current versions of >> python)? > > It's guaranteed for now; unicode.lower is not locale-aware. That seems backwards of how it should be ideally: the byte-string upper and lower should always do ascii uppering-and-lowering, and the unicode ones should do it according to locale. Perhaps that can be cleaned up in py3k? James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
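A quick check of the guarantee quoted above, assuming Python 3, where str plays the role of the old unicode type: lower() follows Unicode rules and ignores the process locale. The locale name tr_TR.UTF-8 is an assumption about what is installed; if it is missing, the check still runs under the default locale.

```python
import locale

saved = locale.setlocale(locale.LC_CTYPE)   # remember the current setting
try:
    try:
        # Assumes a Turkish locale is installed; if not, carry on anyway.
        locale.setlocale(locale.LC_CTYPE, "tr_TR.UTF-8")
    except locale.Error:
        pass
    lowered = "INFO".lower()   # Unicode case mapping, not locale-aware
finally:
    locale.setlocale(locale.LC_CTYPE, saved)
```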
Re: [Python-Dev] Strategy for converting the decimal module to C
On Jul 21, 2006, at 6:18 AM, Nick Maclaren wrote: > To cut a long story short, it is impractical for a language run-time > system to call user-defined handlers with any degree of reliability > unless the compiled code and run-time interoperate carefully - I have > been there and done that many times, but few people still working > have. > On architectures with out-of-order execution (and interrupts), you > have to assume that an interrupt may occur anywhere, even when the > code does not use the relevant facility. Floating-point overflow > in the middle of a list insertion? That's to be expected. While this _is_ a real problem, it is _not_ a general problem as you are describing it. Processors are perfectly capable of generating precise interrupts, and the inability to do so has nothing to do with the out-of-order execution, etc. Almost all interrupts are precise. The only interesting one which is not, on x86 processors, is the x87 floating point exception, which is basically for historical reasons. It has never been precise, ever since the actual 8087 coprocessor chip for the 8086. However, all is not lost: the exception cannot occur randomly. It can only occur on *some* floating point instruction, even if the instruction is not the one the error actually occurred in. So, unless your list insertion code uses floating point instructions, you should not get a floating point exception during your list insertion. Also, looking forward, the "simd" floating point instructions (ie mmx/sse/sse2/sse3) _do_ generate precise interrupts. And on x86-64, x87 instructions are deprecated and everyone is recommended to use the simd ones, instead (so, for example, gcc defaults to using them). James
Re: [Python-Dev] Document performance requirements?
On Jul 21, 2006, at 12:45 PM, Giovanni Bajo wrote: > Jason Orendorff wrote: > >>> However, I'm also struggling to think of a case other than list vs >>> deque where the choice of a builtin or standard library data >>> structure would be dictated by big-O() concerns. >> >> OK, but that doesn't mean the information is unimportant. +1 on >> making this something of a priority. People looking for this info >> should find it in the obvious place. Some are unobvious. (How >> fast is >> dict.__eq__ on average? Worst case?) > > I also found out that most people tend to think of Python's lists as a > magical data structure optimized for many operations (like a "rope" or > something complex like that). Documenting that it's just a bare vector > (std::vector in C++) would be of great help. Indeed, I was talking to someone a while back who thought that lists were magically hashed, in that he did something like: dictionary = open("/usr/share/dict/words").readlines() and then expected: "word" in dictionary would be fast. And was very surprised when it turned out to be slow: a linear search of the list. :) James
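To make the difference concrete, a tiny sketch assuming Python 3 and a made-up three-word list in place of /usr/share/dict/words: the "in" test gives the same answer on a list and a set, but the list is scanned element by element while the set is looked up by hash.

```python
# Hypothetical stand-in for the lines of /usr/share/dict/words.
words = ["apple", "banana", "cherry"]

as_list = list(words)  # "in" compares against each element in turn: O(n)
as_set = set(words)    # "in" hashes the key: O(1) on average

found_in_list = "cherry" in as_list
found_in_set = "cherry" in as_set
missing = "durian" in as_set
```

For repeated lookups against a large word list, building the set once is the usual choice.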
Re: [Python-Dev] Python 2.4, VS 2005 & Profile Guided Optimization
On Jul 23, 2006, at 4:41 PM, Giovanni Bajo wrote: > I think Martin decided to keep VC71 (Visual Studio .NET 2003) for > another > release cycle. Given the impressive results of VC8 with PGO, and > the fact > that Visual Studio Express 2005 is free forever, I would hope as > well for > the decision to be reconsidered. Wasn't there a "Free Forever" 2003 edition too, which has since completely disappeared? Why do you think that MS won't stop distributing the Free Forever VS 2005 once VS 2005+1 comes out, the same way they did the 2003 one? James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Rounding float to int directly (Re: struct module and coercing floats to integers)
On Aug 2, 2006, at 11:26 PM, Raymond Hettinger wrote: > Also, -10 on changing the semantics of int() to round instead of > truncate. The truncating version is found in so many other languages > and book examples, that it would be a disaster for us to choose a > different meaning. I'd be happy to see floats lose their __int__ method entirely, replaced by an explicit truncate function. I've always thought it quite a hack that python floats have implicit truncation to ints, and then a random smattering of APIs go to extra lengths to explicitly prevent float.__int__ from being called because people thought "passing a float makes no sense!". That's right, it doesn't, and it _never_ should happen implicitly, not just in those particular few cases. Explicit is better than implicit. James
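For reference, a short sketch of the semantics under discussion, assuming Python 3: int() truncates toward zero, math.trunc() (available since Python 2.6) spells the same operation explicitly, and round() picks the nearest integer instead.

```python
import math

truncated_pos = int(2.7)      # toward zero: 2
truncated_neg = int(-2.7)     # toward zero: -2, not -3
explicit = math.trunc(-2.7)   # same result, but the intent is spelled out
rounded = round(-2.7)         # nearest integer: -3
```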
Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys
On Aug 3, 2006, at 5:47 PM, M.-A. Lemburg wrote: >> The only way this error could be the right thing is if you were >> trying >> to suggest that he shouldn't mix unicode and bytestrings at all. > > Good question. I wonder whether that's a reasonable approach for > Python 2.x (I'd say it is for Py3k). It's my understanding that in py3k, there will be no implicit conversion, bytestrings and unicodes will never be equal (no matter what the contents), and so this wouldn't be an issue. (as u"1" == "1" would be the same sort of situation as 1 == "1" is now) James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
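That understanding matches how Python 3 ended up behaving. A minimal check, assuming Python 3: a str and a bytes object with the same ASCII content compare unequal, and mixing them in an operation raises TypeError rather than converting implicitly.

```python
text_equals_bytes = ("1" == b"1")   # never equal, whatever the contents
int_equals_text = (1 == "1")        # the analogous existing case

try:
    "a" + b"a"                      # no implicit decoding or encoding
    mixed_ok = True
except TypeError:
    mixed_ok = False
```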
Re: [Python-Dev] Rounding float to int directly (Re: struct module and coercing floats to integers)
On Aug 3, 2006, at 2:34 AM, Greg Ewing wrote: > Raymond Hettinger wrote: > >> -1 on an extra built-in just to save the time for function call > > The time isn't the main issue. The main issue > is that almost all the use cases for round() > involve doing an int() on it afterwards. At > least nobody has put forward an argument to > the contrary yet. And I bet the main reason why round() in python returns a float is because it does in C. And it does in C because C doesn't have arbitrary size integers, so if round returned integers, round(1e+308) couldn't work. In python, however, that's no problem, since python does have arbitrarily big integers. There's also round(float("inf")), of course, which wouldn't be defined if the result was an integer, but I don't think rounding infinity is much of a use case. And I do think the extension of round to allow the specification of number of decimal places was a mistake. If you want that, you probably really mean to do something like round(x * 10**y) instead. James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
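A brief sketch of the cases named above as they behave in Python 3, where one-argument round() returns an int: very large floats round without trouble, infinity has no integer result, and scaling before rounding keeps a chosen number of decimal digits as an integer.

```python
big = round(1e308)             # an arbitrarily large int, no overflow

try:
    round(float("inf"))        # no integer can represent infinity
    inf_failed = False
except OverflowError:
    inf_failed = True

x, y = 1.2345, 2
scaled = round(x * 10**y)      # round(x * 10**y) as suggested above
```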
Re: [Python-Dev] unicode hell/mixing str and unicode as dictionary keys
On Aug 4, 2006, at 12:34 AM, Josiah Carlson wrote: > As an alternate idea, rather than attempting to .decode('ascii') when > strings and unicode compare, why not .decode('latin-1')? We lose the > unicode decoding error, but "the right thing" happens (in my opinion) > when u'\xa1' and '\xa1' compare. Maybe you want those to compare equal, but _I_ want u'\xa1' and '\xc2\xa1' to compare equal, so it should obviously use .decode('utf-8')! (okay, no, I don't really want that.) James
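The ambiguity behind the joke can be shown directly, assuming Python 3: which byte string "equals" u'\xa1' depends entirely on the codec chosen, so any implicit choice is right for one party and wrong for another.

```python
target = "\xa1"  # INVERTED EXCLAMATION MARK

latin1_match = b"\xa1".decode("latin-1") == target
utf8_match = b"\xc2\xa1".decode("utf-8") == target

# The UTF-8 bytes read as latin-1 give two characters, not one.
cross = b"\xc2\xa1".decode("latin-1")

# The latin-1 byte on its own is not valid UTF-8 at all.
try:
    b"\xa1".decode("utf-8")
    lone_byte_ok = True
except UnicodeDecodeError:
    lone_byte_ok = False
```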
Re: [Python-Dev] SyntaxError: can't assign to function call
On Aug 10, 2006, at 12:01 PM, Josiah Carlson wrote: > > "Michael Urman" <[EMAIL PROTECTED]> wrote: >> >> On 8/9/06, Michael Hudson <[EMAIL PROTECTED]> wrote: >>> The question doesn't make sense: in Python, you assign to a name, >>> an attribute or a subscript, and that's it. >> >> Just to play devil's advocate here, why not to a function call via a >> new __setcall__? I'm not saying there's the use case to justify it, >> but I don't see anything that makes it a clear abomination or >> impossible with python's syntax. > > Describe the syntax and semantics. Every time I try to work them > out, I > end up with a construct that makes less than no sense, to be used in > cases I have never seen. Further, if you want to call a method > __setcall__ on an object just created, you can use 'x().__setcall__ > (y)'. > There is no reason to muck up Python's syntax. It makes just as much sense as assigning to an array access, and the semantics would be pretty similar. There's similarly "no reason" to allow x[5] = True. You can just spell that x.__setitem__(5, True). x(*args, **kwargs) = val could translate into x.__setcall__(val, *args, **kwargs). x(5) = True could translate into x.__setcall__(True, 5) Please note I'm actually arguing for this proposal. Just agreeing that it is not a completely nonsensical idea. James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
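To illustrate the parallel drawn above, a sketch assuming Python 3: x[5] = True really is a call to x.__setitem__(5, True). The __setcall__ method is hypothetical; Python never invokes it implicitly, so it is called by hand here to show what "x(5) = True" would have to translate into. The class name Recorder is made up for the example.

```python
class Recorder:
    """Records item assignments and (hypothetical) call assignments."""

    def __init__(self):
        self.items = {}
        self.calls = {}

    def __setitem__(self, key, value):
        self.items[key] = value

    def __setcall__(self, value, *args, **kwargs):
        # Hypothetical hook: Python never calls this implicitly.
        self.calls[(args, tuple(sorted(kwargs.items())))] = value

x = Recorder()
x[5] = True              # real syntax, dispatches to __setitem__
x.__setitem__(6, True)   # the explicit spelling of the same thing
x.__setcall__(True, 5)   # what "x(5) = True" would mean under the idea
```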
Re: [Python-Dev] SyntaxError: can't assign to function call
On Aug 10, 2006, at 12:19 PM, James Y Knight wrote: > Please note I'm actually arguing for this proposal. Just agreeing > that it is not a completely nonsensical idea. ERK! Big typo there. I meant to say: Please note I'm NOT*** actually arguing for this proposal. Sorry for any confusion. James
Re: [Python-Dev] SyntaxError: can't assign to function call
On Aug 10, 2006, at 12:24 PM, Guido van Rossum wrote: > On 8/10/06, James Y Knight <[EMAIL PROTECTED]> wrote: >> It makes just as much sense as assigning to an array access, and the >> semantics would be pretty similar. > > No. Array references (x[i]) and attribute references (x.a) represent > "locations". Function calls represent values. This is no different > than the distinction between lvalues and rvalues in C. Yes, function calls cannot be lvalues right now. However, there is no reason that a function call _could not_ be an lvalue. That is exactly what the addition of __setcall__ would allow. On Aug 10, 2006, at 12:31 PM, Phillip J. Eby wrote: > Honestly, it might make more sense to get rid of augmented > assignment in Py3K rather than to add this. It seems that the need > for something like this springs primarily from the existence of > augmented assignment. It makes just as much (and just as little) sense to have normal assignment to function calls as it does augmented assignment to function calls. I don't see any reason to single out augmented assignment here. Anyhow, enough time wasted on this. I don't really think python should add this feature, but it _does_ make sense, and would have understandable and consistent semantics if it were added. James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] SyntaxError: can't assign to function call
On Aug 10, 2006, at 4:57 PM, Phillip J. Eby wrote: > However, I'm also not clear that trying to assign to a function > call *is* ill-advised. One of the things that attracted me to > Python in the first place is that it had a lot of features that > would be considered "hypergeneralization" in other languages, e.g. > the ability to create your own sequences, mappings, and callable > objects in the first place. > > That being said, the benefit of hypergeneralizing assignment seems > small compared to its price. Well, it's a mostly obvious extension of an existing idea, so the price doesn't seem all that high. The main problem is that so far, there have been 0 convincing use cases. So no matter how moderate the price, it's definitely bigger than the benefit. But anyhow, speaking of hypergeneralization...since this has 0 use cases anyhow, might as well hyperhypergeneralize it... Well, why should assignment be limited to only local variables, item access, and function calls. Why shouldn't you be able to potentially assign to _any_ expression! Since x + a turns into (very roughly...) x.__add__(a), then, x + a = 5 could turn into x.__add__.__setcall__(5, a). Of course, since normal __add__ functions don't have a __setcall__, doing this will raise an error. But, a user defined __add__ could have one! And what would such a user defined __add__.__setcall__ actually *do*? Well, that would be a use case, and I sure don't have any of those! Ta Da. Who's going to make the patch? ;) James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Type of range object members
On Aug 15, 2006, at 6:20 PM, Martin v. Löwis wrote: > Guido van Rossum schrieb: >> From the Python *user*'s perspective, yes, as much as possible. But >> I'm still playing with the thought of having two implementation >> types, >> since otherwise we'd have to devote 4 bytes (8 on a 64-bit platform) >> to the single *bit* telling the difference between the two internal >> representations. > > We had this discussion before; if you use ob_size==0 to indicate > that it's an int, this space isn't needed in a long int. On a 32-bit > platform, the size of an int would go up from 12 to 16; if we stop > using a special-cased allocator (which we should (*)), there isn't > any space increase on such a platform. On a 64-bit platform, the > size of an int would go up from 24 bytes to 32 bytes. But it's the short int that you probably really want to make size efficient. Which is of course also doable via something like:

typedef struct {
    PyObject_HEAD
    long ob_islong : 1;
    long ob_ival_or_size : LONG_BITS - 1;
    long ob_digit[0];
} PyIntObject;

There's no particular reason that a short int must be able to store the entire range of C "long", so, as many bits can be stolen from it as desired. James