[Python-Dev] PEP 372 -- Adding an ordered directory to collections ready for pronouncement
Hi everybody,

PEP 372 was modified so that it provides a simpler API (only the dict API, to be exact), and it was decided to start with a Python-only implementation and replace it with a C version later if necessary.

Annotated changes from earlier versions of the PEP:

- The extra API for ordered dict was dropped to keep the interface simple and clean. Future versions can still be expanded, but it's impossible to drop features later on.
- To keep the implementation simple, 3.1 / 2.7 will ship with a Python-only version of the class. It can still be rewritten in C if it turns out to be too slow or thread safety is required.

The corresponding issue in the tracker: http://bugs.python.org/issue5397
Link to the PEP: http://www.python.org/dev/peps/pep-0372/

Anything else that should be done?

Regards,
Armin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 372 -- Adding an ordered directory to collections ready for pronouncement
Hi,

Guido van Rossum writes:
> This sounds fair. Note that dict.__eq__ actually returns
> NotImplemented if not isinstance(other, dict) so you could tighten the
> test to isinstance(other, dict) if you wanted to.

I'm actually very happy with that decision. The original PEP was doing exactly that and I still think it makes more sense. [sorry Raymond :)]

Regards,
Armin
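The behaviour Guido describes can be checked directly. The following is only a minimal sketch, not the PEP's implementation: `SketchOrderedDict` is a hypothetical stand-in that relies on plain dicts keeping insertion order (Python 3.7+), compares order-sensitively against its own kind, and defers to `dict.__eq__` for everything else.

```python
class SketchOrderedDict(dict):
    """Hypothetical stand-in for the PEP 372 class (assumes Python 3.7+,
    where plain dicts already preserve insertion order)."""

    def __eq__(self, other):
        if isinstance(other, SketchOrderedDict):
            # Two ordered dicts: the order of the items matters.
            return list(self.items()) == list(other.items())
        # Anything else: defer to dict, which returns NotImplemented
        # when `other` is not a dict at all.
        return dict.__eq__(self, other)

    def __ne__(self, other):
        # dict defines its own __ne__, so it has to be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


a = SketchOrderedDict([(1, "a"), (2, "b")])
b = SketchOrderedDict([(2, "b"), (1, "a")])
plain = {2: "b", 1: "a"}
```

With these definitions `a` and `b` compare unequal because their order differs, while `a` still compares equal to the plain dict with the same items.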
Re: [Python-Dev] PEP 372 -- Adding an ordered directory to collections ready for pronouncement
Hi,

Georg Brandl writes:
> We're already quite inconsistent with type name casing in the collections
> module, so it wouldn't matter so much. (Though I'd find symmetry with
> defaultdict pleasing as well.)

We can either be consistent with defaultdict and dict, or with Counter, MutableMapping etc. I think it's a bit too chaotic already to make a fair decision here.

If we seriously consider a C implementation, it would probably be a good idea to call it `odict`. Classes implemented in C are usually lowercase, as far as I can see.

Regards,
Armin
Re: [Python-Dev] PEP 372 -- Adding an ordered directory to collections ready for pronouncement
Guido van Rossum writes:
> +1 for odict. Somehow I thought that was the name proposed by the PEP.

It originally was; Raymond wanted to change it. I would still vote for odict if that's still possible :)

Regards,
Armin
Re: [Python-Dev] PEP 372 -- Adding an ordered directory to collections ready for pronouncement
Steve Holden writes:
> Surely that's just a thinko in the subject line? The PEP specifies
> "ordered dictionary" and nobody has been talking about "directories".

Actually, the initial version of the PEP had that typo in the topic. Guess I copy pasted wrong again: http://www.google.com/search?q=%22adding+an+ordered+directory%22

Regards,
Armin
Re: [Python-Dev] PEP 372 -- Adding an ordered directory to collections ready for pronouncement
Hi,

Raymond Hettinger writes:
> When we use the class, we typically only spell-out the constructor
> once while actually using the returned object many times. So, have we
> really saved any typing?

I'm fine with the typed out name as well, but I still would prefer lowercase to stay consistent with defaultdict/dict. Unfortunately PEP 8 never really took off naming-wise, so we're mostly following the "reuse the naming scheme from existing code in the same module" rule, and I think there lowercase wins, thanks to defaultdict.

Regards,
Armin
Re: [Python-Dev] PEP 372 -- Adding an ordered directory to collections ready for pronouncement
Hi,

Guido van Rossum writes:
> Anyway, it seems the collections module in particular is already
> internally inconsistent -- NamedTuple vs. defaultdict. In a sense
> defaultdict is the odd one out here, since these are things you import
> from some module, they're not built-in. Maybe it should be renamed to
> NamedDict?

I suppose you mean "DefaultDict". That would actually be the best solution. Then the module would be consistent and the new ordered dict version would go by the name "OrderedDict".

Regards,
Armin

PS.: so is datetime.datetime a builtin then? :)
[Python-Dev] Unicode Data in Python2.5 is missing a ucd_4_1_0 object
Hi,

I discovered that unicodedata in Python 2.5 implements Unicode 4.1. While this is OK, it's possible to enforce Unicode 3.2 by using the ucd_3_2_0 object. But it's not possible to enforce a ucd_4_1_0 standard because that object does not exist yet.

In the description of #1031288 (http://www.python.org/sf/1031288) Martin v. Löwis says the following:

| Python relies on the unicodedata 3.2.0, as the IDNA RFCs
| mandate that Unicode 3.2 is used to implement IDNA. So any
| integration of 4.0.1 must
| a) still maintain access to the 3.2.0 data

Furthermore, the docstring claims that this module just implements Unicode 3.2.0, whereas unidata_version gives me 4.1.0.

Doesn't that mean that there should also be a way to enforce Unicode 4.1.0?

Regards,
Armin
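The situation described above can be inspected from the interpreter. This is a small sketch using only attributes of the standard `unicodedata` module; the version printed for the module-level database depends on the Python release in use (the post saw 4.1.0 on Python 2.5).

```python
import unicodedata

# Version of the Unicode database behind the module-level functions.
current_version = unicodedata.unidata_version

# The frozen 3.2.0 database that is kept around for IDNA.
legacy_version = unicodedata.ucd_3_2_0.unidata_version

# There is no per-version object for other Unicode releases.
has_4_1_0 = hasattr(unicodedata, "ucd_4_1_0")

print(current_version, legacy_version, has_4_1_0)
```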
Re: [Python-Dev] Unicode Data in Python2.5 is missing a ucd_4_1_0 object
Hi,

Martin v. Löwis writes:
> > Doesn't that mean that there should also be an way to enforce unicode 4.1.0?
>
> You mean, that there should be a ucd_4_1_0 object? No, why do you think
> there should be one? We don't plan to provide a copy of the UCD for each
> UCD version that was ever released (e.g. we skipped some between 3.2 and
> 4.1 also).

Right, I didn't know that. From that old bug report it sounded like a programmer should be able to select a specific UCD version.

Regards,
Armin
[Python-Dev] Low-Level Encoding Behavior on Python 3
Hi everybody,

Carl Meyer and I did some experimentation with encoding behavior on Python 3. Carl did some hacking on getting virtualenv running on Python 3, and it turned out that his version of virtualenv did not work on Python 3 on my server either. So none of the virtualenv installations worked there, though they all seemed to work for some people.

Looking closer, the problem is that virtualenv was assuming that 'open(filename).read()' works. However, on my particular system the default encoding in Python 3 for files was 'ASCII'. That encoding was picked up because of three things:

a) Python 3's default encoding for opening files is picked up from the system locale,
b) the SSH server accepts the client's encoding for everything (including filenames), and
c) the OS X default installation for many people does not initialize locales properly, which forces the server to fall back to 'POSIX', which applications (including Python) then pick up as ASCII.

Now this showcases a couple of problems on different levels:

- Developers assume that the default for encodings is UTF-8 because that is the encoding on their local machine. Falling back to the platform-dependent encoding is documented but does not make a lot of sense. The limiting platform is probably Windows, which historically has problems with UTF-8 in the Notepad editor. As a compromise I recommend UTF-8 for POSIX and UTF-8-sig for Windows, as the Windows editor feels happier with this encoding. As the latter reads every file of the former, that should not cause that many problems in practice.

- Seeing that SSH happily overrides the filesystem encoding, I would like to forward this issue to some of the Linux maintainers. Having the SSH client override your filesystem encoding sounds like a terrible decision. Apparently Python guesses the filesystem encoding from LC_CTYPE, which however is overridden by connecting SSH clients. Seeing how Ubuntu and a bunch of other distributions are using Gnome, which uses UTF-8 for filesystems as a somewhat established default, I would argue that Python should just assume UTF-8 as default encoding on a Linux environment.

- Inform Apple about the fact that some Snow Leopard machines are by default setting the LC_CTYPE (and all other locale) variables to something that is not even a valid locale. I am not yet sure why it does not happen on all machines, but it happens on more than one at PyCon alone. On top of that, I know that issue because it broke the Python "Babel" package for a while, which is why I added a workaround for that particular problem. I will either way file a bug report at Apple for what the SSH client is doing on mixed local environments.

Are we missing anything? Any suggestions?

Regards,
Armin
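The failure mode described above comes from relying on the locale-derived default. As a minimal sketch (the helper name `read_text` and the temporary demo file are illustrative assumptions, not virtualenv's actual code), passing the encoding explicitly makes the result independent of the environment the process was started from:

```python
import locale
import os
import sys
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# The two settings discussed in the post; both can differ between a local
# shell and an SSH session on the same machine.
locale_default = locale.getpreferredencoding(False)  # default for file contents
filesystem_encoding = sys.getfilesystemencoding()    # used for file names

# Demo: write UTF-8 bytes to a temporary file, then read them back explicitly.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write("grüße".encode("utf-8"))
demo_text = read_text(demo_path)
os.remove(demo_path)
```

With an ASCII locale, a bare `open(demo_path).read()` on the same file could raise `UnicodeDecodeError`, which is the behaviour the post reports.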
Re: [Python-Dev] Low-Level Encoding Behavior on Python 3
Hi,

On 3/16/11 3:48 AM, Antoine Pitrou wrote:
> I may be mistaken, but you seem to conflate two things: encoding of file
> names, and encoding of file contents. I guess that virtualenv chokes on
> the file contents, but most of your argument seems related to encoding
> of file names (aka "filesystem encoding").

These are two pretty unrelated problems, but both are problems nonetheless. The filename encoding should not be guessed from the environment variables, as those come from the connecting client. The default encoding for file contents also should not be platform dependent. It *will* lead to people thinking it works when in practice it will break if they move their code to a remote server, SSH into it and then trigger the code execution.

I argue that the first is just wrong (filename encoding guessing) and the latter is dangerous (file content encoding being platform dependent).

virtualenv itself is already fixed and now explicitly reads files with UTF-8 encoding.

Regards,
Armin
Re: [Python-Dev] The docs, reloaded
Hoi,

Fred L. Drake, Jr. writes:
> On Monday 21 May 2007, A.M. Kuchling wrote:
> > Disadvantages:
> >
> > * reST markup isn't much simpler than LaTeX.
>
> * reST doesn't support nested markup, which is used in the current
> documentation.

For a lightweight markup language that is human readable (which rst certainly is) the syntax is surprisingly powerful. You can nest any block tag and I'm not sure how often you have to nest roles and stuff like that.

The goal of the new docs is a less complex syntax and currently nothing beats reStructuredText in terms of readability and possibilities.

rst is simpler than latex:

LaTeX:

    \item[\code{*?}, \code{+?}, \code{??}] The \character{*},
    \character{+}, and \character{?} qualifiers are all \dfn{greedy}; they
    match as much text as possible. Sometimes this behaviour isn't
    desired; if the RE \regexp{<.*>} is matched against
    \code{'<H1>title</H1>'}, it will match the entire string, and not just
    \code{'<H1>'}. Adding \character{?} after the qualifier makes it
    perform the match in \dfn{non-greedy} or \dfn{minimal} fashion; as
    \emph{few} characters as possible will be matched. Using \regexp{.*?}
    in the previous expression will match only \code{'<H1>'}.

Here the same in rst:

    ``*?``, ``+?``, ``??``
       The ``'\*'``, ``'+'``, and ``'?'`` qualifiers are all
       :dfn:`greedy`; they match as much text as possible. Sometimes this
       behaviour isn't desired; if the RE :regexp:`<.\*>` is matched
       against ``'<H1>title</H1>'``, it will match the entire string, and
       not just ``'<H1>'``. Adding ``'?'`` after the qualifier makes it
       perform the match in :dfn:`non-greedy` or :dfn:`minimal` fashion;
       as *few* characters as possible will be matched. Using
       :regexp:`.\*?` in the previous expression will match only
       ``'<H1>'``.

Regards,
Armin
Re: [Python-Dev] The docs, reloaded
Hoi,

Stephen J. Turnbull writes:
> IMO that pair of examples shows clearly that, in this application,
> reST is not an improvement over LaTeX in terms of readability/
> writability of source. It's probably not worse, although I can't help
> muttering "EIBTI". In particular I find the "``'...'``" construct
> horribly unreadable because it makes it hard to find the Python syntax
> in all the reST.

Well. That was a bad example. But if you look at the converted sources and open the source file you can see that rst is a lot cleaner than latex for this type of documentation.

Regards,
Armin
Re: [Python-Dev] The docs, reloaded
Hoi,

Martin Blais writes:
> About possibilities: I'm sorry but that is simply not true. LaTeX
> provides the full power of macro expansion, while ReST is a fixed
> (almost, roles are an exception) syntax which has its own set of
> problems and ambiguities.

I was speaking of rst in comparison with other lightweight markup languages (textile, markdown, wiki syntax).

> That, and the ability to already parse it from Python and more easily
> convert to other formats (one of LaTeX's weaknesses), are the only
> benefits that I can see to switching away from LaTeX. I have to admit
> I'm afraid we would be moving to a lesser technology, and the driver
> for that seems to be people's reluctance to work with the more
> powerful, more complex tool. Not saying it is invalid (it's about
> people, in the end), but I still don't see what's the big problem with
> LaTeX.

The problem with latex is that it requires more knowledge, so the number of people that can contribute is a lot lower. It's also a lot harder to generate a search index, do interlinking, generate dynamic documentation with comments etc.

Don't get me wrong, LaTeX is a powerful tool and I use it for every bigger document I write. I just think it's not the best choice for documenting scripting languages.

Regards,
Armin
Re: [Python-Dev] The docs, reloaded
Hoi,

Georg Brandl writes:
> Who's documenting a scripting language?

Wanted to say agile :D

Regards,
Armin
Re: [Python-Dev] The docs, reloaded
Hoi,

In addition to the offline docs that Georg published some days ago, there is also a web version which currently looks and works pretty much like the offline version. There are however some differences that are worth knowing:

- Cleaner URLs. You can actually guess them because we took the idea the PHP people had and check for similar pages if a page does not exist. We do however redirect if there was a match so that the URL stays unique.
- The search doesn't require JavaScript (but is currently disabled due to a buggy stemmer and indexer).

That's it for now, you can try it online at http://pydoc.gbrandl.de:3000/

Regards,
Armin
Re: [Python-Dev] The docs, reloaded
Hoi,

Dennis Benzinger writes:
> Looks good. But should the source pages really use syntax highlighting?
> I think if somebody is interested in the source then they should get
> the real source without any highlighting. If you decide to keep the
> syntax highlighting then the highlighting of multiline ReST strings
> should be fixed. For example see the source for splitext().

Yeah. Georg said the same yesterday. I'll revert that change so that it displays the plain sources as text/plain in the online version too.

Regards,
Armin
Re: [Python-Dev] The docs, reloaded
Hoi,

Due to some server issues I had to take the web version down. But expect an updated version in a few days.

Regards,
Armin
[Python-Dev] NaN / Infinity in Python
Hi,

It's one of those "non issues" but there are still some situations where you have to deal with Infinity and NaN, even in Python. Basically one of the problems is that the repr of floating point numbers is platform dependent and sometimes yields "nan", which cannot be evaluated. It's true that eval() is probably a bad thing, but there are some libraries that use repr/%r for code generation and it could happen that they produce erroneous code because of that. Also there is no way to get the Infinity and NaN values and also no way to test if they exist.

One option would be to change the repr of `nan` to `math.NaN` and `inf` to `math.Infinity` as well as `-inf` to `(-math.Infinity)`, and to add the following code to the math module (of course as a C implementation, there are even macros for testing for NaN values)::

    # A float literal that overflows evaluates to infinity.
    Infinity = 1e10000
    NaN = Infinity / Infinity

    def is_nan(x):
        # NaN is the only float that does not compare equal to itself.
        return type(x) is float and x != x

    def is_finite(x):
        # Neither NaN nor positive or negative infinity.
        return not is_nan(x) and x not in (Infinity, -Infinity)

Bugs related to this issue:

- http://bugs.python.org/1732212 [repr of 'nan' floats not parseable]
- http://bugs.python.org/1481296 [long(float('nan'))!=0L]

Regards,
Armin
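For comparison, here is a small sketch that checks helpers of the kind proposed above against `math.isnan`, `math.isinf` and `math.isfinite`, which the standard library gained in later releases. The upper-case constant names are an assumption of this sketch, not part of the proposal.

```python
import math

INFINITY = float("inf")
NAN = float("nan")


def is_nan(x):
    # NaN is the only float that does not compare equal to itself.
    return isinstance(x, float) and x != x


def is_finite(x):
    # Finite means: neither NaN nor positive or negative infinity.
    return not is_nan(x) and x not in (INFINITY, -INFINITY)


samples = [0.0, 1.5, -2.0, INFINITY, -INFINITY, NAN]
agrees_with_math = all(
    is_nan(value) == math.isnan(value) and is_finite(value) == math.isfinite(value)
    for value in samples
)
```

The repr problem from the post is still visible: `repr(NAN)` is the bare name `nan`, which `eval()` cannot resolve without extra help.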
[Python-Dev] Proposal: Slightly Changed Semantics for from-Import
Hi all,

I propose a small change to the "from"-import behavior. Currently there are two ways to import a module named bar from a package foo, namely "import foo.bar as bar" and "from foo import bar". The main problem with the second is that it will not work in every situation.

Modules are registered with their names in sys.modules before the import but become an attribute of their parent only afterwards. This leads to the surprising behavior that "from foo.bar import baz" works, but "from foo import bar" does not until the foo.bar module has finished setting up.

This behavior is especially surprising if you have circular dependencies: you notice that your imports work properly if you import a specific module before another one and fail if you import it afterwards, although it doesn't seem like that should make any difference.

My proposal is that "from foo import bar" returns sys.modules["foo.bar"] if this one exists and the foo.bar attribute lookup failed.

If a test case is needed I can attach one later on.

Regards,
Armin
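The proposed lookup order can be sketched in plain Python. This is only an illustration of the rule stated above, not the interpreter's import machinery; `from_import` and the `sketch_pkg` module names are hypothetical, and the package is assumed to be in `sys.modules` already.

```python
import sys
import types


def from_import(package_name, name):
    """Resolve `from package_name import name` with the proposed fallback."""
    package = sys.modules[package_name]
    try:
        # Current behaviour: look the name up as an attribute of the package.
        return getattr(package, name)
    except AttributeError:
        # Proposed fallback: the submodule may already be registered in
        # sys.modules even though it is not yet bound on its parent.
        try:
            return sys.modules[package_name + "." + name]
        except KeyError:
            raise ImportError(
                "cannot import name %r from %r" % (name, package_name)
            ) from None


# Simulate a submodule that is still being set up: it is registered in
# sys.modules, but not yet an attribute of its parent package.
pkg = types.ModuleType("sketch_pkg")
sub = types.ModuleType("sketch_pkg.sub")
sys.modules["sketch_pkg"] = pkg
sys.modules["sketch_pkg.sub"] = sub

found = from_import("sketch_pkg", "sub")
try:
    from_import("sketch_pkg", "missing")
    missing_raises = False
except ImportError:
    missing_raises = True

# Remove the fake modules again.
del sys.modules["sketch_pkg"], sys.modules["sketch_pkg.sub"]
```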
[Python-Dev] Module Suggestion: ast
Hi all,

I would like to propose a new module for the stdlib for Python 2.6 and higher: "ast". The motivation for this module is the pending deprecation of compiler.ast, which is widely used (debugging, template engines, code coverage etc.).

_ast is a very solid module and is without a doubt easier to maintain than compiler.ast, which was written in Python, but it is lacking some features such as pretty printing the AST or traversing it.

The idea of "ast" would be adding high level functionality for easier working with the AST. It currently provides these features:

- pretty printing AST objects
- a parse function as easier alias for compile() + flag
- operator-node -> operator symbol mappings (eg: _ast.Add -> '+')
- methods to modify lineno / col_offset (incrementing or copying the data over from existing nodes)
- getting the fields of nodes as dicts
- iterating over all child nodes
- a function to get the docstring of an AST node
- a node walker that yields all child-nodes recursively
- a `NodeVisitor` and `NodeTransformer`

Additionally there is a `literal_eval` function in that module that can safely evaluate Python literals in a string.

Module and unittests are located in the Pocoo Sandbox HG repository:

http://dev.pocoo.org/hg/sandbox/file/tip/ast/ast.py
http://dev.pocoo.org/hg/sandbox/file/tip/ast/test_ast.py

A slightly modified version of the ast.py module for Python 2.5 compatibility is currently in use for the Mako template engine to achieve support for Google's AppEngine.

An example module for the NodeVisitor is in the repository which converts a Python AST back into Python source code:

http://dev.pocoo.org/hg/sandbox/file/tip/ast/codegen.py

Regards,
Armin
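Several of the listed features can be tried with the `ast` module as it exists in today's standard library, which may differ in detail from the sandbox version linked above. A short sketch; `NameCollector` and the sample source are illustrative names:

```python
import ast

source = (
    "def greet(name):\n"
    "    'Return a greeting.'\n"
    "    return 'Hello, ' + name\n"
)

# parse() is the shortcut for compile() with the AST flag.
tree = ast.parse(source)


class NameCollector(ast.NodeVisitor):
    """Collect the identifiers of all Name nodes in a tree."""

    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        self.names.append(node.id)
        self.generic_visit(node)


collector = NameCollector()
collector.visit(tree)

docstring = ast.get_docstring(tree.body[0])
node_count = sum(1 for _ in ast.walk(tree))    # recursive node walker
value = ast.literal_eval("{'a': [1, 2, 3]}")   # literals only, no code execution

try:
    ast.literal_eval("__import__('os')")
    rejected = False
except ValueError:
    rejected = True
```

`ast.dump(tree)` gives the pretty-printed form of the same tree.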
[Python-Dev] Problems with the new super()
Hi all,

I blogged about that topic today, which turned out to be a very bad idea, so I summarize it for the mailing list here to hopefully start a discussion about the topic, which I think is rather important.

In the last weeks something remarkable happened in the Python 3 sources: self kinda became implicit. Not in function definitions, but in super calls. But not only self: also the class passed to `super`. That's interesting because it means that the language shifts into a completely different direction.

`super` was rarely used in the past, mainly because it was weird to use. In the most common use case the current class and the current instance were passed to it, and the super object returned looked up the parent methods on the MRO for you. It was useful for multiple inheritance and mixin classes that don't know their parent, but confusing for many. I can see that a replacement is a good idea, but I'm not so sure if the current implementation is the way to go.

The main problem with replacing `super(Foo, self).bar()` with something like `super.bar()` is obviously that self is explicit and the class (in that case Foo) can't be determined by the caller. Furthermore the Python principle was always against functions doing stack introspection to find the caller. There are few examples in the stdlib or builtins that do some sort of caller introspection. Those are the special functions `vars`, `locals`, `globals` and some functions in the inspect module. And all of them do nothing more than getting the current frame and accessing the dict of locals or globals. What super in current Python 3 builds does goes way beyond that.

The implementation of the automatic super currently has two ugly details that I think violate the Python Zen: the bytecode generated is different if the name "super" is used in the function, and `__class__` is only added as a cell to the code if `super` or `__class__` is referenced. That and the fact that `f_localsplus` is completely unavailable from within Python makes the whole process appear magical.

This is way more magical than anything we've had in Python in the past and just doesn't fit into the language in my opinion. We do have an explicit self in methods, and methods are more or less just functions. Python's methods are functions, just that a descriptor puts a method object around them to pass the self as first argument. That's an incredibly cool thing to have and makes things very simple and non-magical. Breaking that principle by introducing an automatic super seems to harm the concept.

Another odd thing is that Python 3 starts keeping information on the C layer we can't access from within Python. Super is one example, another good one is methods. They don't have a descriptor that wraps them if they are accessed via their classes. This as such is not a problem, as you can call them the same way (just that you can call them with completely different receivers now), but it becomes a problem if some of the functions are marked as staticmethods. Then they look completely the same when looking at them from a class's perspective:

| >>> class C:
| ...     normal = lambda x: None
| ...     static = staticmethod(lambda x: None)
| ...
| >>> type(C.normal) is type(C.static)
| True
| >>> C.normal
| <function <lambda> at 0x4da150>

As far as I can see a documentation tool has no chance to keep them apart, even though they are completely different on an instance:

| >>> type(C().normal) is type(C().static)
| False
| >>> C().normal
| <bound method C.<lambda> of <__main__.C object at 0x4dbcf0>>
| >>> C().static
| <function <lambda> at 0x4da198>

While I don't know about the method thing, I think an automatic super should at least be implementable from within Python. I could imagine that by adding __class__ and __self__ to scopes automatically a lot of that magic could be removed.

Regards,
Armin
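The conditional `__class__` cell mentioned in the post can be observed on current CPython through the code object's free variables. A small sketch, assuming CPython 3; the class names are illustrative:

```python
class Base:
    def greet(self):
        return "Base"


class Child(Base):
    def greet(self):
        # Zero-argument super(): the compiler adds a hidden __class__ cell.
        return "Child > " + super().greet()

    def plain(self):
        # No reference to super or __class__, so no cell is created.
        return "no super here"


uses_cell = Child.greet.__code__.co_freevars
no_cell = Child.plain.__code__.co_freevars
cell_target = Child.greet.__closure__[0].cell_contents
```

On such an interpreter `uses_cell` contains `__class__` while `no_cell` is empty, which is the difference in generated code that the post objects to.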
Re: [Python-Dev] Problems with the new super()
Hi,

Guido van Rossum writes:
> The staticmethod thing isn't new; that's also the case in 2.x.

staticmethod hasn't changed, method has. In the past Class.method gave you an unbound method; now you get a function back as if it was a static method.

> The super() thing is a case of practicality beats purity. Note that
> you pay a small but measurable cost for the implicit __class__ (it's
> implemented as a "cell variable", the same mechanism used for nested
> scopes) so we wouldn't want to introduce it unless it is used.

I do agree that super() is a lot easier to work with than the regular way to call it. But the fact that it breaks if I do `_super = super`, and that it's impossible to emulate it from within Python, still bothers me.

Regards,
Armin
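Both points can be reproduced on a current Python 3 interpreter. The sketch below uses illustrative class names; the last two lines show one way a tool might still tell the two kinds of attribute apart, by looking at the raw class dictionary, which is an addition of this sketch rather than something claimed in the thread.

```python
import types

_super = super  # alias: the compiler no longer sees the name "super"


class Base:
    def name(self):
        return "base"


class Aliased(Base):
    def name(self):
        # No __class__ cell is created for this method, so this call fails.
        return _super().name()


class C:
    normal = lambda x: None
    static = staticmethod(lambda x: None)


try:
    Aliased().name()
    alias_error = None
except RuntimeError as exc:
    alias_error = str(exc)

# Accessed through the class, both attributes are plain functions ...
same_type = type(C.normal) is type(C.static) is types.FunctionType

# ... but the raw class dictionary still holds the staticmethod wrapper.
static_is_wrapped = isinstance(C.__dict__["static"], staticmethod)
normal_is_wrapped = isinstance(C.__dict__["normal"], staticmethod)
```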
Re: [Python-Dev] urllib unicode handling
Hi,

Jeroen Ruigrok van der Werven writes:
> Would people object if such functionality got added to urllib?

I would ;-) There are IRIs, just that nobody wrote a useful module for that. There are algorithms in the RFC that can convert URIs to IRIs and the other way round. IMO that's the way to go.

Regards,
Armin
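To give an idea of what such a conversion involves, here is a deliberately simplified sketch of the IRI-to-URI direction. It is not a complete implementation of the RFC algorithm: `iri_to_uri` is a hypothetical helper, user info in the authority part is ignored, and the set of characters left unescaped is an assumption of this sketch.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in path, query and fragment. "%" is included so
# that already percent-encoded sequences are not encoded a second time.
_SAFE = "/%:@!$&'()*+,;=~"


def iri_to_uri(iri):
    """Simplified IRI -> URI conversion (sketch only)."""
    parts = urlsplit(iri)
    # Host names are converted with the IDNA codec.
    netloc = (parts.hostname or "").encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += ":%d" % parts.port
    # Everything else is encoded as UTF-8 and then percent-escaped.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


converted = iri_to_uri("http://bücher.example/straße?q=grün")
unchanged = iri_to_uri("http://example.com/a?b=c")
```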
Re: [Python-Dev] ABC issues
Hi,

Guido van Rossum writes:
> There's no need to register as Sized -- the Sized ABC recognizes
> classes that define __len__ automatically. The Container class does
> the same looking for __contains__. Since the deque class doesn't
> implement __contains__, it is not considered a Container -- correctly
> IMO.

True. deque doesn't implement __contains__. However "in" still works because of the __iter__ fallback. So from the API's perspective it's still compatible, even though it doesn't implement it. The same probably affects old style iterators (__getitem__ with index). One could argue that they are still iterable or containers, but that's harder to check so probably not worth the effort.

> >> Another issue is that builtin types don't accept ABCs currently. For
> >> example
> >> set() | SomeSet() gives a TypeError, SomeSet() | set() however works.
> >
> > Pandora's Box -- sure you want to open it?
>
> In 3.0 I'd like to; this was my original intent. In 2.6 I think it's
> not worth the complexity, though I won't complain.

I would love to help on that as I'm very interested in that feature.

Regards,
Armin
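The gap between what the ABCs detect and what the `in` operator accepts can be shown with small hypothetical classes (used here instead of deque, whose method set has changed across Python versions):

```python
from collections.abc import Container, Iterable, Sized


class OnlyIter:
    """Defines __iter__ but not __contains__."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


class OldStyle:
    """Old-style iteration protocol: __getitem__ with integer indexes."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


class HasLen:
    """Recognised as Sized automatically, without registration."""

    def __len__(self):
        return 0


works_via_iter = 2 in OnlyIter([1, 2, 3])   # falls back to __iter__
works_via_getitem = 20 in OldStyle()        # falls back to __getitem__
```

Both membership tests succeed, yet neither class is reported as a `Container`, and `OldStyle` is not even reported as `Iterable`.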
[Python-Dev] Iterable String Redux (aka String ABC)
Hi,

Strings are currently iterable and it was stated multiple times that this is a good idea and shouldn't change. While I still don't think that that's a good idea, I would like to propose a solution for the problem many people are experiencing by introducing an abstract base class for strings.

Basically *the* problematic situation with iterable strings is something like a `flatten` function that flattens out every iterable object except strings. Imagine it's implemented in a way similar to that::

    def flatten(iterable):
        for item in iterable:
            try:
                if isinstance(item, basestring):
                    raise TypeError()
                iterator = iter(item)
            except TypeError:
                yield item
            else:
                for i in flatten(iterator):
                    yield i

A problem comes up as soon as user defined strings (such as UserString) are passed to the function. In my opinion a good solution would be a "String" ABC one could test against.

Regards,
Armin
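A sketch of how `flatten` might look with such an ABC, written for Python 3. The `String` class below is hypothetical (no such ABC exists in the standard library); it is created here and `str` and `UserString` are registered with it.

```python
from abc import ABC
from collections import UserString


class String(ABC):
    """Hypothetical 'String' ABC as proposed in the post."""


String.register(str)
String.register(UserString)


def flatten(iterable):
    for item in iterable:
        if isinstance(item, String):
            # Strings are iterable but are treated as atoms here.
            yield item
            continue
        try:
            iterator = iter(item)
        except TypeError:
            yield item
        else:
            yield from flatten(iterator)


flat = list(flatten([1, [2, "ab", [UserString("cd")]], (3,)]))
```

Without the ABC check, a user-defined string would be split into single characters, and each one-character string would then recurse endlessly.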
Re: [Python-Dev] Iterable String Redux (aka String ABC)
Hi,

Georg Brandl writes:
> I'd argue that "find" is more primitive than "split" -- split is intuitively
> implemented using find and slicing, but implementing find using split and
> len is unintuitive. (Of course, "index" can be used instead of "find".)

It surely is, but it would probably make sense to require both. Maybe have something like this:

    class SymbolSequence(Sequence)
    class String(SymbolSequence)

String would be the base of str/unicode and SymbolSequence of str/bytes.

A SymbolSequence is basically a sequence based on one type of symbols that implements slicing, getting symbols by index, count() and index(). A String is basically everything a str/unicode provides as methods, except those which depend on information based on the symbol. For example upper() / isupper() etc. would go. Additionally I guess it makes sense to get rid of encode() / decode() / format().

Regards,
Armin
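The hierarchy outlined above might be sketched as follows for Python 3. Both classes are hypothetical, and the choice of `find` and `split` as the abstract methods is an assumption taken from the quoted discussion, not a settled design.

```python
from abc import abstractmethod
from collections.abc import Sequence


class SymbolSequence(Sequence):
    """Hypothetical ABC: a sequence over one kind of symbol.

    Sequence already supplies index() and count() on top of
    __getitem__ and __len__.
    """

    __slots__ = ()


class String(SymbolSequence):
    """Hypothetical ABC for text-like types."""

    __slots__ = ()

    @abstractmethod
    def find(self, sub, start=0, end=None):
        raise NotImplementedError

    @abstractmethod
    def split(self, sep=None, maxsplit=-1):
        raise NotImplementedError


# Register the builtin types instead of changing their base classes.
SymbolSequence.register(bytes)
String.register(str)
```

Registering `str` with `String` also makes it a virtual subclass of `SymbolSequence`, while `bytes` is only a `SymbolSequence`.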
Re: [Python-Dev] Iterable String Redux (aka String ABC)
Greg Ewing writes:
> Well, I'm skeptical about the whole ABC thing in the
> first place -- it all seems very unpythonic to me.

I think it's very pythonic and the very best solution to interfaces *and* duck typing. Not only does it extend duck typing in a very, very cool way, it also provides a very cool way to get custom sets or lists going with little extra work. Subclassing builtins was always very painful in the past and many used the User* objects, which however often broke because some code did something like isinstance(x, (tuple, list)). Of course one could argue that instance checking is the root of all evil, but there are situations where you have to do instance checking. And ABCs are the perfect solution for that as they combine duck typing and instance checking.

In my opinion ABCs are the best feature of 2.6 and 3.0.

> But another way of thinking about it is that we
> already have an ABC of sorts for strings, and it's
> called basestring. It might be better to enhance
> that with whatever's considered missing than
> introducing another one.

basestring is not subclassable for example. Also it requires subclassing which ABCs do not.

Regards,
Armin
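The contrast between checking for concrete types and checking against an ABC can be sketched like this in Python 3; `Playlist` and `describe` are hypothetical names used only for illustration.

```python
from collections import UserList
from collections.abc import Sequence


class Playlist:
    """Hypothetical list-like class that does not subclass list."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        return len(self._tracks)

    def __getitem__(self, index):
        return self._tracks[index]


# Registration makes isinstance() succeed without inheritance.
Sequence.register(Playlist)


def describe(value):
    # Checking against the ABC accepts list, tuple, UserList and Playlist.
    if isinstance(value, Sequence) and not isinstance(value, str):
        return "sequence of %d items" % len(value)
    return "scalar"


concrete_check = isinstance(UserList([1, 2]), (tuple, list))
```

The concrete-type check rejects `UserList`, which is the breakage described above, while the ABC-based check accepts it.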
[Python-Dev] pep8ity __future__
Hi,

That's just a flaming-sword thread but I want to mention it nonetheless :-)

Basically I propose getting rid of __future__._Feature.getMandatoryRelease() in favour of __future__._Feature.mandatory. That's backwards compatible and much more pythonic. Getters/setters are considered unpythonic and getting rid of all that Java-like naming sounds like a good idea to me :) If threading/processing gets a pep8ified API with 2.6, why not __future__?

Proposed patch: http://paste.pocoo.org/show/64512/

Regards,
Armin
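A rough sketch of the idea, using a hypothetical stand-in class rather than the real `__future__._Feature` (the linked patch is not reproduced here). The plain attribute becomes the primary API and the old getter stays as a thin wrapper for backwards compatibility:

```python
import __future__


class Feature:
    """Hypothetical stand-in for __future__._Feature (not the stdlib class)."""

    def __init__(self, optional, mandatory):
        # Plain attributes are the proposed, pep8-style API.
        self.optional = optional
        self.mandatory = mandatory

    # Old Java-style getters kept so existing callers do not break.
    def getOptionalRelease(self):
        return self.optional

    def getMandatoryRelease(self):
        return self.mandatory


division = Feature((2, 2, 0, "alpha", 2), (3, 0, 0, "alpha", 0))

# The existing stdlib getter, for comparison.
stdlib_release = __future__.division.getMandatoryRelease()
```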
[Python-Dev] Proposal: add odict to collections
Hi,

I noticed lately that quite a few projects are implementing their own subclasses of `dict` that retain the order of the key/value pairs. However half of the implementations I came across do not implement the whole dict interface, which leads to weird bugs; also the performance of a Python implementation is not that great.

To fight that problem I want to propose a new class in "collections" called odict, which is a dict that keeps the items in insertion order, similar to a PHP array.

The interface would be fully compatible with dict and implemented as a dict subclass. Updating an existing key does not change its position, but new keys are inserted at the end. Additionally it would support slicing, where a list of (key, value) tuples is returned, and sort/reverse/index methods that work like their list equivalents. Index based lookup could work via odict.byindex().

An implementation of that exists as part of the ordereddict implementation, which however goes beyond that and is pretty much a fork of the python dict[1].

Some reasons why ordered dicts are a useful feature:

- In XML/HTML processing it's often desired to keep the attributes of a tag ordered during processing, so that the input ordering is the same as the output ordering.
- Form data transmitted via HTTP is usually ordered by the position of the input/textarea/select field in the HTML document. That information is currently lost in most Python web applications / frameworks.
- Easier transition of code from Ruby/PHP, which have ordered associative arrays / hashmaps.
- Having an ordered dict in the standard library would allow other libraries to support them. For example a PHP serializer could return odicts rather than dicts, which drop the ordering information. XML libraries such as etree could add support for it when creating elements or return attribute dicts.
Regards,
Armin

[1]: http://www.xs4all.nl/~anthon/Python/ordereddict/
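The ordering rules described in this proposal can be tried out with `collections.OrderedDict`, the class that PEP 372 eventually added. This is only a sketch of the semantics; the form field names are made up, and the proposed `byindex()` is emulated through a list of items because `OrderedDict` has no such method.

```python
from collections import OrderedDict

form = OrderedDict()
form["username"] = "armin"
form["email"] = "armin@example.org"
form["username"] = "mitsuhiko"   # update: the key keeps its position
form["comment"] = "hello"        # new key: appended at the end

keys = list(form)

# Index-based lookup, roughly what the proposed byindex() would offer.
second_item = list(form.items())[1]
```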
Re: [Python-Dev] Proposal: add odict to collections
Guido van Rossum python.org> writes: > > On Sat, Jun 14, 2008 at 4:57 PM, André Malo perlig.de> wrote: > > I find this collection of cases pretty weak as an argument for implementing > > that in the stdlib. A lot of special purpose types would fit into such > > reasoning, but do you want to have all of them maintained here? > > No, but an ordered dict happens to be a *very* common thing to need, > for a variety of reasons. So I'm +0.5 on adding this to the > collections module. However someone needs to contribute working code. > It would also be useful to verify that it actually fulfills the needs > of some actual use case. Perhaps looking at how Django uses its > version would be helpful. I compared multiple ordered dicts now (including Babel, Django and the C-implementation I mentioned earlier) and implemented a python version of the ordered dict as reference implementation: http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Proposal: add odict to collections
BJörn Lindqvist gmail.com> writes: > I think that the name ordereddict would be more readable though, and > match the existing defaultdict class in the collections module. While I agree that "ordered" makes more sense, it's pretty stupid to type with two 'd's right after another. Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Proposal: add odict to collections
Hasan Diwan gmail.com> writes:
> 2008/6/14 Talin acm.org>:
> > There's been a lot of controversy/confusion about ordered dicts. One of the
> > sources of confusion is that people mean different things when they use the
> > term "ordered dict": In some cases, the term is used to mean a dictionary
> > that remembers the order of insertions, and in other cases it is used to
> > mean a sorted dict, i.e. an associative data structure in which the entries
> > are kept sorted. (And I'm not sure that those are the only two
> > possibilities.)
>
> Have the comparison function passed in as a parameter then, if it's
> None, then have it maintain the order of insertion? Something like:
>     def __init__(self, cmpfunc = None):
>         self.dict = dict()

I think that would be counterproductive and would make the constructor incompatible with the normal dict constructor, which accepts keyword arguments too. Also that dict is just in order, but not sorted by something. You could still do something like this:

    d = odict()
    d['Pinguin'] = 'telly'
    d['Parrot'] = 'cage'
    d['Mouse'] = 'Napoleon'
    d.sort(key=lambda x: x[1].lower())

That of course would not sort items inserted after the sort call.

Regards,
Armin
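For comparison, a sketch of the same example with `collections.OrderedDict`, which has no `sort()` method; rebuilding the mapping from sorted items is one way to get the equivalent effect. The extra `'Cat'` entry is added only to show that later insertions are not re-sorted.

```python
from collections import OrderedDict

d = OrderedDict()
d['Pinguin'] = 'telly'
d['Parrot'] = 'cage'
d['Mouse'] = 'Napoleon'

# Sort once by lower-cased value, then keep plain insertion order.
d = OrderedDict(sorted(d.items(), key=lambda item: item[1].lower()))

# Inserted after the sort: goes to the end even though 'basket' sorts first.
d['Cat'] = 'basket'
```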
Re: [Python-Dev] Proposal: add odict to collections
Raymond Hettinger rcn.com> writes:
> For an insertion order dictionary, there was also a question about
> how to handle duplicate keys.
>
> Given odict([(k1,v1), (k2,v2), (k1,v3)]), what should the odict.items()
> return?
>
>     [(k1,v3), (k2,v2)]
>     [(k2,v2), (k1,v3)]

All the ordered dict implementations I saw behave like this:

    >>> odict([(1, 'foo'), (2, 'bar'), (1, 'baz')]).items()
    [(1, 'baz'), (2, 'bar')]

And that makes sense because it's consistent with the dict interface. But I guess that is a pretty small issue and most of the time you are not dealing with duplicate keys when using an ordered dict.

> IIRC, previous discussions showed an interest in odicts but that
> there were a lot of questions of the specific semantics of the API.

No doubt there. But

> This would no doubt be compounded by attempts to emulate
> dict views. Regardless, there should probably be a PEP and
> alternate pure python versions should be posted on ASPN so people
> can try them out.

That's true, but by now there are countless ordered dict implementations with a mostly-compatible interface, and applications and libraries are using them already.

I have an example implementation here that incorporates the ideas from ordereddict, Django's OrderedDict and Babel's odict:

http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py

Regards,
Armin
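The duplicate-key behaviour described above is also what `collections.OrderedDict` ended up doing: the first occurrence fixes the position and the last value wins. A short check, assuming a Python version that ships `OrderedDict`:

```python
from collections import OrderedDict

pairs = [(1, 'foo'), (2, 'bar'), (1, 'baz')]

# Key 1 keeps the slot of its first occurrence but takes the last value.
ordered_items = list(OrderedDict(pairs).items())

# A plain dict agrees on the values, which is the consistency argument.
plain = dict(pairs)
```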
Re: [Python-Dev] Proposal: add odict to collections
Steven D'Aprano pearwood.info> writes: > Conceptually, I would expect the following behaviour: > > >>> od = odict() > >>> od[1] = 'spam' # insert a new key > >>> od[2] = 'parrot' # insert a new key > >>> od[1] = 'ham' # modify existing key > >>> od.items() > [(1, 'ham'), (2, 'parrot')] That behavior is different to any ordered-dict implementation out there ;-) Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Proposal: add odict to collections
Hi, Alexander Schremmer <2008a usenet.alexanderweb.de> writes: > Even worse, most of them are slow, i.e. show a wrong algorithmic > complexity ... I already said that in the initial post. > > I have an example implementation here that incorporates the ideas > > from ordereddict, Django's OrderedDict and Babel's odict: > > > > http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py > > ... like your implementation. It is not too hard to get the delitem > O(log n) compared to your O(n), see here: That implementation is intended as example implementation for a possible API not one you should actually use. Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Proposal: add odict to collections
There are far more responses for that topic than I imagined so I would love to write a PEP about that topic, incorporating the ideas/questions and suggestions discussed here. Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Proposal: add odict to collections
Martin v. Löwis v.loewis.de> writes:
> > I compared multiple ordered dicts now (including Babel, Django and the
> > C-implementation I mentioned earlier) and implemented a python version
> > of the ordered dict as reference implementation:
> >
> >    http://dev.pocoo.org/hg/sandbox/raw-file/tip/odict.py
>
> I find the slicing API surprising. IMO, if you do support slicing, then
> a) the start and end indices should be the same ones that you also use
>    in regular indexing.
> b) the result should be of the same kind as the thing being sliced, i.e.
>    an odict.
>
> So I would rather expect
>
> >>> d['c':'spam']
> odict.odict([('c', 'd'), ('foo', 'bar')])
>
> The slicing operation that you provide should be spelled as
> d.items()[1:3], or, if you don't want to pay the cost of creating
> the full items list, then add a method such as d.slice_by_index(1,3).
> What's the use case for this operation, anyway?

The use case in my particular case is an ordered dict for messages of a .po file I want to display page-wise in an application. However I agree that it's not useful most of the time, so dropping it makes sense. If an application really needs slicing it can still subclass it and implement that. Furthermore, with dict views in Python 3 it would be possible to implement an efficient slicing operation on the dict view returned.

Regards,
Armin
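As a sketch of the page-wise use case without a slicing API on the dict itself, `itertools.islice` over the items view avoids building the full items list. The helper name `page` and the sample messages are hypothetical.

```python
from collections import OrderedDict
from itertools import islice


def page(messages, number, per_page):
    """Return the (key, value) pairs of a 1-based page, in dict order."""
    start = (number - 1) * per_page
    return list(islice(messages.items(), start, start + per_page))


messages = OrderedDict(
    [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
)
```

Note that `islice` still walks the items from the beginning, so this saves memory rather than time for late pages.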
Re: [Python-Dev] Proposal: add odict to collections
Armin Ronacher active-4.com> writes: > > There are far more responses for that topic than I imagined so I would love > to write a PEP about that topic, incorporating the ideas/questions and > suggestions discussed here. There is now a PEP for the ordered dict: - PEP: http://www.python.org/dev/peps/pep-0372/ - Thread on comp.lang.python: http://thread.gmane.org/gmane.comp.python.general/577894 Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] UCS2/UCS4 default
Guido van Rossum python.org> writes: > The one thing that may be missing from Python is things like > interpretation of surrogates by functions like isalpha() and I'm okay > with adding that (since those have to loop over the entire string > anyway). That and methods to safely iterate and slice strings by codepoint. Java supports that via String.codePointCount / String.codePointAt / String.codePointBefore / String.offsetByCodepoints. Maybe not on the unicode/str object itself but as part of unicodedata that would make sense for applications that have to deal with unicode on that level. Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
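What iterating "by codepoint" means on a UCS2 build can be sketched as follows: surrogate pairs in a sequence of UTF-16 code units are combined into one code point. The function names are made up for this illustration and nothing like them exists in unicodedata; current Python 3 strings are already indexed by code point, so the sketch works on plain integers.

```python
def utf16_units(text):
    """Return the UTF-16 code units of `text` as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def codepoints(units):
    """Yield code points, joining high/low surrogate pairs."""
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            yield 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            yield unit
            i += 1
```

For example, U+1D11E is stored as the two units 0xD834 and 0xDD1E and comes back as a single code point.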
[Python-Dev] Confusing listreverseiterator Behavior
Hi,

I stumbled upon a confusing listreverseiterator behavior:

    >>> l = [1, 2, 3, 4]
    >>> i = iter(l)
    >>> ri = reversed(l)
    >>> len(i)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'listiterator' has no len()
    >>> len(ri)
    4
    >>> ri.next()
    4
    >>> len(ri)
    3

This is the only reverse iterator with that sort of behavior. Is that intentional, and if yes, why? I stumbled upon that when writing a templating engine that tried to lazily get the length of the sequence / iterator but failed doing so after the first iteration, because the length of the reverse iterator changes with every iteration.

Regards,
Armin
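On Python 3.4 and later the shrinking number is available as a length *hint* through `operator.length_hint`, while `len()` on the reverse iterator raises `TypeError`. The `snapshot` helper below is a hypothetical workaround for the templating case, not something from the thread: it materializes anything without `__len__` so the reported length stays fixed during the loop.

```python
from operator import length_hint


def snapshot(iterable):
    """Return (items, length) where length stays stable while looping."""
    items = iterable if hasattr(iterable, "__len__") else list(iterable)
    return items, len(items)


ri = reversed([1, 2, 3, 4])
hint_before = length_hint(ri)   # remaining items before iterating
first = next(ri)
hint_after = length_hint(ri)    # one fewer after consuming an item

items, length = snapshot(reversed([1, 2, 3, 4]))
```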
Re: [Python-Dev] Confusing listreverseiterator Behavior
Jeff Hall gmail.com> writes:
> reversed(seq)
> Return a reverse iterator. seq must be an object which
> supports the sequence protocol (the __len__() method and the
> __getitem__() method with integer arguments starting at 0).
> New in version 2.4.
>
> the above appears to only be true for lists. For
> tuples and strings it creates a reverse OBJECT which behaves slightly
> differently (notably by not including a len() method as you noticed)
> I can't find how to actually create a "tuplereverseiterator" or
> "stringreverseiterator" objects... nor does there appear to be a way to
> create a "reversed" object from a list...

What exactly the reverse iterator is, is an implementation detail. The same applies to iter() calls. iter("foo") for example returns an iterator type, iter([]) returns a list iterator. The thing you quoted above is the requirement for the object that you pass to reversed(), not for the object returned, which is some kind of iterator that happens to iterate over the sequence in reverse.

The problem I personally have with it is that the listreverseiterator is the only iterator in the standard library that changes its length during iteration, and that it's the only reverse iterator that has a length. Even more stunning, as the normal iterator doesn't have one.

Regards,
Armin
Re: [Python-Dev] Do we still support MacOS < X?
Hi,

Benjamin Peterson gmail.com> writes:
> On Sat, Sep 13, 2008 at 4:49 AM, Georg Brandl gmx.net> wrote:
> > If not, I'll remove the traces from the docs, where they only serve to
> > confuse where MacOS X actually belongs under "Unix", not "Mac".
>
> Yes, according to PEP 11, support was removed in 2.4. I suppose this
> also means we could killed macpath in py3k...

Please don't do that. The OS 9 path handling is still present in OS X GUI applications. You don't see it that often because OS X doesn't have location bars, but you can see it in iTunes for example.

Regards,
Armin
[Python-Dev] Weak Dictionary Iteration Behavior in Python 3
Hi everybody,

In Python 2.x, the common idiom for iterating over a weak key dictionary, for example, was to call dictionary.keys(), which returns a list of all keys that is safe to iterate over. This matters because a weak reference could stop existing during dict iteration, which of course makes the dict iterator raise a runtime error.

This was documented behavior and worked pretty well, with the small problem that none of the references in the dict could die until iteration was over, because the list holds references to the objects.

This no longer works in Python 3 because .keys() on the weak key dictionary returns a generator over the key view of the internal dict, which of course has the same problem as iterkeys in Python 2.x.

The following code shows the problem::

    from weakref import WeakKeyDictionary

    class Foo(object):
        pass

    f1 = Foo()
    f2 = Foo()
    d = WeakKeyDictionary()
    d[f1] = 42
    d[f2] = 23

    i = iter(d.keys())  # or use d.keyrefs() here which has the same problem
    print(next(i))
    del f2
    print(next(i))

This example essentially dies with "RuntimeError: dictionary changed size during iteration" as soon as f2 is deleted.

Iterating over weak key dictionaries might not be the most common task, but I know some situations where this is necessary. Unfortunately I can't see a way to achieve that in Python 3.

Regards,
Armin
Re: [Python-Dev] Weak Dictionary Iteration Behavior in Python 3
Hi,

Josiah Carlson gmail.com> writes:
> i = list(d.keys())

Obviously that doesn't solve the problem. list() consumes the generator one item after another, so objects can still die while the list is being created. Imagine the following example which uses threads::

    from weakref import WeakKeyDictionary
    from threading import Thread

    class Foo(object):
        pass

    d = WeakKeyDictionary()
    l = []
    for x in range(10):
        obj = Foo()
        d[obj] = None
        l.append(obj)
    del obj

    def iterater():
        for item in list(d.keys()):
            pass

    Thread(target=iterater).start()
    # drop the strong references while the other thread iterates
    while l:
        del l[0]

Regards,
Armin
Re: [Python-Dev] Weak Dictionary Iteration Behavior in Python 3
Hi,

Adam Olsen gmail.com> writes:
> IMO, this is a deeper problem than suggested. As far as I know,
> python does not (and should not) make promises as to when it'll
> collect object. We should expect weakrefs to be cleared at random
> points, and code defensively.

It doesn't promise when objects are collected, and that's not the problem here. The main problem is that the old solution for weakrefs relied on the fact that .keys() was considered atomic. I don't say it has to become atomic again, but the weak dictionaries have to somehow counter that problem. They could for example only remove items from the internal dict if no dict view to that dict is alive.

Speaking of atomic keys() / values() / items() operations: I guess we will see more of those problems in threaded situations when people start to convert code over to Python 3. I've seen quite a few situations where code relies on keys() holding the interpreter lock.

Regards,
Armin
Re: [Python-Dev] Hash collision security issue (now public)
Hi,

Just some extra thoughts about the whole topic in the light of web applications (since this was hinted at in the talk) running on Python:

Yes, you can limit the maximum number of allowed parameters for post data, but really there are so many places where data is parsed into hashing containers that it's quite a worthless task. Here is a very brief list of things usually parsed into a dict or set and where it happens:

- URL parameters and url encoded form data. Generally this happens somewhere in a framework but typically also in utility libraries that deal with URLs. For instance the stdlib's cgi.parse_qs or urllib.parse.parse_qs on Python 3 do just that, and that code is used left and right. Even if a framework would start limiting its own URL parsing there is still a lot of code that does not, the stdlib included. With form data it's worse because you have multipart headers that need parsing, and that is usually abstracted away so far from the user that they do not do that. Many frameworks just use the cgi module's parsing functions, which also just directly feed into a dictionary.
- HTTP headers. There is zero a WSGI framework can do about that since the headers are parsed into a dictionary by the WSGI server.
- Incoming JSON data. Again outside of what the framework can do for the most part. simplejson can be modified to stop parsing with the hook stuff, but nobody does that, and since users invoke simplejson's parsing routines themselves most webapps would still be vulnerable even if all frameworks would fix the problem.
- Hidden dict parameters. Things like the parameter part of content-type or the content-disposition headers are generally also just parsed into a dictionary. Likewise many frameworks parse some headers into sets (for instance incoming etags). The cookie header is usually parsed into a dictionary as well.
The issue is nothing new and at least my current POV on this topic was that your server should be guarded and shoot handlers of requests going rogue. Dictionaries are not the only thing that has a worst case performance that could be triggered by user input.

That said, considering that there are so many different places where input of close to arbitrary length is parsed into a dictionary or other hashing structure, it's hard for a web application developer or framework to protect itself against this. In case the watchdog is not a viable solution as I had assumed it was, I think it's more reasonable to indeed consider adding a flag to Python that allows randomization of hashes optionally before startup.

However, as was said earlier, the attack is so much more complex to carry out on a 64bit environment that it's probably (as it stands right now!) safe to ignore.

The main problem there however is not that it's a new attack but that some dickheads could now make prebaked attacks against websites to disrupt them, which might cause some negative publicity. In general though there are so many more ways to DDOS a website than this that I would rate the whole issue very low.

Regards,
Armin
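For the first bullet in the list above, this is roughly what limiting the number of parameters looks like at a single call site. It relies on the `max_num_fields` argument that `urllib.parse.parse_qs` gained well after this thread; the limit of 100 and the `parse_limited` wrapper are assumptions for illustration, and every other parsing site (headers, JSON, cookies) would need its own guard, which is the point made above.

```python
from urllib.parse import parse_qs

MAX_FIELDS = 100  # assumed limit, tune per application


def parse_limited(query_string, max_fields=MAX_FIELDS):
    """Parse a query string, or return None if it has too many fields."""
    try:
        return parse_qs(query_string, max_num_fields=max_fields)
    except ValueError:
        # parse_qs raises ValueError when the field count exceeds the limit.
        return None
```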
Re: [Python-Dev] Hash collision security issue (now public)
Hi,

Something I should add to this now that I thought about it a bit more:

Assuming this should be fixed on a language level, the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash for obvious reasons.

- Option a: Compiled in Salt
  + Easy to implement
  - Breaks unittests most likely (those were broken in the first place but that's still a very annoying change to make)
  - Might cause problems with interoperability of Pythons compiled with different hash salts
  - You're not really solving the problem because each linux distribution (besides Gentoo I guess) would have just one salt compiled in and that would be popular enough to have the same issue.

- Option b: Environment variable for the salt
  + Easy-ish to implement
  + Easy to synchronize over different machines
  - Initialization for base types happens early and unpredictably, which makes it hard for embedded Python interpreters (think mod_wsgi and other things) to specify the salt

- Option c: Random salt at runtime
  + Easy to implement
  - Impossible to synchronize
  - Breaks unittests in the same way as a compiled in salt would do

Where to add the salt to? Unicode strings and bytestrings (byte objects) I guess, since those are the most common offenders. Sometimes tuples are keys of dictionaries, but in that case a contributing factor to the hash is the string in the tuple anyways.

Also related: since this is a security related issue, would this be something that goes into Python 2? Does that affect what a fix would look like?

Regards,
Armin
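A toy sketch of what salting means, combining options b and c: the salt comes from an environment variable when one is set and is random otherwise. The hash below is a plain FNV-1a variant chosen for brevity, not CPython's string hash, and `TOY_HASH_SALT` is a made-up variable name.

```python
import os

MASK = (1 << 64) - 1
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3


def choose_salt(environ=os.environ):
    """Option b if TOY_HASH_SALT is set, otherwise option c."""
    if "TOY_HASH_SALT" in environ:
        return int(environ["TOY_HASH_SALT"]) & MASK
    return int.from_bytes(os.urandom(8), "little")


def salted_hash(data, salt):
    """64-bit FNV-1a over `data`, with the salt mixed into the start state."""
    h = (FNV_OFFSET ^ salt) & MASK
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK
    return h
```

The same input hashes identically under one salt, which is what lets machines sharing an environment variable stay in sync, while an attacker who does not know the salt cannot precompute colliding keys as easily.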
[Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

I just uploaded PEP 414, which proposes an optional 'u' prefix for string literals for Python 3.

You can read the PEP online: http://www.python.org/dev/peps/pep-0414/

This is a followup to the discussion about this topic here on the mailing list and on twitter/IRC over the last few weeks.

Regards,
Armin
Re: [Python-Dev] PEP 414
Hi, On 2/26/12 12:34 PM, Ned Batchelder wrote: > There are already __future__ imports that violate this principle: from > __future__ import division. That doesn't mean I'm in favor of this new > __future__, just keeping a wide angle on the viewfinder. That's actually mentioned in the PEP :-) > A quick poll on Twitter about the use of the division future import > supported my suspicions that people opt out of behaviour-changing > future imports because they are a maintenance burden. Every time you > review code you have to check the top of the file to see if the > behaviour was changed. Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/26/12 12:35 PM, Serhiy Storchaka wrote:
> Some microbenchmarks:
>
> $ python -m timeit -n 1 -r 100 -s "x = 123" "'foobarbaz_%d' % x"
> 1 loops, best of 100: 1.24 usec per loop
> $ python -m timeit -n 1 -r 100 -s "x = 123" "str('foobarbaz_%d') % x"
> 1 loops, best of 100: 1.59 usec per loop
> $ python -m timeit -n 1 -r 100 -s "x = 123" "str(u'foobarbaz_%d') % x"
> 1 loops, best of 100: 1.58 usec per loop
> $ python -m timeit -n 1 -r 100 -s "x = 123; n = lambda s: s" "n('foobarbaz_%d') % x"
> 1 loops, best of 100: 1.41 usec per loop
> $ python -m timeit -n 1 -r 100 -s "x = 123; s = 'foobarbaz_%d'" "s % x"
> 1 loops, best of 100: 1.22 usec per loop
>
> There are no significant overhead to use converters.

That's because what you're benchmarking here more than anything is the overhead of eval() :-) See the benchmark linked in the PEP for one that measures the actual performance of the string literal / wrapper.

Regards,
Armin
Re: [Python-Dev] PEP 414
Hi,

On 2/26/12 12:42 PM, Vinay Sajip wrote:
> When this came up earlier (when I think Chris McDonough raised it) the
> issue of what to do on 3.2 came up, and though it has been addressed
> somewhat in the PEP, it would be nice to see the suggested
> on-installation hook fleshed out a little more.

I wanted to do that, but the tokenize module is quite ugly to customize in order to allow "u" prefixes on strings, which is why I postponed that. It would however work similarly to how 2to3 is invoked.

In case this PEP gets approved I will refactor the tokenize module while adding support for "u" prefixes and use that as the basis for an installation hook for older Python 3 versions.

Regards,
Armin
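The core of such a hook could look roughly like the sketch below. It is an assumption-laden illustration, not the hook from the PEP: it has to run on a Python whose tokenize module already accepts `u` prefixes (3.3 or later), it only handles a bare `u`/`U` prefix, and the function name `strip_u_prefixes` is invented. A real installation hook would apply it to each source file before installing for 3.0 to 3.2.

```python
import io
import tokenize


def strip_u_prefixes(source):
    """Return `source` with the u/U prefix removed from string literals."""
    lines = source.splitlines(keepends=True)
    positions = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # A STRING token starts with its prefix, if it has one.
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            positions.append(tok.start)  # (row, col), rows are 1-based
    # Delete from right to left so earlier columns stay valid.
    for row, col in sorted(positions, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + line[col + 1:]
    return "".join(lines)
```

Working from token positions rather than a regular expression leaves identifiers, comments and the contents of other strings alone.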
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/26/12 5:45 PM, Antoine Pitrou wrote:
>> The automatic upgrading of binary strings to unicode strings that
>> would be enabled by this proposal would make it much easier to port
>> such libraries over.
>
> What "automatic upgrading" is that talking about?

The word "upgrade" is probably something that should be changed. It refers to the fact that 'foo' is a bytestring in 2.x and the same syntax means a unicode string in Python 3. This is exactly what is necessary for interfaces that were promoted to unicode interfaces in Python 3 (for instance Python identifiers, URLs etc.)

> Are you talking about urllib.parse perhaps?

Not only the parsing module. Headers on the urllib.request module are unicode as well. What the PEP is referring to is the urllib/urlparse and cgi modules, which were largely consolidated into the urllib package in Python 3.

> What does "leveraging a native string" mean here?

It means using a native string to achieve the automatic upgrading, which "does the right thing" in a lot of situations.

> I'm confused. This PEP talks about unicode literals, not native string
> literals, so why would these APIs "directly benefit from this support"?

The native string literal already exists. It disappears if `unicode_literals` is future imported, which is why this is relevant: the unicode_literals future import in 2.x is recommended by some for making libraries run on both 2.x and 3.x.

Regards,
Armin
Re: [Python-Dev] PEP 414
Hi, On 2/27/12 1:55 AM, Terry Reedy wrote: > I presume such a hook would simply remove 'u' prefixes and would run > *much* faster than 2to3. If such a hook is satisfactory for 3.2, why > would it not be satisfactory for 3.3? Agile development and unittests. An installation hook means that you need to install the package before running the tests. Which is fine for CI but horrible during development. "python3 run-tests.py" beats "make venv; install library; run testsuite" anytime in terms of development speed. Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi, On 2/27/12 10:17 AM, "Martin v. Löwis" wrote: > There are a few other unproven performance claims in the PEP. Can you > kindly provide the benchmarks you have been using? In particular, I'm > interested in the claim " In many cases 2to3 runs one or two orders of > magnitude slower than the testsuite for the library or application it's > testing." The benchmarks used are linked in the PEP. Regards, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/27/12 4:44 PM, mar...@v.loewis.de wrote:
> Maybe I'm missing something, but there doesn't seem to be a benchmark
> that measures the 2to3 performance, supporting the claim that it
> runs "two orders of magnitude" slower (which I'd interpret as a
> factor of 100).

The combined testsuite of Jinja2 and Werkzeug takes 2 seconds to run (Werkzeug's actually takes 3 because it pauses for two seconds in a cache expiration test). 2to3 takes 45 seconds to run. And those are small code bases (15K lines combined).

It's not exactly two orders of magnitude, so I will probably change the wording to "just" 20 times slower, but it illustrates the point.

Regards,
Armin
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/27/12 9:36 PM, Antoine Pitrou wrote:
> You don't want to be 3.2-compatible?

See the PEP. It shows how it would still be 3.2 compatible at installation time due to an installation hook that would be provided.

Regards,
Armin
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/27/12 9:47 PM, Serhiy Storchaka wrote:
> And not for code intended for both Python 2 and Python 3.0-3.2.

Even then since you can use the installation time hook to strip off the 'u' prefixes.

Regards,
Armin
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/27/12 9:54 PM, Terry Reedy wrote:
> Before we make this change, I would like to know if this is Armin's last
> proposal to revert Python 3 toward Python 2 or merely the first in a
> series. I question this because last December Armin wrote

You're saying that as if providing a sane upgrade path was a bad thing. That said, if I had other proposals I would have submitted them *now* since waiting for another Python version to go by would not be helpful. I only have myself to blame for providing that PEP now instead of earlier which would have been a lot more useful.

Regards,
Armin
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/27/12 9:58 PM, R. David Murray wrote:
> But the PEP doesn't address the unicode_literals plus str() approach.
> That is, the rationale currently makes a false claim.

Which would be exactly what that u() does not do?

Regards,
Armin
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/27/12 10:29 PM, Barry Warsaw wrote:
> I still urge the PEP author to clean up the PEP and specifically address the
> issues brought up in this thread. That will be useful for the historical
> record.

That is a given.

Regards,
Armin
Re: [Python-Dev] PEP 414
Hi,

On 2/27/12 10:18 PM, Terry Reedy wrote:
> I would like to know if you think that this one change is enough to do
> agile development and testing, etc, or whether, as Chris McDonough
> hopes, this is just the first of a series of proposals you have planned.

Indeed I have three other PEPs in the works. The reintroduction of "except (((ExceptionType),),)", the "<>" comparison operator and the removal of "nonlocal", the latter to make Python 2.x developers feel better about themselves. :-)

Regards,
Armin
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/28/12 12:16 AM, mar...@v.loewis.de wrote:
> Armin, I propose that you correct the *factual* deficits of the PEP
> (i.e. remove all claims that cannot be supported by facts, or are otherwise
> incorrect or misleading). Many readers here would be more open to accepting
> the PEP if it was factual rather than polemic.

Please don't call this PEP polemic.

> The PEP author is supposed to collect all arguments, even the ones he
> doesn't agree with, and refute them.

I brought up all the arguments that I knew about before I submitted this mailing list thread, and I have not updated it since.

> In this specific issue, the PEP states "the unicode_literals import the
> native string type is no longer available and has to be incorrectly
> labeled as bytestring"
>
> This is incorrect: even though the native string type indeed is no longer
> available, it is *not* consequential that it has to be labeled as byte
> string. Instead, you can use the str() function.

Obviously it means not available by syntax.

> It may be that you don't like that solution for some reason. If so, please
> mention the approach in the PEP, along with your reason for not liking it.

If by str() you mean using "str('x')" as replacement for 'x' in both 2.x and 3.x with __future__ imports as a replacement for native string literals, please mention why this is better than u(), s(), n() etc. It would be as slow as a custom wrapper function and it would not support non-ascii characters.

Regards,
Armin
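For readers unfamiliar with the u() wrapper referred to in this exchange, it can be sketched roughly as below. This is a hypothetical helper in the style of common 2.x/3.x compatibility shims, not code taken from the PEP; the name `u` matches the thread, but the `unicode_escape` decoding on Python 2 is an assumption about how such a wrapper would typically be written.

```python
import sys

if sys.version_info[0] >= 3:
    def u(text):
        # Python 3: plain string literals are already Unicode, so this is a no-op.
        return text
else:
    def u(text):
        # Python 2: the literal is a native byte string; decode it to unicode,
        # interpreting \uXXXX escapes along the way.
        return text.decode("unicode_escape")

# Every "Unicode literal" in shared 2.x/3.x code becomes a function call.
greeting = u("Hello") + u(" World!")
```

Each literal pays for a function call at runtime, which is the overhead the message above attributes to wrapper functions in general.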
Re: [Python-Dev] PEP 414 - Unicode Literals for Python 3
Hi,

On 2/27/12 11:54 PM, Steven D'Aprano wrote:
> That would be one order of magnitude.

I am aware of that :-)

Regards,
Armin
Re: [Python-Dev] PEP 414
Hi,

On 2/29/12 12:30 PM, Yury Selivanov wrote:
> I see you've (or somebody) changed:

Yes, I reworded that.

> Could you just remove the statement completely?

I will let Nick handle the PEP wording.

> I don't think that PEPs are the right place to put such polemic
> and biased statements.

Why call it polemic? If you want to use Ubuntu LTS you're forcing yourself to stick to a particular Python version for a longer time. Which means you don't want to have to adjust your code. Which again means that you're better off with the Python 2.x ecosystem which is proven, does not change nearly as quickly as the Python 3 one (hopefully) so if you have the choice between those two you would choose 2.x over 3.x. That's what this sentence is supposed to say. That's not polemic, that's just a fact.

> Nobody asked you to express your *personal* feelings and thoughts
> about applicability or state of python3 in the PEP.

That is not a personal-feeling-PEP. If people would be 100% happy with Python 3 we would not have these discussions, would we.

Why is it that I'm getting "attacked" on this mailing list for writing this PEP, or the wording etc.?

Regards,
Armin
Re: [Python-Dev] PEP 414
Hi,

On 3/1/12 10:38 PM, Yury Selivanov wrote:
> Sorry if I sounded like 'attacking' you. I certainly had no such
> intention, as I believe nobody on this list.

Sorry if I sound cranky but I got that impression from the responses here (which are greatly different from the responses I got on other communication channels and by peers). You were just the unlucky mail I responded to :-)

> But if you'd just stuck to the point, without touching very
> controversial topics of what version of python is a good choice
> and what is a bad, with full review of all porting scenarios with
> well-thought set of benchmarks, nobody would ever call your PEP
> "polemic".

I tried my best but obviously it was not good enough to please everybody. In all honesty I did not expect that such a small change would spawn such a great discussion. After all what we're discussing here is the introduction of one letter to literals :-)

Regards,
Armin
Re: [Python-Dev] PEP 414 updated
Hi,

It should also be added that the Python 3.3 alpha will release with support:

    Python 3.3.0a0 (default:042e7481c7b4, Mar 4 2012, 12:37:26)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> u"Hello" + ' World!'
    'Hello World!'

Regards,
Armin
Re: [Python-Dev] PEP 414 updated
Hi,

On 3/4/12 2:01 PM, Nick Coghlan wrote:
> Nice :)
>
> Do you have any more updates left to do? I saw the change, the tests,
> the docs and the tokenizer updates go by on python-checkins, so if
> you're done we can mark the PEP as Final (at which point the inclusion
> in the first alpha is implied).

Docs just have a minor notice regarding the reintroduced support for 'u' prefixes, someone might want to add more to it. Especially regarding the intended use for them.

Regards,
Armin
[Python-Dev] Install Hook [Was: Re: PEP 414 updated]
Hi,

Just to reiterate what I wrote on IRC:

Please do not write or advocate for import hooks, especially not for porting purposes. It would either mean that people start adding that hook on their own to the code (and that awfully reminds me of the days of 'require "rubygems"' in the Ruby world) or that the __init__.py has to do that and that's a non-trivial thing.

The hook on install time works perfectly fine and the only situation where it might not work is when you're trying to use Python 3.2 for development and also support down to 2.x by using the newly introduced u-prefixes. In this case I would recommend using Python 3.3 for development and running the testsuite periodically from Python 3.2 after installing the library (into a virtualenv for instance).

The current work in progress install time hook can be found here: https://github.com/mitsuhiko/unicode-literals-pep/tree/master/install-hook

Regards,
Armin
Re: [Python-Dev] Install Hook [Was: Re: PEP 414 updated]
Hi,

On 3/4/12 4:44 PM, Guido van Rossum wrote:
> I'd love a pointer to the rubygems debacle...

Setuptools worked because Python had .pth files for a long, long time. When the Ruby world started moving packages into nonstandard locations (GameName/) something needed to activate that import machinery hack. For a while all Ruby projects had the line "require 'rubygems'" somewhere in the project. Some libraries even shipped that line to bootstrap rubygems.

I think an article about that should be found here: http://tomayko.com/writings/require-rubygems-antipattern

But since the page errors out currently I don't know if that is the one I'm referring to.

Considering such an import hook has to run over all imports because it would not know which to rewrite and which not, I think it would be equally problematic, especially if libraries would magically activate that hook.

Regards,
Armin
Re: [Python-Dev] Install Hook [Was: Re: PEP 414 updated]
Hi,

On 3/4/12 9:00 PM, Vinay Sajip wrote:
> I realise that the implementation is different, using tokenize rather than
> lib2to3, but in terms of its effect on the transformed code, what are the
> differences between this hook and running 2to3 with just the fix_unicode
> fixer?

I would hope they both have the same effect. Namely stripping the 'u' prefix in all variations.

Why did I go with the tokenize approach? Because I never even considered a 2to3 solution. Part of the reason why I wrote this PEP was that 2to3 is so awfully slow and I was assuming that this would be largely based on the initial parsing step and not the fixers themselves. Why did I not time it with just the unicode fixer? Because if you look at how simple the tokenize version is you can see that this one did not take me more than a good minute and maybe 10 more for the distutils hooking.

Regards,
Armin
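To illustrate how small a tokenize-based prefix stripper can be, here is a minimal sketch. It is not the install hook linked earlier in the thread; the function name `strip_u_prefixes` is hypothetical, it handles only the plain `u`/`U` prefix, and it is written for a Python 3 interpreter using the standard `tokenize` module.

```python
import io
import tokenize


def strip_u_prefixes(source):
    """Return source with the leading u/U removed from string literals."""
    # Split into lines the same way the tokenizer reads them.
    lines = io.StringIO(source).readlines()
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)

    # Record the (row, col) start of every string literal with a u/U prefix.
    # Comments and names are separate token types, so they are never touched.
    hits = [tok.start for tok in tokens
            if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U")]

    # Edit from the end so earlier column offsets on the same line stay valid.
    for row, col in reversed(hits):
        line = lines[row - 1]  # token rows are 1-based
        if line[col:col + 1] in ("u", "U"):  # defensive check before deleting
            lines[row - 1] = line[:col] + line[col + 1:]
    return "".join(lines)


example = 'a = u"x" + U\'y\'\nb = "u"  # u"not a literal"\n'
converted = strip_u_prefixes(example)
```

Because only one character per literal is removed and everything else is copied through unchanged, formatting and comments survive, and no full parse tree is built as 2to3 does.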
Re: [Python-Dev] PEP 414 - some numbers from the Django port
Hi,

On 3/3/12 2:28 AM, Vinay Sajip wrote:
> So, looking at a large project in a relevant problem domain, unicode_literals
> and native string markers would appear not to adversely impact readability or
> performance.

What are you trying to argue? That the overall Django testsuite does not do a lot of string processing, less processing with native strings?

I'm surprised you see a difference at all over the whole Django testsuite and I wonder why you get a slowdown at all for the ported Django on 2.7.

Regards,
Armin
[Python-Dev] Add os.path.resolve to simplify the use of os.readlink
Due to a user error on my part I was not using os.readlink correctly. Since links can be relative to their location I think it would make sense to provide an os.path.resolve helper that automatically returns the absolute path:

    import errno
    import os
    from os.path import abspath, dirname, join, normpath

    def resolve(filename):
        try:
            target = os.readlink(filename)
        except OSError as e:
            # EINVAL: the path exists but is not a symbolic link
            if e.errno == errno.EINVAL:
                return abspath(filename)
            raise
        return normpath(join(dirname(filename), target))

The above implementation also does not fail if an entity exists but is not a link and just returns the absolute path of the given filename in that case.

Regards,
Armin
Re: [Python-Dev] Add os.path.resolve to simplify the use of os.readlink
Hi,

> On 21.06.2012 12:23, Armin Ronacher wrote:
> Does the code handle a chain of absolute and relative symlinks
> correctly, for example a relative symlink that points to another
> relative symlink in a different directory that points to a file in a
> third directory?

No, but that's a good point. It should attempt to resolve these in a loop until it either loops too often (would have to check the POSIX spec for a reasonable value) or until it terminates by finding an actual file or directory.

Regards,
Armin
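The looping variant described in the reply could look roughly like the sketch below. It is one possible reading of the proposal, not an implementation that was posted to the list: the `max_links` parameter and its default of 40 are assumptions (the reply leaves the limit open), and the function starts from `abspath()` so that the result is always absolute. It only follows links in the final path component; the existing `os.path.realpath` also resolves symlinked parent directories.

```python
import errno
import os
from os.path import abspath, dirname, join, normpath


def resolve(filename, max_links=40):
    """Follow a chain of symlinks until a non-link path is reached.

    max_links is an arbitrary cap chosen for this sketch.
    """
    path = abspath(filename)
    for _ in range(max_links):
        try:
            target = os.readlink(path)
        except OSError as e:
            if e.errno == errno.EINVAL:
                # The path exists but is not a symlink: the chain ends here.
                return path
            raise
        # Relative targets are relative to the directory holding the link;
        # join() discards the directory when the target is absolute.
        path = normpath(join(dirname(path), target))
    # Too many hops: report it the same way the OS reports a symlink loop.
    raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), filename)
```

With `a/link1 -> ../b/link2` and `b/link2 -> ../c/file.txt`, calling `resolve("a/link1")` walks both links and returns the absolute path of `c/file.txt`, while a pair of links pointing at each other ends in an `OSError` with `errno.ELOOP` once the cap is reached.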