Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Fri, Jun 27, 2014 at 03:07:46AM +0300, Paul Sokolovsky wrote: > With my MicroPython hat on, os.scandir() would make things only worse. > With current interface, one can either have inefficient implementation > (like CPython chose) or efficient implementation (like MicroPython > chose) - all transparently. os.scandir() supposedly opens up efficient > implementation for everyone, but at the price of bloating API and > introducing heavy-weight objects to wrap info. os.scandir is not part of the Python API, it is not a built-in function. It is part of the CPython standard library. That means (in my opinion) that there is an expectation that other Pythons should provide it, but not an absolute requirement. Especially for the os module, which by definition is platform-specific. In my opinion that means you have four options: 1. provide os.scandir, with exactly the same semantics as on CPython; 2. provide os.scandir, but change its semantics to be more lightweight (e.g. return an ordinary tuple, as you already suggest); 3. don't provide os.scandir at all; or 4. do something different depending on whether the platform is Linux or an embedded system. I would consider any of those acceptable for a library feature, but not for a language feature. [...] > But reusing os.stat struct is glaringly not what's proposed. And > it's clear where that comes from - "[DirEntry.]lstat(): like os.lstat(), > but requires no system calls on Windows". Nice, but OS "FooBar" can do > much more than Windows - it has a system call to send a file by email, > right when scanning a directory containing it. So, why not to have > DirEntry.send_by_email(recipient) method? I hear the answer - it's > because CPython strives to support Windows well, while doesn't care > about "FooBar" OS. Correct. If there is sufficient demand for FooBar, then CPython may support it. Until then, FooBarPython can support it, and offer whatever platform-specific features are needed within its standard library. 
> And then it again leads to the question I posed several times - where's
> line between "CPython" and "Python"? Is it grounded for CPython to add
> (or remove) to Python stdlib something which is useful for its users,
> but useless or complicating for other Python implementations?

I think so. And other implementations are free to do the same thing.

Of course there is an expectation that the standard library of most
implementations will be broadly similar, but not that they will be
identical.

I am surprised that both Jython and IronPython offer a non-functioning
dis module: you can import it successfully, but if there's a way to
actually use it, I haven't found it:

steve@orac:~$ jython
Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19)
[OpenJDK Server VM (Sun Microsystems Inc.)] on java1.6.0_27
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis(lambda x: x+1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/share/jython/Lib/dis.py", line 42, in dis
    disassemble(x)
  File "/usr/share/jython/Lib/dis.py", line 64, in disassemble
    linestarts = dict(findlinestarts(co))
  File "/usr/share/jython/Lib/dis.py", line 183, in findlinestarts
    byte_increments = [ord(c) for c in code.co_lnotab[0::2]]
AttributeError: 'tablecode' object has no attribute 'co_lnotab'

IronPython gives a different exception:

steve@orac:~$ ipy
IronPython 2.6 Beta 2 DEBUG (2.6.0.20) on .NET 2.0.50727.1433
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis(lambda x: x+1)
Traceback (most recent call last):
TypeError: don't know how to disassemble code objects

It's quite annoying, I would rather they had just removed the module
altogether. Better still would have been to disassemble code objects to
whatever byte code the Java and .Net platforms use. But there's surely
no requirement to disassemble to CPython byte code!
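For code that has to run on an implementation which takes option 3 (no os.scandir at all), a feature test is one way to cope. This is only a minimal sketch: the helper name iter_names is made up for illustration, and it assumes that os.listdir exists everywhere the code runs.

```python
import os
import tempfile


def iter_names(path):
    """Yield the entry names in path (iter_names is a hypothetical helper)."""
    scandir = getattr(os, "scandir", None)  # None where the platform omits it
    if scandir is None:
        # Fallback: plain listdir, which yields names as strings.
        for name in os.listdir(path):
            yield name
    else:
        # Preferred: scandir, which yields DirEntry objects.
        for entry in scandir(path):
            yield entry.name


with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(d, name), "w").close()
    names = sorted(iter_names(d))

print(names)
```

The caller sees the same names either way; only the cost of producing them differs between platforms.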
-- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Thu, Jun 26, 2014 at 09:37:50PM -0400, Ben Hoyt wrote: > I don't mind iterdir() and would take it :-), but I'll just say why I > chose the name scandir() -- though it wasn't my suggestion originally: > > iterdir() sounds like just an iterator version of listdir(), kinda > like keys() and iterkeys() in Python 2. Whereas in actual fact the > return values are quite different (DirEntry objects vs strings), and > so the name change reflects that difference a little. +1 I think that's a good objective reason to prefer scandir, which suits me, because my subjective opinion is that "iterdir" is an inelegant and less than attractive name. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
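To make the difference in return values concrete, here is a small sketch against the os.scandir() that later shipped in Python 3.5. The file and directory names are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "spam.txt"), "w").close()
    os.mkdir(os.path.join(d, "eggs"))

    # listdir: plain strings, nothing else attached.
    names = sorted(os.listdir(d))

    # scandir: DirEntry objects that carry the name plus file-type info.
    entries = sorted(os.scandir(d), key=lambda e: e.name)
    type_names = {type(e).__name__ for e in entries}
    kinds = [(e.name, e.is_dir()) for e in entries]

print(names)
print(kinds)
```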
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
On Sat, Jun 28, 2014 at 03:55:00PM -0400, Ben Hoyt wrote:
> Re is_dir etc being properties rather than methods:
[...]
> The problem with this is that properties "look free", they look just
> like attribute access, so you wouldn't normally handle exceptions when
> accessing them. But .lstat() and .is_dir() etc may do an OS call, so
> if you're needing to be careful with error handling, you may want to
> handle errors on them. Hence I think it's best practice to make them
> functions().

I think this one could go either way. Methods look like they actually
re-test the value each time you call it. I can easily see people not
realising that the value is cached and writing code like this toy
example:

    # Detect a file change.
    t = the_file.lstat().st_mtime
    while the_file.lstat().st_mtime == t:
        sleep(0.1)
    print("Changed!")

I know that's not the best way to detect file changes, but I'm sure
people will do something like that and not realise that the call to
lstat is cached.

Personally, I would prefer a property. If I forget to wrap a call in a
try...except, it will fail hard and I will get an exception. But with a
method call, the failure is silent and I keep getting the cached result.

Speaking of caching, is there a way to freshen the cached values?

-- Steven
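The caching concern can be reproduced with the os.scandir() that eventually shipped in Python 3.5, where the method is spelled DirEntry.stat(). The sketch below assumes that API; the file name is invented. It also shows one way to get a fresh value: call os.stat() on the entry's path instead of asking the entry again.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.txt")
    with open(path, "w") as f:
        f.write("x")                      # file is 1 byte long

    entry = list(os.scandir(d))[0]
    size_before = entry.stat().st_size    # first call fills the cache

    with open(path, "a") as f:
        f.write("more data")              # file grows to 10 bytes

    cached_size = entry.stat().st_size          # still the old answer
    fresh_size = os.stat(entry.path).st_size    # a new system call

print(size_before, cached_size, fresh_size)
```

A polling loop written against entry.stat() would therefore never see the change, which is exactly the trap in the toy example.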
Re: [Python-Dev] == on object tests identity in 3.x
On Mon, Jul 07, 2014 at 04:52:17PM -0700, Ethan Furman wrote:
> On 07/07/2014 04:49 PM, Benjamin Peterson wrote:
> >
> >Probably the best argument for the behavior is that "x is y" should
> >imply "x == y", which preludes raising an exception. No such invariant
> >is desired for ordering, so default implementations of < and > are not
> >provided in Python 3.
>
> Nice. This bit should definitely make it into the doc patch if not already
> in the docs.

However, saying this should not preclude classes where this is not the
case, e.g. IEEE-754 NANs. I would not like this wording (which otherwise
is very nice) to be used in the future to force reflexivity on object
equality.

https://en.wikipedia.org/wiki/Reflexive_relation

To try to cut off arguments:

- Yes, it is fine to have the default implementation of __eq__ assume
  reflexivity.

- Yes, it is fine for standard library containers (lists, dicts, etc.)
  to assume reflexivity of their items.

- I'm fully aware that some people think the non-reflexivity of NANs is
  logically nonsensical and a mistake. I do not agree with them.

- I'm not looking to change anything here, the current behaviour is
  fine, I just want to ensure that an otherwise admirable doc change
  does not get interpreted in the future in a way that prevents classes
  from defining __eq__ to be non-reflexive.

-- Steven
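For anyone who has not met the NAN case, a short sketch of the one built-in value where "x is y" does not imply "x == y":

```python
import math

nan = float("nan")

identical = nan is nan      # one object, so identity holds
equal = nan == nan          # IEEE-754 comparison says no
unequal = nan != nan        # and != says yes

print(identical, equal, unequal, math.isnan(nan))
```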
Re: [Python-Dev] == on object tests identity in 3.x - summary
On Tue, Jul 08, 2014 at 01:53:06AM +0200, Andreas Maier wrote:
> Thanks to all who responded.
>
> In absence of class-specific equality test methods, the default
> implementations revert to use the identity (=address) of the object as a
> basis for the test, in both Python 2 and Python 3.

Scrub out the "= address" part. Python does not require that objects
even have an address, that is not part of the language definition. (If I
simulate a Python interpreter in my head, what is the address of the
objects?) CPython happens to use the address of objects as their
identity, but that is an implementation-specific trick, not a language
guarantee, and it is documented as such. Neither IronPython nor Jython
uses the address as ID.

> In absence of specific ordering test methods, the default
> implementations revert to use the identity (=address) of the object as a
> basis for the test, in Python 2.

I don't think that is correct. This is using Python 2.7:

py> a = (1, 2)
py> b = "Hello World!"
py> id(a) < id(b)
True
py> a < b
False

And just to be sure that neither a nor b is controlling this:

py> a.__lt__(b)
NotImplemented
py> b.__gt__(a)
NotImplemented

So the identity of the instances a and b is not used for < , although
the identity of their types may be:

py> id(type(a)) < id(type(b))
False

Using the identity of the instances would be silly, since that would
mean that sorting a list of mixed types would depend on the items'
history, not their values.

> In Python 3, an exception is raised in that case.

I don't think the ordering methods are terribly relevant to the
behaviour of equals.

> The bottom line of the discussion seems to be that this behavior is
> intentional, and a lot of code depends on it.
>
> We still need to figure out how to document this. Options could be:

I'm not sure it needs to be documented other than to say that the
default object.__eq__ compares by identity. Everything else is, in my
opinion, over-thinking it.

> 1. We define that the default for the value of an object is its
> identity. That allows to describe the behavior of the equality test
> without special casing such objects, but it does not work for ordering.

Why does it need to work for ordering? Not all values define ordering
relations.

Unlike type and identity, "value" does not have a single concrete
definition, it depends on the class designer. In the case of object, the
value of an object instance is itself, i.e. its identity. I don't think
we need more than that.

-- Steven
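A short Python 3 sketch of the two defaults under discussion: equality falls back to identity, while ordering has no fallback and raises. The class name Plain is invented for the example.

```python
class Plain:
    """A class that defines neither __eq__ nor any ordering method."""


a = Plain()
b = Plain()

same_object_equal = (a == a)      # identity, so True
distinct_equal = (a == b)         # different objects, so False

# Ordering has no identity-based default in Python 3.
try:
    a < b
    ordering_result = "no error"
except TypeError:
    ordering_result = "TypeError"

# The mixed-type comparison from the 2.7 session also raises in Python 3.
try:
    (1, 2) < "Hello World!"
    mixed_result = "no error"
except TypeError:
    mixed_result = "TypeError"

print(same_object_equal, distinct_equal, ordering_result, mixed_result)
```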
Re: [Python-Dev] == on object tests identity in 3.x
On Tue, Jul 08, 2014 at 02:59:30AM +0100, Rob Cliffe wrote:
> >- "*Every object has an identity, a type and a value.*"
>
> Hm, is that *really* true?

Yes. It's pretty much true by definition: objects are *defined* to have
an identity, type and value, even if that value is abstract rather than
concrete.

> Every object has an identity and a type, sure.
> Every *variable* has a value, which is an object (an instance of some
> class). (I think? :-) )

I don't think so. Variables can be undefined, which means they don't
have a value:

py> del x
py> print x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

> But ISTM that the notion of the value of an *object* exists more in our
> minds than in Python.

Pretty much. How could it be otherwise? Human beings define the
semantics of objects, that is, their value, not Python.

[...]
> If I came across an int object and had no concept of what an integer
> number was, how would I know what its "value" is supposed to be?

You couldn't, any more than you would know what the value of a Watzit
object was if you knew nothing about Watzits. The value of an object is
intimately tied to its semantics, what the object represents and what it
is intended to be used for. In general, we can say nothing about the
value of an object until we've read the documentation for the object.

But we can be confident that the object has *some* value, otherwise what
would be the point of it? In some cases, that value might be nothing
more than its identity, but that's okay.

I think the problem we're having here is that some people are looking
for a concrete definition of what the value of an object is, but there
isn't one.

[...]
> And can the following *objects* (class instances) be said to have a
> (obvious) value?
> obj1 = object()
> def obj2(): pass
> obj3 = (x for x in range(3))
> obj4 = xrange(4)

The value as understood by a human reader, as opposed to the value as
assumed by Python, is not necessarily the same.
As far as Python is concerned, the value of all four objects is the
object itself, i.e. its identity. (For avoidance of doubt, not its id(),
which is just a number.) A human reader could infer more than Python:

- the second object is a "do nothing" function;
- the third object is a lazy sequence (0, 1, 2);
- the fourth object is a lazy sequence (0, 1, 2, 3);

but since the class designer didn't deem it important enough, or
practical enough, to implement an __eq__ method that takes those things
into account, *for the purposes of equality* (but perhaps not other
purposes) we say that the value is just the object itself, its identity.

> And is there any sensible way of comparing two such similar objects, e.g.
> obj3 = (x for x in range(3))
> obj3a = (x for x in range(3))
> except by id?

In principle, one might peer into the two generators and note that they
perform exactly the same computations on exactly the same input, and
therefore should be deemed to have the same value. But since that's
hard, and "exactly the same" is not always well-defined, Python doesn't
try to be too clever and just uses a simpler idea: the value is the
object itself.

> Well, possibly in some cases. You might define two functions as equal
> if their code objects are identical (I'm outside my competence here, so
> please no-one correct me if I've got the technical detail wrong). But I
> don't see how you can compare two generators (other than by id) except
> by calling them both destructively (possibly an infinite number of
> times, and hoping that neither has unpredictable behaviour, side
> effects, etc.).

Generator objects have code objects as well.

py> x = (a for a in (1, 2))
py> x.gi_code
<code object <genexpr> at 0xb7ee39f8, file "<stdin>", line 1>

> >- "An object's /identity/ never changes once it has been created;
> >The /value/ of some objects can change. Objects whose value can change
> >are said to be /mutable/; objects whose value is unchangeable once
> >they are created are called /immutable/."
> > ISTM it needs to be explicitly documented for each class what the > "value" of an instance is intended to be. Why? What value (pun intended) is there in adding an explicit statement of value to every single class? "The value of a str is the str's sequence of characters." "The value of a list is the list's sequence of items." "The value of an int is the int's numeric value." "The value of a float is the float's numeric value, or in the case of INFs and NANs, that they are an INF or NAN." "The value of a complex number is the ordered pair of its real and imaginary components." "The value of a re MatchObject is the MatchObject itself." I don't see any benefit to forcing all classes to explicitly document this sort of thing. It's nearly always redundant and unnecessary. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.pyt
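The generator case can be sketched in a few lines. The function name make is invented; the point is that two generators built from the same expression share a code object yet still compare unequal, and the only way to compare their output is to consume them.

```python
def make():
    """Return a fresh generator each time (make is a hypothetical helper)."""
    return (x for x in range(3))


g1 = make()
g2 = make()

equal_by_default = (g1 == g2)              # identity only, so False
same_code = g1.gi_code is g2.gi_code       # both run the same code object

# Comparing the values they produce is destructive: it exhausts both.
same_output = list(g1) == list(g2)
left_over = list(g1)                       # nothing remains afterwards

print(equal_by_default, same_code, same_output, left_over)
```

Sharing a code object says nothing about the inputs, so it would be a poor basis for equality even where it is available.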
Re: [Python-Dev] == on object tests identity in 3.x
On Tue, Jul 08, 2014 at 04:53:50PM +0900, Stephen J. Turnbull wrote: > Chris Angelico writes: > > > The reason NaN isn't equal to itself is because there are X bit > > patterns representing NaN, but an infinite number of possible > > non-numbers that could result from a calculation. > > I understand that. But you're missing at least two alternatives that > involve raising on some calculations involving NaN, as well as the > fact that forcing inequality of two NaNs produced by equivalent > calculations is arguably just as wrong as allowing equality of two > NaNs produced by the different calculations. I don't think so. Floating point == represents *numeric* equality, not (for example) equality in the sense of "All Men Are Created Equal". Not even numeric equality in the most general sense, but specifically in the sense of (approximately) real-valued numbers, so it's an extremely precise definition of "equal", not fuzzy in any way. In an early post, you suggested that NANs don't have a value, or that they have a value which is not a value. I don't think that's a good way to look at it. I think the obvious way to think of it is that NAN's value is Not A Number, exactly like it says on the box. Now, if something is not a number, obviously you cannot compare it numerically: "Considered as numbers, is the sound of rain on a tin roof numerically equal to the sight of a baby smiling?" Some might argue that the only valid answer to this question is "Mu", https://en.wikipedia.org/wiki/Mu_%28negative%29#.22Unasking.22_the_question but if we're forced to give a Yes/No True/False answer, then clearly False is the only sensible answer. No, Virginia, Santa Claus is not the same number as Santa Claus. To put it another way, if x is not a number, then x != y for all possible values of y -- including x. 
[Disclaimer: despite the name, IEEE-754 arguably does not intend NANs to
be Not A Number in the sense that Santa Claus is not a number, but more
like "it's some number, but it's impossible to tell which". However,
despite that, the standard specifies behaviour which is best thought of
in terms of the Santa Claus model.]

> That's where things get
> fuzzy for me -- in Python I would expect that preserving invariants
> would be more important than computational efficiency, but evidently
> it's not.

I'm not sure what you're referring to here. Is it that containers such
as lists and dicts are permitted to optimize equality tests with
identity tests for speed?

py> NAN = float('NAN')
py> a = [1, 2, NAN, 4]
py> NAN in a  # identity is checked before equality
True
py> any(x == NAN for x in a)
False

When this came up for discussion last time, the clear consensus was that
this is reasonable behaviour. NANs and other such "weird" objects are
too rare and too specialised for built-in classes to carry the burden of
having to allow for them. If you want a "NAN-aware list", you can make
one yourself.

> I assume that I would have a better grasp on why Python
> chose to go this way rather than that if I understood IEEE 754 better.

See the answer by Stephen Canon here:

http://stackoverflow.com/questions/1565164/

[quote]
It is not possible to specify a fixed-size arithmetic type that
satisfies all of the properties of real arithmetic that we know and
love. The 754 committee has to decide to bend or break some of them.
This is guided by some pretty simple principles:

When we can, we match the behavior of real arithmetic.
When we can't, we try to make the violations as predictable and as easy
to diagnose as possible.
[end quote]

In particular, reflexivity for NANs was dropped for a number of reasons,
some stronger than others:

- One of the weaker reasons for NAN non-reflexivity is that it preserved
  the identity x == y <=> x - y == 0.
  Although that is the cornerstone of real arithmetic, it's violated by
  IEEE-754 INFs, so violating it for NANs is not a big deal either.

- Dropping reflexivity preserves the useful property that NANs compare
  unequal to everything.

- Practicality beats purity: dropping reflexivity allowed programmers to
  identify NANs without waiting years or decades for programming
  languages to implement isnan() functions. E.g. before Python had
  math.isnan(), I made my own:

    def isnan(x):
        return isinstance(x, float) and x != x

- Keeping reflexivity for NANs would have implied some pretty nasty
  things, e.g. if log(-3) == log(-5), then -3 == -5.

Basically, and I realise that many people disagree with their decision
(notably Bertrand Meyer of Eiffel fame, and our own Mark Dickinson), the
IEEE-754 committee led by William Kahan decided that the problems caused
by having NANs compare unequal to themselves were much less than the
problems that would have been caused without it.

-- Steven
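The points in this message can be checked directly in Python 3. The sketch reuses the home-made isnan() from the message and compares it with math.isnan(); nothing else is assumed beyond the standard library.

```python
import math


def isnan(x):
    """Home-made NAN test that relies on NAN being unequal to itself."""
    return isinstance(x, float) and x != x


NAN = float("nan")
INF = float("inf")

# Containers check identity before equality, so membership succeeds...
a = [1, 2, NAN, 4]
found_by_in = NAN in a
# ...while a pure equality scan never matches.
found_by_eq = any(x == NAN for x in a)

# INFs already break "x == y <=> x - y == 0": INF equals itself,
# but INF - INF is a NAN rather than zero.
inf_equal = (INF == INF)
inf_difference_is_zero = (INF - INF == 0)

print(isnan(NAN), isnan(1.5), found_by_in, found_by_eq)
print(inf_equal, inf_difference_is_zero)
```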
Re: [Python-Dev] == on object tests identity in 3.x
On Tue, Jul 08, 2014 at 04:58:33PM +0200, Anders J. Munch wrote:
> For two NaNs computed differently to compare equal is no worse than 2+2
> comparing equal to 1+3. You're comparing values, not their history.

a = -23
b = -42
if log(a) == log(b):
    print "a == b"

-- Steven
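Python's own math.log() raises ValueError for negative arguments rather than returning a NAN, so the snippet above needs a stand-in to run. The sketch below uses a hypothetical helper, real_log, which returns a NAN where IEEE-754 arithmetic would; that helper is an assumption for illustration and not part of the standard library.

```python
import math


def real_log(x):
    """Hypothetical IEEE-style log: NAN for negatives instead of an error."""
    if x > 0:
        return math.log(x)
    if x == 0:
        return float("-inf")
    return float("nan")


a = -23
b = -42

# With reflexive NANs this would be True, wrongly suggesting a == b.
logs_equal = real_log(a) == real_log(b)

print(logs_equal, a == b)
```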
Re: [Python-Dev] == on object tests identity in 3.x
On Tue, Jul 08, 2014 at 06:33:31PM +0100, MRAB wrote: > The log of a negative number is a complex number. Only in complex arithmetic. In real arithmetic, the log of a negative number isn't a number at all. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
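Python keeps the two arithmetics in separate modules, which makes the distinction easy to see. A small sketch using only math and cmath:

```python
import cmath
import math

# Complex arithmetic: the log of -1 is i*pi.
complex_log = cmath.log(-1)

# Real arithmetic: Python refuses rather than returning a number.
try:
    math.log(-1)
    real_result = "a number"
except ValueError:
    real_result = "ValueError"

print(complex_log, real_result)
```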
Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?
On Sun, Jul 13, 2014 at 05:13:20PM +0200, Andreas Maier wrote:
> Second, if not by delegation to equality of its elements, how would the
> equality of sequences defined otherwise?

Wow. I'm impressed by the amount of detailed effort you've put into
investigating this. (Too much detail to absorb, I'm afraid.) But perhaps
you might have just asked on the python-l...@python.org mailing list, or
here, where we would have told you the answer: list __eq__ first checks
element identity before going on to check element equality.

If you can read C, you might like to check the list source code:

http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c

but if I'm reading it correctly, list.__eq__ conceptually looks
something like this:

    def __eq__(self, other):
        if not isinstance(other, list):
            return NotImplemented
        if len(other) != len(self):
            return False
        for a, b in zip(self, other):
            if not (a is b or a == b):
                return False
        return True

(The actual code is a bit more complex than that, since there is a
single function, list_richcompare, which handles all the rich
comparisons.)

The critical test is PyObject_RichCompareBool here:

http://hg.python.org/cpython/file/22e5a85ba840/Objects/object.c

which explicitly says:

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */

[...]
> I added this test only to show that float NaN is a special case,

NANs are not a special case. List __eq__ treats all object types
identically (pun intended):

py> class X:
...     def __eq__(self, other): return False
...
py> x = X()
py> x == x
False
py> [x] == [X()]
False
py> [x] == [x]
True

[...]
> Case 6.c) is the surprising case. It could be interpreted in two ways
> (at least that's what I found):
>
> 1) The comparison is based on identity of the float objects. But that is
> inconsistent with test #4. And why would the list special-case NaN
> comparison in such a way that it ends up being inconsistent with the
> special definition of NaN (outside of the list)?
It doesn't. NANs are not special cased in any way.

This was discussed to death some time ago, both on python-dev and
python-ideas. If you're interested, you can start here:

https://mail.python.org/pipermail/python-list/2012-October/633992.html

which is in the middle of one of the threads, but at least it gets you
to the right time period.

> 2) The list does not always delegate to element equality, but attempts
> to optimize if the objects are the same (same identity).

Right! It's not just lists -- I believe that tuples, dicts and sets
behave the same way.

> We will see
> later that that happens. Further, when comparing float NaNs of the same
> identity, the list implementation forgot to special-case NaNs. Which
> would be a bug, IMHO.

"Forgot"? I don't think the behaviour of list comparisons is an
accident.

NAN equality is non-reflexive. Very few other things are the same. It
would be seriously weird if alist == alist could return False. You'll
note that the IEEE-754 standard has nothing to say about the behaviour
of Python lists containing NANs, so we're free to pick whatever
behaviour makes the most sense for Python, and that is to minimise the
"Gotcha!" factor.

NANs are a gotcha to anyone who doesn't know IEEE-754, and possibly even
some who do. I will go to the barricades to fight to keep the
non-reflexivity of NANs *in isolation*, but I believe that Python has
made the right decision to treat lists containing NANs the same as
everything else.

NAN == NAN      # obeys IEEE-754 semantics and returns False
[NAN] == [NAN]  # obeys standard expectation that equality is reflexive

This behaviour is not a bug, it is a feature. As far as I am concerned,
this only needs documenting. If anyone needs list equality to honour the
special behaviour of NANs, write a subclass or an equal() function.
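The conceptual __eq__ described in this message can be run as a free function and checked against the built-in. This is a sketch of the described behaviour, not the C implementation; list_eq and strict_equal are names invented here, the second being one possible "equal() function" for anyone who wants NAN semantics honoured.

```python
def list_eq(self, other):
    """Conceptual list equality: identity first, then equality."""
    if not isinstance(other, list):
        return NotImplemented
    if len(other) != len(self):
        return False
    for a, b in zip(self, other):
        if not (a is b or a == b):
            return False
    return True


def strict_equal(xs, ys):
    """Hypothetical alternative that always asks the elements' own ==."""
    return len(xs) == len(ys) and all(a == b for a, b in zip(xs, ys))


NAN = float("nan")

same_nan = list_eq([NAN], [NAN])                        # identity shortcut
other_nan = list_eq([float("nan")], [float("nan")])     # distinct objects
strict_nan = strict_equal([NAN], [NAN])                 # honours NAN != NAN

print(same_nan, other_nan, strict_nan)
```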
-- Steven
Re: [Python-Dev] Exposing the Android platform existence to Python modules
On Sat, Aug 02, 2014 at 05:53:45AM +0400, Akira Li wrote: > Python uses os.name, sys.platform, and various functions from `platform` > module to provide version info: [...] > If Android is posixy enough (would `posix` module work on Android?) > then os.name could be left 'posix'. Does anyone know what kivy does when running under Android? -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] sum(...) limitation
On Fri, Aug 01, 2014 at 10:57:38PM -0700, Allen Li wrote: > On Fri, Aug 01, 2014 at 02:51:54PM -0700, Guido van Rossum wrote: > > No. We just can't put all possible use cases in the docstring. :-) > > > > > > On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini wrote: > > > > help(sum) tells clearly that it should be used to sum numbers and not > > strings, and with strings actually fails. > > > > However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists. > > > > Is this to be considered a bug? > > Can you explain the rationale behind this design decision? It seems > terribly inconsistent. Why are only strings explicitly restricted from > being sum()ed? sum() should either ban everything except numbers or > accept everything that implements addition (duck typing). Repeated list and str concatenation both have quadratic O(N**2) performance, but people frequently build up strings with + and rarely do the same for lists. String concatenation with + is an attractive nuisance for many people, including some who actually know better but nevertheless do it. Also, for reasons I don't understand, many people dislike or cannot remember to use ''.join. Whatever the reason, repeated string concatenation is common whereas repeated list concatenation is much, much rarer (and repeated tuple concatenation even rarer), so sum(strings) is likely to be a land mine buried in your code while sum(lists) is not. Hence the decision that beginners in particular need to be protected from the mistake of using sum(strings) but bothering to check for sum(lists) is a waste of time. Personally, I wish that sum would raise a warning rather than an exception. As for prohibiting anything except numbers with sum(), that in my opinion would be a bad idea. sum(vectors), sum(numeric_arrays), sum(angles) etc. should all be allowed. The general sum() built-in should accept any type that allows + (unless explicitly black-listed), while specialist numeric-only sums could go into modules (like math.fsum). 
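The behaviour being asked about, in runnable form. The itertools line is an addition for comparison: it is a standard-library way to flatten lists without repeated concatenation, and is not something this message proposes.

```python
from itertools import chain

lists = [[1, 2, 3], [4], [], [5, 6]]

# Lists are not black-listed: sum() falls back on repeated +.
flat_by_sum = sum(lists, [])

# Strings are explicitly refused, whatever the start value.
try:
    sum(["a", "b", "c"], "")
    string_result = "no error"
except TypeError:
    string_result = "TypeError"

# The recommended spellings, both linear time.
joined = "".join(["a", "b", "c"])
flat_by_chain = list(chain.from_iterable(lists))

print(flat_by_sum, string_result, joined, flat_by_chain)
```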
-- Steven
Re: [Python-Dev] sum(...) limitation
On Sat, Aug 02, 2014 at 10:52:07AM -0400, Alexander Belopolsky wrote: > On Sat, Aug 2, 2014 at 3:39 AM, Steven D'Aprano wrote: > > > String concatenation with + is an attractive > > nuisance for many people, including some who actually know better but > > nevertheless do it. Also, for reasons I don't understand, many people > > dislike or cannot remember to use ''.join. > > > > Since sum() already treats strings as a special case, why can't it simply > call (an equivalent of) ''.join itself instead of telling the user to do > it? It does not matter why "many people dislike or cannot remember to use > ''.join" - if this is a fact - it should be considered by language > implementors. It could, of course, but there is virtue in keeping sum simple, rather than special-casing who knows how many different types. If sum() tries to handle strings, should it do the same for lists? bytearrays? array.array? tuple? Where do we stop? Ultimately it comes down to personal taste. Some people are going to wish sum() tried harder to do the clever thing with more types, some people are going to wish it was simpler and didn't try to be clever at all. Another argument against excessive cleverness is that it ties sum() to one particular idiom or implementation. Today, the idiomatic and efficient way to concatenate a lot of strings is with ''.join, but tomorrow there might be a new str.concat() method. Who knows? sum() shouldn't have to care about these details, since they are secondary to sum()'s purpose, which is to add numbers. Anything else is a bonus (or perhaps a nuisance). So, I would argue that when faced with something that is not a number, there are two reasonable approaches for sum() to take: - refuse to handle the type at all; or - fall back on simple-minded repeated addition. By the way, I think this whole argument would have been easily side-stepped if + was only used for addition, and & used for concatenation. 
Then there would be no question about what sum() should do for lists and tuples and strings: raise TypeError. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
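The "simple-minded repeated addition" fallback is what makes sum() usable with non-numeric types that define +. A sketch with a made-up two-dimensional vector class (Vec is hypothetical, written only for this example):

```python
class Vec:
    """Minimal 2-D vector that supports + and == (illustrative only)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if not isinstance(other, Vec):
            return NotImplemented
        return Vec(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        if not isinstance(other, Vec):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return "Vec(%r, %r)" % (self.x, self.y)


vectors = [Vec(1, 2), Vec(3, 4), Vec(5, 6)]

# The default start value 0 cannot be added to a Vec, so pass a zero vector.
total = sum(vectors, Vec(0, 0))

try:
    sum(vectors)
    default_start = "no error"
except TypeError:
    default_start = "TypeError"

print(total, default_start)
```

sum() needs no knowledge of Vec for this to work, which is the argument for keeping it free of per-type special cases.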
Re: [Python-Dev] sum(...) limitation
On Mon, Aug 04, 2014 at 09:25:12AM -0700, Chris Barker wrote:
> Good point -- I was trying to make the point about .join() vs + for strings
> in an intro python class last year, and made the mistake of having the
> students test the performance.
>
> You need to concatenate a LOT of strings to see any difference at all -- I
> know that O() of algorithms is unavoidable, but between efficient python
> optimizations and a an apparently good memory allocator, it's really a
> practical non-issue.

If only that were the case, but it isn't. Here's a cautionary tale for
how using string concatenation can blow up in your face:

Chris Withers asks for help debugging HTTP slowness:
https://mail.python.org/pipermail/python-dev/2009-August/091125.html

and publishes some times:
https://mail.python.org/pipermail/python-dev/2009-September/091581.html

(notice that Python was SIX HUNDRED times slower than wget or IE)

and Simon Cross identified the problem:
https://mail.python.org/pipermail/python-dev/2009-September/091582.html

leading Guido to describe the offending code as an embarrassment.

It shouldn't be hard to demonstrate the difference between repeated
string concatenation and join, all you need do is defeat sum()'s
prohibition against strings. Run this bit of code, and you'll see a
significant difference in performance, even with CPython's optimized
concatenation:

# --- cut ---
class Faker:
    def __add__(self, other):
        return other

x = Faker()
strings = list("Hello World!")
assert ''.join(strings) == sum(strings, x)

from timeit import Timer
setup = "from __main__ import x, strings"
t1 = Timer("''.join(strings)", setup)
t2 = Timer("sum(strings, x)", setup)

print (min(t1.repeat()))
print (min(t2.repeat()))
# --- cut ---

On my computer, using Python 2.7, I find the version using sum is nearly
4.5 times slower, and with 3.3 about 4.2 times slower. That's with a
mere twelve substrings, hardly "a lot".
I tried running it on IronPython with a slightly larger list of substrings, but I got sick of waiting for it to finish. If you want to argue that microbenchmarks aren't important, well, I might agree with you in general, but in the specific case of string concatenation there's that pesky factor of 600 slowdown in real world code to argue with. > Blocking sum( some_strings) because it _might_ have poor performance seems > awfully pedantic. The rationale for explicitly prohibiting strings while merely implicitly discouraging other non-numeric types is that beginners, who are least likely to understand why their code occasionally and unpredictably becomes catastrophically slow, are far more likely to sum strings than sum tuples or lists. (I don't entirely agree with this rationale, I'd prefer a warning rather than an exception.) -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
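A minimal sketch of the same comparison without the Faker trick, assuming Python 3. The helper names (concat_by_addition, concat_by_join, sum_rejects_strings) are illustrative and not from the thread, and the timings will vary by machine and interpreter, so no expected numbers are given:

```python
import operator
from functools import reduce
from timeit import timeit

strings = list("Hello World!") * 200   # 2400 one-character substrings

def concat_by_addition(parts):
    # Repeated +: each step may copy everything built so far,
    # so the total work can grow like N**2.
    return reduce(operator.add, parts, "")

def concat_by_join(parts):
    # str.join builds the result in a single pass over the parts.
    return "".join(parts)

def sum_rejects_strings():
    # The built-in sum() refuses a str start value outright.
    try:
        sum(["a", "b"], "")
    except TypeError:
        return True
    return False

# Both approaches produce the same string; only the cost differs.
assert concat_by_addition(strings) == concat_by_join(strings)

if __name__ == "__main__":
    print("addition:", timeit(lambda: concat_by_addition(strings), number=100))
    print("join:    ", timeit(lambda: concat_by_join(strings), number=100))
    print("sum() rejects str start value:", sum_rejects_strings())
```

Increasing the multiplier on the strings list is the easiest way to watch the gap between the two timings widen.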
Re: [Python-Dev] sum(...) limitation
On Fri, Aug 08, 2014 at 10:20:37PM -0400, Alexander Belopolsky wrote: > On Fri, Aug 8, 2014 at 8:56 PM, Ethan Furman wrote: > > > I don't use sum at all, or at least very rarely, and it still irritates me. > > > You are not alone. When I see sum([a, b, c]), I think it is a + b + c, but > in Python it is 0 + a + b + c. If we had a "join" operator for strings > that is different form + - then sure, I would not try to use sum to join > strings, but we don't. I've long believed that + is the wrong operator for concatenating strings, and that & makes a much better operator. We wouldn't be having these interminable arguments about using sum() to concatenate strings (and lists, and tuples) if the & operator was used for concatenation and + was only used for numeric addition. > I have always thought that sum(x) is just a > shorthand for reduce(operator.add, x), but again it is not so in Python. The signature of reduce is: reduce(...) reduce(function, sequence[, initial]) -> value so sum() is (at least conceptually) a shorthand for reduce: def sum(values, initial=0): return reduce(operator.add, values, initial) but that's an implementation detail, not a language promise, and sum() is free to differ from that simple version. Indeed, even the public interface is different, since sum() prohibits using a string as the initial value and only promises to work with numbers. The fact that it happens to work with lists and tuples is somewhat of an accident of implementation. > While "sum should only be used for numbers," it turns out it is not a > good choice for floats - use math.fsum. Correct. And if you (generic you, not you personally) do not understand why simple-minded addition of floats is troublesome, then you're going to have a world of trouble. Anyone who is disturbed by the question of "should I use sum or math.fsum?" probably shouldn't be writing serious floating point code at all. Floating point computations are hard, and there is simply no escaping this fact. 
> While "strings are blocked because > sum is slow," numpy arrays with millions of elements are not. That's not a good example. Strings are potentially O(N**2), which means not just "slow" but *agonisingly* slow, as in taking a week -- no exaggeration -- to concat a million strings. If it takes a nanosecond to concat two strings, then 1e6**2 such concatenations could take over eleven days. Slowness of such magnitude might as well be "the process has locked up". In comparison, summing a numpy array with a million entries is not really slow in that sense. The time taken is proportional to the number of entries, and differs from summing a list only by a constant factor. Besides, in the case of strings it is quite simple to decide "is the initial value a string?", whereas with lists or numpy arrays it's quite hard to decide "is the list or array so huge that the user will consider this too slow?". What counts as "too slow" depends on the machine it is running on, what other processes are running, and the user's mood, and leads to the silly result that summing an array of N items succeeds but N+1 items doesn't. So in the case of strings, it is easy to make a blanket prohibition, but in the case of lists or arrays, there is no reasonable place to draw the line. > And try to > explain to someone that sum(x) is bad on a numpy array, but abs(x) is fine. I think that's because sum() has to box up each and every element in the array into an object, which is wasteful, while abs() can delegate to a specialist array.__abs__ method. Although that's not something beginners should be expected to understand, no serious Python programmer should be confused by this. As a programmer, we should expect to have some understanding of our tools, how they work, their limitations, and when to use a different tool. That's why numpy has its own version of sum which is designed to work specifically on numpy arrays. Use a specialist tool for a specialist job: py> with Stopwatch(): ... 
sum(carray) # carray is a numpy array of 7500 floats. ... 11250.0 time taken: 52.659770 seconds py> with Stopwatch(): ... numpy.sum(carray) ... 11250.0 time taken: 0.161263 seconds > Why have builtin sum at all if its use comes with so many caveats? Because sum() is a perfectly reasonable general purpose tool for adding up small amounts of numbers where high floating point precision is not required. It has been included as a built-in because Python comes with "batteries included", and a basic function for adding up a few numbers is an obvious, simple battery. But serious programmers should be comfortable with the idea that you use the right tool for the right job. If you visit a hardware store, you will find that even something as simple as the hammer exists in many specialist varieties. There are tack hammers, claw hammers, framing hammers, lump hammers, rubber and wooden mallets, "brass" non-sparking
Re: [Python-Dev] class Foo(object) vs class Foo: should be clearly explained in python 2 and 3 doc
On Sat, Aug 09, 2014 at 02:44:10PM -0400, John Yeuk Hon Wong wrote: > Hi. > > Referring to my discussion on [1] and then on #python this afternoon. > > A little background would help people to understand where this was > coming from. > > 1. I write Python 2 code and have done zero Python-3 specific code. > 2. I have always been using class Foo(object) so I do not know the new > style is no longer required in Python 3. I feel "stupid" and "wrong" by > thinking (object) is still a convention in Python 3. But object is still a convention in Python 3. It is certainly required when writing code that will behave the same in version 2 and 3, and it's optional in 3-only code, but certainly not frowned upon or discouraged. There's nothing wrong with explicitly inheriting from object in Python 3, and with the Zen of Python "Explicit is better than implicit" I would argue that *leaving it out* should be very slightly discouraged. class Spam: # okay, but a bit lazy class Spam(object): # better Perhaps PEP 8 should make a recommendation, but if so, I think it should be a very weak one. In Python 3, it really doesn't matter which you write. My own personal practice is to explicitly inherit from object when the class is "important" or more than half a dozen lines, and leave it out if the class is a stub or tiny. > 3. Many Python 2 tutorials do not use object as the base class whether > for historical reason, or lack of information/education, and can cause > confusing to newcomers searching for answers when they consult the > official documentation. We can't do anything about third party tutorials :-( > While Python 3 code no longer requires object be the base class for the > new-style class definition, I believe (object) is still required if one > has to write a 2-3 compatible code. But this was not explained or warned > anywhere in Python 2 and Python 3 code, AFAIK. 
(if I am wrong, please > correct me) It's not *always* required, only if you use features which require new-style classes, e.g. super, or properties. > I propose the followings: > > * It is desirable to state boldly to users that (object) is no longer > needed in Python-3 **only** code I'm against that. Stating this boldly will be understood by some readers that object should not be used, and I'm strongly against that. I believe explicitly inheriting from object should be mildly preferred, not strongly discouraged. > and warn users to revert to (object) > style if the code needs to be 2 and 3 compatible. I don't think that should be necesary, but have no objections to it being mentioned. I think it should be obvious: if you need new-style behaviour in Python 2, then obviously you have to inherit from object otherwise you have a classic class. That requirement doesn't go away just because your code will sometimes run under Python 3. Looking at your comment here: > [1]: https://news.ycombinator.com/item?id=8154471 there is a reply from zeckalpha, who says: "Actually, leaving out `object` is the preferred convention for Python 3, as they are semantically equivalent." How does (s)he justify this claim? "Explicit is better than implicit." which is not logical. If you leave out `object`, that's implicit, not explicit. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
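A short sketch of the "semantically equivalent" claim, assuming Python 3 only. The class names Spam and Eggs are illustrative. Under Python 2 the first class would be a classic class and these checks would not hold:

```python
class Spam:            # base class left implicit
    pass

class Eggs(object):    # base class spelled out
    pass

# In Python 3 both spellings produce the same kind of class.
same_bases = Spam.__bases__ == Eggs.__bases__ == (object,)
same_metaclass = type(Spam) is type(Eggs) is type
print(same_bases, same_metaclass)
```

So in 3-only code the choice between the two spellings is purely stylistic, which is why any PEP 8 recommendation could only be a weak one.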
Re: [Python-Dev] class Foo(object) vs class Foo: should be clearly explained in python 2 and 3 doc
On Sun, Aug 10, 2014 at 11:51:51AM -0400, Alexander Belopolsky wrote:
> On Sat, Aug 9, 2014 at 8:44 PM, Steven D'Aprano wrote:
>
> > It is certainly required when writing code that will behave the same in
> > version 2 and 3
> >
>
> This is not true. An alternative is to put
>
> __metaclass__ = type
>
> at the top of your module to make all classes in your module new-style in
> python2.

So it is. I forgot about that, thank you for the correction.

-- Steven
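A minimal sketch of the module-level alternative mentioned above. The class Spam and its colour property are illustrative. In Python 2 the assignment makes every class defined after it in the module new-style; in Python 3 it is just an ordinary, unused module variable:

```python
__metaclass__ = type   # Python 2: classes below become new-style; Python 3: no effect

class Spam:
    # Properties are one of the features that need a new-style class.
    @property
    def colour(self):
        return "pink"

spam = Spam()
print(isinstance(Spam, type), spam.colour)
```

The trade-off is that the new-style behaviour is no longer visible at each class statement, so a reader has to notice the line at the top of the module.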
Re: [Python-Dev] Multiline with statement line continuation
On Tue, Aug 12, 2014 at 10:28:14AM +1000, Nick Coghlan wrote:
> On 12 Aug 2014 09:09, "Allen Li" wrote:
> >
> > This is a problem I sometimes run into when working with a lot of files
> > simultaneously, where I need three or more `with` statements:
> >
> >     with open('foo') as foo:
> >         with open('bar') as bar:
> >             with open('baz') as baz:
> >                 pass
> >
> > Thankfully, support for multiple items was added in 3.1:
> >
> >     with open('foo') as foo, open('bar') as bar, open('baz') as baz:
> >         pass
> >
> > However, this begs the need for a multiline form, especially when
> > working with three or more items:
> >
> >     with open('foo') as foo, \
> >          open('bar') as bar, \
> >          open('baz') as baz, \
> >          open('spam') as spam, \
> >          open('eggs') as eggs:
> >         pass
>
> I generally see this kind of construct as a sign that refactoring is
> needed. For example, contextlib.ExitStack offers a number of ways to manage
> multiple context managers dynamically rather than statically.

I don't think that ExitStack is the right solution for when you have a small number of context managers known at edit-time. The extra effort of writing your code, and reading it, in a dynamic manner is not justified. Compare the natural way of writing this:

    with open("spam") as spam, open("eggs", "w") as eggs, frobulate("cheese") as cheese:
        # do stuff with spam, eggs, cheese

versus the dynamic way:

    with ExitStack() as stack:
        spam, eggs = [stack.enter_context(open(fname, mode))
                      for fname, mode in zip(("spam", "eggs"), ("r", "w"))]
        cheese = stack.enter_context(frobulate("cheese"))
        # do stuff with spam, eggs, cheese

I prefer the first, even with the long line.

-- Steven
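For anyone who wants to try the dynamic form, here is a self-contained sketch assuming Python 3.3 or later. It uses a temporary directory and leaves out frobulate(), which is a made-up name in the message above; the file names and contents are likewise illustrative:

```python
import os
import tempfile
from contextlib import ExitStack

# Set up a scratch directory with one input file.
tmpdir = tempfile.mkdtemp()
spam_path = os.path.join(tmpdir, "spam")
eggs_path = os.path.join(tmpdir, "eggs")
with open(spam_path, "w") as f:
    f.write("spam contents")

# The stack closes every file registered with it when the block exits.
with ExitStack() as stack:
    spam, eggs = [stack.enter_context(open(fname, mode))
                  for fname, mode in ((spam_path, "r"), (eggs_path, "w"))]
    eggs.write(spam.read().upper())

files_closed = spam.closed and eggs.closed
with open(eggs_path) as f:
    copied = f.read()
print(files_closed, copied)
```

ExitStack earns its keep when the number of files is only known at run time; with two fixed names it is more machinery than the plain with statement needs.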
Re: [Python-Dev] Multiline with statement line continuation
On Tue, Aug 12, 2014 at 08:04:35AM -0500, Ian Cordasco wrote:

> I think by introducing parentheses we are going to risk seriously
> confusing users who may then try to write an assignment like
>
> a = (open('spam') as spam, open('eggs') as eggs)

Seriously? If they try it, they will get a syntax error. Now, admittedly Python's syntax error messages tend to be terse and cryptic, but it's still enough to show that you can't do that.

py> a = (open('spam') as spam, open('eggs') as eggs)
  File "<stdin>", line 1
    a = (open('spam') as spam, open('eggs') as eggs)
                       ^
SyntaxError: invalid syntax

I don't see this as a problem. There's no limit to the things that people *might* do if they don't understand Python semantics:

    for module in sys, math, os, import module

(and yes, I once tried this as a beginner) but they try it once, realise it doesn't work, and never do it again.

> Because it looks like a tuple but isn't and I think the extra
> complexity this would add to the language would not be worth the
> benefit.

Do we have a problem with people thinking that, since tuples are normally interchangeable with lists, they can write this?

    from module import [fe, fi, fo, fum, spam, eggs, cheese]

and then being "seriously confused" by the syntax error they receive? Or writing this?

    from (module import fe, fi, fo, fum, spam, eggs, cheese)

It's not sufficient that people might try it, see it fails, and move on. Your claim is that it will cause serious confusion. I just don't see that happening.

> If we simply look at Ruby for what happens when you have an
> overloaded syntax that means two different things, you can see why I'm
> against modifying this syntax.

That ship has sailed in Python, oh, 20+ years ago. Parens are used for grouping, for tuples[1], for function calls, for parameter lists, class base-classes, generator expressions and line continuations. I cannot think of any examples where these multiple uses for parens have caused meaningful confusion, and I don't think this one will either.

[1] Technically not, since it's the comma, not the ( ), which makes a tuple, but a lot of people don't know that and treat it as if the parens were compulsory.

-- Steven
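The footnote's point about commas can be checked directly. A small sketch, with variable names chosen only for illustration:

```python
a = 1, 2, 3     # commas, no parens: a tuple
b = (1)         # parens, no comma: just the int 1
c = (1,)        # a trailing comma makes a one-item tuple
d = ()          # the empty tuple is the one case where parens alone suffice

print(type(a).__name__, type(b).__name__, type(c).__name__, type(d).__name__)
```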
Re: [Python-Dev] Reviving restricted mode?
On Thu, Aug 14, 2014 at 02:26:29AM +1000, Chris Angelico wrote: > On Wed, Aug 13, 2014 at 11:11 PM, Isaac Morland wrote: > > While I would not claim a Python sandbox is utterly impossible, I'm > > suspicious that the whole "consenting adults" approach in Python is > > incompatible with a sandbox. The whole idea of a sandbox is to absolutely > > prevent people from doing things even if they really want to and know what > > they are doing. The point of a sandbox is that I, the consenting adult writing the application in the first place, may want to allow *untrusted others* to call Python code without giving them control of the entire application. The consenting adults rule applies to me, the application writer, not them, the end-users, even if they happen to be writing Python code. If they want unrestricted access to the Python interpreter, they can run their code on their own machine, not mine. > It's certainly not *fundamentally* impossible to sandbox Python. > However, the question becomes one of how much effort you're going to > go to and how much you're going to restrict the code. I believe that PyPy has an effective sandbox, but to what degree of effectiveness I don't know. http://pypy.readthedocs.org/en/latest/sandbox.html I've had rogue Javascript crash my browser or make my entire computer effectively unusable often enough that I am skeptical about claims that Javascript in the browser is effectively sandboxed, so I'm doubly cautious about Python. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Multiline with statement line continuation
On Wed, Aug 13, 2014 at 08:08:51PM +0300, yoav glazner wrote: [...] > Just a thought, would it bit wierd that: > with (a as b, c as d): "works" > with (a, c): "boom" > with(a as b, c): ? If this proposal is accepted, there is no need for the "boom". The syntax should allow: # Without parens, limited to a single line. with a [as name], b [as name], c [as name], ...: block # With parens, not limited to a single line. with (a [as name], b [as name], c [as name], ... ): block where the "as name" part is always optional. In both these cases, whether there are parens or not, it will be interpreted as a series of context managers and never as a single tuple. Note two things: (1) this means that even in the unlikely event that tuples become context managers in the future, you won't be able to use a tuple literal: with (1, 2, 3): # won't work as expected t = (1, 2, 3) with t: # will work as expected But I cannot imagine any circumstances where tuples will become context managers. (2) Also note that *this is already the case*, since tuples are made by the commas, not the parentheses. E.g. this succeeds: # Not a tuple, actually two context managers. with open("/tmp/foo"), open("/tmp/bar", "w"): pass -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
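Point (2) can be observed with a pair of toy context managers. This is a sketch, with tracked() and events being names invented for the illustration: the comma-separated items are entered one by one, left to right, and exited in reverse order, and no tuple is involved.

```python
from contextlib import contextmanager

events = []

@contextmanager
def tracked(name):
    # Record when this context manager is entered and exited.
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)

# Two context managers; the "as name" part is optional for each.
with tracked("a") as a, tracked("b"):
    events.append("body")

print(a, events)
```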
Re: [Python-Dev] Multiline with statement line continuation
On Fri, Aug 15, 2014 at 02:08:42PM -0700, Ethan Furman wrote:
> On 08/13/2014 10:32 AM, Steven D'Aprano wrote:
> >
> >(2) Also note that *this is already the case*, since tuples are made by
> >the commas, not the parentheses. E.g. this succeeds:
> >
> ># Not a tuple, actually two context managers.
> >with open("/tmp/foo"), open("/tmp/bar", "w"):
> >    pass
>
> Thanks for proving my point! A comma, and yet we did *not* get a tuple
> from it.

Um, sorry, I don't quite get you. Are you agreeing or disagreeing with me? I spent half of yesterday reading the static typing thread over on Python-ideas and it's possible my brain has melted down *wink* but I'm confused by your response.

Normally when people say "Thanks for proving my point", the implication is that the person being thanked (in this case me) has inadvertently undercut their own argument. I don't think I have. I'm suggesting that the argument *against* the proposal:

    "Multi-line with statements should not be allowed, because:

        with (spam, eggs, cheese): ...

    is syntactically a tuple"

is a poor argument (that is, I'm disagreeing with it), since *single* line parens-free with statements are already syntactically a tuple:

    with spam, eggs, cheese:  # Commas make a tuple, not parens.
        ...

I think the OP's suggestion is a sound one. I take Nick's point that bulky with-statements *may* be a sign that some re-factoring is needed, but there are many things that are a sign that re-factoring is needed, and I don't think this particular one warrants rejecting what is otherwise an obvious and clear way of using multiple context managers.

-- Steven
Re: [Python-Dev] Multiline with statement line continuation
On Fri, Aug 15, 2014 at 08:29:09PM -0700, Ethan Furman wrote: > On 08/15/2014 08:08 PM, Steven D'Aprano wrote: [...] > >is a poor argument (that is, I'm disagreeing with it), since *single* > >line parens-free with statements are already syntactically a tuple: > > > > with spam, eggs, cheese: # Commas make a tuple, not parens. > > This point I do not understand -- commas /can/ create a tuple, but don't > /necessarily/ create a tuple. So, semantically: no tuple. Right! I think we are in agreement. It's not that with statements actually generate a tuple, but that they *look* like they include a tuple. That's what I meant by "syntactically a tuple", sorry if that was confusing. I didn't mean to suggest that Python necessarily builds a tuple of context managers. If people were going to be prone to mistake with (a, b, c): ... as including a tuple, they would have already mistaken: with a, b, c: ... the same way. But they haven't. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Multiline with statement line continuation
On Sat, Aug 16, 2014 at 05:25:33PM +1000, Ben Finney wrote:
[...]
> > they would have already mistaken:
> >
> > with a, b, c: ...
> >
> > the same way. But they haven't.
>
> Right. The presence or absence of parens make a big semantic difference.

    from silly.mistakes.programmers.make import (
        hands, up, anyone, who, thinks, this, is_, a, tuple)

    def function(how, about, this, one):
        ...

But quite frankly, even if there is some person somewhere who gets confused and tries to write:

    context_managers = (open("a"), open("b", "w"), open("c", "w"))
    with context_managers as things:
        text = things[0].read()
        things[1].write(text)
        things[2].write(text.upper())

I simply don't care. They will try it, discover that tuples are not context managers, fix their code, and move on. (I've made sillier mistakes, and became a better programmer from it.) We cannot paralyse ourselves out of fear that somebody, somewhere, will make a silly mistake.

You can try that "with tuple" code right now, and you will get a nice runtime exception. I admit that the error message is not the most descriptive I've ever seen, but I've seen worse, and any half-decent programmer can do what they do for any other unexpected exception: read the Fine Manual, or ask for help, or otherwise debug the problem. Why should this specific exception be treated as so harmful that we have to forgo a useful piece of functionality to avoid it?

Some designs are bug-magnets, like the infamous "except A,B" syntax, which fails silently, doing the wrong thing. Unless someone has a convincing rationale for how and why this multi-line with will likewise be a bug-magnet, I don't think that some vague similarity between it and tuples is justification for rejecting the proposal.

-- Steven
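The "with tuple" experiment is easy to reproduce. A sketch using in-memory streams in place of real files; the exception type is caught loosely here because, as far as I know, it differs between Python versions (AttributeError in older releases, TypeError in newer ones):

```python
import io

# A tuple of perfectly good context managers is not itself a context manager.
context_managers = (io.StringIO("a"), io.StringIO("b"))

try:
    with context_managers as things:
        pass
except (AttributeError, TypeError) as err:
    error = err
else:
    error = None

print(type(error).__name__, error)
```

Either way the failure is loud and immediate, which is the opposite of the silent misbehaviour that made "except A,B" a bug-magnet.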
Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?
On Sun, Aug 17, 2014 at 11:28:48AM +1000, Nick Coghlan wrote: > I've seen a few people on python-ideas express the assumption that > there will be another Py3k style compatibility break for Python 4.0. I used to refer to Python 4000 as the hypothetical compatibility break version. Now I refer to Python 5000. > I've also had people express the concern that "you broke compatibility > in a major way once, how do we know you won't do it again?". Even languages with ISO standards behind them and release schedules measured in decades make backward-incompatible changes. For example, I see that Fortran 95 (despite being classified as a minor revision) deleted at least six language features. To expect Python to never break compatibility again is asking too much. But I think it is fair to promise that Python won't make *so many* backwards incompatible changes all at once again, and has no concrete plans to make backwards incompatible changes to syntax in the foreseeable future. (That is, not before Python 5000 :-) [...] > If folks (most signficantly, Guido) are amenable to the idea, it > shouldn't take long to put such a PEP together, and I think it could > help reduce some of the confusions around the expectations for Python > 4.0 and the evolution of 3.x in general. I think it's a good idea, so long as there's no implied or explicit promise that Python language is now set in stone never to change. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Bytes path support
On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote:
> On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal wrote:
> > This brings up the other key problem. If file names are (almost)
> > arbitrary bytes, how do you write one to/read one from a text file
> > with a particular encoding? ( or for that matter display it on a
> > terminal)
>
>    There is no such thing as an encoding of text files.

I don't understand this comment. It seems to me that *text* files have to have an encoding, otherwise you can't interpret the contents as text. Files, of course, only contain bytes, but to be treated as text you need some way of transforming byte N to char C (or multiple bytes to C), which is an encoding. Perhaps you just mean that encodings are not recorded in the text file itself?

To answer Chris' question, you typically cannot include arbitrary bytes in text files, and displaying them to the user is likewise problematic. The usual solution is to support some form of escaping, like \t, &#x0A; or %0D, to give a few examples.

-- Steven
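Both halves of that argument can be sketched in a few lines, assuming Python 3. The sample text and byte string are illustrative. The same bytes read as different text under different encodings, and percent-escaping is one way to carry arbitrary bytes inside text:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# The same bytes become different text depending on the encoding assumed.
data = "café".encode("utf-8")
as_utf8 = data.decode("utf-8")
as_latin1 = data.decode("latin-1")

# Percent-escaping turns bytes that are not valid text into plain ASCII.
raw_name = b"name\xff\r"
escaped = quote_from_bytes(raw_name)
restored = unquote_to_bytes(escaped)

print(as_utf8, as_latin1, escaped)
```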
Re: [Python-Dev] Bytes path support
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker wrote:

> The point is that if you are reading a file name from the system, and then
> passing it back to the system, then you can treat it as just bytes -- who
> cares? And if you add the byte value of 47 thing, then you can even do
> basic path manipulations. But once you want to do other things with your
> file name, then you need to know the encoding. And it is very, very common
> for users to need to do other things with filenames, and they almost always
> want them as text that they can read and understand.
>
> Python3 supports this case very well. But it does indeed make it hard to
> work with filenames when you don't know the encoding they are in.

Just "not knowing" is not sufficient. In that case, you'll likely get a Unicode string containing mojibake:

    # I write a file name using UTF-8 on my system:
    filename = 'music by Наӥв.txt'.encode('utf-8')
    # You try to use it assuming ISO-8859-7 (Greek)
    filename.decode('iso-8859-7')
    => 'music by Π\x9dΠ°Σ₯Π².txt'

which, even though it looks wrong, still lets you refer to the file (provided you then encode back to bytes with ISO-8859-7 again). This won't always be the case, sometimes the encoding you guess will be wrong.

When I started this email, I originally began to say that the actual problem was with byte file names that cannot be decoded into Unicode using the system encoding (typically UTF-8 on Linux systems). But I've actually had difficulty demonstrating that it actually is a problem. I started with a byte sequence which is invalid UTF-8, namely:

    b'ZZ\xdb\xdf\xfa\xff'

created a file with that name, and then tried listing it with os.listdir. Even in Python 3.1 it worked fine. I was able to list the directory and open the file, so I'm not entirely sure where the problem lies exactly. Can somebody demonstrate the failure mode?

-- Steven
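One possible answer, sketched under the assumption that the file name reaches Python through the surrogateescape error handler described in PEP 383 (which is my understanding of what os.listdir does on POSIX systems). The decoding here is done explicitly, so the sketch does not touch the file system or depend on the locale:

```python
raw = b'ZZ\xdb\xdf\xfa\xff'          # not valid UTF-8

# Undecodable bytes are smuggled into the str as lone surrogates.
name = raw.decode("utf-8", "surrogateescape")

# Passing the name back to the OS works: the round trip is lossless.
round_trip_ok = name.encode("utf-8", "surrogateescape") == raw

# The trouble starts when the name has to become real text,
# e.g. printing it or writing it to a UTF-8 log file.
try:
    name.encode("utf-8")
except UnicodeEncodeError:
    strict_encode_fails = True
else:
    strict_encode_fails = False

print(ascii(name), round_trip_ok, strict_encode_fails)
```

So listing and opening the file succeed, as reported above; the failure only appears once the name is treated as text rather than handed straight back to the system.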
Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog
On Wed, Sep 10, 2014 at 05:17:57PM +1000, Nick Coghlan wrote:
> Since it may come in handy when discussing "Why was Python 3
> necessary?" with folks, I wanted to point out that my article on the
> transition to multilingual programming has now been reposted on the
> Red Hat developer blog:
> http://developerblog.redhat.com/2014/09/09/transition-to-multilingual-programming-python/

That's awesome! Thank you Nick.

-- Steven
Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog
On Wed, Sep 17, 2014 at 11:14:15AM +1000, Chris Angelico wrote: > On Wed, Sep 17, 2014 at 5:29 AM, R. David Murray > wrote: > > Basically, we are pretending that the each smuggled > > byte is single character for string parsing purposes...but they don't > > match any of our parsing constants. They are all "any character" matches > > in the regexes and what have you. > > This is slightly iffy, as you can't be sure that one byte represents > one character, but as long as you don't much care about that, it's not > going to be an issue. This discussion would probably be a lot more easy to follow, with fewer miscommunications, if there were some examples. Here is my example, perhaps someone can tell me if I'm understanding it correctly. I want to send an email including the header line: 'Subject: “NOBODY expects the Spanish Inquisition!”' Note the curly quotes. I've read the manifesto "UTF-8 Everywhere" so I do the right thing and encode it as UTF-8: b'Subject: \xe2\x80\x9cNOBODY expects the Spanish Inquisition!\xe2\x80\x9d' but my mail package, not being written in a language as awesome as Python, is just riddled with bugs, and somehow I end up with this corrupted byte-string instead: b'Subject: \x9c\x80\xe2NOBODY expects the Spanish Inquisition!\xe2\x80\x9d' Note that the bytes from the first curly quote bytes are in the wrong order, but the second is okay. (Like I said, it's just *riddled* with bugs.) That means that trying to decode those bytes will fail in Python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 9: invalid start byte but it's not up to Python's email package to throw those invalid bytes out or permantly replace them with something else. 
Also, we want to work with Unicode strings, not byte strings, so there has to be a way to smuggle those three bytes into Unicode, without ending up with either the replacement bytes: # using the 'replace' error handler 'Subject: ���NOBODY expects the Spanish Inquisition!”' or incorrectly interpreting them as valid, but wrong, code points. (If we do the second, we end up with two control characters "\x9c\x80" followed by "â".) We want to be able to round-trip back to the same bytes we received. Am I right so far? So the email package uses the surrogate-escape error handler and ends up with this Unicode string: 'Subject: \udc9c\udc80\udce2NOBODY expects the Spanish Inquisition!”' which can be encoded back to the bytes we started with. Note that technically those three \u... code points are NOT classified as "noncharacters". They are actually surrogate code points: http://www.unicode.org/faq/private_use.html#nonchar4 http://www.unicode.org/glossary/#surrogate_code_point and they're supposed to be reserved for UTF-16. I'm not sure of the implication of that. > I'm fairly sure you're never going to find an > encoding in which one unknown byte represents two characters, There are encodings which use a "shift" mechanism, whereby a byte X represents one character by default, and a different character after the shift mechanism. But I don't think that matters, since we're not able to interpret those bytes. If we were, we'd just decode them to a text string and be done with it. > but > there are cases where it takes more than one byte to make up a > character (or the bytes are just shift codes or something). Multi-byte encodings are very common. All the Unicode encodings are multi-byte. So are many East Asian encodings. > Does that > ever throw off your regexes? It wouldn't be an issue to a .* between > two character markers, but if you ever say .{5} then it might match > incorrectly. I don't think the idea is to match on these smuggled bytes specifically. 
I think the idea is to match *around* them. In the example above, we might match everything from "Subject: " to the end of the line. So long as we never end up with a situation where the smuggled bytes are replaced by something else, or shuffled around into different positions, we should be fine. David, is my understanding correct? -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
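The round trip described in this message can be sketched in a few lines. This is only an illustration using the built-in codec error handlers, not the email package's actual code path; the variable names are made up for the example.

```python
import re

# Corrupted header from the example: the first curly quote's bytes are reversed.
raw = b'Subject: \x9c\x80\xe2NOBODY expects the Spanish Inquisition!\xe2\x80\x9d'

# Strict decoding fails at the first invalid byte.
try:
    raw.decode('utf-8')
except UnicodeDecodeError as err:
    strict_error = err

# 'replace' substitutes U+FFFD, so the original bytes are lost for good.
replaced = raw.decode('utf-8', errors='replace')

# 'surrogateescape' maps each undecodable byte 0xNN to the code point U+DCNN.
smuggled = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler restores exactly the bytes we received.
round_tripped = smuggled.encode('utf-8', errors='surrogateescape')

# Matching *around* the smuggled bytes: '.' accepts them as "any character".
match = re.match(r'Subject: (.*)', smuggled)
```

The smuggled string cannot be encoded with the default strict handler, which is what keeps the escaped bytes from leaking out unnoticed.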
Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog
On Wed, Sep 17, 2014 at 09:21:56AM +0900, Stephen J. Turnbull wrote: > Guido's mantra is something like "Python's str doesn't contain > characters or even code points[1], it contains code units." But is that true? If it were true, I would expect to be able to make Python text strings containing code units that aren't code points, e.g. something like "\U12345678" or chr(0x12345678) should work, but neither do. As far as I can tell, there is no way to build a string containing items which aren't code points. I don't think it is useful to say that strings *contain* code units, more that they *are made up from* code units. Code units are the implementation: 16-bit code units in narrow builds, 32-bit code units in wide builds, and either 8-, 16- or 32-bit code units in Python 3.3 and beyond. (I don't know of any Python implementation which uses UTF-8 internally, but if there were one, it would use 8-bit code units.) It isn't very useful to say that in Python 3.3 the string "A" *contains* the 8-bit code unit 0x41. That's conflating two different levels of explanation (the high-level interface and the underlying implementation) and potentially leads to user confusion like # 8-bit code units are bytes, right? assert b'\x41' in "A" which is Not Even Wrong. http://rationalwiki.org/wiki/Not_even_wrong I think it is correct to say that Python strings are sequences of Unicode code points U+0000 through U+10FFFF. There are no other restrictions, e.g. strings can contain surrogates, noncharacters, or nonsensical combinations of code points such as a U+0300 COMBINING GRAVE ACCENT combined with U+000A (newline). > Implying > that dealing with characters (or the grapheme globs that occasionally > raise their ugly heads here) is an issue for higher-level facilities > than str to deal with. Agreed that Python doesn't offer a string type based on graphemes, and that such a facility belongs as a high-level library, not a built-in type. Also agreed that talking about characters is sloppy. 
Nevertheless, for English speakers at least, "code point = character" isn't too awful a first approximation. > The point being that > > > Basically, we are pretending that the each smuggled byte is single > > character > > is something of a misstatement (good enough for present purpose of > discussing email, but not good enough for the general case of > understanding how this is supposed to work when porting the construct > to other Python implementations), while > > > for string parsing purposes...but they don't match any of our > > parsing constants. > > is precisely Pythonically correct. You might want to add "because all > parsing constants contain only valid characters by construction." I don't understand what you are trying to say here. > > [*] I worried a lot that this was re-introducing the bytes/string > > problem from python2. > > It isn't, because the bytes/str problem was that given a str object > out of context you could not tell whether it was a binary blob or > text, and if text, you couldn't tell if it was external encoded text > or internal abstract text. > > That is not true here because the representations of characters vs. > smuggled bytes in str are disjoint sets. Nor am I sure what you are trying to say here either. > Footnotes: > [1] In Unicode terminology, a code unit is the smallest computer > object that can represent a character (this is uniquely and sanely > defined for all real Unicode transformation formats aka UTFs). A code > point is an integer 0 - (17*256*256-1) that can represent a character, > but many code points such as surrogates and 0x are defined to be > non-characters. Actually not quite. "Noncharacter" is concretely defined in Unicode, and there are only 66 of them, many fewer than the surrogate code points alone. Surrogates are reserved, not noncharacters. 
http://www.unicode.org/glossary/#surrogate_code_point http://www.unicode.org/faq/private_use.html#nonchar1 It is wrong to talk about "surrogate characters", but perhaps you mean to say that surrogates (by which I understand you to mean surrogate code points) are "not human-meaningful characters", which is not the same thing as a Unicode noncharacter. > Characters are those code points that may be assigned > an interpretation as a character, including undefined characters > (private space and reserved). So characters are code points which are characters, including undefined characters? :-) http://www.unicode.org/glossary/#character -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
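The counts mentioned in this message can be checked with a short sketch. The helper names are invented for illustration; the ranges are the ones given in the Unicode glossary and FAQ linked above.

```python
MAX_CODE_POINT = 0x10FFFF  # 17 planes of 65536 code points each, minus one

def is_surrogate(cp):
    # Surrogate code points, reserved for UTF-16.
    return 0xD800 <= cp <= 0xDFFF

def is_noncharacter(cp):
    # U+FDD0..U+FDEF, plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

surrogate_count = sum(is_surrogate(cp) for cp in range(MAX_CODE_POINT + 1))
noncharacter_count = sum(is_noncharacter(cp) for cp in range(MAX_CODE_POINT + 1))

# A str may hold a lone surrogate, a noncharacter, or a combining accent
# attached to a newline...
odd_but_legal = '\ud800' + '\uffff' + '\n\u0300'

# ...but nothing outside the code point range.
try:
    chr(MAX_CODE_POINT + 1)
except ValueError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False
```

So there are 2048 surrogate code points against only 66 noncharacters, and a str accepts both kinds while rejecting anything past U+10FFFF.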
Re: [Python-Dev] PEP 394 - Clarification of what "python" command should invoke
On Fri, Sep 19, 2014 at 04:44:26AM -0400, Donald Stufft wrote: > > > On Sep 19, 2014, at 3:31 AM, Bohuslav Kabrda wrote: > > > > Hi, as Fedora is getting closer to having python3 as a default, I'm > > being more and more asked by Fedora users/contributors what'll > > "/usr/bin/python" invoke when we achieve this (Fedora 22 hopefully). > > So I was rereading PEP 394 and I think I need a small clarification > > regarding two points in the PEP: - "for the time being, all > > distributions should ensure that python refers to the same target as > > python2." - "Similarly, the more general python command should be > > installed whenever any version of Python is installed and should > > invoke the same version of Python as either python2 or python3." > > > > The important word in the second point is, I think, *whenever*. > > Trying to apply these two points to Fedora 22 situation, I can think > > of several approaches: > > - /usr/bin/python will always point to python3 (seems to go against > > the first mentioned PEP recommendation) Definitely not that. Arch Linux pointed /usr/bin/python at Python 3 some years ago, and I understand that this has caused no end of trouble for the folks on #python. I haven't seen any sign of this being an issue on the tutor@ or python-l...@python.org mailing lists, but the demographics are quite different so that's not surprising. > > - /usr/bin/python will always point to python2 (seems to go against > > the second mentioned PEP recommendation, there is no /usr/bin/python > > if python2 is not installed) My understanding is that this is the intention of the PEP, at least until such time as Python 2 is end-of-lifed. My interpretation would be that the second recommendation in the PEP is just confused :-) Perhaps the PEP author could clarify what the intention is. 
> > - /usr/bin/python will point to python3 if python2 is not installed, > > else it will point to python2 (inconsistent; also the user doesn't > > know he's running and what libraries he'll be able to import - the > > system can have different sets of python2-* and python3-* extension > > modules installed) Likely to cause all sorts of problems, and I understood that this was not the intention. Perhaps it was added *only* as a "grand-father clause" so that people don't yell at Arch Linux "See, the PEP says you're doing it wrong!". > > - there will be no /usr/bin/python (goes against PEP and seems just wrong) Seems like the least-worst to me. If you think of "python == Python 2.x" (at least for the next few years), then if Python 2.x isn't installed, there should be no /usr/bin/python either. > I don’t know for a fact, but I assume that as long as Python 2.x is > installed by default than ``python`` should point to ``python2``. If > Python 3.x is the default version and Python 2.x is the “optional” > version than I think personally it makes sense to switch eventually. > Maybe not immediately to give people time to update though? Agreed. Once Python 2 is finally end-of-lifed in 2023 or thereabouts, then we can reconsider pointing /usr/bin/python at Python 3 (or 4, whatever is current by then). If Arch Linux jumped the gun by a decade or so, that's their problem :-) -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 394 - Clarification of what "python" command should invoke
On Fri, Sep 19, 2014 at 10:41:58AM -0400, Barry Warsaw wrote: > On Sep 19, 2014, at 10:23 AM, Donald Stufft wrote: > > >My biggest problem with ``python3``, is what happens after 3.9. > > FWIW, 3.9 by my rough calculation is 7 years away. That makes it 2021, one year after Python 2.7 free support ends, but two years before Red Hat commercial support for it ends. > I seem to recall Guido saying that *if* there's a 4.0, it won't be a major > break like Python 3, whatever that says about the numbering scheme after 3.9. > > Is 7 years enough to eradicate Python 2 the way we did for Python 1? Then > maybe Python 4 can reclaim /usr/bin/python. I expect not quite. Perhaps 10 years though. -- Steven
Re: [Python-Dev] Critical bash vulnerability CVE-2014-6271 may affect Python on *n*x and OSX
On Fri, Sep 26, 2014 at 12:17:46AM +0200, Antoine Pitrou wrote: > On Thu, 25 Sep 2014 13:00:16 -0700 > Bob Hanson wrote: > > Critical bash vulnerability CVE-2014-6271 may affect Python on > > *n*x and OSX: [...] See also: http://adminlogs.info/2014/09/25/again-bash-cve-2014-7169/ > Fortunately, Python's subprocess has its `shell` argument default to > False. However, `os.system` invokes the shell implicitly and is > therefore a possible attack vector. Perhaps I'm missing something, but aren't there easier ways to attack os.system than the bash env vulnerability? If I'm accepting and running arbitrary strings from an untrusted user, there's no need for them to go to the trouble of feeding me: "env x='() { :;}; echo gotcha' bash -c 'echo do something useful'" when they can just feed me: "echo gotcha" In other words, os.system is *already* an attack vector, unless you only use it with trusted strings. I don't think the bash env vulnerability adds to the attack surface. Have I missed something? -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
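The point about untrusted strings can be illustrated with a small sketch. It does not exercise the bash environment-variable bug at all; it only shows that an argument list passed with the default shell=False is never parsed by a shell, whereas the same string handed to os.system would be. The `untrusted` value is made up for the example, and the child process is the running Python interpreter so that the sketch does not depend on any particular shell being installed.

```python
import subprocess
import sys

# Hypothetical attacker-controlled input containing a shell metacharacter.
untrusted = "hello; echo gotcha"

# With an argument list and shell=False (the default), no shell is involved:
# the whole string arrives in the child as a single argv entry.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
echoed = result.stdout.strip()

# By contrast, os.system("echo " + untrusted) would let the shell run
# "echo gotcha" as a second command. It is deliberately not executed here.
```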
Re: [Python-Dev] [OFF-TOPIC] It is true that is impossible write in binary code, the lowest level of programming that you can write is in hex code?
This is off-topic for this mailing list, as you know. There are some mailing lists which approve of off-topic conversations, but this is not one of those. You could ask on the python-l...@python.org mailing list, where it will still be off-topic, but the people there are more likely to answer. But even better would be to look for a mailing list or forum for assembly programming, machine code, or micro-code. On Mon, Nov 03, 2014 at 09:19:46PM -0200, françai s wrote: > I intend to write in lowest level of computer programming as a hobby. [...] -- Steven
Re: [Python-Dev] PEP 479: Change StopIteration handling inside generators
On Sat, Nov 22, 2014 at 12:53:41AM +1100, Chris Angelico wrote: > On Sat, Nov 22, 2014 at 12:47 AM, Raymond Hettinger > wrote: > > Also, the proposal breaks a reasonably useful pattern of calling > > next(subiterator) inside a generator and letting the generator terminate > > when the data stream ends. Here is an example that I have taught for > > years: > > > > def izip(iterable1, iterable2): > > it1 = iter(iterable1) > > it2 = iter(iterable2) > > while True: > > v1 = next(it1) > > v2 = next(it2) > > yield v1, v2 > > Is it obvious to every user that this will consume an element from > it1, then silently terminate if it2 no longer has any content? "Every user"? Of course not. But it should be obvious to those who think carefully about the specification of zip() and what is available to implement it. zip() can't detect that the second argument is empty except by calling next(), which it doesn't do until after it has retrieved a value from the first argument. If it turns out the second argument is empty, what can it do with that first value? It can't shove it back into the iterator. It can't return a single value, or pad it with some sentinel value (that's what izip_longest does). Since zip() is documented as halting on the shorter argument, it can't raise an exception. So what other options are there apart from silently consuming the value? Indeed that is exactly what the built-in zip does: py> a = iter("abcdef") py> b = iter("abc") py> list(zip(a, b)) [('a', 'a'), ('b', 'b'), ('c', 'c')] py> next(a) 'e' -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
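For comparison, here is a sketch of how Raymond's izip could be written so that it keeps the same behaviour under PEP 479: the StopIteration from next() is caught and turned into a plain return. This is one possible rewrite, not necessarily the one the PEP authors would recommend.

```python
def izip(iterable1, iterable2):
    it1 = iter(iterable1)
    it2 = iter(iterable2)
    while True:
        try:
            v1 = next(it1)
            v2 = next(it2)
        except StopIteration:
            # Either input is exhausted: end the generator explicitly
            # instead of letting StopIteration escape from its body.
            return
        yield v1, v2

a = iter("abcdef")
b = iter("abc")
pairs = list(izip(a, b))

# 'd' was taken from a before b turned out to be empty, so it is gone.
leftover = next(a)
```

Just like the built-in zip, this version silently consumes one extra element from the first iterator.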
Re: [Python-Dev] PEP 479: Change StopIteration handling inside generators
On Thu, Nov 20, 2014 at 11:36:54AM -0800, Guido van Rossum wrote: [...] > That said, I think for most people the change won't matter, some people > will have to apply one of a few simple fixes, and a rare few will have to > rewrite their code in a non-trivial way (sometimes this will affect > "clever" libraries). > > I wonder if the PEP needs a better transition plan, e.g. > > - right now, start an education campaign > - with Python 3.5, introduce "from __future__ import generator_return", and > silent deprecation warnings > - with Python 3.6, start issuing non-silent deprecation warnings > - with Python 3.7, make the new behavior the default (subject to some kind > of review) I fear that there is one specific corner case that will be impossible to deal with in a backwards-compatible way supporting both Python 2 and 3 in one code base: the use of `return value` in a generator. In Python 2.x through 3.2, `return value` is a syntax error inside generators. Currently, the only way to handle this case in 2+3 code is by using `raise StopIteration(value)` but if that changes in 3.6 or 3.7 then there will be no (obvious?) way to deal with this case. -- Steven
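To make the corner case concrete, here is a minimal sketch of `return value` inside a generator, assuming Python 3.3 or later (where PEP 380 made it legal). The function names are made up for the example.

```python
def counter(n):
    for i in range(n):
        yield i
    # Python 3.3+ only: the value travels on the StopIteration instance.
    return "done"

gen = counter(2)
items = [next(gen), next(gen)]
try:
    next(gen)
except StopIteration as stop:
    result = stop.value

def outer():
    # 'yield from' picks up the return value of the inner generator.
    inner_result = yield from counter(2)
    yield inner_result

delegated = list(outer())
```

Code that must also run on Python 2 cannot use this spelling, which is why `raise StopIteration(value)` was the only portable alternative.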
Re: [Python-Dev] PEP 479: Change StopIteration handling inside generators
On Sun, Nov 23, 2014 at 08:17:00AM -0800, Ethan Furman wrote: > While I am in favor of PEP 479, and I have to agree with Raymond that > this isn't pretty. > > Currently, next() accepts an argument of what to return if the > iterator is empty. Can we enhance that in some way so that the > overall previous behavior could be retained? [...] > Then, if the iterator is empty, instead of raising StopIteration, or > returning some value that would then have to be checked, it could > raise some other exception that is understood to be normal generator > termination. We *already* have an exception that is understood to be normal generator termination. It is called StopIteration. Removing the long-standing ability to halt generators with StopIteration, but then recreating that ability under a different name is the worst of both worlds: - working code is still broken; - people will complain that the new exception X is silently swallowed by generators, just as they complained about StopIteration; - it is yet another subtle difference between Python 2 and 3; - it involves a code smell ("no constant arguments to functions"); - and does nothing to help generators that don't call next(). The current behaviour is nice and clean and has worked well for over a decade. The new behaviour exchanges consistency in one area (generators behave like all other iterators) for consistency in another (generator expressions will behave like comprehensions in the face of StopIteration). But trying to have both at the same time via a new exception adds even more complexity and would leave everyone unhappy. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Move selected documentation repos to PSF BitBucket account?
On Sun, Nov 23, 2014 at 08:55:50AM -0800, Guido van Rossum wrote: > But I strongly believe that if we want to do the right thing for the > long term, we should switch to GitHub. Encouraging a software, or social, monopoly is never the right thing for the long term. http://nedbatchelder.com/blog/201405/github_monoculture.html > I promise you that once the pain of the switch is over you will feel > much better about it. I am also convinced that we'll get more > contributions this way. I'm sure that we'll get *more* contributions, but will they be *better* contributions? I know that there are people who think that mailing lists are old and passe, and that we should shift discussion to a social media site like Reddit. If we did, we'd probably get twenty times as many comments, and the average quality would probably plummet. More is not necessarily a good thing. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Move selected documentation repos to PSF BitBucket account?
On Sun, Nov 23, 2014 at 06:08:07PM -0600, Brian Curtin wrote: > On Sun, Nov 23, 2014 at 5:57 PM, Steven D'Aprano wrote: > > I'm sure that we'll get *more* contributions, but will they be *better* > > contributions? > > > > I know that there are people who think that mailing lists are old and > > passe, and that we should shift discussion to a social media site like > > Reddit. If we did, we'd probably get twenty times as many comments, and > > the average quality would probably plummet. More is not necessarily a > > good thing. > > If we need to ensure that we're getting better contributions than we > are now, then we should be interviewing committers, rejecting > newcomers (or the opposite, multiplying core-mentors by 100), and > running this like a business. I've written some crappy code that got > committed, so I should probably be fired. None of those things are guaranteed to lead to better contributions. The quality of code from the average successful business is significantly lower than that from successful FOSS projects like Python. Interviews just weed out people who are poor interviewees, not poor performers. And any organisation that fires contributors for relatively trivial mistakes like "crappy code" would soon run out of developers. My point is that increasing the number of contributions is not, in and of itself, a useful aim to have. More contributions is just a means to an end, the end we want is better Python. > Enabling our community to be active contributors is an important > thing. Give them a means to level up and we'll all be better off from > it. Right. But this isn't a feel-good exercise where anyone who wants a Gold Star for contributing gets commit privileges. (That would "enable our community to be active contributors" too.) Barriers to contribution work two ways: (1) we miss out on good contributions we would want; (2) we also miss out on poor contributions that would just have to be rejected. 
Enabling more people to contribute increases both. -- Steven
Re: [Python-Dev] PEP 479: Change StopIteration handling inside generators
On Mon, Nov 24, 2014 at 10:22:54AM +1100, Chris Angelico wrote: > My point is that doing the same errant operation on a list or a dict > will give different exceptions. In the same way, calling next() on an > empty iterator will raise StopIteration normally, but might raise > RuntimeError instead. It's still an exception, it still indicates a > place where code needs to be changed I wouldn't interpret it like that. Calling next() on an empty iterator raises StopIteration. That's not a bug indicating a failure, it's the protocol working as expected. Your response to that may be to catch the StopIteration and ignore it, or to allow it to bubble up for something else to deal with it. Either way, next() raising StopIteration is not a bug, it is normal behaviour. (Failure to deal with any such StopIteration may be a bug.) However, if next() raises RuntimeError, that's not part of the protocol for iterators, so it is almost certainly a bug to be fixed. (Probably coming from an explicit "raise StopIteration" inside a generator function.) Your fix for the bug may be to refuse to fix it and just catch the exception and ignore it, but that's kind of nasty and hackish and shouldn't be considered good code. Do you agree this is a reasonable way to look at it? -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
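The distinction drawn here can be sketched as follows, assuming an interpreter where the PEP 479 behaviour is the default (Python 3.7 or later). On older versions the second case would simply end the generator with StopIteration.

```python
# Protocol working as expected: an exhausted iterator raises StopIteration.
empty = iter([])
try:
    next(empty)
except StopIteration:
    normal_exhaustion = True

def leaky():
    yield 1
    # Under PEP 479 this explicit raise is converted into a RuntimeError.
    raise StopIteration

gen = leaky()
first = next(gen)
try:
    next(gen)
except RuntimeError as err:
    outcome = "RuntimeError"
    cause = type(err.__cause__).__name__
except StopIteration:
    # Pre-PEP 479 behaviour: the generator just finishes.
    outcome = "StopIteration"
    cause = None
```

The RuntimeError carries the original StopIteration as its cause, which points at the line in the generator that needs fixing.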
Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?
On Sun, Nov 30, 2014 at 11:07:57AM +1300, Greg Ewing wrote: > Nathaniel Smith wrote: > >So pkgname/__new__.py might look like: > > > >import sys > >from pkgname._metamodule import MyModuleSubtype > >sys.modules[__name__] = MyModuleSubtype(__name__, docstring) > > > >To start with, the 'from > >pkgname._metamodule ...' line is an infinite loop, > > Why does MyModuleSubtype have to be imported from pkgname? > It would make more sense for it to be defined directly in > __new__.py, wouldn't it? Isn't the purpose of separating > stuff out into __new__.py precisely to avoid circularities > like that? Perhaps I'm missing something, but won't that imply that every module which wants to use a "special" module type has to re-invent the wheel? If this feature is going to be used, I would expect to be able to re-use pre-written module types. E.g. having written "module with properties" (so to speak) once, I can just import it and use it in my next project. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
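A rough sketch of what a reusable "module with properties" type might look like. The names ModuleWithProperties and demo_pkg are hypothetical, and this uses the existing trick of placing an instance of a types.ModuleType subclass in sys.modules rather than the __new__.py mechanism proposed in the thread.

```python
import sys
import types

class ModuleWithProperties(types.ModuleType):
    """Hypothetical reusable module type: written once, imported anywhere."""

    @property
    def answer(self):
        # Computed on every attribute access, unlike a plain module global.
        return self._answer * 2

# A package wanting the special behaviour only needs to instantiate the type.
demo = ModuleWithProperties("demo_pkg", "A demo module with a property.")
demo._answer = 21
sys.modules["demo_pkg"] = demo

# The import system finds the entry in sys.modules and returns our instance.
import demo_pkg
```

Because the type lives in an ordinary importable module, a second project can reuse it without redefining it.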
Re: [Python-Dev] PEP 481 - Migrate Some Supporting Repositories to Git and Github
I have some questions and/or issues with the PEP, but first I'm going to add something to Nick's comments: On Sun, Nov 30, 2014 at 11:12:17AM +1000, Nick Coghlan wrote: > Beyond that, GitHub is indeed the most expedient option. My two main > reasons for objecting to taking the expedient path are: > > 1. I strongly believe that the long term sustainability of the overall open > source community requires the availability and use of open source > infrastructure. While I admire the ingenuity of the "free-as-in-beer" model > for proprietary software companies fending off open source competition, I > still know a proprietary platform play when I see one (and so do venture > capitalists looking to extract monopoly rents from the industry in the > future). (So yes, I regret relenting on this principle in previously > suggesting the interim use of another proprietary hosted service) > > 2. I also feel that this proposal is far too cavalier in not even > discussing the possibility of helping out the Mercurial team to resolve > their documentation and usability issues rather than just yelling at them > "your tool isn't popular enough for us, and we find certain aspects of it > too hard to use, so we're switching to something else rather than working > with you to address our concerns". We consider the Mercurial team a > significant enough part of the Python ecosystem that Matt was one of the > folks specifically invited to the 2014 language summit to discuss their > concerns around the Python 3 transition. Yet we'd prefer to switch to > something else entirely rather than organising a sprint with them at PyCon > to help ensure that our existing Mercurial based infrastructure is > approachable for git & GitHub users? (And yes, I consider some of the core > Mercurial devs to be friends, so this isn't an entirely abstract concern > for me) Thanks Nick, I think these are excellent points, particularly the second. 
It would be a gross strawman to say that we should "only" use software developed in Python, but we should eat our own dogfood whenever practical and we should support and encourage the Python ecosystem, including Mercurial. Particularly since hg and git are neck and neck feature-wise, we should resist the tendency to jump on bandwagons. If git were clearly the superior product, then maybe there would be an argument for using the best tool for the job, but it isn't. As for the question of using Github hosting, there's another factor which has been conspicuous by its absence. Has GitHub's allegedly toxic and bullying culture changed since Julie Horvath quit in March? And if it has not, do we care? I'm not a saint, but I do try to choose ethical companies and institutions over unethical ones whenever it is possible and practical. I'm not looking for a witch-hunt against GitHub, but if the allegations made by Horvath earlier this year are true, and I don't believe anyone has denied them, then so long as GitHub's internal culture remains sexist and hostile to the degree reported, then I do not believe that we should use GitHub's services even if we shift some repos to git. I have serious doubts about GitHub's compatibility with the ideals expressed by the PSF. Even if our code of conduct does not explicitly forbid it, I think that it goes against the principles that we say we aspire to. Given Horvath's experiences, and the lack of clear evidence that anything has changed in GitHub, I would be deeply disappointed if Python lent even a smidgeon of legitimacy to their company, and I personally will not use their services. I acknowledge that it's hard to prove a negative, and GitHub may have difficulty proving to my satisfaction that they have changed. (My experience is that company culture rarely changes unless there is a change in management, and even then only slowly.) 
Particularly given GitHub's supposed egalitarian, non-hierarchical, and meritocratic structure, that nobody apparently saw anything wrong with the bullying of staff and workplace sexism until it became public knowledge suggests that it is not just a few bad apples but a problem all through the company. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 481 - Migrate Some Supporting Repositories to Git and Github
On Sun, Nov 30, 2014 at 02:56:22PM -0500, Donald Stufft wrote: > As I mentioned in my other email, we’re already supporting two > different tools, and it’s a hope of mine to use this as a sort of > testbed to moving the other repositories as well. If we go down this path, can we have some *concrete* and *objective* measures of success? If moving to git truly does improve things, then the move can be said to be a success. But if it makes no concrete difference, then we've wasted our time. In six months' time, how will we know which it is? Can we have some concrete and objective measures of what would count as success, and some Before and After measurements? Just off the top of my head... if the number of documentation patches increases significantly (say, by 30%) after six months, that's a sign the move was successful. It's one thing to say that using hg is discouraging contributors, and that hg is much more popular. It's another thing to say that moving to git will *actually make a difference*. Maybe all the would-be contributors using git are too busy writing kernel patches for Linus or using Node.js and wouldn't be caught dead with Python :-) With concrete and objective measures of success, you will have ammunition to suggest moving the rest of Python to git in a few years' time. And without it, we'll also have good evidence that any further migration to git may be a waste of time and effort and we should focus our energy elsewhere rather than git vs hg holy wars. [...] > I also think it’s hard to look at a company like bitbucket, for > example, and say they are *better* than Github just because they > didn’t have a public and inflammatory event. We can't judge companies on what they might be doing behind closed doors, only on what we can actually see of them. Anybody might be rotten bounders and cads in private, but how would we know? It's an imperfect world and we have imperfect knowledge but still have to make a decision as best we can. 
> Attempting to reduce the cognitive burden for contributing and aligning > ourselves > with the most popular tools allows us to take advantage of the network effects > of these tools popularity. This can be the difference between someone with > limited > amount of time being able to contribute or not, which can make real inroads > towards > making it easier for under privileged people to contribute much more than > refusing > to use a product of one group of people over another just because the other > group > hasn’t had a public and inflammatory event. In other contexts, that could be a pretty awful excuse for inaction against the most egregiously bad behaviour. "Sure, Acme Inc might have adulterated baby food with arsenic, but other companies might have done worse things that we haven't found out about. So we should keep buying Acme's products, because they're cheaper and that's good for the poor." Not that I'm comparing GitHub's actions with poisoning babies. What GitHub did was much worse. *wink* -- Steven
Re: [Python-Dev] PEP 481 - Migrate Some Supporting Repositories to Git and Github
On Tue, Dec 02, 2014 at 12:37:22AM +1100, Steven D'Aprano wrote: [...] > It's one thing to say that using hg is discouraging contributors, and > that hg is much more popular. /s/more/less/ -- Steven
Re: [Python-Dev] Python 2.x and 3.x use survey, 2014 edition
On Fri, Dec 12, 2014 at 10:24:15AM -0800, Mark Roberts wrote: > So, I'm more than aware of how to write Python 2/3 compatible code. I've > ported 10-20 libraries to Python 3 and write Python 2/3 compatible code at > work. I'm also aware of how much writing 2/3 compatible code makes me hate > Python as a language. I'm surprised by the strength of feeling there. Most of the code I write supports 2.4+, with the exception of 3.0 where I say "it should work, but if it doesn't, I don't care". I'll be *very* happy when I can drop support for 2.4, but with very few exceptions I have not found many major problems supporting both 2.7 and 3.3+ in the one code-base, and nothing I couldn't work around (sometimes by just dropping support for a specific feature in certain versions). I'm not disputing that your experiences are valid, but I am curious what specific issues you have come across and wondering if there are things which 3.5 can include to ease that transition. E.g. 3.3 re-added support for u'' syntax. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] also
On Wed, Jan 28, 2015 at 09:39:25AM -0500, Alan Armour wrote:

> if you can do this
>
> a chemical physics and element physics like everything from melting points
> to how much heat you need to add two chemicals together
>
> and physics like aerodynamics, space dynamics, and hydrodynamics etcetera
> for propellers and motors and stuff.
>
> just having this in a main language seems to make a shit ton of sense.

You should check out Frink:

http://futureboy.us/frinkdocs/

-- Steve
Re: [Python-Dev] subclassing builtin data structures
On Thu, Feb 12, 2015 at 06:14:22PM -0800, Ethan Furman wrote:

> On 02/12/2015 05:46 PM, MRAB wrote:
> > On 2015-02-13 00:55, Guido van Rossum wrote:
> >> Actually, the problem is that the base class (e.g. int) doesn't know how
> >> to construct an instance of the subclass -- there is no reason (in
> >> general) why the signature of a subclass constructor should match the
> >> base class constructor, and it often doesn't.
> >>
> >> So this is pretty much a no-go. It's not unique to Python -- it's a
> >> basic issue with OO.
> >>
> > Really?
>
> What I was asking about, and Guido responded to, was not having to
> specifically override __add__, __mul__, __sub__, and
> all the others; if we do override them then there is no problem.

I think you have misunderstood MRAB's comment. My interpretation is that MRAB is suggesting that methods in the base classes should use type(self) rather than hard-coding their own type.

E.g. if int were written in pure Python, it might look something like this:

    class int(object):
        def __new__(cls, arg):
            ...

        def __add__(self, other):
            return int(self, other)

(figuratively, rather than literally). But if it looked like this:

        def __add__(self, other):
            return type(self)(self, other)

then sub-classing would "just work" without the sub-class having to override each and every method.

-- Steve
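The type(self) idea described above can be tried out on a small pure-Python class. This is only a sketch under stated assumptions: Num and Money are hypothetical names made up for illustration, and Num is a simple wrapper, not a model of how the real int is implemented.

```python
class Num:
    """Hypothetical pure-Python number wrapper (not the real int)."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        other_value = other.value if isinstance(other, Num) else other
        # type(self) rather than a hard-coded Num: a subclass gets
        # an instance of its own type back without overriding __add__.
        return type(self)(self.value + other_value)

    def __repr__(self):
        return f"{type(self).__name__}({self.value!r})"


class Money(Num):
    """Hypothetical subclass that adds nothing and overrides nothing."""


total = Money(5) + 2
print(total)
```

Here the subclass keeps the constructor signature of its parent, which is the assumption the whole approach rests on.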
Re: [Python-Dev] subclassing builtin data structures
On Fri, Feb 13, 2015 at 06:03:35PM -0500, Neil Girdhar wrote:

> I personally don't think this is a big enough issue to warrant any changes,
> but I think Serhiy's solution would be the ideal best with one additional
> parameter: the caller's type. Something like
>
> def __make_me__(self, cls, *args, **kwargs)
>
> and the idea is that any time you want to construct a type, instead of
>
> self.__class__(assumed arguments…)
>
> where you are not sure that the derived class' constructor knows the right
> argument types, you do
>
> def SomeCls:
>     def some_method(self, ...):
>         return self.__make_me__(SomeCls, assumed arguments…)
>
> Now the derived class knows who is asking for a copy.

What if you wish to return an instance from a classmethod? You don't have a `self` available.

    class SomeCls:
        def __init__(self, x, y, z):
            ...

        @classmethod
        def from_spam(cls, spam):
            x, y, z = process(spam)
            return cls.__make_me__(self, cls, x, y, z)  # oops, no self

Even if you are calling from an instance method, and self is available, you cannot assume that the information needed for the subclass constructor is still available. Perhaps that information is used in the constructor and then discarded.

The problem we wish to solve is that when subclassing, methods of some base class blindly return instances of itself, instead of self's type:

    py> class MyInt(int):
    ...     pass
    ...
    py> n = MyInt(23)
    py> assert isinstance(n, MyInt)
    py> assert isinstance(n+1, MyInt)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AssertionError

This means that subclasses often have to override all the parent's methods, just to ensure the type is correct:

    class MyInt(int):
        def __add__(self, other):
            o = super().__add__(other)
            if o is not NotImplemented:
                o = type(self)(o)
            return o

Something like that, repeated for all the int methods, should work:

    py> n = MyInt(23)
    py> type(n+1)
    <class '__main__.MyInt'>

This is tedious and error prone, but at least once it is done, subclasses of MyInt will Just Work:

    py> class MyOtherInt(MyInt):
    ...     pass
    ...
    py> a = MyOtherInt(42)
    py> type(a + 1000)
    <class '__main__.MyOtherInt'>

(At least, *in general* they will work. See below.)

So, why not have int's methods use type(self) instead of hard coding int? The answer is that *some* subclasses might override the constructor, which would cause the __add__ method to fail:

    # this will fail if the constructor has a different signature
    o = type(self)(o)

Okay, but changing the constructor signature is quite unusual. Mostly, people subclass to add new methods or attributes, or to override a specific method. The dict/defaultdict situation is relatively uncommon.

Instead of requiring *every* subclass to override all the methods, couldn't we require the base classes (like int) to assume that the signature is unchanged and call type(self), and leave it up to the subclass to override all the methods *only* if the signature has changed? (Which they probably would have to do anyway.)

As the MyInt example above shows, or datetime in the standard library, this actually works fine in practice:

    py> from datetime import datetime
    py> class MySpecialDateTime(datetime):
    ...     pass
    ...
    py> t = MySpecialDateTime.today()
    py> type(t)
    <class '__main__.MySpecialDateTime'>

Why can't int, str, list, tuple etc. be more like datetime?

-- Steve
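The failure mode mentioned above, where a subclass changes the constructor signature, can be reproduced with a short sketch. All names here (Num, Tagged, SafeTagged) are hypothetical and exist only for illustration; Num stands in for a base class whose methods call type(self).

```python
class Num:
    """Hypothetical base class whose __add__ calls type(self)."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        other_value = other.value if isinstance(other, Num) else other
        return type(self)(self.value + other_value)


class Tagged(Num):
    """Changes the constructor signature, so the inherited __add__ breaks."""

    def __init__(self, value, label):
        super().__init__(value)
        self.label = label


try:
    Tagged(1, "x") + 1      # type(self)(2) is missing the label argument
except TypeError as exc:
    failure = exc
else:
    failure = None


class SafeTagged(Tagged):
    """Only a subclass with a changed signature needs to override."""

    def __add__(self, other):
        plain = Num(self.value) + other
        return SafeTagged(plain.value, self.label)


fixed = SafeTagged(1, "x") + 1
```

Under this arrangement the burden of overriding falls on the unusual subclass rather than on every subclass.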
Re: [Python-Dev] subclassing builtin data structures
On Sat, Feb 14, 2015 at 01:26:36PM -0500, Alexander Belopolsky wrote:

> On Sat, Feb 14, 2015 at 7:23 AM, Steven D'Aprano
> wrote:
>
> > Why can't int, str, list, tuple etc. be more like datetime?
>
>
> They are. In all these types, class methods call subclass constructors but
> instance methods don't.

But in datetime, instance methods *do*.

Sorry that my example with .today() was misleading.

    py> from datetime import datetime
    py> class MyDatetime(datetime):
    ...     pass
    ...
    py> MyDatetime.today()
    MyDatetime(2015, 2, 15, 12, 45, 38, 429269)
    py> MyDatetime.today().replace(day=20)
    MyDatetime(2015, 2, 20, 12, 45, 53, 405889)

> In the case of int, there is a good reason for this behavior - bool. In
> python, we want True + True == 2.

Sure. But bool is only one subclass. I expect that it should be bool's responsibility to override __add__ etc. to return an instance of the parent class (int) rather than have nearly all subclasses have to override __add__ etc. to return instances of themselves.

-- Steve
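A rough sketch of the division of responsibility suggested above: the base class preserves the subclass type, and a bool-like subclass opts out explicitly. Count, Ticks and Flag are hypothetical names, and this is not how int and bool are actually implemented.

```python
class Count(int):
    """Hypothetical int-like base whose addition preserves the subclass."""

    def __add__(self, other):
        result = int.__add__(self, other)
        if result is NotImplemented:
            return result
        return type(self)(result)


class Ticks(Count):
    """Ordinary subclass: inherits the type-preserving behaviour."""


class Flag(Count):
    """bool-like subclass: deliberately returns the parent type instead."""

    def __add__(self, other):
        result = int.__add__(self, other)
        if result is NotImplemented:
            return result
        return Count(result)


ticks_sum = Ticks(3) + 4
flag_sum = Flag(1) + Flag(1)
real_bool_sum = True + True   # the built-in behaviour being imitated
```

Only the one unusual subclass carries an override; Ticks needs no code at all.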
Re: [Python-Dev] PEP 488: elimination of PYO files
On Fri, Mar 06, 2015 at 09:37:05PM +0100, Antoine Pitrou wrote:

> On Fri, 06 Mar 2015 18:11:19 +
> Brett Cannon wrote:
> > And the dropping of docstrings does have an impact on
> > memory usage when you use Python at scale.
>
> What kind of "scale" are you talking about? Do you have any numbers
> about such impact?
>
> > You're also assuming that we will never develop an AST optimizer
>
> No, the assumption is that we don't have such an optimizer *right now*.
> Having command-line options because they might be useful some day is
> silly.

Quoting the PEP:

    This issue is only compounded when people optimize Python code beyond
    what the interpreter natively supports, e.g., using the astoptimizer
    project [2]_.

Brett, I'm a very strong +1 on the PEP. It's well-written and gives a good explanation for why such a thing is needed. The current behaviour of re-using the same .pyo file for two distinct sets of bytecode is out-and-out buggy:

    [steve@ando ~]$ python3.3 -O -c "import dis; print(dis.__doc__[:32])"
    Disassembler of Python byte code
    [steve@ando ~]$ python3.3 -OO -c "import dis; print(dis.__doc__[:32])"
    Disassembler of Python byte code

The second should fail, since doc strings should be removed under -OO optimization, but because the .pyo file already exists it doesn't.

Even if CPython drops -O and -OO altogether, this PEP should still be accepted to allow third party optimizers like astoptimizer to interact without getting in each other's way.

(And for the record, I'm an equally strong -1 on dropping -O and -OO.)

Thank you.

-- Steve
Re: [Python-Dev] PEP 488: elimination of PYO files
On Fri, Mar 06, 2015 at 08:00:20PM -0500, Ron Adam wrote:

> Have you considered doing this by having different magic numbers in the
> .pyc file for standard, -O, and -O0 compiled bytecode files? Python
> already checks that number and recompiles the files if it's not what it's
> expected to be. And it wouldn't require any naming conventions or new
> cache directories. It seems to me it would be much easier to do as well.

And it would fail to solve the problem. The problem isn't just that the .pyo file can contain the wrong byte-code for the optimization level, that's only part of the problem. Another issue is that you cannot have pre-compiled byte-code for multiple different optimization levels. You can have a "no optimization" byte-code file, the .pyc file, but only one "optimized" byte-code file at the same time.

Brett's proposal will allow -O optimized and -OO optimized byte-code files to co-exist, as well as setting up a clear naming convention for future optimizers in either the Python compiler or third-party optimizers.

No new cache directories are needed. The __pycache__ directory has been used since Python 3.3 (or was it 3.2? I forget which).

-- Steve
Re: [Python-Dev] boxing and unboxing data types
On Sun, Mar 08, 2015 at 08:31:30PM -0700, Ethan Furman wrote:

> When data is passed from Python to a native library (such as in an O/S
> call), how does the unboxing of data types occur?
[...]
> So the real question: anywhere in Python where an int is expected (for
> lower-level API work), but not directly received, should __int__ (or
> __index__) be called? and failure to do so is a bug?

I think the answer is in the docs:

https://docs.python.org/3/reference/datamodel.html#object.__int__

Immediately below that __index__ is described, with this note:

    In order to have a coherent integer type class, when __index__() is
    defined __int__() should also be defined, and both should return the
    same value.

The PEP adding __index__ is also useful:

https://www.python.org/dev/peps/pep-0357/

My summary is as follows:

__int__ is used as the special method for int(), and it should coerce the object to an integer. This may be lossy e.g. int(2.999) --> 2 or may involve a conversion from a non-numeric type to integer e.g. int("2").

__index__ is used when the object in question actually represents an integer of some kind, e.g. a fixed-width integer. Conversion should be lossless and conceptually may be thought of as a way of telling Python "this value actually is an int, even though it doesn't inherit from int" (for some definition of "is an int").

There's no built-in way of calling __index__ that I know of (no equivalent to int(obj)), but slicing at the very least will call it, e.g. seq[a:] will call type(a).__index__.

If you define __index__ for your class, you should also define __int__ and have the two return the same value. I would expect that an IntFlags object should inherit from int, and if that is not possible, practical or desirable for some reason, then it should define __index__ and __int__.

Failure to call __index__ is not necessarily a bug. I think it is allowed for functions to insist on an actual int, as slicing did for many years, but it is an obvious enhancement to allow such functions to accept arbitrary int-like objects.

Does that answer your questions?

-- Steve
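The distinction summarised above can be seen in a few lines. FlagBits is a hypothetical class invented for this sketch; operator.index() is the standard library function (in the operator module, so not a builtin) that calls __index__ directly.

```python
import operator


class FlagBits:
    """Hypothetical int-like object that does not inherit from int."""

    def __init__(self, bits):
        self._bits = bits

    def __index__(self):
        # Lossless: this object really does represent an integer.
        return self._bits

    def __int__(self):
        # Defined alongside __index__ and returning the same value.
        return self._bits


letters = list("abcdef")
start = FlagBits(2)

tail = letters[start:]              # slicing calls type(start).__index__
as_int = int(start)                 # int() calls __int__
as_index = operator.index(start)    # explicit call to __index__

lossy = int(2.999)                  # __int__ may be lossy
try:
    operator.index(2.999)           # float has no __index__
except TypeError:
    float_rejected = True
else:
    float_rejected = False
```

A function that wants "a real integer or something equivalent" can therefore call operator.index() on its argument instead of int(), and floats will be rejected rather than silently truncated.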
Re: [Python-Dev] boxing and unboxing data types
On Mon, Mar 09, 2015 at 09:52:01AM -0400, Neil Girdhar wrote:

> Here is a list of methods on
> int that should not be on IntFlags in my opinion (give or take a couple):
>
> __abs__, __add__, __delattr__, __divmod__, __float__, __floor__,
> __floordiv__, __index__, __lshift__, __mod__, __mul__, __pos__, __pow__,
> __radd__, __rdivmod__, __rfloordiv__, __rlshift__, __rmod__, __rmul__,
> __round__, __rpow__, __rrshift__, __rshift__, __rsub__, __rtruediv__,
> __sub__, __truediv__, __trunc__, conjugate, denominator, imag, numerator,
> real.
>
> I don't think __index__ should be exposed either since are you really going
> to slice a list using IntFlags? Really?

In what way is this an *Int*Flags object if it is nothing like an int? It sounds like what you want is a bunch of Enum inside a set with a custom __str__, not IntFlags.

-- Steve
Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.
On Sun, Mar 08, 2015 at 10:57:30PM -0700, Ryan Smith-Roberts wrote:

> I suspect that you will find the Python community extremely conservative
> about any changes to its sorting algorithm, given that it took thirteen
> years and some really impressive automated verification software to find
> this bug:

On the other hand, the only person who really needs to be convinced is Tim Peters. It's really not up to the Python community.

The bug tracker is the right place for discussing this.

-- Steve
Re: [Python-Dev] PEP 488: elimination of PYO files
On Wed, Mar 11, 2015 at 05:34:10PM +, Brett Cannon wrote:

> I have a poll going on G+ to see what people think of the various proposed
> file name formats at
> https://plus.google.com/u/0/+BrettCannon/posts/fZynLNwHWGm . Feel free to
> vote if you have an opinion.

G+ hates my browser and won't let me vote. I click on the button and nothing happens. I have Javascript enabled and I'm not using any ad blockers.

For the record, I think only the first two options

    importlib.cpython-35.opt-0.pyc
    importlib.cpython-35.opt0.pyc

are sane, and I prefer the first. I'm mildly inclined to leave out the opt* part for default, unoptimized code. In other words, the file name holds two or three '.' delimited fields, plus the extension:

    .-.[opt-].pyc

where [...] is optional and the optimization codes for CPython will be 1 for -O and 2 for -OO. And 0 for unoptimized, if you decide that it should be mandatory.

Thank you for moving forward on this, I think it is a good plan.

-- Steve
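For readers on Python 3.5 or later, the naming scheme that PEP 488 ended up with can be inspected with importlib.util.cache_from_source(), which accepts a keyword-only optimization argument. The sketch below only builds file names and does not compile anything; cache_names is a hypothetical helper, and the exact interpreter tag in the result depends on the Python that runs it.

```python
import importlib.util


def cache_names(source_path):
    """Map each optimization level to its byte-code cache path.

    '' means unoptimized, 1 corresponds to -O and 2 to -OO.
    """
    return {
        level: importlib.util.cache_from_source(source_path, optimization=level)
        for level in ("", 1, 2)
    }


names = cache_names("importlib.py")
for level, path in names.items():
    print(repr(level), path)
```

In that final scheme the opt- field is left out for unoptimized code, which matches the mild preference expressed in the message above.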
Re: [Python-Dev] PEP 8 update
On Tue, Apr 07, 2015 at 03:11:30AM +0100, Rob Cliffe wrote:

> As a matter of interest, how far away from mainstream am I in
> preferring, *in this particular example* (obviously it might be
> different for more complicated computation),
>
> def foo(x):
>     return math.sqrt(x) if x >= 0 else None
>
> I probably have a personal bias towards compact code, but it does seem
> to me that the latter says exactly what it means, no more and no less,
> and therefore is somewhat more readable. (Easier to keep the reader's
> attention for 32 non-whitespace characters than 40.)

In my opinion, code like that is a good example of why the ternary if operator was resisted for so long :-) Sometimes you can have code which is just too compact.

My own preference would be:

    def foo(x):
        if x >= 0:
            return math.sqrt(x)
        return None

but I'm not terribly fussed about whether the "else" is added or not, whether the return is on the same line as the if, and other minor variations.

-- Steve
Re: [Python-Dev] PEP 8 update
On Tue, Apr 07, 2015 at 08:47:25AM -0400, Ben Hoyt wrote:

> > My own preference would be:
> >
> > def foo(x):
> >     if x >= 0:
> >         return math.sqrt(x)
> >     return None
>
> Kind of getting into the weeds here, but I would always invert this to
> "return errors early, and keep the normal flow at the main indentation
> level". Depends a little on what foo() means, but it seems to me the
> "return None" case is the exceptional/error case, so this would be:
>
> def foo(x):
>     if x < 0:
>         return None
>     return math.sqrt(x)

While *in general* I agree with "handle the error case early", there are cases where "handle the normal case early" is better, and I think that this is one of them.

Also, inverting the comparison isn't appropriate, due to float NANs. With the first version, foo(NAN) returns None (which I assumed was deliberate by the OP). In your version, it returns NAN.

But as you say, we're now deep into the weeds...

-- Steve
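The NAN point above is easy to check. The two function names below are hypothetical labels for the two versions of foo() in the thread; the behaviour relies only on the fact that every ordered comparison with a NAN is false.

```python
import math


def foo_normal_first(x):
    # Version from the earlier message: test the normal case first.
    if x >= 0:
        return math.sqrt(x)
    return None


def foo_error_first(x):
    # Inverted version: test the error case first.
    if x < 0:
        return None
    return math.sqrt(x)


nan = float("nan")

first = foo_normal_first(nan)    # nan >= 0 is False, so this is None
second = foo_error_first(nan)    # nan < 0 is also False, so sqrt(nan) runs
print(first, second)
```

For ordinary numbers the two versions agree, so the difference only shows up if NANs can reach the function.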
Re: [Python-Dev] Keyword-only parameters
On Tue, Apr 14, 2015 at 01:40:40PM -0400, Eric V. Smith wrote:

> But, I don't see a lot of keyword-only parameters being added to stdlib
> code. Is there some position we've taken on this? Barring someone saying
> "stdlib APIs shouldn't contain keyword-only params", I'm inclined to
> make numeric_owner keyword-only.

I expect that's because keyword-only parameters are quite recent (3.x only) and most of the stdlib is quite old.

Keyword-only feels right for this to me too.

-- Steve
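For anyone who has not used the feature, a keyword-only parameter is declared after a bare * in the signature. The extract() function below is a hypothetical stand-in written for this sketch, not the actual stdlib function under discussion; only the signature matters.

```python
def extract(member, path="", *, numeric_owner=False):
    """Hypothetical function: numeric_owner can only be passed by keyword."""
    return (member, path, numeric_owner)


by_keyword = extract("spam.txt", "/tmp", numeric_owner=True)

try:
    extract("spam.txt", "/tmp", True)   # third positional argument
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

The benefit for a boolean flag like this is that call sites have to spell out what the True means.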
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Mon, Apr 20, 2015 at 11:34:51PM +0100, Harry Percival wrote:

> exactly. yay stub files! we all agree! everyone loves them!

Not even close.

-- Steve
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Mon, Apr 20, 2015 at 07:30:39PM +0100, Harry Percival wrote:

> Hi all,
>
> tldr; type hints in python source are scary. Would reserving them for stub
> files be better?

No no no, a thousand times no it would not!

Please excuse my extreme reaction, but over on the python-list mailing list (comp.lang.python if you prefer Usenet) we already had this discussion back in January.

Anyone wishing to read those conversations should start here:

https://mail.python.org/pipermail/python-list/2015-January/697202.html
https://mail.python.org/pipermail/python-list/2015-January/697315.html

Be prepared for a long, long, long read. Nothing in your post hasn't already been discussed (except for your proposal to deprecate annotations altogether). So if you feel that I'm giving any of your ideas or concerns short-shrift, I'm not, it's just that I've already given them more time than I can afford. And now I get to do it all again, yay.

(When reading those threads, please excuse my occasional snark towards Rick, he is a notorious troll on the list and sometimes I let myself be goaded into somewhat less than professional responses.)

While I sympathise with your thought that "it's scary", I think it is misguided and wrong. As a thought-experiment, let us say that we roll back the clock to 1993 or thereabouts, just as Python 1.0 (or so) was about to be released, and Guido proposed adding *default values* to function declarations [assuming they weren't there from the start]. If we were used to Python's clean syntax:

    def zipmap(f, xx, yy):

the thought of having to deal with default values:

    def zipmap(f=None, xx=(), yy=()):

might be scary. Especially since those defaults could be arbitrarily complex expressions. Twenty years on, what should we think about such fears?

Type hinting or declarations are extremely common in programming languages, and I'm not just talking about older languages like C and Java.
New languages, both dynamic and static, like Cobra, Julia, Haskell, Go, Boo, D, F#, Fantom, Kotlin, Rust and many more include optional or mandatory type declarations. You cannot be a programmer without expecting to deal with type hints/declarations somewhere, as soon as you read code written in other languages (and surely you do that, don't you? You practically cannot escape Java and C code on the internet), and in my opinion Python cannot be a modern language without them.

I'm going to respond to your recommendation to use stub files in another post (replying to Barry Warsaw), here I will discuss your concerns first.

> My first reaction to type hints was "yuck", and I'm sure I'm not the only
> one to think that. viz (from some pycon slides):
>
> def zipmap(f: Callable[[int, int], int], xx: List[int],
>            yy: List[int]) -> List[Tuple[int, int, int]]:
>
> arg. and imagine it with default arguments.

You've picked a complex example and written it poorly. I'd say yuck too, but let's use something closer to PEP-8 formatting:

    def zipmap(f: Callable[[int, int], int],
               xx: List[int],
               yy: List[int]
               ) -> List[Tuple[int, int, int]]:

Not quite so bad with each parameter on its own line. It's actually quite readable, once you learn what the annotations mean. Like all new syntax, of course you need to learn it. But the type hints are just regular Python expressions.

> Of course, part of this reaction is just a knee-jerk reaction to the new
> and unfamiliar, and should be dismissed, entirely justifiably, as mere
> irrationality. But I'm sure sensible people agree that they do make our
> function definitions longer, more complex, and harder to read.

Everything has a cost and all features, or lack of features, are a trade-off. Function definitions could be even shorter and simpler and easier to read if we didn't have default values.

[...]
> I'm not so sure. My worry is that once type hinting gets standardised,
> then they will become a "best practice", and there's a particular
> personality type out there that's going to start wanting to add type hints
> to every function they write. Similarly to mindlessly obeying PEP8 while
> ignoring its intentions, hobgoblin-of-little-minds style, I think we're
> very likely to see type hints appearing in a lot of python source, or a lot
> of pre-commit-hook checkers. Pretty soon it will be hard to find any open
> source library code that doesn't have type hints, or any project style
> guide that doesn't require them.

I doubt that very much. I'm not a betting man, but if I were, I would put money on it.

Firstly: libraries tend to be multi-version, and these days they are often hybrid Python 2 & 3 code. Since annotations are 3 only, libraries cannot use these type hints until they drop support for Python 2.7, which will surely be *no less* than five years away. Probably more like ten. So "annotations everywhere" are, at best, many years away.

Secondly, more importantl
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Mon, Apr 20, 2015 at 02:41:06PM -0400, Barry Warsaw wrote:

> On Apr 20, 2015, at 07:30 PM, Harry Percival wrote:
>
> >tldr; type hints in python source are scary. Would reserving them for stub
> >files be better?
>
> I think so. I think PEP 8 should require stub files for stdlib modules and
> strongly encourage them for 3rd party code.

A very, very strong -1 to that.

Stub files are a necessary evil. Except where absolutely necessary, they should be strongly discouraged. A quote from the Go FAQs:

    Dependency management is a big part of software development today
    but the “header files” of languages in the C tradition are
    antithetical to clean dependency analysis—and fast compilation.

http://golang.org/doc/faq#What_is_the_purpose_of_the_project

Things that go together should be together. A function parameter and its type information (if any) go together: the type is as much a part of the parameter declaration as the name and the default. Putting them together is the best situation:

    def func(n: Integer): ...

and should strongly be preferred as best practice for when you choose to use type hinting at all.

Alternatives are not as good. Second best is to put them close by, as in a decorator:

    @typing(n=Integer)  # Don't Repeat Yourself violation
    def func(n): ...

A distant third best is a docstring. Not only does it also violate DRY, but it also increases the likelihood of errors:

    def func(n):
        """Blah blah blah blah blah blah

        Arguments:

            m:  Integer

        """

Keeping documentation and code in synch is hard, and such mistakes are not uncommon.

Putting the type information in a stub file is an exponentially more distant fourth best, or to put it another way, *the worst* solution for where to put type hints. Not only do you Repeat Yourself with the name of the parameter, but also the name of the function (or method and class) AND module. The type information *isn't even in the same file*, which increases the chance of it being lost, forgotten, deleted, out of date, unmaintained, etc.

-- Steve
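One practical consequence of keeping the hint next to the parameter is that it travels with the function object and can be read back at run time. A minimal sketch, using the built-in int and float as the annotations (the Integer name in the message above is the author's placeholder, so it is not used here):

```python
def func(n: int, scale: float = 1.0) -> float:
    """Hypothetical function used only to show where annotations live."""
    return n * scale


# Annotations are stored on the function itself, keyed by parameter name,
# with the return annotation under the key 'return'.
hints = func.__annotations__
print(hints)
```

Nothing has to be repeated elsewhere for the names to stay in step with the signature, which is the Don't Repeat Yourself point made about the decorator, docstring and stub file alternatives.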
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Mon, Apr 20, 2015 at 08:37:28PM -0700, Guido van Rossum wrote:

> On Mon, Apr 20, 2015 at 4:41 PM, Jack Diederich wrote:
>
> > Twelve years ago a wise man said to me "I suggest that you also propose a
> > new name for the resulting language"
> >
>
> The barrage of FUD makes me feel like the woman who asked her doctor for a
> second opinion and was told "you're ugly too."

Don't worry Guido, some of us are very excited to see this coming to fruition :-)

It's been over ten years since your first blog post on optional typing for Python. At least nobody can accuse you of rushing into this.

http://www.artima.com/weblogs/viewpost.jsp?thread=85551

-- Steve
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Tue, Apr 21, 2015 at 11:56:15AM +0100, Rob Cliffe wrote:

> (Adding a type hint that restricted the argument to say a
> sequence of numbers turns out to be a mistake.

Let's find out how big a mistake it is with a test run.

    py> def sorter(alist: List[int]) -> List[int]:
    ...     return sorted(alist)
    ...
    py> data = (chr(i) + 'ay' for i in range(97, 107))
    py> type(data)
    <class 'generator'>
    py> sorter(data)
    ['aay', 'bay', 'cay', 'day', 'eay', 'fay', 'gay', 'hay', 'iay', 'jay']

When we say that type checking is optional, we mean it.

[Disclaimer: I had to fake the List object, since I don't have the typing module, but everything else is exactly as you see it.]

Annotations will be available for type checking. If you don't want to type check, don't type check. If you want to go against the type hints, you can go against the type hints, and get exactly the same runtime errors as you have now:

    py> sorter(None)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in sorter
    TypeError: 'NoneType' object is not iterable

There is no compile time checking unless you choose to run a type checker or linter -- and even if the type checker flags errors, you can ignore it and run the program regardless. Just like today with linters like PyFlakes, PyLint and similar.

For those who choose not to run a type checker, the annotations will be nothing more than introspectable documentation.

> And what is a number?
> Is Fraction? What about complex numbers, which can't be
> sorted? What if the function were written before the Decimal class?)

I know that you are intending these as rhetorical questions, but Python has had a proper numeric tower since version 2.5 or 2.6. So:

    py> from numbers import Number
    py> from decimal import Decimal
    py> isinstance(Decimal("1.25"), Number)
    True
    py> isinstance(2+3j, Number)
    True

> Errors are often not caught until run time that would be caught at
> compile time in other languages (though static code checkers help).

Yes they do help, which is exactly the point.

> (Not much of a disadvantage because of Python's superb error
> diagnostics.)

That's certainly very optimistic of you.

If I had to pick just one out of compile time type checking versus run time unit tests, I'd pick run time tests. But it is naive to deny the benefits of compile time checks in catching errors that you otherwise might not have found even with extensive unit tests (and let's face it, we never have enough unit tests).

Ironically, type hinting will *reduce* the need for intrusive, anti-duck-testing explicit calls to isinstance() at runtime:

    def func(x:float):
        if isinstance(x, float):
            ...
        else:
            raise TypeError

Why bother making that expensive isinstance call every single time the function is called, if the type checker can prove that x is always a float?

> Python code typically says what it is doing, with the minimum of
> syntactic guff. (Well, apart from colons after if/while/try etc. :-) )
> Which makes it easy to read.
> Now it seems as if this proposal wants to start turning Python in the
> C++ direction, encouraging adding ugly boilerplate code. (This may only
> be tangentially relevant, but I want to scream when I see some
> combination of public/private/protected/static/extern etc., most of
> which I don't understand.)

Perhaps if you understood it you would be less inclined to scream.

> Chris A makes the valid point (if I understand correctly) that
> Authors of libraries should make it as easy as possible to
> (i) know what object types can be passed to functions
> (ii) diagnose when the wrong type of object is passed
> Authors of apps are not under such obligation, they can basically
> do what they want.
>
> Well,
> (i) can be done with good documentation (docstrings etc.).
> (ii) can be done with appropriate runtime checks and good error
> messages.

How ironic. After singing the praises of duck-typing, now you are recommending runtime type checks.
As far as good error messages go, they don't help you one bit when the application suddenly falls over in a totally unexpected place due to a bug in your code.

I can't go into too many details due to commercial confidentiality, but we experienced something similar recently. A situation nobody foresaw, that wasn't guarded against, and wasn't tested for, came up after deployment. There was a traceback, of course, but a failure in the field 200km away with a stressed customer and hundreds of angry users is not as useful as a compile-time failure during development.

> You see where I'm going with this - adding type hints to Python feels a
> bit like painting feet on the snake.

Pythons are one of the few snakes which have vestigial legs:

http://en.wikipedia.org/wiki/Pelvic_spur

-- Steve
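The sorter() test run earlier in this message can be repeated on any Python that ships the typing module, without having to fake the List object. This is a minimal sketch of the same experiment; nothing in it runs a type checker, which is the point.

```python
from decimal import Decimal
from fractions import Fraction
from numbers import Number
from typing import List


def sorter(alist: List[int]) -> List[int]:
    # The annotations are not enforced when the function is called.
    return sorted(alist)


# A generator of strings, which contradicts the List[int] hint.
data = (chr(i) + "ay" for i in range(97, 107))
result = sorter(data)

try:
    sorter(None)
except TypeError:
    none_rejected = True    # the same runtime error as without hints
else:
    none_rejected = False

# The numeric tower answers the "what is a number?" questions.
tower = [isinstance(x, Number) for x in (Decimal("1.25"), 2 + 3j, Fraction(1, 3))]
```

A static checker run over this file would be expected to complain about the sorter(data) and sorter(None) calls, while the interpreter itself never looks at the hints.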
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Tue, Apr 21, 2015 at 01:25:34PM +0100, Chris Withers wrote: > Anyway, I've not posted much to python-dev in quite a while, but this is > a topic that I would be kicking myself in 5-10 years time when I've had > to move to Javascript or because everyone > else has drifted away from Python as it had become ugly... Facebook released Flow, a static typechecker for Javascript, to a very positive reaction. From their announcement: Flow’s type checking is opt-in — you do not need to type check all your code at once. However, underlying the design of Flow is the assumption that most JavaScript code is implicitly statically typed; even though types may not appear anywhere in the code, they are in the developer’s mind as a way to reason about the correctness of the code. Flow infers those types automatically wherever possible, which means that it can find type errors without needing any changes to the code at all. On the other hand, some JavaScript code, especially frameworks, make heavy use of reflection that is often hard to reason about statically. For such inherently dynamic code, type checking would be too imprecise, so Flow provides a simple way to explicitly trust such code and move on. This design is validated by our huge JavaScript codebase at Facebook: Most of our code falls in the implicitly statically typed category, where developers can check their code for type errors without having to explicitly annotate that code with types. Quoted here: http://blog.jooq.org/2014/12/11/the-inconvenient-truth-about-dynamic-vs-static-typing/ More about flow: http://flowtype.org/ Matz is interested in the same sort of gradual type checking for Ruby as Guido wants to add to Python: https://www.omniref.com/blog/blog/2014/11/17/matz-at-rubyconf-2014-will-ruby-3-dot-0-be-statically-typed/ Julia already includes this sort of hybrid dynamic+static type checking: http://julia.readthedocs.org/en/latest/manual/types/ I could keep going, but I hope I've made my point. 
Whatever language you are using in 5-10 years time, it will almost certainly be either mostly static with some dynamic features like Java, or dynamic with optional and gradual typing.

-- Steven
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Tue, Apr 21, 2015 at 03:08:27PM +0200, Antoine Pitrou wrote:
> On Tue, 21 Apr 2015 22:47:23 +1000
> Steven D'Aprano wrote:
> >
> > Ironically, type hinting will *reduce* the need for intrusive,
> > anti-duck-testing explicit calls to isinstance() at runtime:
>
> It won't, since as you pointed out yourself, type checks are purely
> optional and entirely separate from compilation and runtime evaluation.

Perhaps you are thinking of libraries, where the library function has to deal with whatever junk people throw at it. For such libraries, I believe that the major benefit of type hints is not so much in proving the library's correctness in the face of random arguments, but as documentation.

In any case, of course you are correct that public library functions and methods will continue to need to check their arguments. (Private functions, perhaps not.) But for applications, the situation is different. If my application talks to a database and extracts a string which it passes on to its own function spam(), then it will be a string. Not a string-like object. Not something that quacks like a string. A string. Once the type checker is satisfied that spam() always receives a string, then further isinstance checks inside spam() are a waste of time. If spam()'s caller changes and might pass something which is not a string, then the type checker will flag that.

Obviously to get this benefit you need to actually use a type checker. I didn't think I needed to mention that.

-- Steve
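To make the application case above concrete, here is a minimal sketch. The names fetch_name and FAKE_DB are hypothetical, and the database is faked with a dict; the only point is that spam() carries an annotation and no isinstance() check, leaving the proof that it always receives a str to whatever static checker is in use.

```python
# Hypothetical application code: names and data are made up for illustration.
FAKE_DB = {1: "Alice", 2: "Bob"}

def fetch_name(user_id: int) -> str:
    """Stand-in for a database lookup that always returns a str."""
    return FAKE_DB[user_id]

def spam(name: str) -> str:
    # No isinstance(name, str) check here: a static checker can verify
    # every call site instead, so nothing is re-checked on each call.
    return name.upper()

result = spam(fetch_name(1))
print(result)
```

Without a checker in the workflow, the annotations here are documentation only and nothing stops a caller from passing a non-string.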
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Tue, Apr 21, 2015 at 03:51:05PM +0100, Cory Benfield wrote: > On 21 April 2015 at 15:31, Chris Angelico wrote: > > Granted, there are some > > vague areas - how many functions take a "file-like object", and are > > they all the same? - but between MyPy types and the abstract base > > types that already exist, there are plenty of ways to formalize duck > > typing. > > Are there? Can I have a link or an example, please? I feel like I > don't know how I'm supposed to do this, and I'd like to see how that > works. I'll even give a concrete use-case: I want to be able to take a > file-like object that has a .read() method and a .seek() method. I've never done this before, so I might not quite have done it correctly, but this appears to work just fine: py> import abc py> class SeekableReadable(metaclass=abc.ABCMeta): ... @classmethod ... def __subclasshook__(cls, C): ... if hasattr(C, 'seek') and hasattr(C, 'read'): ... return True ... return NotImplemented ... py> f = open('/tmp/foo') py> isinstance(f, SeekableReadable) True py> from io import StringIO py> issubclass(StringIO, SeekableReadable) True py> issubclass(int, SeekableReadable) False That gives you your runtime check for an object with seek() and read() methods. For compile-time checking, I expect you would define SeekableReadable as above, then make the declaration: def read_from_start(f:SeekableReadable, size:int): f.seek(0) return f.read(size) So now you have runtime interface checking via an ABC, plus documentation for the function parameter type via annotation. But will the static checker understand that annotation? My guess is, probably not as it stands. According to the docs, MyPy currently doesn't support this sort of duck typing, but will: [quote] There are also plans to support more Python-style “duck typing” in the type system. The details are still open. 
[end quote] http://mypy.readthedocs.org/en/latest/class_basics.html#abstract-base-classes-and-multiple-inheritance I expect that dealing with duck typing will be very high on the list of priorities for the future. In the meantime, for this specific use-case, you're probably not going to be able to statically check this type hint. Your choices would be: - don't type check anything; - don't type check the read_from_start() function, but type check everything else; - don't type check the f parameter (remove the SeekableReadable annotation, or replace it with Any, but leave the size:int annotation); - possibly some type checkers will infer from the function body that f must have seek() and read() methods, and you don't have to declare anything (structural typing instead of nominal?); - (a bad idea, but just for the sake of completeness) leave the annotation in, and ignore false negatives. Remember that there is no built-in Python type checker. If you have no checker, the annotations are just documentation and nothing else will have changed. If you don't like the checker you have, you'll be able to replace it with another. -- Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
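For readers who want to try the ABC approach from this message without depending on a file in /tmp, here is a self-contained sketch using in-memory streams. The explicit isinstance() guard inside read_from_start() is an addition for illustration; the message itself only annotates the parameter.

```python
import abc
from io import BytesIO, StringIO

class SeekableReadable(metaclass=abc.ABCMeta):
    """Runtime-checkable interface: anything with seek() and read()."""

    @classmethod
    def __subclasshook__(cls, C):
        if hasattr(C, 'seek') and hasattr(C, 'read'):
            return True
        return NotImplemented

def read_from_start(f: SeekableReadable, size: int):
    # Runtime guard (added for this sketch); a static checker may or may
    # not understand the ABC, as discussed above.
    if not isinstance(f, SeekableReadable):
        raise TypeError('expected an object with seek() and read()')
    f.seek(0)
    return f.read(size)

buffer = StringIO('hello world')
buffer.read()                      # move to the end of the stream first
print(read_from_start(buffer, 5))
print(isinstance(BytesIO(b'x'), SeekableReadable))
print(issubclass(int, SeekableReadable))
```

The hook only checks that the attributes exist on the class; it says nothing about their signatures, which is one reason a static checker cannot simply reuse it.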
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Thu, Apr 23, 2015 at 03:25:30PM +0100, Harry Percival wrote:
> lol @ the fact that the type hints are breaking github's syntax highlighter
> :)

That just tells us that Github's syntax highlighter has been broken for over five years. Function annotations go back to Python 3.0, more than five years ago. The only thing which is new about type hinting is that we're adding a standard *use* for those annotations.

I just tested a version of kwrite from 2005, ten years old, and it highlights the following annotated function perfectly:

    def func(a:str='hello', b:int=int(x+1)) -> None:
        print(a + b)

Of course, I'm hoping that any decent type checker won't need the type hints. It should be able to infer from the default values that a is a string and b an int, and only require a type hint if you want to accept other types as well. (It should also highlight that a+b cannot succeed.)

-- Steve
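The function from this message can be run as-is once x is defined (the value 1 below is an arbitrary assumption). This small sketch shows that the annotations are just data stored on the function, and that the a+b problem only surfaces when the function is called.

```python
x = 1  # arbitrary value; needed because the default for b is computed from x

def func(a:str='hello', b:int=int(x+1)) -> None:
    print(a + b)

# Annotations are stored on the function object; nothing enforces them.
print(func.__annotations__)

# Calling with the defaults fails at runtime, which is the error a
# type checker could report before the code ever runs.
try:
    func()
except TypeError as err:
    print('TypeError:', err)
```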
Re: [Python-Dev] async/await in Python; v2
On Thu, Apr 23, 2015 at 01:51:52PM -0400, Barry Warsaw wrote:
> Why "async def" and not "def async"?
>
> My concern is about existing tools that already know that "def" as the first
> non-whitespace on the line starts a function/method definition. Think of a
> regexp in an IDE that searches backwards from the current line to find the
> function it's defined on. Sure, tools can be updated but is it *necessary*
> to choose a syntax that breaks tools?

Surely it's the other way? If I'm searching for the definition of a function manually, I search for "def spam". `async def spam` will still be found, while `def async spam` will not.

It seems to me that tools that search for r"^\s*def\s+spam\s*\(" are going to break whichever choice is made, while a less pedantic search like r"def\s+spam\s*\(" will work only if async comes first.

-- Steve
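The claim about the two patterns is easy to check. The sketch below runs both regexes from this message against the two candidate spellings; the variable names and the list of sample lines are illustrative only.

```python
import re

anchored = re.compile(r"^\s*def\s+spam\s*\(")
loose = re.compile(r"def\s+spam\s*\(")

candidates = ["def spam():", "async def spam():", "def async spam():"]

for line in candidates:
    print(f"{line!r:22} anchored={bool(anchored.search(line))} "
          f"loose={bool(loose.search(line))}")
```

The anchored pattern matches neither new spelling, while the loose one still finds `async def spam` but not `def async spam`.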
Re: [Python-Dev] typeshed for 3rd party packages
On Wed, Apr 22, 2015 at 11:26:14AM -0500, Ian Cordasco wrote: > On a separate thread Cory provided an example of what the hints would look > like for *part* of one function in the requests public functional API. > While our API is outwardly simple, the values we accept in certain cases > are actually non-trivially represented. Getting the hints *exactly* correct > would be extraordinarily difficult. I don't think you need to get them exactly correct. The type-checker does two things: (1) catch type errors involving types which should not be allowed; (2) allow code which involves types which should be allowed. If the type hints are wrong, there are two errors: false positives, when code which should be allowed is flagged as a type error; and false negatives, when code which should be flagged as an error is not. Ideally, there should be no false positives. But false negatives are not so important, since you will still be doing runtime checks. All that means is that the static type-checker will be a little less capable of picking up type errors at compile time. -- Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] typeshed for 3rd party packages
On Fri, Apr 24, 2015 at 03:44:45PM +0100, Cory Benfield wrote: > On 24 April 2015 at 15:21, Steven D'Aprano wrote: > > > If the type hints are wrong, there are two errors: false positives, when > > code which should be allowed is flagged as a type error; and false > > negatives, when code which should be flagged as an error is not. > > Ideally, there should be no false positives. But false negatives are not > > so important, since you will still be doing runtime checks. All that > > means is that the static type-checker will be a little less capable of > > picking up type errors at compile time. > > I think that's a rational view that will not be shared as widely as I'd like. I can't tell if you are agreeing with me, or disagreeing. The above sentence seems to be agreeing with me, but you later end your message with "do it properly or not at all" which disagrees. So I'm confused. > Given that the purpose of a type checker is to catch bugs caused by > passing incorrectly typed objects to a function, it seems entirely > reasonable to me to raise a bug against a type hint that allows code > that was of an incorrect type where that incorrectness *could* have > been caught by the type hint. Of course it is reasonable for people to submit bug reports to do with the type hints. And it is also reasonable for the package maintainer to reject the bug report as "Won't Fix" if it makes the type hint too complex. The beauty of gradual typing is that unlike Java or Haskell, you can choose to have as little or as much type checking as works for you. You don't have to satisfy the type checker over the entire program before the code will run, you only need check the parts you want to check. > Extending from that into the general > ratio of "reports that are actually bugs" versus "reports that are > errors on the part of the reporter", I can assume that plenty of > people will raise bug reports for incorrect cases as well. Okay. 
Do you get many false positive bug reports for your tests too?

> From the perspective of sustainable long-term maintenance, I think the
> only way to do type hints is to have them be sufficiently exhaustive
> that a user would have to actively *try* to hit an edge case false
> negative. I believe that requests' API is too dynamically-typed to fit
> into that category at this time.

I think we agree that, static type checks or no static type checks, requests is going to need to do runtime type checks. So why does it matter if it misses a few type errors at compile time?

I think we're all in agreement that for extremely dynamic code like requests, you may not get as much value from static type checks as some other libraries or applications. You might even decide that you get no value at all. Okay, that's fine. I'm just suggesting that you don't have just two choices, "all or nothing". The whole point of gradual typing is to give developers more options.

> PS: I should mention that, as Gary Bernhardt pointed out at PyCon,
> people often believe (incorrectly) that types are a replacement for
> tests.

They *can* be a replacement for tests. You don't see Java or Haskell programmers writing unit tests to check that their code never tries to add a string to a float. Even if they could write such a test, they don't bother because the type checker will catch that sort of error.

The situation in Python is a bit different, and as Antoine points out, libraries cannot rely on their callers obeying the type restrictions of the public API. (Private functions are different -- if you call my private function with the wrong type and blow up your computer, it's your own fault.) For libraries, I see type checks as complementing tests, not replacing them.

But for application code, type checks may replace unit tests, provided that nobody checks in production code until both the type checker and the unit tests pass.
If you work under that rule, there's no point in having the unit tests check what the type checker already tested. > For that reason I feel like underspecified type hints are > something of an attractive nuisance. Again, I really think this is a > case of do it properly or not at all. In my opinion, underspecified type hints are no more of an attractive nuisance than a test suite which doesn't test enough. Full coverage is great, but 10% coverage is better than 5% coverage, which is better than nothing. That applies whether we are talking about tests, type checks, or documentation. -- Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] async/await in Python; v2
On Fri, Apr 24, 2015 at 09:32:51AM -0400, Barry Warsaw wrote:
> On Apr 24, 2015, at 11:17 PM, Steven D'Aprano wrote:
>
> >It seems to me that tools that search for r"^\s*def\s+spam\s*\(" are
>
> They would likely search for something like r"^\s*def\s+[a-zA-Z0-9_]+" which
> will hit "def async spam" but not "async def".

Unless somebody wants to do a survey of editors and IDEs and other tools, arguments about what regex they may or may not use to search for function definitions are an exercise in futility. They may use regexes anchored to the start of the line. They may not. They may deal with "def async" better than "async def", or the other way around.

Either way, it's a pretty thin argument for breaking the invariant that the token following `def` is the name of the function. Whatever new syntax is added, something is going to break.

-- Steve
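As a rough illustration of the invariant mentioned above, the sketch below uses the pattern Barry quotes, with one change that is an assumption made here: a capture group around the identifier, to show what a tool would report as the function name.

```python
import re

# Barry's pattern, with a capture group added here to show what a tool
# would take to be the function name.
pattern = re.compile(r"^\s*def\s+([a-zA-Z0-9_]+)")

for line in ["def spam():", "def async spam():", "async def spam():"]:
    match = pattern.search(line)
    print(repr(line), "->", match.group(1) if match else None)
```

With `def async spam` the pattern does match, but the name it extracts is `async` rather than `spam`; with `async def spam` it finds nothing. Either spelling changes what such a tool sees.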
Re: [Python-Dev] Type hints -- a mediocre programmer's reaction
On Sat, Apr 25, 2015 at 02:05:15AM +0100, Ronan Lamy wrote: > * Hints have no run-time effect. The interpreter cannot assume that they > are obeyed. I know what you mean, but just for the record, annotations are runtime inspectable, so people can (and probably have already started) to write runtime argument checking decorators or frameworks which rely on the type hints. > * PEP484 hints are too high-level. Replacing an 'int' object with a > single machine word would be useful, but an 'int' annotation gives no > guarantee that it's correct (because Python 3 ints can have arbitrary > size and because subclasses of 'int' can override any operation to > invoke arbitrary code). Then create your own int16, uint64 etc types. > * A lot more information is needed to produce good code (e.g. “this f() > called here really means this function there, and will never be > monkey-patched” – same with len() or list(), btw). > * Most of this information cannot easily be expressed as a type > * If the interpreter gathers all that information, it'll probably have > gathered a superset of what PEP484 can provide anyway. All this is a red herring. If type hints are useful to PyPy, that's a bonus. Cython uses its own system of type hints, a future version may be able to use PEP 484 hints instead. But any performance benefit is a bonus. PEP 484 is for increasing correctness, not speed. -- Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
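Since annotations are inspectable at runtime, a checking decorator of the kind mentioned at the top of this message might look like the sketch below. check_types and repeat are hypothetical names, and the decorator handles only annotations that are plain classes; it is an illustration, not any particular framework's API.

```python
import functools
import inspect

def check_types(func):
    """Hypothetical decorator: enforce simple class annotations at call time."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Only plain classes are checked; anything fancier is skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}")
        return func(*args, **kwargs)
    return wrapper

@check_types
def repeat(text: str, count: int) -> str:
    return text * count

print(repeat("ab", 3))
```

This is exactly the per-call cost that a static checker avoids, so it fits library boundaries better than hot inner loops.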
Re: [Python-Dev] What's missing in PEP-484 (Type hints)
On Thu, Apr 30, 2015 at 01:41:53PM +0200, Dima Tisnek wrote: > # Syntactic sugar > "Beautiful is better than ugly, " thus nice syntax is needed. > Current syntax is very mechanical. > Syntactic sugar is needed on top of current PEP. I think the annotation syntax is beautiful. It reminds me of Pascal. > # internal vs external > @intify > def foo() -> int: > b = "42" > return b # check 1 > x = foo() // 2 # check 2 > > Does the return type apply to implementation (str) or decorated callable > (int)? I would expect that a static type checker would look at foo, and flag this as an error. The annotation says that foo returns an int, but it clearly returns a string. That's an obvious error. Here is how I would write that: # Perhaps typing should have a Function type? def intify(func: Callable[[], str]) -> Callable[[], int]: @functools.wraps(func) def inner() -> int: return int(func()) return inner @intify def foo() -> str: b = "42" return b That should, I hope, pass the type check, and without lying about the signature of *undecorated* foo. The one problem with this is that naive readers will assume that *decorated* foo also has a return type of str, and be confused. That's a problem. One solution might be, "don't write decorators that change the return type", but that seems horribly restrictive. Another solution might be to write a comment: @intify # changes return type to int def foo() -> str: ... but that's duplicating information already in the intify decorator, and it relies on the programmer writing a comment, which people don't do unless they really need to. I think that the only solution is education: given a decorator, you cannot assume that the annotations still apply unless you know what the decorator does. > How can same annotation or a pair of annotations be used to: > * validate return statement type > * validate subsequent use > * look reasonable in the source code > > > # lambda > Not mentioned in the PEP, omitted for convenience or is there a rationale? 
> f = lambda x: None if x is None else str(x ** 2) > Current syntax seems to preclude annotation of `x` due to colon. > Current syntax sort of allows lamba return type annotation, but it's > easy to confuse with `f`. I don't believe that you can annotate lambda functions with current syntax. For many purposes, I do not think that is important: a good type checker will often be able to infer the return type of the lambda, and from that infer what argument types are permitted: lambda arg: arg + 1 Obviously arg must be a Number, since it has to support addition with ints. > # local variables > Not mentioned in the PEP > Non-trivial code could really use these. Normally local variables will have their type inferred from the operations done to them: s = arg[1:] # s has the same type as arg When that is not satisfactory, you can annotate variables with a comment: s = arg[1:] #type: List[int] https://www.python.org/dev/peps/pep-0484/#id24 > # global variables > Not mentioned in the PEP > Module-level globals are part of API, annotation is welcome. > What is the syntax? As above. > # comprehensions > [3 * x.data for x in foo if "bar" in x.type] > Arguable, perhaps annotation is only needed on `foo` here, but then > how complex comprehensions, e.g. below, the intermediate comprehension > could use an annotation > [xx for y in [...] if ...] A list comprehension is obviously of type List. If you need to give a more specific hint: result = [expr for x in things if cond(x)] #type: List[Whatever] See also the discussion of "cast" in the PEP. https://www.python.org/dev/peps/pep-0484/#id25 > # class attributes > s = socket.socket(...) > s.type, s.family, s.proto # int > s.fileno # callable > If annotations are only available for methods, it will lead to > Java-style explicit getters and setters. > Python language and data model prefers properties instead, thus > annotations are needed on attributes. 
    class Thing:
        a = 42  # can be inferred
        b = []  # inferred as List[Any]
        c = []  #type: List[float]

-- Steve
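The intify decorator sketched earlier in this message runs once the imports are supplied. The version below adds only the imports and a couple of print calls; it also shows the readability problem the message describes, since the decorated function still advertises the undecorated signature.

```python
import functools
from typing import Callable

def intify(func: Callable[[], str]) -> Callable[[], int]:
    @functools.wraps(func)
    def inner() -> int:
        return int(func())
    return inner

@intify
def foo() -> str:
    b = "42"
    return b

x = foo() // 2
print(x)
# functools.wraps makes the decorated function report the undecorated
# function's annotations, so the runtime metadata still says str.
print(foo.__annotations__)
```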
Re: [Python-Dev] PEP 492 quibble and request
On Wed, Apr 29, 2015 at 06:12:37PM -0700, Guido van Rossum wrote: > On Wed, Apr 29, 2015 at 5:59 PM, Nick Coghlan wrote: > > > On 30 April 2015 at 10:21, Ethan Furman wrote: > > > From the PEP: > > > > > >> Why not a __future__ import > > >> > > >> __future__ imports are inconvenient and easy to forget to add. > > > > > > That is a horrible rationale for not using an import. By that logic we > > > should have everything in built-ins. ;) > > > > This response is silly. The point is not against import but against > __future__. A __future__ import definitely is inconvenient -- few people I > know could even recite the correct constraints on their placement. Are you talking about actual Python programmers, or people who dabble with the odd Python script now and again? I'm kinda shocked if it's the first. It's not a complex rule: the __future__ import must be the first line of actual executable code in the file, so it can come after any encoding cookie, module docstring, comments and blank lines, but before any other code. The only part I didn't remember was that you can have multiple __future__ imports, I thought they all had to be on one line. (Nice to learn something new!) [...] > > 'as' went through the "not really a keyword" path, and > > it's a recipe for complexity in the code generation toolchain and > > general quirkiness as things behave in unexpected ways. > > > > I don't recall that -- but it was a really long time ago so I may > misremember (did we even have __future__ at the time?). I have a memory of much rejoicing when "as" was made a keyword, and an emphatic "we're never going to do that again!" about semi-keywords. I've tried searching for the relevant post(s), but cannot find anything. Maybe I imagined it? But I do have Python 2.4 available, when we could write lovely code like this: py> import math as as py> as I'm definitely not looking forward to anything like that again. 
-- Steve
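The placement rule for __future__ imports described in this message can be checked with compile(). In this sketch the two source strings and the helper compiles() are made up for illustration.

```python
GOOD = '''"""Module docstring."""
# comments and blank lines are fine

from __future__ import division
from __future__ import print_function  # more than one line is allowed

x = 1 / 2
'''

BAD = '''x = 1
from __future__ import division
'''

def compiles(source):
    """Return True if source compiles, False on SyntaxError."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError:
        return False
    return True

print(compiles(GOOD))
print(compiles(BAD))
```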
Re: [Python-Dev] PEP 492 quibble and request
On Wed, Apr 29, 2015 at 07:31:22PM -0700, Guido van Rossum wrote:
> Ah, but here's the other clever bit: it's only interpreted this way
> *inside* a function declared with 'async def'. Outside such functions,
> 'await' is not a keyword, so that grammar rule doesn't trigger. (Kind of
> similar to the way that the print_function __future__ disables the
> keyword-ness of 'print', except here it's toggled on or off depending on
> whether the nearest surrounding scope is 'async def' or not. The PEP could
> probably be clearer about this; it's all hidden in the Transition Plan
> section.)

You mean we could write code like this?

    def await(x):
        ...

    if condition:
        async def spam():
            await (eggs or cheese)
    else:
        def spam():
            await(eggs or cheese)

I must admit that's kind of cool, but I'm sure I'd regret it.

-- Steve
Re: [Python-Dev] PEP 492: async/await in Python; version 4
On Fri, May 01, 2015 at 09:24:47PM +0100, Arnaud Delobelle wrote:
> I'm not convinced that allowing an object to be both a normal and an
> async iterator is a good thing. It could be a recipe for confusion.

In what way? I'm thinking that the only confusion would be if you wrote "async for" instead of "for", or vice versa, and instead of getting an exception you got the (a)synchronous behaviour you didn't want. But I have no intuition for how likely it is that you could write an asynchronous for loop, leave out the async, and still have the code do something meaningful.

Other than that, I think it would be fine to have an object be both a synchronous and asynchronous iterator. You specify the behaviour you want by how you use it. We can already do that, e.g. unittest's assertRaises is both a test assertion and a context manager.

Objects can have multiple roles, and it's not usually abused, or confusing. I'm not sure that async iterables will be any different.

-- Steve
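As a sketch of what such a dual-role object could look like, the class below implements both protocols. It assumes the async iterator protocol as it exists in current Python 3 (a plain __aiter__ and an async __anext__), which may differ in detail from the draft under discussion; Countdown and collect are hypothetical names.

```python
import asyncio

class Countdown:
    """Hypothetical object usable with both 'for' and 'async for'."""

    def __init__(self, start):
        self.current = start

    # Synchronous iterator protocol.
    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    # Asynchronous iterator protocol.
    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.current <= 0:
            raise StopAsyncIteration
        await asyncio.sleep(0)      # give other tasks a chance to run
        self.current -= 1
        return self.current + 1

async def collect(n):
    return [value async for value in Countdown(n)]

print(list(Countdown(3)))
print(asyncio.run(collect(3)))
```

Both loops produce the same values here; what differs is whether the event loop gets control between items, which is the behaviour chosen by writing "for" or "async for".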
Re: [Python-Dev] PEP 557: Data Classes
On Fri, Sep 08, 2017 at 10:37:12AM -0700, Nick Coghlan wrote: > > def __eq__(self, other): > > if other.__class__ is self.__class__: > > return (self.name, self.unit_price, self.quantity_on_hand) == > > (other.name, other.unit_price, other.quantity_on_hand) > > return NotImplemented > > My one technical question about the PEP relates to the use of an exact > type check in the comparison methods, rather than "isinstance(other, > self.__class__)". I haven't read the whole PEP in close detail, but that method stood out for me too. Only, unlike Nick, I don't think I agree with the decision. I'm also not convinced that we should be adding ordered comparisons (__lt__ __gt__ etc) by default, if these DataClasses are considered more like structs/records than tuples. The closest existing equivalent to a struct in the std lib (apart from namedtuple) is, I think, SimpleNamespace, and they are unorderable. -- Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
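To see what the exact type check in the quoted __eq__ means in practice, here is a hand-written sketch with made-up classes Item and DiscountedItem (it does not use the PEP's machinery), followed by the SimpleNamespace ordering behaviour mentioned above.

```python
from types import SimpleNamespace

class Item:
    def __init__(self, name, unit_price):
        self.name = name
        self.unit_price = unit_price

    def __eq__(self, other):
        # Exact type check, as in the method quoted from the PEP.
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price) == (other.name, other.unit_price)
        return NotImplemented

class DiscountedItem(Item):
    pass

print(Item('nut', 1.5) == Item('nut', 1.5))
print(Item('nut', 1.5) == DiscountedItem('nut', 1.5))

# SimpleNamespace supports equality but not ordering.
try:
    SimpleNamespace(a=1) < SimpleNamespace(a=2)
except TypeError as err:
    print('TypeError:', err)
```

With the exact check, an instance of a subclass never compares equal to an instance of its parent, even when every field matches; an isinstance() check would make that comparison succeed.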
Re: [Python-Dev] PEP 559 - built-in noop()
On Mon, Sep 11, 2017 at 07:39:07AM +1000, Chris Angelico wrote:
[...]
> As a language change, definitely not. But I like this idea for
> PYTHONBREAKPOINT. You set it to the name of a function, or to "pass"
> if you want nothing to be done. It's a special case that can't
> possibly conflict with normal usage.

I disagree -- it's a confusion of concepts. "pass" is a do-nothing statement, not a value, so you can't set something to pass. Expect a lot of StackOverflow questions asking why this doesn't work:

    sys.breakpoint = pass

In fact, in one sense pass is not even a statement. It has no runtime effect, it isn't compiled into any bytecode. It is a purely syntactic feature to satisfy the parser.

Of course env variables are actually strings, so we can choose "pass" to mean "no break point" if we wanted. But I think there are already two perfectly good candidates for that usage which don't mix the concepts of statements and values, namely the empty string and None:

    setenv PYTHONBREAKPOINT=""
    setenv PYTHONBREAKPOINT=None

-- Steve
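The statement-versus-value point can be demonstrated without touching any real breakpoint machinery: the assignment from this message does not even compile. The helper compiles() below is a made-up name for illustration.

```python
def compiles(source):
    """Return True if source compiles, False on SyntaxError."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError:
        return False
    return True

# A value on the right-hand side is fine; a statement keyword is not.
print(compiles('import sys\nsys.breakpoint = None\n'))
print(compiles('import sys\nsys.breakpoint = pass\n'))
```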
Re: [Python-Dev] PEP 544
On Wed, Oct 04, 2017 at 03:56:14PM -0700, VERY ANONYMOUS wrote: > i want to learn Start by learning to communicate in full sentences. You want to learn what? Core development? Python? How to program? English? This is not a mailing list for Python beginners. Try the "tutor" or "python-list" mailing lists. -- Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] iso8601 parsing
On Wed, Oct 25, 2017 at 04:32:39PM -0400, Alexander Belopolsky wrote: > On Wed, Oct 25, 2017 at 3:48 PM, Alex Walters wrote: > > Why make parsing ISO time special? > > It's not the ISO format per se that is special, but parsing of str(x). > For all numeric types, int, float, complex and even > fractions.Fraction, we have a roundtrip invariant T(str(x)) == x. > Datetime types are a special kind of numbers, but they don't follow > this established pattern. This is annoying when you deal with time > series where it is common to have text files with a mix of dates, > timestamps and numbers. You can write generic code to deal with ints > and floats, but have to special-case anything time related. Maybe I'm just being slow today, but I don't see how you can write "generic code" to convert text to int/float/complex/Fraction, but not times. The only difference is that instead of calling the type directly, you call the appropriate classmethod. What am I missing? -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
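One way to read the question above is that generic code only needs a table mapping each type to its parser. The sketch below is an assumption about what such code might look like: PARSERS and roundtrip are hypothetical names, and the datetime entry uses strptime with the default str() layout for a datetime without microseconds or timezone.

```python
from datetime import datetime
from fractions import Fraction

PARSERS = {
    int: int,
    float: float,
    complex: complex,
    Fraction: Fraction,
    # Hypothetical choice: parse the default str() format explicitly.
    datetime: lambda text: datetime.strptime(text, '%Y-%m-%d %H:%M:%S'),
}

def roundtrip(value):
    """Convert value to text and back using the parser for its type."""
    return PARSERS[type(value)](str(value))

samples = [42, 1.5, 3 + 4j, Fraction(1, 3), datetime(2017, 10, 25, 16, 32, 39)]
for sample in samples:
    print(repr(sample), roundtrip(sample) == sample)
```

The numeric entries are the types themselves, which is the T(str(x)) == x invariant from the quoted text; only the datetime entry needs a separately named parser.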
Re: [Python-Dev] \G (match last position) regex operator non-existant in python?
On Sun, Oct 29, 2017 at 12:31:01AM +0100, MRAB wrote: > Not that I'm planning on making any further additions, just bug fixes > and updates to follow the Unicode updates. I think I've crammed enough > into it already. There's only so much you can do with the regex syntax > with its handful of metacharacters and possible escape sequences... What do you think of the Perl 6 regex syntax? https://en.wikipedia.org/wiki/Perl_6_rules#Changes_from_Perl_5 -- Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 563: Postponed Evaluation of Annotations
On Wed, Nov 01, 2017 at 03:48:00PM -0700, Lukasz Langa wrote: > PEP: 563 > Title: Postponed Evaluation of Annotations > This PEP proposes changing function annotations and variable annotations > so that they are no longer evaluated at function definition time. > Instead, they are preserved in ``__annotations__`` in string form. This means that now *all* annotations, not just forward references, are no longer validated at runtime and will allow arbitrary typos and errors: def spam(n:itn): # now valid ... Up to now, it has been only forward references that were vulnerable to that sort of thing. Of course running a type checker should pick those errors up, but the evaluation of annotations ensures that they are actually valid (not necessarily correct, but at least a valid name), even if you happen to not be running a type checker. That's useful. Are we happy to live with that change? > Rationale and Goals > === > > PEP 3107 added support for arbitrary annotations on parts of a function > definition. Just like default values, annotations are evaluated at > function definition time. This creates a number of issues for the type > hinting use case: > > * forward references: when a type hint contains names that have not been > defined yet, that definition needs to be expressed as a string > literal; After all the discussion, I still don't see why this is an issue. Strings make perfectly fine forward references. What is the problem that needs solving? Is this about people not wanting to type the leading and trailing ' around forward references? > * type hints are executed at module import time, which is not > computationally free. True; but is that really a performance bottleneck? If it is, that should be stated in the PEP, along with the typical performance improvement this change should give. 
After all, if we're going to break people's code in order to improve performance, we should at least be sure that it improves performance :-) > Postponing the evaluation of annotations solves both problems. Actually it doesn't. As your PEP says later: > This PEP is meant to solve the problem of forward references in type > annotations. There are still cases outside of annotations where > forward references will require usage of string literals. Those are > listed in a later section of this document. So the primary problem this PEP is designed to solve, isn't actually solved by this PEP. (See Guido's comments, quoted later.) > Implementation > == > > In Python 4.0, function and variable annotations will no longer be > evaluated at definition time. Instead, a string form will be preserved > in the respective ``__annotations__`` dictionary. Static type checkers > will see no difference in behavior, Static checkers don't see __annotations__ at all, since that's not available at edit/compile time. Static checkers see only the source code. The checker (and the human reader!) will no longer have the useful clue that something is a forward reference: # before class C: def method(self, other:'C'): ... since the quotes around C will be redundant and almost certainly left out. And if they aren't left out, then what are we to make of the annotation? Will the quotes be stripped out, or left in? In other words, will method's __annotations__ contain 'C' or "'C'"? That will make a difference when the type hint is eval'ed. > If an annotation was already a string, this string is preserved > verbatim. That's ambiguous. See above. > Annotations can only use names present in the module scope as postponed > evaluation using local names is not reliable (with the sole exception of > class-level names resolved by ``typing.get_type_hints()``). Even if you call get_type_hints from inside the function defining the local names? def function(): A = something() def inner(x:A)->int: ... 
d = typing.get_type_hints(inner) return (d, inner) I would expect that should work. Will it? > For code which uses annotations for other purposes, a regular > ``eval(ann, globals, locals)`` call is enough to resolve the > annotation. Let's just hope nobody doing that has allowed any tainted strings to be stuffed into __annotations__. > * modules should use their own ``__dict__``. Which is better written as ``vars()`` with no argument, I believe. Or possibly ``globals()``. > If a function generates a class or a function with annotations that > have to use local variables, it can populate the given generated > object's ``__annotations__`` dictionary directly, without relying on > the compiler. I don't understand this paragraph. > The biggest controversy on the issue was Guido van Rossum's concern > that untokenizing annotation expressions back to their string form has > no precedent in the Python programming language and feels like a hacky > workaround. He said: > > One thing that comes to mind is that i
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Nov 06, 2017 at 01:07:51PM +1000, Nick Coghlan wrote: > That means our choices for 3.7 boil down to: > > * make this a language level guarantee that Python devs can reasonably rely on > * deliberately perturb dict iteration in CPython the same way the > default Go runtime does [1] I agree with this choice. My preference is for the first: having dicts be unordered has never been a positive virtue in itself, but always the cost we paid for fast O(1) access. Now that we have fast O(1) access *without* dicts being unordered, we should make it a language guarantee. Provided of course that we can be reasonably certain that other implementations can do the same. And it looks like we can. But if we wanted to still keep our options open, how about weakening the requirement that globals() and object __dicts__ be specifically the same type as builtin dict? That way if we discover a super-fast and compact dict implementation (maybe one that allows only string keys?) that is unordered, we can use it for object namespaces without affecting the builtin dict. > When we did the "insertion ordered hash map" availability review, the > main implementations we were checking on behalf of were Jython & VOC > (JVM implementations), Batavia (JavaScript implementation), and > MicroPython (C implementation). Adding IronPython (C# implementation) > to the mix gives: Shouldn't we check with Nuitka (C++) and Cython as well? I'd be surprised if this is a problem for either of them, but we should ask. > Since the round-trip behaviour that comes from guaranteed order > preservation is genuinely useful, and we're comfortable with folks > switching to more specialised containers when they need different > performance characteristics from what the builtins provide, elevating > insertion order preservation to a language level requirements makes > sense. +1 OrderedDict could then become a thin wrapper around regular dicts. 
-- Steve
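As a rough illustration of what a "thin wrapper" would still have to provide, the sketch below compares a regular dict with collections.OrderedDict. It assumes Python 3.7 or later, where regular dicts preserve insertion order; the variable names are only for illustration.

```python
import json
from collections import OrderedDict

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

# Regular dicts remember insertion order but ignore it when comparing.
plain_equal = a == b
plain_same_order = list(a) == list(b)

# OrderedDict comparison between two OrderedDicts is order-sensitive.
ordered_equal = OrderedDict(a) == OrderedDict(b)

# Round trip through JSON text: key order survives with a plain dict.
round_tripped = json.loads(json.dumps(b))
round_trip_order = list(round_tripped)

# OrderedDict also offers reordering operations that dict lacks.
od = OrderedDict(a)
od.move_to_end("x")
moved_order = list(od)
```

So even with ordered builtin dicts, OrderedDict keeps some behaviour of its own, such as order-sensitive equality and move_to_end().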
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Nov 06, 2017 at 12:27:54PM +0100, Antoine Pitrou wrote: > The ordered-ness of dicts could instead become one of those stable > CPython implementation details, such as the fact that resources are > cleaned up timely by reference counting, that people nevertheless > should not rely on if they're writing portable code. Given that (according to others) none of IronPython, Jython, Batavia, Nuitka, or even MicroPython, should have trouble implementing an insertion-order preserving dict, and that PyPy already has, why should we say it is a CPython implementation detail? -- Steve
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Nov 06, 2017 at 12:18:17PM +0200, Paul Sokolovsky wrote: > > I don't think that situation should change the decision, > > Indeed, it shouldn't. What may change it is the simple and obvious fact > that there's no need to change anything, as proven by the 25-year > history of the language. I disagree -- the history of Python shows that having dicts be unordered is a PITA for many Python programmers. Python eventually gained an ordered dict because it provides useful functionality that developers demand. Every new generation of Python programmers comes along and gets confused by why dicts mysteriously change their order from how they were entered, why doctests involving dicts break, why keyword arguments lose their order, why they have to import a module to get ordered dicts instead of having it be built-in, etc. Historically, we had things like ConfigParser reordering ini files when you write them. Having dicts be unordered is not a positive virtue, it is a limitation. Up until now, it was the price we've paid for having fast, O(1) dicts. Now we have a dict implementation which is fast, O(1) and ordered. Why pretend that we don't? This is a long-requested feature, and the cost appears to be small: by specifying this, all we do is rule out some, but not all, hypothetical future optimizations. Unordered dicts served CPython well for 20+ years, but I doubt many people will miss them. > What happens now borders on technologic surrealism - the CPython, after > many years of persuasion, switched its dict algorithm, rather > inefficient in terms of memory, to something else, less inefficient > (still quite inefficient, taking "no overhead" as the baseline). Trading off space for time is a very common practice. You said that lookups on MicroPython's dicts are O(N). How efficient is µPy when doing a lookup of a dict with ten million keys? µPy has chosen to optimize for space, rather than time. That's great. 
But I don't think you should sneer at CPython's choice to optimize for time instead. And given that µPy's dicts already fail to meet the expected O(1) dict behaviour, and the already large number of functional differences (not just performance differences) between µPy and Python: http://docs.micropython.org/en/latest/pyboard/genrst/index.html I don't think that this will make much practical difference. MicroPython users already cannot expect to run arbitrary Python code that works in other implementations: the Python community is fragmented between µPy code written for tiny machines, and Python code for machines with lots of memory. > That > algorithm randomly had another property. Now there's a seemingly > serious talk of letting that property leak into the *language spec*, It will no more be a "leak" than any other deliberate design choice. > despite the fact that there can be unlimited number of dictionary > algorithms, most of them not having that property. Sure. So what? There's an unlimited number of algorithms that don't provide the functionality that we want. There are an unlimited number of sort algorithms, but Python guarantees that we're only going to use those that are stable. The same applies to method resolution (which µPy already violates), strings, etc. > What it will lead to is further fragmentation of the community. Aren't you concerned about fragmenting the community because of the functional differences between MicroPython and the specs? Sometimes a small amount of fragmentation is unavoidable, and not necessarily a bad thing. > > P.S. If anyone does want to explore MicroPython's dict implementation, > > and see if there might be an alternate implementation strategy that > > offers both O(1) lookup and guaranteed ordering without using > > additional memory > > That would be the first programmer in the history to have a cake and > eat it too. Memory efficiency, runtime efficiency, sorted order: choose > 2 of 3. 
Given that you state that µPy dicts are O(N) and unordered, does that mean you picked only 1 out of 3? -- Steve
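The sort-stability guarantee mentioned above is an example of a behavioural promise that rules out some algorithms. A minimal sketch of what it means in practice (the sample records are made up for illustration):

```python
# Hypothetical sample data: (name, rank) pairs in arrival order.
records = [("bob", 2), ("alice", 1), ("carol", 2), ("dave", 1)]

# sorted() is guaranteed stable: records with equal keys keep their
# original relative order ("alice" before "dave", "bob" before "carol").
by_rank = sorted(records, key=lambda record: record[1])
```

Code that relies on this, for example sorting by one key and then by another, is portable precisely because stability is specified rather than left as an implementation detail.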
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Nov 06, 2017 at 11:33:10AM -0800, Barry Warsaw wrote: > If we did make the change, it’s possible we would need a way to > explicit say that order is not preserved. That seems a little weird > to me, but I suppose it could be useful. Useful for what? Given that we will hypothetically have order-preserving dicts that perform no worse than unordered dicts, I'm struggling to think of a reason (apart from performance) why somebody would intentionally use a non-ordered dict. If performance was an issue, sure, it makes sense to have a non-ordered dict for when you don't want to pay the cost of keeping insertion order. But performance seems to be a non-issue. I can see people wanting a SortedDict which automatically sorts the keys into some specified order. If I really work at it, I can imagine that there might even be a use-case for randomizing the key order (like calling random.shuffle on the keys). But if you are willing to use a dict with arbitrary order, that means that *you don't care* what order the keys are in. If you don't care, then insertion order should be no better or worse than any other implementation-defined arbitrary order. > I like the idea previously > brought up that iteration order be deliberately randomized in that > case, but we’d still need a good way to spell that. That would only be in the scenario that we decide *not* to guarantee insertion-order preserving semantics for dicts, in order to prevent users from relying on an implementation feature that isn't a language guarantee. -- Steve
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote: > I strongly opposed adding an ordered guarantee to regular dicts. If the > implementation happens to keep that, great. That's the worst of both worlds. The status quo is that unless we deliberately perturb the dictionary order, developers will come to rely on implementation order (because that's what the CPython reference implementation actually offers, regardless of what the docs say). Consequently: - people will be writing non-portable code, whether they know it or not; - CPython won't be able to change the implementation, because it will break too much code; - other implementations will be pressured to match CPython's implementation. The only difference is that on the one hand we are honest and up-front about requiring order-preserving dicts, and on the other we still require it, but pretend that we don't. And frankly, it seems rather perverse to intentionally perturb dictionary order just to keep our options open that someday there might be some algorithm which offers sufficiently better performance but doesn't preserve order. Preserving order is useful, desirable, often requested functionality, and now that we have it, it would have to be one hell of an optimization to justify dropping it again. (It is like Timsort and stability. How much faster sorting would it have taken to justify giving up sort stability? 50% faster? 100%? We wouldn't have done it for a 1% speedup.) It would be better to relax the requirement that builtin dict is used for those things that would benefit from improved performance. Is there any need for globals() to be the same mapping type as dict? Probably not. If somebody comes up with a much more efficient, non-order- preserving map ideal for globals, it would be better to change globals than dict. In my opinion. > Maybe OrderedDict can be > rewritten to use the dict implementation. 
But the evidence that all > implementations will always be fine with this restraint feels poor, I think you have a different definition of "poor" to me :-) Nick has already done a survey of PyPy (which already has insertion- order preserving dicts), Jython, VOC, and Batavia, and they don't have any problem with this. IronPython is built on C#, which has order- preserving mappings. Nuitka is built on C++, and if C++ can't implement an order-preserving mapping, there is something terribly wrong with the world. Cython (I believe) uses CPython's implementation, as does Stackless. The only well-known implementation that may have trouble with this is MicroPython, but it already changes the functionality of a lot of builtins and core language features, e.g. it uses a different method resolution order (so multiple inheritance won't work right), some builtins don't support slicing with three arguments, etc. I think the evidence is excellent that other implementations shouldn't have a problem with this, unless (like MicroPython) they are targeting machines with tiny memory resources. µPy runs on the PyBoard, which I believe has under 200K of memory. I think we can all forgive µPy if it only *approximately* matches Python semantics. -- Steve
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Nov 06, 2017 at 06:35:48PM +0200, Paul Sokolovsky wrote: > For MicroPython, it would lead to quite an overhead to make > dictionary items be in insertion order. As I mentioned, MicroPython > optimizes for very low bookkeeping memory overhead, so lookups are > effectively O(n), but orderedness will increase constant factor > significantly, perhaps 5x. Paul, it would be good if you could respond to Raymond's earlier comments where he wrote: I've just looked at the MicroPython dictionary implementation and think they won't have a problem implementing O(1) compact dicts with ordering. The likely reason for the confusion is that they already have an option for an "ordered array" dict variant that does a brute-force linear search. However, their normal hashed lookup is very similar to ours and is easily amenable to being compact and ordered. See: https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a110a23c6a232/py/map.c#L139 Raymond has also volunteered to assist with this. > Also, arguably any algorithm which would *maintain* insertion order > over mutating operations would be more complex and/or require more > memory that one which doesn't. I think it would be reasonable to say that builtin dicts only maintain insertion order for insertions, lookups, and changing the value. Any mutation which deletes keys may arbitrarily re-order the dict. If the user wants a stronger guarantee, then they should use OrderedDict. -- Steve
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Nov 06, 2017 at 10:17:23PM -0200, Joao S. O. Bueno wrote: > And also, forgot along the discussion, is the big disadvantage that > other Python implementations would have a quite > significant overhead on mandatory ordered dicts. I don't think that is correct. Nick already did a survey, and found that C# (IronPython), Java (Jython and VOC) and Javascript (Batavia) all have acceptable insertion-order preserving mappings. C++ (Nuitka) surely won't have any problem with this (if C++ cannot implement an efficient order-preserving map, there is something terribly wrong with the world). As for other languages that somebody might choose to build Python on (the Parrot VM, Haskell, D, Rust, etc) surely we shouldn't be limiting what Python does for the sake of hypothetical implementations in "underpowered" languages? I don't mean to imply that any of those examples are necessarily underpowered, but if language Foo is incapable of supporting an efficient ordered map, then language Foo is simply not good enough for a serious Python implementation. We shouldn't allow Python's evolution to be hamstrung by the requirement to support arbitrarily weak implementation languages. > One that was mentioned along the way is transpilers, with > Brython as an example - but there might be others. Since Brython transpiles to Javascript, couldn't it use the standard Map object, which preserves insertion order? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map Quote: Description A Map object iterates its elements in insertion order The ECMAScript 6 standard specifies that Map.prototype.forEach operates over the key/value pairs in insertion order: https://tc39.github.io/ecma262/#sec-map-objects -- Steve
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Tue, Nov 07, 2017 at 05:28:24PM +1000, Nick Coghlan wrote: > On 7 November 2017 at 16:21, Steven D'Aprano wrote: > > On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote: > >> Maybe OrderedDict can be > >> rewritten to use the dict implementation. But the evidence that all > >> implementations will always be fine with this restraint feels poor, > > > > I think you have a different definition of "poor" to me :-) > > While I think "poor" is understating the case, I think "excellent" > (which you use later on) is overstating it. My own characterisation > would be "at least arguably good enough". Fair enough, and thanks for elaborating. -- Steve
Re: [Python-Dev] The current dict is not an "OrderedDict"
On Tue, Nov 07, 2017 at 03:32:29PM +0100, Antoine Pitrou wrote: [...] > > "Insertion ordered until the first key removal" is the only guarantee > > that's being proposed. > > Is it? It seems to me that many arguments being made are only relevant > under the hypothesis that insertion is ordered even after the first key > removal. For example the user-friendliness argument, for I don't > think it's very user-friendly to have a guarantee that disappears > forever on the first __del__. Don't let the perfect be the enemy of the good. For many applications, keys are never removed from the dict, so this doesn't matter. If you never delete a key, then the remaining keys will never be reordered. I think that Nick's intent was not to say that after a single deletion, the ordering guarantee goes away "forever", but that a deletion may be permitted to reorder the keys, after which further additions will honour insertion order. At least, that's how I interpret him. To clarify: if we start with an empty dict, add keys A...D, delete B, then add E...H, we could expect: {A: 1} {A: 1, B: 2} {A: 1, B: 2, C: 3} {A: 1, B: 2, C: 3, D: 4} {D: 4, A: 1, C: 3} # some arbitrary reordering {D: 4, A: 1, C: 3, E: 5} {D: 4, A: 1, C: 3, E: 5, F: 6} {D: 4, A: 1, C: 3, E: 5, F: 6, G: 7} {D: 4, A: 1, C: 3, E: 5, F: 6, G: 7, H: 8} Nick, am I correct that this was your intent? -- Steve
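For comparison with the hypothetical reordering above, here is a small sketch of the same sequence of operations as it behaves under the guarantee that Python 3.7 eventually adopted, where deleting a key leaves the remaining keys in insertion order. It assumes Python 3.7 or later and is not part of the original proposal being discussed.

```python
d = {}

# Add keys A..D with values 1..4.
for value, key in enumerate("ABCD", start=1):
    d[key] = value

# Delete B, then add E..H with values 5..8.
del d["B"]
for value, key in enumerate("EFGH", start=5):
    d[key] = value

# Changing a value does not move the key.
d["A"] = 100

final_order = list(d)
```

Under that guarantee the remaining keys are not reordered by the deletion, and keys added afterwards go at the end.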
Re: [Python-Dev] The current dict is not an "OrderedDict"
On Tue, Nov 07, 2017 at 05:37:15PM +0200, Serhiy Storchaka wrote: > 07.11.17 16:56, Steven D'Aprano wrote: > >To clarify: if we start with an empty dict, add keys A...D, delete B, > >then add E...H, we could expect: [...] > Rather > > {A: 1, D: 4, C: 3} # move the last item in place of removed > {A: 1, D: 4, C: 3, E: 5} Thanks for the correction. -- Steve
Re: [Python-Dev] Comment on PEP 562 (Module __getattr__ and __dir__)
On Sun, Nov 19, 2017 at 08:24:00PM +, Mark Shannon wrote: > Hi, > > Just one comment. Could the new behaviour of attribute lookup on a > module be spelled out more explicitly please? > > > I'm guessing it is now something like: > > `module.__getattribute__` is now equivalent to: > > def __getattribute__(mod, name): > try: > return object.__getattribute__(mod, name) > except AttributeError: > try: > getter = mod.__dict__["__getattr__"] A minor point: this should(?) be written in terms of the public interface for accessing namespaces, namely: getter = vars(mod)["__getattr__"] -- Steve
Re: [Python-Dev] Comment on PEP 562 (Module __getattr__ and __dir__)
On Sun, Nov 19, 2017 at 05:34:35PM -0800, Guido van Rossum wrote: > On Sun, Nov 19, 2017 at 4:57 PM, Steven D'Aprano > wrote: > > A minor point: this should(?) be written in terms of the public > > interface for accessing namespaces, namely: > > > > getter = vars(mod)["__getattr__"] > > Should it? The PEP is not proposing anything for other namespaces. What > difference do you envision this way of specifying it would make? I don't know if it should -- that's why I included the question mark. But my idea is that __dict__ is the implementation and vars() is the interface to __dict__, and we should prefer using the interface rather than the implementation unless there's a good reason not to. (I'm not talking here about changing the actual name lookup code to go through vars(). I'm just talking about how we write the equivalent recipe.) It's not a big deal either way, __dict__ is already heavily used and vars() poorly known. Call it a matter of taste, if you like, but in my opinion the fewer times we directly reference dunders, the better. -- Steve
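A small sketch of the lookup being discussed, assuming Python 3.7 or later (where PEP 562 is implemented). The module name "demo" and the function module_getattr are hypothetical; the module is built with types.ModuleType purely so the example is self-contained.

```python
import types

mod = types.ModuleType("demo")  # hypothetical module, created in memory
mod.existing = 1


def module_getattr(name):
    """Fallback that PEP 562 calls only when normal lookup fails."""
    if name == "lazy":
        return 42
    raise AttributeError(f"module 'demo' has no attribute {name!r}")


# Attribute assignment stores the hook in the module's namespace.
mod.__getattr__ = module_getattr

# vars(mod) and mod.__dict__ are the same namespace object.
same_namespace = vars(mod) is mod.__dict__
getter = vars(mod)["__getattr__"]

found = mod.existing   # normal lookup succeeds, hook is not consulted
fallback = mod.lazy    # normal lookup fails, hook supplies the value

try:
    mod.missing
    missing_raised = False
except AttributeError:
    missing_raised = True
```

Either spelling reaches the same dictionary, so the choice between vars(mod) and mod.__dict__ in the recipe is about presentation rather than behaviour.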
Re: [Python-Dev] What's the status of PEP 505: None-aware operators?
On Thu, Nov 30, 2017 at 11:54:39PM -0500, Random832 wrote: > The OP isn't confusing anything; it's Eric who is confused. The quoted > paragraph of the PEP clearly and unambiguously claims that the sequence > is "arguments -> function -> call", meaning that something happens after > the "function" stage [i.e. a None check] cannot short-circuit the > "arguments" stage. But in fact the sequence is "function -> arguments -> > call". I'm more confused than ever. You seem to be arguing that Python functions CAN short-circuit their arguments and avoid evaluating them. Is that the case? If not, then I fail to see the difference between "arguments -> function -> call" "function -> arguments -> call" In *both cases* the arguments are fully evaluated before the function is called, and so there is nothing the function can do to delay evaluating its arguments. If this is merely about when the name "function" is looked up, then I don't see why that's relevant to the PEP. What am I missing? -- Steve
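The two orderings can be told apart by logging side effects. The sketch below is only an illustration with made-up helper names (get_func, get_arg); it records when the callee expression is evaluated, when the argument is evaluated, and when the call happens.

```python
log = []


def get_func():
    """Stands in for the expression that produces the callable."""
    log.append("function")
    return lambda x: log.append("call")


def get_arg():
    """Stands in for an argument expression with a visible side effect."""
    log.append("argument")
    return 1


# The callee expression is evaluated first, then the argument, then the call.
get_func()(get_arg())
order = list(log)

# With a non-callable callee, the argument is still evaluated before
# the failed call raises TypeError.
log.clear()
not_callable = None
try:
    not_callable(get_arg())
except TypeError:
    log.append("TypeError")
none_case = list(log)
```

In both cases the argument is evaluated before the call is attempted, which is the part that an ordinary function cannot change.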
Re: [Python-Dev] What's the status of PEP 505: None-aware operators?
On Fri, Dec 01, 2017 at 08:24:05AM -0500, Random832 wrote: > On Fri, Dec 1, 2017, at 05:31, Steven D'Aprano wrote: > > I'm more confused than ever. You seem to be arguing that Python > > functions CAN short-circuit their arguments and avoid evaluating them. > > Is that the case? > > > If this is merely about when the name "function" is looked up, then I > > don't see why that's relevant to the PEP. > > > > What am I missing? > > You're completely missing the context of the discussion, Yes I am. That's why I asked. > which was the > supposed reason that a *new* function call operator, with the proposed > syntax function?(args), that would short-circuit (based on the > 'function' being None) could not be implemented. Given that neither your post (which I replied to) nor the post you were replying to mentioned anything about function?() syntax, perhaps I might be forgiven for having no idea what you were talking about? The PEP only mentions function?() as a rejected idea, so I don't know why we're even talking about it. The PEP is deferred, with considerable opposition and luke-warm support, even the PEP author has said he's not going to push for it, and we're arguing about a pedantic point related to a part of the PEP which is rejected... :-) -- Steve
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Dec 18, 2017 at 06:11:05PM -0800, Chris Barker wrote: > Now that dicts are order-preserving, maybe we should change prettyprint: > > In [7]: d = {'one':1, 'two':2, 'three':3} > > In [8]: print(d) > {'one': 1, 'two': 2, 'three': 3} > > order preserved. > > In [9]: pprint.pprint(d) > {'one': 1, 'three': 3, 'two': 2} > > order not preserved ( sorted, I presume? ) Indeed. pprint.PrettyPrinter has separate methods for OrderedDict and regular dicts, and the method for printing dicts calls sorted() while the other does not. > With arbitrary order, it made sense to sort, so as to always give the same > "pretty" representation. But now that order is "part of" the dict itself, > it seems prettyprint should present the preserved order of the dict. I disagree. Many uses of dicts are still conceptually unordered, even if the dict now preserves insertion order. For those use-cases, insertion order is of no interest whatsoever, and sorting is still "prettier". -- Steve
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Dec 18, 2017 at 07:37:03PM -0800, Nathaniel Smith wrote: > On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw wrote: > > On Dec 18, 2017, at 21:11, Chris Barker wrote: > > > >> Will changing pprint be considered a breaking change? > > > > Yes, definitely. > > Wait, what? Why would changing pprint (so that it accurately reflects > dict's new underlying semantics!) be a breaking change? I have a script which today prints data like so: {'Aaron': 62, 'Anne': 51, 'Bob': 23, 'George': 30, 'Karen': 45, 'Sue': 17, 'Sylvester': 34} Tomorrow, it will suddenly start printing: {'Bob': 23, 'Karen': 45, 'Sue': 17, 'George': 30, 'Aaron': 62, 'Anne': 51, 'Sylvester': 34} and my users will yell at me that my script is broken because the data is now in random order. Now, maybe that's my own damn fault for using pprint instead of writing my own pretty printer... but surely the point of pprint is so I don't have to write my own? Besides, the docs say very prominently: "Dictionaries are sorted by key before the display is computed." https://docs.python.org/3/library/pprint.html so I think I can be excused having relied on that feature. -- Steve
Re: [Python-Dev] Guarantee ordered dict literals in v3.7?
On Mon, Dec 18, 2017 at 08:28:52PM -0800, Chris Barker wrote: > On Mon, Dec 18, 2017 at 7:41 PM, Steven D'Aprano > wrote: > > > > With arbitrary order, it made sense to sort, so as to always give the > > same > > > "pretty" representation. But now that order is "part of" the dict itself, > > > it seems prettyprint should present the preserved order of the dict. > > > > I disagree. Many uses of dicts are still conceptually unordered, even if > > the dict now preserves insertion order. For those use-cases, insertion > > order is of no interest whatsoever, and sorting is still "prettier". > > > > and many uses of dicts have "sorted" order as completely irrelevant, and > sorting them arbitrarily is not necessarily pretty (you can't provide a > sort key can you? -- so yes, it's arbitrary) I completely agree. We might argue that it was a mistake to sort dicts in the first place, or at least a mistake to *always* sort them without allowing the caller to provide a sort key. But what's done is done: the fact that dicts are sorted by pprint is not merely an implementation detail, but a documented behaviour. > I'm not necessarily saying we should break things, but I won't agree that > pprint sorting dicts is the "right" interface for what is actually an > order-preserving mapping. If sorting dicts was the "right" behaviour in Python 3.4, it remains the right behaviour -- at least for use-cases that don't care about insertion order. Anyone using pprint on dicts *now* doesn't care about insertion order. If they did, they would be using OrderedDict. That will change in the future, but even in the future there are lots of use-cases for dicts where insertion order might as well be random. The order in which a dict happens to be constructed may not be "pretty" or significant or even consistent from one dict to the next. (If your key/value pairs are coming in from an external source, they might not always come in the same order.) 
I'm not denying that sometimes it would be nice to see dicts in insertion order. Right now, those use-cases are handled by OrderedDict but in the future many of them will be taken over by regular dicts. So we have a conflict: - for some use-cases, insertion order is the "right" way for pprint to display the dict; - but for others, sorting by keys is the "pretty" way for pprint to display the dict; - and there's no way for pprint to know which is which just by inspecting the dict. How to break this tie? Backwards compatibility trumps all. If we want to change the default behaviour of pprint, we need to go through a deprecation period. Or add a flag sorted=True, and let the caller decide. > I would think it was only the right choice in the first place in order (get > it?) to get a consistent representation, not because sorting was a good > thing per se. *shrug* That's arguable. As you said yourself, dicts were sorted by key to give a "pretty" representation. I'm not so sure that consistency is the justification. What does that even mean? If you print the same dict twice, with no modifications, it will print the same whether you sort first or not. If you print two different dicts, who is to say that they were constructed in the same order? But the point is moot: whatever the justification, the fact that pprint sorts dicts by key is the defined behaviour, and even if it was a mistake to guarantee it, we can't just change it without a deprecation period. -- Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Revisiting old enhancement requests
What is the best practice for revisiting old enhancement requests on the tracker, if I believe that the time is right to revisit a rejected issue from many years ago? (Nearly a decade.) Should I raise a new enhancement request and link back to the old one, or re-open the original? Thanks, -- Steve