[Python-Dev] PEP 487 vs 422 (dynamic class decoration)
I recently got an inquiry from some of my users about porting to Python 3 some of my libraries that make use of the Python 2 __metaclass__ facility. While checking up on the status of PEP 422 today, I found out about its recently proposed replacement, PEP 487.

While PEP 487 is a generally fine PEP, it actually *rules out* the specific use case that I wanted PEP 422 for in the first place: dynamic addition of callbacks or decorators for use at class creation time without requiring explicit inheritance or metaclass participation. (So that e.g. method decorators can access the enclosing class at class definition time.)

As discussed prior to the creation of PEP 422, it is not possible to port certain features of my libraries to work on Python 3 without some form of that ability, and the only thing that I know of that could even *potentially* provide that ability outside of PEP 422 is monkeypatching __build_class__ (which might not even work). That is, the very thing that PEP 422 was created to avoid the need for. ;-)

One possible alteration would be to replace __init_subclass__ with some sort of __init_class__ invoked on the class that provides it, not just subclasses. That would allow the kind of dynamic decoration that PEP 422 allows. However, this approach was rather specifically ruled out in earlier consideration of PEP 422.

Another alternative would be to have the default __init_subclass__ look at a class-level __decorators__ attribute, as originally discussed for PEP 422. That would solve *my* problem, but feels too much like adding more than One Way To Do It.

So... honestly, I'm not sure where to go from here. Is there any chance that this is going to be changed, or revert to the PEP 422 approach, or... something? If so, what Python version will the "something" be in? Or is this use case just going to be a dead parrot in Python 3, period?
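For concreteness, the __build_class__ fallback alluded to above would look roughly like the following sketch: a wrapper around builtins.__build_class__ that applies a PEP 422-style __decorators__ list once the class body has run. This is purely an illustration written for this summary -- the __decorators__ protocol is hypothetical (taken from the withdrawn PEP 422 drafts), and this is not code from any of the libraries mentioned:

    import builtins

    _original_build_class = builtins.__build_class__

    def _build_class(func, name, *bases, metaclass=None, **kwds):
        # Let the normal machinery create the class first.
        if metaclass is None:
            cls = _original_build_class(func, name, *bases, **kwds)
        else:
            cls = _original_build_class(func, name, *bases,
                                        metaclass=metaclass, **kwds)
        # Hypothetical PEP 422-style hook: apply any decorators collected
        # in the class body (e.g. by method decorators that registered
        # themselves there), innermost first.
        for deco in reversed(cls.__dict__.get('__decorators__', ())):
            cls = deco(cls) or cls
        return cls

    builtins.__build_class__ = _build_class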
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Wed, Apr 1, 2015 at 10:39 PM, Nick Coghlan wrote:
> On 2 April 2015 at 07:35, PJ Eby wrote:
>> I recently got an inquiry from some of my users about porting some of
>> my libraries to Python 3 that make use of the Python 2 __metaclass__
>> facility. While checking up on the status of PEP 422 today, I found
>> out about its recently proposed replacement, PEP 487.
>>
>> While PEP 487 is a generally fine PEP, it actually *rules out* the
>> specific use case that I wanted PEP 422 for in the first place:
>> dynamic addition of callbacks or decorators for use at class creation
>> time without requiring explicit inheritance or metaclass
>> participation. (So that e.g. method decorators can access the
>> enclosing class at class definition time.)
>
> How hard is the requirement against relying on a mixin class or class
> decorator to request the defining class aware method decorator
> support? Is the main concern with the fact that failing to apply the
> right decorator/mixin at the class level becomes a potentially silent
> failure where the class aware method decorators aren't invoked
> properly?

The concern is twofold: it breaks proper information hiding/DRY, *and* it fails silently. It should not be necessary for clients of package A1 (that uses a decorator built using package B2) to mix in a metaclass or decorator from package C3 (because B2 implemented its decorators using C3), just for package A1's decorator to work properly in the *client package's class*. (And then, of course, this all silently breaks if you forget, and the breakage might happen at the A1, B2, or C3 level.)

Without a way to hook into the class creation process, there is no way to verify correctness and prevent the error from passing silently. (OTOH, if there *is* a way to hook into the creation process, the problem is solved: there's no need to mix anything in anyway, because the hook can do whatever the mixin was supposed to do.)

The only way PEP 487 could be a solution is if the default `object.__init_subclass__` supported one of the earlier __decorators__ or __autodecorate__ proposals, or if the PEP were for an `__init_class__` that operated on the defining class, instead of operating only on subclasses. (I need to hook the creation of a class that's *being defined*, not the definition of its future subclasses.)

> My preference at this point would definitely be to introduce a mixin
> class into the affected libraries and frameworks with an appropriate
> PEP 487 style __init_subclass__ that was a noop in Python 2 (which
> would rely on metaclass injection instead), but implemented the
> necessary "defining class aware" method decorator support in Python 3.

If this were suitable for the use case, I'd have done it already. DecoratorTools has had a mixin that provides a __class_init__ feature since 2007, which could be ported to Python 3 in a straightforward manner as a third-party module. (It's just a mixin that provides a metaclass; under 3.x it could probably just be a plain metaclass with no mixin.)

> The question of dynamically injecting additional base classes from the
> class body to allow the use of certain method decorators to imply
> specific class level behaviour could then be addressed as a separate
> proposal (e.g. making the case for an "__append_mixins__" attribute),
> rather than being linked directly to the question of how we go about
> defining inherited creation time behaviour without needing a
> custom metaclass.
Then maybe we should do that first, since PEP 487 doesn't do anything you can't *already* do with a mixin, all the way back to Python 2.2.

IOW, there's no need to modify the core just to have *that* feature, since if you control the base class you can already do what PEP 487 does in essentially every version of Python, ever. If that's all PEP 487 is going to do, it should just be a PyPI package on a stdlib-inclusion track, not a change to core Python. It's not actually adding back any of the dynamicness (dynamicity? hookability?) that PEP 3115 took away.
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Thu, Apr 2, 2015 at 4:46 AM, Nick Coghlan wrote:
> On 2 April 2015 at 16:38, PJ Eby wrote:
>>
>> IOW, there's no need to modify the core just to have *that* feature,
>> since if you control the base class you can already do what PEP 487
>> does in essentially every version of Python, ever. If that's all PEP
>> 487 is going to do, it should just be a PyPI package on a
>> stdlib-inclusion track, not a change to core Python. It's not
>> actually adding back any of the dynamicness (dynamicity?
>> hookability?) that PEP 3115 took away.
>
> The specific feature that PEP 487 is adding is the ability to
> customise creation of subclasses without risking the introduction of a
> metaclass conflict. That allows it to be used in situations where
> adopting any of the existing metaclass based mechanisms would require
> a potential compatibility break

But metaclass conflicts are *also* fixable in end-user code, and have been since 2.2. All you need to do is use a metaclass *function* that automatically merges the metaclasses involved, which essentially amounts to doing `class MergedMeta(base1.__class__, base2.__class__, ...)`. (Indeed, I've had a library for doing just that since 2002, one that originally ran on Python 2.2.)

On Python 3, it's even easier to use that approach, because you can just use something like `class whatever(base1, base2, metaclass=noconflict)` whenever a conflict comes up. (And because the implementation wouldn't have to deal with classic classes or __metaclass__, as my Python 2 implementation has to.)

IOW, *all* of PEP 487 is straightforward to implement in userspace as a metaclass and a function that already exist off-the-shelf in Python 2... and whose implementations would be simplified by porting them to Python 3 and dropping any extraneous features:

* http://svn.eby-sarna.com/PEAK/src/peak/util/Meta.py?view=markup
  (the `makeClass` function does what my hypothetical `noconflict` above
  does, with a slightly different API, and support for classic classes,
  __metaclass__, etc., that could all be stripped out)

* http://svn.eby-sarna.com/DecoratorTools/peak/util/decorators.py?view=markup
  (see the `classy_class` metaclass and `classy` mixin base that implement
  features similar to `__init_subclass__`, plus others that could be
  stripped out)

Basically, you can pull out those functions/classes (and whatever else they use in those modules), port 'em to Python 3, make any API changes deemed suitable, and call it a day. And the resulting code could go to a stdlib metaclass utility module after a reasonable break-in period.

> (as well as being far more
> approachable as a mechanism than the use of custom metaclasses).

Sure, nobody's arguing that it's not a desirable feature. I *implemented* that mechanism for Python 2 (eight years ago) because it's easier to use, even for those of us who are fully versed in the dark metaclass arts. ;-) Here's the documentation:

http://peak.telecommunity.com/DevCenter/DecoratorTools#meta-less-classes

So the feature doesn't even require *stdlib* adoption, let alone changes to Python core. (Heck, I wasn't even the first to implement this feature: Zope had it for Python *1.5.2*, in their ExtensionClass.) It's a totally solved problem in Python 2, although the solution is admittedly not widely known.

If the PEP 487 metaclass library, however, were to just port some bits of my code to Python 3, this could be a done deal already and available in *all* versions of Python 3, not just the next one.
> The gap I agree this approach leaves is a final
> post-namespace-execution step that supports establishing any class
> level invariants implied by decorators and other functions used in the
> class body. Python 2 allowed that to be handled with a dynamically
> generated __metaclass__ and PEP 422 through __autodecorate__, while
> PEP 487 currently has no equivalent mechanism.

Right. And it's *only* having such a mechanism available by *default* that requires a language change. Conversely, if we *are* making a language change, then adding a hook that allows method decorators to access the just-defined class provides roughly the same generality that Python 2 had in this respect.

All I want is the ability for method decorators to find out what class they were added to, at the time the class is built, rather than having to wait for an access or invocation that may never come. This could be as simple as __build_class__ or type.__call__ looking through the new class's dictionary for objects with a `__used_in_class__(cls, name)` method, e.g.:

    for k, v in dict.items():
        if hasattr(v, '__used_in_class__'):
            v.__used_in_class__(cls, k)

This doesn't do what PEP 487 or 422 do, but it's the bare minimum for what I need, and it actually allows this type of decor
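To make the sketched `__used_in_class__` protocol concrete, a method decorator participating in it might look something like the following. This is purely illustrative -- both the protocol and the `register_remote` name are hypothetical, not anything defined by PEP 422 or 487:

    class register_remote:
        """Hypothetical decorator that must know its enclosing class."""

        def __init__(self, func):
            self.func = func

        def __used_in_class__(self, cls, name):
            # Called once, as soon as the class body has finished executing.
            print("registering %s.%s for remote invocation"
                  % (cls.__name__, name))

        def __get__(self, obj, objtype=None):
            # Otherwise behave like the wrapped function (non-data descriptor).
            return self.func.__get__(obj, objtype)

A class body could then simply use @register_remote on its methods, and the loop sketched above would notify each decorated method as soon as the class object exists.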
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Thu, Apr 2, 2015 at 1:42 PM, PJ Eby wrote:
> If the PEP 487 metaclass library,
> however, were to just port some bits of my code to Python 3 this could
> be a done deal already and available in *all* versions of Python 3,
> not just the next one.

Just for the heck of it, here's an actual implementation and demo of PEP 487, that I've tested with 3.1, 3.2, and 3.4 (I didn't have a copy of 3.3 handy):

https://gist.github.com/pjeby/75ca26f8d2a7a0c68e30

The first module is just a demo that shows the features in use. The second module is the implementation. Notice that the actual *functionality* of PEP 487 is just *16 lines* in Python 3... including docstrings and an `__all__` definition. ;-)

The other 90 lines of code are only there to implement the `noconflict` feature for fixing metaclass conflicts... and quite a lot of *those* lines are comments and docstrings. ;-)

Anyway, I think this demo is a knockout argument for why PEP 487 doesn't need a language change: if you're writing an __init_subclass__ method you just include the `pep487.init_subclasses` base in your base classes, and you're done. It'll silently fail if you leave it out (but you'll notice that right away), and it *won't* fail in third-party subclasses because the *third party* didn't include it.

In contrast, PEP 422 provided a way to have both the features contemplated by 487, *and* a way to allow method-level decorators to discover the class at class creation time. If there's going to be a language change, it should include that latter feature from the outset.
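For readers who don't follow the gist link, an emulation of __init_subclass__ as an ordinary metaclass has roughly the following shape. This is a sketch written for this summary, not the gist's code, and it assumes a Python 3 without a native __init_subclass__ (i.e. pre-3.6 semantics) and no support for extra class keywords:

    class InitSubclassMeta(type):
        """Emulate the core of PEP 487's __init_subclass__ with a metaclass."""

        def __init__(cls, name, bases, ns):
            super().__init__(name, bases, ns)
            # Look the hook up starting *after* cls in the MRO, so the class
            # that defines __init_subclass__ is not passed to its own hook;
            # only its subclasses are -- which is PEP 487's behaviour.
            hook = getattr(super(cls, cls), '__init_subclass__', None)
            if hook is not None:
                hook()   # already bound to cls via the classmethod lookup

    class SubclassInit(metaclass=InitSubclassMeta):
        @classmethod
        def __init_subclass__(cls, **kwargs):
            pass         # default hook: do nothing

    # Example use:
    class Registered(SubclassInit):
        registry = []

        @classmethod
        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            Registered.registry.append(cls)

    class Widget(Registered):
        pass

    assert Registered.registry == [Widget]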
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Thu, Apr 2, 2015 at 6:24 PM, Martin Teichmann wrote:
> The whole point of PEP 487 was to reduce PEP 422 so much that
> it can be written in python and back-ported.

As I said earlier, it's a fine feature and should be in the stdlib for Python 3. (But it should have a `noconflict` feature added, and it doesn't need a language change.)

However, since my specific use case was the one PEP 422 was originally written to solve, and PEP 487 does not address that use case, it is not a suitable substitute *for PEP 422*. This is also not your fault; you didn't force Nick to withdraw it, after all. ;-)

My main concern in this thread, however, is ensuring that either the use case behind PEP 422 doesn't get dropped, or that Nick is now okay with me implementing that feature by monkeypatching __build_class__. Since he practically begged me not to do that in 2012, and IIRC *specifically created* PEP 422 to provide an alternative way for me to accomplish this *specific* use case, I wanted to see what his current take was. (That is, did he forget the history of the PEP, or does he no longer care about userspace code hooking __build_class__? Is there some other proposal that would be a viable alternative? etc.)

> Now you want to be able to write decorators whose details
> are filled in at class creation time.

Not "now"; it's been possible to do this in Python 2 for over a decade, and code that does so is in current use by other packages. The package providing this feature (DecoratorTools) was downloaded 145 times today, and 3274 times in the past month, so there is active, current use of it by other Python 2 packages. (Though I don't know how many of them depend directly or indirectly upon this particular feature.)

Currently, however, it is not possible to port this feature of DecoratorTools (or any other package that uses that feature, recursively) to Python 3, due to the removal of __metaclass__ and the lack of any suitable substitute hook.

> Your point is that you want to be able to use your decorators
> without having to ask users to also inherit a specific class.
> I personally don't think that's desirable. Many frameworks out
> there have such kind of decorators and mandatory base classes
> and that works fine.

The intended use case is for generic method decorators that have nothing to do with the base class per se, so inheriting from a specific base class is an anti-feature in this case.

> The only problem remains once you need to
> inherit more than one of those classes, as their metaclasses
> most likely clash. This is what PEP 487 fixes.

No, it addresses the issue for certain *specific* metaclass use cases. It does not solve the problem of metaclass conflict in general; for that you need something like the sample `noconflict` code I posted, which works for Python 3.1+ and doesn't require a language change.

> So my opinion is that it is not too hard a requirement to ask
> a user to inherit a specific mixin class for the sake of using
> a decorator.

If this logic were applied to PEP 487 as it currently stands, the PEP should be rejected, since its use case is even *more* easily accomplished by inheriting from a specific mixin class. (Since the feature only works on subclasses anyway!)

Further, if the claim is that metaclass conflict potential makes PEP 487 worthy of a language change, then by the same logic method decorators are just as worthy of a language change, since any mixin required to use a method decorator would be *just as susceptible* to metaclass conflicts as SubclassInit.
(Notably, the stdlib's ABCMeta is a common cause of metaclass conflicts in Python 2.6+ -- if you mix in anything that implements an ABC by subclassing it, you will get a metaclass conflict.)

Finally, I of course disagree with the conclusion that it's okay to require mixins in order for method decorators to access the containing class, since it is not a requirement in Python 2, due to the availability of the __metaclass__ hook. Further, PEP 422 was previously approved to fix this problem, and has a patch in progress, so I'm understandably upset by its sudden withdrawal and lack of a suitable replacement.

So personally, I think that PEP 422 should be un-withdrawn (or replaced with something else), and PEP 487 should be retargeted towards defining a `metaclass` module for the stdlib, including a `noconflict` implementation to address metaclass conflict issues. (Mine or someone else's, as long as it works.) PEP 487 should not be a proposal to change the language, as the provided features don't require it. (And it definitely shouldn't pre-empt a separately useful feature that *does* require a language change.)

At this point, though, I mostly just want to get some kind of closure. After three years, I'd like to know if this is a yea or nay, so I can port the thing and move on, whether it's through a standardized mechanism or ugly monkeypatching. Honestly, the only reason I'm even discussing this in the first
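The ABCMeta conflict mentioned above is easy to reproduce; a minimal illustration (not from the thread, just a demonstration of the failure mode):

    import abc

    class SomeABC(metaclass=abc.ABCMeta):
        pass

    class Implementation(SomeABC):       # looks ordinary, but its
        pass                             # metaclass is ABCMeta

    class FrameworkMeta(type):
        pass

    class FrameworkBase(metaclass=FrameworkMeta):
        pass

    try:
        class Combined(Implementation, FrameworkBase):
            pass
    except TypeError as exc:
        # "metaclass conflict: the metaclass of a derived class must be a
        #  (non-strict) subclass of the metaclasses of all its bases"
        print(exc)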
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Thu, Apr 2, 2015 at 10:29 PM, Greg Ewing wrote:
> On 04/03/2015 02:31 PM, Nick Coghlan wrote:
>>
>> If I'm understanding PJE's main concern correctly it's that this
>> approach requires explicitly testing that the decorator has been
>> applied correctly in your automated tests every time you use it, as
>> otherwise there's a risk of a silent failure when you use the
>> decorator but omit the mandatory base class that makes the decorator
>> work correctly.
>
> Could the decorator be designed to detect that situation
> somehow? E.g. the first time the decorated method is called,
> check that the required base class is present.

No, because in the most relevant use case, the method will never be called if the base class isn't present.

For more details, see also the previous discussion at
https://mail.python.org/pipermail/python-dev/2012-June/119883.html
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Thu, Apr 2, 2015 at 9:31 PM, Nick Coghlan wrote:
> On 3 April 2015 at 08:24, Martin Teichmann wrote:
> However, I'm also now wondering if it may be possible to reach out to
> the pylint authors (similar to what Brett did for the "pylint --py3k"
> flag) and ask for a way to make it easy to register "base class,
> decorator" pairs where pylint will complain if it sees a particular
> method decorator but can't determine at analysis time if the named
> base class is in the MRO for the class defining the method.

Will it *also* check the calling chain of the decorator, or any other thing that's called or invoked in the class body, to find out if somewhere, somehow, it asks for a class decoration? If not, it's not going to help with this use case.

There are many ways to solve this problem by re-adding a hook -- you and I have proposed several, in 2012 and now. There are none, however, which do not involve putting back the hookability that Python 3 took out, except by using hacks like sys.settrace() or monkeypatching __build_class__.
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Fri, Apr 3, 2015 at 8:44 AM, Martin Teichmann wrote:
> This proposal can actually be seen as an extension to the __class__
> and super() mechanism of normal methods: methods currently have the
> privilege to know which classes they are defined in, while descriptors
> don't. So we could unify all this by giving functions a __post_process__
> method which sets the __class__ in the function body. This is about the
> same as what happened when functions got a __get__ method to turn
> them into object methods.
>
> While this all is in the making, PJ could monkey-patch __build_class__
> to do the steps described above, until it gets accepted into cpython.
> So I pose the question to PJ: would such an approach solve the
> problems you have?

Universal member post-processing actually works *better* for the motivating use case than the metaclass or class-level hooks, so yes.

In practice, there is one potential hiccup, and that's that decorators which aren't aware of __post_process__ will end up masking it. But that's not an insurmountable obstacle.
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Fri, Apr 3, 2015 at 11:04 AM, Nick Coghlan wrote:
> Extending the descriptor protocol to include a per-descriptor hook that's
> called at class definition time sounds like a potentially nice way to go to
> me. While you *could* still use it to arbitrarily mutate the class object,
> it's much clearer that's not the intended purpose, so I don't see it as a
> major problem.

Just to be clear, mutating the class object was never the point for my main use case that needs the PEP 422 feature; it was for method overloads that are called remotely and need to be registered elsewhere.

For some of my other use cases, adding metadata to the class is a convenient way to do things, but classes are generally weak-referenceable, so the add-on data can be (and often is) stored in a weak-key dictionary rather than placed directly on the class.
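The weak-key registry pattern mentioned above is short enough to show inline; a generic sketch (the names are illustrative, not from any of the packages discussed):

    import weakref

    # Maps class -> metadata dict; entries disappear when the class is
    # garbage collected, so nothing is stored on the class itself.
    _class_metadata = weakref.WeakKeyDictionary()

    def annotate(cls, **metadata):
        _class_metadata.setdefault(cls, {}).update(metadata)

    def metadata_for(cls):
        return _class_metadata.get(cls, {})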
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Fri, Apr 3, 2015 at 4:21 AM, Nick Coghlan wrote:
> That means I'm now OK with monkeypatching __build_class__ being the
> only way to get dynamic hooking of the class currently being defined
> from the class body - folks that really want that behaviour can
> monkeypatch it in, while folks that think it's a bad idea don't need
> to worry about.

I'd still prefer to only do that as an emulation of an agreed-upon descriptor notification protocol, such that it's a backport of an approved PEP, so I hope we can work that out. But I guess if not, then whatever works.

I just wish you'd been okay with it in 2012, as there was more than once in the last few years where I had some downtime and thought about trying to do some porting work. :-( And in the meantime, the only alternative Python implementation I know of that's made *any* headway on Python 3 in the last few years (i.e., PyPy 3) *includes* a compatibly monkeypatchable __build_class__. It appears that the *other* obstacles to making a compatible Python 3 implementation are a lot tougher for implementers to get over than compatibility with __build_class__. ;-)

> Neither PEP 422 nor 487 are designed to eliminate metaclass conflicts
> in general, they're primarily designed to let base classes run
> arbitrary code after the namespace has been executed in a subclass
> definition *without* needing a custom metaclass.

And yet the argument was being made that the lack of a custom metaclass was a feature because it avoided conflict. I'm just trying to point out that if avoiding conflict is desirable, building *every possible metaclass feature* into the Python core isn't a scalable solution.

At this point, co-operative inheritance is a well-understood model in Python, so providing an API to automatically mix metaclasses (explicitly, at first) seems like a good step towards solving the metaclass conflict problem in general. When Guido introduced the new MRO scheme in Python 2.2, he noted that the source he'd gotten that scheme from had explained that it could be extended to automatically mixing metaclasses, but he (Guido) didn't want to do that in Python until more experience was had with the new MRO scheme in general. And I think we have enough experience with that *now* to be able to take a step forward, by providing a stdlib-blessed metaclass mixer.

It not only makes the prototype, PyPI-based version of PEP 487 more usable immediately, it will also encourage people to develop metaclasses as *mixins* rather than one-size-fits-all monoliths. For example, there's no reason that both of PEP 487's features need to live in the *same* metaclass, if you could trivially mix metaclasses at the point where you inherit from bases with different metaclasses. (And eventually, a future version of Python could do the mixing automatically, without the `noconflict` function. The theory was well-understood for other languages, after all, before Python 2.2 even came out.)

> No, you can't do it currently without risking a backwards
> incompatibility through the introduction of a custom metaclass.

Right... which is precisely why I'm suggesting the `noconflict()` metaclass factory function as a *general* solution for providing useful metaclasses, and why I think that PEP 487 should break the namespacing and subclass-init features into separate metaclasses, and add that noconflict feature. It will then become a good example for people moving forward writing metaclasses.
Basically, as long as you don't have the pointless conflict errors, you can write co-operative metaclass mixins as easily as you can write regular co-operative mixins. I was missing this point myself because I've been too steeped in Python 2's complexities: writing a usable version of `noconflict()` there is a lot more complex, and its invocation far more obscure. In Python 2, there's classic classes, class- and module-level __metaclass__, ExtensionClass, and all sorts of other headaches for automatic mixing. In Python 3, though, all that stuff goes out the window, and even my 90-line version that's almost half comments is probably still overengineered compared to what's actually needed to do the mixing.

>> Further, if the claim is that metaclass conflict potential makes PEP
>> 487 worthy of a language change, then by the same logic method
>> decorators are just as worthy of a language change, since any mixin
>> required to use a method decorator would be *just as susceptible* to
>> metaclass conflicts as SubclassInit.
>
> There wouldn't be a custom metaclass involved in the native
> implementation of PEP 487, only in the backport.

Right... and if there were a native implementation of PEP 422, that would also be the case for PEP 422. The point is that if PEP 487 can justify a *language* change to avoid needing a metaclass, then arguably PEP 422 has an even *better* justification, because its need to avoid needing a metaclass is at least as strong. Indeed, you said the same yourself as recent
Re: [Python-Dev] PEP 487 vs 422 (dynamic class decoration)
On Sat, Apr 4, 2015 at 9:33 PM, Nick Coghlan wrote:
> So actually reading https://gist.github.com/pjeby/75ca26f8d2a7a0c68e30
> properly, you're starting to convince me that a "noconflict" metaclass
> resolver would be a valuable and viable addition to the Python 3 type
> system machinery.
>
> The future possible language level enhancement would then be to make
> that automatic resolution of metaclass conflicts part of the *default*
> metaclass determination process. I realise you've been trying to
> explain that to me for a few days now, I'm just writing it out
> explicitly to make it clear I finally get it :)

I'm glad you got around to reading it. Sometimes it's really frustrating trying to get things like that across.

What's funny is that once I actually 1) wrote that version, and 2) ended up doing a version of six's with_metaclass() function so I could write 2/3 mixed code in DecoratorTools, I realized that there isn't actually any reason why I can't write a Python 2 version of noconflict. Indeed, with a slight change to eliminate ClassType from the metaclass candidate list, the Python 3 version would also work as the Python 2 version: just use it as the explicit __metaclass__, or use with_metaclass, i.e.:

    class something(base1, base2, ...):
        __metaclass__ = noconflict
        # ...

or:

    class something(with_metaclass(noconflict, base1, base2, ...)):
        # ...

And the latter works syntactically from Python 2.3 on up.

> My apologies for that - while I don't actually recall what I was
> thinking when I said it, I suspect I was all fired up that PEP 422 was
> definitely the right answer, and hence thought I'd have an official
> solution in place for you in fairly short order. I should have let you
> know explicitly when I started having doubts about it, so you could
> reassess your porting options.

Well, at least it's done now. Clearing up the issue allowed me to spend some time on porting some of the relevant libraries this weekend, where I promptly ran into challenges with several of the *other* features removed from Python 3 (like tuple arguments), but fortunately those are issues more of syntactic convenience than irreplaceable functionality. ;-)
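For readers who haven't followed the gist link, a Python 3 `noconflict` along the lines being discussed can be sketched as below. This is not the gist's code (and omits its caching and error-reporting niceties); it just shows the shape of the idea: pick or synthesize a metaclass that derives from every base's metaclass, then create the class with it:

    def noconflict(name, bases, namespace, **kwargs):
        """Metaclass-style callable that resolves metaclass conflicts."""
        # Collect the most-derived metaclasses among the bases.
        metas = [type]
        for base in bases:
            meta = type(base)
            if any(issubclass(m, meta) for m in metas):
                continue                    # already covered
            # Drop anything this metaclass already derives from.
            metas = [m for m in metas if not issubclass(meta, m)]
            metas.append(meta)
        if len(metas) == 1:
            winner = metas[0]
        else:
            # Synthesize a merged metaclass; genuinely incompatible layouts
            # will still raise TypeError here, as they should.
            winner = type('NoConflict_' + '_'.join(m.__name__ for m in metas),
                          tuple(metas), {})
        return winner(name, bases, namespace, **kwargs)

Used at the point of inheritance, exactly as in the invocation shown earlier in the thread:

    class Combined(base1, base2, metaclass=noconflict):
        pass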
Re: [Python-Dev] async/await in Python; v2
On Tue, Apr 21, 2015 at 1:26 PM, Yury Selivanov wrote:
> It is an error to pass a regular context manager without ``__aenter__``
> and ``__aexit__`` methods to ``async with``. It is a ``SyntaxError``
> to use ``async with`` outside of a coroutine.

I find this a little weird. Why not just have `with` and `for` inside a coroutine dynamically check the iterator or context manager, and either behave sync or async accordingly? Why must there be a *syntactic* difference?

Not only would this simplify the syntax, it would also allow dropping the need for `async` to be a true keyword, since functions could be defined via "def async foo():" rather than "async def foo():"

...which, incidentally, highlights one of the things that's been bothering me about all this "async foo" stuff: "async def" looks like it *defines the function* asynchronously (as with "async with" and "async for"), rather than defining an asynchronous function. ISTM it should be "def async bar():" or even "def bar() async:".

Also, even that seems suspect to me: if `await` looks for an __await__ method and simply returns the same object (synchronously) if the object doesn't have an await method, then your code sample that supposedly will fail if a function ceases to be a coroutine *will not actually fail*.

In my experience working with coroutine systems, making a system polymorphic (do something appropriate with what's given) and idempotent (don't do anything if what's wanted is already done) makes it more robust. In particular, it eliminates the issue of mixing coroutines and non-coroutines.

To sum up: I can see the use case for a new `await` distinguished from `yield`, but I don't see the need to create new syntax for everything; ISTM that adding the new asynchronous protocols and using them on demand is sufficient. Marking a function asynchronous so it can use asynchronous iteration and context management seems reasonably useful, but I don't think it's terribly important for the type of function result. Indeed, ISTM that the built-in `object` class could just implement `__await__` as a no-op returning self, and then *all* results are trivially asynchronous results and can be awaited idempotently, so that awaiting something that has already been waited for is a no-op. (Prior art: the Javascript Promise.resolve() method, which takes either a promise or a plain value and returns a promise, so that you can write code which is always-async in the presence of values that may already be known.)

Finally, if the async for and with operations have to be distinguished by syntax at the point of use (vs. just always being used in coroutines), then ISTM that they should be `with async foo:` and `for async x in bar:`, since the asynchronousness is just an aspect of how the main keyword is executed.

tl;dr: I like the overall ideas but hate the syntax and type segregation involved: declaring a function async at the top is OK to enable async with/for semantics and await expressions, but the rest seems unnecessary and bad for writing robust code. (e.g. note that requiring different syntax means a function must either duplicate code or restrict its input types more, and type changes in remote parts of the program will propagate syntax changes throughout.)
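The "awaiting an already-available value is a no-op" idea can be illustrated with a tiny sketch. This shows the suggestion being argued for here, analogous to JavaScript's Promise.resolve(); it is not anything PEP 492 actually adopted:

    class Resolved:
        """An 'already done' awaitable: awaiting it yields it back at once."""
        def __await__(self):
            return self
            yield  # unreachable; makes __await__ a generator function

    async def demo():
        value = Resolved()
        same = await value      # completes immediately, no suspension
        assert same is value

    # Driving the coroutine by hand shows it finishes without yielding:
    coro = demo()
    try:
        coro.send(None)
    except StopIteration:
        pass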
Re: [Python-Dev] Minimal async event loop and async utilities (Was: PEP 492: async/await in Python; version 4)
On Mon, May 11, 2015 at 6:05 PM, Guido van Rossum wrote:
> OTOH you may look at micropython's uasyncio -- IIRC it doesn't have Futures
> and it definitely has I/O waiting.

Here's a sketch of an *extremely* minimal main loop that can do I/O without Futures, and might be suitable as a PEP example. (Certainly, it would be hard to write a *simpler* example than this, since it doesn't even use any *classes* or require any specially named methods, works with present-day generators, and is (I think) both 2.x/3.x compatible.)

    coroutines = []     # round-robin of currently "running" coroutines

    def schedule(coroutine, val=None, err=None):
        coroutines.insert(0, (coroutine, val, err))

    def runLoop():
        while coroutines:
            (coroutine, val, err) = coroutines.pop()
            try:
                if err is not None:
                    suspend = coroutine.throw(err)
                else:
                    suspend = coroutine.send(val)
            except StopIteration:
                # coroutine is finished, so don't reschedule it
                continue
            except Exception:
                # framework-specific detail (i.e., log it, send it to an
                # error-handling coroutine, or just stop the program);
                # here, we just ignore it and stop the coroutine
                continue
            else:
                if hasattr(suspend, '__call__') and suspend(coroutine):
                    continue
                else:
                    # put it back on the round-robin list
                    schedule(coroutine)

To use it, `schedule()` one or more coroutines, then call `runLoop()`, which will run as long as there are things to do.

Each coroutine scheduled must yield *thunks*: callable objects that take a coroutine as a parameter, and return True if the coroutine should be suspended, or False if it should continue to run. If the thunk returns true, that means the thunk has taken responsibility for arranging to `schedule()` the coroutine with a value or error when it's time to send it the result of the suspension.

You might be asking, "wait, but where's the I/O?" Why, in a coroutine, of course...

    import time
    from heapq import heappush, heappop
    from select import select

    readers = {}
    writers = {}
    timers = []

    def readable(fileno):
        """yield readable(fileno) resumes when fileno is readable"""
        def suspend(coroutine):
            readers[fileno] = coroutine
            return True
        return suspend

    def writable(fileno):
        """yield writable(fileno) resumes when fileno is writable"""
        def suspend(coroutine):
            writers[fileno] = coroutine
            return True
        return suspend

    def sleepFor(seconds):
        """yield sleepFor(seconds) resumes after that much time"""
        return suspendUntil(time.time() + seconds)

    def suspendUntil(timestamp):
        """yield suspendUntil(timestamp) resumes when that time is reached"""
        def suspend(coroutine):
            heappush(timers, (timestamp, coroutine))
            return True
        return suspend

    def doIO():
        while coroutines or readers or writers or timers:

            # Resume coroutines whose scheduled time has arrived
            while timers and timers[0][0] <= time.time():
                ts, coroutine = heappop(timers)
                schedule(coroutine)

            if readers or writers:
                if coroutines:
                    # Other tasks are running; use minimal timeout
                    timeout = 0.001
                elif timers:
                    timeout = max(timers[0][0] - time.time(), 0.001)
                else:
                    timeout = None   # take as long as necessary
                r, w, e = select(readers, writers, [], timeout)
                for rr in r: schedule(readers.pop(rr))
                for ww in w: schedule(writers.pop(ww))

            yield   # allow other coroutines to run

    schedule(doIO())    # run the I/O loop as a coroutine

(This is painfully incomplete for a real framework, but it's a rough sketch of how one of peak.events' first drafts worked, circa early 2004.)

Basically, you just need a coroutine whose job is to resume coroutines whose scheduled time has arrived, or whose I/O is ready. And of course, some data structures to keep track of such things, and an API to update the data structures and suspend the coroutines.
The I/O loop exits once there are no more running tasks and nothing waiting on I/O... which will also exit the runLoop. (A bit like a miniature version of NodeJS for Python.)

And, while you need to preferably have only *one* such I/O coroutine (to prevent busy-waiting), the I/O coroutine is completely replaceable. All that's required to implement one is that the core runloop expose the count of active coroutines. (Notice that, apart from checking the length of `coroutines`, the I/O loop shown above uses only the public `schedule()` API and the exposed thunk-suspension protocol to do its thing.)

Also, note that you *can* indeed have mult
Re: [Python-Dev] PEP: Collecting information about git
On Sat, Sep 12, 2015 at 9:54 AM, Oleg Broytman wrote:
> The plan is to extend the PEP in the future collecting information
> about equivalence of Mercurial and git scenarios to help migrating
> Python development from Mercurial to git.

I couldn't find any previous discussion about this, but I figure I should mention: if the motivation here is to get away from the often-awful Bitbucket to the friendlier and more popular GitHub, then it might be useful to know that hg-git works beautifully with GitHub.

I have over a dozen open source projects on GitHub that I manage entirely using hg command lines, without having yet touched git at all. Even the forks and pull requests I've done of others' projects on GitHub worked just fine, so long as I remember to use hg bookmarks instead of hg branches. It's possible there are things you can't do with Mercurial on GitHub, but I haven't encountered one thus far.
Re: [Python-Dev] PEP 451 update
I've not really had time to review this PEP yet, but from skimming the discussion to date, the only thing I'm still worried about is whether this will break lazy import schemes that use a module subclass that hooks __getattribute__ and calls reload() in order to perform what's actually an *initial* load.

IOW, does anything in this proposal rely on a module object having *any* attributes besides __name__ set at reload() time? That is, is there an assumption that a module being reloaded has 1. been loaded, and 2. is being reloaded via the same location, __loader__, etc. as before?

At least through all 2.x, reload() just uses module.__name__ to restart the module find-and-load process, and does not assume that __loader__ is valid in advance. (Also, if this has changed in recent Python versions independent of this PEP, it's a backwards-compatibility break that should be documented somewhere.)

On Thu, Oct 24, 2013 at 2:05 AM, Eric Snow wrote:
> I've had some offline discussion with Brett and Nick about PEP 451
> which has led to some meaningful clarifications in the PEP. In the
> interest of pulling further discussions back onto this
> (archived/public) list, here's an update of what we'd discussed and
> where things are at. :)
>
> * path entry finders indicate that they found part of a possible
> namespace package by returning a spec with no loader set (but with
> submodule_search_locations set). Brett wanted some clarification on
> this.
> * The name/path signature and attributes of file-based finders in
> importlib will no longer be changing. Brett had some suggestions on
> the proposed change and it became clear that the change was
> actually pointless.
> * I've asserted that there shouldn't be much difficulty in adjusting
> pkgutil and other modules to work with ModuleSpec.
> * Brett asked for clarification on whether the "load()" example from
> the PEP would be realized implicitly by the import machinery or
> explicitly as a method on ModuleSpec. This has bearing on the ability
> of finders to return instances of ModuleSpec subclasses or even
> ModuleSpec-like objects (a la duck typing). The answer is that it will
> not be a method on ModuleSpec, so it is effectively just part of the
> general import system implementation. Finders may return any object
> that provides the attributes of ModuleSpec. I will be updating the
> PEP to make these points clear.
>
> * Nick suggested writing a draft patch for the language reference
> changes (the import page). Such a patch will be a pretty good
> indicator of the impact of PEP 451 on the import system and should
> highlight any design flaws in the API. This is on my to-do list
> (hopefully by tomorrow).
> * Nick also suggested moving all ModuleSpec methods to a separate
> class that will simply make use of a separate, existing ModuleSpec
> instance. This will help address several issues, particularly by
> relaxing the constraints on what finders can return, but also by
> avoiding the unnecessary exposure of the methods via every
> module.__spec__. I plan on going with this, but currently am trying
> out the change to see if there are any problems I've missed. Once I
> feel good about it I'll update the PEP.
>
> That about sums up our discussions. I have a couple of outstanding
> updates to the PEP to make when I get a chance, as well as putting up
> a language reference patch for review.
>
> -eric
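For reference, the lazy import scheme described at the top of this message looks roughly like the following sketch. It is a simplified illustration of the technique, not the actual code from the Importing package, and it ignores the premature-trigger and thread-safety issues a real implementation has to handle:

    import sys
    from importlib import reload          # Python 3.4+; older versions differ
    from types import ModuleType

    class LazyModule(ModuleType):
        def __getattribute__(self, attr):
            if attr not in ('__name__', '__class__', '__repr__'):
                # First real use: turn back into a plain module, then perform
                # the *initial* load by asking the import system to "reload"
                # the stub that is already sitting in sys.modules.
                object.__setattr__(self, '__class__', ModuleType)
                reload(self)
            return ModuleType.__getattribute__(self, attr)

    def lazyModule(name):
        """Return a stub for `name`, loaded for real on first attribute use."""
        if name not in sys.modules:
            sys.modules[name] = LazyModule(name)
        return sys.modules[name]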
Re: [Python-Dev] PEP 451 update
On Fri, Oct 25, 2013 at 1:15 PM, Brett Cannon wrote:
>
> On Fri, Oct 25, 2013 at 12:24 PM, PJ Eby wrote:
>> At least through all 2.x, reload() just uses module.__name__ to
>> restart the module find-and-load process, and does not assume that
>> __loader__ is valid in advance.
>
> That doesn't make much sense in a post-importlib world where import makes
> sure that __loader__ is set (which it has since Python 3.3). Otherwise you
> are asking for not just a reload but a re-find as well.

That's a feature, not a bug. A reload() after changing sys.path *should* take into account the change, not to mention any changes to meta_path, path hooks, etc. (And it's how reload() worked before importlib.)

I suppose it's not really documented all that well, but way way back in the 2.3 timeframe I asked for a tweak to PEP 302 to make sure that reload() (in the re-find sense) would work properly with PEP 302 loaders like zipimport -- some of the language still in the PEP is there specifically to support this use case. (Specifically, the bit that says loaders *must* use the existing module object in sys.modules if there's one there already, so that reload() will work. It was actually in part to ensure that reload() would work in the case of a re-find.)

It appears that since then, the PEP has been changed in a way that invalidates part of the purpose of the prior change; I guess I missed the discussion of that change last year. :-( ISTM there should've been such a discussion, since IIRC importlib wasn't supposed to change any Python semantics, and this is a non-trivial change to the semantics of reload() even in cases that aren't doing lazy imports or other such munging.

reload() used to take sys.* and __path__ changes into account, and IMO should continue to do so. If this is an intentional change in reload() semantics, other Python implementations need to know about this too!

(That being said, I'm not saying I shouldn't or couldn't have tested this in 3.3 and found out about it that way. And the existence of issue18698 suggests that nobody's relying yet on even the *fundamental* semantics of PEP 302 reload() working properly in 3.3, since that was an even bigger change that nobody spotted till a couple of months ago.)
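A concrete illustration of the "re-find" semantics being argued for here, under the pre-importlib behaviour described above (the module and path names are made up for the example):

    import sys
    from importlib import reload   # on 2.x, reload() is a builtin

    import foo                     # found via the original sys.path

    # Later, the program decides a different copy should win:
    sys.path.insert(0, '/new/location/with/other/foo')

    # Under the 2.x-style semantics, this repeats the full find-and-load
    # process, so the copy in /new/location wins; under a strict
    # "reuse module.__loader__" interpretation, it would not.
    foo = reload(foo)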
Re: [Python-Dev] PEP 451 update
On Fri, Oct 25, 2013 at 3:15 PM, Brett Cannon wrote:
> On Fri, Oct 25, 2013 at 2:10 PM, PJ Eby wrote:
>> On Fri, Oct 25, 2013 at 1:15 PM, Brett Cannon wrote:
>> > On Fri, Oct 25, 2013 at 12:24 PM, PJ Eby wrote:
>> >> At least through all 2.x, reload() just uses module.__name__ to
>> >> restart the module find-and-load process, and does not assume that
>> >> __loader__ is valid in advance.
>> >
>> > That doesn't make much sense in a post-importlib world where import
>> > makes sure that __loader__ is set (which it has since Python 3.3).
>> > Otherwise you are asking for not just a reload but a re-find as well.
>>
>> That's a feature, not a bug. A reload() after changing sys.path
>> *should* take into account the change, not to mention any changes to
>> meta_path, path hooks, etc. (And it's how reload() worked before
>> importlib.)
>
> Fair enough, but in my mind that obviously doesn't really click for what I
> view as a reload in an importlib world where secret import code no longer
> exists. When I think re-load I think "load again", not "find the module
> again and execute a load with a possibly new loader".

Sure, and the reference manual is rather vague on this point. However, I would guess that at least some web frameworks with automatic reload support are going to barf on this change in at least some edge cases. (OTOH, it's unlikely the bugs will ever be reported, because the problem will mysteriously go away once the process is restarted, probably never to occur again.)

Mostly, this just seems like an ugly wart -- Python should be dynamic by default, and that includes reloading. While the import machinery has lots of ugly caching under the hood, a user-level function like reload() should not require you to do the equivalent of saying, "no, really... I want you to *really* reload, not just pull in whatever exists where you found it last time, while ignoring whether I switched from module to package or vice versa, or just fixed my sys.path so I can load the right version of the module."

It is a really tiny thing in the overall scheme of things, because reload() is not used all that often, but it's still a thing. If this isn't treated as a bug, then the docs for reload() at least need to include a forward-supported workaround so you can say "no, really... *really* reload" in an approved fashion. (ISTM that any production code out there that currently uses reload() would want to perform the "really reload" incantation in order to avoid the edge cases, even if they haven't actually run into any of them yet.)

> And in a PEP 451 world it should be dead-simple to make this work the way
> you want in your own code even if this doesn't go the way you want::
>
>     spec = importlib.find_spec(name)
>     module.__spec__ = spec
>     importlib.reload(module)
>     # which in itself is essentially:
>     #   init_module_attrs(spec, module); spec.loader.exec_module(module)
>
> Heck, you can do this in Python 3.3 right now::
>
>     loader = importlib.find_loader(name)
>     module = sys.modules[name]
>     module.__loader__ = loader
>     importlib.reload(module)

And will that later version still work correctly in a PEP 451 world, or will you have to detect which world you live in before waving this particular dead chicken? ;-)

> Ah, okay. That is not explicit in the PEP beyond coming off as a total
> nuisance in order to support reloading by the loader, not an explicit
> finder + loader use-case.
Yeah, it actually was to ensure that you could reload a module using a different loader than the one that originally loaded it, e.g. due to a change in path hooks, etc.
Re: [Python-Dev] PEP 451 update
On Sun, Oct 27, 2013 at 1:03 AM, Nick Coghlan wrote:
> Now, regarding the signature of exec_module(): I'm back to believing
> that loaders should receive a clear indication that a reload is taking
> place. Legacy loaders have to figure that out for themselves (by
> seeing that the module already exists in sys.modules), but we can do
> better for the new API by making the exec_module signature look like:
>
>     def exec_module(self, module, previous_spec=None):
>         # module is as per the current PEP 451 text
>         # previous_spec would be set *only* in the reload() case
>         # loaders that don't care still need to accept it, but can
>         # just ignore it

Just to be clear, this means that a lazy import implementation that creates a module object without a __spec__ in the first place will look like an initial import? Or will that crash importlib because of a missing __spec__ attribute?

That is, is reload()'s contract adding a new prerequisite for the object passed to it?

(The specific use case is creating a ModuleType subclass instance for lazy importing upon attribute access. Pre-importlib, all that was needed was a working __name__ attribute on the module.)
Re: [Python-Dev] PEP 451 update
On Sun, Oct 27, 2013 at 4:59 PM, Nick Coghlan wrote:
>
> On 28 Oct 2013 02:37, "PJ Eby" wrote:
>>
>> On Sun, Oct 27, 2013 at 1:03 AM, Nick Coghlan wrote:
>> > Now, regarding the signature of exec_module(): I'm back to believing
>> > that loaders should receive a clear indication that a reload is taking
>> > place. Legacy loaders have to figure that out for themselves (by
>> > seeing that the module already exists in sys.modules), but we can do
>> > better for the new API by making the exec_module signature look like:
>> >
>> >     def exec_module(self, module, previous_spec=None):
>> >         # module is as per the current PEP 451 text
>> >         # previous_spec would be set *only* in the reload() case
>> >         # loaders that don't care still need to accept it, but can
>> >         # just ignore it
>>
>> Just to be clear, this means that a lazy import implementation that
>> creates a module object without a __spec__ in the first place will
>> look like an initial import? Or will that crash importlib because of
>> a missing __spec__ attribute?
>>
>> That is, is reload()'s contract adding a new prerequisite for the
>> object passed to it?
>>
>> (The specific use case is creating a ModuleType subclass instance for
>> lazy importing upon attribute access. Pre-importlib, all that was
>> needed was a working __name__ attribute on the module.)
>
> For custom loaders, that's part of the contract for create_module() (since
> you'll get an ordinary module otherwise),

Huh? I don't understand where custom loaders come into it. For that matter, I don't understand what "get an ordinary module object" means here, either. I'm talking about userspace code that implements lazy importing features, like the lazyModule() function in this module:

http://svn.eby-sarna.com/Importing/peak/util/imports.py?view=markup

Specifically, I'm trying to get an idea of how much that code will need to change under the PEP (and apparently under importlib in general).

> and so long as *setting* the
> special module attributes doesn't cause the module to be imported during the
> initial load operation, attribute access based lazy loading will work fine
> (and you don't even have to set __name__, since the import machinery will
> take care of that).

There's no "initial load operation", just creation of a dummy module and stuffing it into sys.modules. The way it works is that in, say, foo/__init__.py, one uses:

    bar = lazyModule('foo.bar')
    baz = lazyModule('foo.baz')

Then anybody importing 'foo.bar' or 'foo.baz' (or using "from foo import bar", etc.) ends up with the lazy module. That is, it's for lazily exposing APIs, not something used as an import hook.

> For module level lazy loading that injects a partially initialised module
> object into sys.modules rather than using a custom loader or setting a
> __spec__ attribute, yes, the exec_module invocation on reloading would
> always look like a fresh load operation (aside from the fact that the custom
> instance would already be in sys.modules from the first load operation).

Right.

> It *will* still work, though (at least, it won't break any worse than such
> code does today, since injecting a replacement into sys.modules really isn't
> reload friendly in the first place).

Wait, what? Who's injecting a replacement into sys.modules? A replacement of what? Or do you mean that loaders aren't supposed to create new modules, but use the one in sys.modules?

Honestly, I'm finding all this stuff *really* confusing, which is kind of worrying.
I mean, I gather I'm one of the handful of people who really understood how importing *used to work*, and I'm having a lot of trouble wrapping my brain around the new world.

(Granted, I think that may be because I understand how a lot of old corner cases work, but what's bugging me is that I no longer understand how those old corners work under the new regime, nor do I feel I understand what the *new* corners will be. This may also just be communication problems, and the fact that it's been months since I really walked through importlib line by line, and have never really walked through it (or PEP 451) quite as thoroughly as I have import.c.

I also seem to be having trouble grokking why the motivating use cases for PEP 451 can't be solved by just providing people with good base classes to use for writing loaders -- i.e., I don't get why the core protocol has to change to address the use case of writing loaders more easily. The new protocol seems way more complex than PEP 302, and ISTM the complexity could just be pushed off to the loader side of the protocol without creating more interdependency between importlib and the loaders.)
Re: [Python-Dev] PEP 451 update
On Thu, Oct 31, 2013 at 5:52 AM, Nick Coghlan wrote:
>
> On 31 Oct 2013 18:52, "Eric Snow" wrote:
>>
>> On Wed, Oct 30, 2013 at 10:24 PM, Nick Coghlan wrote:
>> > There's also the option of implementing the constraint directly in the
>> > finder, which *does* have the necessary info (with the change to pass
>> > the previous spec to find_spec).
>>
>> Yeah, I thought of that. I just prefer the more explicit
>> supports_reload(). That said...
>>
>> > I still think it makes more sense to leave this out for the moment -
>> > it's not at all clear we need the extra method, and adding it later
>> > would be a straightforward protocol update.
>>
>> ...I agree that makes the most sense for now. :)
>>
>> BTW, thanks for pushing these issues. I think the API has gotten
>> pretty solid. I just need to make sure the PEP covers the cases and
>> conclusions we're discussing.
>
> Thanks are also due to PJE for making me realise we were handwaving too much
> when it came to the expected reload semantics :)

You're welcome. ;-)

But speaking of handwaving, I also want to be sure that loader developers know that "reloading" is only really "reloading" if there's a previous existing spec, or the module type is...

Hm. Actually, I think I now know how to state what's bugging me every time I see this "supports_reload()" or "reload=True" or other reloading flags in this process. I think that references to reloading should be replaced with references to what's *actually* at issue, because "reloading" itself is vague and carries too many assumptions for a loader author to understand or get right. (Look how hard it is for *us*!)

That is, I think we should clarify what use cases there are for knowing whether a "reload" is happening, and address those use cases explicitly rather than lumping them under a general heading.

For example, if the reason a loader cares about reloading is because it's a C extension using a custom module type, and the existing module isn't of the right type, then we should just spell out how to handle it (e.g. raise an exception).

If the reason a loader cares about reloading is because of some sort of caching or reuse, then we should just spell out how to handle that, too.

Lumping these cases together under a "reloading" flag or a check for "reloading" support is a nasty code smell, because it requires a loader developer to have the *same* vaguely-defined idea of "reloading" as the PEP authors. ;-)

I also suspect that, if properly spelled out, those use cases are going to boil down to:

1. Throwing errors if you have an existing module object you can't load into, and
2. Passing in a previous spec object, if available

In other words, loaders should not really have any responsibility for or concept of "reloading" -- they always load into a module object (that they may or may not have created), and they may get given a spec from a previous load. They should deal only in "module reuse" and "spec reuse". While a typical reload() might involve both reuses, there are cases where one sort of reuse could occur independently, and not all loaders care about both (or even either) condition.

At any rate, it means a loader author doesn't have to figure out how to handle "reloading"; all they have to figure out is whether they can load into a particular module object, and whether they can do something useful with a spec that was previously used to load a module with the same name -- a spec that may or may not refer to a similar previous loader.
These are rather more well-defined endeavors than trying to determine in the abstract whether one "supports reload". ;-) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Multiple inheritance from builtin (C) types [still] supported in Python3?
On Mon, Apr 28, 2014 at 7:26 PM, Paul Sokolovsky wrote: > Well, sure I did, as I mentioned, but as that's first time I see that > code (that specific piece is in typeobject.c:extra_ivars()), it would > take quite some time be sure I understand all aspects of it. Thanks for > confirming that it's governed essentially by CPython implementation > details and not some language-level semantics like metaclasses (I > mentioned them because error message in Python2 did so, though Python3 > doesn't refer to metaclasses). > > An example would really help me to get a feel of the issue, but I > assume lack of them means that there's no well-known idiom where such > inheritance is used, and that's good enough on its own. I also tried to > figure how it's important to support such multi-base cases, so the code > I write didn't require complete rewrite if it hits one day, but > everything seems to turn out to be pretty extensible. > >From memory of the last time I dealt with this, the rules were that you could mix two classes only if their __slots__ differed from their common __base__ by *at most* __dict__ and/or __weakref__. The dict and weakref slots are special, in that the type structure contains their offsets, which makes them relocatable in subclasses. But any other __slots__ aren't relocatable in subclasses, because the type structure doesn't directly keep track of the offsets. (The slot descriptors do.) But I don't think there's anything in principle that requires this, it's just the implementation. You could in theory relocate __slots__ defined from Python code in order to make a merged subclass. It's just that the effective "__slots__" of C code can't be moved, because C code is expecting to find them at specific offsets. Therefore, if two types define their own struct fields, they can't be inherited from unless one is a subtype of the other. In the C code (again if I recall correctly), this is done using the __base__ attribute of the type, which indicates what struct layout the object will use. A type can have a larger struct than its base type, adding its own fields after the base type's struct fields. (The dict and weakref fields are added -- if they are added -- *after* the base struct fields. If your __base__ already has them, those offsets within the existing layout are used, which is why them being in another base class's __slots__ isn't a problem.) When you create a new type, CPython looks at your bases to find a suitable __base__. If two of your bases inherit from each other, the ancestor can be ignored, keeping the more-derived one as a candidate __base__. If a base adds only __dict__ and/or __weakref__ (or neither) to its __base__, then its __base__ is a candidate (not recursively, though). If at the end there is more than one base left standing, then it's an error, since you have bases with incompatible layouts. That is not a precise description of the algorithm, but that's the gist of how it works. __base__ is a slot on the type object and is tracked at the C level in order to sort out layouts like this. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
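A small illustration of the layout rule described above (not from the original message; this is just the observable CPython behavior, checkable in any 3.x interpreter):

    class A:
        __slots__ = ('x',)      # adds a struct-level field; A has its own "solid" layout

    class B:
        __slots__ = ('y',)      # a different field, so B's layout conflicts with A's

    class HasDict:
        pass                    # adds only __dict__ (and __weakref__), which are relocatable

    class OK(A, HasDict):       # fine: HasDict's layout reduces to object's
        pass

    class Boom(A, B):           # TypeError: multiple bases have instance lay-out conflict
        pass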
Re: [Python-Dev] Fwd: PEP 426 is now the draft spec for distribution metadata 2.0
On Wed, Feb 20, 2013 at 5:30 AM, M.-A. Lemburg wrote: > The wording in the PEP alienates the egg format by defining > an incompatible new standard for the location of the metadata > file: This isn't a problem, because there's not really a use case at the moment for eggs to include a PEP 426-format metadata file, and if they ever do, it ought to be called METADATA, so that pkg_resources will know to read it if there are no egg-format dependencies listed. Setuptools also doesn't care what format PKG-INFO is in, as it only ever reads the "Version:" field, and that only in the case of in-development source packages. > It's easy to upgrade distribute and distutils to write > metadata 1.2 format, simply by changing the version in the > PKG-INFO files. As soon as distutils does it, setuptools will do it, because setuptools delegates metadata writing to distutils. So there's no "alienation" here. What will need to happen at some point is for pkg_resources to implement support for PEP 426-style version requirements, which I haven't tried to fully wrap my head around as yet. I'm hoping that there are simple textual substitutions (e.g. regexes) that can be done to convert them to pkg_resources requirements. If need be, I'll swipe whatever's needed from distlib. ;-) In the meantime, eggs aren't actually going anywhere, since wheels aren't actually trying to replace all of their use cases. And since the new metadata and binary formats don't actually add much new functionality over what eggs already do, eggs wouldn't lose much by not being able to use the same metadata, anyway. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Distutils] PEP 426 is now the draft spec for distribution metadata 2.0
On Tue, Feb 19, 2013 at 6:42 AM, Nick Coghlan wrote: > Nothing in the PEP is particularly original - almost all of it is > either stolen from other build and packaging systems, or is designed > to provide a *discoverable* alternative to existing > setuptools/distribute/pip practices (specifically, the new extension > mechanism means things like entry points can be defined in the > standard metadata files without python-dev needing to get involved). FWIW, I actually think this is a step in the wrong direction relative to eggs; the ability to add new metadata files is a useful feature for application frameworks. For example, the EggTranslations project uses egg metadata to implement resource localization plugins. It lets you have an application with plugins that either contain their own translations, contain multiple translations for another plugin, a single language translation for an assortment of plugins, etc. These kinds of runtime-discovery use cases haven't seen much attention in the metadata standard discussion. On one level, that's fine, because it makes sense that distribution-provided metadata should be parseable by all tools, and that at build/download/install time the performance and ease-of-use favor a single file approach. That does not mean, however, that the presence of other files is bad or should be deprecated. IMO, metadata that see significant runtime use independent of the core metadata *should* appear in their own files, even if redundant. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Planning on removing cache invalidation for file finders
On Sun, Mar 3, 2013 at 12:31 PM, Brett Cannon wrote: > But how about this as a compromise over introducing write_module(): > invalidate_caches() can take a path for something to specifically > invalidate. The path can then be passed to the invalidate_caches() on > sys.meta_path. In the case of PathFinder it would take that path, try to > find the directory in sys.path_importer_cache, and then invalidate the most > specific finder for that path (if there is one that has any directory prefix > match). > > Lots of little details to specify (e.g. absolute path forced anywhere in > case a relative path is passed in by sys.path is all absolute paths? How do > we know something is a file if it has not been written yet?), but this would > prevent importlib from subsuming file writing specifically for source files > and minimize performance overhead of invalidating all caches for a single > file. ISTR that when we were first discussing caching, I'd proposed a TTL-based workaround for the timestamp granularity problem, and it was mooted because somebody already proposed and implemented a similar idea. But my approach -- or at least the one I have in mind now -- would provide an "eventual consistency" guarantee, while still allowing fast startup times. However I think the experience with this heuristic so far shows that the real problem isn't that the heuristic doesn't work for the normal case; it works fine for that. Instead, what happens is that *it doesn't work when you generate modules*. And *that* problem can be fixed without even invalidating the caches: it can be fixed by doing some extra work when writing a module - e.g. by making sure the directory mtime changes again after the module is written. For example, create the module under a temporary name, verify that the directory mtime is different than it was before, then keep renaming it to different temporary names until the mtime changes again, then rename it to the final name. (This would be very fast on some platforms, much slower on others, but the OS itself would tell you when it had worked.) A utility routine to "write_module()" or "write_package()" would be easier to find than advice that says to invalidate the cache under thus-and-such conditions, as it would show up in searches for writing modules dynamically or creating modules dynamically, where you could only search for info about the cache if you knew the cache existed. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
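A rough sketch of the write_module() idea described above -- the name and details are illustrative, not an existing API, and it assumes POSIX-ish rename semantics:

    import os

    def write_module(path, source):
        dirname = os.path.dirname(path) or '.'
        before = os.stat(dirname).st_mtime
        tmp = os.path.join(dirname, '.writing-' + os.path.basename(path))
        with open(tmp, 'w') as f:
            f.write(source)
        # Keep renaming within the directory until the OS reports a new
        # directory mtime; once it changes, a timestamp-based directory cache
        # can no longer mistake the listing for being unchanged.
        n = 0
        while os.stat(dirname).st_mtime == before:
            newtmp = os.path.join(dirname, '.writing-%d-%s' % (n, os.path.basename(path)))
            os.rename(tmp, newtmp)
            tmp = newtmp
            n += 1
        os.rename(tmp, path)

As noted above, on filesystems with fine-grained timestamps the loop body never runs; on coarse-grained ones it spins until the clock ticks, and the OS itself says when it has worked.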
Re: [Python-Dev] wsgi validator with asynchronous handlers/servers
On Sat, Mar 23, 2013 at 3:05 PM, Luca Sbardella wrote: > The pseudocode above does yields bytes before start_response, but they are > not *body* bytes, they are empty bytes so that the asynchronous wsgi server > releases the eventloop and call back at the next eventloop iteration. > > I'm I misinterpreting the pep, or the wsgi validator should be fixed > accordingly? The validator is correct for the spec. You *must* call start_response() before yielding any strings at all. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
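For reference, a minimal conforming shape for such an app (illustrative only, not the code under discussion) looks like this -- the empty-bytestring trick itself is fine, it just has to come after the start_response() call:

    def application(environ, start_response):
        # start_response() must be called before yielding *any* string,
        # even an empty one.
        start_response('200 OK', [('Content-Type', 'text/plain')])
        yield b''                # gives an async server a chance to run its event loop
        yield b'Hello, world!'   # the actual body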
Re: [Python-Dev] Safely importing zip files with C extensions
On Wed, Mar 27, 2013 at 5:19 PM, Bradley M. Froehle wrote: > I implemented just such a path hook zipimporter plus the magic required > for C extensions --- as a challenge to myself to learn more about the Python > import mechanisms. > > See https://github.com/bfroehle/pydzipimport. FYI, there appears to be a bug for Windows with packages: you're using '/__init__' in a couple places that should actually be os.sep+'__init__'. This does seem like a good way to address the issue, for those rare situations where this would be a good idea. The zipped .egg approach was originally intended for user-managed plugin directories for certain types of extensible platforms, where "download a file and stick it in the plugins directory" is a low-effort way to install plugins, without having to build a lot of specialized install capability. As Jim has pointed out, though, this doesn't generalize well to a full-blown packaging system. Technically, you can blame Bob Ippolito for this, since he's the one who talked me into using eggs to install Python libraries in general, not just as a plugin packaging mechanism. ;-) That being said, *unpacked* egg, er, wheels, are still a great way to meet all of the "different apps needing different versions" use cases (without needing one venv per app), and nowadays the existence of automated installer tools means that using one to install a plugin for a low-tech plugin system is not a big deal, as long as that tool supports the simple unpacked wheel scenario. So I wholeheartedly support some kind of mount/unmount or "require"-type mechanism for finding plugins. pkg_resources even has an API for handling simple dynamic plugin dependency resolution scenarios: http://peak.telecommunity.com/DevCenter/PkgResources#locating-plugins It'd be a good idea if distlib provides a similar feature, or at least the APIs upon which apps or frameworks can implement such features. (Historical note for those who weren't around back then: easy_install wasn't even an *idea* until well after eggs were created; the original idea was just that people would build plugins and libraries as eggs and manually drop them in directories, where a plugin support library would discover them and add them to sys.path as needed. And Bob and I also considered a sort of "update site" mechanism ala Eclipse, with a library to let apps fetch plugins. But as soon as eggs existed and PyPI allowed uploads, it was kind of an obvious follow-up to make an installation tool as a kind of "technology demonstration" which promptly became a monster. The full story with all its twists and turns can also be found here: http://mail.python.org/pipermail/python-dev/2006-April/064145.html ) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Can I introspect/reflect to get arguments exec()?
On Tue, Mar 26, 2013 at 11:00 PM, Rocky Bernstein wrote: > Okay. But is the string is still somewhere in the CPython VM stack? (The > result of LOAD_CONST 4 above). Is there a way to pick it up from there? Maybe using C you could peek into the frame's value stack, but that's not exposed to any Python API I know of. But that still doesn't help you, because the value will be removed from the stack before exec() is actually called, which means if you go looking for it in code called from the exec (e.g. the call event itself), you aren't going to see the data. > At the point that we are stopped the exec action hasn't taken place yet. That doesn't help if you're using line-level tracing events. At the beginning of the line, the data's not on the call stack yet, and by the time you enter the frame of the code being exec'd, it'll be off the stack again. Basically, there is no way to do what you're asking for, short of replacing the built-in exec function with your own version. And it still won't help you with stepping through the source of functions that are *compiled* using an exec() or compile(), or any other way of ending up with dynamically-generated code you want to debug. (Unless you use something like DecoratorTools to generate it, that is -- DecoratorTools has some facilities for caching dynamically-generated code so that it works properly with debuggers. But that has to be done by the code doing the generation, not the debugger. If the code generator uses DecoratorTools' caching support, then any debugger that uses the linecache module will Just Work. It might be nice if the stdlib had something like this, but you could also potentially fake it by replacing the builtin eval, exec, compile, etc. functions w/versions that cache the source.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
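A minimal sketch of the linecache-based caching trick mentioned above (this is not DecoratorTools code, just the general idea; the cache entry format is the stdlib's (size, mtime, lines, fullname) tuple, and mtime=None tells checkcache() never to discard the entry):

    import linecache

    def cache_source(filename, source):
        linecache.cache[filename] = (len(source), None,
                                     source.splitlines(True), filename)

    source = "def generated(x):\n    return x * 2\n"
    filename = "<generated-1>"              # phony but unique filename
    cache_source(filename, source)
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    # Debuggers and tracebacks that go through linecache can now display the source.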
Re: [Python-Dev] Can I introspect/reflect to get arguments exec()?
On Thu, Mar 28, 2013 at 6:43 AM, Rocky Bernstein wrote: > Of course the debugger uses sys.settrace too, so the evil-ness of that is > definitely not a concern. But possibly I need to make sure that since the > DecoratorTools and the debugger both hook into trace hooks they play nice > together and fire in the right order. DecoratorTools' trace hooking is unrelated to its linecache functionality. All you need from it is the cache_source() function; you can pretty much ignore everything else for your purposes. You'll just need to give it a phony filename to work with, and the associated string. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] relative import circular problem
On Thu, Apr 4, 2013 at 11:17 AM, Guido van Rossum wrote: > I don't really see what we could change to avoid breaking code in any > particular case Actually, the problem has *nothing* to do with circularity per se; it's that "import a.b" and "from a import b" behave differently in terms of how they obtain the module 'a.b'... And "from a import b" will *always* fail if 'a.b' is part of a cycle with the current module, whereas "import a.b" will *always* succeed. The workaround is to tell people to always use "import a.b" in the case of circular imports; it's practically a FAQ, at least to me. ;-) The problem with "from import" is that it always tries to getattr(a,'b'), even if 'a.b' is in sys.modules. In contrast, a plain import will simply fetch a.b from sys.modules first. In the case of a normal import, this is no problem, because a.b is set to sys.modules['a.b'] at the end of the module loading process. But if the import is circular, then the module is in sys.modules['a.b'], but *not* yet bound to a.b. So the "from import" fails. So, this is actually an implementation quirk that could be fixed in a (somewhat) straightforward manner: by making "from a import b" succeed if 'a.b' is in sys.modules, the same way "import a.b" does. It would require a little bit of discussion to hash out the exact way to do it, but it could be done. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
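A minimal layout showing the difference, starting from "import a.c" (illustrative; behavior as of the Python versions under discussion, before any fallback is added):

    # a/__init__.py  -- empty

    # a/c.py
    import a.b               # kicks off the cycle
    VALUE = 42

    # a/b.py  (imported while a.c is still only partly initialized)
    import a.c               # OK: only sys.modules entries are needed, and the name
                             #   bound here is just 'a'; a.c is looked up later, at use time
    # from a import c        # would fail here: sys.modules['a.c'] exists, but the
                             #   attribute 'c' has not been set on package 'a' yet
    def use():
        return a.c.VALUE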
Re: [Python-Dev] relative import circular problem
On Thu, Apr 4, 2013 at 4:42 PM, Guido van Rossum wrote: > I do think it would be fine if "from a import b" returned the > attribute 'b' of module 'a' if it exists, and otherwise look for > module 'a.b' in sys.modules. Technically, it already does that -- but inside of __import__, not in the IMPORT_FROM opcode. But then *after* doing that check-and-fallback, __import__ doesn't assign a.b, because it assumes the recursive import it called has already done this... which means that when __import__ returns, the IMPORT_FROM opcode tries and fails to do the getattr. This could be fixed in one of two ways. Either: 1. Change importlib._bootstrap._handle_fromlist() to set a.b if it successfully imports 'a.b' (inside its duplicate handling for what IMPORT_FROM does), or 2. Change the IMPORT_FROM opcode to handle the fallback itself While the latter involves a bit of C coding, it has fewer potential side-effects on the import system as a whole, and simply ensures that if "import" would succeed, then so would "from...import" targeting the same module. (There might be other fixes I haven't thought of, but really, changing IMPORT_FROM to fallback to a sys.modules check is probably by far the least-invasive way to handle it.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
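In pure-Python terms, the proposed fallback amounts to something like this (a sketch of the semantics only, not the actual ceval or importlib code):

    import sys

    def import_from(module, name):
        # what IMPORT_FROM effectively does today:
        try:
            return getattr(module, name)
        except AttributeError:
            # the proposed fallback: accept a submodule that has been created
            # but not yet bound as an attribute of its package
            try:
                return sys.modules['{}.{}'.format(module.__name__, name)]
            except KeyError:
                raise ImportError('cannot import name {!r}'.format(name))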
Re: [Python-Dev] class name spaces inside an outer function
On Sat, Apr 27, 2013 at 2:27 PM, Ethan Furman wrote: > I filed bug http://bugs.python.org/issue17853 last night. > > If somebody could point me in the right direction (mainly which files to > look in), I'd be happy to attempt a patch. Wow. I had no idea Python actually did this (overriding class-local references); I'd have expected your code to work. I was even more surprised to find that the same thing happens all the way back to Python 2.3. Guess I'm not nearly the wizard of abusing scope rules that I thought I was. ;-) About the only workaround I can see is to put "Season = Season" at the top of a class that uses this inside a function definition, or else to define a special name 'enum' instead and hope that nobody ever tries to define an enumeration inside a function with a local variable named 'enum'. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] enum discussion: can someone please summarize open issues?
On Sun, Apr 28, 2013 at 7:37 PM, Steven D'Aprano wrote: > I have also suggested that that the enum package provide a decorator > which can be used to explicitly flag values to *not* be turned into > enum values. See here: > > http://mail.python.org/pipermail/python-dev/2013-April/125641.html In that example, 'food = property(lambda self: "skip")' would work in a pinch. (Granted, it wouldn't be a *class* attribute, but you can make a class attribute by assigning it after class creation is completed.) And if you want to make your enum instances callable, ISTM the right (or at least the One Obvious) way to do it is to add a __call__ method to the class. > Even if the Enum class doesn't support this feature, I ask that it be > written in such a way that a subclass could add it (i.e. please expose > a public method for deciding what to exclude). Since you can exclude anything by it having a __get__ method, or include it by making it *not* have a __get__ method, I'm not sure what use case you're actually looking for. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
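For illustration, the descriptor-based exclusion mentioned above looks like this with the enum API as it ended up (a small sketch, not code from the thread):

    from enum import Enum

    class Fruit(Enum):
        apple = 1
        tomato = 2
        # A descriptor, so the machinery leaves it alone rather than
        # turning it into a member:
        food_group = property(lambda self: "vegetable (culinarily)"
                              if self is Fruit.tomato else "fruit")

    Fruit.tomato.food_group    # 'vegetable (culinarily)'
    list(Fruit)                # [<Fruit.apple: 1>, <Fruit.tomato: 2>] -- no food_group member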
Re: [Python-Dev] Fighting the theoretical randomness of "is" on immutables
On Mon, May 6, 2013 at 4:46 AM, Armin Rigo wrote: > This is clearly a language design issue though. I can't really think > of a use case that would break if we relax the requirement, but I > might be wrong. It seems to me that at most some modules like pickle > which use id()-keyed dictionaries will fail to find some > otherwise-identical objects, but would still work (even if tuples are > "relaxed" in this way, you can't have cycles with only tuples). I don't know if I've precisely understood the change you're proposing, but I do know that in PEAK-Rules I use id() as an approximation for "is" in order to build indexes of various "parameter is some_object" conditions, for various "some_objects" and a given parameter. The rule engine takes id(parameter) at call time and then looks it up to obtain a subset of applicable rules. IIUC, this would require that either "x is y" equates to "id(x)==id(y)", or else that there be some way to determine in advance all the possible id(y)s that are now or would ever be "is x", so they can be placed in the index. Otherwise, any use of an "is" condition would require a linear search of the possibilities, as you could not rule out the possibility that a given x was "is" to *some* y already in the index. Of course, rules using "is" tend to be few and far between, outside of some special cases, and their use with simple integers and strings would be downright silly. And on top of that, I'm not even sure whether the "a <= b" notation you used was meant to signify "a implies b" or "b implies a". ;-) But since you mentioned id()-keyed dictionaries and this is another use of them that I know of, I figured I should at least throw it out there for information's sake, regardless of which side of the issue it lands on. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
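A toy version of the id()-keyed index being described (not PEAK-Rules code; it just shows why the lookup depends on "x is y" implying a stable, predictable id()):

    class IsIndex:
        def __init__(self):
            self._rules = {}     # id(target) -> list of rules
            self._keep = []      # keep targets alive so their ids stay valid

        def add(self, target, rule):
            self._keep.append(target)
            self._rules.setdefault(id(target), []).append(rule)

        def lookup(self, arg):
            # Fast path: one dict lookup instead of a linear scan of every
            # "parameter is some_object" condition in the rule set.
            return self._rules.get(id(arg), [])

If "x is y" could be true for an object whose id() was never placed in the index, lookup() would have to fall back to a linear search of the possibilities.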
Re: [Python-Dev] PEP 443 - Single-dispatch generic functions
On Thu, May 23, 2013 at 11:11 AM, Paul Moore wrote: > Is the debate between 1 and 2, or 1 and 3? Is it even possible to implement > 3 without having 2 different names for "register"? Yes. You could do it as either:

    @func.register
    def doit(foo: int):
        ...

by checking for the first argument to register() being a function, or:

    @func.register()
    def doit(foo: int):
        ...

by using a default None first argument. In either case, you would then raise a TypeError if there wasn't an annotation. As to the ability to do multiple types registration, you could support it only in type annotations, e.g.:

    @func.register
    def doit(foo: [int, float]):
        ...

without it being confused with being multiple dispatch. One other thing about the register API that's currently unspecified in the PEP: what does it return, exactly? I generally lean towards returning the undecorated function, so that if you say:

    @func.register
    def do_int(foo: int):
        ...

You still have the option of calling it explicitly. OTOH, some may prefer to treat it like an overload and call it 'func' every time, in which case register should return the generic function. Some guidance as to what should be the official One Obvious Way would be helpful here. (Personally, I usually name my methods explicitly because in debugging it's a fast clue as to which piece of code I should be looking at.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
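A rough sketch of a register() supporting both spellings plus annotation-only registration -- this is not the PEP's reference implementation, and `registry` here is just a stand-in for the generic function's dispatch table:

    import inspect

    registry = {}   # stand-in for the generic function's dispatch table

    def register(arg):
        if isinstance(arg, type):
            # classic form: @func.register(int)
            def decorator(impl):
                registry[arg] = impl
                return impl                   # return the undecorated function
            return decorator
        # bare form: @func.register -- take the type(s) from the annotation
        impl = arg
        params = list(inspect.signature(impl).parameters.values())
        if not params or params[0].annotation is inspect.Parameter.empty:
            raise TypeError("%s needs a type annotation on its first argument"
                            % impl.__name__)
        annotation = params[0].annotation
        types = annotation if isinstance(annotation, list) else [annotation]
        for typ in types:                     # allows 'foo: [int, float]'
            registry[typ] = impl
        return impl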
Re: [Python-Dev] PEP 443 - Single-dispatch generic functions
On Thu, May 23, 2013 at 2:59 PM, PJ Eby wrote: > I generally lean towards returning the undecorated function, so that if you > say: > > @func.register > def do_int(foo: int): > ... Oops, forgot to mention: one other advantage to returning the undecorated function is that you can do this:

    @func.register(int)
    @func.register(float)
    def do_num(foo):
        ...

Which neatly solves the multiple registration problem, even without argument annotations. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 443 - Single-dispatch generic functions
On Thu, May 23, 2013 at 6:58 PM, Ben Hoyt wrote: > It seems no one has provided > decent use-case examples (apart from contrived ones) Um, copy.copy(), pprint.pprint(), a bunch of functions in pkgutil which are actually *based on this implementation already* and have been since Python 2.5... I don't see how any of those are contrived examples. If we'd had this in already, all the registration-based functions for copying, pickling, etc. would likely have been implemented this way, and the motivating example for the PEP is the coming refactoring of pprint.pprint. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 443 - Single-dispatch generic functions
On Thu, May 23, 2013 at 11:57 PM, Nick Coghlan wrote: > We should be able to use it to help deal with the "every growing > importer API" problem, too. I know that's technically what pkgutil > already uses it for, but elevating this from "pkgutil implementation > detail" to "official stdlib functionality" should make it easier to > document properly :) Oh, that reminds me. pprint() is actually an instance of a general pattern that single dispatch GF's are good for: "visitor pattern" algorithms. There's a pretty good write-up on the general issues with doing visitor pattern stuff in Python, and how single-dispatch GF's can solve that problem, here: http://peak.telecommunity.com/DevCenter/VisitorRevisited The code samples use a somewhat different API from the PEP, but it's pretty close. The main issues solved are eliminating monkeypatching and fixing the inheritance problems that occur when you use 'visit_foo' methods. One of the samples actually comes from the old 'compiler' package in the stdlib... which tells you how long ago I did the write-up. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
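For a concrete flavor of the visitor-pattern point, here is a small sketch using the PEP 443 API as it ended up in functools (illustrative only; the write-up linked above uses a slightly different API):

    import ast
    from functools import singledispatch

    @singledispatch
    def visit(node):
        # default implementation: just recurse into the child nodes
        for child in ast.iter_child_nodes(node):
            visit(child)

    @visit.register(ast.FunctionDef)
    def visit_functiondef(node):
        print("function:", node.name)
        for child in ast.iter_child_nodes(node):
            visit(child)

    @visit.register(ast.Name)
    def visit_name(node):
        print("name:", node.id)

    visit(ast.parse("def f(x):\n    return x + y\n"))

No monkeypatching, no visit_* name matching, and new node handlers can be registered from any module.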
Re: [Python-Dev] PEP 443 - Single-dispatch generic functions (including ABC support)
On Sat, May 25, 2013 at 8:08 AM, Łukasz Langa wrote: > The most important > change in this version is that I introduced ABC support and completed > a reference implementation. Excellent! A couple of thoughts on the implementation... While the dispatch() method allows you to look up what implementation would be *selected* for a target type, it does not let you figure out whether a particular method has been *registered* for a type. That is, if I have a class MyInt that subclasses int, I can't use dispatch() to check whether a MyInt implementation has been registered, because I might get back an implementation registered for int or object. ISTM there should be some way to get at the raw registration info, perhaps by exposing a dictproxy for the registry. Second, it should be possible to memoize dispatch() using a weak key dictionary that is cleared if new ABC implementations have been registered or when a call to register() is made. The way to detect ABC registrations is via the ABCMeta._abc_invalidation_counter attribute: if its value is different than the previous value saved with the cache, the cache must be cleared, and the new value stored. (Unfortunately, this is a private attribute at the moment; it might be a good idea to make it public, however, because it's needed for any sort of type dispatching mechanism, not just this one particular generic function implementation.) Anyway, doing the memoizing in the wrapper function should bring the overall performance very close to a hand-written type dispatch. Code might look something like:

    # imported inside closure so that functools module
    # doesn't force import of these other modules:
    from weakref import ref, WeakKeyDictionary
    from abc import ABCMeta

    cache = WeakKeyDictionary()
    valid_as_of = ABCMeta._abc_invalidation_counter

    def wrapper(*args, **kw):
        nonlocal valid_as_of
        if valid_as_of != ABCMeta._abc_invalidation_counter:
            cache.clear()
            valid_as_of = ABCMeta._abc_invalidation_counter
        cls = args[0].__class__
        try:
            impl = cache.data[ref(cls)]
        except KeyError:
            impl = cache[cls] = dispatch(cls)
        return impl(*args, **kw)

    def register(typ, func=None):
        ...
        cache.clear()
        ...

This would basically eliminate doing any extra (Python) function calls in the common case, and might actually be faster than my current simplegeneric implementation on PyPI (which doesn't even do ABCs at the moment). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 443 - Single-dispatch generic functions (including ABC support)
On Sat, May 25, 2013 at 10:59 AM, Nick Coghlan wrote: > Given the global nature of the cache invalidation, it may be better as > a module level abc.get_cache_token() function. Well, since the only reason to ever use it is to improve performance, it'd be better to expose it as an attribute than as a function. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] __subclasses__() return order
On Sat, May 25, 2013 at 9:18 AM, Antoine Pitrou wrote: > In http://bugs.python.org/issue17936, I proposed making tp_subclasses > (the internal container implementing object.__subclasses__) a dict. > This would make the return order of __subclasses__ completely > undefined, while it is right now slightly predictable. I have never seen > __subclasses__ actually used in production code, so I'm wondering > whether someone might be affected by such a change. FWIW, when I've used __subclasses__, I've never depended on it having a stable or predictable order. (I find it somewhat difficult to imagine *why* one would do that, but of course that doesn't mean nobody has done it.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 443 - Single-dispatch generic functions (including ABC support)
On Sat, May 25, 2013 at 4:16 PM, Łukasz Langa wrote: > So, the latest document is live: > http://www.python.org/dev/peps/pep-0443/ > > The code is here: > http://hg.python.org/features/pep-443/file/tip/Lib/functools.py#l363 > > The documentation here: > http://hg.python.org/features/pep-443/file/tip/Doc/library/functools.rst#l189 Code and tests look great! Nitpick on the docs and PEP, though: generic functions are not composed of functions sharing the same name; it would probably be more correct to say they're composed of functions that perform the same operations on different types. (I think the "names" language might be left over from discussion of *overloaded* functions in PEP 3124 et al; in any case we're actually recommending people *not* use the same names now, so it's confusing.) We should probably also standardize on the term used for the registered functions. The standard terminology is "method", but that would be confusing in Python, where methods usually have a self argument. The PEP uses the term "implementation", and I think that actually makes a lot of sense: a generic function is composed of functions that implement the same operation for different types. So I suggest changing this:

    """ Transforms a function into a single-dispatch generic function. A **generic function** is composed of multiple functions sharing the same name. Which form should be used during a call is determined by the dispatch algorithm. When the implementation is chosen based on the type of a single argument, this is known as **single dispatch**. Adding an overload to a generic function is achieved by using the :func:`register` attribute of the generic function. The :func:`register` attribute is a decorator, taking a type parameter and decorating a function implementing the overload for that type."""

to:

    """ Transforms a function into a single-dispatch generic function. A **generic function** is composed of multiple functions implementing the same operation for different types. Which implementation should be used during a call is determined by the dispatch algorithm. When the implementation is chosen based on the type of a single argument, this is known as **single dispatch**. Adding an implementation to a generic function is achieved by using the :func:`register` attribute of the generic function. The :func:`register` attribute is a decorator, taking a type parameter and decorating a function implementing the operation for that type."""

And replacing "overload" with "implementation" in the remainder of the docs and code. Last, but not least, there should be a stacking example somewhere in the doc, as in the PEP, and perhaps the suggestion to name individual implementations differently from each other and the main function -- perhaps as an adjunct to documenting that register() always returns its argument unchanged. (Currently, it doesn't mention what register()'s return value is.) (It may also be useful to note somewhere that, due to caching, changing the base classes of an existing class may not change what implementation is selected the next time the generic function is invoked with an argument of that type or a subclass thereof.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 443 - Single-dispatch generic functions (including ABC support)
On Tue, May 28, 2013 at 3:41 PM, Russell E. Owen wrote: > Is it true that this cannot be used for instance and class methods? It > dispatches based on the first argument, which is "self" for instance > methods, whereas the second argument would almost certainly be the > argument one would want to use for conditional dispatch. You can use a staticmethod and then delegate to it, of course. But it probably wouldn't be too difficult to allow specifying which argument to dispatch on, e.g.:

    @singledispatch.on('someArg')
    def my_method(self, someArg, ...):
        ...

The code would look something like this:

    def singledispatch(func, argPosn=0):
        ...
        # existing code here...
        ...
        def wrapper(*args, **kw):
            return dispatch(args[argPosn].__class__)(*args, **kw)  # instead of args[0]

    def _dispatch_on(argname):
        def decorate(func):
            argPosn = ...  # code to find argument position of argname for func
            return singledispatch(func, argPosn)
        return decorate

    singledispatch.on = _dispatch_on

So, it's just a few lines added, but of course additional doc, tests, etc. would have to be added as well. (It also might be a good idea for there to be some error checking in wrapper() to raise an appropriate TypeError if len(args) <= argPosn.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] doctest and pickle
On Fri, Jun 7, 2013 at 1:54 PM, Mark Janssen wrote: > On Fri, Jun 7, 2013 at 10:50 AM, Mark Janssen > wrote: >>> >>> from pickle import dumps, loads >>> >>> Fruit.tomato is loads(dumps(Fruit.tomato)) >>> True >> >> Why are you using is here instead of ==? You're making a circular >> loop using "is" > > I should add that when you're serializing with pickle and then > reloading, the objects should be seen as "essentially equivalent". > This means that they are either byte-by-byte equivalent (not sure > actually if Python actually guarantees this), or every element would > still compare equal and that is what matters. For global objects such as functions and classes -- and singletons such as None, Ellipsis, True, and False -- pickling and unpickling is actually supposed to retain the "is" relationship as well. I don't know if enums *actually* preserve this invariant, but my default expectation of the One Obvious Way would be that enums, being uniquely-named objects that know their name and container, should be considered global objects in the same fashion as classes and functions, *and* that as singletons, they'd also be treated in the same way as None, Ellipsis, etc. That is, there are two independent precedents for objects like that preserving "is" upon pickling and unpickling. (As another precedent, my own SymbolType library (available on PyPI) preserves the "is"-ness of its named symbol objects upon pickling and unpickling.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
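For reference, the existing identity-preserving cases are easy to check (nothing enum-specific here, just standard pickle behavior):

    import pickle

    # Singletons and "global" objects are pickled by reference, so identity
    # survives the round trip:
    assert pickle.loads(pickle.dumps(None)) is None
    assert pickle.loads(pickle.dumps(True)) is True
    assert pickle.loads(pickle.dumps(ValueError)) is ValueError   # a class
    assert pickle.loads(pickle.dumps(len)) is len                 # a function

The suggestion above is simply that enum members should land in the same bucket.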
Re: [Python-Dev] [Python-checkins] cpython: Add reference implementation for PEP 443
On Fri, Jun 7, 2013 at 10:27 AM, Thomas Wouters wrote: > This isn't a new bug, but it's exposed by always importing weakref and > atexit during interpreter startup. I'm wondering if that's really necessary > :) Importing it during startup isn't necessary per se; imports needed only by the generic function implementation can and should be imported late, rather than at the time functools is imported. However, if pkgutil was/is migrated to using this implementation of generics, then it's likely going to end up imported during startup anyway, because at least the -m startup path involves pkgutil. In short, the overall answer right now is, "maybe", and the answer later is "rather likely". ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Python-checkins] cpython: Add reference implementation for PEP 443
On Fri, Jun 7, 2013 at 5:16 PM, Łukasz Langa wrote: > On 7 cze 2013, at 22:50, PJ Eby wrote: > >> On Fri, Jun 7, 2013 at 10:27 AM, Thomas Wouters wrote: >>> This isn't a new bug, but it's exposed by always importing weakref and >>> atexit during interpreter startup. I'm wondering if that's really necessary >>> :) >> >> In short, the overall answer right now is, "maybe", and the answer >> later is "rather likely". ;-) > > I would rather say that it's "rather certain". > > functools is necessary for setup.py to work while bootstrapping, whereas > pkgutil is used in runpy.py which is always imported in Modules/main.c. > > So we're left with having to fix atexit to support subinterpreters. I wonder > how difficult that will be. If the problem really has to do with interpreter startup, then there actually is a workaround possible, at the cost of slightly hairier code. If dispatch() looked in the registry *first* and avoided the cache in that case, and lazily created the cache (including the weakref import), then interpreter startup would not trigger an import of weakref in the default case. (Of course, depending on whether site/sitecustomize results in the use of importer subclasses and such, this might not help. It might be necessary to take the even more complex tack of avoiding the use of a cache entirely until an ABC is registered, and walking mro's.) Anyway, it remains to be seen whether these workarounds are easier or more difficult than fixing the atexit problem. ;-) Hmm... actually, there are a couple other ways around this. singledispatch doesn't use finalize(), so it doesn't really need atexit. It doesn't even do much with WeakKeyDictionary, so it could actually just "from _weakref import ref", and inline the relevant operations. Or, WeakKeyDictionary could be pulled out into a separate module, where singledispatch could pull it from without importing finalize. Or, weakref.finalize could be fixed so that the atexit import and register() are deferred until actual use. (Of all of these, that last one actually sounds like the least invasive workaround, with fewest lines of code likely to be changed.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] eval and triple quoted strings
On Fri, Jun 14, 2013 at 2:11 PM, Ron Adam wrote: > > > On 06/14/2013 10:36 AM, Guido van Rossum wrote: >> >> Not a bug. The same is done for file input -- CRLF is changed to LF before >> tokenizing. > > > > Should this be the same? > > > python3 -c 'print(bytes("""\r\n""", "utf8"))' > b'\r\n' > > eval('print(bytes("""\r\n""", "utf8"))') > b'\n' No, but: eval(r'print(bytes("""\r\n""", "utf8"))') should be. (And is.) What I believe you and Walter are missing is that the \r\n in the eval strings are converted early if you don't make the enclosing string raw. So what you're eval-ing is not what you think you are eval-ing, hence the confusion. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Classes with ordered namespaces
On Thu, Jun 27, 2013 at 4:48 AM, Nick Coghlan wrote: > I'd be tempted to kill PEP 422 as not worth the hassle if we did this. I assume you mean the "namespace" keyword part of PEP 422, since PEP 422's class initialization feature is entirely orthogonal to definition order or namespace customization. (Indeed, I cannot recall a single instance of class initialization in my code that actually *cared* about definition order. Certainly I haven't had any situations where a pre-existing definition order would've eliminated the need for a class-level initialization.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Pre-PEP: Redesigning extension modules
On Fri, Aug 23, 2013 at 4:50 AM, Stefan Behnel wrote: > Reloading and Sub-Interpreters > == > > To "reload" an extension module, the module create function is executed > again and returns a new module type. This type is then instantiated as by > the original module loader and replaces the previous entry in sys.modules. > Once the last references to the previous module and its type are gone, both > will be subject to normal garbage collection. I haven't had a chance to address this on the import-sig discussion yet about ModuleSpec, but I would like to just mention that one property of the existing module system that I'm not sure either this proposal or the ModuleSpec proposal preserves is that it's possible to implement lazy importing of modules using standard reload() semantics. My "Importing" package offers lazy imports by creating module objects in sys.modules that are a subtype of ModuleType, and use a __getattribute__ hook so that trying to use them fires off a reload() of the module. Because the dummy module doesn't have __file__ or anything else initialized, the import system searches for the module and then loads it, reusing the existing module object, even though it's actually only executing the module code for the first time. That the existing object be reused is important, because once the dummy is in sys.modules, it can also be imported by other modules, so references to it can abound everywhere, and we wish only for it to be loaded lazily, without needing to trace down and replace all instances of it. This also preserves other invariants of the module system. Anyway, the reason I was asking why reloading is being handled as a special case in the ModuleSpec proposal -- and the reason I'm curious about certain provisions of this proposal -- is that making the assumption you can only reload something with the same spec/location/etc. it was originally loaded with, and/or that if you are reloading a module then you previously had a chance to do things to it, doesn't jibe with the way things work currently. That is to say, in the pure PEP 302 world, there is no special status for "reload" that is different from "load" -- the *only* thing that's different is that there is already a module object to use, and there is *no guarantee that it's a module object that was initialized by the loader now being invoked*. AFAICT both this proposal and the ModuleSpec one are making an invalid assumption per PEP 302, and aren't explicitly proposing to change the status quo: they just assume things that aren't actually assured by the prior specs or implementations. So, for example, this extension module proposal needs to cover what happens if an extension module is reloaded and the module object is not of the type or instance it's expecting. Must it do its own checking? Error handling? Will some other portion of the import system be expected to handle it? For that matter, what happens (in either proposal) if you reload() a module which only has a __name__, and no other attributes? I haven't tested with importlib, but with earlier Pythons this results in a standard module search being done by reload(). But the ModuleSpec proposal and this one seem to assume that a reload()-ed module must already be associated with a loader, location, and/or spec. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
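A toy sketch of the lazy-import trick being described (the real code is in the "Importing" package and is considerably more careful; this only shows why "reload into an existing, nearly-bare module object" semantics matter):

    import importlib
    import sys
    from types import ModuleType

    class LazyModule(ModuleType):
        def __getattribute__(self, name):
            d = ModuleType.__getattribute__(self, '__dict__')
            # Before the first real load there's essentially nothing here but
            # __name__; the first interesting attribute access triggers a
            # reload() that searches for and executes the real module code
            # *into this same object*.
            if '__file__' not in d and not name.startswith('__'):
                importlib.reload(self)
            return ModuleType.__getattribute__(self, name)

    def lazy_import(name):
        if name not in sys.modules:
            sys.modules[name] = LazyModule(name)
        return sys.modules[name]

If reload() required a pre-existing spec/loader, or refused to execute into a module object it didn't create, this pattern would stop working.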
Re: [Python-Dev] sys.intern should work on bytes
On Fri, Sep 20, 2013 at 9:54 AM, Jesus Cea wrote: > Why str/bytes doesn't support weakrefs, beside memory use? The typical use case for weakrefs is to break reference cycles, but str and bytes can't *be* part of a reference cycle, so outside of interning-like use cases, there's no need for weakref support there. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
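The current behavior, for reference (a plain subclass opts in automatically because it grows __dict__ and __weakref__ slots):

    import weakref

    class MyStr(str):
        pass

    weakref.ref(MyStr("spam"))     # works
    try:
        weakref.ref("spam")        # the built-in itself does not support it
    except TypeError as exc:
        print(exc)                 # cannot create weak reference to 'str' object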
Re: [Python-Dev] cpython: Rename contextlib.ignored() to contextlib.ignore().
On Sun, Oct 13, 2013 at 10:05 AM, Antoine Pitrou wrote: > And for the record, it's not *my* objection; several other core > developers have said -1 too: Ezio, Serhiy, Giampaolo, etc. FWIW, I'm -1 also; the thread quickly convinced me that this is a horrible idea, at least with the current name. The feature itself I consider +0, maybe +0.5 if a good but short name can be found. I kind of like "abort_on()" as an accurate description of what it actually does, but it most certainly does not *ignore* exceptions, and it's going to create problems as soon as anybody adds more than one statement to the block, and then reads their code back without *really* thinking about it. Not to mention how it's going to bite people who copy and modify code snippets containing it. On Sun, Oct 13, 2013 at 11:11 AM, Nick Coghlan wrote: > It's just as broken as the try/except equivalent. I consider that a > feature, not a bug. (Note: the following rant is about the *name*, not the context manager itself.) Misleadingness and lack of readability is not a feature, it's a bug. For example, even though I've been coding in Python since 1997, and even participated in the original design process for "with", I *still* misread the "with ignore:" block as ignoring the exceptions until I *really* thought about it. Wait, no, I misread it *entirely*, until somebody *else* pointed it out. ;-) And this is *despite* knowing on a gut level that *actually* ignoring all the errors in a block *isn't possible in Python*. I would not give most people much chance of noticing they made this mistake, and even less chance of finding the problem afterwards. This is like the infamous Stroop test, where you have a word like "brown" only it's printed in blue ink and you have to say "blue" to get the answer right. If you've never taken a Stroop test, by the way, it's *really* hard. It almost literally makes your brain *hurt* to disregard the text and say the ink color instead, because your brain automatically reads the word before you can stop it, so you are straining to stop yourself from saying it so you can then try to *think* what color you're supposed to say, and then your brain reads the word *again*, and... well, it's really quite unpleasant is what it is. Anyway, this feature, with its current name, is just the same: you have to override your instinctive response to understand what it *really* does, in any but the one-liner case. And you have to do it *every time you read it in code*. Only, because it'll mostly be used in the one-line case, you'll get used to it being correct, until one day you make a change without thinking, and create a bug that lies dormant for an extended period. Plus, as soon as people see it being used, they'll think, "oh cool", and use it in their code, not even knowing or thinking that it does something they don't want, because they will never read the docs in the first place. (As Guido says, people learn languages by example.) So call it "catching". Call it "catch_and_exit_on". Even "passing" or "skipping" would be better. And maybe "abort_on" or "abort_without_raising" would be better still, as they describe what will *really* happen. But calling it "ignore" isn't "fits your brain", it's "abuses your brain in a cruelly misleading manner". ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] cpython: Rename contextlib.ignored() to contextlib.ignore().
On Sun, Oct 13, 2013 at 1:58 PM, Alexander Belopolsky wrote: > People who write code using contextlib > are expected to know People who *read* that code while learning Python cannot be expected to know that it is not really possible to ignore errors in Python. If this feature is used under any name that implies such, it will within a few years become a FAQ and well-known wart, not to mention a meme that "contextlib.ignore() is buggy, it only works if the error is thrown from a single operation performed in C". I say this latter phrasing because now that I've had time to think about it, it is not at all merely a question of whether you wrap a single line or single operation. Quick, is this a safe use, or not: with ignore(OSError): delete_widget_files(spam) It sort of depends on the implementation of delete_widget_files, doesn't it? In contrast: with abort_on(OSError): delete_widget_files(spam) it's immediately clear that the error isn't going to be ignored; the operation will be aborted. Very different concept. > that it is not a good idea to keep resources > multiple unrelated statements within the > with block will raise a mental red flag. How will someone know this when they are reading code they found on the internet? It's one thing to have an operation whose name implies, "you need to do more research to understand this". But it's an entirely different (and dangerous) thing to have an operation whose name implies you already know everything you need to know, no need to think or study further... especially if what you know is actually wrong! > It is also easy for > lint-like tools to warn about abuse of ignore(). Since it's not sufficient to require a single operation, how will a lint-like tool check this? For example: with ignore(AnError, OtherError): ping.pongy(foo, bar.spam(), fizzy()) Is this valid code, or not? If you can't tell, how will a non-human lint tool tell? > Let's not try to improve readability of bad code Actually, I want the good code to be "readable" in the sense of understanding what the operation does, so that people copying it don't end up with a serious misunderstanding of how the context manager actually works. There is no way that naive users aren't going to read it as ignoring errors, and use it with something like: with ignore(OSError): for f in myfiles: os.unlink(f) But this version is obviously *wrong*: with abort_on(OSError): for f in myfiles: os.unlink(f) Upon looking at this code, you will quickly realize that you don't intend to abort the loop, only the unlink, and will therefore rewrite it to put the loop on the outside. So, I am only trying to "improve readability of bad code" in the sense of making it *obvious* that the code is in fact bad. ;-) (To put it another way, "ignore()" improves the readability of bad code in the above example, because it makes the bad code look like it's good.) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] cpython: Rename contextlib.ignored() to contextlib.ignore().
On Tue, Oct 15, 2013 at 11:52 AM, R. David Murray wrote: > I think 'trap' would be much clearer. +1. Short and sweet, and just ambiguous enough that you don't leap to the conclusion that the error is ignored. I agree that "suppress" is basically a synonym for "ignore"; trap at least *implies* some kind of control flow change, which is what's needed to prevent misconceptions. Personally, I would rate "catch" higher than "trap" because it further implies that it is catching a thrown exception, but would compromise to "trap" if that'll end the thread sooner. ;-) > What about making the context > manager provide the trapped exception, in a fashion similar to > what assertRaises does? Sadly, that won't work, since context managers provide a value *before* the block is executed, not after. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] cpython: Rename contextlib.ignored() to contextlib.ignore().
On Tue, Oct 15, 2013 at 8:57 AM, Nick Coghlan wrote: > So, having been convinced that "ignore" was the wrong choice of name, > reviewing the docs made it clear to me what the name *should* be. >From the point of view of code *outside* a block, the error is indeed suppressed. But, as one of those examples actually points out, what's happening from the POV *inside* the block is that the exception is "trapped". So using "suppress" creates an ambiguity: are we suppressing these errors *inside* the block, or *outside* the block? The way it actually works is errors are suppressed from the code *surrounding* the block, but the word can equally be interpreted as suppressing errors *inside* the block, in exactly the same way that "ignore" can be misread. So, if we're going with words that have precedent in the doc, the term "trap", as used here: > "If an exception is trapped merely in order to log it or to perform > some action (rather than to suppress it entirely), the generator must > reraise that exception." is the only one used to describe the POV from inside the block, where the error is... well, being trapped. ;-) It is a more apt description of what actually happens, even if it's only usable for the specific use case where an exception is trapped in order to suppress it. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status of the built-in virtualenv functionality in 3.3
On Thu, Oct 6, 2011 at 12:02 PM, Barry Warsaw wrote: > Well, I have to be honest, I've *always* thought "nest" would be a good > choice > for a feature like this, but years ago (IIRC) PJE wanted to reserve that > term > for something else, which I'm not sure ever happened. > Actually, it was pretty much for this exact purpose -- i.e. it was the idea of a virtual environment. Ian just implemented it first, with some different ideas about configuration and activation. Since this is basically the replacement for that, I don't have any objection to using the term here. (In my vision, "nest" was also the name of a package management tool for creating such nests and manipulating their contents, though.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Packaging and binary distributions for Python 3.3
On Sun, Oct 9, 2011 at 3:15 AM, Éric Araujo wrote: > After all, if setuptools and then pkg_resources were turned > down for inclusion in Python 2.5, it’s not now that we have packaging that we’ll change our mind and just bless eggs. Actually, that's not what happened. I withdrew the approved-by-Guido, announced-at-PyCon, and already-in-progress implementation, both because of the lack of package management features, and because of support concerns raised by Fredrik Lundh. (At that time, the EggFormats doc didn't exist, and there were not as many people familiar with the design or code as there are now.) For the full statement, see: http://mail.python.org/pipermail/python-dev/2006-April/064145.html (The withdrawal is after a lot of background on the history of setuptools and what it was designed for.) In any case, it definitely wasn't the case that eggs or setuptools were rejected for 2.5; they were withdrawn for reasons that didn't have anything to do with the format itself. (And, ironically enough, AFAIK the new packaging module uses code that's actually based on the bits of setuptools Fredrik was worried about supporting... but at least there now are more people providing that support.) What we can do however > is to see what bdist_egg does and define a new bdist command inspired by > it, but without zipping, pkg_resource calls, etc. > Why? If you just want a dumb bdist format, there's already bdist_dumb. Conversely, if you want a smarter format, why reinvent wheels? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Packaging and binary distributions for Python 3.3
On Sun, Oct 9, 2011 at 4:14 PM, Paul Moore wrote: > As regards the format, bdist_dumb is about the right level - but > having just checked it has some problems (which if I recall, have been > known for some time, and are why bdist_dumb doesn't get used). > Specifically, bdist_dumb puts the location of site-packages ON THE > BUILD SYSTEM into the archive, making it useless for direct unzipping > on a target system which has Python installed somewhere else. > I don't know about the case for packaging/distutils2, but I know that in original distutils, you can work around this by making bdist_dumb call the install commands with different arguments. That is, it's a relatively shallow flaw in bdist_dumb. bdist_wininst, for example, is basically a zipped bdist_dumb with altered install arguments and an .exe header tacked on the front. (Along with a little extra data crammed in between the two.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Bring new features to older python versions
On Tue, Oct 11, 2011 at 12:14 PM, Toshio Kuratomi wrote:
> This may not be the preferred manner to write decorators but it's fairly
> straightforward and easy to remember compared to, say, porting away from
> the with statement.

You can emulate 'with' using decorators, actually, if you don't mind a nested function. Some code from my Contextual library (minus the tests):

def call_with(ctxmgr):
    """Emulate the PEP 343 "with" statement for Python versions <2.5

    The following examples do the same thing at runtime::

        Python 2.5+          Python 2.4
        -----------          -------------
        with x as y:         @call_with(x)
            print y          def do_it(y):
                                 print y

    ``call_with(foo)`` returns a decorator that immediately invokes the
    function it decorates, passing in the same value that would be bound
    by the ``as`` clause of the ``with`` statement.  Thus, by decorating
    a nested function, you can get most of the benefits of "with", at a
    cost of being slightly slower and perhaps a bit more obscure than the
    2.5 syntax.

    Note: because of the way decorators work, the return value (if any)
    of the ``do_it()`` function above will be bound to the name
    ``do_it``.  So, this example prints "42"::

        @call_with(x)
        def do_it(y):
            return 42

        print do_it

    This is rather ugly, so you may prefer to do it this way instead,
    which more explicitly calls the function and gets back a value::

        def do_it(y):
            return 42

        print with_(x, do_it)
    """
    return with_.__get__(ctxmgr, type(ctxmgr))


def with_(ctx, func):
    """Perform PEP 343 "with" logic for Python versions <2.5

    The following examples do the same thing at runtime::

        Python 2.5+          Python 2.3/2.4
        -----------          --------------
        with x as y:         z = with_(x, f)
            z = f(y)

    This function is used to implement the ``call_with()`` decorator,
    but can also be used directly.  It's faster and more compact in the
    case where the function ``f`` already exists.
    """
    inp = ctx.__enter__()
    try:
        retval = func(inp)
    except:
        if not ctx.__exit__(*sys.exc_info()):
            raise
    else:
        ctx.__exit__(None, None, None)
        return retval

This version doesn't handle the multi-context syntax of newer pythons, but could probably be extended readily enough.

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
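For instance, a rough sketch of such an extension (untested, and not part of Contextual itself) could simply nest the managers recursively around the final call:

    def with_all(ctxs, func):
        """Sketch: nest several context managers around one call.

        with_all((a, b), f) is meant to behave roughly like the 2.7+ form::

            with a as x, b as y:
                f(x, y)
        """
        if not ctxs:
            return func()
        first, rest = ctxs[0], ctxs[1:]
        return with_(first,
                     lambda val: with_all(rest, lambda *more: func(val, *more)))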
Re: [Python-Dev] PEP397 no command line options to python?
On Mon, Oct 17, 2011 at 8:55 AM, Sam Partington wrote: > Yes it is a bit annoying to have to treat those specially, but other > than -c/-m it does not need to understand pythons args, just check > that the arg is not an explicit version specifier. -q/-Q etc have no > impact on how to treat the file. > > In fact there's no real need to treat -c differently as it's extremely > unlikely that there is a file that might match. But for -m you can > come up with a situation where if you it gets it wrong. e.g. 'module' > and 'module.py' in the cwd. > > I would suggest that it is also unlikely that there will be any future > options would need any special consideration. > What about -S (no site.py) and -E (no environment)? These are needed for secure setuid scripts on *nix; I don't know how often they'd be used in practice on Windows. (Basically, they let you isolate a script's effective sys.path; there may be some use case overlap with virtual envs.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP397 no command line options to python?
On Mon, Oct 17, 2011 at 8:00 PM, Mark Hammond wrote: > On 18/10/2011 3:24 AM, PJ Eby wrote: > >> What about -S (no site.py) and -E (no environment)? These are needed >> for secure setuid scripts on *nix; I don't know how often they'd be used >> in practice on Windows. (Basically, they let you isolate a script's >> effective sys.path; there may be some use case overlap with virtual envs. >> > > It is worth pointing out that options can be specified directly in the > shebang line - eg, a line like "#! /usr/bin/python -S" in a foo.py works as > expected. Ah, ok. Never mind, then. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Packaging and binary distributions
On Sun, Oct 30, 2011 at 6:52 PM, Paul Moore wrote: > On 30 October 2011 18:04, Ned Deily wrote: > > Has anyone analyzed the current packages on PyPI to see how many provide > > binary distributions and in what format? > > A very quick and dirty check: > > dmg: 5 > rpm: 12 > msi: 23 > dumb: 132 > wininst: 364 > egg: 2570 > > That's number of packages with binary distributions in that format. > It's hard to be sure about egg distributions, as many of these could > be pure-python (there's no way I know, from the PyPI metadata, to > check this). > FYI, the egg filename will contain a distutils platform identifier (e.g. 'win32', 'macosx', 'linux', etc.) after the 'py2.x' tag if the egg is platform-specific. Otherwise, it's pure Python. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
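For example (the suffix check below is a deliberately crude sketch; the authoritative parsing lives in pkg_resources):

    # Platform-specific egg -- note the platform tag after the pyX.Y tag:
    #     lxml-2.3-py2.7-win32.egg
    # Pure-Python egg -- nothing after the pyX.Y tag:
    #     simplejson-2.2.1-py2.7.egg

    def looks_platform_specific(egg_filename):
        """Crude check: is there anything after the name-version-pyX.Y parts?"""
        parts = egg_filename[:-len('.egg')].split('-')
        return len(parts) > 3

    print(looks_platform_specific('lxml-2.3-py2.7-win32.egg'))       # True
    print(looks_platform_specific('simplejson-2.2.1-py2.7.egg'))     # False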
Re: [Python-Dev] Packaging and binary distributions
Urgh. I guess that was already answered. Guess this'll teach me not to reply to a thread before waiting for ALL the messages to download over a low-bandwidth connection... (am on the road at the moment and catching up on stuff in spare cycles - sorry for the noise) On Fri, Nov 4, 2011 at 10:24 PM, PJ Eby wrote: > On Sun, Oct 30, 2011 at 6:52 PM, Paul Moore wrote: > >> On 30 October 2011 18:04, Ned Deily wrote: >> > Has anyone analyzed the current packages on PyPI to see how many provide >> > binary distributions and in what format? >> >> A very quick and dirty check: >> >> dmg: 5 >> rpm: 12 >> msi: 23 >> dumb: 132 >> wininst: 364 >> egg: 2570 >> >> That's number of packages with binary distributions in that format. >> It's hard to be sure about egg distributions, as many of these could >> be pure-python (there's no way I know, from the PyPI metadata, to >> check this). >> > > FYI, the egg filename will contain a distutils platform identifier (e.g. > 'win32', 'macosx', 'linux', etc.) after the 'py2.x' tag if the egg is > platform-specific. Otherwise, it's pure Python. > > ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 382 specification and implementation complete
On Sun, Nov 6, 2011 at 7:29 AM, Nick Coghlan wrote: > I think this was based on the assumption that *existing* namespace > package approaches would break under the new scheme. Since that is not > the case, I suspect those previous objections were overstated (and all > packaging related code manages to cope well enough with modules where > the file name doesn't match the package name) > I was actually referring to all the code that does things like split package names on '.' and then use os.path.join, or that makes assumptions which are the moral equivalent of that. PEP 402's version of namespace packages should break less of that sort of code than adding a directory name extension. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
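Concretely, the sort of code I mean looks like this (a made-up but representative helper):

    import os

    def package_dir(base, dotted_name):
        # Fragile assumption: one package == one directory whose path
        # exactly mirrors the dotted name under a single base directory.
        return os.path.join(base, *dotted_name.split('.'))

    print(package_dir('/usr/lib/python2.7/site-packages', 'zc.buildout'))
    # .../site-packages/zc/buildout

Under a directory-name-extension scheme the join above simply produces the wrong name (the directory on disk would be 'zc.pyp' or similar), whereas with PEP 402 the per-directory layout is unchanged -- such code only goes wrong in the less common case where the namespace is actually split across several sys.path entries.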
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Sat, Nov 26, 2011 at 11:53 AM, Éric Araujo wrote: > > Le 11/08/2011 20:30, P.J. Eby a écrit : > >> At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote: > >>> I’ll just regret that it's not possible to provide a module docstring > >>> to inform that this is a namespace package used for X and Y. > >> It *is* possible - you'd just have to put it in a "zc.py" file. IOW, > >> this PEP still allows "namespace-defining packages" to exist, as was > >> requested by early commenters on PEP 382. It just doesn't *require* > >> them to exist in order for the namespace contents to be importable. > > That’s quite cool. I guess such a namespace-defining module (zc.py > here) would be importable, right? Yes. > Also, would it cause worse > performance for other zc.* packages than if there were no zc.py? > No. The first import of a subpackage sets up the __path__, and all subsequent imports use it. > >>> A pure virtual package having no source file, I think it should have no >>> __file__ at all. > > Antoine and someone else thought likewise (I can find the link if you > want); do you consider it consensus enough to update the PEP? > Sure. At this point, though, before doing any more work on the PEP I'd like to have some idea of whether there's any chance of it being accepted. At this point, there seems to be a lot of passive, "Usenet nod syndrome" type support for it, but little active support. It doesn't help at all that I'm not really in a position to provide an implementation, and the persons most likely to implement have been leaning somewhat towards 382, or wanting to modify 402 such that it uses .pyp directory extensions so that PEP 395 can be supported... And while 402 is an extension of an idea that Guido proposed a few years ago, he hasn't weighed in lately on whether he still likes that idea, let alone whether he likes where I've taken it. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] readd u'' literal support in 3.3?
On Fri, Dec 9, 2011 at 10:11 AM, Barry Warsaw wrote: > As Chris points out, this seems to be a use case tied to WSGI and PEP > . I > guess it's an unfortunate choice for so recent a PEP, but maybe there was > no > way to do better. For the record, "native strings" are defined the way they are because of IronPython and Jython, which had unicode strings long before CPython. At the time WSGI was developed, the approach for Python 3 (then called "3000") was expected to be similar, and the new I/O system was not (AFAIR) designed yet. All that changed in PEP was introducing *byte* strings (to accommodate the I/O changes), not native strings. In fact, I'm not sure why people are bringing it into this discussion at all: PEP was designed to work well with 2to3, which does the right thing for WSGI code: it converts 2.x "str" to 3.x "str", as it should. If you're writing 2.x WSGI code with 'u' literals, *your code is broken*. WSGI doesn't need 'u' literals and never has. It *does* need b'' literals for stuff that refers to request and response bodies, but everything else should be plain old string literals for the appropriate Python version. It can certainly be useful in many contexts outside of WSGI. > And *only* there, pretty much. ;-) PEP was designed to work with the official upgrade path (2to3), which is why it has a concept of native strings. Thing is, if you mark them with a 'u', you're writing incorrect code for 2.x. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
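For the record, a minimal example of what "plain old string literals" means in 2.x WSGI code (this is exactly the kind of code 2to3 converts correctly):

    def app(environ, start_response):
        # Status, headers, and environ keys/values are "native" strings:
        # plain str on 2.x and on 3.x alike -- no u'' anywhere.
        status = '200 OK'
        headers = [('Content-Type', 'text/plain')]
        start_response(status, headers)
        # Only request/response *bodies* are bytes, hence the b'' literal:
        return [b'Hello, world!\n']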
Re: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?)
On Fri, Dec 9, 2011 at 11:11 PM, Terry Reedy wrote: > This just gave me the idea of tagging tracebacks with the Python version > number. Something like > > Traceback (Py3.2.2, most recent call last): > > and perhaps with the platform also > > Traceback (most recent call last) [Py3.2.2 on win23]: > > Since computation has stopped, the few extra milliseconds is trivial. This > would certainly help on Python list and the tracker when people do post the > traceback (which they do not always) without version and system (which they > often do not, especially on Python list). It might suggest to people that > this is important info to include. I wonder if this would also help with > tracebacks sent to library/app developers. > Yes, but doctest will need to take this into account, both for its native traceback matcher, and for traceback matches using ellipses. Otherwise you introduce more Python version hell for doctest users. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?)
On Sat, Dec 10, 2011 at 5:30 PM, Terry Reedy wrote: > Is doctest really insisting that the whole line > Traceback (most recent call last): > exactly match, with nothing added? It really should not, as that is not > part of the language spec. This seems like the tail wagging the dog. > It's a regular expression match, actually. The standard matcher ignores everything between the Traceback line (matched by a regex) and the first unindented line that follows in the doctest. However, if you explicitly try to match a traceback with the ellipsis matcher, intending to observe whether certain specific lines are printed, then you wouldn't be using doctest's built-in matcher, and that was the case I was concerned about. However, as it turns out, I was confused about when this latter case occurs: in order to do it, you have to actually intentionally print a traceback (e.g. via traceback.format_exception() and friends), rather than allowing the exception to propagate normally. This doesn't happen nearly as often in my doctests as I thought it did, but if format_exception() changes it'll still affect some people. The other piece I was pointing out was that if you change the message without changing the doctest regex, then pasting an interpreter transcript into a doctest will no longer work, because doctest will think it's trying to match non-error output. So that has to be changed when the exception format changes. So, no actual objection here; just saying that if you don't change that regex, people who create *new* doctests with tracebacks won't be able to get them to work without deleting the version info from their copy-pasted tracebacks. I was also concerned about a situation that, while it exists, does not occur anywhere near as frequently as I thought it would in my own tests, even for things that seriously abuse Python internals and likely can't be ported to Python 3 anyway. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
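For reference, this is the built-in matching I mean; the "..." line and the real stack frames are both ignored by doctest, and only the header line and the final exception line have to match:

    >>> int('not a number')
    Traceback (most recent call last):
      ...
    ValueError: invalid literal for int() with base 10: 'not a number'

If the header line grew a version tag, examples pasted from the interpreter would stop being recognized as tracebacks until doctest's header regex was updated to match -- which is all I'm asking be done in the same change.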
Re: [Python-Dev] readd u'' literal support in 3.3?
On Mon, Dec 12, 2011 at 3:40 AM, Chris McDonough wrote: > Truth be told, in the vast majority of WSGI apps only high-level WSGI > libraries (like WebOb and Werkzeug) and standalone middleware really > needs to work with native strings. And the middleware really should be > using the high-level libraries to parse WSGI anyway. So there are a > finite number of places where it's actually a real issue. > And those only if they're using "six" or a similar joint-codebase strategy, *and* using unicode_literals in a 2.x module that also does WSGI. If they're using 2to3 and stick with explicit u'', they'll be fine. Unfortunately, AFAIR, nobody in the PEP discussions brought up either the unicode_literals import OR the strategy of using a common codebase, so 2to3 on plain code and writing new Python3 code were the only porting scenarios discussed. (Not that I'm sure it would've made a difference, as I'm not sure what we could have done differently that would still support simple Python3 code and easy 2to3 porting.) As someone who ported WebOb and other stuff built on top of it to Python > 3 without using "from __future__ import unicode_literals", I'm kinda sad > that to be using best practice I'll have to go back and flip the > polarity on everything. Eh? If you don't need unicode_literals, what's the problem? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] readd u'' literal support in 3.3?
On Tue, Dec 13, 2011 at 11:24 AM, Antoine Pitrou wrote: > On Tue, 13 Dec 2011 15:28:31 +0100 > "Laurence Rowe" wrote: > > > > The approach that most people seem to have settled on for porting > > libraries to Python 3 is to make a single codebase that is compatible > with > > both Python 2 and Python 3, perhaps making use of the six library. > > Do you have evidence that "most" people have settled on that approach? > (besides the couple of library writers who have commented on this > thread) > I've seen more projects doing it that way than maintaining dual code bases. In retrospect, it seems way more attractive than having to run a converter all the time, especially if I could run a "2to6" tool *once* and then simply write new code using six-isms Among other things, it means that: * There's only one codebase * If the conversion isn't perfect, you only have to fix it once * Line numbers are the same * There's no conversion step slowing down development So, I expect that if the approach is at all viable, it'll quickly become the One Obvious Way to do it. In effect, 2to3 is a "purity" solution, but six is more like a "practicality" solution. And if there's official support for it, so much the better. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
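To be concrete about what "six-isms" buy you, here is a trivial sketch of the single-codebase style (not taken from any particular project):

    import sys

    PY3 = sys.version_info[0] >= 3

    if PY3:
        text_type = str
    else:
        text_type = unicode      # only evaluated on 2.x

    def ensure_text(value, encoding='utf-8'):
        """Return `value` as a text string on both 2.x and 3.x."""
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value

    print(ensure_text(b'hello'))   # same source runs, unconverted, on 2.6+ and 3.x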
Re: [Python-Dev] readd u'' literal support in 3.3?
On Tue, Dec 13, 2011 at 7:30 PM, Antoine Pitrou wrote: > On Tue, 13 Dec 2011 14:02:45 -0500 > PJ Eby wrote: > > > > Among other things, it means that: > > > > * There's only one codebase > > * If the conversion isn't perfect, you only have to fix it once > > * Line numbers are the same > > * There's no conversion step slowing down development > > > > So, I expect that if the approach is at all viable, it'll quickly become > > the One Obvious Way to do it. > > Well, with all due respect, this is hand-waving. Sure, if it's > viable, then fine. The question is if it's "viable", precisely. That > depends on which project we're talking about. > What I'm saying is that it has many characteristics that are desirable for people who need to support Python 2 and 3 - which is likely the most common use case for library developers. > In effect, 2to3 is a "purity" solution, but > > six is more like a "practicality" solution. > > This sounds like your personal interpretation. I see nothing "pure" in > 2to3. > It's "pure" in being optimized for a world where you just stop using Python 2 one day, and start using 3 the next, without any crossover support. As someone else pointed out, this is a more common case for application developers than for library developers. However, until the libraries are ported, it's harder for the app developers to port their apps. Anyway, if you're supporting both 2 and 3, a common code base offers many attractions, so if it can be done, it will. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Thu, Dec 29, 2011 at 8:32 AM, Christian Heimes wrote: > IMHO we don't have to alter the outcome of hash("some string"), hash(1) > and all other related types. We just need to reduce the change the an > attacker can produce collisions in the dict (and set?) code that looks > up the slot (PyDictEntry). How about adding the random value in > Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) / > lookdict_unicode() (Python 3.x)? With this approach the hash of all our > objects stay the same and just the dict code needs to be altered. I don't understand how that helps a collision attack. If you can still generate two strings with the same (pre-randomized) hash, what difference does it make that the dict adds a random number? The post-randomized number will still be the same, no? Or does this attack just rely on the hash *remainders* being the same? If so, I can see how hashing the hash would help. But since the attacker doesn't know the modulus, and it can change as the dictionary grows, I would expect the attack to require matching hashes, not just matching hash remainders... unless I'm just completely off base here. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
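To spell out the worry with toy numbers (the hash values below are made up; the point isn't):

    # Suppose an attacker has found two keys whose *hashes* are equal:
    h_a = 0x7ABC1234          # hypothetical hash of key A
    h_b = 0x7ABC1234          # hypothetical hash of key B (a true collision)

    r = 0x5F3759DF            # the dict's per-process random value

    # Whatever post-processing the lookup applies to the hash, it applies
    # to both identically, so the keys still land in the same slot:
    assert (h_a ^ r) == (h_b ^ r)
    assert ((h_a * 103) ^ r) == ((h_b * 103) ^ r)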
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull wrote: > While the dictionary probe has to start with a hash for backward > compatibility reasons, is there a reason the overflow strategy for > insertion has to be buckets containing lists? How about > double-hashing, etc? > This won't help, because the keys still have the same hash value. ANYTHING you do to them after they're generated will result in them still colliding. The *only* thing that works is to change the hash function in such a way that the strings end up with different hashes in the first place. Otherwise, you'll still end up with (deliberate) collisions. (Well, technically, you could use trees or some other O log n data structure as a fallback once you have too many collisions, for some value of "too many". Seems a bit wasteful for the purpose, though.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On Sat, Dec 31, 2011 at 4:04 PM, Jeffrey Yasskin wrote: > Hash functions are already unstable across Python versions. Making > them unstable across interpreter processes (multiprocessing doesn't > share dicts, right?) doesn't sound like a big additional problem. > Users who want a distributed hash table will need to pull their own > hash function out of hashlib or re-implement a non-cryptographic hash > instead of using the built-in one, but they probably need to do that > already to allow themselves to upgrade Python. > Here's an idea. Suppose we add a sys.hash_seed or some such, that's settable to an int, and defaults to whatever we're using now. Then programs that want a fix can just set it to a random number, and on Python versions that support it, it takes effect. Everywhere else it's a silent no-op. Downside: sys has to have slots for this to work; does sys actually have slots? My memory's hazy on that. I guess actually it'd have to be sys.set_hash_seed(). But same basic idea. Anyway, this would make fixing the problem *possible*, while still pushing off the hard decisions to the app/framework developers. ;-) (Downside: every hash operation includes one extra memory access, but strings only compute their hash once anyway.) Given that changing dict won't help, and changing the default hash is a non-starter, an option to set the seed is probably the way to go. (Maybe with an environment variable and/or command line option so users can work around old code.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
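For clarity, "seed" here just means folding a per-process random value into the string hash itself, along these lines (a toy FNV-style hash for illustration only -- not CPython's actual algorithm):

    def seeded_hash(s, seed):
        h = 2166136261 ^ seed            # fold the seed into the start state
        for ch in s:
            h = ((h ^ ord(ch)) * 16777619) & 0xFFFFFFFF
        return h

    # Different seeds send the same strings to different hashes, so a
    # precomputed set of colliding keys stops colliding:
    print(seeded_hash('abc', 0))
    print(seeded_hash('abc', 12345))

(An option along these lines is essentially what later shipped as the PYTHONHASHSEED environment variable and the -R switch.)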
Re: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html
On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett wrote: > Well, there is nothing wrong with switching to a different hash function > after N > collisions, rather than "in the first place". The perturbation > effectively does by > shoving the high-order bits through the part of the hash that survives the > mask. > Since these are true hash collisions, they will all have the same high order bits. So, the usefulness of the perturbation is limited mainly to the common case where true collisions are rare. > (Well, technically, you could use trees or some other O log n data > structure as a fallback once you have too many collisions, for some value > of "too many". Seems a bit wasteful for the purpose, though.) > Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > > requires > using a real dictionary for compatibility; storing some of the values > outside the > values array would violate that. When I said "use some other data structure", I was referring to the internal implementation of the dict type, not to user code. The only user-visible difference (even at C API level) would be the order of keys() et al. (In any case, I still assume this is too costly an implementation change compared to changing the hash function or seeding it.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html)
On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett wrote: > Given the wording requiring a real dictionary, I would have assumed > that it was OK (if perhaps not sensible) to do pointer arithmetic and > access the keys/values/hashes directly. (Though if the breakage was > between python versions, I would feel guilty about griping too > loudly.) > If you're going to be a language lawyer about it, I would simply point out that all the spec requires is that "type(env) is dict" -- it says nothing about how Python defines "type" or "is" or "dict". So, you're on your own with that one. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html)
On Mon, Jan 2, 2012 at 4:07 PM, Jim Jewett wrote: > On Mon, Jan 2, 2012 at 1:16 AM, PJ Eby wrote: > > On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett > wrote: > >> > >> Given the wording requiring a real dictionary, I would have assumed > >> that it was OK (if perhaps not sensible) to do pointer arithmetic and > >> access the keys/values/hashes directly. (Though if the breakage was > >> between python versions, I would feel guilty about griping too > >> loudly.) > > > If you're going to be a language lawyer about it, I would simply point > out > > that all the spec requires is that "type(env) is dict" -- it says nothing > > about how Python defines "type" or "is" or "dict". So, you're on your > own > > with that one. ;-) > > But the public header file < > http://hg.python.org/cpython/file/3ed5a6030c9b/Include/dictobject.h > > defines the typedef structs for PyDictEntry and _dictobject. > > What is the purpose of the requiring a "real dict" without also > promising what the header file promises? > > Er, just because it's in the .h doesn't mean it's in the public API. But in any event, if you're actually serious about this, I'd just point out that: 1. The struct layout doesn't guarantee anything about insertion or lookup algorithms, 2. If the data structure were changed, the header file would obviously change as well, and 3. ISTM that Python does not even promise inter-version ABI compatibility for internals like the dict object layout. Are you seriously writing code that relies on the C structure layout of dicts? Because really, that was SO not the point of the dict type requirement. It was so that you could use Python's low-level *API* calls, not muck about with the data structure directly. I'm occasionally considered notorious for abusing Python internals, but even I have to draw the line somewhere. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Proposed PEP on concurrent programming support
On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer wrote: > STM is a relatively new technology being experimented with in newer > languages, and in a number of 3rd party libraries (both Peak [#Peak]_ > and Kamaelia [#Kamaelia]_ provide STM facilities). I don't know about Kamaelia, but PEAK's STM (part of the Trellis event-driven library) is *not* an inter-thread concurrency solution: it's actually used to sort out the order of events in a co-operative multitasking scenario. So, it should not be considered evidence for the practicality of doing inter-thread co-ordination that way in pure Python. A suite is marked > as a `transaction`, and then when an unlocked object is modified, > instead of indicating an error, a locked copy of it is created to be > used through the rest of the transaction. If any of the originals are > modified during the execution of the suite, the suite is rerun from > the beginning. If it completes, the locked copies are copied back to > the originals in an atomic manner. > I'm not sure if "locked" is really the right word here. A private copy isn't "locked" because it's not shared. The disadvantage is that any code in a transaction must be safe to run > multiple times. This forbids any kind of I/O. > More precisely, code in a transaction must be *reversible*, so it doesn't forbid any I/O that can be undone. If you can seek backward in an input file, for example, or delete queued output data, then it can still be done. Even I/O like re-drawing a screen can be made STM safe by making the redraw occur after a transaction that reads and empties a buffer written by other transactions. For > instance, combining STM with explicit locking would allow explicit > locking when IO was required, I don't think this idea makes any sense, since STM's don't really "lock", and to control I/O in an STM system you just STM-ize the queues. (Generally speaking.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Proposed PEP on concurrent programming support
On Wed, Jan 11, 2012 at 7:01 PM, Mike Meyer wrote: > On Wed, 4 Jan 2012 00:07:27 -0500 > PJ Eby wrote: > > On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer wrote: > > > For > > > instance, combining STM with explicit locking would allow explicit > > > locking when IO was required, > > I don't think this idea makes any sense, since STM's don't really > > "lock", and to control I/O in an STM system you just STM-ize the > > queues. (Generally speaking.) > > I thought about that. I couldn't convince myself that STM by itself > sufficient. If you need to make irreversible changes to the state of > an object, you can't use STM, so what do you use? Can every such > situation be handled by creating "safe" values then using an STM to > update them? > If you need to do something irreversible, you just need to use an STM-controlled queue, with something that reads from it to do the irreversible things. The catch is that your queue design has to support guaranteed-successful item removal, since if the dequeue transaction fails, it's too late. Alternately, the queue reader can commit removal first, then perform the irreversible operation... but leave open a short window for failure. It depends on the precise semantics you're looking for. In either case, though, the STM is pretty much sufficient, given a good enough queue data structure. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
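Using a plain Queue as a stand-in for an STM-controlled queue, the two orderings look roughly like this (a sketch of the pattern only -- there is no actual STM here):

    import queue                      # 'Queue' on Python 2

    q = queue.Queue()
    q.put('launch the batch job')

    # Ordering 1: commit the removal first, then do the irreversible thing.
    # If the action then fails, the item is already gone -- the "short
    # window for failure" mentioned above.
    item = q.get()                    # removal is final once this returns
    print('performing irreversible action:', item)

    # Ordering 2 (not shown): do the irreversible thing first, then remove
    # the item -- which only works if the dequeue is guaranteed to succeed,
    # since by then it's too late to undo the action.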
Re: [Python-Dev] Hashing proposal: change only string-only dicts
On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" wrote: > Am 17.01.2012 22:26, schrieb Antoine Pitrou: > > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits > > could cache a "hash perturbation" computed from the string and the > > random bits: > > > > - hash() would use ob_shash > > - dict_lookup() would use ((ob_shash * 103) ^ (ob_sstate & ~3)) > > > > This way, you cache almost all computations, adding only a computation > > and a couple logical ops when looking up a string in a dict. > > That's a good idea. For Unicode, it might be best to add another slot > into the object, even though this increases the object size. > Wouldn't that break the ABI in 2.x? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hashing proposal: change only string-only dicts
On Jan 18, 2012 12:55 PM, Martin v. Löwis wrote: > > Am 18.01.2012 17:01, schrieb PJ Eby: > > On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" > <mailto:mar...@v.loewis.de>> wrote: > > > > Am 17.01.2012 22:26, schrieb Antoine Pitrou: > > > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits > > > could cache a "hash perturbation" computed from the string and the > > > random bits: > > > > > > - hash() would use ob_shash > > > - dict_lookup() would use ((ob_shash * 103) ^ (ob_sstate & ~3)) > > > > > > This way, you cache almost all computations, adding only a computation > > > and a couple logical ops when looking up a string in a dict. > > > > That's a good idea. For Unicode, it might be best to add another slot > > into the object, even though this increases the object size. > > > > > > Wouldn't that break the ABI in 2.x? > > I was thinking about adding the field at the end, so I thought it > shouldn't. However, if somebody inherits from PyUnicodeObject, it still > might - so my new proposal is to add the extra hash into the str block, > either at str[-1], or after the terminating 0. This would cause an > average increase of four bytes of the storage (0 bytes in 50% of the > cases, 8 bytes because of padding in the other 50%). > > What do you think? So far it sounds like the very best solution of all, as far as backward compatibility is concerned. If the extra bits are only used when two strings have a matching hash value, the only doctests that could be affected are ones testing for this issue. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Packaging and setuptools compatibility
2012/1/24 Alexis Métaireau > Entrypoints basically are a plugin system. They are storing information in > the metadata and then retrieving them when needing them. The problem with > this, as everything when trying to get information from metadata is that we > need to parse all the metadata for all the installed distributions. (say > O(N)). > Note that this is why setuptools doesn't put entry points into PKG-INFO, but instead uses separate metadata files. Thus there is a lower "N" as well as smaller files to parse. ;-) Entrypoints are also only one type of extension metadata supported by setuptools; there is for example the EggTranslations system built on setuptools metadata system: it allows plugins to provide translations and localized resources for applications, and for other plugins in the same application. And it does this by using a different metadata file, again stored in the installed project's metadata. Since the new packaging metadata format is still a directory (replacing setuptools' EGG-INFO or .egg-info directories), it seems a reasonable migration path to simply install entry_points.txt and other metadata extensions to that same directory, and provide API to iterate over all the packages that offer a particular metadata file name. Entry points work this way now in setuptools, i.e. they iterate over all eggs containing entry_points metadata, then parse and cache the contents. An API for doing the same sort of thing here seems appropriate. This is still "meta" as Glyph suggests, and allows both setuptools-style entry point plugins, EggTranslations-style plugins, and whatever other sorts of plugin systems people would like. (I believe some other systems exist with this sort of metadata scheme; ISTM that Paster has a metadata format, but I don't know if it's exposed in egg-info metadata like this currently.) Anyway, if you offer an API for finding packages by metadata file (or even just a per-installed-package object API to query the existence of a metadata file), and for process-level caching of extended metadata for installed packages, that is sufficient for the above systems to work, without needing to bless any particular plugin API per se. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
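For reference, the setuptools version of that query API looks like this today (real pkg_resources calls; packaging would presumably grow something equivalent):

    import pkg_resources

    # Iterate over installed distributions that ship a given metadata file:
    for dist in pkg_resources.working_set:
        if dist.has_metadata('entry_points.txt'):
            print(dist.project_name)

    # Or go straight to the parsed registrations for one plugin group:
    for ep in pkg_resources.iter_entry_points('console_scripts'):
        print(ep.name, ep.module_name)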
Re: [Python-Dev] Store timestamps as decimal.Decimal objects
On Tue, Jan 31, 2012 at 7:35 PM, Nick Coghlan wrote: > Such a protocol can easily be extended to any other type - the time > module could provide conversion functions for integers and float > objects (meaning results may have lower precision than the underlying > system calls), while the existing "fromtimestamp" APIs in datetime can > be updated to accept the new optional arguments (and perhaps an > appropriate class method added to timedelta, too). A class method > could also be added to the decimal module to construct instances from > integer components (as shown above), since that method of construction > isn't actually specific to timestamps. > Why not just make it something like __fromfixed__() and make it a standard protocol, implemented on floats, ints, decimals, etc. Then the API is just "time.time(type)", where type is any object providing a __fromfixed__ method. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
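To be clear, __fromfixed__ is entirely hypothetical -- the sketch below just shows the shape of the idea, assuming the protocol hands the type an integer count of seconds plus a fraction expressed as numerator/denominator:

    import time
    from decimal import Decimal

    def get_time(result_type):
        """Sketch of a time.time(type) that defers construction to the type."""
        # A real implementation would take these straight from the OS clock;
        # going through float here just keeps the sketch short.
        seconds, nanos = divmod(int(time.time() * 10**9), 10**9)
        return result_type.__fromfixed__(seconds, nanos, 10**9)

    class DecimalTime(Decimal):
        @classmethod
        def __fromfixed__(cls, seconds, num, den):
            return Decimal(seconds) + Decimal(num) / Decimal(den)

    print(get_time(DecimalTime))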
Re: [Python-Dev] Store timestamps as decimal.Decimal objects
On Jan 31, 2012 11:08 PM, "Nick Coghlan" wrote: > PJE is quite right that using a new named protocol rather than a > callback with a particular signature could also work, but I don't see > a lot of advantages in doing so. The advantage is that it fits your brain better. That is, you don't have to remember another symbol besides the type you wanted. (There's probably fewer keystrokes involved, too.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon wrote: > So, if there is going to be some baseline performance target I need to hit > to make people happy I would prefer to know what that (real-world) > benchmark is and what the performance target is going to be on a non-debug > build. And if people are not worried about the performance then I'm happy > with that as well. =) > One thing I'm a bit worried about is repeated imports, especially ones that are inside frequently-called functions. In today's versions of Python, this is a performance win for "command-line tool platform" systems like Mercurial and PEAK, where you want to delay importing as long as possible, in case the code that needs the import is never called at all... but, if it *is* used, you may still need to use it a lot of times. When writing that kind of code, I usually just unconditionally import inside the function, because the C code check for an already-imported module is faster than the Python "if" statement I'd have to clutter up my otherwise-clean function with. So, in addition to the things other people have mentioned as performance targets, I'd like to keep the slowdown factor low for this type of scenario as well. Specifically, the slowdown shouldn't be so much as to motivate lazy importers like Mercurial and PEAK to need to rewrite in-function imports to do the already-imported check ourselves. ;-) (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import code, so I can't say for 100% sure if they'd be affected the same way.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
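Concretely, the pattern I'm talking about is just this -- the second form being the clutter I'd rather not be pushed into writing:

    import sys

    def show_help(text):
        # Today: rely on the C-level "already imported?" check being cheap.
        import textwrap
        return textwrap.fill(text)

    def show_help_paranoid(text):
        # What lazy-importing tools would have to do if __import__ got slow:
        textwrap = sys.modules.get('textwrap')
        if textwrap is None:
            import textwrap
        return textwrap.fill(text)

    print(show_help('some long help text ' * 10))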
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 5:24 PM, Brett Cannon wrote: > > On Tue, Feb 7, 2012 at 16:51, PJ Eby wrote: > >> On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon wrote: >> >>> So, if there is going to be some baseline performance target I need to >>> hit to make people happy I would prefer to know what that (real-world) >>> benchmark is and what the performance target is going to be on a non-debug >>> build. And if people are not worried about the performance then I'm happy >>> with that as well. =) >>> >> >> One thing I'm a bit worried about is repeated imports, especially ones >> that are inside frequently-called functions. In today's versions of >> Python, this is a performance win for "command-line tool platform" systems >> like Mercurial and PEAK, where you want to delay importing as long as >> possible, in case the code that needs the import is never called at all... >> but, if it *is* used, you may still need to use it a lot of times. >> >> When writing that kind of code, I usually just unconditionally import >> inside the function, because the C code check for an already-imported >> module is faster than the Python "if" statement I'd have to clutter up my >> otherwise-clean function with. >> >> So, in addition to the things other people have mentioned as performance >> targets, I'd like to keep the slowdown factor low for this type of scenario >> as well. Specifically, the slowdown shouldn't be so much as to motivate >> lazy importers like Mercurial and PEAK to need to rewrite in-function >> imports to do the already-imported check ourselves. ;-) >> >> (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import >> code, so I can't say for 100% sure if they'd be affected the same way.) >> > > IOW you want the sys.modules case fast, which I will never be able to > match compared to C code since that is pure execution with no I/O. > Couldn't you just prefix the __import__ function with something like this: ... try: module = sys.modules[name] except KeyError: # slow code path (Admittedly, the import lock is still a problem; initially I thought you could just skip it for this case, but the problem is that another thread could be in the middle of executing the module.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy wrote:
> importlib could provide a parameterized decorator for functions that are
> the only consumers of an import. It could operate much like this:
>
> def imps(mod):
>     def makewrap(f):
>         def wrapped(*args, **kwds):
>             print('first/only call to wrapper')
>             g = globals()
>             g[mod] = __import__(mod)
>             g[f.__name__] = f
>             f(*args, **kwds)
>         wrapped.__name__ = f.__name__
>         return wrapped
>     return makewrap
>
> @imps('itertools')
> def ic():
>     print(itertools.count)
>
> ic()
> ic()
> #
> first/only call to wrapper

If I were going to rewrite code, I'd just use lazy imports (see http://pypi.python.org/pypi/Importing ). They're even faster than this approach (or using plain import statements), as they have zero per-call function call overhead. It's just that not everything I write can depend on Importing. Throw an equivalent into the stdlib, though, and I guess I wouldn't have to worry about dependencies...

(To be clearer: I'm talking about the http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature, which sticks a dummy module subclass instance into sys.modules, whose __getattribute__ does a reload() of the module, forcing the normal import process to run, after first changing the dummy object's type to something that doesn't have the __getattribute__ any more. This ensures that all accesses after the first one are at normal module attribute access speed. That, and the "whenImported" decorator from Importing would probably be of general stdlib usefulness too.)

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
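For the curious, that technique boils down to something like this Python 2 sketch (a much-simplified version of what Importing actually does -- the real thing also deals with re-entrancy, threading, and postponed callbacks):

    import sys
    from types import ModuleType

    class _LoadedModule(ModuleType):
        """Hook-free subclass we switch to once the module is really loaded."""

    class _LazyModule(ModuleType):
        def __getattribute__(self, name):
            # Switch to a class without this hook first, so the reload (and
            # every later attribute access) runs at normal module speed...
            self.__class__ = _LoadedModule
            # ...then force the normal import machinery to populate us.
            reload(self)                  # Python 2 builtin
            return getattr(self, name)

    def lazy_module(modname):
        if modname not in sys.modules:
            sys.modules[modname] = _LazyModule(modname)
        return sys.modules[modname]

    json = lazy_module('json')      # nothing has actually been imported yet
    print(json.dumps([1, 2, 3]))    # first attribute access triggers the import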
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 4:08 PM, Brett Cannon wrote: > > On Wed, Feb 8, 2012 at 15:31, Terry Reedy wrote: > >> For top-level imports, unless *all* are made lazy, then there *must* be >> some indication in the code of whether to make it lazy or not. >> > > Not true; importlib would make it dead-simple to whitelist what modules to > make lazy (e.g. your app code lazy but all stdlib stuff not, etc.). > There's actually only a few things stopping all imports from being lazy. "from x import y" immediately de-lazies them, after all. ;-) The main two reasons you wouldn't want imports to *always* be lazy are: 1. Changing sys.path or other parameters between the import statement and the actual import 2. ImportErrors are likewise deferred until point-of-use, so conditional importing with try/except would break. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Feb 9, 2012 9:58 AM, "Brett Cannon" wrote: > This actually depends on the type of ImportError. My current solution actually would trigger an ImportError at the import statement if no finder could locate the module. But if some ImportError was raised because of some other issue during load then that would come up at first use. That's not really a lazy import then, or at least not as lazy as what Mercurial or PEAK use for general lazy importing. If you have a lot of them, that module-finding time really adds up. Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer wrote: > For those of you not watching -ideas, or ignoring the "Python TIOBE > -3%" discussion, this would seem to be relevant to any discussion of > reworking the import mechanism: > > http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html > > Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. That is, an importer created for a particular directory could, upon first use, cache a frozenset(listdir()), and the stat().st_mtime of the directory. All the filename checks could then be performed against the frozenset, and the st_mtime of the directory only checked once per import, to verify whether the frozenset() needed refreshing. Since a failed module lookup takes at least 5 stat checks (pyc, pyo, py, directory, and compiled extension (pyd/so)), this cuts it down to only 1, at the price of a listdir(). The big question is how long does a listdir() take, compared to a stat() or failed open()? That would tell us whether the tradeoff is worth making. I did some crude timeit tests on frozenset(listdir()) and trapping failed stat calls. It looks like, for a Windows directory the size of the 2.7 stdlib, you need about four *failed* import attempts to overcome the initial caching cost, or about 8 successful bytecode imports. (For Linux, you might need to double these numbers; my tests showed a different ratio there, perhaps due to the Linux stdib I tested having nearly twice as many directory entries as the directory I tested on Windows!) However, the numbers are much better for application directories than for the stdlib, since they are located earlier on sys.path. Every successful stdlib import in an application is equal to one failed import attempt for every preceding directory on sys.path, so as long as the average directory on sys.path isn't vastly larger than the stdlib, and the average application imports at least four modules from the stdlib (on Windows, or 8 on Linux), there would be a net performance gain for the application as a whole. (That is, there'd be an improved per-sys.path entry import time for stdlib modules, even if not for any application modules.) For smaller directories, the tradeoff actually gets better. A directory one seventh the size of the 2.7 Windows stdlib has a listdir() that's proportionately faster, but failed stats() in that directory are *not* proportionately faster; they're only somewhat faster. This means that it takes fewer failed module lookups to make caching a win - about 2 in this case, vs. 4 for the stdlib. Now, these numbers are with actual disk or network access abstracted away, because the data's in the operating system cache when I run the tests. It's possible that this strategy could backfire if you used, say, an NFS directory with ten thousand files in it as your first sys.path entry. Without knowing the timings for listdir/stat/failed stat in that setup, it's hard to say how many stdlib imports you need before you come out ahead. When I tried a directory about 7 times larger than the stdlib, creating the frozenset took 10 times as long, but the cost of a failed stat didn't go up by very much. This suggests that there's probably an optimal directory size cutoff for this trick; if only there were some way to check the size of a directory without reading it, we could turn off the caching for oversize directories, and get a major speed boost for everything else. 
On most platforms, the stat().st_size of the directory itself will give you some idea, but on Windows that's always zero. On Windows, we could work around that by using a lower-level API than listdir() and simply stop reading the directory if we hit the maximum number of entries we're willing to build a cache for, and then call it off. (Another possibility would be to explicitly enable caching by putting a flag file in the directory, or perhaps by putting a special prefix on the sys.path entry, setting the cutoff in an environment variable, etc.) In any case, this seems really worth a closer look: in non-pathological cases, it could make directory-based importing as fast as zip imports are. I'd be especially interested in knowing how the listdir/stat/failed stat ratios work on NFS - ISTM that they might be even *more* conducive to this approach, if setup latency dominates the cost of individual system calls. If this works out, it'd be a good example of why importlib is a good idea; i.e., allowing us to play with ideas like this. Brett, wouldn't you love to be able to say importlib is *faster* than the old C-based importing? ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
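A rough sketch of the caching helper described above (just the cache itself; wiring it into a real finder is the part that would live in importlib):

    import os

    class DirCache(object):
        """One listdir() per directory, refreshed when the directory mtime changes."""

        SUFFIXES = ('.py', '.pyc', '.pyo', '.pyd', '.so')

        def __init__(self, path):
            self.path = path
            self._mtime = -1.0
            self._names = frozenset()

        def _refresh(self):
            mtime = os.stat(self.path).st_mtime
            if mtime != self._mtime:
                self._names = frozenset(os.listdir(self.path))
                self._mtime = mtime

        def may_contain(self, modname):
            """True if `modname` could live here; False means this directory
            can be skipped with no per-file stat() or open() calls at all."""
            self._refresh()
            name = modname.rsplit('.', 1)[-1]
            if name in self._names:                  # possible package dir
                return True
            return any(name + sfx in self._names for sfx in self.SUFFIXES)

    cache = DirCache('.')
    print(cache.may_contain('setup'))         # True if ./setup.py (etc.) exists
    print(cache.may_contain('no_such_mod'))   # False -- one cached set lookup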
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 5:34 PM, Robert Kern wrote: > On 2/9/12 10:15 PM, Antoine Pitrou wrote: > >> On Thu, 9 Feb 2012 17:00:04 -0500 >> PJ Eby wrote: >> >>> On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer wrote: >>> >>> For those of you not watching -ideas, or ignoring the "Python TIOBE >>>> -3%" discussion, this would seem to be relevant to any discussion of >>>> reworking the import mechanism: >>>> >>>> http://mail.scipy.org/**pipermail/numpy-discussion/** >>>> 2012-January/059801.html<http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html> >>>> >>>> Interesting. This gives me an idea for a way to cut stat calls per >>>> >>> sys.path entry per import by roughly 4x, at the cost of a one-time >>> directory read per sys.path entry. >>> >> >> Why do you even think this is a problem with "stat calls"? >> > > All he said is that reading about that problem and its solution gave him > an idea about dealing with stat call overhead. The cost of stat calls has > demonstrated itself to be a significant problem in other, more typical > contexts. Right. It was the part of the post that mentioned that all they sped up was knowing which directory the files were in, not the actual loading of bytecode. The thought then occurred to me that this could perhaps be applied to normal importing, as a zipimport-style speedup. (The zipimport module caches each zipfile directory it finds on sys.path, so failed import lookups are extremely fast.) It occurs to me, too, that applying the caching trick to *only* the stdlib directories would still be a win as soon as you have between four and eight site-packages (or user specific site-packages) imports in an application, so it might be worth applying unconditionally to system-defined stdlib (non-site) directories. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon wrote: > > > On Thu, Feb 9, 2012 at 17:00, PJ Eby wrote: > >> I did some crude timeit tests on frozenset(listdir()) and trapping failed >> stat calls. It looks like, for a Windows directory the size of the 2.7 >> stdlib, you need about four *failed* import attempts to overcome the >> initial caching cost, or about 8 successful bytecode imports. (For Linux, >> you might need to double these numbers; my tests showed a different ratio >> there, perhaps due to the Linux stdib I tested having nearly twice as many >> directory entries as the directory I tested on Windows!) >> > >> However, the numbers are much better for application directories than for >> the stdlib, since they are located earlier on sys.path. Every successful >> stdlib import in an application is equal to one failed import attempt for >> every preceding directory on sys.path, so as long as the average directory >> on sys.path isn't vastly larger than the stdlib, and the average >> application imports at least four modules from the stdlib (on Windows, or 8 >> on Linux), there would be a net performance gain for the application as a >> whole. (That is, there'd be an improved per-sys.path entry import time for >> stdlib modules, even if not for any application modules.) >> > > Does this comment take into account the number of modules required to load > the interpreter to begin with? That's already like 48 modules loaded by > Python 3.2 as it is. > I didn't count those, no. So, if they're loaded from disk *after* importlib is initialized, then they should pay off the cost of caching even fairly large directories that appear earlier on sys.path than the stdlib. We still need to know about NFS and other ratios, though... I still worry that people with more extreme directory sizes or slow-access situations will run into even worse trouble than they have now. > First is that if this were used on Windows or OS X (i.e. the OSs we > support that typically have case-insensitive filesystems), then this > approach would be a massive gain as we already call os.listdir() when > PYTHONCASEOK isn't defined to check case-sensitivity; take your 5 stat > calls and add in 5 listdir() calls and that's what you get on Windows and > OS X right now. Linux doesn't have this check so you would still be > potentially paying a penalty there. > Wow. That means it'd always be a win for pre-stdlib sys.path entries, because any successful stdlib import equals a failed pre-stdlib lookup. (Of course, that's just saving some of the overhead that's been *added* by importlib, not a new gain, but still...) Second is variance in filesystems. Are we guaranteed that the stat of a > directory is updated before a file change is made? > Not quite sure what you mean here. The directory stat is used to ensure that new files haven't been added, old ones removed, or existing ones renamed. Changes to the files themselves shouldn't factor in, should they? > Else there is a small race condition there which would suck. We also have > the issue of granularity; Antoine has already had to add the source file > size to .pyc files in Python 3.3 to combat crappy mtime granularity when > generating bytecode. If we get file mod -> import -> file mod -> import, > are we guaranteed that the second import will know there was a modification > if the first three steps occur fast enough to fit within the granularity of > an mtime value? > Again, I'm not sure how this relates. 
Automatic code reloaders monitor individual files that have been previously imported, so the directory timestamps aren't relevant. Of course, I could be confused here. Are you saying that if somebody makes a new .py file and saves it, that it'll be possible to import it before it's finished being written? If so, that could happen already, and again caching the directory doesn't make any difference. Alternately, you could have a situation where the file is deleted after we load the listdir(), but in that case the open will fail and we can fall back... heck, we can even force resetting the cache in that event. I was going to say something about __pycache__, but it actually doesn't > affect this. Since you would have to stat the directory anyway, you might > as well just stat directory for the file you want to keep it simple. Only > if you consider __pycache__ to be immutable except for what the interpreter > puts in that directory during execution could you optimize that step (in > which case you can stat the directory once and never care again as the set > would be just updated by import whenever a new .pyc file was written). > > Having said all
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Feb 10, 2012 3:38 PM, "Brett Cannon" wrote: > On Fri, Feb 10, 2012 at 15:07, PJ Eby wrote: >> On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon wrote: >>> First is that if this were used on Windows or OS X (i.e. the OSs we support that typically have case-insensitive filesystems), then this approach would be a massive gain as we already call os.listdir() when PYTHONCASEOK isn't defined to check case-sensitivity; take your 5 stat calls and add in 5 listdir() calls and that's what you get on Windows and OS X right now. Linux doesn't have this check so you would still be potentially paying a penalty there. >> >> >> Wow. That means it'd always be a win for pre-stdlib sys.path entries, because any successful stdlib import equals a failed pre-stdlib lookup. (Of course, that's just saving some of the overhead that's been *added* by importlib, not a new gain, but still...) > > > How so? import.c does a listdir() as well (this is not special to importlib). IIRC, it does a FindFirstFile on Windows, which is not the same thing. That's one system call into a preallocated buffer, not a series of system calls and creation of Python string objects. > Don't care about automatic reloaders. I'm just asking about the case where the mtime granularity is coarse enough to allow for a directory change, an import to execute, and then another directory change to occur all within a single mtime increment. That would lead to the set cache to be out of date. Ah. Good point. Well, if there's any way to know what the mtime granularity is, we can avoid the race condition by never performing the listdir when the current clock time is too close to the stat(). In effect, we can bypass the optimization if the directory was just modified. Something like:

    mtime = stat(dir).st_mtime
    if abs(time.time()-mtime)>unsafe_window:
        old_mtime, files = cache.get(dir, (-1, ()))
        if mtime!=old_mtime:
            files = frozenset(listdir(dir))
            cache[dir] = mtime, files
        # code to check for possibility of importing
        # and shortcut if found, or
        # exit with failure if no matching files
    # fallthrough to direct filesystem checking

The "unsafe window" is presumably filesystem and platform dependent, but ISTR that even FAT filesystems have 2-second accuracy. The other catch is the relationship between st_mtime and time.time(); I assume they'd be the same in any sane system, but what if you're working across a network and there's clock skew? Ugh. Worst case example would be say, accessing a FAT device that's been shared over a Windows network from a machine whose clock is several hours off. So it always looks safe to read, even if it's just been changed. What's the downside in that case? You're trying to import something that just changed in the last fraction of a second... why? I mean, sure, the directory listing will be wrong, no question. But it only matters that it was wrong if you added, removed, or renamed importable files. Why are you trying to import one of them? Ah, here's a use case: you're starting up IDLE, and while it's loading, you save some .py files you plan to import later. Your editor saves them all at once, but IDLE does the listdir() midway through. You then do an import from the IDLE prompt, and it fails because the listdir() didn't catch everything. Okay, now I know how to fix this. The problem isn't that there's a race condition per se, the problem is that the race results in a broken cache later. After all, it could just as easily have been the case that the import failed due to timing.
The problem is that all *future* imports would fail in this circumstance. So the fix is a time-to-live recheck: if TTL seconds have passed since the last use of the cached frozenset, reload it, and reset the TTL to infinity. In other words:

    mtime = stat(dir).st_mtime
    now = time.time()
    if abs(now-mtime)>unsafe_window:
        old_mtime, then, files = cache.get(dir, (-1, now, ()))
        if mtime!=old_mtime or then is not None and now-then>TTL:
            files = frozenset(listdir(dir))
            cache[dir] = mtime, now if mtime!=old_mtime else None, files
        # code to check for possibility of importing
        # and shortcut if found, or
        # exit with failure if no matching files
    # fallthrough to direct filesystem checking

What this does (or should do) is handle stale caches caused by the clock-skew race condition, by reloading the listdir even if mtime hasn't changed, as soon as TTL seconds have passed since the last snapshot was taken. However, if the mtime stays the same, no subsequent listdirs will occur. As long as the TTL is set high enough that a full startup of Python can occur, but low enough that it resets by the time a hum
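For reference, a self-contained version of that sketch might look like the following; the cache layout, the constants, and the return-None-to-fall-back convention are illustrative assumptions rather than code from importlib:

    import os, time

    UNSAFE_WINDOW = 2.0   # assumed worst-case mtime granularity / clock-skew margin
    TTL = 5.0             # how long to keep trusting a snapshot taken near a change
    _cache = {}           # dir -> (mtime, snapshot_time_or_None, frozenset of names)

    def cached_names(dir):
        """Return a frozenset of names in `dir`, or None to force direct checks."""
        mtime = os.stat(dir).st_mtime
        now = time.time()
        if abs(now - mtime) <= UNSAFE_WINDOW:
            return None   # directory was just modified; fall through to the filesystem
        old_mtime, then, files = _cache.get(dir, (-1, now, frozenset()))
        if mtime != old_mtime or (then is not None and now - then > TTL):
            files = frozenset(os.listdir(dir))
            # restart the TTL countdown only if the directory actually changed
            _cache[dir] = (mtime, now if mtime != old_mtime else None, files)
        return files

A finder would then test candidate filenames against the returned set, and only fall back to individual stat()/open() calls when it gets None.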
Re: [Python-Dev] [Python-checkins] cpython: Issue #14043: Speed up importlib's _FileFinder by at least 8x, and add a new
On Mon, Feb 20, 2012 at 1:20 PM, Brett Cannon wrote: > On Sun, Feb 19, 2012 at 22:15, Nick Coghlan wrote: > >> However, "very cool" on adding the caching in the default importers :) > > > Thanks to PJE for bringing the idea up again and Antoine discovering the > approach *independently* from PJE and myself and actually writing the code. > Where is the code, btw? (I looked at your sandbox and didn't see it.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 414
On Sat, Mar 3, 2012 at 5:02 AM, Lennart Regebro wrote: > I'm not sure that's true at all. In most cases where you support both > Python 2 and Python 3, most strings will be "native", ie, without > prefix in either Python 2 or Python 3. The native case is the most > common case. > Exactly. The reason "native strings" even exist as a concept in WSGI was to make it so that the idiomatic manipulation of header data in both Python 2 and 3 would use plain old string constants with no special wrappers or markings. What's thrown the monkey wrench in here for the WSGI case is the use of unicode_literals. If you simply skip using unicode_literals for WSGI code, you should be fine with a single 2/3 codebase. But then you need some way to mark some things as unicode... which is how we end up back at this PEP. I suppose WSGI could have gone the route of using byte strings for headers instead, but I'm not sure it would have helped. The design goals for the PEP were to sanely support both 2to3 and 2+3 single codebases, and WSGI does actually do that... for the code that's actually doing WSGI stuff. Ironically enough, the effect of the WSGI API is that it's all the *non* WSGI-specific code in the same module that ends up needing to mark its strings as unicode... or else it has to use unicode_literals and mark all the WSGI code with str(). There's really no good way to deal with a *mixed* WSGI/non-WSGI module, except to use explicit markers on one side or the other. Perhaps the simplest solution of all might be to just isolate direct WSGI code in modules that don't import unicode_literals. Web frameworks usually hide WSGI stuff away from the user anyway, and many are already natively unicode in their app-facing APIs. So, if a framework or library encapsulates WSGI in a str-safe/unicode-friendly API, this really shouldn't be an issue for the library's users. But I suppose somebody's got to port the libraries first. ;-) If anyone's updating porting strategy stuff, a mention of this in the tips regarding unicode_literals would be a good idea. i.e., something like: "If you have 2.x modules which work with WSGI and also contain explicit u'' strings, you should not use unicode_literals unless you are willing to explicitly mark all WSGI environment and header strings as native strings using 'str()'. This is necessary because WSGI headers and environment keys/values are defined as byte strings in Python 2.x, and unicode strings in 3.x. Alternatively, you may continue to use u'' strings if you are targeting Python 3.3+ only, or can use the import or install hooks provided for Python 3.2, or if you are using 2to3... but in this case you should not use unicode_literals." That could probably be written a lot more clearly. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
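To make the "native string" point concrete, here is a sketch of the idiom (not code from any particular framework):

    # Without unicode_literals, these bare literals are "native strings":
    # byte strings on Python 2, unicode strings on Python 3 -- which is
    # exactly what WSGI wants for the status line and the header names/values.

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'Hello, world!']   # the response body, by contrast, is always bytes

    # With `from __future__ import unicode_literals` in effect, the header
    # constants above would become unicode on Python 2, so each one would need
    # an explicit str() wrapper to stay native, e.g.:
    #     start_response(str('200 OK'), [(str('Content-Type'), str('text/plain'))])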
Re: [Python-Dev] Non-string keys in type dict
On Wed, Mar 7, 2012 at 8:39 PM, Victor Stinner wrote: > So my question is: what is the use case of such dict? Well, I use them for this: http://pypi.python.org/pypi/AddOns (And I have various other libraries that depend on that library.) Short version: AddOns are things you can use to dynamically extend instances -- a bit like the "decorator" in "decorator pattern" (not to be confused with Python decorators). Rather than synthesize a unique string as a dictionary key, I just used the AddOn classes themselves as keys. This works fine for object instances, but gets hairy once classes come into play. ( http://pypi.python.org/pypi/AddOns#class-add-ons - an orthogonal alternative to writing hairy metaclasses with registries for special methods, persisted attributes, and all other sorts of things one would ordinarily use metaclasses for.) In principle, I could refactor AddOns to use synthetic (i.e. made-up) strings as keys, but it honestly seemed unpythonic to me to make up a key when the One Obvious key to use is the AddOn type itself. (Or in some cases, a tuple comprised of an AddOn type plus additional values - which would mean string manipulation for every access.) Another possible solution would be to not store addons directly in a class' dictionary, but instead throw in an __addons__ key with a subdictionary; again this seemed like pointless indirection, wasted memory and access time when there's already a perfectly good dictionary lying about. IOW, it's one of those places where Python's simple orthogonality seems like a feature rather than a bug that needs fixing. I mean, next thing you know, people will be saying that *instance* dictionaries need to have only string keys or something. ;-) Of course, if my library has to change to be able to work on 3.3, then I guess it'll have to change. IIRC, this is *probably* the only place I'm using non-string keys in type or instance dictionaries, so in the big scheme of porting costs, it's not that much. But, since you asked, that's the main use case I know of for non-string keys in type dictionaries, and I wouldn't be terribly surprised if I'm the only person with public code that does this. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
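A stripped-down sketch of the pattern in question (the names here are invented for illustration; the real AddOns API is more involved than this):

    class AddOnExample(object):
        """Attach at most one instance of this add-on to any given object."""

        @classmethod
        def for_object(cls, obj):
            d = obj.__dict__
            try:
                return d[cls]                 # the add-on class itself is the key
            except KeyError:
                addon = d[cls] = cls(obj)     # no synthetic string key needed
                return addon

        def __init__(self, subject):
            self.subject = subject

    class Thing(object):
        pass

    t = Thing()
    assert AddOnExample.for_object(t) is AddOnExample.for_object(t)

Doing the same thing with a class __dict__ as the target is the part that a string-keys-only restriction would break.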
Re: [Python-Dev] Non-string keys in type dict
On Thu, Mar 8, 2012 at 2:43 AM, Ethan Furman wrote: > > PJ Eby wrote: >> >> Short version: AddOns are things you can use to dynamically extend instances -- a bit like the "decorator" in "decorator pattern" (not to be confused with Python decorators). Rather than synthesize a unique string as a dictionary key, I just used the AddOn classes themselves as keys. This works fine for object instances, but gets hairy once classes come into play. > > > Are you able to modify classes after class creation in Python 3? Without using a metaclass? For ClassAddOns, it really doesn't matter; you can't remove them from the class they attach to. AddOns created after the class is finalized use a weakref dictionary to attach to their classes. Now that I've gone back and looked at the code, the only reason that ClassAddOns even use the class __dict__ in the first place is because it's a convenient place to put them while the class is being built. With only slightly hairier code, I could use an __addons__ dict in the class namespace while it's being built, but there'll then be a performance hit at lookup time to do cls.__dict__['__addons__'][key] instead of cls.__dict__[key]. Actually, now that I'm thinking about it, the non-modifiability of class dictionaries is actually a feature for this use case: if I make an __addons__ dict, that dict is mutable. That means I'll have to move to string keys or have some sort of immutable dict type available... ;-) (Either that, or do some other, more complex refactoring.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Zipping the standard library.
On Sat, Mar 10, 2012 at 5:49 PM, Thomas Wouters wrote: > (And, yes, I'm zipping up the stdlib for Python 2.7 at Google, to reduce > the impact on the aforementioned million of machines :) > You might want to consider instead backporting the importlib caching facility, since it provides some of the zipimport benefits for plain old, non-zipped modules. Actually, a caching-only import hook that operated that way wouldn't even need the whole of importlib, just a wrapper over the standard C import that skips the unnecessary filesystem accesses. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
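A rough sketch of what such a backported, caching-only hook could look like under Python 2 (the class name and refresh policy are invented for illustration; a real version would need the mtime/TTL safeguards discussed earlier in this thread):

    import imp, os, sys

    class CachedPathFinder(object):
        """Skip the per-candidate stat()/open() calls for obvious misses."""

        def __init__(self, path):
            if not os.path.isdir(path):
                raise ImportError('only plain directories are handled here')
            self.path = path
            self.suffixes = tuple(s[0] for s in imp.get_suffixes())  # '.py', '.pyc', '.so', ...
            self.names = frozenset(os.listdir(path))

        def find_module(self, fullname, path=None):
            name = fullname.rsplit('.', 1)[-1]
            if name not in self.names and not any(
                    name + sfx in self.names for sfx in self.suffixes):
                return None   # definite miss: no filesystem access at all
            return self       # plausible hit: let imp do the real work in load_module()

        def load_module(self, fullname):
            if fullname in sys.modules:
                return sys.modules[fullname]
            name = fullname.rsplit('.', 1)[-1]
            f, pathname, desc = imp.find_module(name, [self.path])
            try:
                return imp.load_module(fullname, f, pathname, desc)
            finally:
                if f:
                    f.close()

    sys.path_hooks.append(CachedPathFinder)
    sys.path_importer_cache.clear()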
[Python-Dev] Fwd: [Import-SIG] Where to discuss PEP 382 vs. PEP 402 (namespace packages)?
Ugh; this was supposed to be sent to the list, not just Guido. (I wish Gmail defaulted to reply-all in the edit box.) -- Forwarded message -- From: PJ Eby Date: Mon, Mar 12, 2012 at 12:16 AM Subject: Re: [Import-SIG] Where to discuss PEP 382 vs. PEP 402 (namespace packages)? To: Guido van Rossum On Sun, Mar 11, 2012 at 10:39 PM, Guido van Rossum wrote: > I'm leaning towards PEP 402 or some variant. Let's have a pow-wow at > the sprint tomorrow (I'll arrive in Santa Clara between 10 and 10:30). > I do want to understand Nick's argument better; I haven't studied PEP > 395 yet. > Note that PEP 395 can stay compatible with PEP 402 by a fairly straightforward change: instead of implicitly and automagically guessing the needed sys.path[0] change, it could be made explicit by adding something like this to the top of script/modules that are inside a package:

    import pkgutil
    pkgutil.script_module(__name__, 'mypackage.thismodule')

Assuming __name__=='__main__', the API would set __main__.__qualname__, set sys.modules[qualname] = __main__, and fix up sys.path[0] if and only if it still is the parent directory of __main__.__file__. (If __name__!='__main__' and it's not equal to the second argument either, it'd be an error.) Then, in the event of broken relative imports or module aliasing, the error message can suggest adding a script_module() declaration to explicitly make the file a "dual citizen" -- i.e., script/module. (It's already possible for PEP 395 to be confused by stray __init__.py files or __path__ manipulation; using error messages and explicit declaration instead of guessing seems like a better route for 395 to take.) Of course, it's also possible to fix the 395/402 incompatibility by reintroducing some sort of marker, such as .pyp directory extensions or by including *.pyp marker files within package directories. The problem is that these markers work against the intuitive nature of PEP 402 if they are required, and they do not help 395 if nobody uses them due to their optionality. ;-) (Last, but not least, the compromise approach: allow explicit script/module declaration as a workaround for virtual packages, AND support automagic __qualname__ recognition for self-contained packages... but still give error messages for broken relative imports and aliasing that suggest the explicit declaration.) Anyway, the other open issues for 402 are:

* Dealing with updates to sys.path
* Iterating available virtual packages

There was a Python-Dev discussion about the first, in which I realized that sys.path updates can actually be handled transparently by making virtual __path__ objects be special iterables rather than lists; but the PEP hasn't been updated to reflect that. (I was actually waiting for some sign of BDFL interest before adding a potential complication like that to the PEP.) The relevant proposal was: > This seems to lean in favor of making a simple reiterable wrapper > type for the __path__, that only allows you to take the length and > iterate over it. With an appropriate design, it could actually > update itself automatically, given a subname and a parent > __path__/sys.path. That is, it could keep a tuple copy of the > last-seen parent path, and before iteration, compare > tuple(self.parent_path) to self.last_seen_path. If they're > different, it rebuilds the value to be iterated over. 
> Voila: transparent updating of all virtual __path__ values from > sys.path changes (or modifications to self-contained __path__ > parents, btw), and trying to change it (or read an item from it > positionally) will not create any silent failures. > Alright... *if* we support automatic updates to virtual __paths__, > this is probably how we should do it. (It will require, though, that > imp.find_module be changed to use a different iteration method than > PyList_GetItem, as it's quite possible a virtual __path__ will get > passed into it.) I actually drafted an implementation of this to work with importlib, so it seems pretty feasible to support automatically-updated virtual paths that change on the next import attempt if sys.path (or any parent __path__) has changed since the last time. Iterating virtual packages is a somewhat harder problem, since it's not really practical to do an unbounded subdirectory search for importable files. Probably, the pkgutil module-walking APIs just need to grow some extra flags for virtual package searching, with some reasonable defaults. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
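A minimal sketch of the "reiterable" __path__ wrapper described in that quote (the class name and the way entries are computed for a virtual package are my guesses at the intended semantics):

    import os

    class VirtualPath(object):
        """A __path__ replacement that rebuilds itself when its parent path changes."""

        def __init__(self, subname, parent_path):
            self.subname = subname           # e.g. 'zc' for a virtual 'zc' package
            self.parent_path = parent_path   # sys.path, or the parent package __path__
            self.last_seen = None
            self.entries = []

        def _refresh(self):
            snapshot = tuple(self.parent_path)
            if snapshot != self.last_seen:
                self.last_seen = snapshot
                self.entries = [os.path.join(entry, self.subname)
                                for entry in snapshot
                                if os.path.isdir(os.path.join(entry, self.subname))]

        def __iter__(self):
            self._refresh()
            return iter(self.entries)

        def __len__(self):
            self._refresh()
            return len(self.entries)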
Re: [Python-Dev] SocketServer issues
On Wed, Mar 14, 2012 at 5:02 AM, Antoine Pitrou wrote: > On Wed, 14 Mar 2012 04:26:16 + > Kristján Valur Jónsson wrote: > > Hi there. > > I want to mention some issues I've had with the socketserver module, and > discuss if there's a way to make it nicer. > > So, for a long time we were able to create magic stackless mixin classes > for > > it, like ThreadingMixIn, and assuming we had the appropriate socket > > replacement library, be able to use it nicely using tasklets. > > I don't really think the ability to "create magic stackless mixin > classes" should be a driving principle for the stdlib. > But not needlessly duplicating functionality already elsewhere in the stdlib probably ought to be. ;-) > So, my first question is: Why not simply rely on the already built-in > timeout > > support in the socket module? > > In case you didn't notice, the built-in timeout support *also* uses > select(). > That's not really the point; the frameworks that implement nonblocking I/O by replacing the socket module (and Stackless is only one of many) won't be using that code. If SocketServer uses only the socket module's API, then those frameworks will be told about the timeout via the socket API, and can then implement it their own way. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
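As a sketch of what "relying on the socket module's timeout support" could look like (the mixin and attribute names are invented, and this is not what the stdlib currently does):

    import SocketServer   # 'socketserver' on Python 3

    class SocketTimeoutMixIn(object):
        """Express per-connection timeouts via the socket API instead of select()."""
        request_timeout = None   # seconds, or None for no timeout

        def finish_request(self, request, client_address):
            if self.request_timeout is not None:
                request.settimeout(self.request_timeout)   # socket-level, replaceable
            self.RequestHandlerClass(request, client_address, self)

    class MyServer(SocketTimeoutMixIn, SocketServer.ThreadingTCPServer):
        request_timeout = 30

A socket-replacement framework would then see the timeout through its own settimeout() implementation, without SocketServer ever having to call select() itself.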
Re: [Python-Dev] SocketServer issues
On Wed, Mar 14, 2012 at 12:29 PM, Antoine Pitrou wrote: > On Wed, 14 Mar 2012 12:17:06 -0400 > PJ Eby wrote: > > That's not really the point; the frameworks that implement nonblocking > I/O > > by replacing the socket module (and Stackless is only one of many) won't > be > > using that code. > > Then they should also replace the select module. > That actually sounds like a good point. ;-) I'm not the maintainer of any of those frameworks, but IIRC some of them *do* replace it. Perhaps this would solve Stackless's problem here too? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] svn.python.org and buildbots down
On Mar 19, 2012 1:20 PM, "Ned Deily" wrote: > > In article <20120319142539.7e83c...@pitrou.net>, > Antoine Pitrou wrote: > > [...] As for svn.python.org, is anyone > > using it? > > The repo for the website (www.python.org) is maintained there. It's also still setuptools' official home, though I've been doing some work recently on migrating it to hg. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com