Re: [Python-Dev] The fate of 3.0.*
>So what are the expected efforts for 3.1?
>- io-in-C
>- import-in-Python
>- ... anything else?

A fixed "email" module.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] pyc files, constant folding and borderline portability issues
On 07/04/2009, at 7:27 AM, Guido van Rossum wrote:
> On Mon, Apr 6, 2009 at 7:28 AM, Cesare Di Mauro wrote:
>> The Language Reference says nothing about the effects of code
>> optimizations. I think it's a very good thing, because we can do some
>> work here with constant folding.
>
> Unfortunately the language reference is not the only thing we have to
> worry about. Unlike languages like C++, where compiler writers have the
> moral right to modify the compiler as long as they stay within the
> weasel-words of the standard, in Python, users' expectations carry
> value. Since the language is inherently not that fast, users are not
> all that focused on performance (if they were, they wouldn't be using
> Python). Unsurprising behavior OTOH is valued tremendously.

Rather than trying to get the optimizer to guess, why not have a "const"
keyword and make it explicit? The result would be a symbol that
essentially only exists at compile time: references to the symbol would
be replaced by the computed value while compiling. Okay, maybe that would
suck a bit (no symbolic debug output).

Yeah, I know... take it to python-wild-and-ill-considered-id...@python.org .
Re: [Python-Dev] Issues with Py3.1's new ipaddr
On 03/06/2009, at 3:56 AM, Jean-Paul Calderone wrote:
> On Tue, 02 Jun 2009 19:34:11 +0200, "Martin v. Löwis" wrote:
>> [snip]
>>> You seem comfortable with these quirks, but then you're not planning
>>> to write software with this library. Developers who do intend to
>>> write meaningful network applications do seem concerned, yet we're
>>> ignored.
>>
>> I don't hear a public outcry - only a single complainer.
>
> Clay repeatedly pointed out that other people have objected to ipaddr
> and been ignored. It's really, really disappointing to see you continue
> to ignore not only them, but the repeated attempts Clay has made to
> point them out.
>
> I don't have time to argue this issue, but I agree with essentially
> everything Clay has said in this thread, and I commented about these
> problems on the ticket months ago, before ipaddr was added.

Indeed... "Me too" - I've been quietly concerned with these issues, but
have not said anything, as Clay's postings pretty much cover it (and
swine flu response is trumping all my other priorities).
Re: [Python-Dev] Issues with Py3.1's new ipaddr
On 03/06/2009, at 12:39 PM, Guido van Rossum wrote:
> I'm disappointed in the process -- it's as if nobody really reviewed the
> API until it was released with rc1, and this despite there being a
> significant discussion about its inclusion and alternatives months ago.
> (Don't look at me -- I wouldn't recognize a netmask if it bit me in the
> behind, and I can honestly say that I don't know whether /8 means to
> look only at the first 8 bits or whether it means to mask off the last
> 8 bits.)

I hope we can learn from this. When including third-party modules in the
standard library, we've generally only included them after they have
gained broad acceptance in the community. In this case, however, it seems
that while the ipaddr module had acceptance within Google, it had not had
much exposure to the broader Python community.

I think if anyone other than Guido had proposed adding the module to the
standard library, we would not have even considered it until it had spent
some time standing on its own two feet.
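Guido's aside asks a concrete question. In CIDR notation, "/8" counts the leading bits that form the network part. The remaining bits are host bits. The sketch below uses the standard-library `ipaddress` module, which arrived later (Python 3.3) as the eventual outcome of PEP 3144. It is not the ipaddr reference implementation discussed in this thread.

```python
import ipaddress

# "/8": the leading 8 bits identify the network; the remaining 24 bits
# are host bits that vary within it.
net = ipaddress.ip_network("10.0.0.0/8")

print(net.prefixlen)   # 8
print(net.netmask)     # 255.0.0.0      (leading 8 bits set)
print(net.hostmask)    # 0.255.255.255  (trailing 24 bits set)
print(ipaddress.ip_address("10.200.1.7") in net)   # True
print(ipaddress.ip_address("11.0.0.1") in net)     # False
```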
Re: [Python-Dev] PEP 3144 review.
>I believe PEP 3144 is ready for your review. When you get a chance,
>can you take a look/make a pronouncement?

In my experience it is common to leave out the masked octets when
referring to an IPv4 network (the octets are assumed to be zero), so I
don't agree with this behaviour from the reference implementation:

    >>> ipaddr.IPv4Network('10/8')
    IPv4Network('0.0.0.10/8')
    >>> ipaddr.IPv4Network('192.168/16')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/src/py/ipaddr/ipaddr.py", line 1246, in __init__
        raise IPv4IpValidationError(addr[0])
    ipaddr.IPv4IpValidationError: '192.168' is not a valid IPv4 address

I also couldn't see an easy way to get from a network address to the
containing network. For example:

    >>> ipaddr.IPv4Network('192.168.1.1/16')
    IPv4Network('192.168.1.1/16')

This is close:

    >>> ipaddr.IPv4Network('192.168.1.1/16').network
    IPv4Address('192.168.0.0')

What I want is a method that returns:

    IPv4Network('192.168.0.0/16')

I appreciate these requests are somewhat contradictory (one calls for
masked octets to be insignificant, the other calls for them to be
significant), but they are both valid use cases in my experience.

Apologies if these have already been covered in prior discussion - I've
tried to keep up, but I haven't been able to give it the attention it
deserves.

I also note that many methods in the reference implementation are not
discussed in the PEP. While I don't consider this a problem for the PEP,
anyone reviewing the module for inclusion in the standard lib needs to
consider them.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
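Both requests above can be sketched with the later standard-library `ipaddress` module (Python 3.3+), not the ipaddr code being reviewed. That module also rejects strings with fewer than four octets. The helper `expand_abbreviated_ipv4` is therefore hypothetical: it pads the missing trailing octets with zeros, as the message describes. The containing-network request maps onto `ip_interface(...).network` or `ip_network(..., strict=False)`.

```python
import ipaddress


def expand_abbreviated_ipv4(text: str) -> str:
    """Hypothetical helper: treat omitted trailing octets as zero, so
    '10/8' -> '10.0.0.0/8' and '192.168/16' -> '192.168.0.0/16'."""
    addr, sep, prefix = text.partition("/")
    octets = addr.split(".")
    if not 1 <= len(octets) <= 4:
        raise ValueError(f"not an IPv4 prefix: {text!r}")
    octets += ["0"] * (4 - len(octets))
    return ".".join(octets) + sep + prefix


print(ipaddress.ip_network(expand_abbreviated_ipv4("10/8")))        # 10.0.0.0/8
print(ipaddress.ip_network(expand_abbreviated_ipv4("192.168/16")))  # 192.168.0.0/16

# Second request: from a host address plus prefix, get the containing network.
print(ipaddress.ip_interface("192.168.1.1/16").network)             # 192.168.0.0/16
print(ipaddress.ip_network("192.168.1.1/16", strict=False))         # 192.168.0.0/16
```

The padding deliberately differs from the classic `inet_aton` reading, where "10" means 0.0.0.10. That classic reading is what produced the `IPv4Network('0.0.0.10/8')` result quoted above.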
Re: [Python-Dev] PEP 3144 review.
>>> I don't see any valid reason for entering a network as "192.168.1.1/24"
>>> rather than the canonical "192.168.1.0/24". The former might indicate a
>>> typing error or a mental slip, so let's be helpful and signal it to the
>>> user.
>>
>> Or perhaps there can be an optional "strict=True" (or "strict=False")
>> argument to the constructor / parsing function.
>
>I can live w/ a default of strict=False. there are plenty of cases
>where it's not an error and easy enough ways to check, if the
>developer is concerned, with or without an option. eg if addr.ip !=
>addr.network:

I agree - there are definitely times when it is not an error, but I don't
like the idea of a "strict" flag.

I've done a bit of everything - router configs with a national ISP,
scripts to manage host configuration, user interfaces, you name it. The
way I see it, we need:

  * Two address classes that describe a single IP end-point - "Address"
    with no mask and "AddressWithMask" (the latter being the current
    Network class, minus the container-like behaviour).

  * A "Network" container-like class. Same as the current Network class,
    but addresses with any masked (host) bits set would be considered an
    error.

This is along the lines that RDM was suggesting, except that we remove the
container behaviour from AddressWithMask.

Additionally:

  * The .network attribute on an AddressWithMask would return a Network
    instance.

  * An Address class would not have a .network attribute.

  * Network.__contains__() would accept Network, Address and
    AddressWithMask. Only Network implements __contains__ - an
    AddressWithMask can't contain another address, although its .network
    can.

  * Maybe an Address should compare equal with an AddressWithMask if the
    address is identical and the mask is equivalent to /32?
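The three-class split proposed above can be sketched for IPv4 only. The class names follow the message. The fields (`value`, `base`, `prefixlen`) and the helper `_prefix_to_mask` are hypothetical, and addresses are plain integers so the sketch stays self-contained. It is not the ipaddr API.

```python
from dataclasses import dataclass


def _prefix_to_mask(prefixlen: int) -> int:
    """Return a 32-bit IPv4 netmask with the leading `prefixlen` bits set."""
    if not 0 <= prefixlen <= 32:
        raise ValueError(f"bad prefix length: {prefixlen}")
    return (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF


@dataclass(frozen=True)
class Address:
    """A single end-point with no mask and, deliberately, no .network."""
    value: int


@dataclass(frozen=True)
class AddressWithMask:
    """An end-point plus a mask; it can derive a Network but is not one."""
    address: Address
    prefixlen: int

    @property
    def network(self) -> "Network":
        mask = _prefix_to_mask(self.prefixlen)
        return Network(self.address.value & mask, self.prefixlen)


@dataclass(frozen=True)
class Network:
    """A container of addresses; any set host bits are an error."""
    base: int
    prefixlen: int

    def __post_init__(self):
        if self.base & ~_prefix_to_mask(self.prefixlen):
            raise ValueError("host bits set in network address")

    def __contains__(self, other) -> bool:
        mask = _prefix_to_mask(self.prefixlen)
        if isinstance(other, Network):
            # A subnet is inside this network if it is at least as specific
            # and its base address falls within our prefix.
            return other.prefixlen >= self.prefixlen and other.base & mask == self.base
        if isinstance(other, AddressWithMask):
            other = other.address
        if isinstance(other, Address):
            return other.value & mask == self.base
        return False


net = Network(0xC0A80000, 16)                     # 192.168.0.0/16
host = AddressWithMask(Address(0xC0A80101), 16)   # 192.168.1.1/16
print(host.network == net)                        # True
print(host in net, Address(0x0A000001) in net)    # True False
print(Network(0xC0A80100, 24) in net)             # True
try:
    Network(0xC0A80101, 16)                       # host bits set
except ValueError as exc:
    print("rejected:", exc)
```

Only `Network` defines `__contains__`, so `x in host` raises `TypeError` rather than quietly treating an address as a container.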
Personally, I don't see a strong use-case for the list-like indexing and
iteration behaviour - I think it's enough to implement some basic
container behaviour, but I won't object to the iterator and indexing,
provided they don't distort the rest of the design (which I fear they are
doing now). Iterating or indexing a network should return Address or
AddressWithMask instances - if the latter, the mask should match the
parent network's mask.

I'm not particularly wedded to the name "AddressWithMask" - maybe it
could be NetworkAddress or MaskedAddress or ?

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
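The "if the latter, the mask should match the parent network's mask" option can be sketched with the later standard-library `ipaddress` module (Python 3.3+). Plain iteration there yields bare `IPv4Address` objects, which corresponds to the first option. The helper `iter_with_mask` is hypothetical; it re-wraps each address as an interface carrying the parent's prefix length.

```python
import ipaddress


def iter_with_mask(network):
    """Hypothetical helper: yield every address in `network` as an
    interface object that carries the parent network's prefix length."""
    for addr in network:
        yield ipaddress.ip_interface(f"{addr}/{network.prefixlen}")


net = ipaddress.ip_network("192.168.1.0/30")
for iface in iter_with_mask(net):
    # Each item knows its address and, via .network, its parent network.
    print(iface, iface.network == net)
```

Iterating `net` directly includes the network and broadcast addresses. `net.hosts()` is the stdlib's way to skip them.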
Re: [Python-Dev] PEP 3144 review.
>R. David Murray wrote:
>
>> A network is conventionally represented by an IP address in which the
>> bits corresponding to the one bits in the netmask are set to zero, plus
>> the netmask.
>
>Okay, that's clarified things for me, thanks.

Put another way, an "Address" describes a single end-point and a
"Network" describes a set of (contiguous) Addresses.

Where things have become confused is that, for practical reasons, it is
convenient to have a representation for an Address and its containing
Network (the latter can be derived from the Address and a mask). We tried
to make the current Network entity do double-duty, but it is just leading
to confusion. This is why I propose there be three entities:

  * an Address entity (same as the current one)
  * a Network entity (like now, but requires masked bits to be zero)
  * an AddressWithMask entity (existing Network, but no container
    behaviour)

There is a school of thought that says we only need a single class that
behaves like the current Network entity - end-points are simply
represented by an all-ones mask. This is, I think, where we started. But
this scheme was rejected.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
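The split described above roughly matches what the standard library eventually shipped as `ipaddress` (Python 3.3+), a different codebase from the ipaddr module discussed here. It has `IPv4Address`, `IPv4Network` and `IPv4Interface` (address plus mask). `ip_network()` rejects host bits unless `strict=False` is passed. The sketch below shows that behaviour.

```python
import ipaddress

# Default (strict=True): an address with host bits set is rejected.
try:
    ipaddress.ip_network("192.168.1.1/24")
except ValueError as exc:
    print("rejected:", exc)

# strict=False masks the host bits off instead of raising.
print(ipaddress.ip_network("192.168.1.1/24", strict=False))    # 192.168.1.0/24

# The address-plus-mask case has its own type, which behaves as an
# address; its .network attribute is the container.
iface = ipaddress.ip_interface("192.168.1.1/24")
print(iface.ip, iface.network)                                 # 192.168.1.1 192.168.1.0/24
print(ipaddress.ip_address("192.168.1.200") in iface.network)  # True
```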
Re: [Python-Dev] PEP 3144 review.
>> Some people have claimed that the gateway address of a
>> network isn't necessarily the zero address in that network.

It almost never is - conventions vary, but it is often the network
address plus one, or the broadcast address minus one.

>I'll go further: I don't think it's even legal for the gateway address to be
>the zero address of the network (and I used to program the embedded software
>in routers for a living :) ).

I don't think the RFCs forbid the zero address being used, and
"enlightened" network stacks (typically those in routers) allow it, to
achieve better utilisation of the limited IPv4 address space. For a /24 or
larger, wasting one address out of 256 isn't too bad, but it is now
typical to use much smaller nets - right down to /30.

>> If that's true, then you *can't* calculate the network
>> address from a host address and a netmask -- there isn't
>> enough information.

You can always calculate the network address from the IP address plus
mask - the network address is simply the IP address with the bits not
covered by the mask (the host bits) set to zero.

In the olden days, the mask was spelled out in octets (eg 255.255.255.0).
But we've moved to a more compact and logical notation where the number
of leading significant bits is specified (eg /24).

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
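The claim that the network address can always be computed from address and mask is just a bitwise AND. The two gateway conventions mentioned are offsets from the network and broadcast addresses. Here is a small worked sketch using the later standard-library `ipaddress` module only for parsing and printing. The example address is from the 203.0.113.0/24 documentation range, chosen arbitrarily.

```python
import ipaddress

ip = ipaddress.ip_address("203.0.113.77")
mask = ipaddress.ip_address("255.255.255.252")    # dotted form of /30

# Network address: keep the bits covered by the mask, clear the host bits.
network_int = int(ip) & int(mask)
# Broadcast address: set every host bit.
broadcast_int = network_int | (~int(mask) & 0xFFFFFFFF)

print(ipaddress.ip_address(network_int))        # 203.0.113.76
print(ipaddress.ip_address(broadcast_int))      # 203.0.113.79

# Two common (but not mandatory) gateway conventions:
print(ipaddress.ip_address(network_int + 1))    # 203.0.113.77
print(ipaddress.ip_address(broadcast_int - 1))  # 203.0.113.78

# A /30 holds 4 addresses; with network and broadcast reserved, 2 hosts remain.
net = ipaddress.ip_network(f"{ipaddress.ip_address(network_int)}/30")
print(net.num_addresses, len(list(net.hosts())))  # 4 2
```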
Re: [Python-Dev] PEP 3144 review.
>This proposal actually leads to 6 entities (3 for IPv4 and 3 for IPv6).

Yes, I know - I was just trying to keep to the point.

>It's still unclear to me what is gained by pulling AddressWithMask
>functionality out of the current network classes. It's easy enough for
>the concerned developer who to check if the entered network address
>does actually have all of its host bits set to zero. It is not my
>experience that this behavior is desired so often that having the
>network classes behave as they do now leads to a great deal of
>confusion.

I think we're in a painful middle ground now - we should either go back
to the idea of a single class (per protocol), or make the distinctions
clear (networks are containers and addresses are singletons).

Personally, I think I would be happy with a single class (but I suspect
that's just my laziness speaking). However, I think the structure and
discipline of three classes (per protocol) may actually make the concepts
easier to understand for non-experts.

A particular case in point - if you want to represent a single IP address
with netmask (say an interface), you use a Network class, not an Address
class. And the .network attribute returns an Address class!

The reason I suggest having the Network class assert that masked bits be
zero is two-fold:

  * it ensures the correct class is being used for the job
  * it ensures application-user errors are detected as early as possible

I also suggest the AddressWithMask classes not have any network/container
behaviours for a similar reason. If the developer needs these, the
.network attribute is only a lookup away.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] PEP 3144 review.
>> I think we're in a painful middle ground now - we should either go back
>> to the idea of a single class (per protocol), or make the distinctions
>> clear (networks are containers and addresses are singletons).
>>
>> Personally, I think I would be happy with a single class (but I suspect
>> that's just my laziness speaking). However, I think the structure and
>> discipline of three classes (per protocol) may actually make the concepts
>> easier to understand for non-experts.
>
>I think this is where we disagree. I don't think the added complexity
>does make it any easier to understand.

I argue that we're not actually adding any complexity: yes, we add a
class (per protocol), but we then merely relocate functionality to
clarify the intended use of the classes.

>> A particular case in point - if you want to represent a single IP address
>> with netmask (say an interface), you use a Network class, not an Address
>> class. And the .network attribute returns a Address class!
>
>Right, and I don't see where the confusion lies.

I suggest you are too close to the implementation to be surprised by
it. 8-)

>You have an address + netmask. ergo, you have a Network object.

In a common use case, however, this instance will not represent a network
at all, but an address. It will have container-like behaviour, but it
should not (this is a property of networks, not addresses). So the
instance will be misnamed and have behaviours that are, at best,
misleading.

>The single address that defines the base address (most commonly referred
>to as the network address) is an Address object. there is no netmask
>associated with that single address, ergo, it's an Address object.

I would argue that a Network never has a single address - by definition,
it has two or more nodes. A .network attribute should return a Network
instance. If you want the base address, this probably should be called
.base_address or just .address (to parallel the .netmask attribute).
>> The reason I suggest having the Network class assert that masked bits be
>> zero is two-fold:
>>
>> * it ensures the correct class is being used for the job
>> * it ensures application-user errors are detected as early as possible
>>
>> I also suggest the AddressWithMask classes not have any network/container
>> behaviours for a similar reason. If the developer needs these, the
>> .network attribute is only a lookup away.
>
>the problem I have with this approach is that it seems like a long way
>to go for a shortcut (of checking if addr.ip != addr.network: raise
>Error).

This isn't about shortcuts, but about correctness... having the Network
object represent a network, and having Address objects represent
end-points, and having errors discovered as early as possible.

What I'm arguing here is that singletons should not simultaneously be
containers - it's not pythonic, and it leads to ambiguity. The underlying
IP concepts don't require it either: an IP address is a singleton, a
network is a container, and there is no overlap. Yes, an address may be a
member of a network, and having a reference to that network on the
address object is valuable, but the address should not behave like a
network.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] PEP 3144 review.
>Another way to approach this would be for the Address object to
>potentially have a 'network' attribute referencing a Network object.

Yes - that's reasonable.

>Then there are only two classes, but three use cases are covered:
>
>1) a Network
>
>2) a plain, network-agnostic Address with network == None
>
>3) an Address with an attached Network
>
>An Address could be constructed in three ways:
>
>  Address(ip_number)
>
>  Address(ip_number, network = )
>
>  Address(ip_number, mask = )
>  # constructs and attaches a suitably-masked Network instance

I think you still need to support the common notations:

    Address('10.0.0.1')           # .network == None
    Address('10.0.0.1/255.255.255.0')
    Address('10.0.0.1/24')

>We could also have some_network[n] return an Address
>referring back to the network object it was obtained
>from.

Yes.

(Of course, we're simplifying - there would really be classes for each
protocol).

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
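The two-class variant with an optional `.network`, accepting all three notations listed above, can be sketched as follows. The `Address` class here is hypothetical and only mirrors the proposal. Parsing is delegated to the later standard-library `ipaddress` module (Python 3.3+), whose `ip_interface()` accepts both `/24` and `/255.255.255.0` suffixes.

```python
import ipaddress


class Address:
    """Hypothetical sketch of the two-class design: an address that may
    optionally carry the network it belongs to (None when no mask given)."""

    def __init__(self, text: str):
        if "/" in text:
            # ip_interface accepts both '/24' and '/255.255.255.0' suffixes.
            iface = ipaddress.ip_interface(text)
            self.ip = iface.ip
            self.network = iface.network
        else:
            self.ip = ipaddress.ip_address(text)
            self.network = None

    def __repr__(self):
        return f"Address('{self.ip}', network={self.network})"


for text in ("10.0.0.1", "10.0.0.1/255.255.255.0", "10.0.0.1/24"):
    print(Address(text))
```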
Re: [Python-Dev] PEP 3144 review.
>> I argue that we're not actually adding any complexity: yes, we add
>> a class (per protocol), but we then merely relocate functionality to
>> clarify the intended use of the classes.
>
>And I argue the moving this functionality to new classes (and adding
>new restrictions to existing classes) doesn't buy anything in the way
>of overall functionality of the module or a developer's ability to
>comprehend intended uses.

It's mostly just minor refactoring and renaming, which I think makes
things clearer, although I agree this is merely an opinion. I would be
interested to hear what others think. To summarise:

  * an address is a singleton (a network endpoint), with no container
    behaviour. It may optionally reference its network (via the .network
    attribute), and .address returns the mask-less address.

  * a network is a container-like object. For consistency, .network
    should return self, and constructing a network should raise an
    exception if the mask conflicts with the address. .address returns
    the base address, and .mask returns an address object.

>> I would argue that a Network never has a single address - by definition,
>> it has two or more nodes. A .network attribute should return a Network
>> instance. If you want the base address, this probably should be called
>> .base_address or just .address (to parallel the .netmask attribute).
>
>.network is shorthand for network address. are .network_address and
>.broadcast_address less confusing? I have to say, though,
>.network/.broadcast are fairly common (IPy uses .net, netaddr and ipv4
>use, IIRC .network...)

Yes, I understand your motivation, but I still think it's going to be
more confusing the way you have it.

>> This isn't about shortcuts, but about correctness... having the Network
>> object represent a network, and having Address objects represent
>> end-points, and having errors discovered as early as possible.
>
>Then what I don't see is the purpose of your
>network-only-network-object. essentially identical functionality can
>be obtained with the module as is w/o the added complexity of new
>classes.

Certainly, I'm not talking about adding functionality. What I am
suggesting is that if we wish to have a distinction between networks and
addresses, then that distinction should be clear and strong, such that
the choice of which to use is obvious, and if the wrong one is used, the
error is discovered as early as possible.

As the module stands, we have a pair of address-without-mask classes
called *Address, and a pair of address-with-mask classes called
*Network. So, sometimes when you want to record an *address* you use a
class called Network, and that class comes with behaviours that make no
sense in the context of a singleton network end-point (it can't "contain"
other addresses, although its .network can).

Sorry if I sound like a cracked record - these are subtle concepts, and
my ability to explain what I mean is less than is needed, but we'll get
there in the end.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] PEP 3144 review.
> > As the module stands, we have a pair of address-without-mask classes
> > called *Address, and a pair of address-with-mask classes called
> > *Network. So, sometimes when you want to record an *address* you use
> > a class called Network, and that class comes with a behaviours that
> > make no sense in the context of a singleton network end-point (it can't
> > "contain" other addresses, although it's .network can).
>
>I'm going to consistently use "address" to mean a singleton and
>"network" to mean a container in the following.

Ta. I think it's useful to have a common terminology.

>I still don't see why an address-with-mask is useful, except that the
>network is deducible as {'network': address & mask, 'mask': mask}. Is
>there *any* other way you would *ever* use that?
>
>It seems to me that for some purposes (implementing dig(1), for
>example), an IPv4Address can contain only the address (ie, a 32-bit
>integer) as a data attribute, and (with methods for using that
>attribute) that is the minimal implementation of IPv4Address.
>
>However, there are other cases (eg, routing) where it's useful to
>associate an address with its network, and I don't see much harm in
>doing so by adding a 'network' attribute to the base class
>IPv4Address, since addresses are hardly useful except in the context
>of networks. Of course that attribute is often going to be None (eg,
>in implementing dig(1) the remote nameserver is unlikely to tell you
>the netmask). However, when iterating over an IPv4Network, the
>iterator can automatically fill in the 'network' attribute, and that's
>fairly cheap.

Conceptually, you sometimes need a bare address, and other times, you
need an address with an associated network (host interface configs,
router configs, etc). By AddressWithMask, I really mean
AddressWithEnoughInformationToDeriveNetworkWhenNeeded. Conveniently, IPv4
and IPv6 addressing allows us to derive the network from the host address
combined with the netmask - in other words, we don't have to attach a
real Network object to Address objects until the user tries to access
it, and then we derive it from the address and mask.

>While to me neither the 'network' attribute nor the iterator behavior
>just described seems amazing useful in the base classes, it seems to
>me that precisely those behaviors will be reinvented over and over
>again for derived classes. Furthermore they are natural enough that
>they won't bother people who don't need them. (That's despite at
>least one person (IIRC it was Antoine) firmly saying "an IPv4Address
>should contain exactly one 32-bit int, no more, no less", so I could
>be wrong.)

If you have a .network attribute on an address object, checking if an
address is in the same network as another address becomes:

    addr_a in addr_b.network

As the module stands, you write that as:

    addr_a in addr_b

I don't think the intent is as clear with the latter.

>It seems to me that the only good reason for not having a
>'network' attribute that contains an IPv4Network instance or None is
>efficiency: the space for the attribute and the overhead of filling it
>in the iterator. I personally can't think of an application that
>would care (from what I hear, Cisco has no interest in writing its
>routers' IP stacks in Python, amazingly enough), but in theory ...

The implementation already lazily creates most things like this.

>Finally, I agree that using IPv4Network as address-with-mask is a
>confusing, undiscoverable abuse. In particular, I think that every
>time I went a week without using that idiom, I'd get nervous when I
>saw it again: "Are you *sure* that won't raise an error or silently
>get the lower bits masked off?! If not now, in the next version?"

Yes.
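The "derive the network only when the user asks for it" idea, and the `addr_a in addr_b.network` test, can be sketched as follows. `HostAddress` is a hypothetical stand-in for AddressWithMask. It relies on the later standard-library `ipaddress` module and on `functools.cached_property` (Python 3.8+). Because `HostAddress` is not itself an `ipaddress` object, the membership test uses its `.ip` attribute.

```python
import ipaddress
from functools import cached_property   # Python 3.8+


class HostAddress:
    """Hypothetical AddressWithMask: stores only the address and prefix
    length, and builds the Network object on first access."""

    def __init__(self, ip: str, prefixlen: int):
        self.ip = ipaddress.ip_address(ip)
        self.prefixlen = prefixlen

    @cached_property
    def network(self):
        # strict=False masks off the host bits to find the containing network.
        return ipaddress.ip_network(f"{self.ip}/{self.prefixlen}", strict=False)


addr_a = HostAddress("192.168.1.20", 24)
addr_b = HostAddress("192.168.1.1", 24)

print("network" in vars(addr_b))      # False: not derived yet
print(addr_a.ip in addr_b.network)    # True: the "same network" test
print("network" in vars(addr_b))      # True: now cached on the instance
```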
--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] PEP 3144 review.
>On Thu, 17 Sep 2009 10:41:37 am Andrew McNamara wrote:
>> In the olden days, the mask was spelled out in octets (eg
>> 255.255.255.0). But we've moved to a more compact and logical
>> notation where the number of leading significant bits is specified
>> (eg /24).
>
>I hope you're not suggesting the older notation be unsupported? I would
>expect to be able to use a mask like 255.255.255.192 without having to
>count bits myself.

No, of course not - I was just explaining the relationship between the
two notations for people who may not have been aware.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
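The relationship between the two notations is easy to show in code. A dotted mask is valid only if its bits are a run of ones followed by zeros, and the prefix length is the count of those ones. `mask_to_prefixlen` is a hypothetical illustration. The later standard-library `ipaddress` module (Python 3.3+) already accepts dotted masks directly, as the last line shows.

```python
import ipaddress


def mask_to_prefixlen(mask: str) -> int:
    """Hypothetical helper: convert a dotted-quad netmask to a prefix
    length, rejecting non-contiguous masks such as 255.0.255.0."""
    bits = int(ipaddress.IPv4Address(mask))
    prefixlen = bin(bits).count("1")
    # A valid netmask is `prefixlen` ones followed only by zeros.
    expected = (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF
    if bits != expected:
        raise ValueError(f"non-contiguous netmask: {mask}")
    return prefixlen


print(mask_to_prefixlen("255.255.255.192"))                        # 26
print(ipaddress.ip_network("10.0.0.0/255.255.255.192").prefixlen)  # 26
```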
Re: [Python-Dev] PEP 3144 review.
>> Conceptually, you sometimes need a bare address, and other times,
>> you need an address with an associated network (host interface
>> configs, router configs, etc). By AddressWithMask, I really mean
>> AddressWithEnoughInformationToDeriveNetworkWhenNeeded. Conveniently,
>> IPv4 and IPv6 addressing allows us to derive the network from the
>> host address combined with the netmask - in other words, we don't
>> have to attach a real Network object to Address objects until the
>> user tries to access it, and then we derive it from the address and
>> mask.
>
>To clarify: when you say "derive the network" are you talking about the
>network (which is a container) or the network address = host_address &
>netmask (which is not a container)? I think you're referring to the
>later.

I mean a Network object which is a container (which, by definition, has
a network address + mask).

>If there's need for address+netmask, does it need to be a separate
>class? Perhaps Address objects could simply have a netmask property,
>defaulting to None. If you need an "address with mask" object, you
>create an Address and set the mask:
>
>addr = Address(...)
>addr.netmask = "255.255.255.0"

Greg Ewing suggested this yesterday - I'm neutral on whether it's done
this way or as a separate class. The implementation may be somewhat
cleaner if it's a separate class, however.

>> If you have a .network attribute on an address object, checking if an
>> address is in the same network as another address becomes:
>>
>>     addr_a in addr_b.network
>>
>> As the module stands, you write that as:
>>
>>     addr_a in addr_b
>>
>> I don't think the intent is as clear with the later.
>
>I would find the later completely unclear and disturbing -- how can one
>address contain another address?

Yes - that's how it works now, and I can only see it resulting in
confusion and bugs for no advantage.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] PEP 3144 review.
>To a non-specialist, "the network address" is ambiguous. There are many
>addresses in a network, and none of them are the entire network. It's
>like saying, given a list [2, 4, 8, 12], what's "the list item"?

A "network address" is an IP address and mask, but I understand your
confusion - we're mixing terminology from disparate domains. In my
postings, I have tried to refer to Network (a container) and Address (an
item).

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] conceptual clarity
>off to patch the pep and implement some of the non controversial changes.

It might be a good idea to add some use-cases to the PEP.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] conceptual clarity
>Again, the same error-catching functionality can be obtained through
>an option to the constructor. network and broadcast attributes can be
>renamed to .\1_address to alleviate confusion as well.
>
>I mentioned before that IPy's insistence on receiving masked out
>networks was one of the main reasons I wrote ipaddr to begin with.
>Having ipaddr mimic this behavior would make it significantly less
>useful. Removing functionality in the name of avoiding confusion
>doesn't make sense when the same confusion can be alleviated w/o the
>loss.

The issue is bigger than error checking - I'm maintaining that a
distinction between an Address (singleton, item) and a Network
(container) is useful and should be embraced. The current implementation
has already partially gone this route, but hasn't completed the
transition, and so does not give users the structure that it could.
There's an obligation on modules in the standard library to provide
leadership and clarity without being dictatorial. They are essentially
silent mentors.

So, while I am not suggesting we build a bondage and discipline machine,
I am suggesting that partitioning the functionality differently will
result in a better outcome all round.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] conceptual clarity
>> It might be a good idea to add some use-cases to the PEP.
>
>There are several use-cases in the PEP already.

Maybe the use-cases deserve their own section in the PEP, or, better yet,
should be pulled up into the Motivation section.

>The problem is, for every use-case where one can show that the
>existing implementation is confusing, I can come up with a use-case
>showing where the existing implementation makes more sense than
>anything proposed.

Uh, I don't think that is the intention of use-cases - they're there to
inform the design, rather than to show how a specific implementation can
be used.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] PEP 3144 review.
>On Fri, 18 Sep 2009 11:04:46 am Andrew McNamara wrote:
>> >To a non-specialist, "the network address" is ambiguous. There are
>> > many addresses in a network, and none of them are the entire
>> > network. It's like saying, given a list [2, 4, 8, 12], what's "the
>> > list item"?
>>
>> A "network address" is an IP address and mask, but I understand your
>> confusion - we're mixing terminology from disparate domains. In my
>> postings, I have tried to refer to Network (a container) and Address
>> (an item).
>
>So to clarify, how many different things which need to be handled are
>there?
>
>Items:
>1 IP address -- a 32 bit (IPv4) or 128 bit (IPv6) number

Yes.

>2 Netmask -- a bit mask of the form 111..100..0

I don't think there's much to be gained by exposing a Netmask object, although other objects might have a .netmask property returning an IPAddress instance. Where we expose a netmask, it should be as an Address instance (or maybe a subclass with additional restrictions).

>3 Network address -- the lowest address in a network, and equal
> to (defined by?) the bitwise-AND of any address in the network
> with the network's netmask

This idea of a "network address" being simply an IP address is in error - a network address was always an address and a mask; however, in the days prior to CIDR, the mask was implicitly specified by the class of the network.

>4 Host address -- the part of the IP address that is not masked
> by the netmask

Well, yes, but I don't think we need an entity representing that.

>5 Broadcast address -- the highest address in a IPv4 network

Yes, but again, we don't need an entity - as with the netmask, when exposed, it should just be an Address instance (or subclass thereof).

>Containers:
>6 Network -- a range of IP address

Yes, although not an arbitrary or discontinuous range of addresses.

Really, I think we just need two entities (per protocol):

Address (& maybe AddressWithMask)
* If no mask is specified, this is just the IP address.
* If a mask is specified, then it gains a .network property returning a Network instance. It probably should also have a .netmask property containing an Address instance.

Network
* Has an IP address with netmask
* for consistency's sake, masked address bits are not allowed
* behaves like a read-only container wrt Addresses

So, you want to represent an interface on your host:

    >>> if_addr = IPv4Address('10.0.0.1/24')

from this, you could get:

    >>> if_addr.address
    IPv4Address('10.0.0.1')
    >>> if_addr.netmask
    IPv4Address('255.255.255.0')
    >>> if_addr.broadcast
    IPv4Address('10.0.0.255')
    >>> if_addr.network
    IPv4Network('10.0.0.0/24')

you might also have an address for the default gateway:

    >>> router_addr = IPv4Address('10.0.0.254/24')
    >>> router_addr in if_addr.network
    True

or:

    >>> router_addr = IPv4Address('10.0.0.254')
    >>> router_addr in if_addr.network
    True

Or maybe you've subnetted your LAN:

    >>> IPv4Network('10.0.0.0/24') in IPv4Network('10.0.0.0/8')
    True
    >>> IPv4Network('10.0.1.0/24') in IPv4Network('10.0.0.0/8')
    True

but:

    >>> IPv4Network('10.0.0.0/8') in IPv4Network('10.0.0.0/24')
    False

This suggests the natural behaviour if the Address mask doesn't fit in the network:

    >>> IPv4Address('10.0.0.254/8') in IPv4Network('10.0.0.0/24')
    False

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
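The standard library `ipaddress` module (Python 3.3+) ended up with a similar split, but spells the "address with mask" as `IPv4Interface` rather than `IPv4Address('10.0.0.1/24')`. The sketch below replays the interface and subnet examples against that module as an approximate translation, not the API proposed in the message: the names differ (`.ip` rather than `.address`, `network.broadcast_address` rather than `.broadcast`), and network-in-network tests use `subnet_of()` (Python 3.7+) because `in` there only answers for addresses.

```python
from ipaddress import IPv4Address, IPv4Interface, IPv4Network

# An interface carries its mask, much like the proposed AddressWithMask.
if_addr = IPv4Interface('10.0.0.1/24')
print(if_addr.ip)                         # 10.0.0.1
print(if_addr.netmask)                    # 255.255.255.0
print(if_addr.network)                    # 10.0.0.0/24
print(if_addr.network.broadcast_address)  # 10.0.0.255

# Plain addresses test membership with `in`.
router_addr = IPv4Address('10.0.0.254')
print(router_addr in if_addr.network)     # True

# Network-in-network checks use subnet_of() (Python 3.7+).
lan = IPv4Network('10.0.0.0/8')
print(IPv4Network('10.0.0.0/24').subnet_of(lan))  # True
print(IPv4Network('10.0.1.0/24').subnet_of(lan))  # True
print(lan.subnet_of(IPv4Network('10.0.0.0/24')))  # False

# Unlike the proposal, `network in network` is always False in ipaddress.
print(IPv4Network('10.0.0.0/24') in lan)          # False
```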
Re: [Python-Dev] PEP 3144 review, and the inclusion process
>I've never said otherwise. In fact, from an email last night, "If what
>the community requires is the library you've described, then ipaddr is
>not that library." The changes *you* require make ipaddr significantly
>less useful to me. I'm not prepared to make those changes in an
>attempt seek acceptance to the stdlib, especially if the stdlib is in
>such flux that I'll get to do this again in 18 months.

The point is that, having brought it to us, we all now have an interest in the outcome. Whatever goes into the standard library is going to be something that we have to live with for a long time, and now is our best chance to shape the result.

I understand your concern over introducing more classes; however, I still feel my suggested functional decomposition is worth that cost because I consider the behaviour of my suggested classes to be more intuitive. I should mention that I am not a computer scientist, and none of this is motivated by a desire for theoretical purity - just practical experience.

One of my concerns now is that if a code block receives an IPv4Network instance, it does not know whether this represents a host address with mask, or a network address. In some contexts, this distinction is critical, and confounding them can result in delayed error reporting or erroneous behaviour. Your addition of a strict flag does not completely address this concern as it assumes the instantiation and use occur in proximity, which is often not the case in large projects.

I suspect you are also mistaken in thinking my proposed changes make the module less useful for you - maybe you can describe the problem as you see it?

As a reminder to people who have come late to this thread, I proposed three classes per protocol:

    IPv?Address          A single address
    IPv?AddressWithMask  A single address with implied IPv?Network
    IPv?Network          A container-like network address (with strict mask parsing)

Further:

* Comparisons between classes should be disallowed.
* The IPv?AddressWithMask class would have .address and .mask attributes containing IPv?Addresses, and a .network attribute for the containing network (as specified by the mask, and lazily constructed).
* The IPv?Network class would have similar .address and .mask attributes.

In cases where you want to allow lax specification of network addresses, this would be spelt:

    IPv?AddressWithMask(some_address).network

At first glance, this seems somewhat roundabout; however, it makes the potential loss of bits explicit.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
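The standard library `ipaddress` module (Python 3.3+) has a close analogue of the lax spelling above: build an interface, then take its `.network`, which visibly discards the host bits. Ordering comparisons between an address and a network also raise `TypeError` there rather than guessing. A small sketch; `lax_network` is a hypothetical helper name, not part of any library:

```python
from ipaddress import IPv4Address, IPv4Network, ip_interface

def lax_network(spec):
    """Hypothetical helper: accept 'addr/mask' with host bits set and
    return the containing network, making the loss of bits explicit."""
    return ip_interface(spec).network

net = lax_network('10.0.0.1/24')
print(net)  # 10.0.0.0/24

# Ordering an address against a network is refused outright.
try:
    _ = IPv4Address('10.0.0.1') < IPv4Network('10.0.0.0/24')
    comparable = True
except TypeError:
    comparable = False
print(comparable)  # False
```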
Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)
>> On the other hand, it is dangerous to provide a polymorphic API which
>> does that more extensive parsing, because a less than paranoid
>> programmer will have very likely allowed the parsed components to
>> escape from the context where their encodings can be reliably
>> determined.  Remember, *it is unlikely that they will ever be punished
>> for their own lack of caution.*  The person who is doomed is somebody
>> who tries to take that code and reuse it in a different context.
>
>Yeah, that's the original reasoning that had me leaning towards the
>parallel API approach. If I seem to be changing my mind a lot in this
>thread it's because I'm genuinely torn between the desire to make it
>easier to port existing 2.x code to 3.x by making the current API
>polymorphic and the fear that doing so will reintroduce some of the
>exact same bytes/text confusion that the bytes/str split is trying to
>get rid of.

I don't think polymorphic APIs do anyone any favours in the long run. My experience of the Py2 email API was that it would give the developer false comfort, only to blow up when the app was in the hands of users, and it didn't seem to matter how careful I was.

Py3 has gone the pure/strict route in the core, and I think libs should be consistent with that choice. Developers will have to work a little harder, but there will be fewer surprises.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
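The strict approach argued for here amounts to decoding bytes once, at the boundary where the encoding is actually known, and handling only `str` after that. A minimal sketch of the pattern; `parse_request_target` is a hypothetical helper, and UTF-8 is an assumed encoding chosen for illustration:

```python
from urllib.parse import urlsplit, parse_qs

def parse_request_target(raw: bytes, encoding: str = 'utf-8'):
    """Hypothetical helper: decode raw bytes once, where the encoding is
    known, then work only with str so nothing downstream has to guess."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError rather than guessing
    parts = urlsplit(text)
    return parts.path, parse_qs(parts.query)

path, query = parse_request_target(b'/search?q=caf%C3%A9&page=2')
print(path)   # /search
print(query)  # {'q': ['café'], 'page': ['2']}
```

For what it's worth, the `urllib.parse` functions in Python 3.2 and later accept either `str` or `bytes`, but raise `TypeError` if the two are mixed in a single call.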
Re: [Python-Dev] Python-3000 upgrade path
>I wrote two versions of the dict views refactoring. One that turns
>d.keys() into list(d.keys()) and d.iterkeys() into iter(d.keys()).
>This one is pretty robust except if you have classes that emulate
>2.x-style dicts. But it generates ugly code. So I have a
>"light-weight" version that leaves d.keys() alone, while turning
>d.iterkeys() into d.keys(). This generates prettier code but more
>buggy. I probably should have used the heavy-duty one instead.

The ugliness is a virtue in this case as it stands out enough to motivate developers to review each case. The pretty/efficient version is tantamount to guessing, and effectively discards information in the transformation ("here be dragons").

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
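The "ugly" `list(d.keys())` rewrite is the safer default because Python 3's `keys()` returns a live view rather than a snapshot list, so 2.x code that mutates a dict while looping over `keys()` breaks once the `list()` is dropped. A short illustration in plain Python 3:

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Python 3 keys() is a live view: it reflects later changes to the dict.
view = d.keys()
d['d'] = 4
print(sorted(view))  # ['a', 'b', 'c', 'd']

# Deleting while iterating over the view fails...
try:
    for k in d.keys():
        if d[k] % 2 == 0:
            del d[k]
    view_loop_ok = True
except RuntimeError:  # "dictionary changed size during iteration"
    view_loop_ok = False

# ...whereas iterating over a list() snapshot, as the heavy-duty rewrite emits, works.
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
for k in list(d.keys()):
    if d[k] % 2 == 0:
        del d[k]
print(view_loop_ok, d)  # False {'a': 1, 'c': 3}
```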
Re: [Python-Dev] These csv test cases seem incorrect to me...
>I decided it would be worthwhile to have a csv module written in Python (no
>C underpinnings) for a number of reasons:

Several other people have already done this. I will forward you their e-mail addresses in a separate private e-mail.

>I'm far from having anything which will pass the current test suite, but in
>diagnosing some of my current failures I noticed a couple test cases which
>seem wrong. In the TestDialectExcel class I see these two questionable
>tests:
>
>    def test_quotes_and_more(self):
>        self.readerAssertEqual('"a"b', [['ab']])
>
>    def test_quote_and_quote(self):
>        self.readerAssertEqual('"a" "b"', [['a "b"']])
[...]
>Any ideas about why these test cases are in there? I can't imagine Excel
>generating either one.

The point was to produce the same results as Excel. Sure, Excel probably doesn't generate crap like this itself, but 3rd parties do, and people complain if we don't parse it just like Excel (sigh).

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
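The two disputed inputs can be fed straight to the reader to see the Excel-compatible behaviour these tests pin down: text after a closing quote is appended to the field rather than treated as an error. This assumes the default `excel` dialect with `strict=False`; with `strict=True` the reader raises `csv.Error` instead. `parse_one` is just a local convenience function:

```python
import csv

def parse_one(text, **fmtparams):
    # csv.reader accepts any iterable of lines; a one-element list is enough here.
    return list(csv.reader([text], **fmtparams))

print(parse_one('"a"b'))     # [['ab']]
print(parse_one('"a" "b"'))  # [['a "b"']]

# A strict reader rejects the same input instead of guessing.
try:
    parse_one('"a"b', strict=True)
    strict_ok = True
except csv.Error:
    strict_ok = False
print(strict_ok)  # False
```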
Re: [Python-Dev] [Csv] These csv test cases seem incorrect to me...
>IMHO these test cases are *WRONG* and it's a worry that they "work" with
>the current csv module :-(

Those tests are not "wrong" - they verify that we produce the same result as Excel when presented with those inputs, which was one of the design goals of the module (and is an important consideration for many of its users). While you might find the Excel team's choices bizarre, they are stable, and in the absence of a formal specification for "CSV", Excel's behaviour is what most users want and expect.

If you feel like extending the parser to optionally accept some other format, I have no problem. If you want to make this format the default, make sure you stick around to answer all the angry e-mail from users.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] [Fwd: PEP 0305 (small problem with the CSV reader)]
>First of all, let me say thank you for the CSV module.

Thanks.

>I've been using it and today is the first time I hit a minor bump in the road.
>What happened is I opened this file with genome annotations with a
>long field and the error "field larger than field limit" showed up.
>From what I can see it is in the "static int parse_add_char(ReaderObj
>*self, char c)" function.
>This function uses the static long field_limit = 128 * 1024; /* max
>parsed field size */
>I'm not sure if this is supposed to be recomputed or if there is
>something I need to do to change it, but for right now it just says my
>row is bigger than 131,072 and stops.
>I don't think Python 2.5 has any such string length limitations and
>this shouldn't be.

This limit was added back in January 2005 to provide some protection against the situation where the parser is returning fields directly from a file, and the file contains a mismatched quote character: this would otherwise result in the entire file being unexpectedly read into memory.

You can change the limit with the csv.field_size_limit() function. As you note, it defaults to 128K, but you can set it to anything up to (2**31)-1 or 2147483647 (about 2 billion).

BTW, I've taken the liberty of CC'ing this to the python-dev list, so the motivation for this feature is recorded - it caused me some head scratching, and I added it.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
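A minimal sketch of raising the limit for input known to contain very long fields. The 1,000,000-character value is an arbitrary assumption, and because the limit is global to the csv module, the sketch restores the previous value afterwards:

```python
import csv

long_field = 'x' * 200_000           # longer than the 131072 default
row = 'id,' + long_field

old_limit = csv.field_size_limit()   # with no argument, returns the current limit
try:
    list(csv.reader([row]))
    parsed_with_default = True
except csv.Error:                    # "field larger than field limit (131072)"
    parsed_with_default = False

csv.field_size_limit(1_000_000)      # sets a new limit and returns the old one
try:
    fields = next(csv.reader([row]))
finally:
    csv.field_size_limit(old_limit)  # the limit is global to the csv module

print(old_limit, parsed_with_default, len(fields[1]))  # 131072 False 200000
```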
Re: [Python-Dev] Summary of Tracker Issues
>> I think a single-click button "Spammer"
>> should allow committers to lock an account and hide all messages
>> and files that he sent, but that still requires somebody to implement
>> it.
>
>I'd expect that to be pretty effective -- like graffiti artists,
>spammers want their work to be seen, and a site that quickly removes
>them will not be worth the effort for them.

Unfortunately, the spammers are using automated tools to locate, register on and post to victim sites. The tools are distributed (running on compromised PCs) and massively parallel, so they really don't care that some of their posts are never seen.

I'm reluctant to mention the name of one particular tool I'm aware of, but as well as the above, it also has OCR to defeat CAPTCHA, and automatically creates throw-away e-mail accounts with a range of free web-mail providers for registration purposes.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] Summary of Tracker Issues
>Typically spammers don't go through the effort to do a custom login
>script for each different site. Instead, they do a custom login script
>for each of the various software applications that support end-user
>comments. So for example, there's a script for WordPress, and one for
>PHPNuke, and so on.

In my experience, what you say is true - the bulk of the spam comes via generic spamming software that has been hard-coded to work with a finite number of applications.

However - once you knock these out, there is still a steady stream of what are clearly human generated spams. The mind boggles at the economics or desperation that make this worthwhile.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] Summary of Tracker Issues
>> However - once you knock these out, there is still a steady stream of
>> what are clearly human generated spams. The mind boggles at the economics
>> or desperation that make this worthwhile.
>
>Actually, it doesn't cost that much, because typically the spammer can
>trick other humans into doing their work for them.
>
>Here's a simple method: Put up a free porn site, with a front page that
>says "you must be 18 or older to enter". The page also has a captcha to
>verify that you are a real person. But here's the trick: The captcha is
>actually a proxy to some other site that the spammer is trying to get
>access to. When the human enters in the correct word, the spammer's
>server sends that word to the target site, which result in a successful
>login/registration. Now that the spammer is in, they can post comments
>or whatever they need to do.

Yep - I was aware of this trick, but the ones I'm talking about have also got through filling out questionnaires, and whatnot. Certainly the same technique could be used, but my suspicion is that real people are being paid a pittance to sit in front of a PC and spam anything that moves.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
[Python-Dev] Calling back into python from C
I realise I'm going to get slapped for asking a userish question here - sorry in advance. I'm looking for an explanation for why things are the way they are; the doco and py source aren't providing the missing info, and it looks like I'm bumping into an old Python bug (fixed in r38830 by mwh on 2005-04-18).

I'm working on a C extension that needs to call back into python. Generally the GIL has been released when I need to do the callback, but I can't be sure. So I need to save the GIL state, get the lock, then restore it at the end.

As far as I can tell from the doco, the recommended way to do this is to use PyGILState_Ensure() and PyGILState_Release(), but prior to r38830, PyGILState_Release incorrectly used PyEval_ReleaseThread when it should have been using PyEval_SaveThread() (I think), and the result is SEGV. This poses a problem, as I need to support Python versions back to 2.3.

Am I correct in using PyGILState_Ensure() and PyGILState_Release()? If so, how do I support back to Py 2.3? Copy the current fixed PyGILState_Release() into my code (ick)?

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] Python developers are in demand
>I wonder if we should start maintaining a list of Python developers
>for hire somewhere on python.org, beyond the existing Jobs page. Is
>anyone interested in organizing this?

What about something a little less formal - a mailing list such as python-jobs?

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] XML codec?
>On Nov 12, 2007, at 8:16 AM, M.-A. Lemburg wrote:
>> We have a -1 from Martin and a +1 from Walter, Guido and myself.
>> Pretty clear vote if you ask me. I'd say we end the discussion here
>> and move on.
>
>If we're counting, you've got a -1 on the codec from me as well.
>Martin's right: there's no value to embedding the logic of auto-
>detection into the codec. A function somewhere in the xml package is
>all that's warranted.

I agree with Fred here - it should be a function in the xml package, not a codec. -1

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] pkgutil, pkg_resource and Python 3.0 name space packages
>The best existing indicator we have is the organization of the docs for
>the standard library. I, for one, have a hell of a difficult time finding
>modules via the "organized" table of contents in the Library Reference.
>Instead, I always go the the Global Module Index where the somewhat flat
>namespace makes it easy to go directly to the module of interest. I'm
>curious whether the other developers have had the same experience -- if
>so, then it is a bad omen for over-organizing the standard library.

I nearly always use my browser's search function to find the module of interest, so yes, I'm effectively using a flat namespace.

>Another indicator of what lies ahead is the current organization of os vs
>os.path. While that split-out was well done and necessary, I routinely
>have difficulty remembering which one contains a function of interest.

I mostly remember, but there are some notable exceptions: exists (posix system call, expect to find it in os), walk (which is the old deprecated one? have to check doc).

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] Monkeypatching idioms -- elegant or ugly?
>I think that despite the objection that monkeypatching shoudn't be
>made too easy, it's worth at looking into a unification of the API,
>features, and implementation.

I agree. The other virtue of having it in the standard library is that it's immediately recognisable for what it is.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
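To make the "immediately recognisable" point concrete, here is a minimal sketch of the kind of helper under discussion: a decorator that installs a function as a method on an existing class. The name `monkeypatch_method` and the `Greeter` class are illustrative only, not a standard library API:

```python
def monkeypatch_method(cls):
    """Hypothetical decorator factory: attach the decorated function to
    `cls` under the function's own name, and return it unchanged."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


class Greeter:
    def __init__(self, name):
        self.name = name


@monkeypatch_method(Greeter)
def greet(self):
    return f"Hello, {self.name}"


print(Greeter("world").greet())  # Hello, world
```

A single, named idiom like this is easy to grep for, which is the practical benefit of a shared API over ad hoc `setattr` calls scattered through a codebase.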
Re: [Python-Dev] [Python-3000] Removing bsddb module from py3k (was Re: No beta2 tonight)
>But sqlite is transactional, can offer cursors, getrange, etc., etc.
>
>I'm still curious as to what deep features people are using in bsddb.

It's not using "deep features", unless you define their on-disk layout as deep, but it does get used for things such as interactions with other systems - for example, using it to maintain Radius user databases for a (proprietary/commercial) Radius auth daemon. But dropping it from the core won't stop this.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] [Csv] skipfinalspace
>>>I downloaded the 2.6 source tar ball, but is it too late for new
>>>features to get into versions <3?
>>
>> Yep.

Sigh - I should slow down and actually read the e-mail I'm replying to. It is not too late to get features into versions <3. It is, however, too late to get features into 2.6, which was not what you asked, but what I was answering "Yep" to.

>>>How would you feel about adding the following tests to
>>>Lib/test/test_csv.py and getting them to pass?

I have no real objection to someone adding a skipfinalspace parameter and associated tests, although I have no time to do it myself at the moment.

>> >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
>> >"*skipinitialspace *When True, whitespace immediately following the
>> >delimiter is ignored."
>> >but my tests show whitespace at the start of any field is ignored,
>> >including the first field.
>>
>> I suspect (but I haven't checked) that it means "after the delimiter and
>> before any quoted field (or some variation on that).
>
>I agree that whitespace after the delimiter and before any quoted field is
>skipped. Also whitespace after the start of the line and before any quoted
>field is skipped.

I'm not sure if we're talking about the same thing - it seems to work as I expect it to work:

    >>> list(csv.reader([' foo, bar']))
    [[' foo', ' bar']]
    >>> list(csv.reader([' foo, bar'], skipinitialspace=1))
    [['foo', 'bar']]

BTW, I think the reason "skipinitialspace" exists at all is to support this:

    >>> list(csv.reader([' foo, " bar"']))
    [[' foo', ' " bar"']]
    >>> list(csv.reader([' foo, " bar"'], skipinitialspace=1))
    [['foo', ' bar']]

The quoting is only valid if the quote is the first character encountered in the field (this is how Excel works).
However, some other CSV generators insert a space after the comma, and expect the parser to still treat it as a quoted field - so skipinitialspace eats the space leading up to the quote, but does not eat any space after the quote (hence the "initial" in the name). For symmetry, a "skipfinalspace" option should do the same - only eat space after the closing quote (if quotes are used) - however this will be rather hard to implement as the parser state has already rolled on, and you no longer know whether the field was quoted. Eating spaces that appeared within the quotes is the wrong thing to do.

>skipinitialspace defaults to false and by the same logic skipfinalspace
>should default to false to preserve compatibility with the csv module in
>2.6. On the other hand, the switch to version 3 is as good a time as any to
>break backwards compatibility to adopt something that works better for new
>users.

No, by default it needs to work like Excel, because this is the de facto standard.

>Based on my experience parsing several hundred csv generated by many
>different people I think it would be nice to at least have a dialect that is
>excel + skipinitialspace=True + skipfinalspace=True.

Once the "skipfinalspace" parameter is implemented, there is nothing stopping you creating such a dialect in your code, but I don't support adding it to the standard library - the dialects in the std lib should be well defined (in some way).

BTW, it's not necessary to create dialect objects: as I've done above, users can pass keyword parameters to the parser if it's more convenient.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
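As the message says, a project-specific dialect needs nothing from the standard library beyond subclassing an existing one, and keyword parameters give the same result. A minimal sketch using only the existing `skipinitialspace` parameter (`skipfinalspace` does not exist, so it is left out); the class name `ExcelSkipSpace` and the registered name `'excel-skipspace'` are hypothetical:

```python
import csv

class ExcelSkipSpace(csv.excel):
    """Hypothetical dialect: Excel rules plus skipinitialspace."""
    skipinitialspace = True

csv.register_dialect('excel-skipspace', ExcelSkipSpace)

line = ' foo, " bar"'
via_dialect = list(csv.reader([line], dialect='excel-skipspace'))
via_keyword = list(csv.reader([line], skipinitialspace=True))

print(via_dialect)                 # [['foo', ' bar']]
print(via_dialect == via_keyword)  # True
```

Note that the space inside the quotes survives in both cases, which is exactly the asymmetry that makes a "final" counterpart hard to define.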
[Python-Dev] Python2.5 _sre deepcopy regression?
In versions of Python prior to 2.5, it would appear that deepcopying compiled regular expressions worked by accident:

2.4:

    >>> copy.deepcopy(re.compile(''))
    <_sre.SRE_Pattern object at 0xb7d53ef0>

2.5:

    >>> copy.deepcopy(re.compile(''))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/copy.py", line 173, in deepcopy
        y = copier(memo)
    TypeError: cannot deepcopy this pattern object

I say "by accident", since the SRE_Pattern object in 2.4 has a __deepcopy__ method which raises the "cannot deepcopy this pattern object" TypeError; however, this method isn't found by copy.deepcopy() in 2.4, so copy.deepcopy() falls back to using the pickle logic.

The _sre source has #ifdef'd-out support for __deepcopy__; issue 416670 has the gory details:

http://bugs.python.org/issue416670

Changeset 38430 on the release24-maint branch introduced the changes that stopped SRE_Pattern.__deepcopy__ being found. r38430 was a patch forward ported from 2.3, but never ported to the trunk (probably a good thing, too).

Thoughts?

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
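For readers on current Python 3, the question has since been settled: from Python 3.7 onwards compiled patterns support `copy.copy()` and `copy.deepcopy()`, and because patterns are immutable, both simply return the same object. A quick check, assuming Python 3.7 or later:

```python
import copy
import re

pattern = re.compile(r'\d+')

# Patterns are immutable, so copying is allowed and returns the original object.
clone = copy.deepcopy(pattern)
print(clone is pattern)  # True

# Containers holding patterns can therefore be deep-copied too.
config = {'digits': pattern}
config_copy = copy.deepcopy(config)
print(config_copy is config, config_copy['digits'] is pattern)  # False True
```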
Re: [Python-Dev] Python2.5 _sre deepcopy regression?
I posted this a week ago, but haven't seen any comments. Issue 416670 is probably the most relevant ticket. The buggy changeset I mention, 38430 on the release24-maint branch, is one that had been forward- and back-ported for a while. I haven't found the motivation for that change, but it hasn't been applied to any version of Python later than 2.4.

>In version of Python prior to 2.5, it would appear that deepcopying
>compiled regular expressions worked by accident:
>
>2.4:
>
>    >>> copy.deepcopy(re.compile(''))
>    <_sre.SRE_Pattern object at 0xb7d53ef0>
>
>2.5:
>
>    >>> copy.deepcopy(re.compile(''))
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>      File "/usr/lib/python2.5/copy.py", line 173, in deepcopy
>        y = copier(memo)
>    TypeError: cannot deepcopy this pattern object
>
>I say "by accident", since the SRE_Pattern object in 2.4 has
>a __deepcopy__ method which raises the "cannot deepcopy this pattern
>object" TypeError, however this method isn't found by copy.deepcopy()
>in 2.4, and copy.deepcopy() falls back to using the pickle logic.
>
>The _sre source has #ifdef-out support for __deepcopy__, issue 416670
>has the gory details:
>
>http://bugs.python.org/issue416670
>
>Changeset 38430 on the release24-maint branch introduced the changes
>that stopped SRE_Pattern.__deepcopy__ being found. r38430 was a patch
>forward ported from 2.3, but never ported to the trunk (probably a good
>thing, too).
>
>Thoughts?

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
[Python-Dev] Py2.4 _sre uses uninitialised memory (Bug 1088891)
In _sre.c, data_stack_grow() in Py2.4 uses realloc()'ed memory without initialising the newly allocated memory. For complex regexps that require additional sre stack space, this ultimately results in a core dump or corrupted heap. Filling the newly allocated memory with 0x55 makes the problem more obvious (dies on a reference to 0x5558) for me. See bug ID 1088891:

http://sourceforge.net/tracker/index.php?func=detail&aid=1088891&group_id=5470&atid=105470

Can I be the only person who crafts diabolical regexps? Here, have a lend of my brown paper bag...

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
[Python-Dev] csv module TODO list
There's a bunch of jobs we (CSV module maintainers) have been putting off - attached is a list (in no particular order):

* unicode support (this will probably uglify the code considerably).

* 8 bit transparency (specifically, allow \0 characters in source string and as delimiters, etc).

* Reader and universal newlines don't interact well, and the reader doesn't honour the Dialect's lineterminator setting. All outstanding bug IDs (789519, 944890, 967934 and 1072404) are related to this - it's a difficult problem and further discussion is needed.

* compare PEP-305 and library reference manual to the module as implemented and either document the differences or correct them.

* Address or document Francis Avila's issues as mentioned in this posting:

  http://www.google.com.au/groups?selm=vsb89q1d3n5qb1%40corp.supernews.com

* Several blogs complain that the CSV module is no good for parsing strings. Suggest making it clearer in the documentation that the reader accepts an iterable, rather than a file, and document why an iterable (as opposed to a string) is necessary (multi-line records with embedded newlines). We could also provide an interface that parses a single string (or the old Object Craft interface) for those that really feel the need. See:

  http://radio.weblogs.com/0124960/2003/09/12.html
  http://zephyrfalcon.org/weblog/arch_d7_2003_09_06.html#e335

* Compatibility API for old Object Craft CSV module?

  http://mechanicalcat.net/cgi-bin/log/2003/08/18

  For example: "from csv.legacy import reader" or something.

* Pure python implementation?

* Some CSV-like formats consider a quoted field a string, and an unquoted field a number - consider supporting this in the Reader and Writer. See:

  http://radio.weblogs.com/0124960/2004/04/23.html

* Add line number and record number counters to reader object?

* it's possible to get the csv parser to suck the whole source file into memory with an unmatched quote character. Need to limit size of internal buffer.
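The point about iterables versus strings is easiest to see with a record that spans two physical lines. A minimal sketch in Python 3 terms, wrapping a string in `io.StringIO` (with `newline=''`, as the csv documentation recommends for file objects) and reading the reader's `line_num` counter, which exists in current versions:

```python
import csv
import io

# One logical record contains a quoted newline, so it spans two source lines.
text = 'a,"b\nc",d\r\ne,f,g\r\n'
reader = csv.reader(io.StringIO(text, newline=''))

first = next(reader)
first_line = reader.line_num   # source lines consumed so far
second = next(reader)
second_line = reader.line_num

print(first, first_line)    # ['a', 'b\nc', 'd'] 2
print(second, second_line)  # ['e', 'f', 'g'] 3
```

Because the first record needs two lines from the source, a reader that was handed one string per record could not reassemble it; handing it an iterable of lines lets it pull as many as the record needs.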
Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should already have been addressed):

* remove TODO comment at top of file--it's empty
* is CSV going to be maintained outside the python tree? If not, remove the 2.2 compatibility macros for: PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc.
* inline the following functions since they are used only in one place: get_string, set_string, get_nullchar_as_None, set_nullchar_as_None, join_reset (maybe)
* rather than use PyErr_BadArgument, should you use assert? (first example, Dialect_set_quoting, line 218)
* is it necessary to have Dialect_methods, can you use 0 for tp_methods?
* remove commented out code (PyMem_DEL) on line 261
  Have you used valgrind on the test to find memory overwrites/leaks?
* PyString_AsString()[0] on line 331 could return NULL in which case you are dereferencing a NULL pointer
* not sure why there are casts on 0 pointers, lines 383-393, 733-743, 1144-1154, 1164-1165
* Reader_getiter() can be removed and use PyObject_SelfIter()
* I think you need PyErr_NoMemory() before returning on line 768, 1178
* is PyString_AsString(self->dialect->lineterminator) on line 994 guaranteed not to return NULL? If not, it could crash by passing to memmove.
* PyString_AsString() can return NULL on line 1048 and 1063, the result is passed to join_append()
* iteratable should be iterable? (line 1088)
* why doesn't csv_writerows() have a docstring? csv_writerow does
* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE
* csv_unregister_dialect, csv_get_dialect could use METH_O so you don't need to use PyArg_ParseTuple
* in init_csv, recommend using PyModule_AddIntConstant and PyModule_AddStringConstant where appropriate

Also, review comments from Jeremy Hylton, 10 Apr 2003:

I've been reviewing extension modules looking for C types that should participate in garbage collection. I think the csv ReaderObj and WriterObj should participate.
The ReaderObj contains a reference to input_iter that could be an arbitrary Python object. The iterator object could well participate in a cycle that refers to the ReaderObj. The WriterObj has a reference to a writeline callable, which could well be a method of an object that also points to the WriterObj. The Dialect object appears to be safe, because the only PyObject * it refers to should be a string. Safe until someone creates an insane string subclass <0.4 wink>.

Also, an unrelated comment about the code, the lineterminator of the Dialect is managed by a collection of little helper functions like get_string, set_string, etc. This code appears to be excessively general; since they're called only once, it seems clearer to inline the log
Re: [Python-Dev] Re: [Csv] csv module TODO list
>Andrew> There's a bunch of jobs we (CSV module maintainers) have been
>Andrew> putting off - attached is a list (in no particular order):
>...
>
>In addition, it occurred to me this evening that there's functionality in
>the csv module I don't think anybody uses.

It's very difficult to say for sure that nobody is using it once it's released to the world.

>For example, you can register CSV dialects by name, then pass in the
>string name instead of the dialect class. I'd be in favor of scrapping
>list_dialects, register_dialect and unregister_dialect altogether. While
>they are probably trivial little functions I don't think they add much if
>anything to the implementation and just complicate the _csv extension
>module slightly.

Yes, in hindsight, they're not really necessary, although I'm sure we had some motivation for them initially. That said, they're there now, and they shouldn't require much maintenance.

>I'm also not aware that anyone really uses the Sniffer class, though it
>does provide some useful functionality should you need to analyze random
>CSV files.

The comment I get repeatedly is that they don't use it because it's "too magic/scary". That's as it should be. But if it didn't exist, then someone would be requesting we add it... 8-)

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
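For readers unfamiliar with the two features being discussed, a short sketch using the Python 3 csv module: a dialect registered by name and then passed as a string, followed by a Sniffer guessing the delimiter from a sample. The dialect name "colons" and the sample data are invented for illustration.

```python
import csv

# Register a dialect under a name, then refer to it by that string.
csv.register_dialect("colons", delimiter=":", quoting=csv.QUOTE_NONE)
passwd_lines = ["root:x:0:0", "daemon:x:1:1"]
rows = list(csv.reader(passwd_lines, dialect="colons"))
registered = "colons" in csv.list_dialects()
csv.unregister_dialect("colons")

# The Sniffer inspects a sample and guesses a dialect for it; here the only
# character that occurs the same number of times on every line is ";".
sample = "name;age\nalice;30\nbob;25\n"
guessed = csv.Sniffer().sniff(sample)

print(rows, registered, guessed.delimiter)
```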
Re: [Python-Dev] csv module TODO list
>> Andrew McNamara wrote:
>>> There's a bunch of jobs we (CSV module maintainers) have been putting
>>> off - attached is a list (in no particular order):
>>> * unicode support (this will probably uglify the code considerably).
>>
>Martin v. Löwis wrote:
>> Can you please elaborate on that? What needs to be done, and how is
>> that going to be done? It might be possible to avoid considerable
>> uglification.

I'm not altogether sure there. The parsing state machine is all written in C, and deals with signed chars - I expect we'll need two versions of that (or one version that's compiled twice using pre-processor macros). Quite a large job. Suggestions gratefully received.

M.-A. Lemburg wrote:
>Indeed. The trick is to convert to Unicode early and to use Unicode
>literals instead of string literals in the code.

Yes, although it would be nice to also retain the 8-bit versions as well.

>Note that the only real-life Unicode format in use is UTF-16
>(with BOM mark) written by Excel. Note that there's no standard
>for specifying the encoding in CSV files, so this is also the only
>feasible format.

Yes - that's part of the problem I hadn't really thought about yet - the csv module currently interacts directly with files as iterators, but it's clear that we'll need to decode as we go.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
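One way to follow the "convert to Unicode early" advice in today's Python 3, sketched under the assumption that the whole file is UTF-16 with a BOM (the Excel case M.-A. Lemburg describes): decode in a text layer below the csv module, so the parser only ever sees str. The file contents are made up and held in memory.

```python
import csv
import io

# Invented file contents; encoding with "utf-16" prepends a BOM.
raw = "name,city\r\nJosé,Zürich\r\n".encode("utf-16")

# Decode early: TextIOWrapper reads the BOM, picks the byte order, and hands
# decoded str lines to the csv reader. newline="" leaves CRLF untranslated.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16", newline="")
rows = list(csv.reader(text))
print(rows)
```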
Re: [Python-Dev] csv module TODO list
>> Yes, although it would be nice to also retain the 8-bit versions as well.
>
>You can do so by using latin-1 as default encoding. Works great !

Yep, although that means we wear the cost of decoding and encoding for all 8 bit input. What does the _sre.c code do?

>Depends on your needs: CSV files tend to be small enough
>to do the decoding in one call in memory.

We are routinely dealing with multi-gigabyte csv files - which is why the original 2001 vintage csv module was written as a C state machine.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
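A sketch of the latin-1 trick suggested above, written for Python 3 with an invented helper name, parse_8bit. Because latin-1 maps each byte 0x00-0xFF to the code point with the same value, any 8-bit input survives a decode/parse/encode round trip unchanged. It pays exactly the extra decode and encode passes Andrew mentions.

```python
import csv
import io


def parse_8bit(data: bytes, **fmtparams):
    """Parse arbitrary 8-bit CSV bytes (hypothetical helper).

    latin-1 maps byte values 0-255 one-to-one onto U+0000-U+00FF, so decoding
    never fails and re-encoding each field restores the original bytes.
    """
    text = io.StringIO(data.decode("latin-1"), newline="")
    return [[field.encode("latin-1") for field in row]
            for row in csv.reader(text, **fmtparams)]


rows = parse_8bit(b'a,\xff\xfe\r\n"x,y",\x80\r\n')
print(rows)
```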
Re: [Python-Dev] csv module TODO list
>> Yep, although that means we wear the cost of decoding and encoding for
>> all 8 bit input.
>
>Right, but it makes the code very clean and straightforward.

I agree it makes for a very clean solution, and 99% of the time I'd choose that option.

>Again, it depends on what you need. If performance is critical
>then you probably need a C version written using the same trick
>as _sre.c...
>
>> What does the _sre.c code do?
>
>It comes in two versions: one for 8-bit, the other for Unicode.

That's what I thought. I think the motivations here are similar to those that drove the _sre developers.

>> We are routinely dealing with multi-gigabyte csv files - which is why the
>> original 2001 vintage csv module was written as a C state machine.
>
>I see, but are you sure that the typical Python user will have
>the same requirements to make it worth the effort (and
>complexity) ?

This is open source, so I scratch my own itch (and that of my employers) - we need fast csv parsing more than we need unicode... 8-)

Okay, assuming we go the "produce two versions via evil macro tricks" path, it's still not quite the same situation as _sre.c, which only has to deal with the internal unicode representation.

One way to approach this would be to add an "encoding" keyword argument to the readers and writers. If given, the parser would decode the input stream to the internal representation before passing it through the unicode state machine, which would yield tuples of unicode objects.

That leaves us with a bit of a problem where the source is already unicode (eg, a list of unicode strings)... hmm.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
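A minimal sketch of the "encoding keyword" idea, written as a wrapper around the Python 3 csv.reader rather than as a change to the module; the name reader_with_encoding is hypothetical. If an encoding is given, byte chunks are decoded with an incremental decoder; otherwise the lines are assumed to already be str, which covers the "source is already unicode" case. It assumes every byte chunk ends with a complete line terminator, which holds for UTF-8 data split on b"\n" but not for UTF-16.

```python
import codecs
import csv


def _decode_chunks(chunks, encoding):
    """Decode an iterable of byte chunks into str pieces, one per chunk."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:  # skip empty pieces, which csv would treat as blank lines
            yield text
    # final=True flushes buffered bytes; it raises if input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def reader_with_encoding(lines, encoding=None, **fmtparams):
    """Hypothetical wrapper, not part of the csv module."""
    if encoding is not None:
        lines = _decode_chunks(lines, encoding)
    return csv.reader(lines, **fmtparams)


encoded = ["name,city\n".encode("utf-8"), "José,Zürich\n".encode("utf-8")]
print(list(reader_with_encoding(encoded, encoding="utf-8")))
print(list(reader_with_encoding(["already,text\n"])))
```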
[Python-Dev] Re: [Csv] csv module TODO list
>Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should
>already have been addressed):

I should apologise to Neal here for not replying to him at the time.

Okay, going through the issues Neal raised...

>* remove TODO comment at top of file--it's empty

Was fixed.

>* is CSV going to be maintained outside the python tree?
>  If not, remove the 2.2 compatibility macros for:
>  PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc.

Does anyone think we should continue to maintain this 2.2 compatibility?

>* inline the following functions since they are used only in one place
>get_string, set_string, get_nullchar_as_None, set_nullchar_as_None,
>join_reset (maybe)

It was done that way as I felt we would be adding more getters and setters to the dialect object in future.

>* rather than use PyErr_BadArgument, should you use assert?
>(first example, Dialect_set_quoting, line 218)

You mean C assert()? I don't think I'm really following you here - where would the type of the object be checked in a way the user could recover from?

>* is it necessary to have Dialect_methods, can you use 0 for tp_methods?

I was assuming I would need to add methods at some point (in fact, I did have methods, but removed them).

>* remove commented out code (PyMem_DEL) on line 261
>Have you used valgrind on the test to find memory overwrites/leaks?

No, valgrind wasn't used.

>* PyString_AsString()[0] on line 331 could return NULL in which case
>you are dereferencing a NULL pointer

Was fixed.

>* not sure why there are casts on 0 pointers
>lines 383-393, 733-743, 1144-1154, 1164-1165

To make it easier when the time comes to add one of those members.

>* Reader_getiter() can be removed and use PyObject_SelfIter()

Okay, wasn't aware of PyObject_SelfIter - will fix.

>* I think you need PyErr_NoMemory() before returning on line 768, 1178

The examples I looked at in the Python core didn't do this - are you sure? (now lines 832 and 1280).
>* is PyString_AsString(self->dialect->lineterminator) on line 994
>guaranteed not to return NULL? If not, it could crash by
>passing to memmove.
>* PyString_AsString() can return NULL on line 1048 and 1063,
>the result is passed to join_append()

Looking at the PyString_AsString implementation, it looks safe (we ensure it's really a string elsewhere)?

>* iteratable should be iterable? (line 1088)

Sorry, I don't know what you're getting at here? (now line 1162).

>* why doesn't csv_writerows() have a docstring? csv_writerow does

Was fixed.

>* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE

Was fixed.

>* csv_unregister_dialect, csv_get_dialect could use METH_O
>so you don't need to use PyArg_ParseTuple

Was fixed.

>* in init_csv, recommend using
>PyModule_AddIntConstant and PyModule_AddStringConstant
>where appropriate

Was fixed.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
[Python-Dev] Re: csv module TODO list
>Quite a while ago I posted some material to the csv-list about
>problems using the csv module on Unix-style colon-separated files --
>it just doesn't deal properly with backslash escaping and is quite
>useless for this kind of file. I seem to recall the general view was
>that it wasn't intended for this kind of thing -- only the sort of csv
>that Microsoft Excel outputs/inputs, but if I am mistaken about this,
>perhaps fixing this issue might be put on the TODO-list? I'll be happy
>to re-send or summarize the relevant emails, if needed.

I think a related issue was included in my TODO list:

>* Address or document Francis Avila's issues as mentioned in this posting:
>
>http://www.google.com.au/groups?selm=vsb89q1d3n5qb1%40corp.supernews.com

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] csv module TODO list
>>>>Can you please elaborate on that? What needs to be done, and how is
>>>>that going to be done? It might be possible to avoid considerable
>>>>uglification.
>>
>> I'm not altogether sure there. The parsing state machine is all written in
>> C, and deals with signed chars - I expect we'll need two versions of that
>> (or one version that's compiled twice using pre-processor macros). Quite
>> a large job. Suggestions gratefully received.
>
>I'm still trying to understand what *needs* to be done - I would move to
>how this is done only later. What APIs should be extended/changed, and
>in what way?

That's certainly the first step, and I have to admit that I don't have a clear idea at this time - the unicode issue has been in the "too hard" basket since we started.

Marc-Andre Lemburg mentioned that he has encountered UTF-16 encoded csv files, so a reasonable starting point would be the ability to read and parse, as well as the ability to generate, one of these.

The reader interface currently returns a row at a time, consuming as many lines from the supplied iterable as it needs (with the most common iterable being a file). This suggests to me that we will need an optional "encoding" argument to the reader constructor, and that the reader will need to decode the source lines. That said, I'm hardly a unicode expert, so I may be overlooking something (could a utf-16 encoded character span a line break, for example). The writer interface probably should have similar facilities.

However - a number of people have complained about the "iterator" interface, wanting to supply strings (the iterable is necessary because a CSV row can span multiple lines). It's also conceivable that the source lines could already be unicode objects.
-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
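On the parenthetical question above about UTF-16 and line breaks: with byte-level line splitting (which is what iterating over a file opened in binary mode does), the 0x0A byte is only half of the two-byte UTF-16 newline, so the split falls inside a code unit. A small Python 3 sketch with invented data shows the misalignment, and shows that decoding before splitting into lines avoids it.

```python
import csv

raw = "a,b\nc,d\n".encode("utf-16-le")

# Naive approach: split the bytes on b"\n", as iterating a binary file does.
# In UTF-16-LE the newline is the two bytes b"\n\x00", so each split lands in
# the middle of a code unit and leaves a stray b"\x00" on the following chunk.
naive_chunks = raw.split(b"\n")
odd_lengths = [len(chunk) % 2 for chunk in naive_chunks]  # 1 = cannot decode alone

# Decoding first and splitting the resulting text avoids the problem.
rows = list(csv.reader(raw.decode("utf-16-le").splitlines(keepends=True)))
print(odd_lengths, rows)
```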
Re: [Csv] Re: [Python-Dev] csv module TODO list
>>I'm still trying to understand what *needs* to be done - I would move to
>>how this is done only later. What APIs should be extended/changed, and
>>in what way?
[...]
>The reader interface currently returns a row at a time, consuming as many
>lines from the supplied iterable as it needs (with the most common iterable
>being a file). This suggests to me that we will need an optional "encoding"
>argument to the reader constructor, and that the reader will need to
>decode the source lines. That said, I'm hardly a unicode expert, so I
>may be overlooking something (could a utf-16 encoded character span a
>line break, for example). The writer interface probably should have
>similar facilities.

Ah - I see that the codecs module provides an EncodedFile class - better to use this than add encoding/decoding cruft to the csv module.

So, do we duplicate the current reader and writer as UnicodeReader and UnicodeWriter (how else do we know to use the unicode parser)?

What about the "dialects"? I guess if a dialect uses no unicode strings, it can be applied to the current parser, but if it does include unicode strings, then the parser would need to raise an exception.

The DictReader and DictWriter classes will probably need matching UnicodeDictReader/UnicodeDictWriter versions (use common base class, just specify alternate parser).

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
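In Python 3 the csv module works on str throughout, so once a decoding layer sits underneath it, separate Unicode reader classes are not needed. A sketch using codecs.getreader (a stream-decoding relative of the EncodedFile class mentioned above) with the ordinary DictReader; the data is invented.

```python
import codecs
import csv
import io

raw = "name,city\r\nJosé,Zürich\r\n".encode("utf-8")

# codecs.getreader returns a StreamReader class; wrapping the byte stream in it
# yields decoded str lines, so the ordinary DictReader can be used unchanged.
stream = codecs.getreader("utf-8")(io.BytesIO(raw))
records = list(csv.DictReader(stream))
print(records)
```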
[Python-Dev] Re: [Csv] csv module TODO list
>There's a bunch of jobs we (CSV module maintainers) have been putting
>off - attached is a list (in no particular order):
[...]
>Also, review comments from Jeremy Hylton, 10 Apr 2003:
>
>I've been reviewing extension modules looking for C types that should
>participate in garbage collection. I think the csv ReaderObj and
>WriterObj should participate. The ReaderObj contains a reference to
>input_iter that could be an arbitrary Python object. The iterator
>object could well participate in a cycle that refers to the ReaderObj.
>The WriterObj has a reference to a writeline callable, which could well
>be a method of an object that also points to the WriterObj.

I finally got around to looking at this, only to realise Jeremy did the work back in Apr 2003 (thanks).

One question, however - the GC doco in the Python/C API seems to suggest to me that PyObject_GC_Track should be called on the newly minted object prior to returning from the initialiser (and correspondingly PyObject_GC_UnTrack should be called prior to dismantling). This isn't being done in the module as it stands. Is the module wrong, or is my understanding of the reference manual incorrect?

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
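A small Python-level illustration of the kind of cycle Jeremy describes, assuming a modern CPython whose csv reader participates in cyclic garbage collection. The LineSource class is invented for the example: it is the reader's input iterator and also holds a reference to the reader. This does not settle the PyObject_GC_Track question; it only shows the collector reclaiming such a cycle.

```python
import csv
import gc
import weakref


class LineSource:
    """An iterator that also keeps a reference to the csv reader consuming it."""

    def __init__(self, lines):
        self._lines = iter(lines)
        # csv.reader(self) calls iter(self), which returns self, so the reader
        # holds self as its input iterator while self holds the reader: a cycle.
        self.reader = csv.reader(self)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._lines)


src = LineSource(["a,b", "c,d"])
rows = list(src.reader)
probe = weakref.ref(src)
del src       # the reader still refers to the object, so refcounting alone cannot free it
gc.collect()  # the cyclic collector can, because the reader type supports GC
print(rows, probe() is None)
```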
[Python-Dev] Minor change to behaviour of csv module
I'm considering a change to the csv module that could potentially break some obscure uses of the module (but CSV files usually quote, rather than escape, so the most common uses aren't affected).

Currently, with a non-default escapechar='\\', input like:

    field one,field \
    two,field three

Returns:

    ["field one", "field \\\ntwo", "field three"]

In the 2.5 series, I propose changing this to return:

    ["field one", "field \ntwo", "field three"]

Is this reasonable? Is the old behaviour desirable in any way (we could add a switch to enable the new behaviour, but I feel that would only allow the confusion to continue)?

BTW, some of my other changes have changed the exceptions raised when bad arguments were passed to the reader and writer factory functions - previously, the exceptions were semi-random, including TypeError, AttributeError and csv.Error - they should now almost always be TypeError (like most other argument passing errors). I can't see this being a problem, but I'm prepared to listen to arguments.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
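A quick way to check which exception bad arguments produce, sketched against the Python 3 csv module; exception_type is a made-up helper, and only two kinds of bad argument are tried.

```python
import csv


def exception_type(**fmtparams):
    """Return the exception class raised when building a reader, or None."""
    try:
        csv.reader([], **fmtparams)
    except Exception as exc:
        return type(exc)
    return None


print(exception_type(delimiter=123))  # a non-string delimiter
print(exception_type(bogus=1))        # an unknown formatting parameter
print(exception_type(delimiter=";"))  # a valid call
```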
Re: [Python-Dev] Minor change to behaviour of csv module
>I'm considering a change to the csv module that could potentially break
>some obscure uses of the module (but CSV files usually quote, rather
>than escape, so the most common uses aren't affected).
>
>Currently, with a non-default escapechar='\\', input like:
>
>field one,field \
>two,field three
>
>Returns:
>
>["field one", "field \\\ntwo", "field three"]
>
>In the 2.5 series, I propose changing this to return:
>
>["field one", "field \ntwo", "field three"]
>
>Is this reasonable? Is the old behaviour desirable in any way (we could
>add a switch to enable the new behaviour, but I feel that would only
>allow the confusion to continue)?

Thinking about this further, I suspect we have to retain the current behaviour, as broken as it is, as the default: it's conceivable that someone somewhere is post-processing the result to remove the backslashes, and if we fix the csv module, we'll break their code.

Note that PEP-305 had nothing to say about escaping, nor does the module reference manual.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] Re: csv module TODO list
>I'd love to see a 'split' and a 'join' function in the csv module to
>just convert between string and list without having to bother about
>files.
>
>Something like
>
>csv.split(aStr [, dialect='excel'[, fmtparam]]) -> list object
>
>and
>
>csv.join(aList[, dialect='excel'[, fmtparam]]) -> str object
>
>Feasible?

Yes, it's feasible, although newlines can be embedded within fields of a CSV record, hence the use of the iterator, rather than working with strings. In your example above, if the parser gets to the end of the string and finds it's still within a field, I'd propose just raising an exception.

No promises, however - I only have a finite amount of time to work on this at the moment.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
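A sketch of the requested split/join pair, built on the Python 3 csv module and io.StringIO. The names csv_split and csv_join are hypothetical; the csv module has no such functions. Following the suggestion above, csv_split raises csv.Error when the string ends inside a quoted field, which it gets by passing strict=True to the reader.

```python
import csv
import io


def csv_split(text, dialect="excel", **fmtparams):
    """Hypothetical helper: parse exactly one CSV record held in a string."""
    # strict=True makes the reader raise csv.Error if the input ends inside a
    # quoted field, instead of quietly returning the partial field.
    reader = csv.reader(io.StringIO(text, newline=""), dialect,
                        strict=True, **fmtparams)
    rows = list(reader)
    if len(rows) != 1:
        raise csv.Error("expected exactly one record, got %d" % len(rows))
    return rows[0]


def csv_join(fields, dialect="excel", **fmtparams):
    """Hypothetical helper: format one record as a string, minus the line terminator."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, dialect, **fmtparams)
    writer.writerow(fields)
    return buf.getvalue()[:-len(writer.dialect.lineterminator)]


print(csv_split('a,"b,c",d'))        # ['a', 'b,c', 'd']
print(csv_join(["a", "b,c", "d"]))   # a,"b,c",d
```

Because the string is fed through io.StringIO, a quoted field containing a newline still parses as one record; only an unterminated quote is treated as an error.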
Re: [Python-Dev] Re: [Csv] Minor change to behaviour of csv module
>> Andrew explains that in the CSV module, escape characters are not
>> properly removed.
>>
>> Magnus writes:
>>> IMO this is the *only* reasonable behaviour. I don't understand why
>>> the escape character should be left in; this is one of the reasons why
>>> UNIX-style colon-separated values don't work with the current module.
>>
>> Andrew writes back later:
>>> Thinking about this further, I suspect we have to retain the current
>>> behaviour, as broken as it is, as the default: it's conceivable that
>>> someone somewhere is post-processing the result to remove the
>>> backslashes, and if we fix the csv module, we'll break their code.
>>
>> I'm with Magnus on this. No one has 4 year old code using the CSV module.
>> The existing behavior is just simply WRONG. Sure, of course we should
>> try to maintain backward compatibility, but surely SOME cases don't
>> require it, right? Can't we treat this misbehavior as an outright bug?
>
>+1 -- the nonremoval of escape characters smells like a bug to me, too.

Okay, I'm glad the community agrees (less work, less crustification).

For what it's worth, it wasn't a bug so much as a misfeature. I was explicitly adding the escape character back in. The intention was to make the feature more forgiving on users who accidentally set the escape character - in other words, only special (quoting, escaping, field delimiter) characters received special treatment. With the benefit of hindsight, that was an inadequately considered choice.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
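For reference, running the example input from this thread through the Python 3 csv module, with a backslash as escapechar, gives the behaviour proposed above: the backslash is dropped and the escaped newline is kept. This is a sketch of current behaviour, not of the 2.4 code.

```python
import csv
import io

# The example input from the thread: a backslash escapes the newline.
data = "field one,field \\\ntwo,field three\n"

rows = list(csv.reader(io.StringIO(data, newline=""), escapechar="\\"))
print(rows)
```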
[Python-Dev] csv module and universal newlines
This item, from the TODO list, has been bugging me for a while:

>* Reader and universal newlines don't interact well, reader doesn't
>  honour Dialect's lineterminator setting. All outstanding bug IDs
>  (789519, 944890, 967934 and 1072404) are related to this - it's
>  a difficult problem and further discussion is needed.

The csv parser consumes lines from an iterator, but it also has its own idea of end-of-line conventions; these are currently used only by the writer, not the reader, which is a source of much confusion. The writer, by default, also attempts to emit a \r\n sequence, which results in more confusion unless the file is opened in binary mode.

I'm looking for suggestions for how we can mitigate these problems (without breaking things for existing users).

The standard file iterator includes the end-of-line characters in the returned string. One potential solution is, then, to ignore the line chunking done by the file iterator, and logically concatenate the source lines until the csv parser's idea of lineterminator is seen - but this negates the benefits of using an iterator.

Another option might be to provide a new interface that relies on a file-like object being supplied. The lineterminator character would only be used with this interface, with the current interface falling back to using only \n. Rather a drastic solution.

Any other ideas?

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
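For comparison, the convention the csv module settled on in Python 3, as its documentation describes: open files with newline="" so the io layer does no newline translation, let the writer emit the dialect's lineterminator, and let the reader recognise \r, \n and \r\n on its own while ignoring lineterminator. A minimal sketch with in-memory data:

```python
import csv
import io

# Writer: emits the dialect's lineterminator ("\r\n" for the default "excel").
out = io.StringIO(newline="")
csv.writer(out).writerows([["a", "b"], ["c", "d"]])
written = out.getvalue()

# Reader: accepts "\r", "\n" and "\r\n" endings, even mixed in one input.
mixed = "a,b\rc,d\ne,f\r\n"
rows = list(csv.reader(io.StringIO(mixed, newline="")))
print(repr(written), rows)
```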
Re: [Python-Dev] Re: [Csv] csv module TODO list
>Would the csv module be a good place to add a DBF reader and writer?

I would have thought it would make sense as its own module (in the same way that we have separate modules that present a common interface for the different databases), or am I missing something?

I'd certainly like to see a DBF parser in python - reading and writing odd file formats is bread-and-butter for us contractors... 8-)

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] Re: [Csv] csv module and universal newlines
>You can argue that reading csv data from/writing csv data to a file on
>Windows if the file isn't opened in binary mode is an error. Perhaps we
>should enforce that in situations where it matters. Would this be a start?
>
>    terminators = {"darwin": "\r",
>                   "win32": "\r\n"}
>
>    if (dialect.lineterminator != terminators.get(sys.platform, "\n") and
>        "b" not in getattr(f, "mode", "b")):
>        raise IOError, ("%s not opened in binary mode" %
>                        getattr(f, "name", "???"))
>
>The elements of the postulated terminators dictionary may already exist
>somewhere within the sys or os modules (if not, perhaps they should be
>added). The idea of the check is to enforce binary mode on those objects
>that support a mode if the desired line terminator doesn't match the
>platform's line terminator.

Where that falls down, I think, is where you want to read an alien file - in fact, under unix, most of the CSV files I read use \r\n for end-of-line.

Also, I *really* don't like the idea of looking for a mode attribute on the supplied iterator - it feels like a layering violation. We've advertised the fact that it's an iterator, so we shouldn't be using anything but the iterator protocol.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] Re: [Csv] csv module and universal newlines
>Isn't universal newlines only used for reading?

That's right. And the CSV reader has its own version of universal newlines anyway (from the py1.5 days).

>I have had no problems using the csv module for reading files with
>universal newlines by opening the file myself or providing an iterator.

Neither have I, funnily enough.

>Unicode, on the other hand, I have had problems with.

Ah, so somebody does want it then? Good to hear. Hard to get motivated to make radical changes without feedback.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] UserString
>>         if e.errno <> errno.EEXIST:
>>             raise
>
>You have a lot more faith in the errno module than I do. Are you sure
>the same error codes work on all platforms where Python works? It's
>also not exactly readable (except for old Unix hacks).

On the other hand, LBYL ("look before you leap") in this context can result in race conditions and security vulnerabilities. "os.makedirs" is already a composite of many system calls, so all bets are off anyway, but for simpler operations that result in an atomic system call, this is important.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
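A sketch of the EAFP pattern being defended here, in Python 3; ensure_dir is an invented name. It attempts the creation and treats EEXIST as success, then confirms the result is a directory, since EEXIST is also reported when the path is an existing regular file.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and parents) unless it already exists as a directory.

    EAFP: try the creation and tolerate EEXIST. Checking os.path.isdir()
    first (LBYL) would leave a window in which another process could create
    or remove the path between the check and the call.
    """
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
    # EEXIST is also reported when `path` is an existing regular file.
    if not os.path.isdir(path):
        raise NotADirectoryError(path)


with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "a", "b")
    ensure_dir(target)
    ensure_dir(target)  # the second call is a no-op rather than an error
    print(os.path.isdir(target))
```

Python 3's os.makedirs(path, exist_ok=True) covers the same case. CPython implements it by catching OSError and re-raising unless the path is now a directory, with a source comment noting that EEXIST cannot be relied on because the operating system may report errors such as EACCES or EROFS first.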
Re: [Python-Dev] UserString
>> You have a lot more faith in the errno module than I do. Are you sure
>> the same error codes work on all platforms where Python works?
>
>No, but I'm pretty confident the symbolic names for the errors are
>consistent for any platform I've cared about.
>
>> It's also not exactly readable (except for old Unix hacks).
>
>Guilty as charged. ;)

The consistency of the semantics of core system calls is a sort of trademark of unix. Any system that claims to be Unix, but plays fast and loose with semantics soon gets a very poor reputation (xenix, cough). All well-coded unix apps are dependent on system calls returning consistent errno values. Which is one thing that makes life so difficult for "posix" environments layered on other operating systems.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
[Python-Dev] Faster Set.discard() method?
To avoid the exception in the discard method, it could be implemented as:

    def discard(self, element):
        """Remove an element from a set if it is a member.

        If the element is not a member, do nothing.
        """
        try:
            self._data.pop(element, None)
        except TypeError:
            transform = getattr(element, "__as_temporarily_immutable__", None)
            if transform is None:
                raise # re-raise the TypeError exception we caught
            # pop rather than del, so a missing element is still ignored
            self._data.pop(transform(), None)

Currently, it's implemented as the much clearer:

    try:
        self.remove(element)
    except KeyError:
        pass

But the dict.pop method is about 12 times faster. Is this worth doing?

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
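A rough way to compare the two discard styles on a plain dict in Python 3, timing only the miss case, where the exception-based version actually raises. The function names are invented, and the ratio depends on interpreter version and machine; the "about 12 times" figure above is the original 2.4-era measurement and is not reproduced here.

```python
import timeit


def discard_via_exception(d, key):
    """The 'delete, then ignore KeyError' style used by sets.Set.discard."""
    try:
        del d[key]
    except KeyError:
        pass


def discard_via_pop(d, key):
    """The dict.pop(key, None) style proposed above."""
    d.pop(key, None)


data = dict.fromkeys(range(1000))
n = 20_000
# -1 is never a key, so every call exercises the "not a member" path.
t_exception = timeit.timeit(lambda: discard_via_exception(data, -1), number=n)
t_pop = timeit.timeit(lambda: discard_via_pop(data, -1), number=n)
print(f"exception: {t_exception:.4f}s  pop: {t_pop:.4f}s  "
      f"ratio: {t_exception / t_pop:.1f}")
```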
Re: [Python-Dev] Faster Set.discard() method?
>> But the dict.pop method is about 12 times faster. Is this worth doing?
>
>The 2.4 builtin set's discard function looks like it does roughly the same
>as the 2.3 sets.Set. Have you tried comparing a C version of your version
>with the 2.4 set to see if there are speedups there, too?

Ah. I had forgotten it was builtin - I'd found the python implementation and concluded the C implementation didn't make it into 2.4 for some reason... 8-)

Yes, the builtin set.discard() method is already faster than dict.pop().

>IMO keeping the sets.Set version as clean and readable as possible is nice,
>since the reason this exists is for other implementations (Jython, PyPy,
>...) and documentation, right? OTOH, speeding up the CPython implementation
>is nice and it's read by many fewer people.

No, you're right - making sets.Set less readable than it already is would be a step backwards. On the other hand, Jython and PyPy are already in trouble - the builtin set() is not entirely compatible with sets.Set.

-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: [Python-Dev] Faster Set.discard() method?
>The C implementation has this code:
>
>"""
>    if (PyDict_DelItem(so->data, item) == -1) {
>        if (!PyErr_ExceptionMatches(PyExc_KeyError))
>            return NULL;
>        PyErr_Clear();
>    }
>"""
>
>Which is more-or-less the same as the sets.Set version, right? What I was
>wondering was whether changing that C to a C version of your dict.pop()
>version would also result in speedups. Are Exceptions really that slow,
>even at the C level?

No, exceptions are fast at the C level - all they do is set a flag. The
expense of exceptions is saving and restoring Python frames, I think, which
doesn't happen in this case. So the current implementation is ideal for C
code - clear and fast.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
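Read as Python, the quoted C fragment behaves roughly like the sketch below. discard_like_c is a hypothetical name, not a CPython function. A missing key is silently ignored, while any other error, such as the TypeError for an unhashable item, propagates to the caller just as "return NULL" does in the C version.

```python
def discard_like_c(data, item):
    """Python rendering of the quoted C fragment (a sketch, not CPython code).

    PyDict_DelItem(...) == -1        -> "del data[item]" raised
    PyErr_ExceptionMatches(KeyError) -> the "except KeyError" clause
    return NULL                      -> any other exception propagates
    PyErr_Clear()                    -> swallowing the KeyError
    """
    try:
        del data[item]
    except KeyError:
        pass  # missing element: clear the error and carry on
```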
[Python-Dev] Re: [Csv] Example workaround classes for using Unicode with csv module...
>I added UnicodeReader and UnicodeWriter example classes to the csv module
>docs just now. They mention problems with ASCII NUL characters (which I
>vaguely remember - NUL-terminated strings are used internally, right?). Do
>NULs still present a problem? I saw nothing in the log messages that
>mentioned "ascii" or "nul" so I presume it is.

That's right - it still uses null-terminated strings internally, and the
various special characters (quotechar, escapechar, etc.) use null to mean
"not specified". Fixing this would cause much upheaval.

>Here's what I added. Let me know if you think it needs any corrections,
>especially if there's a better way to word "as long as you avoid encodings
>like utf-16 that use NULs". Can that just be "as long as you avoid
>multi-byte encodings other than utf-8"?

I think only utf-8 provides the guarantees needed for this to work.
Specifically, every byte of a multi-byte character needs to have the high
bit set (otherwise a delimiter or other special character appearing within
a multi-byte character would upset the parsing), while the characters with
special meaning to the parser still need to be single bytes. Note also that
none of the special characters (quotechar, delimiter, escapechar, etc.) can
be a multi-byte sequence.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
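The byte-level property described above can be checked directly. The sketch below (has_ascii_lookalike_bytes is a hypothetical helper, written for Python 3) reports whether any non-ASCII character encodes to a byte below 0x80, i.e. one that a byte-oriented parser could mistake for NUL, a delimiter or a quote. UTF-8 never does this; UTF-16 does. For example, in UTF-16-LE the character U+2C00 encodes to the bytes 0x00 0x2C, a NUL followed by an ASCII comma.

```python
def has_ascii_lookalike_bytes(text, encoding):
    """True if some non-ASCII character of text encodes to a byte < 0x80
    (e.g. a NUL or a ',') in the given encoding."""
    return any(
        byte < 0x80
        for ch in text
        if ord(ch) >= 0x80
        for byte in ch.encode(encoding)
    )
```

Using "utf-16-le" rather than plain "utf-16" avoids the byte-order mark, which would otherwise be prepended to each encoded character.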
Re: [Python-Dev] Socket module corner cases
>Without further ado, the questions:
>
> * getfqdn(): The module docs specify that if no FQDN can be found,
>socket.getfqdn() should return the hostname as returned by
>gethostname(). However, CPython seems to return the passed-in hostname
>rather than the local machine's hostname (as would be expected from
>gethostname()). What's the correct behavior?
>
>>>> s.getfqdn(' asdlfk asdfsadf ')
>'asdlfk asdfsadf'
># expected 'mybox.mydomain.com'

I would suggest the documentation is wrong and the CPython code is right
in this case: if you supply the optional /name/ argument, then you don't
want it returning your own name (but returning gethostname() is desirable
if no /name/ is supplied).

> * getfqdn(): The function seems to not always return the FQDN. For
>example, if I run the following code from 'mybox.mydomain.com', I get
>strange output. Does getfqdn() remove the common domain between my
>hostname and the one that I'm looking up?
>
>>>> socket.getfqdn('otherbox')
>'OTHERBOX'
># expected 'otherbox.mydomain.com'

getfqdn() calls the system library gethostbyaddr() and searches the
result for the first name that contains a '.'. If no name contains a dot,
it returns the canonical name (as defined by gethostbyaddr() and the
system host name resolver libraries: hosts file, DNS, NMB).

> * getprotobyname(): Only a few protocols seem to be supported. Why?
>
>>>> for p in [a[8:] for a in dir(socket) if a.startswith('IPPROTO_')]:
>...     try:
>...         print p,
>...         print socket.getprotobyname(p)
>...     except socket.error:
>...         print "(not handled)"
>...

getprotobyname() looks up the /etc/protocols file (on a Unix system - I
don't know about Windows), whereas the socket.IPPROTO_* constants are
populated from the #defines in netinet/in.h at compile time.

Personally, I think /etc/protocols and the associated library functions
are a historical mistake (getprotobynumber() is marginally useful - but
Python doesn't expose it!).
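The getfqdn() search described above can be sketched as follows. This is loosely modelled on CPython's Lib/socket.py rather than copied from it. The resolve parameter is a hypothetical addition (the real socket.getfqdn() takes only a name) so the logic can be tested with a fake resolver instead of a live one. The sketch also shows why the first example returned 'asdlfk asdfsadf': the supplied name is stripped, and when the resolver fails it is returned as given.

```python
import socket


def getfqdn_sketch(name="", resolve=socket.gethostbyaddr):
    """Sketch of getfqdn()'s search; `resolve` is a hypothetical hook."""
    name = name.strip()
    if not name:
        # Only when no name is supplied is the local hostname used.
        name = socket.gethostname()
    try:
        hostname, aliases, _ipaddrs = resolve(name)
    except OSError:
        # Resolver failure (e.g. no PTR record): the name comes back as given.
        return name
    # The first name containing a dot wins; otherwise the canonical name.
    for candidate in [hostname, *aliases]:
        if "." in candidate:
            return candidate
    return hostname
```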
--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] Socket module corner cases
>After a little more investigation here, it appears that getfqdn() returns
>the name unchanged (or perhaps run through the system's resolver libs) if
>there's no reverse DNS PTR entry. In the case given above,
>otherbox.mydomain.com didn't have a reverse DNS entry, so getfqdn()
>returned 'OTHERBOX'. However, when getfqdn() is called with a name whose
>IP *does* have a PTR record, it returns the correct FQDN.

That sounds entirely plausible. Many of these name resolver functions
pre-date DNS, and show their /etc/hosts heritage somewhat (gethostbyaddr()
returning the host name, its aliases and its IP addresses in one hit is a
classic example - this isn't easy with DNS).

>Thanks for the help. Now, couple more questions:
>
>getnameinfo() accepts the NI_NAMEREQD flag. It appears, though that a name
>lookup (and associated error if the lookup fails) occurs independent of
>whether the flag is specified. Does it actually do anything?
>
>Does getnameinfo() support IPv6? It appears to fail (with a socket.error
>that says "sockaddr resolved to multiple addresses") if both IPv4 and IPv6
>are enabled.

Someone more knowledgeable will have to answer these.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
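A minimal sketch of the forward-then-reverse lookup involved, assuming Python 3 and IPv4 only: socket.gethostbyname() maps the name to an address, and socket.gethostbyaddr() returns the (hostname, aliaslist, ipaddrlist) triple mentioned above. The function name forward_then_reverse is hypothetical. It returns None for the reverse name when the reverse lookup fails, for example when there is no PTR record or hosts-file entry, which is the case in which getfqdn() hands the name back unchanged.

```python
import socket


def forward_then_reverse(name):
    """Resolve name to an IPv4 address, then look that address back up.

    Returns (address, reverse_name); reverse_name is None when the reverse
    lookup fails, e.g. when no PTR record (or hosts-file entry) exists.
    """
    address = socket.gethostbyname(name)  # forward lookup (IPv4 only)
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return address, None
    return address, hostname
```

What forward_then_reverse("localhost") returns depends entirely on the local resolver configuration (hosts file, DNS), so no particular result is implied here.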
Re: [Python-Dev] "and" and "or" operators in Py3.0
>While I don't disagree with some of your main points, I do think that
>your proposal would eliminate a natural and easy to understand use of
>the current behavior of "or" that I tend to use quite a bit. Your
>proposal would break a lot of code, and I can't think of a better
>"conditional operator" than the one thats already there.
>
>I often find myself using 'or' to conditionally select a meaningful
>value in the absence of a real value:

I agree. I often have objects with an optional friendly name (label) and
a mandatory system name, so this sort of thing becomes common:

    '%s blah blah' % (foo.label or foo.name)

The if-else expression alternative works, but isn't quite as readable:

    '%s blah blah' % (foo.label if foo.label else foo.name)

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
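The two spellings can be compared side by side in the sketch below, which assumes Python 3.7+ (for dataclasses). Thing, display_or and display_ifelse are hypothetical names. Both treat any false label, including an empty string, as missing, because "or" tests truthiness rather than checking for None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thing:
    """Hypothetical object with a mandatory name and an optional label."""
    name: str
    label: Optional[str] = None


def display_or(thing):
    # "or" returns its first operand if that is true, else its second.
    return "%s blah blah" % (thing.label or thing.name)


def display_ifelse(thing):
    # Equivalent conditional expression (Python 2.5+ syntax).
    return "%s blah blah" % (thing.label if thing.label else thing.name)
```

If only a genuinely missing label (None) should fall back to the name, the test has to be spelled out: thing.label if thing.label is not None else thing.name.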