from:"Andrew McNamara"

Re: [Python-Dev] The fate of 3.0.*

2009-02-13 Thread Andrew McNamara

>So what are the expected efforts for 3.1?
>- io-in-C 
>- import-in-Python
>- ... anything else?

A fixed "email" module.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-07 Thread Andrew McNamara


On 07/04/2009, at 7:27 AM, Guido van Rossum wrote:

>On Mon, Apr 6, 2009 at 7:28 AM, Cesare Di Mauro wrote:
>> The Language Reference says nothing about the effects of code
>> optimizations. I think it's a very good thing, because we can do some
>> work here with constant folding.
>
>Unfortunately the language reference is not the only thing we have to
>worry about. Unlike languages like C++, where compiler writers have
>the moral right to modify the compiler as long as they stay within the
>weasel-words of the standard, in Python, users' expectations carry
>value. Since the language is inherently not that fast, users are not
>all that focused on performance (if they were, they wouldn't be using
>Python). Unsurprising behavior OTOH is valued tremendously.


Rather than trying to get the optimizer to guess, why not have a  
"const" keyword and make it explicit? The result would be a symbol  
that essentially only exists at compile time - references to the  
symbol would be replaced by the computed value while compiling. Okay,  
maybe that would suck a bit (no symbolic debug output).


Yeah, I know... take it to python-wild-and-ill-considered-id...@python.org.
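For readers unfamiliar with constant folding, here is a small sketch of what the existing optimizer does with a literal expression. It relies on CPython implementation details (the co_consts tuple of a compiled code object), so treat it as an illustration rather than a language guarantee; the name SECONDS_PER_DAY is made up for the example.

```python
# Illustration only: this inspects CPython-specific compiler output.
code = compile("SECONDS_PER_DAY = 24 * 60 * 60", "<example>", "exec")

# On CPython the product is computed at compile time, so the folded
# value is stored as a constant of the code object.
folded = 86400 in code.co_consts
print("folded at compile time:", folded)
```

A "const" keyword as floated above would extend this kind of substitution from literal expressions to named symbols.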


Re: [Python-Dev] Issues with Py3.1's new ipaddr

2009-06-02 Thread Andrew McNamara



On 03/06/2009, at 3:56 AM, Jean-Paul Calderone wrote:

>On Tue, 02 Jun 2009 19:34:11 +0200, "Martin v. Löwis" wrote:
>
>[snip]
>
>>> You seem comfortable with these quirks, but then you're not planning
>>> to write software with this library. Developers who do intend to write
>>> meaningful network applications do seem concerned, yet we're ignored.
>>
>> I don't hear a public outcry - only a single complainer.
>
>Clay repeatedly pointed out that other people have objected to ipaddr and
>been ignored.  It's really, really disappointing to see you continue to
>ignore not only them, but the repeated attempts Clay has made to point
>them out.
>
>I don't have time to argue this issue, but I agree with essentially
>everything Clay has said in this thread, and I commented about these
>problems on the ticket months ago, before ipaddr was added.


Indeed... "Me too" - I've been quietly concerned with these issues,
but have not said anything as Clay's postings pretty much cover
it (and swine flu response is trumping all my other priorities).


Re: [Python-Dev] Issues with Py3.1's new ipaddr

2009-06-02 Thread Andrew McNamara



On 03/06/2009, at 12:39 PM, Guido van Rossum wrote:

>I'm disappointed in the process -- it's as if nobody really reviewed
>the API until it was released with rc1, and this despite there being a
>significant discussion about its inclusion and alternatives months
>ago. (Don't look at me -- I wouldn't recognize a netmask if it bit me
>in the behind, and I can honestly say that I don't know whether /8
>means to look only at the first 8 bits or whether it means to mask off
>the last 8 bits.)
>
>I hope we can learn from this.


When including third-party modules into the standard library, we've
generally only included them after they have broad acceptance in the
community. In this case, however, it seems that while the ipaddr
module had acceptance within Google, it had not had much exposure to
the broader Python community. I think if anyone other than Guido had
proposed adding the module to the standard library, we would not have
even considered it until it had spent some time standing on its own
two feet.


Re: [Python-Dev] PEP 3144 review.

2009-09-14 Thread Andrew McNamara

>I believe PEP 3144 is ready for your review.  When you get a chance,
>can you take a look/make a pronouncement?

In my experience it is common to leave out the masked octets when
referring to an IPv4 network (the octets are assumed to be zero), so I
don't agree with this behaviour from the reference implementation:

>>> ipaddr.IPv4Network('10/8')
IPv4Network('0.0.0.10/8')
>>> ipaddr.IPv4Network('192.168/16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/src/py/ipaddr/ipaddr.py", line 1246, in __init__
    raise IPv4IpValidationError(addr[0])
ipaddr.IPv4IpValidationError: '192.168' is not a valid IPv4 address
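One way the abbreviated notation could be accepted is to pad the missing octets with zeros before parsing. The sketch below is only an illustration: expand_short_network is a hypothetical helper (it is not part of ipaddr), and it uses the ipaddress module that later entered the standard library rather than the reference implementation shown above.

```python
import ipaddress


def expand_short_network(text):
    """Pad an abbreviated IPv4 network such as '10/8' with zero octets.

    Hypothetical helper for illustration; not part of ipaddr or ipaddress.
    """
    addr, sep, prefix = text.partition('/')
    if not sep:
        raise ValueError('expected address/prefix, got %r' % text)
    octets = addr.split('.')
    if not 1 <= len(octets) <= 4:
        raise ValueError('expected 1 to 4 octets, got %r' % addr)
    octets += ['0'] * (4 - len(octets))
    # IPv4Network validates each octet and rejects set host bits.
    return ipaddress.IPv4Network('.'.join(octets) + '/' + prefix)


print(expand_short_network('10/8'))        # 10.0.0.0/8
print(expand_short_network('192.168/16'))  # 192.168.0.0/16
```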

I also couldn't see an easy way to get from a network address to the
containing network. For example:

>>> ipaddr.IPv4Network('192.168.1.1/16')
IPv4Network('192.168.1.1/16')

This is close:

>>> ipaddr.IPv4Network('192.168.1.1/16').network
IPv4Address('192.168.0.0')

What I want is a method that returns:

IPv4Network('192.168.0.0/16')

I appreciate these requests are somewhat contradictory (one calls
for masked octets to be insignificant, the other calls for them to be
significant), but they are both valid use cases in my experience.
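For comparison, the ipaddress module that eventually shipped in the standard library (not the ipaddr reference implementation under review here) separates the two cases. A short sketch of getting from an address-with-mask to its containing network:

```python
import ipaddress

# An address together with its mask is an "interface" in ipaddress.
iface = ipaddress.ip_interface('192.168.1.1/16')

# .network derives the containing network, with the host bits cleared.
containing = iface.network
print(repr(containing))  # IPv4Network('192.168.0.0/16')
```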

Apologies if these have already been covered in prior discussion -
I've tried to keep up, but I haven't been able to give it the attention
it deserves.

I also note that many methods in the reference implementation are not
discussed in the PEP. While I don't consider this a problem for the PEP,
anyone reviewing the module for inclusion in the standard lib needs to 
consider them.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-15 Thread Andrew McNamara

>>> I don't see any valid reason for entering a network as "192.168.1.1/24"
>>> rather than the canonical "192.168.1.0/24". The former might indicate a
>>> typing error or a mental slip, so let's be helpful and signal it to the
>>> user.
>>
>> Or perhaps there can be an optional "strict=True" (or "strict=False")
>> argument to the constructor / parsing function.
>
>I can live w/ a default of strict=False.  there are plenty of cases
>where it's not an error and easy enough ways to check, if the
>developer is concerned, with or without an option.  eg if addr.ip !=
>addr.network:

I agree - there are definitely times when it is not an error, but I don't
like the idea of a "strict" flag.
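For context, this is how a "strict" flag behaves in the ipaddress module that was later added to the standard library. Note that there the default is strict=True, unlike the strict=False default suggested in the quoted text; the variable names are just for the example.

```python
import ipaddress

# With the default (strict=True), set host bits are treated as an error.
try:
    ipaddress.ip_network('192.168.1.1/24')
    rejected = False
except ValueError:
    rejected = True

# With strict=False the host bits are silently masked off instead.
lenient = ipaddress.ip_network('192.168.1.1/24', strict=False)

print(rejected)  # True
print(lenient)   # 192.168.1.0/24
```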

I've done a bit of everything - router configs with a national ISP,
scripts to manage host configuration, user interfaces, you name it.
The way I see it, we need:

 * Two address classes that describe a single IP end-point - "Address" with
   no mask and "AddressWithMask" (the latter being the current Network
   class, minus the container-like behaviour).

 * A "Network" container-like class. Same as the current Network class,
   but addresses with masked bits would be considered an error.

This is along the lines that RDM was suggesting, except that we remove the
container behaviour from AddressWithMask.

Additionally:

 * The .network attribute on an AddressWithMask would return a Network
   instance.

 * An Address class would not have a .network attribute

 * Network.__contains__() would accept Network, Address and
   AddressWithMask. Only Network implements __contains__ -
   an AddressWithMask can't contain another address, although its
   .network can.

 * Maybe an Address should compare equal with an AddressWithMask if
   the address is identical and the mask is equivalent to /32?
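A minimal sketch of the three-class split described above, IPv4 only. All class and helper names here are illustrative, and the stdlib ipaddress module is used only to convert dotted-quad text to and from integers; this is not the ipaddr implementation.

```python
import ipaddress


def _mask(prefixlen):
    """Return the 32-bit netmask for a prefix length such as 24."""
    return (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF


class Address:
    """A single end-point: no mask and no .network attribute."""

    def __init__(self, text):
        self.value = int(ipaddress.IPv4Address(text))


class Network:
    """A container of addresses; set host bits are an error."""

    def __init__(self, text):
        base, _, prefix = text.partition('/')
        self.prefixlen = int(prefix)
        self.value = int(ipaddress.IPv4Address(base))
        if self.value & ~_mask(self.prefixlen) & 0xFFFFFFFF:
            raise ValueError('host bits set in %r' % text)

    def __contains__(self, other):
        # Accepts Address, AddressWithMask and (sub-)Network instances.
        if isinstance(other, Network) and other.prefixlen < self.prefixlen:
            return False
        return other.value & _mask(self.prefixlen) == self.value


class AddressWithMask(Address):
    """An end-point plus mask; deliberately not a container."""

    def __init__(self, text):
        host, _, prefix = text.partition('/')
        super().__init__(host)
        self.prefixlen = int(prefix)

    @property
    def network(self):
        # Derive the containing Network from address and mask on demand.
        base = self.value & _mask(self.prefixlen)
        return Network('%s/%d' % (ipaddress.IPv4Address(base), self.prefixlen))


iface = AddressWithMask('192.168.1.1/24')
print(Address('192.168.1.7') in iface.network)  # True
```

Because AddressWithMask defines no __contains__, writing "addr in iface" fails with a TypeError, while "addr in iface.network" works.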

Personally, I don't see a strong use-case for the list-like indexing and
iteration behaviour - I think it's enough to implement some basic
container behaviour, but I won't object to the iterator and indexing,
provided they don't distort the rest of the design (which I fear they
are doing now). Iterating or indexing a network should return Address
or AddressWithMask instances - if the latter, the mask should match the
parent network's mask.

I'm not particularly wedded to the name "AddressWithMask" - maybe it
could be NetworkAddress or MaskedAddress or ?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-16 Thread Andrew McNamara

>R. David Murray wrote:
>
>> A network is conventionally represented by an IP address in which the
>> bits corresponding to the one bits in the netmask are set to zero, plus
>> the netmask.
>
>Okay, that's clarified things for me, thanks.

Put another way, an "Address" describes a single end-point and a "Network"
describes a set of (contiguous) Addresses.
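To make the "set of contiguous Addresses" point concrete, here is a small sketch using the ipaddress module that later entered the standard library (not the ipaddr module under discussion):

```python
import ipaddress

net = ipaddress.ip_network('192.168.1.0/30')

# Iterating a network yields every address it contains, in order.
members = list(net)
as_ints = [int(a) for a in members]

# The members form one unbroken integer range starting at the base address.
first = int(net.network_address)
contiguous = as_ints == list(range(first, first + net.num_addresses))

print([str(a) for a in members])
print(contiguous)  # True
```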

Where things have become confused is that, for practical reasons, it is
convenient to have a representation for an Address and its containing
Network (the latter can be derived from the Address and a mask). We tried
to make the current Network entity do double-duty, but it is just leading
to confusion.

This is why I propose there be three entities:

 * an Address entity (same as the current one)
 * a Network entity (like now, but requires masked bits to be zero)
 * an AddressWithMask entity (existing Network, but no container behaviour)

There is a school of thought that says we only need a single class
that behaves like the current Network entity - end-points are simply
represented by an all-ones mask. This is, I think, where we started. But
this scheme was rejected.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-16 Thread Andrew McNamara

>> Some people have claimed that the gateway address of a
>> network isn't necessarily the zero address in that network.

It almost never is - conventions vary, but it is often the network address
plus one, or the broadcast address minus one.

>I'll go further: I don't think it's even legal for the gateway address to be
>the zero address of the network (and I used to program the embedded software
>in routers for a living :) ).

I don't think the RFCs forbid the zero address being used, and
"enlightened" network stacks allow it (typically routers) to achieve
better utilisation of the limited IPv4 address space (for a /24 or larger,
wasting one address out of 256 isn't too bad, but it is now typical to
use much smaller nets - right down to /30).
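The arithmetic behind that trade-off can be checked with the ipaddress module that later entered the standard library. In this sketch hosts() follows the traditional convention of excluding the network and broadcast addresses; the 10.0.0.0 base is just an example value.

```python
import ipaddress

sizes = {}
for prefix in (24, 30):
    net = ipaddress.ip_network('10.0.0.0/%d' % prefix)
    # Total addresses versus those left once the network (zero) and
    # broadcast addresses are reserved.
    sizes[prefix] = (net.num_addresses, len(list(net.hosts())))

print(sizes)  # {24: (256, 254), 30: (4, 2)}
```

On a /30, reserving the zero address costs a quarter of the block, which is why stacks that allow its use are attractive for small nets.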

>> If that's true, then you *can't* calculate the network
>> address from a host address and a netmask -- there isn't
>> enough information.

You can always calculate the network address from the IP address plus
mask - the network address is simply the bits that are not masked. 

In the olden days, the mask was spelled out in octets (eg
255.255.255.0). But we've moved to a more compact and logical notation
where the number of leading significant bits is specified (eg /24).
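As a worked example of both points, the sketch below converts a prefix length to the older dotted-octet mask and derives the network address with a bitwise AND. prefix_to_mask is an illustrative helper name, and ipaddress (the later stdlib module) is used only for converting between text and integers.

```python
import ipaddress


def prefix_to_mask(prefixlen):
    """Return the 32-bit mask with `prefixlen` leading one bits."""
    return (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF


host = int(ipaddress.IPv4Address('192.168.1.77'))
mask = prefix_to_mask(24)

# The network address is the host address with the host bits cleared.
network_address = ipaddress.IPv4Address(host & mask)

print(ipaddress.IPv4Address(mask))                # 255.255.255.0
print(network_address)                            # 192.168.1.0
print(ipaddress.IPv4Address(prefix_to_mask(26)))  # 255.255.255.192
```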

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-16 Thread Andrew McNamara

>This proposal actually leads to 6 entities (3 for IPv4 and 3 for IPv6).

Yes, I know - I was just trying to keep to the point.

>It's still unclear to me what is gained by pulling AddressWithMask
>functionality out of the current network classes. It's easy enough for
>the concerned developer to check if the entered network address
>does actually have all of its host bits set to zero. It is not my
>experience that this behavior is desired so often that having the
>network classes behave as they do now leads to a great deal of
>confusion.

I think we're in a painful middle ground now - we should either go back
to the idea of a single class (per protocol), or make the distinctions
clear (networks are containers and addresses are singletons).

Personally, I think I would be happy with a single class (but I suspect
that's just my laziness speaking). However, I think the structure and
discipline of three classes (per protocol) may actually make the concepts
easier to understand for non-experts.

A particular case in point - if you want to represent a single IP address
with netmask (say an interface), you use a Network class, not an Address
class. And the .network attribute returns an Address class!

The reason I suggest having the Network class assert that masked bits be
zero is two-fold:

 * it ensures the correct class is being used for the job
 * it ensures application-user errors are detected as early as possible

I also suggest the AddressWithMask classes not have any network/container
behaviours for a similar reason. If the developer needs these, the
.network attribute is only a lookup away.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-16 Thread Andrew McNamara

>> I think we're in a painful middle ground now - we should either go back
>> to the idea of a single class (per protocol), or make the distinctions
>> clear (networks are containers and addresses are singletons).
>>
>> Personally, I think I would be happy with a single class (but I suspect
>> that's just my laziness speaking). However, I think the structure and
>> discipline of three classes (per protocol) may actually make the concepts
>> easier to understand for non-experts.
>
>I think this is where we disagree. I don't think the added complexity
>does make it any easier to understand.

I argue that we're not actually adding any complexity: yes, we add
a class (per protocol), but we then merely relocate functionality to
clarify the intended use of the classes.

>> A particular case in point - if you want to represent a single IP address
>> with netmask (say an interface), you use a Network class, not an Address
>> class. And the .network attribute returns a Address class!
>
>Right, and I don't see where the confusion lies.  

I suggest you are too close to the implementation to be surprised by it. 8-)

>You have an address + netmask. ergo, you have a Network object.  

In a common use case, however, this instance will not represent a
network at all, but an address. It will have container-like behaviour,
but it should not (this is a property of networks, not addresses). So
the instance will be misnamed and have behaviours that are, at best,
misleading.

>The single address that defines the base address (most commonly referred
>to as the network address) is an Address object. there is no netmask
>associated with that single address, ergo, it's an Address object.

I would argue that a Network never has a single address - by definition,
it has two or more nodes. A .network attribute should return a Network
instance. If you want the base address, this probably should be called
.base_address or just .address (to parallel the .netmask attribute).

>> The reason I suggest having the Network class assert that masked bits be
>> zero is two-fold:
>>
>> * it ensures the correct class is being used for the job
>> * it ensures application-user errors are detected as early as possible
>>
>> I also suggest the AddressWithMask classes not have any network/container
>> behaviours for a similar reason. If the developer needs these, the
>> .network attribute is only a lookup away.
>
>the problem I have with this approach is that it seems like a long way
>to go for a shortcut (of checking if addr.ip != addr.network: raise
>Error).

This isn't about shortcuts, but about correctness... having the Network
object represent a network, and having Address objects represent
end-points, and having errors discovered as early as possible.

What I'm arguing here is that singletons should not simultaneously be
containers - it's not pythonic, and it leads to ambiguity. The underlying
IP concepts don't require it either: an IP address is a singleton, a
network is a container, and there is no overlap. Yes, an address may be a
member of a network, and having a reference to that network on the address
object is valuable, but the address should not behave like a network.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-16 Thread Andrew McNamara

>Another way to approach this would be for the Address object to
>potentially have a 'network' attribute referencing a Network object.

Yes - that's reasonable.

>Then there are only two classes, but three use cases are covered:
>
>1) a Network
>
>2) a plain, network-agnostic Address with network == None
>
>3) an Address with an attached Network
>
>An Address could be constructed in three ways:
>
>   Address(ip_number)
>
>   Address(ip_number, network = )
>
>   Address(ip_number, mask = )
> # constructs and attaches a suitably-masked Network instance

I think you still need to support the common notations:

Address('10.0.0.1') # .network == None

Address('10.0.0.1/255.255.255.0')
Address('10.0.0.1/24')
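For what it's worth, the ipaddress module that later landed in the standard library accepts all three spellings, although it splits them across two types rather than using a single Address class as sketched in this thread:

```python
import ipaddress

# A bare address carries no mask and has no .network attribute.
bare = ipaddress.ip_address('10.0.0.1')

# Dotted-octet and prefix-length masks are equivalent spellings.
dotted = ipaddress.ip_interface('10.0.0.1/255.255.255.0')
prefixed = ipaddress.ip_interface('10.0.0.1/24')

print(dotted == prefixed)  # True
print(dotted.network)      # 10.0.0.0/24
```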

>We could also have some_network[n] return an Address
>referring back to the network object it was obtained
>from.

Yes.

(Of course, we're simplifying - there would really be classes for each
protocol).

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-16 Thread Andrew McNamara

>> I argue that we're not actually adding any complexity: yes, we add
>> a class (per protocol), but we then merely relocate functionality to
>> clarify the intended use of the classes.
>
>And I argue that moving this functionality to new classes (and adding
>new restrictions to existing classes) doesn't buy anything in the way
>of overall functionality of the module or a developer's ability to
>comprehend intended uses.

It's mostly just minor refactoring and renaming, which I think makes
things clearer, although I agree this is merely an opinion. I would be
interested to hear what others think. To summarise:

 * an address is a singleton (a network endpoint), with no container
   behaviour. It may optionally reference its network (via the .network
   attribute), and .address returns the mask-less address.

 * a network is a container-like object. For consistency, .network should
   return self and raise an exception if the mask conflicts with the
   address, .address returns the base address, .mask returns an address
   object.

>> I would argue that a Network never has a single address - by definition,
>> it has two or more nodes. A .network attribute should return a Network
>> instance. If you want the base address, this probably should be called
>> .base_address or just .address (to parallel the .netmask attribute).
>
>.network is shorthand for network address. are .network_address and
>.broadcast_address less confusing?  I have to say, though,
>.network/.broadcast are fairly common (IPy uses .net, netaddr and ipv4
>use, IIRC .network...)

Yes, I understand your motivation, but I still think it's going to be more
confusing the way you have it.
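As a point of reference on the naming question, the ipaddress module that eventually entered the standard library uses the longer attribute names on its network type. A short sketch (the 192.168.0.0/16 value is just an example):

```python
import ipaddress

net = ipaddress.ip_network('192.168.0.0/16')

# Each of these is a plain address object, not a network.
print(net.network_address)    # 192.168.0.0
print(net.broadcast_address)  # 192.168.255.255
print(net.netmask)            # 255.255.0.0
```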

>> This isn't about shortcuts, but about correctness... having the Network
>> object represent a network, and having Address objects represent
>> end-points, and having errors discovered as early as possible.
>
>Then what I don't see is the purpose of your
>network-only-network-object. essentially identical functionality can
>be obtained with the module as is w/o the added complexity of new
>classes.

Certainly, I'm not talking about adding functionality. What I am
suggesting is that if we wish to have a distinction between networks and
addresses, then that distinction should be clear and strong, such that
the choice of which to use is obvious, and if the wrong one is used,
the error is discovered as early as possible.

As the module stands, we have a pair of address-without-mask classes
called *Address, and a pair of address-with-mask classes called
*Network. So, sometimes when you want to record an *address* you use
a class called Network, and that class comes with behaviours that
make no sense in the context of a singleton network end-point (it can't
"contain" other addresses, although its .network can).

Sorry if I sound like a cracked record - these are subtle concepts,
and my ability to explain what I mean is less than is needed, but we'll
get there in the end.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-17 Thread Andrew McNamara

> > As the module stands, we have a pair of address-without-mask classes
> > called *Address, and a pair of address-with-mask classes called
> > *Network. So, sometimes when you want to record an *address* you use
> > a class called Network, and that class comes with behaviours that
> > make no sense in the context of a singleton network end-point (it can't
> > "contain" other addresses, although its .network can).
>
>I'm going to consistently use "address" to mean a singleton and
>"network" to mean a container in the following.

Ta. I think it's useful to have a common terminology.

>I still don't see why an address-with-mask is useful, except that the
>network is deducible as {'network': address & mask, 'mask': mask}.  Is
>there *any* other way you would *ever* use that?
>
>It seems to me that for some purposes (implementing dig(1), for
>example), an IPv4Address can contain only the address (ie, a 32-bit
>integer) as a data attribute, and (with methods for using that
>attribute) that is the minimal implementation of IPv4Address.
>
>However, there are other cases (eg, routing) where it's useful to
>associate an address with its network, and I don't see much harm in
>doing so by adding a 'network' attribute to the base class
>IPv4Address, since addresses are hardly useful except in the context
>of networks.  Of course that attribute is often going to be None (eg,
>in implementing dig(1) the remote nameserver is unlikely to tell you
>the netmask).  However, when iterating over an IPv4Network, the
>iterator can automatically fill in the 'network' attribute, and that's
>fairly cheap.

Conceptually, you sometimes need a bare address, and other times,
you need an address with an associated network (host interface
configs, router configs, etc). By AddressWithMask, I really mean
AddressWithEnoughInformationToDeriveNetworkWhenNeeded. Conveniently,
IPv4 and IPv6 addressing allows us to derive the network from the host
address combined with the netmask - in other words, we don't have to attach
a real Network object to Address objects until the user tries to access
it, and then we derive it from the address and mask.
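A rough sketch of that lazy derivation, assuming Python 3.8+ for functools.cached_property. MaskedAddress is an illustrative name only, and the stdlib ipaddress module stands in for the real parsing code:

```python
import functools
import ipaddress


class MaskedAddress:
    """Address plus mask; the Network object is built only when asked for."""

    def __init__(self, text):
        host, _, prefix = text.partition('/')
        self.address = ipaddress.IPv4Address(host)
        self.prefix = prefix

    @functools.cached_property
    def network(self):
        # Derived from address and mask on first access, then cached
        # in the instance dictionary.
        return ipaddress.ip_network(
            '%s/%s' % (self.address, self.prefix), strict=False)


m = MaskedAddress('10.1.2.3/16')
print('network' in vars(m))  # False: nothing derived yet
print(m.network)             # 10.1.0.0/16
print('network' in vars(m))  # True: cached after first use
```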

>While to me neither the 'network' attribute nor the iterator behavior
>just described seems amazingly useful in the base classes, it seems to
>me that precisely those behaviors will be reinvented over and over
>again for derived classes.  Furthermore they are natural enough that
>they won't bother people who don't need them.  (That's despite at
>least one person (IIRC it was Antoine) firmly saying "an IPv4Address
>should contain exactly one 32-bit int, no more, no less", so I could
>be wrong.)  

If you have a .network attribute on an address object, checking if an
address is in the same network as another address becomes:

addr_a in addr_b.network

As the module stands, you write that as:

addr_a in addr_b

I don't think the intent is as clear with the latter.
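For comparison, the ipaddress module that later entered the standard library only supports the explicit spelling: its address-with-mask type is not a container. A short sketch (the addresses are example values):

```python
import ipaddress

addr_a = ipaddress.ip_address('10.0.0.7')
addr_b = ipaddress.ip_interface('10.0.0.1/24')

# Explicit form: membership is tested against the derived network.
same_net = addr_a in addr_b.network
print(same_net)  # True

# The implicit form is rejected, since an interface is not a container.
try:
    addr_a in addr_b
    implicit_allowed = True
except TypeError:
    implicit_allowed = False
print(implicit_allowed)  # False
```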

>It seems to me that the only good reason for not having a
>'network' attribute that contains an IPv4Network instance or None is
>efficiency: the space for the attribute and the overhead of filling it
>in the iterator.  I personally can't think of an application that
>would care (from what I hear, Cisco has no interest in writing its
>routers' IP stacks in Python, amazingly enough), but in theory ...

The implementation already lazily creates most things like this.

>Finally, I agree that using IPv4Network as address-with-mask is a
>confusing, undiscoverable abuse.  In particular, I think that every
>time I went a week without using that idiom, I'd get nervous when I
>saw it again: "Are you *sure* that won't raise an error or silently
>get the lower bits masked off?!  If not now, in the next version?"

Yes.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-17 Thread Andrew McNamara

>On Thu, 17 Sep 2009 10:41:37 am Andrew McNamara wrote:
>> In the olden days, the mask was spelled out in octets (eg
>> 255.255.255.0). But we've moved to a more compact and logical
>> notation where the number of leading significant bits is specified
>> (eg /24).
>
>I hope you're not suggesting the older notation be unsupported? I would 
>expect to be able to use a mask like 255.255.255.192 without having to 
>count bits myself.

No, of course not - I was just explaining the relationship between the
two notations for people who may not have been aware.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-17 Thread Andrew McNamara

>> Conceptually, you sometimes need a bare address, and other times,
>> you need an address with an associated network (host interface
>> configs, router configs, etc). By AddressWithMask, I really mean 
>> AddressWithEnoughInformationToDeriveNetworkWhenNeeded. Conveniently,
>> IPv4 and IPv6 addressing allows us to derive the network from the
>> host address combined with the netmask - in other words, we don't
>> have to attach a real Network object to Address objects until the
>> user tries to access it, and then we derive it from the address and
>> mask.
>
>To clarify: when you say "derive the network" are you talking about the 
>network (which is a container) or the network address = host_address & 
>netmask (which is not a container)? I think you're referring to the 
>latter.

I mean a Network object which is a container (which, by definition,
has a network address + mask).

>If there's need for address+netmask, does it need to be a separate 
>class? Perhaps Address objects could simply have a netmask property, 
>defaulting to None. If you need an "address with mask" object, you 
>create an Address and set the mask:
>
>addr = Address(...)
>addr.netmask = "255.255.255.0"

Greg Ewing suggested this yesterday - I'm neutral on whether it's done this
way or as a separate class. The implementation may be somewhat cleaner if
it's a separate class, however.

>> If you have a .network attribute on an address object, checking if an
>> address is in the same network as another address becomes:
>>
>> addr_a in addr_b.network
>>
>> As the module stands, you write that as:
>>
>> addr_a in addr_b
>>
>> I don't think the intent is as clear with the latter.
>
>I would find the latter completely unclear and disturbing -- how can one 
>address contain another address?

Yes - that's how it works now, and I can only see it resulting in
confusion and bugs for no advantage.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-17 Thread Andrew McNamara

>To a non-specialist, "the network address" is ambiguous. There are many 
>addresses in a network, and none of them are the entire network. It's 
>like saying, given a list [2, 4, 8, 12], what's "the list item"?

A "network address" is an IP address and mask, but I understand your
confusion - we're mixing terminology from disparate domains. In my
postings, I have tried to refer to Network (a container) and Address
(an item).

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] conceptual clarity

2009-09-17 Thread Andrew McNamara

>off to patch the pep and implement some of the non controversial changes.

It might be a good idea to add some use-cases to the PEP.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] conceptual clarity

2009-09-17 Thread Andrew McNamara

>Again, the same error-catching functionality can be obtained through
>an option to the constructor. network and broadcast attributes can be
>renamed to .\1_address to alleviate confusion as well.
>
>I mentioned before that IPy's insistence on receiving masked out
>networks was one of the main reasons I wrote ipaddr to begin with.
>Having ipaddr mimic this behavior would make it significantly less
>useful. Removing functionality in the name of avoiding confusion
>doesn't make sense when the same confusion can be alleviated w/o the
>loss.

The issue is bigger than error checking - I'm maintaining that a
distinction between an Address (singleton, item) and a Network (Container)
is useful and should be embraced. The current implementation has already
partially gone this route, but hasn't completed the transition, and
this does not give the structure to users that it could - there's an
obligation on modules in the standard library to provide leadership and
clarity without being dictatorial. They are essentially silent mentors.

So, while I am not suggesting we build a bondage and discipline machine,
I am suggesting that partitioning the functionality differently will
result in a better outcome all round.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] conceptual clarity

2009-09-17 Thread Andrew McNamara

>> It might be a good idea to add some use-cases to the PEP.
>
>There are several use-cases in the PEP already.

Maybe the use-cases deserve their own section in the PEP, or better yet,
be pulled up into the Motivation section.

>The problem is, for every use-case where one can show that the
>existing implementation is confusing, I can come up with a use-case
>showing where the existing implementation makes more sense than
>anything proposed.

Uh, I don't think that is the intention of use-cases - they're there
to inform the design, rather than to show how a specific implementation
can be used.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review.

2009-09-17 Thread Andrew McNamara

>On Fri, 18 Sep 2009 11:04:46 am Andrew McNamara wrote:
>> >To a non-specialist, "the network address" is ambiguous. There are
>> > many addresses in a network, and none of them are the entire
>> > network. It's like saying, given a list [2, 4, 8, 12], what's "the
>> > list item"?
>>
>> A "network address" is an IP address and mask, but I understand your
>> confusion - we're mixing terminology from disparate domains. In my
>> postings, I have tried to refer to Network (a container) and Address
>> (an item).
>
>So to clarify, how many different things which need to be handled are 
>there?
>
>Items:
>1 IP address  -- a 32 bit (IPv4) or 128 bit (IPv6) number

Yes.

>2 Netmask -- a bit mask of the form 111..100..0

I don't think there's much to be gained by exposing a Netmask object,
although other objects might have a .netmask property returning an
IPAddress instance. Where we expose a netmask, it should be as an
Address instance (or maybe a subclass with additional restrictions).

>3 Network address -- the lowest address in a network, and equal 
>  to (defined by?) the bitwise-AND of any address in the network 
>  with the network's netmask

This idea of a "network address" being simply an IP address is in error - 
a network address was always an address and a mask, however in the
days prior to CIDR, the mask was implicitly specified by the class of
the network.

>4 Host address -- the part of the IP address that is not masked 
>  by the netmask

Well, yes, but I don't think we need an entity representing that.

>5 Broadcast address -- the highest address in an IPv4 network

Yes, but again, we don't need an entity - as with the netmask, when
exposed, it should just be an Address instance (or subclass thereof).

>Containers:
>6 Network -- a range of IP addresses

Yes, although not an arbitrary or discontinuous range of addresses.

Really, I think we just need two entities (per protocol):

 Address (& maybe AddressWithMask)

   * If no mask is specified, this is just the IP address.
   * If a mask is specified, then it gains a .network property returning a
 Network instance. It probably should also have a .netmask property
 containing an Address instance.

 Network

   * Has an IP address with netmask 
   * for consistency's sake, masked address bits are not allowed
   * behaves like a read-only container wrt Addresses

So, you want to represent an interface on your host:

  >>> if_addr = IPv4Address('10.0.0.1/24')

from this, you could get:

  >>> if_addr.address
  IPv4Address('10.0.0.1')
  >>> if_addr.netmask
  IPv4Address('255.255.255.0')
  >>> if_addr.broadcast
  IPv4Address('10.0.0.255')
  >>> if_addr.network
  IPv4Network('10.0.0.0/24')

you might also have an address for the default gateway:

  >>> router_addr = IPv4Address('10.0.0.254/24')
  >>> router_addr in if_addr.network
  True

or:

  >>> router_addr = IPv4Address('10.0.0.254')
  >>> router_addr in if_addr.network
  True

Or maybe you've subnetted your LAN:

  >>> IPv4Network('10.0.0.0/24') in IPv4Network('10.0.0.0/8')
  True
  >>> IPv4Network('10.0.1.0/24') in IPv4Network('10.0.0.0/8')
  True

but:

  >>> IPv4Network('10.0.0.0/8') in IPv4Network('10.0.0.0/24')
  False

This suggests the natural behaviour if the Address mask doesn't fit in the
network:

  >>> IPv4Address('10.0.0.254/8') in IPv4Network('10.0.0.0/24')
  False

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] PEP 3144 review, and the inclusion process

2009-09-28 Thread Andrew McNamara

>I've never said otherwise. In fact, from an email last night, "If what
>the community requires is the library you've described, then ipaddr is
>not that library." The changes *you* require make ipaddr significantly
>less useful to me. I'm not prepared to make those changes in an
>attempt to seek acceptance to the stdlib, especially if the stdlib is in
>such flux that I'll get to do this again in 18 months.

The point is that, having brought it to us, we all now have an interest
in the outcome. Whatever goes into the standard library is going to be
something that we have to live with for a long time, and now is our best
chance to shape the result.

I understand your concern over introducing more classes, however I still
feel my suggested functional decomposition is worth that cost because
I consider the behaviour of my suggested classes to be more intuitive.
I should mention that I am not a computer scientist, and none of this is
motivated by a desire for theoretical purity - just practical experience.

One of my concerns now is that if a code block receives an IPv4Network
instance, it does not know whether this represents a host address
with mask, or a network address. In some contexts, this distinction is
critical, and confounding them can result in delayed error reporting or
erroneous behaviour. Your addition of a strict flag does not completely
address this concern as it assumes the instantiation and use occur in
proximity, which is often not the case in large projects.

I suspect you are also mistaken in thinking my proposed changes make
the module less useful for you - maybe you can describe the problem as
you see it?


As a reminder to people who have come late to this thread, I proposed three
classes per protocol:

IPv?Address
A single address

IPv?AddressWithMask
A single address with implied IPv?Network

IPv?Network
A container-like network address (with strict mask parsing)

Further:

 * Comparisons between classes should be disallowed. 

 * The IPv?AddressWithMask class would have .address and .mask attributes
   containing IPv?Addresses, and a .network attribute for the containing
   network (as specified by the mask, and lazily constructed).

 * The IPv?Network class would have similar .address and .mask attributes.

In cases where you want to allow lax specification of network addresses,
this would be spelt:

IPv?AddressWithMask(some_address).network

At first glance, this seems somewhat round-about, however it makes explicit
the potential loss of bits.
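
A minimal sketch of the strict versus lax distinction, again borrowing the
ipaddress module from Python 3.3+ as a stand-in (an assumption for
illustration: IPv4Interface is used here where the proposal says
IPv?AddressWithMask):

```python
import ipaddress

spec = '10.0.0.1/24'   # host bits are set below the /24 mask

# Strict parsing: a network specification with host bits set is rejected.
try:
    ipaddress.IPv4Network(spec)
    strict_accepted = True
except ValueError:
    strict_accepted = False

# Lax parsing, spelt explicitly: go via the address-with-mask class and
# ask for its containing network, which visibly drops the host bits.
lax_network = ipaddress.IPv4Interface(spec).network
```

The round-about spelling keeps the bit-dropping step visible at the call
site instead of hiding it inside the Network constructor.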

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Andrew McNamara

>> On the other hand, it is dangerous to provide a polymorphic API which
>> does that more extensive parsing, because a less than paranoid
>> programmer will have very likely allowed the parsed components to
>> escape from the context where their encodings can be reliably
>> determined.  Remember, *it is unlikely that they will ever be punished
>> for their own lack of caution.*  The person who is doomed is somebody
>> who tries to take that code and reuse it in a different context.
>
>Yeah, that's the original reasoning that had me leaning towards the
>parallel API approach. If I seem to be changing my mind a lot in this
>thread it's because I'm genuinely torn between the desire to make it
>easier to port existing 2.x code to 3.x by making the current API
>polymorphic and the fear that doing so will reintroduce some of the
>exact same bytes/text confusion that the bytes/str split is trying to
>get rid of.

I don't think polymorphic APIs do anyone any favours in the long
run. My experience of the Py2 email API was that it would give the
developer false comfort, only to blow up when the app was in the hands
of users, and it didn't seem to matter how careful I was. Py3 has gone
the pure/strict route in the core, and I think libs should be consistent
with that choice.  Developers will have to work a little harder, but there
will be fewer surprises.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Python-3000 upgrade path

2007-03-11 Thread Andrew McNamara

>I wrote two versions of the dict views refactoring. One that turns
>d.keys() into list(d.keys()) and d.iterkeys() into iter(d.keys()).
>This one is pretty robust except if you have classes that emulate
>2.x-style dicts. But it generates ugly code. So I have a
>"light-weight" version that leaves d.keys() alone, while turning
>d.iterkeys() into d.keys(). This generates prettier code but more
>buggy. I probably should have used the heavy-duty one instead.

The ugliness is a virtue in this case as it stands out enough to motivate
developers to review each case. The pretty/efficient version is tantamount
to guessing, and effectively discards information in the transformation
("here be dragons").
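
To illustrate what the conservative rewrite preserves, here is a small
sketch under Python 3 semantics, where d.keys() returns a live view rather
than a list. The dictionary contents and key names are made up for the
example.

```python
d = {'a': 1, 'b': 2}

# Heavy-duty rewrite: list(d.keys()) takes a snapshot, as d.keys() did
# in 2.x, so the dict may be modified inside the loop.
for key in list(d.keys()):
    d[key + '_copy'] = d[key]

# Light-weight rewrite: d.keys() is left alone and now iterates over the
# live dict, so adding a key while looping raises RuntimeError.
try:
    for key in d.keys():
        d[key + '_again'] = d[key]
    view_loop_failed = False
except RuntimeError:
    view_loop_failed = True
```

The ugly form is the one that still behaves like the 2.x code; the pretty
form only works where the loop body never changes the dict's size.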

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] These csv test cases seem incorrect to me...

2007-03-11 Thread Andrew McNamara

>I decided it would be worthwhile to have a csv module written in Python (no
>C underpinnings) for a number of reasons:

Several other people have already done this. I will forward you their
e-mail addresses in a separate private e-mail.

>I'm far from having anything which will pass the current test suite, but in
>diagnosing some of my current failures I noticed a couple test cases which
>seem wrong.  In the TestDialectExcel class I see these two questionable
>tests:
>
>def test_quotes_and_more(self):
>    self.readerAssertEqual('"a"b', [['ab']])
>
>def test_quote_and_quote(self):
>    self.readerAssertEqual('"a" "b"', [['a "b"']])
[...]
>Any ideas about why these test cases are in there?  I can't imagine Excel
>generating either one.

The point was to produce the same results as Excel. Sure, Excel probably
doesn't generate crap like this itself, but 3rd parties do, and people
complain if we don't parse it just like Excel (sigh).
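
For reference, the two inputs from the quoted tests can be fed straight to
the reader. This is only a sketch of the behaviour those tests pin down,
using the default "excel" dialect:

```python
import csv

# Text that follows a closing quote is appended to the field instead of
# being treated as an error.
quotes_and_more = list(csv.reader(['"a"b']))

# After the closing quote the rest of the field, including any further
# quote characters, is taken literally.
quote_and_quote = list(csv.reader(['"a" "b"']))
```

Passing strict=True to the reader is the way to have such input rejected
with csv.Error instead.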

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] [Csv] These csv test cases seem incorrect to me...

2007-03-11 Thread Andrew McNamara

>IMHO these test cases are *WRONG* and it's a worry that they "work" with 
>the current csv module :-(

Those tests are not "wrong" - they verify that we produce the same result
as Excel when presented with those inputs, which was one of the design
goals of the module (and is an important consideration for many of its
users).

While you might find the Excel team's choices bizarre, they are stable,
and in the absence of a formal specification for "CSV", Excel's behaviour
is what most users want and expect.

If you feel like extending the parser to optionally accept some other
format, I have no problem. If you want to make this format the default,
make sure you stick around to answer all the angry e-mail from users.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] [Fwd: PEP 0305 (small problem with the CSV reader)]

2007-03-27 Thread Andrew McNamara

>First of all, let me say thank you for the CSV module.

Thanks.

>I've been using it and today is the first time I hit a minor bump in the road.
>What happened is I opened this file with genome annotations with a
>long field and the error "field larger than field limit" showed up.
>From what I can see it is in the "static int parse_add_char(ReaderObj
>*self, char c)" function.
>This function uses the static long field_limit = 128 * 1024;   /* max
>parsed field size */
>I'm not sure if this is supposed to be recomputed or if there is
>something I need to do to change it, but for right now it just says my
>row is bigger than 131,072 and stops.
>I don't think Python 2.5 has any such string length limitations and
>this shouldn't be.

This limit was added back in January 2005 to provide some protection
against the situation where the parser is returning fields directly from
a file, and the file contains a mismatched quote character: this would
otherwise result in the entire file being unexpectedly read into memory.

You can change the limit with the csv.field_size_limit() function. As
you note, it defaults to 128K, but you can set it to anything up to
(2**31)-1 or 2147483647 (about 2 billion).
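
A minimal sketch of hitting the limit and then raising it. The row content
and the new limit of 1 MB are made up for the example; note that the limit
is global to the process, so the sketch restores the old value at the end.

```python
import csv

# Pin the limit to the 128K default so the example is deterministic;
# the call returns the limit that was in force before.
previous_limit = csv.field_size_limit(128 * 1024)

# One record whose first field is longer than 131,072 characters.
long_row = 'x' * 200000 + ',short'

try:
    list(csv.reader([long_row]))
    too_large = False
except csv.Error:
    too_large = True   # "field larger than field limit"

# Raise the limit and parse the same record again.
csv.field_size_limit(1024 * 1024)
rows = list(csv.reader([long_row]))

# Restore whatever limit was in force before the example ran.
csv.field_size_limit(previous_limit)
```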

BTW, I've taken the liberty of CC'ing this to the python-dev list, so
the motivation for this feature is recorded - it caused me some head
scratching, and I added it.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Summary of Tracker Issues

2007-05-14 Thread Andrew McNamara

>> I think a single-click button "Spammer"
>> should allow committers to lock an account and hide all messages
>> and files that he sent, but that still requires somebody to implement
>> it.
>
>I'd expect that to be pretty effective -- like graffiti artists,
>spammers want their work to be seen, and a site that quickly removes
>them will not be worth the effort for them.

Unfortunately, the spammers are using automated tools to locate,
register on and post to victim sites. The tools are distributed (running
on compromised PCs) and massively parallel, so they really don't care
that some of their posts are never seen.

I'm reluctant to mention the name of one particular tool I'm aware
of, but as well as the above, it also has OCR to defeat CAPTCHA, and
automatically creates throw-away e-mail accounts with a range of free
web-mail providers for registration purposes.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Summary of Tracker Issues

2007-05-16 Thread Andrew McNamara

>Typically spammers don't go through the effort to do a custom login 
>script for each different site. Instead, they do a custom login script 
>for each of the various software applications that support end-user 
>comments. So for example, there's a script for WordPress, and one for 
>PHPNuke, and so on.

In my experience, what you say is true - the bulk of the spam comes via
generic spamming software that has been hard-coded to work with a finite
number of applications. 

However - once you knock these out, there is still a steady stream of
what are clearly human generated spams. The mind boggles at the economics
or desperation that make this worthwhile.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Summary of Tracker Issues

2007-05-16 Thread Andrew McNamara

>> However - once you knock these out, there is still a steady stream of
>> what are clearly human generated spams. The mind boggles at the economics
>> or desperation that make this worthwhile.
>
>Actually, it doesn't cost that much, because typically the spammer can 
>trick other humans into doing their work for them.
>
>Here's a simple method: Put up a free porn site, with a front page that 
>says "you must be 18 or older to enter". The page also has a captcha to 
>verify that you are a real person. But here's the trick: The captcha is 
>actually a proxy to some other site that the spammer is trying to get 
>access to. When the human enters in the correct word, the spammer's 
>server sends that word to the target site, which result in a successful 
>login/registration. Now that the spammer is in, they can post comments 
>or whatever they need to do.

Yep - I was aware of this trick, but the ones I'm talking about have also
got through filling out questionnaires, and whatnot. Certainly the same
technique could be used, but my suspicion is that real people are being
paid a pittance to sit in front of a PC and spam anything that moves.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Calling back into python from C

2007-07-24 Thread Andrew McNamara

I realise I'm going to get slapped for asking a userish question here -
sorry in advance.  I'm looking for an explanation for why things are the
way they are, the doco and py source aren't providing the missing info,
and it looks like I'm bumping into an old Python bug (fixed in r38830
by mwh on 2005-04-18).

I'm working on a C extension that needs to call back into python.
Generally the GIL has been released when I need to do the callback,
but I can't be sure. So I need to save the GIL state, get the lock,
then restore it at the end.

As far as I can tell from the doco, the recommended way to do this is to
use PyGILState_Ensure() and PyGILState_Release(), but prior to r38830,
PyGILState_Release incorrectly used PyEval_ReleaseThread when it
should have been using PyEval_SaveThread() (I think), and the result is
SEGV. This poses a problem, as I need to support Python versions back
to 2.3.

Am I correct in using PyGILState_Ensure() and PyGILState_Release()? If
so, how do I support back to Py 2.3? Copy the current fixed
PyGILState_Release() into my code (ick)?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Python developers are in demand

2007-10-15 Thread Andrew McNamara

>I wonder if we should start maintaining a list of Python developers
>for hire somewhere on python.org, beyond the existing Jobs page. Is
>anyone interested in organizing this?

What about something a little less formal - a mailing list such as
python-jobs?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] XML codec?

2007-11-12 Thread Andrew McNamara

>On Nov 12, 2007, at 8:16 AM, M.-A. Lemburg wrote:
>> We have a -1 from Martin and a +1 from Walter, Guido and myself.
>> Pretty clear vote if you ask me. I'd say we end the discussion here
>> and move on.
>
>If we're counting, you've got a -1 on the codec from me as well.   
>Martin's right: there's no value to embedding the logic of auto- 
>detection into the codec.  A function somewhere in the xml package is  
>all that's warranted.

I agree with Fred here - it should be a function in the xml package,
not a codec. -1

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] pkgutil, pkg_resource and Python 3.0 name space packages

2008-01-07 Thread Andrew McNamara

>The best existing indicator we have is the organization of the docs for
>the standard library. I, for one, have a hell of a difficult time finding
>modules via the "organized" table of contents in the Library Reference.
>Instead, I always go to the Global Module Index where the somewhat flat
>namespace makes it easy to go directly to the module of interest. I'm
>curious whether the other developers have had the same experience -- if
>so, then it is a bad omen for over-organizing the standard library.

I nearly always use my browser's search function to find the module of
interest, so yes, I'm effectively using a flat namespace.

>Another indicator of what lies ahead is the current organization of os vs
>os.path.  While that split-out was well done and necessary, I routinely
>have difficulty remembering which one contains a function of interest.  

I mostly remember, but there are some notable exceptions: exists (posix
system call, expect to find it in os), walk (which is the old deprecated
one? have to check doc).

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Monkeypatching idioms -- elegant or ugly?

2008-01-20 Thread Andrew McNamara

>I think that despite the objection that monkeypatching shouldn't be
>made too easy, it's worth looking into a unification of the API,
>features, and implementation.

I agree. The other virtue of having it in the standard library is that
it's immediately recognisable for what it is.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] [Python-3000] Removing bsddb module from py3k (was Re: No beta2 tonight)

2008-07-20 Thread Andrew McNamara

>But sqlite is transactional, can offer cursors, getrange, etc., etc.
>
>I'm still curious as to what deep features people are using in bsddb.

It's not using "deep features", unless you define their on-disk layout
as deep, but it does get used for things such as interactions with other
systems - for example, using it to maintain Radius user databases for a
(proprietary/commercial) Radius auth daemon. But dropping it from the
core won't stop this.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] [Csv] skipfinalspace

2008-10-19 Thread Andrew McNamara

>>>I downloaded the 2.6 source tar ball, but is it too late for new
>>>features to get into versions <3?
>>
>> Yep.

Sigh - I should slow down and actually read the e-mail I'm replying
to. It is not too late to get features into versions <3. It is, however,
too late to get features into 2.6, which was not what you asked, but
what I was answering "Yep" to.

>>>How would you feel about adding the following tests to
>>>Lib/test/test_csv.py and getting them to pass?

I have no real objection to someone adding a skipfinalspace parameter and
associated tests, although I have no time to do it myself at the moment.

>> >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
>> >"*skipinitialspace *When True, whitespace immediately following the
>> >delimiter is ignored."
>> >but my tests show whitespace at the start of any field is ignored,
>> >including the first field.
>>
>> I suspect (but I haven't checked) that it means "after the delimiter and
>> before any quoted field" (or some variation on that).
>
>I agree that whitespace after the delimiter and before any quoted field is
>skipped. Also whitespace after the start of the line and before any quoted
>field is skipped.

I'm not sure if we're talking about the same thing - it seems to work as I
expect it to work:

>>> list(csv.reader([' foo, bar']))
[[' foo', ' bar']]
>>> list(csv.reader([' foo, bar'], skipinitialspace=1))
[['foo', 'bar']]

BTW, I think the reason "skipinitialspace" exists at all is to support
this:

>>> list(csv.reader([' foo, " bar"']))
[[' foo', ' " bar"']]
>>> list(csv.reader([' foo, " bar"'], skipinitialspace=1))
[['foo', ' bar']]

The quoting is only valid if the quote is the first character encountered
in the field (this is how Excel works). However, some other CSV generators
insert a space after the comma, and expect the parser to still treat it
as a quoted field - so skipinitialspace eats the space leading up to the
quote, but does not eat any space after the quote (hence the "initial"
in the name).

For symmetry, a "skipfinalspace" option should do the same - only eat
space after the quote (if quotes are used) - however this will be rather
hard to implement as the parser state has already rolled on, and you
no longer know whether the field was quoted. Eating spaces that
appeared within the quotes is the wrong thing to do.

>skipinitialspace defaults to false and by the same logic skipfinalspace
>should default to false to preserve compatibility with the csv module in
>2.6. On the other hand, the switch to version 3 is as good a time as any to
>break backwards compatibility to adopt something that works better for new
>users.

No, by default it needs to work like Excel, because this is the de facto
standard.

>Based on my experience parsing several hundred csv generated by many
>different people I think it would be nice to at least have a dialect that is
>excel + skipinitialspace=True + skipfinalspace=True.

Once the "skipfinalspace" parameter is implemented, there is nothing
stopping you creating such a dialect in your code, but I don't support
adding it to the standard library - the dialects in the std lib should
be well defined (in some way).

BTW, it's not necessary to create dialect objects: as I've done above,
users can pass keyword parameters to the parser if it's more convenient.
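
A short sketch of the two spellings side by side. The dialect class name
ExcelSkipSpace is hypothetical, and since no "skipfinalspace" parameter
exists, only skipinitialspace is shown:

```python
import csv

# A user-defined dialect: Excel rules plus skipinitialspace.
class ExcelSkipSpace(csv.excel):
    skipinitialspace = True

lines = [' foo, " bar"']

# Either pass the dialect...
via_dialect = list(csv.reader(lines, dialect=ExcelSkipSpace))

# ...or pass the formatting parameter directly as a keyword.
via_keyword = list(csv.reader(lines, skipinitialspace=True))
```

Both calls give the same rows; the space inside the quotes is kept, while
the space leading up to the quote is eaten.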

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Python2.5 _sre deepcopy regression?

2008-10-22 Thread Andrew McNamara

In versions of Python prior to 2.5, it would appear that deepcopying
compiled regular expressions worked by accident:

2.4:

>>> copy.deepcopy(re.compile(''))
<_sre.SRE_Pattern object at 0xb7d53ef0>

2.5:

>>> copy.deepcopy(re.compile(''))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/copy.py", line 173, in deepcopy
    y = copier(memo)
TypeError: cannot deepcopy this pattern object

I say "by accident", since the SRE_Pattern object in 2.4 has
a __deepcopy__ method which raises the "cannot deepcopy this pattern
object" TypeError, however this method isn't found by copy.deepcopy()
in 2.4, and copy.deepcopy() falls back to using the pickle logic.

The _sre source has #ifdef-out support for __deepcopy__, issue 416670
has the gory details:

http://bugs.python.org/issue416670

Changeset 38430 on the release24-maint branch introduced the changes
that stopped SRE_Pattern.__deepcopy__ being found. r38430 was a patch
forward ported from 2.3, but never ported to the trunk (probably a good
thing, too).
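
One possible workaround, offered only as a sketch and not as a proposed
fix for _sre: on versions where copy.deepcopy() refuses pattern objects,
an equivalent pattern can be rebuilt from the public .pattern and .flags
attributes. The helper name clone_pattern is hypothetical.

```python
import re

def clone_pattern(compiled):
    """Return a pattern equivalent to `compiled`, rebuilt from its source.

    Relies only on the public `pattern` and `flags` attributes of the
    compiled object, so it does not depend on copy or pickle support.
    """
    return re.compile(compiled.pattern, compiled.flags)

original = re.compile(r'a+b', re.IGNORECASE)
clone = clone_pattern(original)
```

Because the re module caches compiled patterns, the "clone" may well be
the very same object; that is harmless, as pattern objects are immutable.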

Thoughts?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Python2.5 _sre deepcopy regression?

2008-11-03 Thread Andrew McNamara

I posted this a week ago, but haven't seen any comments. Issue
416670 is probably the most relevant ticket.

The buggy changeset I mention, 38430 on the release24-maint branch is
one that had been forward and back-ported for a while. I haven't found
the motivation for that change, but it hasn't been applied to any version
of Python later than 2.4.

>In versions of Python prior to 2.5, it would appear that deepcopying
>compiled regular expressions worked by accident:
>
>2.4:
>
>>>> copy.deepcopy(re.compile(''))
><_sre.SRE_Pattern object at 0xb7d53ef0>
>
>2.5:
>
>>>> copy.deepcopy(re.compile(''))
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/usr/lib/python2.5/copy.py", line 173, in deepcopy
>    y = copier(memo)
>TypeError: cannot deepcopy this pattern object
>
>I say "by accident", since the SRE_Pattern object in 2.4 has
>a __deepcopy__ method which raises the "cannot deepcopy this pattern
>object" TypeError, however this method isn't found by copy.deepcopy()
>in 2.4, and copy.deepcopy() falls back to using the pickle logic.
>
>The _sre source has #ifdef-out support for __deepcopy__, issue 416670
>has the gory details:
>
>http://bugs.python.org/issue416670
>
>Changeset 38430 on the release24-maint branch introduced the changes
>that stopped SRE_Pattern.__deepcopy__ being found. r38430 was a patch
>forward ported from 2.3, but never ported to the trunk (probably a good
>thing, too).
>
>Thoughts?
-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Py2.4 _sre uses uninitialised memory (Bug 1088891)

2004-12-21 Thread Andrew McNamara

_sre.c, data_stack_grow() in Py2.4 uses realloc()'ed memory without
initialising the newly allocated memory. For complex regexps that require
additional sre stack space, this ultimately results in a core dump or
corrupted heap. Filling the newly allocated memory with 0x55 makes the
problem more obvious (dies on a reference to 0x5558) for me.

See bug ID 1088891:


http://sourceforge.net/tracker/index.php?func=detail&aid=1088891&group_id=5470&atid=105470

Can I be the only person who crafts diabolical regexps? Here, have a
lend of my brown paper bag...

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] csv module TODO list

2005-01-04 Thread Andrew McNamara

There's a bunch of jobs we (CSV module maintainers) have been putting
off - attached is a list (in no particular order): 

* unicode support (this will probably uglify the code considerably).

* 8 bit transparency (specifically, allow \0 characters in source string
  and as delimiters, etc).

* Reader and universal newlines don't interact well, reader doesn't
  honour Dialect's lineterminator setting. All outstanding bug id's
  (789519, 944890, 967934 and 1072404) are related to this - it's 
  a difficult problem and further discussion is needed.

* compare PEP-305 and library reference manual to the module as implemented
  and either document the differences or correct them.

* Address or document Francis Avila's issues as mentioned in this posting:

http://www.google.com.au/groups?selm=vsb89q1d3n5qb1%40corp.supernews.com

* Several blogs complain that the CSV module is no good for parsing
  strings. Suggest making it clearer in the documentation that the reader
  accepts an iterable, rather than a file, and document why an iterable
  (as opposed to a string) is necessary (multi-line records with embedded
  newlines). We could also provide an interface that parses a single
  string (or the old Object Craft interface) for those that really feel
  the need. See:

http://radio.weblogs.com/0124960/2003/09/12.html
http://zephyrfalcon.org/weblog/arch_d7_2003_09_06.html#e335

* Compatibility API for old Object Craft CSV module?

http://mechanicalcat.net/cgi-bin/log/2003/08/18

  For example: "from csv.legacy import reader" or something.

* Pure python implementation? 

* Some CSV-like formats consider a quoted field a string, and an unquoted
  field a number - consider supporting this in the Reader and Writer. See:

http://radio.weblogs.com/0124960/2004/04/23.html

* Add line number and record number counters to reader object?

* it's possible to get the csv parser to suck the whole source file
  into memory with an unmatched quote character. Need to limit size of
  internal buffer.
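
As a rough illustration of the "iterable, not file" point in the list above,
the sketch below uses the csv module as it exists in Python 3. The helper
name parse_csv_text is an assumption made for this example, not part of the
module. Any iterable of lines works, and a whole string can be parsed by
wrapping it in io.StringIO so that records with embedded newlines still
arrive intact.

```python
import csv
import io


def parse_csv_text(text, **fmtparams):
    """Parse a whole CSV document held in one string (hypothetical helper)."""
    # newline="" stops StringIO translating line endings, so the csv
    # parser sees embedded newlines exactly as they were written.
    return list(csv.reader(io.StringIO(text, newline=""), **fmtparams))


# A plain list of strings is a perfectly good source of lines.
rows_from_list = list(csv.reader(["a,b", "c,d"]))

# A quoted field may span two physical lines of the input.
rows_from_text = parse_csv_text('a,"b\nc",d\r\n1,2,3\r\n')
```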

Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should
already have been addressed):

* remove TODO comment at top of file--it's empty
* is CSV going to be maintained outside the python tree?
  If not, remove the 2.2 compatibility macros for:
 PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc.
* inline the following functions since they are used only in one place
get_string, set_string, get_nullchar_as_None, set_nullchar_as_None,
join_reset (maybe)
* rather than use PyErr_BadArgument, should you use assert?
(first example, Dialect_set_quoting, line 218)
* is it necessary to have Dialect_methods, can you use 0 for tp_methods?
* remove commented out code (PyMem_DEL) on line 261
Have you used valgrind on the test to find memory overwrites/leaks?
* PyString_AsString()[0] on line 331 could return NULL in which case
you are dereferencing a NULL pointer
* not sure why there are casts on 0 pointers
lines 383-393, 733-743, 1144-1154, 1164-1165
* Reader_getiter() can be removed and use PyObject_SelfIter()
* I think you need PyErr_NoMemory() before returning on line 768, 1178
* is PyString_AsString(self->dialect->lineterminator) on line 994
guaranteed not to return NULL?  If not, it could crash by
passing to memmove.
* PyString_AsString() can return NULL on line 1048 and 1063, 
the result is passed to join_append()
* iteratable should be iterable?  (line 1088)
* why doesn't csv_writerows() have a docstring?  csv_writerow does
* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE
* csv_unregister_dialect, csv_get_dialect could use METH_O 
so you don't need to use PyArg_ParseTuple
* in init_csv, recommend using 
PyModule_AddIntConstant and PyModule_AddStringConstant
where appropriate

Also, review comments from Jeremy Hylton, 10 Apr 2003:

I've been reviewing extension modules looking for C types that should
participate in garbage collection.  I think the csv ReaderObj and
WriterObj should participate.  The ReaderObj contains a reference to
input_iter that could be an arbitrary Python object.  The iterator
object could well participate in a cycle that refers to the ReaderObj.
The WriterObj has a reference to a writeline callable, which could well
be a method of an object that also points to the WriterObj.

The Dialect object appears to be safe, because the only PyObject * it
refers to should be a string.  Safe until someone creates an insane string
subclass <0.4 wink>.

Also, an unrelated comment about the code, the lineterminator of the
Dialect is managed by a collection of little helper functions like
get_string, set_string, etc.  This code appears to be excessively
general; since they're called only once, it seems clearer to inline the
log

Re: [Python-Dev] Re: [Csv] csv module TODO list

2005-01-04 Thread Andrew McNamara

>Andrew> There's a bunch of jobs we (CSV module maintainers) have been
>Andrew> putting off - attached is a list (in no particular order):
>...
>
>In addition, it occurred to me this evening that there's functionality in
>the csv module I don't think anybody uses.  

It's very difficult to say for sure that nobody is using it once it's
released to the world.

>For example, you can register CSV dialects by name, then pass in the
>string name instead of the dialect class.  I'd be in favor of scrapping
>list_dialects, register_dialect and unregister_dialect altogether.  While
>they are probably trivial little functions I don't think they add much if
>anything to the implementation and just complicate the _csv extension
>module slightly.  

Yes, in hindsight, they're not really necessary, although I'm sure we
had some motivation for them initially. That said, they're there now,
and they shouldn't require much maintenance.
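
For reference, a small sketch of the registry functions being discussed,
using the Python 3 csv module. The dialect name "colon_demo" and the helper
function are made up for this example. It registers a dialect by name, reads
with the string name, and unregisters it again.

```python
import csv


def read_with_named_dialect(lines):
    """Register a throwaway dialect, read with its name, then remove it."""
    csv.register_dialect("colon_demo", delimiter=":", quoting=csv.QUOTE_NONE)
    try:
        # The reader accepts the registered name in place of a dialect class.
        return list(csv.reader(lines, dialect="colon_demo"))
    finally:
        csv.unregister_dialect("colon_demo")


rows = read_with_named_dialect(["root:x:0"])
```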

>I'm also not aware that anyone really uses the Sniffer class, though it
>does provide some useful functionality should you need to analyze random
>CSV files.

The comment I get repeatedly is that they don't use it because it's
"too magic/scary". That's as it should be. But if it didn't exist,
then someone would be requesting we add it... 8-)
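
A minimal sketch of what the Sniffer is for, assuming the Python 3 csv module
and a sample with a clear-cut delimiter. The helper name and the candidate
list are choices made for this example. The result is a heuristic guess,
which is rather the point of the "too magic" complaint.

```python
import csv


def sniff_delimiter(sample, candidates=";,\t"):
    """Guess the delimiter of a CSV sample, restricted to a few candidates."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


guessed = sniff_delimiter("a;b;c\n1;2;3\n4;5;6\n")
```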

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] csv module TODO list

2005-01-05 Thread Andrew McNamara

>> Andrew McNamara wrote:
>>> There's a bunch of jobs we (CSV module maintainers) have been putting
>>> off - attached is a list (in no particular order):
>>> * unicode support (this will probably uglify the code considerably).
>> 
>Martin v. Löwis wrote:
>> Can you please elaborate on that? What needs to be done, and how is
>> that going to be done? It might be possible to avoid considerable
>> uglification.

I'm not altogether sure there. The parsing state machine is all written in
C, and deals with signed chars - I expect we'll need two versions of that
(or one version that's compiled twice using pre-processor macros). Quite
a large job. Suggestions gratefully received.

M.-A. Lemburg wrote:
>Indeed. The trick is to convert to Unicode early and to use Unicode
>literals instead of string literals in the code.

Yes, although it would be nice to also retain the 8-bit versions as well.

>Note that the only real-life Unicode format in use is UTF-16
>(with BOM mark) written by Excel. Note that there's no standard
>for specifying the encoding in CSV files, so this is also the only
>feasible format.

Yes - that's part of the problem I hadn't really thought about yet - the
csv module currently interacts directly with files as iterators, but it's 
clear that we'll need to decode as we go.
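
A sketch of "decode as we go", assuming Python 3, where the decoding is done
by a text layer wrapped around the byte stream and not by the csv module
itself. The function name is hypothetical, and io.BytesIO stands in for a
real file opened in binary mode.

```python
import csv
import io


def read_utf16_csv(raw_bytes):
    """Decode UTF-16 bytes incrementally and hand the text to csv.reader."""
    byte_stream = io.BytesIO(raw_bytes)
    # The wrapper decodes on demand; newline="" leaves line endings alone.
    text_stream = io.TextIOWrapper(byte_stream, encoding="utf-16", newline="")
    return list(csv.reader(text_stream))


# str.encode("utf-16") writes a BOM, as the Excel files mentioned above do.
sample = "na\u00efve,caf\u00e9\r\n1,2\r\n".encode("utf-16")
rows = read_utf16_csv(sample)
```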

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] csv module TODO list

2005-01-05 Thread Andrew McNamara

>> Yes, although it would be nice to also retain the 8-bit versions as well.
>
>You can do so by using latin-1 as default encoding. Works great !

Yep, although that means we wear the cost of decoding and encoding for
all 8 bit input.
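
The latin-1 trick works because that codec maps each of the 256 byte values
to the code point with the same number, so a decode followed by an encode is
lossless. A quick check in Python 3:

```python
all_bytes = bytes(range(256))

# Every byte value decodes to the code point with the same ordinal...
as_text = all_bytes.decode("latin-1")

# ...and encodes straight back to the original byte.
round_tripped = as_text.encode("latin-1")
```

The cost mentioned above is still there: every byte of input is converted
twice, once on the way in and once on the way out.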

What does the _sre.c code do?

>Depends on your needs: CSV files tend to be small enough
>to do the decoding in one call in memory.

We are routinely dealing with multi-gigabyte csv files - which is why the
original 2001 vintage csv module was written as a C state machine. 

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] csv module TODO list

2005-01-05 Thread Andrew McNamara

>> Yep, although that means we wear the cost of decoding and encoding for
>> all 8 bit input.
>
>Right, but it makes the code very clean and straight forward.

I agree it makes for a very clean solution, and 99% of the time I'd
choose that option.

>Again, it depends on what you need. If performance is critical
>then you probably need a C version written using the same trick
>as _sre.c...
>
>> What does the _sre.c code do?
>
>It comes in two versions: one for 8-bit the other for Unicode.

That's what I thought. I think the motivations here are similar to those
that drove the _sre developers.

>> We are routinely dealing with multi-gigabyte csv files - which is why the
>> original 2001 vintage csv module was written as a C state machine. 
>
>I see, but are you sure that the typical Python user will have
>the same requirements to make it worth the effort (and
>complexity) ?

This is open source, so I scratch my own itch (and that of my employers) - 
we need fast csv parsing more than we need unicode... 8-)

Okay, assuming we go the "produce two versions via evil macro tricks"
path, it's still not quite the same situation as _sre.c, which only has
to deal with the internal unicode representation.

One way to approach this would be to add an "encoding" keyword argument
to the readers and writers. If given, the parser would decode the input
stream to the internal representation before passing it through the
unicode state machine, which would yield tuples of unicode objects.

That leaves us with a bit of a problem where the source is already unicode
(eg, a list of unicode strings)... hmm.
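
One possible shape for the "encoding" idea, sketched in Python 3 terms. The
name decoding_reader is hypothetical and this is not how the csv module
behaves. It assumes each item of the input is one complete line in an
encoding where line breaks are single bytes (ASCII, latin-1, UTF-8). Passing
encoding=None covers the case where the source is already unicode.

```python
import codecs
import csv


def decoding_reader(lines, encoding=None, **fmtparams):
    """Wrap csv.reader, decoding byte lines first when an encoding is given."""
    if encoding is not None:
        # iterdecode() lazily decodes each chunk of the input iterable.
        lines = codecs.iterdecode(lines, encoding)
    return csv.reader(lines, **fmtparams)


from_bytes = list(decoding_reader([b"caf\xc3\xa9,1\r\n", b"x,y\r\n"],
                                  encoding="utf-8"))
from_text = list(decoding_reader(["x,y"]))
```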

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Re: [Csv] csv module TODO list

2005-01-05 Thread Andrew McNamara

>Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should
>already have been addressed):

I should apologise to Neal here for not replying to him at the time.

Okay, going through the issues Neal raised...

>* remove TODO comment at top of file--it's empty

Was fixed.

>* is CSV going to be maintained outside the python tree?
>  If not, remove the 2.2 compatibility macros for:
> PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc.

Does anyone think we should continue to maintain this 2.2 compatibility?

>* inline the following functions since they are used only in one place
>get_string, set_string, get_nullchar_as_None, set_nullchar_as_None,
>join_reset (maybe)

It was done that way as I felt we would be adding more getters and
setters to the dialect object in future.

>* rather than use PyErr_BadArgument, should you use assert?
>(first example, Dialect_set_quoting, line 218)

You mean C assert()? I don't think I'm really following you here -
where would the type of the object be checked in a way the user could
recover from?

>* is it necessary to have Dialect_methods, can you use 0 for tp_methods?

I was assuming I would need to add methods at some point (in fact, I did
have methods, but removed them).

>* remove commented out code (PyMem_DEL) on line 261
>Have you used valgrind on the test to find memory overwrites/leaks?

No, valgrind wasn't used.

>* PyString_AsString()[0] on line 331 could return NULL in which case
>you are dereferencing a NULL pointer

Was fixed.

>* not sure why there are casts on 0 pointers
>lines 383-393, 733-743, 1144-1154, 1164-1165

To make it easier when the time comes to add one of those members.

>* Reader_getiter() can be removed and use PyObject_SelfIter()

Okay, wasn't aware of PyObject_SelfIter - will fix.

>* I think you need PyErr_NoMemory() before returning on line 768, 1178

The examples I looked at in the Python core didn't do this - are you sure?
(now lines 832 and 1280). 

>* is PyString_AsString(self->dialect->lineterminator) on line 994
>guaranteed not to return NULL?  If not, it could crash by
>passing to memmove.
>* PyString_AsString() can return NULL on line 1048 and 1063, 
>the result is passed to join_append()

Looking at the PyString_AsString implementation, it looks safe (we ensure
it's really a string elsewhere)?

>* iteratable should be iterable?  (line 1088)

Sorry, I don't know what you're getting at here? (now line 1162).

>* why doesn't csv_writerows() have a docstring?  csv_writerow does

Was fixed.

>* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE

Was fixed.

>* csv_unregister_dialect, csv_get_dialect could use METH_O 
>so you don't need to use PyArg_ParseTuple

Was fixed.

>* in init_csv, recommend using 
>PyModule_AddIntConstant and PyModule_AddStringConstant
>where appropriate

Was fixed.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Re: csv module TODO list

2005-01-05 Thread Andrew McNamara

>Quite a while ago I posted some material to the csv-list about
>problems using the csv module on Unix-style colon-separated files --
>it just doesn't deal properly with backslash escaping and is quite
>useless for this kind of file. I seem to recall the general view was
>that it wasn't intended for this kind of thing -- only the sort of csv
>that Microsoft Excel outputs/inputs, but if I am mistaken about this,
>perhaps fixing this issue might be put on the TODO-list? I'll be happy
>to re-send or summarize the relevant emails, if needed.

I think a related issue was included in my TODO list:

>* Address or document Francis Avila's issues as mentioned in this posting:
>
>http://www.google.com.au/groups?selm=vsb89q1d3n5qb1%40corp.supernews.com
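
For comparison, a sketch of how colon-separated data with backslash escapes
can be read with the csv module in Python 3, where the escape character is
removed from the result. The sample line is made up, and this says nothing
about the behaviour of the module at the time of this thread.

```python
import csv

# One record whose first field contains an escaped delimiter.
lines = ["a\\:b:c\n"]

rows = list(csv.reader(lines, delimiter=":", escapechar="\\",
                       quoting=csv.QUOTE_NONE))
```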

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] csv module TODO list

2005-01-05 Thread Andrew McNamara

>>>>Can you please elaborate on that? What needs to be done, and how is
>>>>that going to be done? It might be possible to avoid considerable
>>>>uglification.
>> 
>> I'm not altogether sure there. The parsing state machine is all written in
>> C, and deals with signed chars - I expect we'll need two versions of that
>> (or one version that's compiled twice using pre-processor macros). Quite
>> a large job. Suggestions gratefully received.
>
>I'm still trying to understand what *needs* to be done - I would move to
>how this is done only later. What APIs should be extended/changed, and
>in what way?

That's certainly the first step, and I have to admit that I don't have
a clear idea at this time - the unicode issue has been in the "too hard"
basket since we started.

Marc-Andre Lemburg mentioned that he has encountered UTF-16 encoded csv
files, so a reasonable starting point would be the ability to read and
parse, as well as the ability to generate, one of these.

The reader interface currently returns a row at a time, consuming as many
lines as necessary from the supplied iterable (with the most common iterable
being a file). This suggests to me that we will need an optional "encoding"
argument to the reader constructor, and that the reader will need to
decode the source lines. That said, I'm hardly a unicode expert, so I
may be overlooking something (could a utf-16 encoded character span a
line break, for example).  The writer interface probably should have
similar facilities.
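
On the question in parentheses: a quick Python 3 check suggests the answer
is yes, in the sense that the byte 0x0A can turn up inside a UTF-16 code
unit. Splitting the raw bytes on "\n" would then cut a character in half,
so decoding has to happen before line splitting. The sample text is made up.

```python
# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE), a comma, "x", one newline.
text = "\u010a,x\n"
raw = text.encode("utf-16-le")

# Count newline bytes in the encoded form and newlines in the text.
newline_bytes = raw.count(b"\n")
real_newlines = text.count("\n")
```

Here the encoded form contains two 0x0A bytes although the text has only one
line break; the extra one is the low byte of U+010A.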

However - a number of people have complained about the "iterator"
interface, wanting to supply strings (the iterable is necessary because a
CSV row can span multiple lines). It's also conceivable that the source
lines could already be unicode objects.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Csv] Re: [Python-Dev] csv module TODO list

2005-01-05 Thread Andrew McNamara

>>I'm still trying to understand what *needs* to be done - I would move to
>>how this is done only later. What APIs should be extended/changed, and
>>in what way?
[...]
>The reader interface currently returns a row at a time, consuming as many
>lines as necessary from the supplied iterable (with the most common iterable
>being a file). This suggests to me that we will need an optional "encoding"
>argument to the reader constructor, and that the reader will need to
>decode the source lines. That said, I'm hardly a unicode expert, so I
>may be overlooking something (could a utf-16 encoded character span a
>line break, for example).  The writer interface probably should have
>similar facilities.

Ah - I see that the codecs module provides an EncodedFile class - better
to use this than add encoding/decoding cruft to the csv module.

So, do we duplicate the current reader and writer as UnicodeReader and
UnicodeWriter (how else do we know to use the unicode parser)? What about
the "dialects"? I guess if a dialect uses no unicode strings, it can be
applied to the current parser, but if it does include unicode strings,
then the parser would need to raise an exception.

The DictReader and DictWriter classes will probably need matching
UnicodeDictReader/UnicodeDictWriter versions (use common base class,
just specify alternate parser).

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Re: [Csv] csv module TODO list

2005-01-06 Thread Andrew McNamara

>There's a bunch of jobs we (CSV module maintainers) have been putting
>off - attached is a list (in no particular order): 
[...]
>Also, review comments from Jeremy Hylton, 10 Apr 2003:
>
>I've been reviewing extension modules looking for C types that should
>participate in garbage collection.  I think the csv ReaderObj and
>WriterObj should participate.  The ReaderObj contains a reference to
>input_iter that could be an arbitrary Python object.  The iterator
>object could well participate in a cycle that refers to the ReaderObj.
>The WriterObj has a reference to a writeline callable, which could well
>be a method of an object that also points to the WriterObj.

I finally got around to looking at this, only to realise Jeremy did the
work back in Apr 2003 (thanks). One question, however - the GC doco in
the Python/C API seems to suggest to me that PyObject_GC_Track should be
called on the newly minted object prior to returning from the initialiser
(and correspondingly PyObject_GC_UnTrack should be called prior to
dismantling). This isn't being done in the module as it stands. Is the
module wrong, or is my understanding of the reference manual incorrect?
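
To make the cycle Jeremy describes concrete, here is a Python-level sketch
(CPython 3 assumed; the LineSource class is hypothetical). The object is its
own iterator and also holds the reader built from it, so the two refer to
each other and only the cycle collector can free them. It does not answer
the PyObject_GC_Track question; it only shows why the reader type needs to
take part in garbage collection.

```python
import csv
import gc
import weakref


class LineSource:
    """Iterator that keeps a reference to the reader consuming it."""

    def __init__(self, lines):
        self._lines = iter(lines)
        # reader -> self (as its input iterator) and self -> reader: a cycle.
        self.reader = csv.reader(self)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._lines)


def cycle_is_collected():
    """Drop the only outside reference and see whether gc frees the pair."""
    source = LineSource(["a,b"])
    ref = weakref.ref(source)
    del source
    gc.collect()
    return ref() is None


collected = cycle_is_collected()
```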

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Minor change to behaviour of csv module

2005-01-06 Thread Andrew McNamara

I'm considering a change to the csv module that could potentially break
some obscure uses of the module (but CSV files usually quote, rather
than escape, so the most common uses aren't affected).

Currently, with a non-default escapechar='\\', input like:

field one,field \
two,field three

Returns:

["field one", "field \\\ntwo", "field three"]

In the 2.5 series, I propose changing this to return:

["field one", "field \ntwo", "field three"]

Is this reasonable? Is the old behaviour desirable in any way (we could
add a switch to enable the new behaviour, but I feel that would only
allow the confusion to continue)?
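
For readers who want to try the input above, a small sketch against the
Python 3 csv module. As far as I can tell a current interpreter should give
the proposed result, with the escape character removed; that is an
observation about today's module, not about the 2.4 code.

```python
import csv

# The two physical lines from the example: a backslash escapes the newline.
lines = ["field one,field \\\n", "two,field three\n"]

row = next(csv.reader(lines, escapechar="\\"))
```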

BTW, some of my other changes have changed the exceptions raised when
bad arguments were passed to the reader and writer factory functions - 
previously, the exceptions were semi-random, including TypeError,
AttributeError and csv.Error - they should now almost always be TypeError
(like most other argument passing errors). I can't see this being a
problem, but I'm prepared to listen to arguments.
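
A quick way to see the argument-checking behaviour, sketched for Python 3
(the helper is hypothetical and the exact messages vary between versions):

```python
import csv


def raises_type_error(func, *args, **kwargs):
    """Return True if the call fails with TypeError, False if it succeeds."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False


bad_delimiter = raises_type_error(csv.reader, ["a,b"], delimiter=5)
bad_source = raises_type_error(csv.reader, None)
bad_target = raises_type_error(csv.writer, None)
good_call = raises_type_error(csv.reader, ["a,b"])
```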

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Minor change to behaviour of csv module

2005-01-07 Thread Andrew McNamara

>I'm considering a change to the csv module that could potentially break
>some obscure uses of the module (but CSV files usually quote, rather
>than escape, so the most common uses aren't affected).
>
>Currently, with a non-default escapechar='\\', input like:
>
>field one,field \
>two,field three
>
>Returns:
>
>["field one", "field \\\ntwo", "field three"]
>
>In the 2.5 series, I propose changing this to return:
>
>["field one", "field \ntwo", "field three"]
>
>Is this reasonable? Is the old behaviour desirable in any way (we could
>add a switch to enable the new behaviour, but I feel that would only
>allow the confusion to continue)?

Thinking about this further, I suspect we have to retain the current
behaviour, as broken as it is, as the default: it's conceivable that
someone somewhere is post-processing the result to remove the backslashes,
and if we fix the csv module, we'll break their code.

Note that PEP-305 had nothing to say about escaping, nor does the module
reference manual.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Re: csv module TODO list

2005-01-09 Thread Andrew McNamara

>I'd love to see a 'split' and a 'join' function in the csv module to
>just convert between string and list without having to bother about
>files. 
>
>Something like
>
>csv.split(aStr [, dialect='excel'[, fmtparam]])  -> list object
>
>and
>
>csv.join(aList, e[, dialect='excel'[, fmtparam]]) -> str object
>
>Feasible?

Yes, it's feasible, although newlines can be embedded within fields
of a CSV record, hence the use of the iterator, rather than working with
strings. In your example above, if the parser gets to the end of the
string and finds it's still within a field, I'd propose just raising
an exception.
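
Roughly what I have in mind, sketched on top of the Python 3 csv module.
The names split_record and join_record are placeholders, not an existing or
agreed API. The reader's strict option is used so that a string ending in
the middle of a quoted field raises csv.Error.

```python
import csv
import io


def split_record(text, dialect="excel", **fmtparams):
    """Parse one CSV record from a string; raise csv.Error if incomplete."""
    fmtparams["strict"] = True  # unterminated quoted field -> csv.Error
    rows = list(csv.reader([text], dialect, **fmtparams))
    if len(rows) != 1:
        raise csv.Error("expected exactly one record")
    return rows[0]


def join_record(fields, dialect="excel", **fmtparams):
    """Format one list of fields as a CSV string without a line terminator."""
    fmtparams["lineterminator"] = "\n"
    buf = io.StringIO(newline="")
    csv.writer(buf, dialect, **fmtparams).writerow(fields)
    return buf.getvalue()[:-1]  # drop the single terminator we asked for
```

For example, split_record('a,"b,c",d') gives three fields, while
split_record('a,"b') raises because the quote is never closed.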

No promises, however - I only have a finite amount of time to work on
this at the moment.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Re: [Csv] Minor change to behaviour of csv module

2005-01-09 Thread Andrew McNamara

>> Andrew explains that in the CSV module, escape characters are not
>> properly removed.
>>
>> Magnus writes:
>>> IMO this is the *only* reasonable behaviour. I don't understand why
>>> the escape character should be left in; this is one of the reasons why
>>> UNIX-style colon-separated values don't work with the current module.
>>
>> Andrew writes back later:
>>> Thinking about this further, I suspect we have to retain the current
>>> behaviour, as broken as it is, as the default: it's conceivable that
>>> someone somewhere is post-processing the result to remove the 
>>> backslashes,
>>> and if we fix the csv module, we'll break their code.
>>
>> I'm with Magnus on this. No one has 4 year old code using the CSV 
>> module.
>> The existing behavior is just simply WRONG. Sure, of course we should
>> try to maintain backward compatibility, but surely SOME cases don't
>> require it, right? Can't we treat this misbehavior as an outright bug?
>
>+1 -- the nonremoval of escape characters smells like a bug to me, too.

Okay, I'm glad the community agrees (less work, less crustification).

For what it's worth, it wasn't a bug so much as a misfeature. I was
explicitly adding the escape character back in. The intention was to
make the feature more forgiving on users who accidentally set the escape
character - in other words, only special (quoting, escaping, field
delimiter) characters received special treatment. With the benefit of
hindsight, that was an inadequately considered choice.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] csv module and universal newlines

2005-01-09 Thread Andrew McNamara

This item, from the TODO list, has been bugging me for a while:

>* Reader and universal newlines don't interact well, reader doesn't
>  honour Dialect's lineterminator setting. All outstanding bug id's
>  (789519, 944890, 967934 and 1072404) are related to this - it's 
>  a difficult problem and further discussion is needed.

The csv parser consumes lines from an iterator, but it also has its own
idea of end-of-line conventions, which are currently only used by the
writer, not the reader, which is a source of much confusion. The writer,
by default, also attempts to emit a \r\n sequence, which results in more
confusion unless the file is opened in binary mode.
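
The asymmetry is easy to demonstrate. The sketch below uses the Python 3 csv
module with an in-memory buffer standing in for a file (the helper name is
made up). The writer honours lineterminator; the reader accepts the
parameter but splits records on \r or \n regardless.

```python
import csv
import io


def write_rows(rows, **fmtparams):
    """Write rows to an in-memory buffer and return the exact text produced."""
    # newline="" keeps the buffer from translating the writer's terminators.
    buf = io.StringIO(newline="")
    csv.writer(buf, **fmtparams).writerows(rows)
    return buf.getvalue()


default_output = write_rows([["a", "b"]])
unix_output = write_rows([["a", "b"]], lineterminator="\n")

# The reader takes the same parameter but does not use it.
read_back = list(csv.reader(["a,b\n"], lineterminator="X"))
```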

I'm looking for suggestions for how we can mitigate these problems
(without breaking things for existing users).

The standard file iterator includes the end-of-line characters in the
returned string. One potential solution is, then, to ignore the line
chunking done by the file iterator, and logically concatenate the source
lines until the csv parser's idea of lineterminator is seen - but this
negates the benefits of using an iterator.
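
A sketch of that re-chunking idea as a Python 3 generator. The name rechunk
is hypothetical, and the sketch deliberately ignores quoting, so a
terminator inside a quoted field would be split as well.

```python
def rechunk(chunks, terminator):
    """Regroup arbitrary text chunks into pieces ending with `terminator`."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            head, sep, tail = buffer.partition(terminator)
            if not sep:
                break  # no complete record buffered yet
            yield head + sep
            buffer = tail
    if buffer:
        yield buffer  # trailing data with no terminator


records = list(rechunk(["a,b\r", "\nc,d\r\n"], "\r\n"))
```

Note that the buffer grows without bound if the terminator never appears,
which is the cost referred to above.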

Another option might be to provide a new interface that relies on a
file-like object being supplied. The lineterminator character would only
be used with this interface, with the current interface falling back to
using only \n. Rather a drastic solution.

Any other ideas?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Re: [Csv] csv module TODO list

2005-01-11 Thread Andrew McNamara

>Would the csv module be a good place to add a DBF reader and writer?  

I would have thought it would make sense as its own module (in the same
way that we have separate modules that present a common interface for
the different databases), or am I missing something?

I'd certainly like to see a DBF parser in python - reading and writing odd
file formats is bread-and-butter for us contractors... 8-)

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Re: [Csv] csv module and universal newlines

2005-01-12 Thread Andrew McNamara

>You can argue that reading csv data from/writing csv data to a file on
>Windows if the file isn't opened in binary mode is an error.  Perhaps we
>should enforce that in situations where it matters.  Would this be a start?
>
>terminators = {"darwin": "\r",
>   "win32": "\r\n"}
>
>if (dialect.lineterminator != terminators.get(sys.platform, "\n") and
>   "b" not in getattr(f, "mode", "b")):
>   raise IOError, ("%s not opened in binary mode" %
>   getattr(f, "name", "???"))
>
>The elements of the postulated terminators dictionary may already exist
>somewhere within the sys or os modules (if not, perhaps they should be
>added).  The idea of the check is to enforce binary mode on those objects
>that support a mode if the desired line terminator doesn't match the
>platform's line terminator.

Where that falls down, I think, is where you want to read an alien
file - in fact, under unix, most of the CSV files I read use \r\n for
end-of-line.

Also, I *really* don't like the idea of looking for a mode attribute
on the supplied iterator - it feels like a layering violation. We've
advertised the fact that it's an iterator, so we shouldn't be using
anything but the iterator protocol.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Re: [Csv] csv module and universal newlines

2005-01-12 Thread Andrew McNamara

>Isn't universal newlines only used for reading?

That's right. And the CSV reader has its own version of universal newlines
anyway (from the py1.5 days).

>I have had no problems using the csv module for reading files with 
>universal newlines by opening the file myself or providing an iterator.

Neither have I, funnily enough.

>Unicode, on the other hand, I have had problems with.

Ah, so somebody does want it then? Good to hear. Hard to get motivated
to make radical changes without feedback.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] UserString

2005-02-22 Thread Andrew McNamara

>> if e.errno <> errno.EEXIST:
>>     raise
>
>You have a lot more faith in the errno module than I do. Are you sure
>the same error codes work on all platforms where Python works? It's
>also not exactly readable (except for old Unix hacks).

On the other hand, LBYL in this context can result in race conditions
and security vulnerabilities. "os.makedirs" is already a composite of
many system calls, so all bets are off anyway, but for simpler operations
that result in an atomic system call, this is important.
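
A sketch of the non-LBYL style for a single atomic call, in current Python 3 syntax rather than the 2.x syntax quoted above. The helper name ensure_dir is hypothetical; os.mkdir and errno.EEXIST are the real standard library names.

```python
import errno
import os

def ensure_dir(path):
    """Create path; return True if we created it, False if it already existed.

    There is no os.path.exists() check first, so there is no window
    between "look" and "leap" for another process to exploit.
    """
    try:
        os.mkdir(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise  # some other failure, e.g. permissions
        return False
    return True
```

Checking for the directory first and then creating it would take two system calls, and the answer from the first can be stale by the time the second runs.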

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] UserString

2005-02-24 Thread Andrew McNamara

>> You have a lot more faith in the errno module than I do. Are you sure
>> the same error codes work on all platforms where Python works? 
>
>No, but I'm pretty confident the symbolic names for the errors are
>consistent for any platform I've cared about .
>
>> It's also not exactly readable (except for old Unix hacks).
>
>Guilty as charged. ;)

The consistency of the semantics of core system calls is sort of a trademark
of unix. Any system that claims to be Unix, but plays fast and loose
with semantics soon gets a very poor reputation (xenix, cough).

All well-coded unix apps are dependent on system calls returning
consistent errno's. Which is one thing that makes life so difficult for
"posix" environments layered on other operating systems.
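
To illustrate relying on symbolic names rather than numeric values, here is a small sketch in current Python 3. The helper names errno_name and open_missing_file are made up for this example; errno.errorcode is the standard mapping from numeric codes to names.

```python
import errno
import os
import tempfile

def errno_name(exc):
    # Translate the numeric code carried by an OSError into its symbolic name.
    return errno.errorcode.get(exc.errno, "UNKNOWN")

def open_missing_file():
    # Try to open a file that cannot exist and report the symbolic error name.
    missing = os.path.join(tempfile.mkdtemp(), "no-such-file")
    try:
        open(missing).close()
    except OSError as exc:
        return errno_name(exc)
    return None

print(open_missing_file())
```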

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Faster Set.discard() method?

2005-03-17 Thread Andrew McNamara

To avoid the exception in the discard method, it could be implemented as:

    def discard(self, element):
        """Remove an element from a set if it is a member.

        If the element is not a member, do nothing.
        """
        try:
            self._data.pop(element, None)
        except TypeError:
            transform = getattr(element, "__as_temporarily_immutable__", None)
            if transform is None:
                raise # re-raise the TypeError exception we caught
            del self._data[transform()]

Currently, it's implemented as the much clearer:

        try:
            self.remove(element)
        except KeyError:
            pass

But the dict.pop method is about 12 times faster. Is this worth doing?
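
For anyone wanting to repeat the comparison, a rough timing sketch follows. It is written for current Python 3 and works on a plain dict rather than sets.Set, so the function names and the harness are assumptions of this sketch, and the ratio it reports will not necessarily match the figure above.

```python
import timeit

def discard_with_except(data, element):
    # Mirrors the "remove, then swallow KeyError" approach.
    try:
        del data[element]
    except KeyError:
        pass

def discard_with_pop(data, element):
    # Mirrors the dict.pop approach: no exception for a missing key.
    data.pop(element, None)

def time_missing_key(func, number=100000):
    # Time the worst case for the exception version: the key is never present.
    data = {}
    return timeit.timeit(lambda: func(data, "missing"), number=number)

if __name__ == "__main__":
    print("except:", time_missing_key(discard_with_except))
    print("pop:   ", time_missing_key(discard_with_pop))
```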

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Faster Set.discard() method?

2005-03-17 Thread Andrew McNamara

>> But the dict.pop method is about 12 times faster. Is this worth doing?
>
>The 2.4 builtin set's discard function looks like it does roughly the same
>as the 2.3 sets.Set.  Have you tried comparing a C version of your version
>with the 2.4 set to see if there are speedups there, too?

Ah. I had forgotten it was builtin - I'd found the python implementation
and concluded the C implementation didn't make it into 2.4 for some
reason... 8-)

Yes, the builtin set.discard() method is already faster than dict.pop().

>IMO keeping the sets.Set version as clean and readable as possible is nice,
>since the reason this exists is for other implementations (Jython, PyPy,
>...) and documentation, right?  OTOH, speeding up the CPython implementation
>is nice and it's read by many fewer people.

No, you're right - making sets.Set less readable than it already is would
be a step backwards. On the other hand, Jython and PyPy are already in
trouble - the builtin set() is not entirely compatible with sets.Set.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Faster Set.discard() method?

2005-03-17 Thread Andrew McNamara

>The C implementation has this code:
>
>"""
>   if (PyDict_DelItem(so->data, item) == -1) {
>       if (!PyErr_ExceptionMatches(PyExc_KeyError))
>           return NULL;
>       PyErr_Clear();
>   }
>"""
>
>Which is more-or-less the same as the sets.Set version, right?  What I was
>wondering was whether changing that C to a C version of your dict.pop()
>version would also result in speedups.  Are Exceptions really that slow,
>even at the C level?

No, exceptions are fast at the C level - all they do is set a flag. The
expense of exceptions is saving and restoring python frames, I think,
which doesn't happen in this case. So the current implementation is
ideal for C code - clear and fast.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

[Python-Dev] Re: [Csv] Example workaround classes for using Unicode with csv module...

2005-03-20 Thread Andrew McNamara

>I added UnicodeReader and UnicodeWriter example classes to the csv module
>docs just now.  They mention problems with ASCII NUL characters (which I
>vaguely remember - NUL-terminated strings are used internally, right?).  Do
>NULs still present a problem?  I saw nothing in the log messages that
>mentioned "ascii" or "nul" so I presume it is.

That's right - it still uses null terminated strings internally, and the
various special characters (quotechar, escapechar, etc) use null to mean
"not specified". Fixing this would cause much upheaval.

>Here's what I added.  Let me know if you think it needs any corrections,
>especially if there's a better way to word "as long as you avoid encodings
>like utf-16 that use NULs".  Can that just be "as long as you avoid
>multi-byte encodings other than utf-8"?  

I think only utf-8 provides the guarantees needed for this to work -
specifically, multi-byte characters need to have the high bit set
(otherwise a delimiter or other special character appearing within a
multi-byte character would upset the parsing), while at the same time
having single byte characters for the characters with special meaning
to the parser. Note also that none of the special characters (quotechar,
delimiter, escapechar, etc) can be a multi-byte sequence.
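
The high-bit property can be checked directly. This sketch (current Python 3; the helper name has_ascii_collision and the choice of sample character are mine) encodes U+222C, whose UTF-16 little-endian bytes happen to be a comma followed by a double quote, which is exactly the kind of collision that would upset a byte-oriented parser.

```python
def has_ascii_collision(char, encoding):
    """True if any byte of the encoded character has the high bit clear,
    i.e. could be mistaken for an ASCII delimiter, quote or NUL."""
    return any(b < 0x80 for b in char.encode(encoding))

sample = "\u222c"  # DOUBLE INTEGRAL, a non-ASCII character

print(sample.encode("utf-8"))      # every byte has the high bit set
print(sample.encode("utf-16-le"))  # bytes 0x2C 0x22, i.e. , and "
print(has_ascii_collision(sample, "utf-8"))
print(has_ascii_collision(sample, "utf-16-le"))
```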

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Socket module corner cases

2006-05-30 Thread Andrew McNamara

>Without further ado, the questions:
>
> * getfqdn(): The module docs specify that if no FQDN can be found,
>socket.getfqdn() should return the hostname as returned by
>gethostname(). However, CPython seems to return the passed-in hostname
>rather than the local machine's hostname (as would be expected from
>gethostname()). What's the correct behavior?
>>>> s.getfqdn(' asdlfk asdfsadf ')
>'asdlfk asdfsadf'
># expected 'mybox.mydomain.com'

I would suggest the documentation is wrong and the CPython code is right
in this case: if you supply the optional /name/ argument, then you don't
want it returning your own name (but returning gethostname() is desirable
if no /name/ is supplied).

> * getfqdn(): The function seems to not always return the FQDN. For
>example, if I run the following code from 'mybox.mydomain.com', I get
>strange output. Does getfqdn() remove the common domain between my
>hostname and the one that I'm looking up?
>>>> socket.getfqdn('otherbox')
>'OTHERBOX'
># expected 'otherbox.mydomain.com'

getfqdn() calls the system library gethostbyaddr(), and searches the
result for the first name that contains '.'. If no name contains a dot,
it returns the canonical name (as defined by gethostbyaddr() and the
system host name resolver libraries (hosts file, DNS, NMB)).
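
The selection step described above can be sketched without touching the network. The function pick_fqdn is a hypothetical helper, not part of the socket module; it takes the canonical name and alias list that gethostbyaddr() would return and applies the "first name containing a dot" rule.

```python
def pick_fqdn(hostname, aliases):
    """Return the first candidate containing a dot, else the canonical name.

    hostname and aliases stand in for the first two items of the tuple
    returned by socket.gethostbyaddr().
    """
    for candidate in [hostname] + list(aliases):
        if "." in candidate:
            return candidate
    return hostname

# A resolver that only knows the short name gives back the short name.
print(pick_fqdn("OTHERBOX", []))
# A dotted alias is preferred over an undotted canonical name.
print(pick_fqdn("otherbox", ["otherbox.mydomain.com"]))
```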

> * getprotobyname(): Only a few protocols seem to be supported. Why?
>>>> for p in [a[8:] for a in dir(socket) if a.startswith('IPPROTO_')]:
>...     try:
>...         print p,
>...         print socket.getprotobyname(p)
>...     except socket.error:
>...         print "(not handled)"
>...

getprotobyname() looks up the /etc/protocols file (on a unix system -
I don't know about windows), whereas the socket.IPPROTO_* constants are
populated from the #defines in netinet/in.h at compile time. 
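
A current Python 3 rendering of the quoted loop makes the mismatch visible: the constant names come from the compiled-in list, while each lookup goes through the system protocols database. The function name protocol_lookup_table is made up for this sketch, and which entries resolve depends entirely on the local system.

```python
import socket

def protocol_lookup_table():
    """Map each IPPROTO_* suffix to getprotobyname()'s answer, or None."""
    table = {}
    for attr in dir(socket):
        if not attr.startswith("IPPROTO_"):
            continue
        name = attr[len("IPPROTO_"):].lower()
        try:
            table[name] = socket.getprotobyname(name)
        except OSError:
            # Not listed in the system protocols database.
            table[name] = None
    return table

if __name__ == "__main__":
    for name, number in sorted(protocol_lookup_table().items()):
        print(name, number if number is not None else "(not handled)")
```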

Personally, I think /etc/protocols and the associated library functions
are a historical mistake (getprotobynumber() is marginally useful -
but python doesn't expose it!).

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] Socket module corner cases

2006-05-30 Thread Andrew McNamara

>After a little more investigation here, it appears that getfqdn() returns
>the name unchanged (or perhaps run through the system's resolver libs) if
>there's no reverse DNS PTR entry. In the case given above,
>otherbox.mydomain.com didn't have a reverse DNS entry, so getfqdn()
>returned 'OTHERBOX'. However, when getfqdn() is called with a name whose
>IP *does* have a PTR record, it returns the correct FQDN.

That sounds entirely plausible.

Many of these name resolver functions pre-date DNS, and show their
/etc/hosts heritage somewhat (gethostbyaddr returning ip, names and
aliases in one hit is a classic example - this isn't easy with DNS).

>Thanks for the help. Now, couple more questions:
>
>getnameinfo() accepts the NI_NAMEREQD flag. It appears, though that a name
>lookup (and associated error if the lookup fails) occurs independent of
>whether the flag is specified. Does it actually do anything?
>
>Does getnameinfo() support IPv6? It appears to fail (with a socket.error
>that says "sockaddr resolved to multiple addresses") if both IPv4 and IPv6
>are enabled.

Someone more knowledgeable will have to answer these.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Re: [Python-Dev] "and" and "or" operators in Py3.0

2005-09-19 Thread Andrew McNamara

>While I don't disagree with some of your main points, I do think that  
>your proposal would eliminate a natural and easy to understand use of  
>the current behavior of "or" that I tend to use quite a bit.  Your  
>proposal would break a lot of code, and I can't think of a better  
>"conditional operator" than the one thats already there.
>
>I often find myself using 'or' to conditionally select a meaningful  
>value in the absence of a real value:

I agree. I find I often have an object with an optional friendly name
(label) and a mandatory system name. So this sort of thing becomes common:

'%s blah blah' % (foo.label or foo.name)

The if-else-expression alternative works, but isn't quite as readable:

'%s blah blah' % (foo.label ? foo.label : foo.name)
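
For comparison, here are both spellings as runnable code. The conditional expression uses the "X if C else Y" form that Python adopted in 2.5, after this message was written, rather than the C-style spelling above. The Item class and function names are hypothetical.

```python
class Item:
    # Hypothetical object with a mandatory name and an optional label.
    def __init__(self, name, label=None):
        self.name = name
        self.label = label

def title_with_or(item):
    # "or" returns its first truthy operand, so an empty or missing
    # label falls through to the system name.
    return '%s blah blah' % (item.label or item.name)

def title_with_conditional(item):
    # Same result, spelled as a conditional expression.
    return '%s blah blah' % (item.label if item.label else item.name)

print(title_with_or(Item("sys01", "Front desk")))
print(title_with_or(Item("sys01")))
```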

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
