date:20051024

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson

Martin v. Löwis:

> That's very tricky. If you have multiple implementations, you make
> usage at the C API difficult. If you make it either UTF-8 or UTF-32,
> you make PythonWin difficult. If you make it UTF-16, you make indexing
> difficult.

   For Windows, the code will get a little uglier, needing to perform
an allocation/encoding and deallocation more often than at present but
I don't think there will be a speed degradation as Windows is
currently performing a conversion from 8 bit to UTF-16 inside many
system calls. To minimize the cost of allocation, Python could copy
Windows in keeping a small number of commonly sized preallocated
buffers handy.

   For indexing UTF-16, a flag could be set to show if the string is
all in the base plane and if not, an index could be constructed when
and if needed. It'd be good to get some feel for what proportion of
string operations performed require indexing. Many, such as
startswith, split, and concatenation don't require indexing. The
proportion of operations that use indexing to scan strings would also
be interesting as adding a (currentIndex, currentOffset) cursor to
string objects would be another approach.

   Neil
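
The flag-plus-cursor idea can be sketched in a few lines of modern Python. This is only an illustration under stated assumptions: the class name Utf16Text and its attributes are invented here, and the code units are modelled as a tuple of integers rather than as CPython's internal storage.

```python
import struct


class Utf16Text:
    """Code point indexing over UTF-16 code units (hypothetical sketch)."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        # One integer per 16-bit code unit.
        self.units = struct.unpack("<%dH" % (len(data) // 2), data)
        # Flag: True when the string contains no surrogates (all base plane).
        self.all_bmp = not any(0xD800 <= u <= 0xDFFF for u in self.units)
        # Cursor remembering (code point index, code unit offset).
        self.cursor = (0, 0)

    def _width(self, offset):
        # A high surrogate opens a two-unit sequence; anything else is one unit.
        return 2 if 0xD800 <= self.units[offset] <= 0xDBFF else 1

    def __getitem__(self, index):
        if index < 0:
            raise IndexError(index)
        if self.all_bmp:
            # Fast path: code point index equals code unit offset.
            offset = index
        else:
            cp, offset = self.cursor
            if index < cp:
                # The cursor only scans forward, so restart from the beginning.
                cp, offset = 0, 0
            while cp < index and offset < len(self.units):
                offset += self._width(offset)
                cp += 1
            if offset < len(self.units):
                self.cursor = (cp, offset)
        if offset >= len(self.units):
            raise IndexError(index)
        width = self._width(offset)
        raw = struct.pack("<%dH" % width, *self.units[offset:offset + width])
        return raw.decode("utf-16-le")
```

Sequential scans such as t[0], t[1], t[2] reuse the cursor and stay linear overall; a backwards jump falls back to a rescan, which is where a full index would pay off.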
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Michele Simionato

On 10/23/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Very nice indeed. I'd be more supportive if it was defined as a new statement
> such as "create" with the syntax:
>
>create TYPE NAME(ARGS):
>  BLOCK

I like it, but it would require a new keyword. Alternatively, one
could abuse 'def':

def  TYPE NAME(ARGS):
  BLOCK

but then people would likely be confused as Skip was, earlier in this thread,
so I guess 'def' is not an option.

IMHO a new keyword could be justified for such a powerful feature,
but only Guido's opinion counts on these matters ;)

Anyway I expected people to criticize the proposal as too powerful and
dangerously close to Lisp macros.

 Michele Simionato

Re: [Python-Dev] int(string)

2005-10-24 Thread Fredrik Lundh

Alan McIntyre wrote:

> When running "make test" I get some errors in test_array and
> test_compile that did not occur in the build from CVS.  Given the inputs
> to long() have '.' characters in them, I assume that these tests really
> should be failing as implemented, but I haven't dug into them to see
> what's going on:
>
> ==
> ERROR: test_repr (__main__.FloatTest)
> --
> Traceback (most recent call last):
>   File "Lib/test/test_array.py", line 187, in test_repr
> self.assertEqual(a, eval(repr(a), {"array": array.array}))
> ValueError: invalid literal for long(): 100.0
>
> ==
> ERROR: test_repr (__main__.DoubleTest)
> --
> Traceback (most recent call last):
>   File "Lib/test/test_array.py", line 187, in test_repr
> self.assertEqual(a, eval(repr(a), {"array": array.array}))
> ValueError: invalid literal for long(): 100.0

I don't have the latest cvs, but in my copy of test_array, the inputs to those
two eval calls are

 array('f', [-42.0, 0.0, 42.0, 10.0, -100.0, -42.0, 0.0, 42.0,
10.0, -100.0])

and

 array('d', [-42.0, 0.0, 42.0, 10.0, -100.0, -42.0, 0.0, 42.0,
10.0, -100.0])

respectively.  if either of those gives "invalid literal for long", something's
seriously broken.

does a plain

a = -100.0

still work on your machine?






Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Josiah Carlson


Michele Simionato <[EMAIL PROTECTED]> wrote:
> 
> On 10/23/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> > Very nice indeed. I'd be more supportive if it was defined as a new 
> > statement
> > such as "create" with the syntax:
> >
> >create TYPE NAME(ARGS):
> >  BLOCK
> 
> I like it, but it would require a new keyword. Alternatively, one
> could abuse 'def':
> 
> def  TYPE NAME(ARGS):
>   BLOCK
> 
> but then people would likely be confused as Skip was, earlier in this thread,
> so I guess 'def' is a not an option.
> 
> IMHO a new keyword could be justified for such a powerful feature,
> but only Guido's opinion counts on this matters ;)
> 
> Anyway I expected people to criticize the proposal as too powerful and
> dangerously close to Lisp macros.

I would criticise it for being dangerously close to worthless.  With the
minor support code that I (and others) have offered, no new syntax is
necessary.

You can get the same semantics with...

class NAME(_(TYPE), ARGS):
    BLOCK

And a suitably defined _.  Remember, not every X line function should be
made a builtin or syntax.

 - Josiah


Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Michele Simionato

On 10/24/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> I would criticise it for being dangerously close to worthless.  With the
> minor support code that I (and others) have offered, no new syntax is
> necessary.
>
> You can get the same semantics with...
>
> class NAME(_(TYPE), ARGS):
> BLOCK
>
> And a suitably defined _.  Remember, not every X line function should be
> made a builtin or syntax.
>
>  - Josiah

Could you re-read my original message, please? Sugar is *everything*
in this case. If the functionality is to be implemented via a __metaclass__
hook, then it should be considered a hack that nobody in his right mind
should use. OTOH, if there is a specific syntax for it, then it means
that this usage has the benediction of the BDFL. This would be a HUGE change.
For instance, I would never abuse metaclasses for that, whereas I
would freely use a 'create' statement.

   Michele Simionato

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread M.-A. Lemburg

Neil Hodgson wrote:
> Guido van Rossum:
> 
> 
>>Folks, please focus on what Python 3000 should do.
>>
>>I'm thinking about making all character strings Unicode (possibly with
>>different internal representations a la NSString in Apple's Objective
>>C) and introduce a separate mutable bytes array data type. But I could
>>use some validation or feedback on this idea from actual
>>practitioners.
> 
> 
>I'd like to more tightly define Unicode strings for Python 3000.
> Currently, Unicode strings may be implemented with either 2 byte
> (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
> contain any Unicode character and should be indexable yielding
> characters rather than half characters. Therefore Python strings
> should appear to be UTF-32. There could still be multiple
> implementations (using UTF-16 or UTF-8) to preserve space but all
> implementations should appear to be the same apart from speed and
> memory use.

There seems to be a general misunderstanding here: even if you
have UCS4 storage, it is still possible to slice a Unicode
string in a way which makes rendering it correctly impossible.

Unicode has the concept of combining code points, e.g. you can
store an "é" (e with an accent) as "e" + "'". Now if you slice
off the accent, you'll break the character that you encoded
using combining code points.

Note that combining code points are rather common in encodings
of Asian scripts, so this is not an artificial example.
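
A small sketch of the slicing problem in modern Python 3, assuming the accent is stored as U+0301 COMBINING ACUTE ACCENT (the standard decomposition of "é"); it uses only the stdlib unicodedata module:

```python
import unicodedata

composed = "\u00e9"                                  # precomposed e with acute
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301

# Same rendered character, different number of code points.
assert len(composed) == 1
assert len(decomposed) == 2

# Slicing by code point cuts the accent off its base letter.
sliced = decomposed[:1]
assert sliced == "e"

# The part that was cut off is a combining mark, not a standalone character.
assert unicodedata.combining(decomposed[1]) != 0
```

The slice is perfectly valid at the code point level, which is why wider storage alone does not make indexing "character safe".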

Some time ago I proposed a new module called unicodeindex
to help with indexing. It would solve most of the indexing
issues you run into when dealing with Unicode. I've attached
it to this email for reference.

More on the used terms:

http://www.egenix.com/files/python/EuroPython2002-Python-and-Unicode.pdf
http://www.egenix.com/files/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 24 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
PEP: 0XXX
Title: Unicode Indexing Helper Module
Version: $Revision: 1.0 $
Author: [EMAIL PROTECTED] (Marc-Andre Lemburg)
Status: Draft
Type: Standards Track
Python-Version: 2.3
Created: 06-Jun-2001
Post-History: 

Abstract

This PEP proposes a new module "unicodeindex" which provides 
means to index Unicode objects in various higher level abstractions
of "characters".

Problem and Terminology

Unicode objects can be indexed just like string objects using what
in Unicode terms is called a code unit as index basis.  

Code units are the storage entities used by the Unicode
implementation to store a single Unicode information unit and do
not necessarily map 1-1 to code points which are the smallest
entities encoded by the Unicode standard. Python exposes code
units to the programmer via the Unicode object indexing and slicing
API, e.g. u[10] or u[12:15] refer to the code units at index 10
and indices 12 to 14.

These code points can sometimes be composed to form graphemes
which are then displayed by the Unicode output device as one
character. A word is then a sequence of characters separated by
space characters or punctuation, a line is a sequence of code
points separated by line breaking code point sequences.

For addressing Unicode, there are basically five different methods
by which you can reference the data:

1. per code unit   (codeunit)
2. per code point  (codepoint)
3. per grapheme    (grapheme)
4. per word        (word)
5. per line        (line)

The indexing type name is given in parentheses and used in the
module interface.

Proposed Solution

I propose to add a new module to the standard Python library which
provides interfaces implementing the above indexing methods.

Module Interface

The module should provide the following interfaces for all four
indexing styles:

next_<indextype>(u, index) -> integer

Returns the Unicode object index for the start of the next
<indextype> found after u[index] or -1 in case no next element
of this type exists.

prev_<indextype>(u, index) -> integer

Returns the Unicode object index for the start of the previous
<indextype> found before u[index] or -1 in case no previous
element of this type exists.

<indextype>_index(u, n) -> integer

Returns the Unicode object index for the start of the n-th
<indextype> element in u. Raises an IndexError in case no n-th
element can be found.

<indextype>_count(u, index) -> integer

Counts the number of complete <indextype> elements found in
u[:index] and returns the count as integer.

<indextype>_start
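
To make the interface concrete, here is a rough sketch of two of the functions for the grapheme indexing type, following the PEP's naming pattern. It is not the proposed implementation: it assumes a deliberately simplified rule (a grapheme is one base code point plus any combining marks that follow it) instead of the full Unicode segmentation rules, and it is written for modern Python 3 strings.

```python
import unicodedata


def next_grapheme(u, index):
    """Return the index where the next grapheme after u[index] starts, or -1."""
    i = index + 1
    # Skip combining marks that still belong to the current grapheme.
    while i < len(u) and unicodedata.combining(u[i]):
        i += 1
    return i if i < len(u) else -1


def grapheme_count(u, index):
    """Count the complete graphemes found in u[:index]."""
    count = sum(1 for ch in u[:index] if not unicodedata.combining(ch))
    # If the cut falls just before a combining mark, the last grapheme is incomplete.
    if count and index < len(u) and unicodedata.combining(u[index]):
        count -= 1
    return count
```

For u = "e\u0301a" (an accented e followed by "a"), next_grapheme(u, 0) skips the accent and lands on the "a", while grapheme_count(u, 1) reports no complete grapheme because the cut separates the accent from its base.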

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread Walter Dörwald

Martin v. Löwis wrote:

> M.-A. Lemburg wrote:
> 
>>I've checked in a whole bunch of newly generated codecs
>>which now make use of the faster charmap decoding variant added
>>by Walter a short while ago.
>>
>>Please let me know if you find any problems.
> 
> I think we should work on eliminating the decoding_map variables.
> There are some codecs which rely on them being present in other codecs
> (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
> to use, say
> 
> decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>  0x00a4: 0x0454, #   CYRILLIC SMALL LETTER UKRAINIAN IE
>  0x00a6: 0x0456, #   CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>  0x00a7: 0x0457, #   CYRILLIC SMALL LETTER YI (UKRAINIAN)
>  0x00ad: 0x0491, #   CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>  0x00b4: 0x0404, #   CYRILLIC CAPITAL LETTER UKRAINIAN IE
>  0x00b6: 0x0406, #   CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>  0x00b7: 0x0407, #   CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>  0x00bd: 0x0490, #   CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
> })
> 
> With all these cross-references gone, the decoding_maps could also go.

Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put 
a complete decoding_table into koi8_u.py?

I'd like to suggest a small cosmetic change: gencodec.py should output 
byte values with two hex digits instead of four. This makes it easier to 
see what is a byte value and what is a codepoint. And it would make 
grepping for stuff simpler.

I.e. change:

decoding_map.update({
 0x0080: 0x0402, #  CYRILLIC CAPITAL LETTER DJE

to

decoding_map.update({
 0x80: 0x0402, #  CYRILLIC CAPITAL LETTER DJE

and

decoding_table = (
 u'\x00' #  0x0000 -> NULL

to

decoding_table = (
 u'\x00' # 0x00 -> U+0000 NULL

and

encoding_map = {
 0x0000: 0x0000, #  NULL

to

encoding_map = {
 0x0000: 0x00, #  NULL
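
The suggested output format is easy to sketch. The helper below is hypothetical and is not gencodec.py's actual code; it only shows the byte value written with two hex digits and the code point with four, with the character name looked up through the stdlib unicodedata module (written for modern Python 3).

```python
import unicodedata


def format_decoding_entry(byte, codepoint):
    """Format one decoding_map line: two hex digits for the byte, four for the code point."""
    name = unicodedata.name(chr(codepoint), "UNDEFINED")
    return "    0x%02x: 0x%04x, #  %s" % (byte, codepoint, name)


print(format_decoding_entry(0x80, 0x0402))
```

With this layout a grep for "0x80:" finds byte keys without also matching code points such as 0x0080.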

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread M.-A. Lemburg

Walter Dörwald wrote:
> Martin v. Löwis wrote:
> 
>> M.-A. Lemburg wrote:
>>
>>> I've checked in a whole bunch of newly generated codecs
>>> which now make use of the faster charmap decoding variant added
>>> by Walter a short while ago.
>>>
>>> Please let me know if you find any problems.
>>
>>
>> I think we should work on eliminating the decoding_map variables.
>> There are some codecs which rely on them being present in other codecs
>> (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
>> to use, say
>>
>> decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>>  0x00a4: 0x0454, #   CYRILLIC SMALL LETTER UKRAINIAN IE
>>  0x00a6: 0x0456, #   CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>>  0x00a7: 0x0457, #   CYRILLIC SMALL LETTER YI (UKRAINIAN)
>>  0x00ad: 0x0491, #   CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>>  0x00b4: 0x0404, #   CYRILLIC CAPITAL LETTER UKRAINIAN IE
>>  0x00b6: 0x0406, #   CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>>  0x00b7: 0x0407, #   CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>>  0x00bd: 0x0490, #   CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
>> })
>>
>> With all these cross-references gone, the decoding_maps could also go.

I just left them in because I thought they wouldn't do any harm
and might be useful in some applications.

Removing them where not directly needed by the codec would not
be a problem.

> Why should koi_u.py be defined in terms of koi8_r.py anyway? Why not put
> a complete decoding_table into koi8_u.py?

KOI8-U is not available as a mapping on ftp.unicode.org and
I only recreated codecs from the mapping files available
there.

> I'd like to suggest a small cosmetic change: gencodec.py should output
> byte values with two hexdigits instead of four. This makes it easier to
> see what is a byte values and what is a codepoint. And it would make
> grepping for stuff simpler.

True.

I'll rerun the creation with the above changes sometime this
week.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Nick Coghlan

Josiah Carlson wrote:
> You can get the same semantics with...
> 
> class NAME(_(TYPE), ARGS):
> BLOCK
> 
> And a suitably defined _.  Remember, not every X line function should be
> made a builtin or syntax.

And this would be an extremely fragile hack that is entirely dependent on the 
murky rules regarding how Python chooses the metaclass for the newly created 
class. Ensuring that the metaclass of the class returned by "_" was always the 
one chosen would be tricky at best and impossible at worst.

Even if it *could* be done, I'd never want to see a hack like that in 
production code I had anything to do with.

And while writing it with "__metaclass__" has precisely the correct semantics, 
that simply isn't as readable as a new block statement would be, nor is it as 
readable as the current major alternatives (e.g., defining and invoking a 
factory function).

An alternative to a completely new function would be to simply allow the 
metaclass to be defined up front, rather than inside the body of the class 
statement:

   class @TYPE NAME(ARGS):
       BLOCK

For example:

   class @Property x():
       def get(self):
           return self._x
       def set(self, value):
           self._x = value
       def delete(self):
           del self._x

(I put the metaclass after the keyword, because, unlike a function decorator, 
the metaclass is invoked *before* the class is created, and because you're 
only allowed one explicit metaclass)

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread M.-A. Lemburg

Bengt Richter wrote:
> Please bear with me for a few paragraphs ;-)

Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the parser how to
convert string literals (currently only the Unicode ones)
into constant Unicode objects within the program text.
It's also a nice way to let other people know what kind of
encoding you used to write your comments ;-)

Nothing more.

Once a module is compiled, there's no distinction between
a module using the latin-1 source code encoding or one using
the utf-8 encoding.
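
This can be checked with a short sketch in modern Python 3, where every string literal is Unicode (in 2005 this applied to the u"" literals only). The module file names are placeholders; compile() reads the coding declaration from the byte source.

```python
# The same one-line module, containing the literal 'é'.
body = "s = '\u00e9'\n"

src_latin1 = ("# -*- coding: latin-1 -*-\n" + body).encode("latin-1")
src_utf8 = ("# -*- coding: utf-8 -*-\n" + body).encode("utf-8")

# The two source files differ byte for byte ...
assert src_latin1 != src_utf8

ns_latin1, ns_utf8 = {}, {}
exec(compile(src_latin1, "<latin-1 module>", "exec"), ns_latin1)
exec(compile(src_utf8, "<utf-8 module>", "exec"), ns_utf8)

# ... but once compiled, both modules hold the same Unicode string.
assert ns_latin1["s"] == ns_utf8["s"] == "\u00e9"
```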

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Nick Coghlan

Barry Warsaw wrote:
> I've had this PEP laying around for quite a few months.  It was inspired
> by some code we'd written which wanted to be able to get immutable
> versions of arbitrary objects.  I've finally finished the PEP, uploaded
> a sample patch (albeit a bit incomplete), and I'm posting it here to see
> if there is any interest.
> 
> http://www.python.org/peps/pep-0351.html

I think it's definitely worth considering. It may also reduce the need for "x" 
and "frozenx" builtin pairs. We already have "set" and "frozenset", and the 
various "bytes" ideas that have been kicked around have generally considered 
the need for a "frozenbytes" as well.

If freeze was available, then "freeze(x(*args))" might serve as a replacement 
for any builtin "frozen" variants.

I think having dicts and sets automatically invoke freeze would be a mistake, 
because at least one of the following two cases would behave unexpectedly:

   d = {}
   l = []
   d[l] = "Oops!"
   d[l] # Raises KeyError if freeze() isn't also invoked in __getitem__

   d = {}
   l = []
   d[l] = "Oops!"
   l.append(1)
   d[l] # Raises KeyError regardless

Oh, and the PEP's xdict example is even more broken than the PEP implies, 
because two imdicts which compare equal (same contents) may not hash equal 
(different id's).
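
The hashing problem can be shown with a minimal sketch in modern Python 3. IdHashedDict is a hypothetical stand-in that hashes by id(), which is the behaviour being criticised; ContentHashedDict is one possible repair and assumes all values are themselves hashable.

```python
class IdHashedDict(dict):
    """Hypothetical stand-in: equality by contents, hash by identity."""

    def __hash__(self):
        return id(self)


class ContentHashedDict(dict):
    """One possible repair: derive the hash from the contents."""

    def __hash__(self):
        return hash(frozenset(self.items()))


a = IdHashedDict(x=1)
b = IdHashedDict(x=1)
assert a == b                # same contents, so they compare equal
assert hash(a) != hash(b)    # but they hash differently
assert b not in {a: "value"} # so an equal key is not found

c = ContentHashedDict(x=1)
d = ContentHashedDict(x=1)
assert hash(c) == hash(d)
assert d in {c: "value"}     # equal keys now find each other
```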

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Christopher Armstrong

On 10/24/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> "Should dicts and sets automatically freeze their mutable keys?"
>
> Dictionaries don't have mutable keys,

Since when?

class Foo:
    def __init__(self):
        self.x = 1

f = Foo()
d = {f: 1}
f.x = 2

Maybe you meant something else? I can't think of any way in which
"dictionaries don't have mutable keys" is true. The only rule about
dictionary keys that I know of is that they need to be hashable and
need to be comparable with the equality operator.

--
  Twisted   |  Christopher Armstrong: International Man of Twistery
   Radix|-- http://radix.twistedmatrix.com
|  Release Manager, Twisted Project
  \\\V///   |-- http://twistedmatrix.com
   |o O||
wvw-+

Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Josiah Carlson

Nick Coghlan <[EMAIL PROTECTED]> wrote:
> 
> Josiah Carlson wrote:
> > You can get the same semantics with...
> > 
> > class NAME(_(TYPE), ARGS):
> > BLOCK
> > 
> > And a suitably defined _.  Remember, not every X line function should be
> > made a builtin or syntax.
> 
> And this would be an extremely fragile hack that is entirely dependent on the 
> murky rules regarding how Python chooses the metaclass for the newly created 
> class. Ensuring that the metaclass of the class returned by "_" was always 
> the 
> one chosen would be tricky at best and impossible at worst.

The rules for which metaclass is used are listed in the metaclass
documentation.  I personally never claimed it was perfect, and neither
is this one...

class NAME(_(TYPE, ARGS)):
    BLOCK

But it does solve the problem without needing syntax (and fixes any
possible metaclass order choices).
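
For readers wondering what a "suitably defined _" might look like, here is one possible sketch in modern Python 3. It is an assumption, not the support code referred to in the thread: the names Meta, _Base and make_property are invented for illustration, and the factory plays the role of TYPE.

```python
def _(factory, *args):
    """Return a throwaway base class whose metaclass hands the class body to factory."""

    class Meta(type):
        def __new__(mcls, name, bases, namespace):
            if not bases:
                # Building the helper base itself: behave like a normal class.
                return super().__new__(mcls, name, bases, namespace)
            # Building "class NAME(_(...))": pass the block's names to the factory.
            body = {k: v for k, v in namespace.items() if not k.startswith("__")}
            return factory(name, body, *args)

    return Meta("_Base", (), {})


def make_property(name, body):
    """Hypothetical TYPE: build a property from get/set/delete functions."""
    return property(body.get("get"), body.get("set"), body.get("delete"))


class Point:
    def __init__(self):
        self._x = 0

    class x(_(make_property)):
        def get(self):
            return self._x

        def set(self, value):
            self._x = value

        def delete(self):
            del self._x
```

After this, Point.x is an ordinary property, so the class statement never produces a class at all. Whether that counts as a practical trick or as abuse is exactly what the thread is arguing about.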

> Even if it *could* be done, I'd never want to see a hack like that in 
> production code I had anything to do with.

That's perfectly reasonable.

> (I put the metaclass after the keyword, because, unlike a function decorator, 
> the metaclass is invoked *before* the class is created, and because you're 
> only allowed one explicit metaclass)

Perhaps, but because the metaclass can return anything (in this case, it
returns a property), being able to modify the object that is created may
be desirable...at which point, we may as well get class decorators for
the built-in chaining.

 - Josiah


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Josiah Carlson

Christopher Armstrong <[EMAIL PROTECTED]> wrote:
> 
> On 10/24/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > "Should dicts and sets automatically freeze their mutable keys?"
> >
> > Dictionaries don't have mutable keys,
> 
> Since when?
> 
> Maybe you meant something else? I can't think of any way in which
> "dictionaries don't have mutable keys" is true. The only rule about
> dictionary keys that I know of is that they need to be hashable and
> need to be comparable with the equality operator.

Good point, I forgot about user-defined classes (I rarely use them as
keys myself, it's all too easy to make a mutable whose hash is dependent
on mutable contents, as having an object which you can only find if you
have the exact object is not quite as useful as I generally need).  I will,
however, stand by, "a container which is frozen should have its contents
frozen as well."

 - Josiah


Re: [Python-Dev] New codecs checked in

2005-10-24 Thread Walter Dörwald

M.-A. Lemburg wrote:

> Walter Dörwald wrote:
> 
>>Martin v. Löwis wrote:
>>
>>>M.-A. Lemburg wrote:
>>>
>>>>I've checked in a whole bunch of newly generated codecs
>>>>which now make use of the faster charmap decoding variant added
>>>>by Walter a short while ago.
>>>>
>>>>Please let me know if you find any problems.
>>>
>>>I think we should work on eliminating the decoding_map variables.
>>>There are some codecs which rely on them being present in other codecs
>>>(e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
>>>to use, say
>>>
>>>decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>>> 0x00a4: 0x0454, #   CYRILLIC SMALL LETTER UKRAINIAN IE
>>> 0x00a6: 0x0456, #   CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>>> 0x00a7: 0x0457, #   CYRILLIC SMALL LETTER YI (UKRAINIAN)
>>> 0x00ad: 0x0491, #   CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>>> 0x00b4: 0x0404, #   CYRILLIC CAPITAL LETTER UKRAINIAN IE
>>> 0x00b6: 0x0406, #   CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>>> 0x00b7: 0x0407, #   CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>>> 0x00bd: 0x0490, #   CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
>>>})
>>>
>>>With all these cross-references gone, the decoding_maps could also go.
> 
> I just left them in because I thought they wouldn't do any harm
> and might be useful in some applications.
 >
> Removing them where not directly needed by the codec would not
> be a problem.

Recreating them is quite simple via dict(enumerate(decoding_table)) so I 
think we should remove them.

>>Why should koi_u.py be defined in terms of koi8_r.py anyway? Why not put
>>a complete decoding_table into koi8_u.py?
> 
> KOI8-U is not available as mapping on ftp.unicode.org and
> I only recreated codecs from the mapping files available
> there.

OK, so we'd need something that creates a new decoding table from an old 
one + changes, i.e. something like:

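def update_decoding_table(table, new):
    # Copy the table into a mutable list of characters
    table = list(table)
    # Overwrite the entries that differ (byte value -> code point)
    for (key, value) in new.iteritems():
        table[key] = unichr(value)
    return u"".join(table)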
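
A sketch of the same helper in modern Python 3, together with the reverse step of rebuilding a decoding map from a table. The function decoding_map_from_table and the identity base table are assumptions made for the example; note that dict(enumerate(table)) alone yields characters as values, so ord() is applied to get code points as in the old decoding_map.

```python
def update_decoding_table(table, changes):
    """Return a copy of table with the given byte -> code point changes applied."""
    chars = list(table)
    for byte, codepoint in changes.items():
        chars[byte] = chr(codepoint)
    return "".join(chars)


def decoding_map_from_table(table):
    """Rebuild a byte -> code point mapping from a decoding table string."""
    return {byte: ord(char) for byte, char in enumerate(table)}


# Stand-in base table: byte N decodes to code point N (not a real KOI8-R table).
base_table = "".join(chr(i) for i in range(256))

new_table = update_decoding_table(base_table, {0xA4: 0x0454, 0xA6: 0x0456})
new_map = decoding_map_from_table(new_table)
```

The base table is left untouched because the helper works on a copy, so several derived codecs could share one source table.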

>>I'd like to suggest a small cosmetic change: gencodec.py should output
>>byte values with two hexdigits instead of four. This makes it easier to
>>see what is a byte values and what is a codepoint. And it would make
>>grepping for stuff simpler.
> 
> True.
> 
> I'll rerun the creation with the above changes sometime this
> week.

Great, thanks!

Bye,
Walter Dörwald

Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Michael Hudson

Nick Coghlan <[EMAIL PROTECTED]> writes:

> Josiah Carlson wrote:
>> You can get the same semantics with...
>> 
>> class NAME(_(TYPE), ARGS):
>> BLOCK
>> 
>> And a suitably defined _.  Remember, not every X line function should be
>> made a builtin or syntax.
>
> And this would be an extremely fragile hack that is entirely
> dependent on the murky rules regarding how Python chooses the
> metaclass for the newly created class.

Uh, not really.  In the presence of base classes it's always "the type
of the first base".  The reason it might not seem this simple is that
most metaclasses end up calling type.__new__ at some point and this
function does more complicated things (such as checking for metaclass
conflict and deferring to the most specific metaclass).  
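
A short sketch of the two behaviours mentioned above (deferring to the most specific metaclass, and detecting a conflict), written with the Python 3 metaclass= syntax rather than the __metaclass__ attribute of the time. The class names are invented for the example.

```python
class MetaA(type):
    pass


class MetaB(MetaA):
    pass


class MetaX(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


class X(metaclass=MetaX):
    pass


# A comes first, but MetaB is the more specific metaclass and wins.
class C(A, B):
    pass


assert type(C) is MetaB

# MetaA and MetaX are unrelated, so no single metaclass fits both bases.
try:
    class D(A, X):
        pass
except TypeError as exc:
    conflict = str(exc)
else:
    conflict = None

assert conflict is not None
```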

Not sure what the context is here, but I have to butt in when I see
people complicating things which aren't actually that complicated...

Cheers,
mwh

-- 
  There's an aura of unholy black magic about CLISP.  It works, but
  I have no idea how it does it.  I suspect there's a goat involved
  somewhere. -- Johann Hibschman, comp.lang.scheme

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Josiah Carlson


Nick Coghlan <[EMAIL PROTECTED]> wrote:
> I think having dicts and sets automatically invoke freeze would be a mistake, 
> because at least one of the following two cases would behave unexpectedly:

I'm pretty sure that the PEP was only asking if one would freeze the
contents of dicts IF the dict was being frozen.

That is, which of the following should be the case:
freeze({1:[2,3,4]}) -> {1:[2,3,4]}
freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))
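
The second alternative (freezing contents recursively) could be sketched as follows. This is an illustration only: freeze() is not a builtin, the __freeze__ lookup is modelled loosely on the protocol under discussion, and a frozenset of (key, value) pairs stands in for the immutable dict type, since no such type exists in the standard library.

```python
def freeze(obj):
    """Hypothetical recursive freeze: return a hashable, immutable copy of obj."""
    hook = getattr(obj, "__freeze__", None)
    if hook is not None:
        return hook()
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, dict):
        # Stand-in for an immutable dict: a frozenset of (key, value) pairs.
        return frozenset((freeze(k), freeze(v)) for k, v in obj.items())
    hash(obj)  # raises TypeError for any other unhashable object
    return obj


frozen = freeze({1: [2, 3, 4]})
```

Here the inner list becomes the tuple (2, 3, 4), so the frozen result can itself be used as a dictionary key.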

 - Josiah


Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Josiah Carlson

Michele Simionato <[EMAIL PROTECTED]> wrote:
> 
> On 10/24/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > I would criticise it for being dangerously close to worthless.  With the
> > minor support code that I (and others) have offered, no new syntax is
> > necessary.
> >
> > You can get the same semantics with...
> >
> > class NAME(_(TYPE), ARGS):
> > BLOCK
> >
> > And a suitably defined _.  Remember, not every X line function should be
> > made a builtin or syntax.
> >
> >  - Josiah
> 
> Could you re-read my original message, please? Sugar is *everything*
> in this case. If the functionality is to be implemented via a __metaclass__
> hook, then it should be considered a hack that nobody in his right mind
> should use. OTOH, if there is a specific syntax for it, then it means
> this the usage
> has the benediction of the BDFL. This would be a HUGE change.
> For instance, I would never abuse metaclasses for that, whereas I
> would freely use a 'create' statement.

Metaclass abuse?  Oh, I'm sorry, I thought that the point of metaclasses
was to offer a way to make "magic" happen in a somewhat pragmatic
manner, you know, through metaprogramming.  I would call this particular
use a practical application of standard Python semantics.

Pardon me while I attempt to re-parse your above statement...
"If there is a specific syntax for [passing a temporary namespace to a
callable, created by some sort of block mechanism], then [using it for
property creation] has the benediction of the BDFL".

What I'm trying to say is that it already has a no-syntax syntax.  It
uses the "magic" of metaclasses, but one can make that "magic" as
explicit as necessary.

class NAME(PassNamespaceFromClassBlock(fcn=TYPE, args=ARGS)):
    BLOCK

Personally, I've not seen the desire to pass temporary namespaces to
functions until recently, so whether or not people will use it for
property creation, or any other way that people would find interesting
and/or useful, is at least a bit of prediction.  Maybe people will
prefer to use property('get_foo', 'set_foo', 'del_foo'), who knows?  But
you know what?  Regardless of what people want, they can use metaclasses
right now to create properties, where they would have to wait until
Python 2.5 comes out before they could use this proposed 'create'
statement.
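
A minimal sketch of the "no-syntax syntax" described above, written for Python 3 rather than the Python 2.4 of this thread. Only the name PassNamespaceFromClassBlock and its fcn/args parameters come from the message; the body and the Demo class are assumptions made for illustration:

```python
def PassNamespaceFromClassBlock(fcn, args=()):
    """Return a base class whose subclasses are replaced by fcn(*args, **namespace)."""

    class _NamespaceMeta(type):
        def __new__(mcls, name, bases, namespace):
            # The marker base itself has no _NamespaceMeta parent: build it normally.
            if not any(isinstance(base, _NamespaceMeta) for base in bases):
                return super().__new__(mcls, name, bases, namespace)
            # For a real use, hand the public names of the class block to fcn.
            public = {key: value for key, value in namespace.items()
                      if not key.startswith("__")}
            return fcn(*args, **public)

    return _NamespaceMeta("_NamespaceBase", (object,), {})


class Demo:
    def __init__(self):
        self._x = 0

    # The "class" block below never becomes a class: it becomes a property.
    class x(PassNamespaceFromClassBlock(fcn=property)):
        def fget(self):
            return self._x

        def fset(self, value):
            self._x = value
```

Here Demo.x ends up as an ordinary property object, because the metaclass returns whatever fcn returns instead of a new class.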

 - Josiah


Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Ronald Oussoren

On 24-okt-2005, at 12:54, Josiah Carlson wrote:

> Metaclass abuse?  Oh, I'm sorry, I thought that the point of metaclasses
> was to offer a way to make "magic" happen in a somewhat pragmatic
> manner, you know, through metaprogramming.  I would call this particular
> use a practical application of standard Python semantics.

I'd say using a class statement to define a property is metaclass abuse,
as would anything that doesn't define something class-like. The same is
true for other constructs: using a decorator to define something that is
not a callable would IMHO also be abuse.

That said, I don't really have an opinion on the 'create' statement
proposal yet. It does seem to have a very limited field of use. I'm quite
happy with using property as it is; property('get_foo', 'set_foo') would
take away most if not all of the remaining problems.
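
The builtin property does not accept method names as strings, so the property('get_foo', 'set_foo') spelling above is a proposal, not existing behaviour. A rough sketch of what such late binding could look like, using a hypothetical helper called named_property (the helper and the Base/Doubled classes are assumptions, Python 3 syntax):

```python
def named_property(getter_name, setter_name=None, deleter_name=None):
    """Build a property that looks its accessor methods up by name on each access."""

    def fget(self):
        return getattr(self, getter_name)()

    def fset(self, value):
        getattr(self, setter_name)(value)

    def fdel(self):
        getattr(self, deleter_name)()

    return property(
        fget,
        fset if setter_name is not None else None,
        fdel if deleter_name is not None else None,
    )


class Base:
    def __init__(self):
        self._foo = 1

    def get_foo(self):
        return self._foo

    def set_foo(self, value):
        self._foo = value

    foo = named_property("get_foo", "set_foo")


class Doubled(Base):
    def get_foo(self):  # overriding the getter is enough; foo need not be redefined
        return 2 * self._foo
```

Because the accessors are resolved by name at access time, a subclass can override get_foo without rebuilding the property, which is the main remaining annoyance with property(get_foo, set_foo).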

Ronald


Re: [Python-Dev] KOI8_U (New codecs checked in)

2005-10-24 Thread M.-A. Lemburg

Walter Dörwald wrote:
>>> Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
>>> a complete decoding_table into koi8_u.py?
>>
>>
>> KOI8-U is not available as mapping on ftp.unicode.org and
>> I only recreated codecs from the mapping files available
>> there.
> 
> 
> OK, so we'd need something that creates a new decoding table from an old
> one + changes, i.e. something like:
> 
> def update_decoding_table(table, new):
>     table = list(table)
>     for (key, value) in new.iteritems():
>         table[key] = unichr(value)
>     return u"".join(table)

Actually, I'd rather have some official mapping files
for these.

Perhaps we could get someone to upload a mapping file
for KOI8_U to the Unicode site ?!

The mapping is defined in RFC2319:

http://www.faqs.org/rfcs/rfc2319.html

I've put Alexander Yeremenko, the coordinator of
the KOI8-U group on CC.
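
For readers on a current interpreter, the table-patching helper quoted above can be sketched in Python 3 as follows. The eight byte positions listed are my reading of how RFC 2319 differs from KOI8-R and should be checked against the RFC; the names KOI8_U_CHANGES, koi8_r_table and koi8_u_table are illustrative:

```python
# Positions where KOI8-U is assumed to differ from KOI8-R: four Ukrainian
# letters, in lower and upper case, replace box-drawing characters.
KOI8_U_CHANGES = {
    0xA4: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xA6: 0x0456,  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    0xA7: 0x0457,  # CYRILLIC SMALL LETTER YI
    0xAD: 0x0491,  # CYRILLIC SMALL LETTER GHE WITH UPTURN
    0xB4: 0x0404,  # CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0xB6: 0x0406,  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    0xB7: 0x0407,  # CYRILLIC CAPITAL LETTER YI
    0xBD: 0x0490,  # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
}


def update_decoding_table(table, new):
    """Return a copy of a 256-character decoding table with some entries replaced."""
    chars = list(table)
    for byte, code_point in new.items():
        chars[byte] = chr(code_point)
    return "".join(chars)


# Decoding every byte value once yields the KOI8-R decoding table as a string.
koi8_r_table = bytes(range(256)).decode("koi8_r")
koi8_u_table = update_decoding_table(koi8_r_table, KOI8_U_CHANGES)
```

An official mapping file is still the better source of truth; the sketch only shows that deriving one table from the other is a few lines of code.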

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 24 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

Re: [Python-Dev] KOI8_U (New codecs checked in)

2005-10-24 Thread M.-A. Lemburg



M.-A. Lemburg wrote:
> Walter Dörwald wrote:
> 
>>>>Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
>>>>a complete decoding_table into koi8_u.py?
>>>
>>>
>>>KOI8-U is not available as mapping on ftp.unicode.org and
>>>I only recreated codecs from the mapping files available
>>>there.
>>
>>
>>OK, so we'd need something that creates a new decoding table from an old
>>one + changes, i.e. something like:
>>
>>def update_decoding_table(table, new):
>>    table = list(table)
>>    for (key, value) in new.iteritems():
>>        table[key] = unichr(value)
>>    return u"".join(table)
> 
> 
> Actually, I'd rather have some official mapping files
> for these.
> 
> Perhaps we could get someone to upload a mapping file
> for KOI8_U to the Unicode site ?!
> 
> The mapping is defined in RFC2319:
> 
> http://www.faqs.org/rfcs/rfc2319.html
> 
> I've put Alexander Yeremenko, the coordinator of
> the KOI8-U group on CC.

Hmm, that email address bounces. I've now put Maxim
on CC: Maxim Dzumanenko <[EMAIL PROTECTED]>

Here's a mapping file for KOI8-U - please check whether
it's correct.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

#
#   Name: KOI8-U (RFC2319) to Unicode
#
#   See RFC2319 for details. This encoding is a modified KOI8-R
#   encoding.
#
0x00    0x0000  #   NULL
0x01    0x0001  #   START OF HEADING
0x02    0x0002  #   START OF TEXT
0x03    0x0003  #   END OF TEXT
0x04    0x0004  #   END OF TRANSMISSION
0x05    0x0005  #   ENQUIRY
0x06    0x0006  #   ACKNOWLEDGE
0x07    0x0007  #   BELL
0x08    0x0008  #   BACKSPACE
0x09    0x0009  #   HORIZONTAL TABULATION
0x0A    0x000A  #   LINE FEED
0x0B    0x000B  #   VERTICAL TABULATION
0x0C    0x000C  #   FORM FEED
0x0D    0x000D  #   CARRIAGE RETURN
0x0E    0x000E  #   SHIFT OUT
0x0F    0x000F  #   SHIFT IN
0x10    0x0010  #   DATA LINK ESCAPE
0x11    0x0011  #   DEVICE CONTROL ONE
0x12    0x0012  #   DEVICE CONTROL TWO
0x13    0x0013  #   DEVICE CONTROL THREE
0x14    0x0014  #   DEVICE CONTROL FOUR
0x15    0x0015  #   NEGATIVE ACKNOWLEDGE
0x16    0x0016  #   SYNCHRONOUS IDLE
0x17    0x0017  #   END OF TRANSMISSION BLOCK
0x18    0x0018  #   CANCEL
0x19    0x0019  #   END OF MEDIUM
0x1A    0x001A  #   SUBSTITUTE
0x1B    0x001B  #   ESCAPE
0x1C    0x001C  #   FILE SEPARATOR
0x1D    0x001D  #   GROUP SEPARATOR
0x1E    0x001E  #   RECORD SEPARATOR
0x1F    0x001F  #   UNIT SEPARATOR
0x20    0x0020  #   SPACE
0x21    0x0021  #   EXCLAMATION MARK
0x22    0x0022  #   QUOTATION MARK
0x23    0x0023  #   NUMBER SIGN
0x24    0x0024  #   DOLLAR SIGN
0x25    0x0025  #   PERCENT SIGN
0x26    0x0026  #   AMPERSAND
0x27    0x0027  #   APOSTROPHE
0x28    0x0028  #   LEFT PARENTHESIS
0x29    0x0029  #   RIGHT PARENTHESIS
0x2A    0x002A  #   ASTERISK
0x2B    0x002B  #   PLUS SIGN
0x2C    0x002C  #   COMMA
0x2D    0x002D  #   HYPHEN-MINUS
0x2E    0x002E  #   FULL STOP
0x2F    0x002F  #   SOLIDUS
0x30    0x0030  #   DIGIT ZERO
0x31    0x0031  #   DIGIT ONE
0x32    0x0032  #   DIGIT TWO
0x33    0x0033  #   DIGIT THREE
0x34    0x0034  #   DIGIT FOUR
0x35    0x0035  #   DIGIT FIVE
0x36    0x0036  #   DIGIT SIX
0x37    0x0037  #   DIGIT SEVEN
0x38    0x0038  #   DIGIT EIGHT
0x39    0x0039  #   DIGIT NINE
0x3A    0x003A  #   COLON
0x3B    0x003B  #   SEMICOLON
0x3C    0x003C  #   LESS-THAN SIGN
0x3D    0x003D  #   EQUALS SIGN
0x3E    0x003E  #   GREATER-THAN SIGN
0x3F    0x003F  #   QUESTION MARK
0x40    0x0040  #   COMMERCIAL AT
0x41    0x0041  #   LATIN CAPITAL LETTER A
0x42    0x0042  #   LATIN CAPITAL LETTER B
0x43    0x0043  #   LATIN CAPITAL LETTER C
0x44    0x0044  #   LATIN CAPITAL LETTER D
0x45    0x0045  #   LATIN CAPITAL LETTER E
0x46    0x0046  #   LATIN CAPITAL LETTER F
0x47    0x0047  #   LATIN CAPITAL LETTER G
0x48    0x0048  #   LATIN CAPITAL LETTER H
0x49    0x0049  #   LATIN CAPITAL LETTER I
0x4A    0x004A  #   LATIN CAPITAL LETTER J
0x4B    0x004B  #   LATIN CAPITAL LETTER K
0x4C    0x004C  #   LATIN CAPITAL LETTER L
0x4D    0x004D  #   LATIN CAPITAL LETTER M
0x4E    0x004E  #   LATIN CAPITAL LETTER N
0x4F    0x004F  #   LATIN CAPITAL LETTER O
0x50    0x0050  #   LATIN CAPITAL LETTER P
0x51    0x0051  #   LATIN CAPITAL LETTER Q
0x52    0x0052  #   LATIN CAPITAL LETTER R
0x53

Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Michele Simionato

On 10/24/05, Ronald Oussoren <[EMAIL PROTECTED]> wrote:
> I'd say using a class statement to define a property is metaclass abuse,
> as would anything that wouldn't define something class-like. The same is
> true for other constructs, using a decorator to define something that is
> not a callable would IMHO also be abuse.

+1

> That said, I don't really have an opinion on the 'create' statement
> proposal yet. It does seem to have a very limited field of use.

This is definitely not true. The 'create' statement would have lots of
applications. Off the top of my head, I can think of 'create' applied to:

- bunches;
- modules;
- interfaces;
- properties;
- usage in frameworks, for instance providing sugar for
  Object-Relational mappers, or for making templates
  (e.g. a create HTMLPage);
- building custom minilanguages;
- ...

This is why I see the 'create' statement as a frighteningly powerful
addition to the language.

 Michele Simionato

Re: [Python-Dev] KOI8_U (New codecs checked in)

2005-10-24 Thread M.-A. Lemburg

M.-A. Lemburg wrote:
> Here's a mapping file for KOI8-U - please check whether
> it's correct.

I missed one code point change: 0xB4.

Here's the updated version which matches the codec we currently
have in Python 2.3 and 2.4.

-- 
Marc-Andre Lemburg
eGenix.com

#
#   Name: KOI8-U (RFC2319) to Unicode
#
#   See RFC2319 for details. This encoding is a modified KOI8-R
#   encoding.
#
0x00    0x0000  #   NULL
0x01    0x0001  #   START OF HEADING
0x02    0x0002  #   START OF TEXT
0x03    0x0003  #   END OF TEXT
0x04    0x0004  #   END OF TRANSMISSION
0x05    0x0005  #   ENQUIRY
0x06    0x0006  #   ACKNOWLEDGE
0x07    0x0007  #   BELL
0x08    0x0008  #   BACKSPACE
0x09    0x0009  #   HORIZONTAL TABULATION
0x0A    0x000A  #   LINE FEED
0x0B    0x000B  #   VERTICAL TABULATION
0x0C    0x000C  #   FORM FEED
0x0D    0x000D  #   CARRIAGE RETURN
0x0E    0x000E  #   SHIFT OUT
0x0F    0x000F  #   SHIFT IN
0x10    0x0010  #   DATA LINK ESCAPE
0x11    0x0011  #   DEVICE CONTROL ONE
0x12    0x0012  #   DEVICE CONTROL TWO
0x13    0x0013  #   DEVICE CONTROL THREE
0x14    0x0014  #   DEVICE CONTROL FOUR
0x15    0x0015  #   NEGATIVE ACKNOWLEDGE
0x16    0x0016  #   SYNCHRONOUS IDLE
0x17    0x0017  #   END OF TRANSMISSION BLOCK
0x18    0x0018  #   CANCEL
0x19    0x0019  #   END OF MEDIUM
0x1A    0x001A  #   SUBSTITUTE
0x1B    0x001B  #   ESCAPE
0x1C    0x001C  #   FILE SEPARATOR
0x1D    0x001D  #   GROUP SEPARATOR
0x1E    0x001E  #   RECORD SEPARATOR
0x1F    0x001F  #   UNIT SEPARATOR
0x20    0x0020  #   SPACE
0x21    0x0021  #   EXCLAMATION MARK
0x22    0x0022  #   QUOTATION MARK
0x23    0x0023  #   NUMBER SIGN
0x24    0x0024  #   DOLLAR SIGN
0x25    0x0025  #   PERCENT SIGN
0x26    0x0026  #   AMPERSAND
0x27    0x0027  #   APOSTROPHE
0x28    0x0028  #   LEFT PARENTHESIS
0x29    0x0029  #   RIGHT PARENTHESIS
0x2A    0x002A  #   ASTERISK
0x2B    0x002B  #   PLUS SIGN
0x2C    0x002C  #   COMMA
0x2D    0x002D  #   HYPHEN-MINUS
0x2E    0x002E  #   FULL STOP
0x2F    0x002F  #   SOLIDUS
0x30    0x0030  #   DIGIT ZERO
0x31    0x0031  #   DIGIT ONE
0x32    0x0032  #   DIGIT TWO
0x33    0x0033  #   DIGIT THREE
0x34    0x0034  #   DIGIT FOUR
0x35    0x0035  #   DIGIT FIVE
0x36    0x0036  #   DIGIT SIX
0x37    0x0037  #   DIGIT SEVEN
0x38    0x0038  #   DIGIT EIGHT
0x39    0x0039  #   DIGIT NINE
0x3A    0x003A  #   COLON
0x3B    0x003B  #   SEMICOLON
0x3C    0x003C  #   LESS-THAN SIGN
0x3D    0x003D  #   EQUALS SIGN
0x3E    0x003E  #   GREATER-THAN SIGN
0x3F    0x003F  #   QUESTION MARK
0x40    0x0040  #   COMMERCIAL AT
0x41    0x0041  #   LATIN CAPITAL LETTER A
0x42    0x0042  #   LATIN CAPITAL LETTER B
0x43    0x0043  #   LATIN CAPITAL LETTER C
0x44    0x0044  #   LATIN CAPITAL LETTER D
0x45    0x0045  #   LATIN CAPITAL LETTER E
0x46    0x0046  #   LATIN CAPITAL LETTER F
0x47    0x0047  #   LATIN CAPITAL LETTER G
0x48    0x0048  #   LATIN CAPITAL LETTER H
0x49    0x0049  #   LATIN CAPITAL LETTER I
0x4A    0x004A  #   LATIN CAPITAL LETTER J
0x4B    0x004B  #   LATIN CAPITAL LETTER K
0x4C    0x004C  #   LATIN CAPITAL LETTER L
0x4D    0x004D  #   LATIN CAPITAL LETTER M
0x4E    0x004E  #   LATIN CAPITAL LETTER N
0x4F    0x004F  #   LATIN CAPITAL LETTER O
0x50    0x0050  #   LATIN CAPITAL LETTER P
0x51    0x0051  #   LATIN CAPITAL LETTER Q
0x52    0x0052  #   LATIN CAPITAL LETTER R
0x53    0x0053  #   LATIN CAPITAL LETTER S
0x54    0x0054  #   LATIN CAPITAL LETTER T
0x55    0x0055  #   LATIN CAPITAL LETTER U
0x56    0x0056  #   LATIN CAPITAL LETTER V
0x57    0x0057  #   LATIN CAPITAL LETTER W
0x58    0x0058  #   LATIN CAPITAL LETTER X
0x59    0x0059  #   LATIN CAPITAL LETTER Y
0x5A    0x005A  #   LATIN CAPITAL LETTER Z
0x5B    0x005B  #   LEFT SQUARE BRACKET
0x5C    0x005C  #   REVERSE SOLIDUS
0x5D    0x005D  #   RIGHT SQUARE BRACKET
0x5E    0x005E  #   CIRCUMFLEX ACCENT
0x5F    0x005F  #   LOW LINE
0x60    0x0060  #   GRAVE ACCENT
0x61    0x0061  #   LATIN SMALL LETTER A
0x62    0x0062  #   LATIN SMALL LETTER B
0x63    0x0063  #   LATIN SMALL LETTER C
0x64    0x0064  #   LATIN SMALL LETTER D
0x65    0x0065  #   LATIN SMALL LETTER E
0x66    0x0066  #   LATIN SMALL LETTER F
0x670

Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-24 Thread Nick Coghlan

Guido van Rossum wrote:
> Right. That was my point. Nick's worried about undecorated __context__
> because he wants to endow generators with a different default
> __context__. I say no to both proposals and the worries cancel each
> other out. EIBTI.

Works for me.

That makes the resolutions for the posted issues:

1. The slot name "__context__" will be used instead of "__with__"
2. The builtin name "context" is currently offlimits due to its ambiguity
3a. generator-iterators do NOT have a native context
3b. Use "contextmanager" as a builtin decorator to get generator-contexts
4. The __context__ slot will NOT be special cased

I'll add those into the PEP and reference this thread after Martin is done 
with the SVN migration.

However, those resolutions bring up the following issues:

   5 a. What exception is raised when EXPR does not have a __context__ method?
     b. What about when the returned object is missing __enter__ or __exit__?
        I suggest raising TypeError in both cases, for symmetry with for loops.
        The slot check is made in C code, so I don't see any difficulty in raising
        TypeError instead of AttributeError if the relevant slots aren't filled.

   6 a. Should a generic "closing" context manager be provided?
     b. If yes, should it be a builtin or in a "contexttools" module?
        I'm not too worried about this one for the moment, and it could easily be
        left out of the PEP itself. Of the sample managers, it seems the most
        universally useful, though.
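
As a point of reference, a generator-based "closing" manager can be sketched with the decorator from resolution 3b. The sketch below assumes the form that later shipped in the standard library as contextlib.contextmanager (not a builtin, and without a __context__ slot); FakeConnection is a made-up stand-in for any object with a close() method:

```python
from contextlib import contextmanager


@contextmanager
def closing(resource):
    """Yield resource, then call resource.close() however the block exits."""
    try:
        yield resource
    finally:
        resource.close()


class FakeConnection:
    """Hypothetical resource used only to exercise the manager."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


conn = FakeConnection()
with closing(conn) as active:
    # Inside the block the resource is still open.
    assert active is conn and not conn.closed
```

The standard library also ended up providing contextlib.closing with the same behaviour, so in practice the helper does not need to be written by hand.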

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread M.-A. Lemburg

Walter Dörwald wrote:
>>>I'd like to suggest a small cosmetic change: gencodec.py should output
>>>byte values with two hexdigits instead of four. This makes it easier to
>>>see what is a byte value and what is a codepoint. And it would make
>>>grepping for stuff simpler.
>>
>>True.
>>
>>I'll rerun the creation with the above changes sometime this
>>week.
> 
> 
> Great, thanks!

Done.

I had to create three custom mapping files for cp1140, koi8-u
and tis-620.

If you want more non-standard charmap codecs converted, please
send me the mapping files in the Unicode standard format for
these files.
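
For anyone preparing such a file, here is a small sketch of reading the Unicode mapping format (byte value, code point, then a "#" comment) and of the two-digit versus four-digit hex convention discussed above. The function names and the sample text are assumptions for illustration and are not taken from gencodec.py:

```python
def parse_mapping(text):
    """Parse Unicode-style mapping text into a {byte: code point} dict."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a code point: treat it as undefined
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def format_entry(byte, code_point):
    """Two hex digits for the byte, four for the code point."""
    return "0x%02X\t0x%04X" % (byte, code_point)


SAMPLE = """\
#   Name: example mapping
0x41\t0x0041\t#\tLATIN CAPITAL LETTER A
0x80\t\t#\tUNDEFINED
0xA4\t0x0454\t#\tCYRILLIC SMALL LETTER UKRAINIAN IE
"""

mapping = parse_mapping(SAMPLE)
```

Keeping the byte column at two digits makes it easy to tell the two columns apart when grepping.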

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Paolino

I'm not sure I completely understood the idea, but deriving the freeze
function from hash gives hash a wider importance.
Is __hash__ = id inside a class enough to use an instance of a set-derived
class (sets.Set before 2.5) as a key in a mapping?
Surely I've missed the point.


Regards Paolino


Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-24 Thread Guido van Rossum

On 10/24/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> That makes the resolutions for the posted issues:
>
> 1. The slot name "__context__" will be used instead of "__with__"
> 2. The builtin name "context" is currently offlimits due to its ambiguity
> 3a. generator-iterators do NOT have a native context
> 3b. Use "contextmanager" as a builtin decorator to get generator-contexts
> 4. The __context__ slot will NOT be special cased

+1

> I'll add those into the PEP and reference this thread after Martin is done
> with the SVN migration.
>
> However, those resolutions bring up the following issues:
>
>5 a. What exception is raised when EXPR does not have a __context__ method?
>  b.  What about when the returned object is missing __enter__ or __exit__?
> I suggest raising TypeError in both cases, for symmetry with for loops.
> The slot check is made in C code, so I don't see any difficulty in raising
> TypeError instead of AttributeError if the relevant slots aren't filled.

Why are you so keen on TypeError? I find AttributeError totally
appropriate. I don't see symmetry with for-loops as a valuable
property here. AttributeError and TypeError are often interchangeable
anyway.
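
How interchangeable the two exceptions are can be seen from how the question was eventually settled in released interpreters: as far as I know, CPython raised AttributeError for a non-manager up to 3.10 and raises TypeError from 3.11 on. A version-neutral probe (the class and function names are made up for the example) therefore has to accept both:

```python
class NotAContextManager:
    """Deliberately lacks __enter__ and __exit__."""


def probe():
    """Return the exception type raised by using a non-manager in a with statement."""
    try:
        with NotAContextManager():
            pass
    except (AttributeError, TypeError) as exc:
        return type(exc)
    return None


error_type = probe()
```

Code that must run on several versions typically catches both types, as the probe does.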

>6 a. Should a generic "closing" context manager be provided?

No. Let's provide the minimal mechanisms FIRST.

>  b. If yes, should it be a builtin or in a "contexttools" module?
> I'm not too worried about this one for the moment, and it could easily be
> left out of the PEP itself. Of the sample managers, it seems the most
> universally useful, though.

Let's leave some examples just be examples.

I think I'm leaning towards adding __context__ to locks (all types
defined in thread or threading, including condition variables), files,
and decimal.Context, and leave it at that.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Gary Poster

On Oct 23, 2005, at 6:43 PM, Barry Warsaw wrote:

> I've had this PEP laying around for quite a few months.  It was inspired
> by some code we'd written which wanted to be able to get immutable
> versions of arbitrary objects.  I've finally finished the PEP, uploaded
> a sample patch (albeit a bit incomplete), and I'm posting it here to see
> if there is any interest.
>
> http://www.python.org/peps/pep-0351.html

I like this.  I'd like it better if it integrated with the adapter  
PEP, so that the freezing mechanism for a given type could be  
pluggable, and could be provided even if the original object did not  
contemplate it.  I don't know where the adapter PEP stands: skimming  
through the (most recent?) thread in January didn't give me a clear  
idea.
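
To make "pluggable" concrete, here is one possible shape, sketched with functools.singledispatch from current Python rather than with the adapter PEP (a deliberate substitution, since the adapter machinery was never standardised). The freeze function and its registrations are illustrative and are not the PEP 351 protocol:

```python
from functools import singledispatch


@singledispatch
def freeze(obj):
    """Default: hashable objects are returned unchanged, others raise TypeError."""
    hash(obj)
    return obj


@freeze.register(list)
def _freeze_list(obj):
    # Freeze the elements too, so nested lists become nested tuples.
    return tuple(freeze(item) for item in obj)


@freeze.register(set)
def _freeze_set(obj):
    return frozenset(obj)


# A third party can add support for a type that never contemplated freezing.
freeze.register(bytearray, bytes)
```

The registry lives outside the frozen types, which is the property asked for above: support can be added without touching the original class.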

As another poster mentioned, in-place freezing is also of interest to
me (and is why I read the PEP initially), but, as was also mentioned,
that's probably unrelated to your PEP.

Gary

Re: [Python-Dev] int(string)

2005-10-24 Thread Alan McIntyre

Fredrik Lundh wrote:

>does a plain
>
>a = -100.0
>
>still work on your machine?
>
D'oh - I seriously broke something, then, because it didn't. 
funny_falcon commented on the patch in SF and suggested a change that
took care of that.  I've uploaded the corrected version of the patch,
which now passes all the tests.

Alan

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Raymond Hettinger

[Barry Warsaw]
> I've had this PEP laying around for quite a few months.  It was inspired
> by some code we'd written which wanted to be able to get immutable
> versions of arbitrary objects.


* FWIW, the _as_immutable() protocol was dropped from sets.py for a
reason.  User reports indicated that it was never helpful in practice.
It added complexity and confusion without producing offsetting benefits.

* AFAICT, there are no use cases for freezing arbitrary objects when the
object types are restricted to just lists and sets but not dicts,
arrays, or other containers.  Even if the range of supported types were
expanded, what applications could use this?  Most apps cannot support
generic substitution of lists and sets -- they have too few methods in
common -- they are almost never interchangeable.

* I'm concerned that generic freezing leads to poor design and
hard-to-find bugs.  One class of bugs results from conflating ordered
and unordered collections as lookup keys.  It is difficult to assess
program correctness when the ordered/unordered distinction has been
abstracted away.  A second class of errors can arise when the original
object mutates and gets out-of-sync with its frozen counterpart.

* For a rare app needing mutable lookup keys, a simple recipe would
suffice:

    freeze_pairs = [(list, tuple), (set, frozenset)]

    def freeze(obj):
        try:
            hash(obj)
        except TypeError:
            for sourcetype, desttype in freeze_pairs:
                if isinstance(obj, sourcetype):
                    return desttype(obj)
            raise
        else:
            return obj

Unlike the PEP, the recipe works with older pythons and is trivially
easy to extend to include other containers.

* The name "freeze" is problematic because it suggests an in-place
change.  Instead, the proposed mechanism creates a new object.  In
contrast, explicit conversions like tuple(l) or frozenset(s) are obvious
about their running time, space consumed, and new object identity.  

Overall, I'm -1 on the PEP.  Like a bad C macro, the proposed
abstraction hides too much.  We lose critical distinctions of ordered vs
unordered, mutable vs immutable, new objects vs in-place change, etc.
Without compelling use cases, the mechanism smells like a
hyper-generalization.
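
The two classes of bugs described above are easy to reproduce with the explicit conversions alone. A short illustration (the variable names are arbitrary):

```python
items = [2, 1]
as_tuple = tuple(items)          # ordered snapshot
as_frozenset = frozenset(items)  # unordered snapshot

lookup = {as_tuple: "ordered", as_frozenset: "unordered"}

# Ordered vs unordered: the same elements in another order match only one key.
ordered_hit = (1, 2) in lookup
unordered_hit = frozenset([1, 2]) in lookup

# Out of sync: the original keeps changing, the frozen copy does not.
items.append(3)
stale = tuple(items) not in lookup
```

With a generic freeze() the caller cannot tell from the call site which of the two lookup behaviours applies, whereas tuple(...) and frozenset(...) say so explicitly.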


Raymond


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen

> >I'm thinking about making all character strings Unicode (possibly with
> >different internal representations a la NSString in Apple's Objective
> >C) and introduce a separate mutable bytes array data type. But I could
> >use some validation or feedback on this idea from actual
> >practitioners.

+1 from me, too.

> I'm tempted to say it would be even better if there was a command line 
> option that could be used to force all binary opens to result in bytes, and 
> require all text opens to specify an encoding.

I like this idea, too.  Presumably plain "open(FILENAME, MODE)" would
then result in a binary open (no encoding specified), which I've
wanted for a long time (and which makes sense).  But it is a change.
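
For comparison, Python 3 ended up close to this split, although a plain open(FILENAME) there still means text mode with a default encoding rather than binary. A small sketch of the binary/text distinction as it behaves today (the file name and payload are arbitrary):

```python
import os
import tempfile

payload = "na\u00efve \u20ac"  # text containing non-ASCII characters

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Text open with an explicit encoding: str in, encoded bytes on disk.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(payload)

    # Binary open: the result is bytes and no decoding takes place.
    with open(path, "rb") as handle:
        raw = handle.read()

    # Text open again: the bytes are decoded back to str.
    with open(path, "r", encoding="utf-8") as handle:
        text = handle.read()
```

The raw bytes are longer than the text because the two non-ASCII characters take several bytes each in UTF-8.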

Bill

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen

> Python should allow strings to
> contain any Unicode character and should be indexable yielding
> characters rather than half characters. Therefore Python strings
> should appear to be UTF-32.

+1.
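
The "half characters" in question are UTF-16 surrogates. The following sketch, which assumes Python 3.3 or later (where str is indexed by code point), counts the same text in code points, UTF-16 code units and UTF-32 code units:

```python
text = "a\U0001D11E"  # 'a' followed by MUSICAL SYMBOL G CLEF, outside the base plane

code_points = len(text)
utf16_units = len(text.encode("utf-16-le")) // 2  # 2 bytes per code unit
utf32_units = len(text.encode("utf-32-le")) // 4  # 4 bytes per code unit

# Indexing by code point returns the whole character, never one surrogate.
last_char = text[1]
```

The UTF-16 count is larger than the code point count because the clef needs a surrogate pair, which is exactly what indexing by UTF-16 unit would expose.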

Bill

[Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread Phil Thompson

I'm implementing a string-like object in an extension module and trying to 
make it as interoperable with the standard string object as possible. To do 
this I'm implementing the relevant slots and the buffer interface. For most 
things this is fine, but there are a small number of methods in 
stringobject.c that don't use the buffer interface - and I don't understand 
why.

Specifically...

string_contains() doesn't which means that...

MyString("foo") in "foobar"

...doesn't work.

s.join(sequence) only allows sequence to contain string or unicode objects.

s.strip([chars]) only allows chars to be a string or unicode object. Same for 
lstrip() and rstrip().

s.ljust(width[, fillchar]) only allows fillchar to be a string object (not 
even a unicode object). Same for rjust() and center().

Other methods happily allow types that support the buffer interface as well as 
string and unicode objects.

I'm happy to submit a patch - I just wanted to make sure that this behaviour 
wasn't intentional for some reason.

Thanks,
Phil

Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread Guido van Rossum

On 10/24/05, Phil Thompson <[EMAIL PROTECTED]> wrote:
> I'm implementing a string-like object in an extension module and trying to
> make it as interoperable with the standard string object as possible. To do
> this I'm implementing the relevant slots and the buffer interface. For most
> things this is fine, but there are a small number of methods in
> stringobject.c that don't use the buffer interface - and I don't understand
> why.
>
> Specifically...
>
> string_contains() doesn't which means that...
>
> MyString("foo") in "foobar"
>
> ...doesn't work.
>
> s.join(sequence) only allows sequence to contain string or unicode objects.
>
> s.strip([chars]) only allows chars to be a string or unicode object. Same for
> lstrip() and rstrip().
>
> s.ljust(width[, fillchar]) only allows fillchar to be a string object (not
> even a unicode object). Same for rjust() and center().
>
> Other methods happily allow types that support the buffer interface as well as
> string and unicode objects.
>
> I'm happy to submit a patch - I just wanted to make sure that this behaviour
> wasn't intentional for some reason.

A concern I'd have with fixing this is that Unicode objects also
support the buffer API. In any situation where either str or unicode
is accepted I'd be reluctant to guess whether a buffer object was
meant to be str-like or Unicode-like. I think this covers all the
cases you mention here.

We need to support this better in Python 3000; but I'm not sure you
can do much better in Python 2.x; subclassing from str is unlikely to
work for you because then too many places are going to assume the
internal representation is also the same as for str.
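
For comparison only, the Python 3 bytes type accepts other buffer-supporting objects in the methods discussed here, which avoids the str/unicode ambiguity because text is a separate type. This sketch shows present-day behaviour, not what Python 2.x stringobject.c does:

```python
needle = memoryview(b"foo")

# Containment accepts any object exporting a byte buffer.
found = needle in b"foobar"

# join() accepts a mix of bytes, bytearray and memoryview items.
joined = b"-".join([b"a", bytearray(b"b"), memoryview(b"c")])

# strip() takes its character set from a buffer object as well.
stripped = b"xxhixx".strip(bytearray(b"x"))
```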

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread Fredrik Lundh

Guido van Rossum wrote:

> A concern I'd have with fixing this is that Unicode objects also
> support the buffer API. In any situation where either str or unicode
> is accepted I'd be reluctant to guess whether a buffer object was
> meant to be str-like or Unicode-like. I think this covers all the
> cases you mention here.

iirc, SRE solves that by comparing the length of the sequence with the
number of bytes in the buffer.  if length == bytes, it's an 8-bit string; if
length*sizeof(Py_UNICODE) == bytes, it's a Unicode string.
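
The size comparison can be written out as a few lines of Python. This is only a model of the check as described above, not SRE's actual C code; the function name and the unicode_size parameter (standing in for sizeof(Py_UNICODE), which was 2 or 4 depending on the build) are assumptions:

```python
def classify_buffer(length, nbytes, unicode_size=2):
    """Guess the character width of a buffer from its length and byte count."""
    if nbytes == length:
        return "8-bit"  # one byte per item
    if nbytes == length * unicode_size:
        return "unicode"  # one Py_UNICODE-sized unit per item
    raise TypeError("buffer size does not match an 8-bit or Unicode string")
```

An empty buffer satisfies both tests, so a zero-length argument is classified as 8-bit by the order of the checks.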


Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread M.-A. Lemburg

Guido van Rossum wrote:
> On 10/24/05, Phil Thompson <[EMAIL PROTECTED]> wrote:
> 
>>I'm implementing a string-like object in an extension module and trying to
>>make it as interoperable with the standard string object as possible. To do
>>this I'm implementing the relevant slots and the buffer interface. For most
>>things this is fine, but there are a small number of methods in
>>stringobject.c that don't use the buffer interface - and I don't understand
>>why.
>>
>>Specifically...
>>
>>string_contains() doesn't which means that...
>>
>>MyString("foo") in "foobar"
>>
>>...doesn't work.
>>
>>s.join(sequence) only allows sequence to contain string or unicode objects.
>>
>>s.strip([chars]) only allows chars to be a string or unicode object. Same for
>>lstrip() and rstrip().
>>
>>s.ljust(width[, fillchar]) only allows fillchar to be a string object (not
>>even a unicode object). Same for rjust() and center().
>>
>>Other methods happily allow types that support the buffer interface as well as
>>string and unicode objects.
>>
>>I'm happy to submit a patch - I just wanted to make sure that this behaviour
>>wasn't intentional for some reason.
> 
> 
> A concern I'd have with fixing this is that Unicode objects also
> support the buffer API. In any situation where either str or unicode
> is accepted I'd be reluctant to guess whether a buffer object was
> meant to be str-like or Unicode-like. I think this covers all the
> cases you mention here.

This situation is a little better than that: the buffer
interface has a slot called getcharbuffer which is what
the string methods use in case they find that a string
argument is not of type str or unicode.

A few don't, but I guess we could fix this.

str.split(), .[lr]strip() all support the getcharbuffer
interface. str.join() currently doesn't. The Unicode object also
leaves out a few cases, among those the ones you mentioned.
If it's better for inter-op, I guess we should make an effort
and let all of them support the getcharbuffer interface.

> We need to support this better in Python 3000; but I'm not sure you
> can do much better in Python 2.x; subclassing from str is unlikely to
> work for you because then too many places are going to assume the
> internal representation is also the same as for str.

As a first step, I'd suggest implementing the getcharbuffer
slot. That will already go a long way.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread Guido van Rossum

On 10/24/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > A concern I'd have with fixing this is that Unicode objects also
> > support the buffer API. In any situation where either str or unicode
> > is accepted I'd be reluctant to guess whether a buffer object was
> > meant to be str-like or Unicode-like. I think this covers all the
> > cases you mention here.
>
> This situation is a little better than that: the buffer
> interface has a slot called getcharbuffer which is what
> the string methods use in case they find that a string
> argument is not of type str or unicode.

I stand corrected!

> As a first step, I'd suggest implementing the getcharbuffer
> slot. That will already go a long way.

Phil, if anything still doesn't work after doing what Marc-Andre says,
those would be good candidates for fixes!

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread Phil Thompson

On Monday 24 October 2005 7:39 pm, Guido van Rossum wrote:
> On 10/24/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> > Guido van Rossum wrote:
> > > A concern I'd have with fixing this is that Unicode objects also
> > > support the buffer API. In any situation where either str or unicode
> > > is accepted I'd be reluctant to guess whether a buffer object was
> > > meant to be str-like or Unicode-like. I think this covers all the
> > > cases you mention here.
> >
> > This situation is a little better than that: the buffer
> > interface has a slot called getcharbuffer which is what
> > the string methods use in case they find that a string
> > argument is not of type str or unicode.
>
> I stand corrected!
>
> > As a first step, I'd suggest implementing the getcharbuffer
> > slot. That will already go a long way.
>
> Phil, if anything still doesn't work after doing what Marc-Andre says,
> those would be good candidates for fixes!

I have implemented getcharbuffer - I was highlighting those methods where the 
getcharbuffer implementation was ignored.

I'll put a patch together.

Phil

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis

Neil Hodgson wrote:
> For Windows, the code will get a little uglier, needing to perform
> an allocation/encoding and deallocation more often than at present but
> I don't think there will be a speed degradation as Windows is
> currently performing a conversion from 8 bit to UTF-16 inside many
> system calls.
[...]
> 
>For indexing UTF-16, a flag could be set to show if the string is
> all in the base plane and if not, an index could be constructed when
> and if needed.

There are many design alternatives: one option would be to support
*three* internal representations in a single type, generating the
others from the existing one as needed. The default, initial
representation might be UTF-8, with UCS-4 only being generated when
indexing occurs, and UCS-2 only being generated when the API requires
it. On concatenation, always concatenate just one representation: either
one that is already present in both operands, else UTF-8.

> It'd be good to get some feel for what proportion of
> string operations performed require indexing. Many, such as
> startswith, split, and concatenation don't require indexing. The
> proportion of operations that use indexing to scan strings would also
> be interesting as adding a (currentIndex, currentOffset) cursor to
> string objects would be another approach.

Indeed. My guess is that indexing is more common than you think,
especially when iterating over the string. Of course, iteration
could also operate on UTF-8, if you introduced string iterator
objects.
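
A minimal sketch of the iterator idea, written for present-day Python 3 rather than the 2.x interpreter under discussion. The function name `iter_code_points` is hypothetical, and the input is assumed to be well-formed UTF-8. It walks the bytes once, front to back, so no random access into the string is needed:

```python
def iter_code_points(data):
    """Yield integer code points from well-formed UTF-8 bytes, in order."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: one byte (ASCII)
            size, cp = 1, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: two bytes
            size, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: three bytes
            size, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:    # 11110xxx: four bytes
            size, cp = 4, lead & 0x07
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
        # Fold in the six payload bits of each continuation byte.
        for b in data[i + 1:i + size]:
            cp = (cp << 6) | (b & 0x3F)
        yield cp
        i += size
```

Each step costs time proportional to the width of one character, so a full pass is linear in the byte length even though no individual character can be reached in constant time.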

Regards,
Martin

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread Martin v. Löwis

Walter Dörwald wrote:
> Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put 
> a complete decoding_table into koi8_u.py?

Not sure. Unfortunately, the tables being used as source are not part of
the Python source, so nobody except MAL can faithfully regenerate them.
If they were part of the Python source, explicitly adding one for
KOI8-U would certainly be feasible.

> I.e. change:
> 
> decoding_map.update({
> 0x0080: 0x0402, #  CYRILLIC CAPITAL LETTER DJE

Hmm. I was suggesting to remove decoding_map completely, in which
case neither the current form nor your suggested cosmetic change
would survive.

> to
> 
> decoding_table = (
> u'\x00' # 0x00 -> U+0000 NULL

Using U+XXXX in comments to denote the code points is a good idea,
anyway.
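
For illustration, a decoding_table-style codec can be sketched in present-day Python 3 as below. The table contents are a made-up example (Latin-1 identity with a single remapped byte), not the real KOI8-U mapping, and the helper names `decode` and `encode` are hypothetical. `codecs.charmap_decode`, `codecs.charmap_encode`, and `codecs.charmap_build` are the helpers the standard library's own charmap codecs call:

```python
import codecs

# Hypothetical 256-entry table: byte value -> character.
_chars = [chr(i) for i in range(256)]
_chars[0x80] = "\u0402"   # 0x80 -> U+0402 CYRILLIC CAPITAL LETTER DJE
decoding_table = "".join(_chars)

# The reverse lookup structure is derived from the decoding table,
# so no separate encoding dictionary has to be maintained by hand.
encoding_table = codecs.charmap_build(decoding_table)


def decode(data, errors="strict"):
    """Decode bytes by indexing decoding_table with each byte value."""
    return codecs.charmap_decode(data, errors, decoding_table)[0]


def encode(text, errors="strict"):
    """Encode text using the lookup table built from decoding_table."""
    return codecs.charmap_encode(text, errors, encoding_table)[0]
```

Because the table is an immutable string, code that tries to patch the mapping at run time fails loudly instead of being silently ignored.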

Regards,
Martin

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread Martin v. Löwis

M.-A. Lemburg wrote:
> I just left them in because I thought they wouldn't do any harm
> and might be useful in some applications.
> 
> Removing them where not directly needed by the codec would not
> be a problem.

I think the memory usage they cause is measurable (I estimated 4KiB per
dictionary). More importantly, people apparently currently change
the dictionaries we provide and expect the codecs to automatically
pick up the modified mappings. It would be better if the breakage
is explicit (i.e. they get an AttributeError on the variable) instead
of implicit (their changes to the mapping simply have no effect
anymore).

> KOI8-U is not available as mapping on ftp.unicode.org and
> I only recreated codecs from the mapping files available
> there.

I think we should come up with mapping tables for the additional
codecs as well, and maintain them in the CVS. This also applies
to things like rot13.

> I'll rerun the creation with the above changes sometime this
> week.

I hope I can finish my encoding routine shortly, which again
results in changes to the codecs (replacing the encoding dictionaries
with other lookup tables).

Regards,
Martin

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread Martin v. Löwis

M.-A. Lemburg wrote:

> I had to create three custom mapping files for cp1140, koi8-u
> and tis-620.

Can you please publish the files you used somewhere? They would
best go into the Python CVS.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis

M.-A. Lemburg wrote:
> There seems to be a general misunderstanding here: even if you
> have UCS4 storage, it is still possible to slice a Unicode
> string in a way which makes rendering it correctly.
   [impossible?]

> Unicode has the concept of combining code points, e.g. you can
> store an "é" (e with an accent) as "e" + "'". Now if you slice
> off the accent, you'll break the character that you encoded
> using combining code points.

While this is all true, I agree with Neil that it should do
whatever it does consistently across implementations, i.e.
len("\U00010000") should always give the same result, and
I think this result should always be 1.

How to best implement this efficiently is an entirely different
question, as is the question whether you can render
arbitrary substrings in a meaningful way.
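
The two separate questions here (how many items `len()` counts, and whether a slice still renders) can be seen in a short sketch. It assumes a present-day Python 3 interpreter, where `len()` counts code points; the helper name `utf16_units` is hypothetical:

```python
import unicodedata


def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


astral = "\U00010000"        # first code point outside the base plane
decomposed = "e\u0301"       # "e" followed by COMBINING ACUTE ACCENT

# One code point, but two UTF-16 code units (a surrogate pair).
code_points = len(astral)
code_units = utf16_units(astral)

# Slicing by code point can still split a user-visible character.
base_only = decomposed[:1]
composed = unicodedata.normalize("NFC", decomposed)
```

Counting code points makes `len()` independent of the storage format, while the combining-character case shows that even that count does not match what a reader sees as one character.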

Regards,
Martin

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Antoine Pitrou


> There are many design alternatives: one option would be to support
> *three* internal representations in a single type, generating the
> others from the existing one as needed. The default, initial
> representation might be UTF-8, with UCS-4 only being generated when
> indexing occurs, and UCS-2 only being generated when the API requires
> it. On concatenation, always concatenate just one representation: either
> one that is already present in both operands, else UTF-8.

Wouldn't it be simpler to use:
- one-byte representation if every character <= 0xFF
- two-byte representation if every character <= 0xFFFF
- four-byte representation otherwise

Then combining several strings means using the larger representation as
a result (*). In practice, most use cases will not involve the four-byte
representation.

(*) a heuristic can be invented so that, when producing a smaller string
(by stripping/slicing/etc.), it will "sometimes" check whether a
narrower representation is possible.
For example: store the length of the string when the last check
occurred, and do a new check when the length falls below half that
value.
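
A minimal sketch of the selection rule and the re-check heuristic, in Python for illustration only (a real implementation would live in C). The function names are hypothetical:

```python
def bytes_per_char(text):
    """Narrowest fixed width (1, 2 or 4 bytes) that can hold every character."""
    top = max(map(ord, text), default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def concat_width(a, b):
    """Combining two strings uses the wider of the two representations."""
    return max(bytes_per_char(a), bytes_per_char(b))


def should_recheck(current_len, len_at_last_check):
    """Heuristic from the footnote: re-scan for a narrower representation
    only once the string has shrunk below half its length at the last check."""
    return current_len < len_at_last_check / 2
```

The halving rule bounds the extra scanning work: each re-check is paid for by the characters removed since the previous one.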

Regards

Antoine.



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum

On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Indeed. My guess is that indexing is more common than you think,
> especially when iterating over the string. Of course, iteration
> could also operate on UTF-8, if you introduced string iterator
> objects.

Python's slice-and-dice model pretty much ensures that indexing is
common. Almost everything is ultimately represented as indices: regex
search results have the index in the API, find()/index() return
indices, many operations take a start and/or end index. As long as
that's the case, indexing better be fast.

Changing the APIs would be much work, although perhaps not impossible
for Python 3000. For example, Raymond Hettinger's partition() API
doesn't refer to indices at all, and can replace many uses of find()
or index().
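
To make the comparison concrete, here is a small sketch of the same header split written both ways. It uses `str.partition` as it exists in later Python releases; the function names and the header example are hypothetical:

```python
def split_with_find(line):
    """Index-based version: find() returns a position that is then sliced."""
    i = line.find(":")
    if i < 0:
        return line, ""
    return line[:i], line[i + 1:]


def split_with_partition(line):
    """Index-free version: partition() hands back the pieces directly."""
    name, _sep, value = line.partition(":")
    return name, value
```

The second form never exposes an integer offset, so it would work unchanged on an internal representation where indexing is not constant time.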

Still, the mere existence of __getitem__ and __getslice__ on strings
makes it necessary to implement them efficiently. How realistic would
it be to drop them? What should replace them? Some kind of abstract
pointers-into-strings perhaps, but that seems much more complex.

The trick seems to be to support both simple programs manipulating
short strings (where indexing is probably the easiest API to
understand, and the additional copying is unlikely to cause
performance problems), as well as programs manipulating very large
buffers containing text and doing sophisticated string processing on
them. Perhaps we could provide a different kind of API to support the
latter, perhaps based on a mutable character buffer data type without
direct indexing?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis

Guido van Rossum wrote:
> Changing the APIs would be much work, although perhaps not impossible
> for Python 3000. For example, Raymond Hettinger's partition() API
> doesn't refer to indices at all, and can replace many uses of find()
> or index().

I think Neil's proposal is not to make them go away, but to implement
them less efficiently. For example, if the internal representation
is UTF-8, indexing requires linear time, as opposed to constant time.
If the internal representation is UTF-16, and you have a flag to
indicate whether there are any surrogates on the string, indexing
is constant if the flag is false, else linear.
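
The flag-based scheme for UTF-16 can be sketched as follows. This is an illustrative Python model, not CPython's implementation; the class name `Utf16Text` is hypothetical, and negative indexes are left out to keep it short:

```python
class Utf16Text:
    """Text stored as UTF-16 code units plus a 'has surrogates' flag."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        self.units = [int.from_bytes(data[i:i + 2], "little")
                      for i in range(0, len(data), 2)]
        # Computed once at construction time.
        self.has_surrogates = any(0xD800 <= u <= 0xDFFF for u in self.units)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indexes are not handled in this sketch")
        if not self.has_surrogates:
            return chr(self.units[index])          # constant time
        pos = 0
        for _ in range(index):                     # linear scan
            high = 0xD800 <= self.units[pos] <= 0xDBFF
            pos += 2 if high else 1
        unit = self.units[pos]
        if 0xD800 <= unit <= 0xDBFF:               # combine surrogate pair
            low = self.units[pos + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)
```

Strings confined to the base plane keep the fast path; only strings that actually contain surrogate pairs pay the linear cost.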

> Perhaps we could provide a different kind of API to support the
> latter, perhaps based on a mutable character buffer data type without
> direct indexing?

There are different design goals conflicting here:
- some think: "all my data is ASCII, so I want to only use one
   byte per character".
- others think: "all my data goes to the Windows API, so I want
   to use 2 bytes per character".
- yet others think: "I want all of Unicode, with proper, efficient
   indexing, so I want four bytes per char".

It's not so much a matter of API as a matter of internal
representation. The API doesn't have to change (except for the
very low-level C API that directly exposes Py_UNICODE*, perhaps).

Regards,
Martin

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum

On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Changing the APIs would be much work, although perhaps not impossible
> > for Python 3000. For example, Raymond Hettinger's partition() API
> > doesn't refer to indices at all, and can replace many uses of find()
> > or index().
>
> I think Neil's proposal is not to make them go away, but to implement
> them less efficiently. For example, if the internal representation
> is UTF-8, indexing requires linear time, as opposed to constant time.
> If the internal representation is UTF-16, and you have a flag to
> indicate whether there are any surrogates on the string, indexing
> is constant if the flag is false, else linear.

I understand all that. My point is that it's a bad idea to offer an
indexing operation that isn't O(1).

> > Perhaps we could provide a different kind of API to support the
> > latter, perhaps based on a mutable character buffer data type without
> > direct indexing?
>
> There are different design goals conflicting here:
> - some think: "all my data is ASCII, so I want to only use one
>    byte per character".
> - others think: "all my data goes to the Windows API, so I want
>    to use 2 bytes per character".
> - yet others think: "I want all of Unicode, with proper, efficient
>    indexing, so I want four bytes per char".

I doubt the last one though. Probably they really don't want efficient
indexing, they want to perform higher-level operations that currently
are only possible using efficient indexing or slicing. With the right
API, perhaps they could work just as efficiently with an internal
representation of UTF-8.

> It's not so much a matter of API as a matter of internal
> representation. The API doesn't have to change (except for the
> very low-level C API that directly exposes Py_UNICODE*, perhaps).

I think the API should reflect the representation *to some extent*,
namely it shouldn't claim to have operations that are typically
thought of as O(1) that can only be implemented as O(n). An internal
representation of UTF-8 might make everyone happy except heavy Windows
users; but it requires changes to the API so people won't be writing
Python 2.x-style string slinging code.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis

Antoine Pitrou wrote:
>>There are many design alternatives:
> 
> Wouldn't it be simpler to use:
> - one-byte representation if every character <= 0xFF
> - two-byte representation if every character <= 0xFFFF
> - four-byte representation otherwise

As I said: there are many alternatives. This one has the
disadvantage of requiring a copy every time you pass the string
to a Win32 function (which expects UTF-16).

Whether or not this is a significant disadvantage, I don't know.

In any case, a multi-representations implementation has the
disadvantage of making the C API more difficult to use, in
particular for writing codecs. On encoding, it is difficult
to fetch the individual characters which you need for the
lookup table; on decoding, it is difficult to know in advance
what representation to use (unless you know there is an upper
bound on the decoded character ordinals).

Regards,
Martin

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson

M.-A. Lemburg:

> Unicode has the concept of combining code points, e.g. you can
> store an "é" (e with an accent) as "e" + "'". Now if you slice
> off the accent, you'll break the character that you encoded
> using combining code points.
> ...
> next_(u, index) -> integer
>
> Returns the Unicode object index for the start of the next
>  found after u[index] or -1 in case no next element
> of this type exists.

   Should entity breakage be further discouraged by returning a slice
here rather than an object index?

   Something like:

i = first_grapheme(u)
x = 0
while x < width and u[i] != "\n":
   x, _ = draw(u[i], (x, y))
   i = next_grapheme(u, i)
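
A slice-returning variant might be sketched like this. It is only an approximation: it groups a base character with the combining marks that follow it, using `unicodedata.combining`, whereas full grapheme segmentation follows the rules of Unicode Standard Annex #29. The name `grapheme_slices` is hypothetical:

```python
import unicodedata


def grapheme_slices(text):
    """Yield slice objects, each covering a base character plus any
    combining marks that directly follow it (a simplification)."""
    start = 0
    n = len(text)
    while start < n:
        end = start + 1
        # combining() returns a non-zero class for combining marks.
        while end < n and unicodedata.combining(text[end]):
            end += 1
        yield slice(start, end)
        start = end
```

A drawing loop could then use `for s in grapheme_slices(u): draw(u[s], ...)` and never be handed a position that falls between a letter and its accent.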

   Neil

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen

> > - yet others think: "I want all of Unicode, with proper, efficient
> >    indexing, so I want four bytes per char".
> 
> I doubt the last one though. Probably they really don't want efficient
> indexing, they want to perform higher-level operations that currently
> are only possible using efficient indexing or slicing. With the right
> API, perhaps they could work just as efficiently with an internal
> representation of UTF-8.

I just got mail this morning from a researcher who wants exactly what
Martin described, and wondered why the default MacPython 2.4.2 didn't
provide it by default. :-)

Bill

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum

On 10/24/05, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > > - yet others think: "I want all of Unicode, with proper, efficient
> > >    indexing, so I want four bytes per char".
> >
> > I doubt the last one though. Probably they really don't want efficient
> > indexing, they want to perform higher-level operations that currently
> > are only possible using efficient indexing or slicing. With the right
> > API, perhaps they could work just as efficiently with an internal
> > representation of UTF-8.
>
> I just got mail this morning from a researcher who wants exactly what
> Martin described, and wondered why the default MacPython 2.4.2 didn't
> provide it by default. :-)

Oh, I don't doubt that they want it. But often they don't *need* it,
and the higher-level goal they are trying to accomplish can be dealt
with better in a different way. (Sort of my response to people asking
for static typing in Python as well. :-)

Did they tell you what they were trying to do that MacPython 2.4.2
wouldn't let them, beyond "represent a large Unicode string as an
array of 4-byte integers"?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Greg Ewing

Guido van Rossum wrote:

> I think the API should reflect the representation *to some extent*,
> namely it shouldn't claim to have operations that are typically
> thought of as O(1) that can only be implemented as O(n).

Maybe a compromise could be reached by using a
btree of chunks or something, so indexing is
O(log n). Not as good as O(1) but a lot better
than O(n).
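
One way to picture the compromise is sketched below. It uses a sorted table of chunk start positions searched with `bisect` as a simpler stand-in for a real B-tree, and stores each chunk as UTF-8. The class name `ChunkedUtf8` and the default chunk size are assumptions made for illustration:

```python
import bisect


class ChunkedUtf8:
    """UTF-8 text split into chunks of a bounded number of characters."""

    def __init__(self, text, chunk_chars=64):
        self.starts = []   # character index at which each chunk begins
        self.chunks = []   # UTF-8 bytes of each chunk
        for start in range(0, len(text), chunk_chars):
            self.starts.append(start)
            self.chunks.append(text[start:start + chunk_chars].encode("utf-8"))
        self.length = len(text)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        # Binary search for the chunk: logarithmic in the number of chunks.
        k = bisect.bisect_right(self.starts, index) - 1
        # Decoding one chunk is bounded by chunk_chars, independent of
        # the total length of the text.
        return self.chunks[k].decode("utf-8")[index - self.starts[k]]
```

Lookup cost is the binary search plus a scan of one bounded chunk, so it grows with the logarithm of the text length rather than linearly.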

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
[EMAIL PROTECTED]                  +--------------------------------------+

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Greg Ewing

Guido van Rossum wrote:

> Python's slice-and-dice model pretty much ensures that indexing is
> common. Almost everything is ultimately represented as indices: regex
> search results have the index in the API, find()/index() return
> indices, many operations take a start and/or end index.

Maybe the idea of string views should be reconsidered in
light of this. It's been criticised on the grounds that
its use could keep large strings alive longer than needed,
but if operations that currently return indices instead
returned string views, this wouldn't be any more of a
concern than it is now, especially if there is an easy
way to explicitly materialise the view as an independent
string when wanted.
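
A rough sketch of what such a view could look like, assuming a hypothetical `StrView` class; none of these names exist in Python. Search operations return views instead of integer positions, and `materialize()` makes the independent copy:

```python
class StrView:
    """A window onto a base string; holds a reference, not a copy."""

    __slots__ = ("base", "start", "stop")

    def __init__(self, base, start=0, stop=None):
        self.base = base
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def find(self, sub):
        """Return a view of the first occurrence of sub, or None."""
        i = self.base.find(sub, self.start, self.stop)
        if i < 0:
            return None
        return StrView(self.base, i, i + len(sub))

    def before(self, other):
        """View of the part of self that precedes the view other."""
        return StrView(self.base, self.start, other.start)

    def after(self, other):
        """View of the part of self that follows the view other."""
        return StrView(self.base, other.stop, self.stop)

    def materialize(self):
        """Copy the viewed text into an independent string."""
        return self.base[self.start:self.stop]
```

With this shape, `v.before(v.find("="))` replaces the usual `s[:s.find("=")]` without the caller ever handling an index, and the base string is released as soon as the last view is materialised or dropped.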

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
[EMAIL PROTECTED]                  +--------------------------------------+

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen

Guido writes:
> Oh, I don't doubt that they want it. But often they don't *need* it,
> and the higher-level goal they are trying to accomplish can be dealt
> with better in a different way. (Sort of my response to people asking
> for static typing in Python as well. :-)

I suppose that's true.  But what if they're not smart enough to figure
out that better, different, way?  I doubt you intend Python to be sort
of the Rubik's cube of programming...

And no, he didn't say why he wanted the ability to "represent a
Unicode string as an array of 4-byte integers".  Though I know he's
doing something with the Deseret Alphabet, translating some early work
on American Indian culture that was transcribed in that character set.

Bill

Re: [Python-Dev] AST branch is in?

2005-10-24 Thread Simon Burton

On Fri, 21 Oct 2005 18:32:22 +0000 (UTC)
nas at arctrix.com (Neil Schemenauer) wrote:

> > Does it just allow us to do new and interesting manipulations of
> > the code during compilation?
> 
> Well, that's a pretty big deal, IMHO. For example, adding
> pychecker-like functionality should be straightforward now. I also
> hope some of the namespace optimizations get explored (e.g. PEP
> 267).

Is there a Python interface?

Simon.



-- 
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com 