Re: [Python-Dev] Unicode byte order mark decoding

2005-04-06 Thread Walter Dörwald
Martin v. Löwis sagte:
> Walter Dörwald wrote:
>> There are situations where the byte stream might be temporarily
>> exhausted, e.g. an XML parser that tries to support the
>> IncrementalParser interface, or when you want to decode
>> encoded data piecewise, because you want to give a progress
>> report.
>
> Yes, but these are not file-like objects.

True, on the outside there are no file-like objects. But the
IncrementalParser gets passed the XML bytes in chunks,
so it has to use a stateful decoder for decoding. Unfortunately
this means that it has to use a stream API. (See
http://www.python.org/sf/1101097 for a patch that somewhat
fixes that.)

(Another option would be to completely ignore the stateful API
and handcraft stateful decoding (or only support stateless
decoding), like most XML parsers for Python do now.)

> In the IncrementalParser,
> it is *not* the case that a read operation returns an empty
> string. Instead, the application repeatedly feeds data explicitly.

That's true, but the parser has to wrap this data into an object
that can be passed to the StreamReader constructor. (See the
Queue class in Lib/test/test_codecs.py for an example.)

> For a file-like object, returning "" indicates EOF.

Not necessarily. In the example above the IncrementalParser
gets fed a chunk of data, it stuffs this data into the Queue,
so that the StreamReader can decode it. Once the data
from the Queue is exhausted, there won't be any further
data until the user calls feed() on the IncrementalParser again.
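
A minimal sketch of that situation, written for a current Python 3
rather than the interpreter of this thread. The Queue class here is a
simplified stand-in modelled on the one in Lib/test/test_codecs.py, and
the variable names are only illustrative:

```python
import codecs

class Queue:
    """Byte queue: write() appends, read() drains what is buffered."""
    def __init__(self):
        self._buffer = b""

    def write(self, data):
        self._buffer += data

    def read(self, size=-1):
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

queue = Queue()
reader = codecs.getreader("utf-8")(queue)

queue.write(b"caf\xc3")   # chunk ends in the middle of a two-byte sequence
first = reader.read()     # the incomplete trailing byte is held back
second = reader.read()    # queue is empty: nothing yet, but not EOF either
queue.write(b"\xa9")      # the next feed() delivers the missing byte
third = reader.read()     # now the held-back byte can be completed
```

The empty result of the second read() only means "no data right now";
the stream continues as soon as more bytes are written to the queue.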

Bye,
   Walter Dörwald



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-06 Thread Stephen J. Turnbull
> "Martin" == Martin v Löwis <[EMAIL PROTECTED]> writes:

Martin> I can't put these two paragraphs together. If you think
Martin> that explicit is better than implicit, why do you not want
Martin> to make different calls for the first chunk of a stream,
Martin> and the subsequent chunks?

Because the signature/BOM is not a chunk, it's a header.  Handling the
signature/BOM is part of stream initialization, not translation, to my
mind.

The point is that explicitly using a stream shows that initialization
(and finalization) matter.  The default can be BOM or not, as a
pragmatic matter.  But then the stream data itself can be treated
homogeneously, as implied by the notion of stream.

I think it probably also would solve Walter's conundrum about
buffering the signature/BOM if responsibility for that were moved out
of the codecs and into the objects where signatures make sense.

I don't know whether that's really feasible in the short run---I
suspect there may be a lot of stream-like modules that would need to
be updated---but it would be saner in the long run.

>> Yes!  Exactly (except in reverse, we want to _read_ from the
>> slurped stream-as-string, not write to one)!  ... and there's
>> no need for a utf-8-sig codec for strings, since you can
>> support the usage in exactly this way.

Martin> However, if there is an utf-8-sig codec for streams, there
Martin> is currently no way of *preventing* this codec to also be
Martin> available for strings. The very same code is used for
Martin> streams and for strings, and automatically so.

And of course it should be.  But if it's not possible to move the -sig
facility out of the codecs into the streams, that would be a shame.  I
think we should encourage people to use streams where initialization or
finalization semantics are non-trivial, as they are with signatures.

But as long as both utf-8-we-dont-need-no-steenkin-sigs-in-strings and
utf-8-sig are available, I can program as I want to (and refer those
whose strings get cratered by stray BOMs to you).
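
To illustrate the difference between the two codecs, here is a small
sketch. It assumes the codec is registered under the name "utf-8-sig",
as in the patch under discussion (current Python versions ship it under
that name); the variable names are illustrative only:

```python
data = b"\xef\xbb\xbfabc"              # UTF-8 signature followed by "abc"

plain = data.decode("utf-8")           # signature survives as U+FEFF
stripped = data.decode("utf-8-sig")    # leading signature is dropped

# Only a leading signature is special; one in the middle is ordinary data.
inner = b"a\xef\xbb\xbfb".decode("utf-8-sig")

# Input without a signature is accepted by utf-8-sig as well.
unsigned = b"abc".decode("utf-8-sig")

# On output, utf-8-sig writes the signature first.
encoded = "abc".encode("utf-8-sig")
```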

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-06 Thread Walter Dörwald
Stephen J. Turnbull wrote:

> "Martin" == Martin v Löwis <[EMAIL PROTECTED]> writes:
>
> Martin> I can't put these two paragraphs together. If you think
> Martin> that explicit is better than implicit, why do you not want
> Martin> to make different calls for the first chunk of a stream,
> Martin> and the subsequent chunks?
>
> Because the signature/BOM is not a chunk, it's a header.  Handling the
> signature/BOM is part of stream initialization, not translation, to my
> mind.
>
> The point is that explicitly using a stream shows that initialization
> (and finalization) matter.  The default can be BOM or not, as a
> pragmatic matter.  But then the stream data itself can be treated
> homogeneously, as implied by the notion of stream.
>
> I think it probably also would solve Walter's conundrum about
> buffering the signature/BOM if responsibility for that were moved out
> of the codecs and into the objects where signatures make sense.

Not really. In every encoding where a sequence of more than one byte 
maps to one Unicode character, you will always need some kind of 
buffering. If we remove the handling of initial BOMs from the codecs 
(except for UTF-16 where it is required), this wouldn't change any 
buffering requirements.
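
A small sketch of that buffering requirement, independent of any BOM
handling. It uses codecs.getincrementaldecoder, an API from later
Python versions that did not exist when this thread was written; the
helper name decode_bytewise is hypothetical:

```python
import codecs

def decode_bytewise(data, encoding):
    """Feed data one byte at a time; return what each call produced."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
    # final=True flushes the decoder and would raise on leftover bytes
    pieces.append(decoder.decode(b"", final=True))
    return pieces

# U+20AC takes three bytes in UTF-8: nothing comes out until the third.
utf8_pieces = decode_bytewise("\u20ac".encode("utf-8"), "utf-8")

# UTF-16-BE without any BOM still needs two bytes per code unit.
utf16_pieces = decode_bytewise("hi".encode("utf-16-be"), "utf-16-be")
```

In both cases the decoder has to hold back incomplete byte sequences
between calls, whether or not it also handles a signature.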

> I don't know whether that's really feasible in the short run---I
> suspect there may be a lot of stream-like modules that would need to
> be updated---but it would be saner in the long run.

I'm not exactly sure what you're proposing here. That all codecs (even
UTF-16) pass the BOM through and some other infrastructure is
responsible for dropping it?

[...]
Bye,
   Walter Dörwald


Re: [Python-Dev] longobject.c & ob_size

2005-04-06 Thread Michael Hudson
Tim Peters <[EMAIL PROTECTED]> writes:

> [Michael Hudson]
>> Asking mostly for curiosity, how hard would it be to have longs store
>> their sign bit somewhere less aggravating?
>
> Depends on where that is.
>
>> It seems to me that the top bit of ob_digit[0] is always 0, for example,
>
> Yes, the top bit of ob_digit[i], for all relevant i, is 0 on all
> platforms now.
>
>> and while I'm sure this would result in no less convolution in
>> longobject.c, it'd be considerably more localized convolution.
>
> I'd much rather give struct _longobject a distinct sign member (say, 0
> == zero, -1 = non-zero negative, 1 == non-zero positive). 

Well, that would indeed be simpler.

> That would simplify code.  It would cost no extra bytes for some
> longs, and 8 extra bytes for others (since obmalloc rounds up to a
> multiple of 8); I don't care about that (e.g., I never use millions
> of longs simultaneously, but often use a few dozen very big longs
> simultaneously; the memory difference is in the noise then).
>
> Note that longintrepr.h isn't included by Python.h.  Only longobject.h
> is, and longobject.h doesn't reveal the internal structure of longs. 
> IOW, changing the internal layout of longs shouldn't even hurt binary
> compatibility.

Bonus.

> The ob_size member of PyObject_VAR_HEAD would also be redeclared as
> size_t in an ideal world.

As nature intended.

I might do a patch, at some point...

Cheers,
mwh

-- 
  Indeed, when I design my killer language, the identifiers "foo" and
  "bar" will be reserved words, never used, and not even mentioned in
  the reference manual. Any program using one will simply dump core
  without comment. Multitudes will rejoice. -- Tim Peters, 29 Apr 1998


[Python-Dev] Re: [Python-checkins] python/dist/src/Modules mathmodule.c, 2.74, 2.75

2005-04-06 Thread Tim Peters
[EMAIL PROTECTED]
> Modified Files:
>     mathmodule.c
> Log Message:
> Add a comment explaining the import of longintrepr.h.
> 
> Index: mathmodule.c
...
> #include "Python.h"
> -#include "longintrepr.h"
> +#include "longintrepr.h" // just for SHIFT

The intent is fine, but please use a standard C (not C++) comment. 
That is, /*...*/, not //.


Re: [Python-Dev] longobject.c & ob_size

2005-04-06 Thread Michael Hudson
Michael Hudson <[EMAIL PROTECTED]> writes:

> Tim Peters <[EMAIL PROTECTED]> writes:
>
>> [Michael Hudson]
>>> Asking mostly for curiosity, how hard would it be to have longs store
>>> their sign bit somewhere less aggravating?
>>
>> Depends on where that is.

[...]

>> I'd much rather give struct _longobject a distinct sign member (say, 0
>> == zero, -1 = non-zero negative, 1 == non-zero positive). 

I ended up doing -1 non-zero negative, 1 zero and positive, but I
don't know if this is really clearer than what you suggest overall.  I
suspect it's a wash.
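
For readers who want the conventions side by side, here is a rough
Python model (not the C code, and not the patch itself). It assumes the
layout used by longobject.c at the time: 15-bit digits stored least
significant first, with the sign folded into ob_size. All function
names are hypothetical:

```python
SHIFT = 15                  # bits per digit (assumed value from longintrepr.h)
MASK = (1 << SHIFT) - 1

def to_digits(n):
    """Split abs(n) into base-2**SHIFT digits, least significant first."""
    n = abs(n)
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits

def size_carries_sign(n):
    """Existing scheme: a negative ob_size marks a negative value."""
    digits = to_digits(n)
    return (-len(digits) if n < 0 else len(digits)), digits

def sign_three_way(n):
    """Tim's suggestion: sign is 0, -1 or 1; the size stays unsigned."""
    digits = to_digits(n)
    return (n > 0) - (n < 0), len(digits), digits

def sign_two_way(n):
    """The variant described above: -1 for negative, 1 otherwise."""
    digits = to_digits(n)
    return (-1 if n < 0 else 1), len(digits), digits
```

With a separate sign member the digit count never has to be recovered
with abs(), which is where much of the convolution comes from.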

[...]

> I might do a patch, at some point...

http://python.org/sf/119

Assigned to you, but unassign if you don't have time (testing the
patch is probably more worthwhile than reading it!).

Cheers,
mwh

-- 
  Linux: Horse. Like a wild horse, fun to ride. Also prone to
  throwing you and stamping you into the ground because it doesn't
  like your socks. -- Jim's pedigree of operating systems, asr


Re: [Python-Dev] inconsistency when swapping obj.__dict__ with a dict-like object...

2005-04-06 Thread Steven Bethard
On Apr 5, 2005 8:46 PM, Brett C. <[EMAIL PROTECTED]> wrote:
> Alex A. Naanou wrote:
> > Here there are two problems, the first is minor, and it is that
> > anything assigned to the __dict__ attribute is checked to be a
> > descendant of the dict class (mixing this in does not seem to work)...
> > and the second problem is a real annoyance, it is that the mapping
> > protocol supported by the Dict object in the example above is not used
> > by the attribute access mechanics (the same thing that once happened
> > in exec)...
>
> Actually, overriding __getattribute__() does work; __getattr__() and
> __getitem__() don't.  This was brought up last month at some point without
> any resolution (I think Steve Bethard pointed it out).

Yeah, here's the link:

http://mail.python.org/pipermail/python-dev/2005-March/051837.html

I've pointed out three possible "solutions" there, but they all have
some significant drawbacks.  I took the complete silence on the topic
as an indication that none of the options were acceptable.

STeVe
--
You can wordify anything if you just verb it.
   --- Bucky Katt, Get Fuzzy


Re: [Python-Dev] inconsistency when swapping obj.__dict__ with a dict-like object...

2005-04-06 Thread Nick Coghlan
> P.S. (IMHO) the type check here is not that necessary (at least in its
> current state), as what we need to assert is not the relation to the
> dict class but the support of the mapping protocol

The type-check is basically correct - as you have discovered, type & object use 
the PyDict_* API internally (for speed reasons, as I understand it), so 
supporting the mapping API is not really sufficient for something assigned to 
__dict__. Changing this for exec is one thing, as speed of access to the locals 
dict isn't likely to have a major impact on the overall performance of such 
code, but I would expect changing class dictionary access code in a similar way 
would have a major (detrimental) performance impact.

Depending on the use case, it is possible to work around the problem by defining
__dict__, __getattribute__, __setattr__ and __delattr__ in the class. Defining
__dict__ sidesteps the type error; defining the other three methods then lets
you get around the fact that the standard C-level dict pointer is no longer
being updated, as well as making sure the general mapping API is used, rather
than the concrete PyDict_* API. This is kinda ugly, but it works as long as any
C code using the class __dict__ goes via the attribute access machinery and
doesn't try to get the dictionary automatically supplied by Python by digging
directly into the type structure.

from UserDict import DictMixin

class Dict(DictMixin):
    def __init__(self, dct=None):
        if dct is None:
            dct = {}
        self._dict = dct
    def __getitem__(self, name):
        return self._dict[name]
    def __setitem__(self, name, value):
        self._dict[name] = value
    def __delitem__(self, name):
        del self._dict[name]
    def keys(self):
        return self._dict.keys()

class A(object):
    def __new__(cls, *p, **n):
        o = object.__new__(cls)
        super(A, o).__setattr__('__dict__', Dict())
        return o
    __dict__ = None
    def __getattr__(self, attr):
        try:
            return self.__dict__[attr]
        except KeyError:
            raise AttributeError("%s" % attr)
    def __setattr__(self, attr, value):
        if attr in self.__dict__ or not hasattr(self, attr):
            self.__dict__[attr] = value
        else:
            super(A, self).__setattr__(attr, value)
    def __delattr__(self, attr):
        if attr in self.__dict__:
            del self.__dict__[attr]
        else:
            super(A, self).__delattr__(attr)

Py> a = A()
Py> a.__dict__._dict
{}
Py> a.xxx = 123
Py> a.__dict__._dict
{'xxx': 123}
Py> a.__dict__._dict['yyy'] = 321
Py> a.yyy
321
Py> a.__dict__._dict
{'xxx': 123, 'yyy': 321}
Py> del a.xxx
Py> a.__dict__._dict
{'yyy': 321}
Py> del a.xxx
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 21, in __delattr__
AttributeError: xxx
Py> a.__dict__ = {}
Py> a.yyy
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 11, in __getattr__
AttributeError: yyy

Cheers,
Nick.
--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://boredomandlaziness.skystorm.net


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-06 Thread Martin v. Löwis
Stephen J. Turnbull wrote:

> Because the signature/BOM is not a chunk, it's a header.  Handling the
> signature/BOM is part of stream initialization, not translation, to my
> mind.

I'm sorry, but I'm losing track as to what precisely you are trying to
say. You seem to be using a mental model that is entirely different
from mine.

> The point is that explicitly using a stream shows that initialization
> (and finalization) matter.  The default can be BOM or not, as a
> pragmatic matter.  But then the stream data itself can be treated
> homogeneously, as implied by the notion of stream.

But what follows from that point? So it shows some kind of matter...
what does that mean for actual changes to Python API?

> I think it probably also would solve Walter's conundrum about
> buffering the signature/BOM if responsibility for that were moved out
> of the codecs and into the objects where signatures make sense.
>
> I don't know whether that's really feasible in the short run---I
> suspect there may be a lot of stream-like modules that would need to
> be updated---but it would be saner in the long run.

What is "that" which might be really feasible? To "solve Walter's
conundrum"? That "signatures make sense"?

So I can't really respond to your message in a meaningful way;
I just let it rest...

Regards,
Martin


[Python-Dev] Weekly Python Patch/Bug Summary

2005-04-06 Thread Kurt B. Kaiser
Patch / Bug Summary
___

Patches :  308 open (+11) /  2819 closed ( +7) /  3127 total (+18)
Bugs:  882 open (+11) /  4913 closed (+13) /  5795 total (+24)
RFE :  176 open ( +1) /   151 closed ( +1) /   327 total ( +2)

New / Reopened Patches
__

improvement of the script adaptation for the win32 platform  (2005-03-30)
   http://python.org/sf/1173134  opened by  Vivian De Smedt

unicodedata docstrings  (2005-03-30)
CLOSED http://python.org/sf/1173245  opened by  Jeremy Yallop

__slots__ for subclasses of variable length types  (2005-03-30)
   http://python.org/sf/1173475  opened by  Michael Hudson

Python crashes in pyexpat.c if malformed XML is parsed  (2005-03-31)
   http://python.org/sf/1173998  opened by  pdecat

hierarchical regular expression  (2005-04-01)
CLOSED http://python.org/sf/1174589  opened by  Chris Ottrey

site enhancements  (2005-04-01)
   http://python.org/sf/1174614  opened by  Bob Ippolito

Export more libreadline API functions  (2005-04-01)
   http://python.org/sf/1175004  opened by  Bruce Edge

Export more libreadline API functions  (2005-04-01)
CLOSED http://python.org/sf/1175048  opened by  Bruce Edge

Patch for whitespace enforcement  (2005-04-01)
CLOSED http://python.org/sf/1175070  opened by  Guido van Rossum

Allow weak referencing of classic classes  (2005-04-03)
   http://python.org/sf/1175850  opened by  Greg Chapman

threading.Condition.wait() return value indicates timeout  (2005-04-03)
   http://python.org/sf/1175933  opened by  Martin Blais

Make subprocess.Popen support file-like objects (win)  (2005-04-03)
   http://python.org/sf/1175984  opened by  Nicolas Fleury

Implemented new 'class foo():pass' syntax  (2005-04-03)
   http://python.org/sf/1176019  opened by  logistix

locale._build_localename treatment for utf8  (2005-04-05)
   http://python.org/sf/1176504  opened by  Hye-Shik Chang

Clarify unicode.(en|de)code.() docstrings  (2005-04-04)
CLOSED http://python.org/sf/1176578  opened by  Brett Cannon

UTF-8-Sig codec  (2005-04-05)
   http://python.org/sf/1177307  opened by  Walter Dörwald

Complex commented  (2005-04-06)
   http://python.org/sf/1177597  opened by  engelbert gruber

explicit sign variable for longs  (2005-04-06)
   http://python.org/sf/119  opened by  Michael Hudson

Patches Closed
__

unicodedata docstrings  (2005-03-30)
   http://python.org/sf/1173245  closed by  perky

hierarchical regular expression  (2005-04-01)
   http://python.org/sf/1174589  closed by  loewis

Export more libreadline API functions  (2005-04-01)
   http://python.org/sf/1175048  closed by  loewis

Patch for whitespace enforcement  (2005-04-01)
   http://python.org/sf/1175070  closed by  gvanrossum

ast for decorators  (2005-03-21)
   http://python.org/sf/1167709  closed by  nascheme

[ast branch] unicode literal fixes  (2005-03-25)
   http://python.org/sf/1170272  closed by  nascheme

Clarify unicode.(en|de)code.() docstrings  (2005-04-04)
   http://python.org/sf/1176578  closed by  bcannon

New / Reopened Bugs
___

very minor doc bug in 'listsort.txt'  (2005-03-30)
CLOSED http://python.org/sf/1173407  opened by  gyrof

quit should quit  (2005-03-30)
CLOSED http://python.org/sf/1173637  opened by  Matt Chaput

multiple broken links in profiler docs  (2005-03-30)
   http://python.org/sf/1173773  opened by  Ilya Sandler

Reading /dev/zero causes SystemError  (2005-04-01)
   http://python.org/sf/1174606  opened by  Adam Olsen

subclassing ModuleType and another built-in type  (2005-04-01)
   http://python.org/sf/1174712  opened by  Armin Rigo

PYTHONPATH is not working  (2005-04-01)
CLOSED http://python.org/sf/1174795  opened by  Alexander Belchenko

property example code error  (2005-04-01)
   http://python.org/sf/1175022  opened by  John Ridley

import statement likely to crash if module launches threads  (2005-04-01)
   http://python.org/sf/1175194  opened by  Jeff Stearns

python hangs if import statement launches threads  (2005-04-01)
CLOSED http://python.org/sf/1175202  opened by  Jeff Stearns

codecs.readline sometimes removes newline chars  (2005-04-02)
CLOSED http://python.org/sf/1175396  opened by  Irmen de Jong

poorly named variable in urllib2.py  (2005-04-03)
   http://python.org/sf/1175848  opened by  Roy Smith

StringIO and cStringIO don't provide 'name' attribute  (2005-04-03)
   http://python.org/sf/1175967  opened by  logistix

compiler module didn't get updated for "class foo():pass"  (2005-04-03)
   http://python.org/sf/1176012  opened by  logistix

Python garbage collector isn't detecting deadlocks  (2005-04-04)
CLOSED http://python.org/sf/1176467  opened by  Nathan Marushak

Readline segfault  (2005-04-05)
   http://python.org/sf/1176893  opened by  Walter Dörwald

[PyPI] Password reset problem.  (2005-04-05)
CLOSED http://python.org/sf/1177077  opened by  Darek Suchojad

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-06 Thread Nicholas Bastin
On Apr 5, 2005, at 6:19 AM, M.-A. Lemburg wrote:

> Note that the UTF-16 codec is strict w/r to the presence
> of the BOM mark: you get a UnicodeError if a stream does
> not start with a BOM mark. For the UTF-8-SIG codec, this
> should probably be relaxed to not require the BOM.

I've actually been confused about this point for quite some time now, 
but never had a chance to bring it up.  I do not understand why 
UnicodeError should be raised if there is no BOM.  I know that PEP-100 
says:

    'utf-16': 16-bit variable length encoding (little/big endian)

and:

    Note: 'utf-16' should be implemented by using and requiring byte order
    marks (BOM) for file input/output.

But this appears to be in error, at least in the current Unicode
standard.  'utf-16', as defined by the Unicode standard, is big-endian
in the absence of a BOM:

---
3.10.D42:  UTF-16 encoding scheme:
...
* The UTF-16 encoding scheme may or may not begin with a BOM.  However, 
when there is no BOM, and in the absence of a higher-level protocol, 
the byte order of the UTF-16 encoding scheme is big-endian.
---

The current implementation of the utf-16 codecs makes for some 
irritating gymnastics to write the BOM into the file before reading it 
if it contains no BOM, which seems quite like a bug in the codec.  I 
allow for the possibility that this was ambiguous in the standard when 
the PEP was written, but it is certainly not ambiguous now.
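
As a sketch of the behaviour the quoted rule describes (not of what
the codec currently does), a decoder could fall back to big-endian when
no BOM is present. The function name decode_utf16_scheme is
hypothetical; codecs.BOM_UTF16_BE and codecs.BOM_UTF16_LE are the
standard library constants:

```python
import codecs

def decode_utf16_scheme(data):
    """Decode UTF-16 bytes, treating BOM-less input as big-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM and no higher-level protocol: big-endian, per the quoted rule.
    return data.decode("utf-16-be")

with_be_bom = decode_utf16_scheme(b"\xfe\xff\x00h\x00i")
with_le_bom = decode_utf16_scheme(b"\xff\xfeh\x00i\x00")
without_bom = decode_utf16_scheme(b"\x00h\x00i")
```

With something like this, there would be no need to write a BOM into
the data before reading it.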

--
Nick


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-06 Thread Stephen J. Turnbull
> "Walter" == Walter Dörwald <[EMAIL PROTECTED]> writes:

Walter> Not really. In every encoding where a sequence of more
Walter> than one byte maps to one Unicode character, you will
Walter> always need some kind of buffering. If we remove the
Walter> handling of initial BOMs from the codecs (except for
Walter> UTF-16 where it is required), this wouldn't change any
Walter> buffering requirements.

Sure.  My point is that codecs should be stateful only to the extent
needed to assemble semantically meaningful units (ie, multioctet coded
characters).  In particular, they should not need to know about
location at the beginning, middle, or end of some stream---because in
the context of operating on a string they _can't_.

>> I don't know whether that's really feasible in the short
>> run---I suspect there may be a lot of stream-like modules that
    >> would need to be updated---but it would be saner in the long
>> run.

Walter> I'm not exactly sure, what you're proposing here. That all
Walter> codecs (even UTF-16) pass the BOM through and some other
Walter> infrastructure is responsible for dropping it?

Not exactly.  I think that at the lowest level codecs should not
implement complex mode-switching internally, but rather explicitly
abdicate responsibility to a more appropriate codec.

For example, autodetecting UTF-16 on input would be implemented by a
Python program that does something like

    data = stream.read()
    for detector in [ "utf-16-signature", "utf-16-statistical" ]:
        # for the UTF-16 detectors, OUT will always be u"" or None
        out, data, codec = data.decode(detector)
        if codec: break
    while codec:
        more_out, data, codec = data.decode(codec)
        out = out + more_out
    if data:
        # a real program would complain about it
        pass
    process(out)

where decode("utf-16-signature") would be implemented

    def utf_16_signature_internal(data):
        if data[0:2] == "\xfe\xff":
            return (u"", data[2:], "utf-16-be")
        elif data[0:2] == "\xff\xfe":
            return (u"", data[2:], "utf-16-le")
        else:
            # note: data is undisturbed if the detector fails
            return (None, data, None)

The main point is that the detector is just a codec that stops when it
figures out what the next codec should be, touches only data that
would be incorrect to pass to the next codec, and leaves the data
alone if detection fails.  utf-16-signature only handles the BOM (if
present), and does not handle arbitrary "chunks" of data.  Instead, it
passes on the rest of the data (including the first chunk) to be
handled by the appropriate utf-16-?e codec.
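
A runnable approximation of this scheme in current Python 3, under
loud assumptions: detectors are modelled as plain functions rather than
registered codecs, the (output, remaining data, next codec) triple is
the hypothetical protocol from the pseudo-code above, and the
"statistical" detector is only a toy heuristic that counts zero bytes:

```python
import codecs

def signature_detector(data):
    """Consume a UTF-16 BOM if present and name the codec to continue with."""
    if data[:2] == codecs.BOM_UTF16_BE:
        return "", data[2:], "utf-16-be"
    if data[:2] == codecs.BOM_UTF16_LE:
        return "", data[2:], "utf-16-le"
    return None, data, None           # data is undisturbed on failure

def statistical_detector(data):
    """Toy heuristic: mostly-ASCII text has zero bytes on one side only."""
    even_zeros = data[0::2].count(b"\x00")
    odd_zeros = data[1::2].count(b"\x00")
    if even_zeros > odd_zeros:
        return "", data, "utf-16-be"
    if odd_zeros > even_zeros:
        return "", data, "utf-16-le"
    return None, data, None

def decode_detected(data, detectors=(signature_detector, statistical_detector)):
    """Run detectors in order, then hand the rest to the codec they name."""
    for detector in detectors:
        out, data, codec = detector(data)
        if codec is not None:
            return out + data.decode(codec)
    raise ValueError("no detector recognised the data")

from_bom = decode_detected(b"\xfe\xff\x00h\x00i")
from_stats = decode_detected(b"h\x00i\x00")
```

Each detector touches only the bytes it is responsible for, so the
utf-16-?e codecs never need to know whether they are at the start of
the stream.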

I think that the temptation to encapsulate this logic in a utf-16
codec that "simplifies" things by calling the appropriate utf-16-?e
codec itself should be deprecated, but YMMV.  What I would really like
is for the above style to be easier to achieve than it currently is.

BTW, I appreciate your patience in exploring this; after Martin's
remark about different mental models I have to suspect this approach
is just somehow un-Pythonic, but fleshing it out this way I can see
how it will be useful in the context of a different project.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.