Martin v. Löwis wrote:
>> ci = codecs.lookup("xml-auto-detect")
>> p = expat.ParserCreate()
>> e = "utf-32"
>> s = (u"<?xml version='1.0' encoding=%r?><foo/>" % e).encode(e)
>> s = ci.encode(ci.decode(s)[0], encoding="utf-8")[0]
>> p.Parse(s, True)
>
> So how come the document being parsed is recognized as UTF-8?
Because you can forc
Adam Olsen wrote:
> On 11/8/07, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>> [...]
Furthermore encoding-detection might be part of the responsibility of
the XML parser, but this decoding phase is totally distinct from the
parsing phase, so why not put the decoding into a common libr
>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>> codecs to do the encoding. There's no need to create a magical
>> mystery codec to pick out which though.
>
> So the code is good, if it is inside an XML parser, and it's bad if it
> is inside a codec?
Exactly so. This fun
> Because you can force the encoder to use a specified encoding. If you do
> this and the unicode string starts with an XML declaration
So what if the unicode string doesn't start with an XML declaration?
Will it add one? If so, what version number will it use?
>>> OK, so should I put the C code
Martin v. Löwis wrote:
>> Because you can force the encoder to use a specified encoding. If you do
>> this and the unicode string starts with an XML declaration
>
> So what if the unicode string doesn't start with an XML declaration?
> Will it add one?
No.
> If so, what version number will it u
Martin v. Löwis wrote:
>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>> codecs to do the encoding. There's no need to create a magical
>>> mystery codec to pick out which though.
>> So the code is good, if it is inside an XML parser, and it's bad if it
>> is inside a cod
On 2007-11-09 14:10, Walter Dörwald wrote:
> Martin v. Löwis wrote:
Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
codecs to do the encoding. There's no need to create a magical
mystery codec to pick out which though.
>>> So the code is good, if it is inside an
Walter Dörwald wrote:
> Martin v. Löwis wrote:
Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
codecs to do the encoding. There's no need to create a magical
mystery codec to pick out which though.
>>> So the code is good, if it is inside an XML parser, and it's
M.-A. Lemburg wrote:
> On 2007-11-09 14:10, Walter Dörwald wrote:
>> Martin v. Löwis wrote:
> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
> codecs to do the encoding. There's no need to create a magical
> mystery codec to pick out which though.
So the code
On Nov 9, 2007, at 8:22 AM, M.-A. Lemburg wrote:
> FWIW: I'm +1 on adding such a codec.
I'm undecided, and really don't feel strongly either way.
> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec
Hello!
Guido granted me committer privileges to svn.python.org and
bugs.python.org about a week ago. So I'm new, and new people tend to make
mistakes until they've learned the specific rules of a project.
Today I've learned that the resolution keyword "accepted" doesn't mean
the bug report is
Christian Heimes wrote:
> (*) It's missing from the list of resolutions but I'd like to have it
> added. http://psf.upfronthosting.co.za/roundup/meta/issue167
Update:
Georg Brandl pointed out that it makes more sense to add confirmed to
status.
Christian
ACTIVITY SUMMARY (11/02/07 - 11/09/07)
Tracker at http://bugs.python.org/
To view or respond to any of the issues listed below, click on the issue
number. Do NOT respond to this message.
1322 open (+23) / 11575 closed (+18) / 12897 total (+41)
Open issues with patches: 418
Average durati
>> So what if the unicode string doesn't start with an XML declaration?
>> Will it add one?
>
> No.
Ok. So the XML document would be ill-formed then unless the encoding is
UTF-8, right?
> The point of this code is not just to return whether the string starts
> with " * The string does start wi
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?
Because it is the XML parser that does the decoding, not the
application. Also, it is better to provide functionality in
a modular manner (i.e. encoding detection separately
On Nov 9, 2007 6:10 AM, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>
> Martin v. Löwis wrote:
> >>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
> >>> codecs to do the encoding. There's no need to create a magical
> >>> mystery codec to pick out which though.
> >> So the code
> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.
Having the functionality indeed makes things easier. However,
> In fact, we already have such a codec. The utf-16 decoder looks at the
> first two bytes and then decides to forward the rest to either a
> utf-16-be or a utf-16-le decoder.
That's different. UTF-16 is a proper encoding that is just specified
to use the BOM. "xml-auto-detection" is not an encodi
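For readers following the utf-16 comparison, here is a minimal sketch of the dispatch the quoted message describes: look at the first two bytes, then hand the rest to the little-endian or big-endian decoder. The function name `decode_utf16_by_bom` is hypothetical, the code is written for Python 3 rather than the Python 2 of this thread, and raising an error when no BOM is present is a simplification of this sketch, not what the built-in codec does.

```python
import codecs


def decode_utf16_by_bom(data: bytes) -> str:
    """Choose utf-16-le or utf-16-be from the BOM, then decode the remainder."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Assumption of this sketch: refuse input without a BOM instead of guessing.
    raise ValueError("no UTF-16 byte order mark found")


# A BOM-prefixed byte string decodes the same way in either byte order.
big_endian = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
```

The built-in "utf-16" codec already behaves this way for BOM-prefixed input, which is the sense in which the quoted message calls it an existing auto-detecting codec.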
> It's clear to me that detecting an encoding is actually the simplest
> part of all this (so long as there's an API to do it!) Putting it
> inside a codec seems like the wrong subdivision of responsibility.
In case it isn't clear - this is exactly my view also.
Regards,
Martin
_
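To make the "detection as a separate API" position concrete, here is a rough sketch of a standalone detection function that only names an encoding and leaves the decoding to whoever calls it. Everything here is an assumption for illustration: the name `detect_xml_encoding`, the BOM table, and the simplified declaration regex are not taken from the patch under discussion, and the sketch ignores BOM-less UTF-16/UTF-32 and EBCDIC input. It targets Python 3.

```python
import codecs
import re

# Checked in order: the UTF-32 LE BOM begins with the UTF-16 LE BOM,
# so the UTF-32 entries must come first.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]

# Simplified pattern for an ASCII-compatible XML declaration (hypothetical,
# looser than the grammar in the XML specification).
_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(head: bytes, default: str = "utf-8") -> str:
    """Return a codec name for the start of an XML document; decode nothing."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    match = _DECL.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return default


latin1_doc = "<?xml version='1.0' encoding='iso-8859-1'?><a>\xe9</a>".encode("iso-8859-1")
text = latin1_doc.decode(detect_xml_encoding(latin1_doc))
```

With this split, a caller that wants decoded text combines the two steps itself, while an XML parser can use the detected name directly.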
Martin v. Löwis wrote:
>> It makes working with XML data a lot easier: you simply don't have to
>> bother with the encoding of the XML data anymore and can just let the
>> codec figure out the details. The XML parser can then work directly
>> on the Unicode data.
>
> Having the functionality indee
On Nov 9, 2007 3:59 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
> >> It makes working with XML data a lot easier: you simply don't have to
> >> bother with the encoding of the XML data anymore and can just let the
> >> codec figure out the details. The XML parser can then
> Not really, but the codec has more control over what happens to
> the stream, ie. it's easier to implement look-ahead in the codec
> than to do the detection and then try to push the bytes back onto
> the stream (which may or may not be possible depending on the
> nature of the stream).
YAGNI.
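For context on the push-back concern in the quoted text: where a stream is buffered, the leading bytes can be inspected without consuming them. The sketch below uses `io.BufferedReader.peek` from Python 3, which postdates this thread; the helper name `peek_head` is hypothetical, and the approach assumes the stream can be wrapped in a buffered reader at all.

```python
import io


def peek_head(stream: io.BufferedReader, size: int = 64) -> bytes:
    """Return up to `size` leading bytes without advancing the stream."""
    # peek() may return more or fewer bytes than requested, so trim the result.
    return stream.peek(size)[:size]


payload = b"\xef\xbb\xbf<a/>"
stream = io.BufferedReader(io.BytesIO(payload))
head = peek_head(stream, 3)  # inspect the first three bytes only
rest = stream.read()         # the full payload is still available
```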
"Martin v. Löwis" writes:
> > It's clear to me that detecting an encoding is actually the simplest
> > part of all this (so long as there's an API to do it!) Putting it
> > inside a codec seems like the wrong subdivision of responsibility.
>
> In case it isn't clear - this is exactly my vie
Martin v. Löwis wrote:
>> Not really, but the codec has more control over what happens to
>> the stream, ie. it's easier to implement look-ahead in the codec
>> than to do the detection and then try to push the bytes back onto
>> the stream (which may or may not be possible depending on the
>> natu
To follow up, I now have a patch. It's pretty straightforward.
This implements the kind of syntax that I believe won over most folks
in the end:
    @property
    def foo(self): ...

    @foo.setter
    def foo(self, value=None): ...
There are also .getter and .deleter descriptors. This includes the ha
D'oh. I forgot to point to the patch. It's here:
http://bugs.python.org/issue1416
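As an illustration of the decorator syntax described in the announcement, here is a small self-contained sketch using `@property` with the `.setter` and `.deleter` descriptors. The class `Celsius`, its attribute names, and the range check are invented for this example and do not come from the patch.

```python
class Celsius:
    """Hypothetical example class with a validated read/write property."""

    def __init__(self, degrees=0.0):
        self._degrees = degrees

    @property
    def degrees(self):
        # Getter: runs on attribute read.
        return self._degrees

    @degrees.setter
    def degrees(self, value):
        # Setter: runs on assignment and can validate the new value.
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._degrees = value

    @degrees.deleter
    def degrees(self):
        # Deleter: runs on `del obj.degrees`.
        del self._degrees


t = Celsius()
t.degrees = 21.5
```

Each decorated function reuses the property's name, so the getter, setter, and deleter end up on one property object instead of being passed to `property()` as separate arguments.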
On Nov 9, 2007 10:00 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> To follow up, I now have a patch. It's pretty straightforward.
>
> This implements the kind of syntax that I believe won over most folks
> in the
There is only one week left for PyCon tutorial & scheduled talk proposals. If
you've been thinking about making a proposal, now's the time!
Tutorial details and instructions here:
http://us.pycon.org/2008/tutorials/proposals/
Scheduled talk details and instructions here:
http://us.pycon.org/2008