Re: [Python-Dev] XML codec?

2007-11-11 Thread M.-A. Lemburg
On 2007-11-10 09:54, Martin v. Löwis wrote:
>> A non-seekable stream is not all that uncommon in network processing.
> 
> Right. But what is the relationship to XML encoding autodetection?

It pops up whenever you need to detect the encoding of the
incoming XML data on the network connection, e.g. in XML RPC
or data upload mechanisms.

Even though XML data mostly uses UTF-8 in real-life applications,
a standards-compliant XML interface must also support the other
possible encodings.

It is also not always feasible to load all data into memory, so
some form of buffering must be used.

Since incremental codecs already implement buffering, it's only
natural to let them take care of the auto detection.
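
To illustrate the buffering mentioned above, here is a minimal sketch
using the standard codecs module. It only shows how an incremental
decoder holds back an incomplete byte sequence between calls; it is
not the proposed XML codec, and the sample data is made up.

```python
import codecs

# An incremental decoder keeps undecoded trailing bytes between calls.
decoder = codecs.getincrementaldecoder("utf-8")()

data = "Löwis".encode("utf-8")      # 'ö' is encoded as two bytes
chunks = [data[:2], data[2:]]       # split in the middle of 'ö'

pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b"", final=True))  # flush at end of input
text = "".join(pieces)
```

The first call returns only "L"; the dangling lead byte of 'ö' stays
in the decoder until the next chunk arrives.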

This approach is also needed if you want to stack stream codecs
(not sure whether this is still possible in Py3, but that's how
I designed them for Py2).

Regards,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 11 2007)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


 Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] XML codec?

2007-11-11 Thread Martin v. Löwis
> I don't know. Is an XML document ill-formed if it doesn't contain an
> XML declaration, is not in UTF-8 or UTF-16, but there's external
> encoding info?

If there is external encoding info, matching the actual encoding,
it would be well-formed. Of course, preserving that information would
be up to the application.

> This looks good. Now we would have to extend the code to detect and
> replace the encoding in the XML declaration too.

I'm still opposed to making this a codec. Right - for a pure Python
solution, the processing of the XML declaration would still need to
be implemented.

>> I think there could be a much simpler routine to have the same 
>> effect. - if it's less than 4 bytes, answer "need more data".
> 
> Can there be an XML document that is less than 4 bytes? I guess not.

No, the smallest document has exactly 4 characters (e.g. "<a/>").
However, external entities may be smaller, such as "x".

> But anyway: would a Python implementation of these two functions
> (detect_encoding()/fix_encoding()) be accepted?

I could agree to a Python implementation of this algorithm as long
as it's not packaged as a codec.
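
As a rough sketch of what such a plain function might look like: the
name detect_encoding() comes from this thread, but the signature, the
return values (None meaning "need more data") and the simplified
byte-pattern checks below are assumptions loosely modelled on the
autodetection appendix of the XML specification. EBCDIC and the
unusual UCS-4 byte orders are left out.

```python
import codecs
import re

# Hypothetical helper: name taken from the thread, behaviour assumed.
_DECL = re.compile(
    br'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def detect_encoding(data):
    """Return an encoding name, or None if more data is needed."""
    if len(data) < 4:
        return None  # "need more data"
    # Byte order marks; test the 4-byte forms before the 2-byte ones.
    if data.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # No BOM: look at how the leading "<" is laid out.
    head = data[:4]
    if head == b"\x00\x00\x00<":
        return "utf-32-be"
    if head == b"<\x00\x00\x00":
        return "utf-32-le"
    if head[:2] == b"\x00<":
        return "utf-16-be"
    if head[:2] == b"<\x00":
        return "utf-16-le"
    if head == b"<?xm":
        end = data.find(b"?>")
        if end < 0:
            return None  # declaration not complete yet
        match = _DECL.match(data[:end])
        if match:
            return match.group(1).decode("ascii").lower()
    return "utf-8"  # default when nothing else applies
```

A caller would keep feeding a growing prefix of the input until the
function stops returning None.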

Regards,
Martin



Re: [Python-Dev] XML codec?

2007-11-11 Thread Martin v. Löwis
>>> A non-seekable stream is not all that uncommon in network processing.
>> Right. But what is the relationship to XML encoding autodetection?
> 
> It pops up whenever you need to detect the encoding of the
> incoming XML data on the network connection, e.g. in XML RPC
> or data upload mechanisms.

No, it doesn't. For XML-RPC, you pass the XML payload of the
HTTP request to the XML parser, and it deals with the encoding.
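
A small sketch of that point, using the feed() interface of the
standard library's ElementTree parser: the bytes are handed over in
arbitrary chunks and the parser itself honours the encoding given in
the XML declaration. The payload and the chunk size are invented for
the example.

```python
import xml.etree.ElementTree as ET

# Made-up payload, encoded as declared in its XML declaration.
payload = ('<?xml version="1.0" encoding="iso-8859-1"?>'
           '<r>caf\xe9</r>').encode("iso-8859-1")

parser = ET.XMLParser()
for start in range(0, len(payload), 7):   # 7-byte chunks, as from a socket
    parser.feed(payload[start:start + 7])
root = parser.close()                      # returns the root element
```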

> It is also not always feasible to load all data into memory, so
> some form of buffering must be used.

Again, I don't see the use case. For XML-RPC, it's very feasible
and standard procedure to have the entire document in memory
(in a processed form).

> This approach is also needed if you want to stack stream codecs
> (not sure whether this is still possible in Py3, but that's how
> I designed them for Py2).

The design of the Py2 codecs is fairly flawed, unfortunately.

Regards,
Martin


Re: [Python-Dev] XML codec?

2007-11-11 Thread M.-A. Lemburg
On 2007-11-11 14:51, Martin v. Löwis wrote:
>>>> A non-seekable stream is not all that uncommon in network processing.
>>> Right. But what is the relationship to XML encoding autodetection?
>> It pops up whenever you need to detect the encoding of the
>> incoming XML data on the network connection, e.g. in XML RPC
>> or data upload mechanisms.
> 
> No, it doesn't. For XML-RPC, you pass the XML payload of the
> HTTP request to the XML parser, and it deals with the encoding.

First, XML-RPC is not the only mechanism using XML over a network
connection. Second, you don't want to do this if you're dealing
with several 100 MB of data just because you want to figure
out the encoding.

>> It is also not always feasible to load all data into memory, so
>> some form of buffering must be used.
> 
> Again, I don't see the use case. For XML-RPC, it's very feasible
> and standard procedure to have the entire document in memory
> (in a processed form).

You may not see the use case, but that doesn't really mean
anything if the use cases exist in real life applications,
right ?!

>> This approach is also needed if you want to stack stream codecs
>> (not sure whether this is still possible in Py3, but that's how
>> I designed them for Py2).
> 
> The design of the Py2 codecs is fairly flawed, unfortunately.

Fortunately, this sounds like a fairly flawed argument to me ;-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] XML codec?

2007-11-11 Thread Martin v. Löwis
> First, XML-RPC is not the only mechanism using XML over a network
> connection. Second, you don't want to do this if you're dealing
> with several 100 MB of data just because you want to figure
> out the encoding.

That's my original claim/question: what SPECIFIC application do
you have in mind that transfers XML over a network and where you
would want to have such a stream codec?

If I have 100MB of XML in a file, using the detection API, I do

  f = open(filename)
  s = f.read(100)
  while True:
    coding = xml.utils.detect_encoding(s)
    if coding is not undetermined:
       break
    s += f.read(100)
  f.close()

Having the loop here is paranoia: in my application, I might be
able to know that 100 bytes are sufficient to determine the encoding
always.
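
The same loop, written as a sketch that also works on a non-seekable
stream. The helper name sniff_encoding() and the convention that the
detection function returns None for "undetermined" are assumptions
made for this example. The bytes already read are returned along with
the result, because they cannot be re-read from a socket.

```python
import io

def sniff_encoding(stream, detect, chunk_size=100):
    """Read from stream until detect() gives an answer or input ends.

    Returns (encoding_or_None, bytes_consumed_so_far).
    """
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        coding = detect(buf)
        if coding is not None or not chunk:  # decided, or end of input
            return coding, buf

# Stand-in detector for the demonstration: decides after 150 bytes.
def fake_detect(data):
    return "utf-8" if len(data) >= 150 else None

stream = io.BytesIO(b"x" * 400)
coding, consumed = sniff_encoding(stream, fake_detect)
```

The caller must pass `consumed` to whatever processes the document
next, followed by the rest of the stream.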

>> Again, I don't see the use case. For XML-RPC, it's very feasible
>> and standard procedure to have the entire document in memory
>> (in a processed form).
> 
> You may not see the use case, but that doesn't really mean
> anything if the use cases exist in real life applications,
> right ?!

Right. However, I will remain opposed to adding this to the
standard library until I see why one would absolutely need to
have that. Not every piece of code that is useful in some
application should be added to the standard library.

Regards,
Martin


[Python-Dev] Summary of Tracker Issues

2007-11-11 Thread Tracker

ACTIVITY SUMMARY (11/04/07 - 11/11/07)
Tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.


 1323 open (+21) / 11582 closed (+19) / 12905 total (+40)

Open issues with patches:   419

Average duration of open issues: 687 days.
Median duration of open issues: 791 days.

Open Issues Breakdown
   open  1318 (+21)
pending     5 ( +0)

Issues Created Or Reopened (41)
___

test_import breaks on Linux                                      11/09/07
       http://bugs.python.org/issue1377    reopened gvanrossum
       py3k

Backport abcoll to 2.6                                           11/04/07
       http://bugs.python.org/issue1383    created  baranguren
       patch

Windows fix for inspect tests                                    11/04/07
CLOSED http://bugs.python.org/issue1384    created  tiran
       py3k, patch

hmac module violates RFC for some hash functions, e.g. sha512    11/04/07
CLOSED http://bugs.python.org/issue1385    created  jowagner
       py3k

py3k-pep3137: patch to ensure that all codecs return bytes       11/04/07
CLOSED http://bugs.python.org/issue1386    created  amaury.forgeotdarc
       py3k, patch

py3k-pep3137: patch for hashlib on Windows                       11/04/07
CLOSED http://bugs.python.org/issue1387    created  amaury.forgeotdarc
       py3k, patch

py3k-pep3137: possible ref leak in ctypes                        11/05/07
CLOSED http://bugs.python.org/issue1388    created  tiran
       py3k

py3k-pep3137: struct module is leaking references                11/05/07
CLOSED http://bugs.python.org/issue1389    created  tiran
       py3k

toxml generates output that is not well formed                   11/05/07
       http://bugs.python.org/issue1390    created  drtomc

Adds the .compact() method to bsddb db.DB objects                11/05/07
       http://bugs.python.org/issue1391    created  gregory.p.smith
       patch, rfe

py3k-pep3137: issue warnings / errors on str(bytes()) and simila 11/05/07
CLOSED http://bugs.python.org/issue1392    created  tiran
       py3k, patch

function comparing lacks NotImplemented error                    11/05/07
       http://bugs.python.org/issue1393    created  _doublep

simple patch, improving unreachable bytecode removing            11/05/07
       http://bugs.python.org/issue1394    created  _doublep
       patch

py3k: duplicated line endings when using read(1)                 11/06/07
       http://bugs.python.org/issue1395    created  amaury.forgeotdarc
       py3k

py3k-pep3137: patch for mailbox                                  11/06/07
CLOSED http://bugs.python.org/issue1396    created  tiran
       py3k, patch

py3k-pep3137: failing unit test test_bsddb                       11/06/07
       http://bugs.python.org/issue1397    created  tiran
       py3k

Can't pickle partial functions                                   11/07/07
CLOSED http://bugs.python.org/issue1398    created  danhs

XML codec                                                        11/07/07
       http://bugs.python.org/issue1399    created  doerwalter
       patch

Py3k's print() flushing problem                                  11/07/07
       http://bugs.python.org/issue1400    created  wojtekwalczak
       py3k

urllib2 302 POST                                                 11/07/07

Re: [Python-Dev] XML codec?

2007-11-11 Thread M.-A. Lemburg
On 2007-11-11 18:56, Martin v. Löwis wrote:
>> First, XML-RPC is not the only mechanism using XML over a network
>> connection. Second, you don't want to do this if you're dealing
>> with several 100 MB of data just because you want to figure
>> out the encoding.
> 
> That's my original claim/question: what SPECIFIC application do
> you have in mind that transfers XML over a network and where you
> would want to have such a stream codec?

XML-based web services used for business integration, e.g. based
on ebXML.

A common use case from our everyday consulting business is e.g.
passing market and trading data to portfolio pricing web services.

> If I have 100MB of XML in a file, using the detection API, I do
> 
>   f = open(filename)
>   s = f.read(100)
>   while True:
>     coding = xml.utils.detect_encoding(s)
>     if coding is not undetermined:
>        break
>     s += f.read(100)
>   f.close()
> 
> Having the loop here is paranoia: in my application, I might be
> able to know that 100 bytes are sufficient to determine the encoding
> always.

Doing the detection with files is easy, but that was never
questioned.

>>> Again, I don't see the use case. For XML-RPC, it's very feasible
>>> and standard procedure to have the entire document in memory
>>> (in a processed form).
>> You may not see the use case, but that doesn't really mean
>> anything if the use cases exist in real life applications,
>> right ?!
> 
> Right. However, I will remain opposed to adding this to the
> standard library until I see why one would absolutely need to
> have that. Not every piece of code that is useful in some
> application should be added to the standard library.

Agreed, but the application space of web services is large
enough to warrant this.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] XML codec?

2007-11-11 Thread Martin v. Löwis
>>> First, XML-RPC is not the only mechanism using XML over a network
>>> connection. Second, you don't want to do this if you're dealing
>>> with several 100 MB of data just because you want to figure
>>> out the encoding.
>> That's my original claim/question: what SPECIFIC application do
>> you have in mind that transfers XML over a network and where you
>> would want to have such a stream codec?
> 
> XML-based web services used for business integration, e.g. based
> on ebXML.
> 
> A common use case from our everyday consulting business is e.g.
> passing market and trading data to portfolio pricing web services.

I still don't see the need for this feature from this example.
First, in ebXML messaging, the messages are typically *not* large
(i.e. much smaller than 100 MB). Furthermore, the typical processing
of such a message would be to pass it directly to the XML parser,
no need for the functionality under discussion.

>> Right. However, I will remain opposed to adding this to the
>> standard library until I see why one would absolutely need to
>> have that. Not every piece of code that is useful in some
>> application should be added to the standard library.
> 
> Agreed, but the application space of web services is large
> enough to warrant this.

If that were the case, wouldn't the existing Python web service
libraries already include such functionality?

Regards,
Martin


[Python-Dev] Proposal for new 2to23 tool

2007-11-11 Thread Graham Horler
I have been developing in Python since 1.5, and now have to support 2.1
as a minimum version.  I do like to keep my code runnable on newer
versions however, and am considering the feasibility of forward
compatibility with Python 3.0.

I also notice the Leo[1] project could use some assistance with forward
compatibility.

So I was wondering if anyone else had a need for a 2to23.py tool to help
make code compatible with 3.0 but not break it for 2.x.

Such a tool could also include implementations of new builtins added in
python 3.0, or work in tandem with a "py3to2" library.  One such
function would be "print" (which would have to be renamed to
e.g. "prints" as "def print()" is a syntax error in 2.x).  This would
have the added benefit of staunching the flow of wasted effort into many
differing implementations of such things, and maybe direct some of it
into development of this tool.

Hope this is on topic, and has not already been considered and dismissed.

Thanks,
Graham

[1] http://webpages.charter.net/edreamleo/front.html

P.S. a suggested prints() implementation for py3to2.py, including raising
a TypeError exception for extra keyword args, and returning None.

It works in python 2.1 through to python 3.0a1.


import sys

def prints(*args, **kw):
    # Fill in the defaults that print() uses in 3.0.
    kw.setdefault('sep', ' ')
    kw.setdefault('end', '\n')
    kw.setdefault('file', sys.stdout)

    # Anything beyond the three known keywords is an error.
    if len(kw) > 3:
        for k in ('sep', 'end', 'file'):
            del kw[k]
        if len(kw) > 1:
            raise TypeError(', '.join(map(repr, kw.keys())) +
                            ' are invalid keyword arguments for this function')
        else:
            raise TypeError('%r is an invalid keyword argument for this function'
                            % list(kw.keys())[0])

    kw['file'].write(kw['sep'].join(['%s' % a for a in args]) + kw['end'])


Re: [Python-Dev] Proposal for new 2to23 tool

2007-11-11 Thread Brett Cannon
On Nov 11, 2007 4:00 PM, Graham Horler <[EMAIL PROTECTED]> wrote:
> I have been developing in Python since 1.5, and now have to support 2.1
> as a minimum version.  I do like to keep my code runnable on newer
> versions however, and am considering the feasibility of forward
> compatibility with Python 3.0.
>
> I also notice the Leo[1] project could use some assistance with forward
> compatibility.
>
> So I was wondering if anyone else had a need for a 2to23.py tool to help
> make code compatible with 3.0 but not break it for 2.x.

What exactly are you proposing?  We already have 2to3
(http://svn.python.org/view/sandbox/trunk/2to3/) for source-to-source
translation from 2.x to 3.0.

-Brett


Re: [Python-Dev] Proposal for new 2to23 tool

2007-11-11 Thread Jan Claeys
On Sunday 11-11-2007 at 17:19 [timezone -0800], Brett Cannon wrote:
> On Nov 11, 2007 4:00 PM, Graham Horler <[EMAIL PROTECTED]> wrote:
> > I have been developing in Python since 1.5, and now have to support 2.1
> > as a minimum version.  I do like to keep my code runnable on newer
> > versions however, and am considering the feasibility of forward
> > compatibility with Python 3.0.
> >
> > I also notice the Leo[1] project could use some assistance with forward
> > compatibility.
> >
> > So I was wondering if anyone else had a need for a 2to23.py tool to help
> > make code compatible with 3.0 but not break it for 2.x.
> 
> What exactly are you proposing?  We already have 2to3
> (http://svn.python.org/view/sandbox/trunk/2to3/) for source-to-source
> translation from 2.x to 3.0.

Graham wants to convert his code such that it works on both Python 2.x
(probably even early versions of it?) & Python 3.x.  Not 2 instances of
code, but one source that works on both 2.x and 3.x...
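
A minimal sketch of what "one source for both" can look like in
practice. The names PY3, text_type and to_text() are made up for this
example and are not part of any proposed tool. Note the assumption
that the `bytes` name exists, which holds on 2.6 and later but not on
the very old 2.x releases mentioned above.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
else:
    text_type = unicode  # only evaluated on Python 2

def to_text(value, encoding="utf-8"):
    """Return value as a text string on either major version."""
    if isinstance(value, bytes) and not isinstance(value, text_type):
        return value.decode(encoding)
    return text_type(value)
```

The version check is done once at import time, so the rest of the
module can use text_type and to_text() without caring which
interpreter is running.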


-- 
Jan Claeys
