Re: [Python-Dev] XML codec?
On 2007-11-10 09:54, Martin v. Löwis wrote:
>> A non-seekable stream is not all that uncommon in network processing.
>
> Right. But what is the relationship to XML encoding autodetection?

It pops up whenever you need to detect the encoding of the incoming XML data on the network connection, e.g. in XML RPC or data upload mechanisms.

Even though XML data mostly uses UTF-8 in real life applications, a standards compliant XML interface must also support other possible encodings.

It is also not always feasible to load all data into memory, so some form of buffering must be used.

Since incremental codecs already implement buffering, it's only natural to let them take care of the auto detection.

This approach is also needed if you want to stack stream codecs (not sure whether this is still possible in Py3, but that's how I designed them for Py2).

Regards,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 11 2007)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free !

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] XML codec?
> I don't know. Is an XML document ill-formed if it doesn't contain an
> XML declaration, is not in UTF-8 or UTF-16, but there's external
> encoding info?

If there is external encoding info, matching the actual encoding, it would be well-formed. Of course, preserving that information would be up to the application.

> This looks good. Now we would have to extend the code to detect and
> replace the encoding in the XML declaration too.

I'm still opposed to making this a codec. Right - for a pure Python solution, the processing of the XML declaration would still need to be implemented.

>> I think there could be a much simpler routine to have the same
>> effect. - if it's less than 4 bytes, answer "need more data".
>
> Can there be an XML document that is less than 4 bytes? I guess not.

No, the smallest document has exactly 4 characters (e.g. "<a/>"). However, external entities may be smaller, such as "x".

> But anyway: would a Python implementation of these two functions
> (detect_encoding()/fix_encoding()) be accepted?

I could agree to a Python implementation of this algorithm as long as it's not packaged as a codec.

Regards,
Martin
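The "answer 'need more data' below 4 bytes" rule discussed above could look roughly like the sketch below. It is only an illustration, not the code attached to the tracker issue: the name `detect_encoding()` is borrowed from the thread, while the `NEED_MORE_DATA` sentinel and the two lookup tables are assumptions made here. The byte patterns follow the autodetection hints in the XML 1.0 specification (Appendix F). The sketch deliberately stops at the byte level and does not parse the `encoding="..."` pseudo-attribute of the XML declaration, which, as noted above, would still need to be implemented.

```python
import codecs

# Hypothetical sentinel meaning "fewer than 4 bytes seen so far".
NEED_MORE_DATA = None

# Byte-order marks; the UTF-32 marks come first because the UTF-32-LE
# mark starts with the same two bytes as the UTF-16-LE mark.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# How the leading "<" (and "?") look in wide encodings without a BOM.
_PATTERNS = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]


def detect_encoding(data):
    """Guess the encoding family from the first bytes of an XML document."""
    if len(data) < 4:
        return NEED_MORE_DATA
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for pattern, name in _PATTERNS:
        if data.startswith(pattern):
            return name
    # No BOM and no wide-character pattern: assume the XML default.
    # A full implementation would now read the XML declaration.
    return "utf-8"
```

A caller would keep feeding a growing prefix of the input until the result is no longer `NEED_MORE_DATA`.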
Re: [Python-Dev] XML codec?
>>> A non-seekable stream is not all that uncommon in network processing.
>> Right. But what is the relationship to XML encoding autodetection?
>
> It pops up whenever you need to detect the encoding of the
> incoming XML data on the network connection, e.g. in XML RPC
> or data upload mechanisms.

No, it doesn't. For XML-RPC, you pass the XML payload of the HTTP request to the XML parser, and it deals with the encoding.

> It is also not always feasible to load all data into memory, so
> some form of buffering must be used.

Again, I don't see the use case. For XML-RPC, it's very feasible and standard procedure to have the entire document in memory (in a processed form).

> This approach is also needed if you want to stack stream codecs
> (not sure whether this is still possible in Py3, but that's how
> I designed them for Py2).

The design of the Py2 codecs is fairly flawed, unfortunately.

Regards,
Martin
Re: [Python-Dev] XML codec?
On 2007-11-11 14:51, Martin v. Löwis wrote:
>>>> A non-seekable stream is not all that uncommon in network processing.
>>> Right. But what is the relationship to XML encoding autodetection?
>> It pops up whenever you need to detect the encoding of the
>> incoming XML data on the network connection, e.g. in XML RPC
>> or data upload mechanisms.
>
> No, it doesn't. For XML-RPC, you pass the XML payload of the
> HTTP request to the XML parser, and it deals with the encoding.

First, XML-RPC is not the only mechanism using XML over a network connection. Second, you don't want to do this if you're dealing with several 100 MB of data just because you want to figure out the encoding.

>> It is also not always feasible to load all data into memory, so
>> some form of buffering must be used.
>
> Again, I don't see the use case. For XML-RPC, it's very feasible
> and standard procedure to have the entire document in memory
> (in a processed form).

You may not see the use case, but that doesn't really mean anything if the use cases exist in real life applications, right ?!

>> This approach is also needed if you want to stack stream codecs
>> (not sure whether this is still possible in Py3, but that's how
>> I designed them for Py2).
>
> The design of the Py2 codecs is fairly flawed, unfortunately.

Fortunately, this sounds like a fairly flawed argument to me ;-)

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] XML codec?
> First, XML-RPC is not the only mechanism using XML over a network
> connection. Second, you don't want to do this if you're dealing
> with several 100 MB of data just because you want to figure
> out the encoding.

That's my original claim/question: what SPECIFIC application do you have in mind that transfers XML over a network and where you would want to have such a stream codec?

If I have 100MB of XML in a file, using the detection API, I do

    f = open(filename)
    s = f.read(100)
    while True:
        coding = xml.utils.detect_encoding(s)
        if coding is not undetermined:
            break
        s += f.read(100)
    f.close()

Having the loop here is paranoia: in my application, I might be able to know that 100 bytes are sufficient to determine the encoding always.

>> Again, I don't see the use case. For XML-RPC, it's very feasible
>> and standard procedure to have the entire document in memory
>> (in a processed form).
>
> You may not see the use case, but that doesn't really mean
> anything if the use cases exist in real life applications,
> right ?!

Right. However, I will remain opposed to adding this to the standard library until I see why one would absolutely need to have that. Not every piece of code that is useful in some application should be added to the standard library.

Regards,
Martin
[Python-Dev] Summary of Tracker Issues
ACTIVITY SUMMARY (11/04/07 - 11/11/07)
Tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message.

 1323 open (+21) / 11582 closed (+19) / 12905 total (+40)

Open issues with patches: 419

Average duration of open issues: 687 days.
Median duration of open issues: 791 days.

Open Issues Breakdown
   open  1318 (+21)
pending     5 ( +0)

Issues Created Or Reopened (41)
___

test_import breaks on Linux  11/09/07
    http://bugs.python.org/issue1377  reopened gvanrossum
    py3k

Backport abcoll to 2.6  11/04/07
    http://bugs.python.org/issue1383  created baranguren
    patch

Windows fix for inspect tests  11/04/07  CLOSED
    http://bugs.python.org/issue1384  created tiran
    py3k, patch

hmac module violates RFC for some hash functions, e.g. sha512  11/04/07  CLOSED
    http://bugs.python.org/issue1385  created jowagner
    py3k

py3k-pep3137: patch to ensure that all codecs return bytes  11/04/07  CLOSED
    http://bugs.python.org/issue1386  created amaury.forgeotdarc
    py3k, patch

py3k-pep3137: patch for hashlib on Windows  11/04/07  CLOSED
    http://bugs.python.org/issue1387  created amaury.forgeotdarc
    py3k, patch

py3k-pep3137: possible ref leak in ctypes  11/05/07  CLOSED
    http://bugs.python.org/issue1388  created tiran
    py3k

py3k-pep3137: struct module is leaking references  11/05/07  CLOSED
    http://bugs.python.org/issue1389  created tiran
    py3k

toxml generates output that is not well formed  11/05/07
    http://bugs.python.org/issue1390  created drtomc

Adds the .compact() method to bsddb db.DB objects  11/05/07
    http://bugs.python.org/issue1391  created gregory.p.smith
    patch, rfe

py3k-pep3137: issue warnings / errors on str(bytes()) and simila  11/05/07  CLOSED
    http://bugs.python.org/issue1392  created tiran
    py3k, patch

function comparing lacks NotImplemented error  11/05/07
    http://bugs.python.org/issue1393  created _doublep

simple patch, improving unreachable bytecode removing  11/05/07
    http://bugs.python.org/issue1394  created _doublep
    patch

py3k: duplicated line endings when using read(1)  11/06/07
    http://bugs.python.org/issue1395  created amaury.forgeotdarc
    py3k

py3k-pep3137: patch for mailbox  11/06/07  CLOSED
    http://bugs.python.org/issue1396  created tiran
    py3k, patch

py3k-pep3137: failing unit test test_bsddb  11/06/07
    http://bugs.python.org/issue1397  created tiran
    py3k

Can't pickle partial functions  11/07/07  CLOSED
    http://bugs.python.org/issue1398  created danhs

XML codec  11/07/07
    http://bugs.python.org/issue1399  created doerwalter
    patch

Py3k's print() flushing problem  11/07/07
    http://bugs.python.org/issue1400  created wojtekwalczak
    py3k

urllib2 302 POST  11/07/07
Re: [Python-Dev] XML codec?
On 2007-11-11 18:56, Martin v. Löwis wrote:
>> First, XML-RPC is not the only mechanism using XML over a network
>> connection. Second, you don't want to do this if you're dealing
>> with several 100 MB of data just because you want to figure
>> out the encoding.
>
> That's my original claim/question: what SPECIFIC application do
> you have in mind that transfers XML over a network and where you
> would want to have such a stream codec?

XML-based web services used for business integration, e.g. based on ebXML.

A common use case from our everyday consulting business is e.g. passing market and trading data to portfolio pricing web services.

> If I have 100MB of XML in a file, using the detection API, I do
>
>     f = open(filename)
>     s = f.read(100)
>     while True:
>         coding = xml.utils.detect_encoding(s)
>         if coding is not undetermined:
>             break
>         s += f.read(100)
>     f.close()
>
> Having the loop here is paranoia: in my application, I might be
> able to know that 100 bytes are sufficient to determine the encoding
> always.

Doing the detection with files is easy, but that was never questioned.

>>> Again, I don't see the use case. For XML-RPC, it's very feasible
>>> and standard procedure to have the entire document in memory
>>> (in a processed form).
>> You may not see the use case, but that doesn't really mean
>> anything if the use cases exist in real life applications,
>> right ?!
>
> Right. However, I will remain opposed to adding this to the
> standard library until I see why one would absolutely need to
> have that. Not every piece of code that is useful in some
> application should be added to the standard library.

Agreed, but the application space of web services is large enough to warrant this.

--
Marc-Andre Lemburg
eGenix.com
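For readers following the file versus non-seekable stream disagreement, here is a minimal sketch of how detection could be done on a stream that cannot be rewound, without a codec: read until a detector gives an answer, then replay the bytes already consumed in front of the rest of the stream. Everything named here (`sniff()`, `PrefixedStream`, `toy_detect()`) is hypothetical and not part of any proposal in the thread. The detector is a stand-in that only knows about UTF-16 byte-order marks. Unlike the quoted loop, `sniff()` also stops at end of input.

```python
import io


def toy_detect(data):
    """Stand-in detector (assumption): None means 'need more data'."""
    if len(data) < 4:
        return None
    if data[:2] == b"\xff\xfe":
        return "utf-16-le"
    if data[:2] == b"\xfe\xff":
        return "utf-16-be"
    return "utf-8"


def sniff(stream, detect, chunk_size=100):
    """Read until `detect` answers or input ends.

    Returns (encoding or None, bytes consumed so far).
    """
    seen = b""
    while True:
        coding = detect(seen)
        if coding is not None:
            return coding, seen
        chunk = stream.read(chunk_size)
        if not chunk:  # end of input: give up instead of looping forever
            return None, seen
        seen += chunk


class PrefixedStream(object):
    """Replay already consumed bytes, then continue with the live stream."""

    def __init__(self, prefix, stream):
        self._prefix = prefix
        self._stream = stream

    def read(self, size=-1):
        if size is None or size < 0:
            data, self._prefix = self._prefix, b""
            return data + self._stream.read()
        data, self._prefix = self._prefix[:size], self._prefix[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


# io.BytesIO stands in for a socket-like object; seek() is never called.
payload = b"<?xml version='1.0'?><doc/>"
source = io.BytesIO(payload)
coding, seen = sniff(source, toy_detect, chunk_size=3)
replay = PrefixedStream(seen, source)
```

The `replay` object can then be handed to a parser as if nothing had been read from it. Whether this belongs in a codec or in a helper like this one is exactly the point the thread is arguing about.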
Re: [Python-Dev] XML codec?
>>> First, XML-RPC is not the only mechanism using XML over a network
>>> connection. Second, you don't want to do this if you're dealing
>>> with several 100 MB of data just because you want to figure
>>> out the encoding.
>> That's my original claim/question: what SPECIFIC application do
>> you have in mind that transfers XML over a network and where you
>> would want to have such a stream codec?
>
> XML-based web services used for business integration, e.g. based
> on ebXML.
>
> A common use case from our everyday consulting business is e.g.
> passing market and trading data to portfolio pricing web services.

I still don't see the need for this feature from this example. First, in ebXML messaging, the messages are typically *not* large (i.e. much smaller than 100 MB). Furthermore, the typical processing of such a message would be to pass it directly to the XML parser, so there is no need for the functionality under discussion.

>> Right. However, I will remain opposed to adding this to the
>> standard library until I see why one would absolutely need to
>> have that. Not every piece of code that is useful in some
>> application should be added to the standard library.
>
> Agreed, but the application space of web services is large
> enough to warrant this.

If that were the case, wouldn't the existing Python web service libraries already include such functionality?

Regards,
Martin
[Python-Dev] Proposal for new 2to23 tool
I have been developing in Python since 1.5, and now have to support 2.1 as a minimum version. I do like to keep my code runnable on newer versions however, and am considering the feasibility of forward compatibility with Python 3.0.

I also notice the Leo[1] project could use some assistance with forward compatibility.

So I was wondering if anyone else had a need for a 2to23.py tool to help make code compatible with 3.0 but not break it for 2.x.

Such a tool could also include implementations of new builtins added in python 3.0, or work in tandem with a "py3to2" library. One such function would be "print" (which would have to be renamed to e.g. "prints" as "def print()" is a syntax error in 2.x). This would have the added benefit of staunching the flow of wasted effort into many differing implementations of such things, and maybe direct some of it into development of this tool.

Hope this is on topic, and has not already been considered and dismissed.

Thanks,
Graham

[1] http://webpages.charter.net/edreamleo/front.html

P.S. a suggested prints() implementation for py3to2.py, including raising a TypeError exception for extra keyword args, and returning None. It works in python 2.1 through to python 3.0a1.

    import sys

    def prints(*args, **kw):
        # Fill in the defaults of the 3.0 print() keyword arguments.
        kw.setdefault('sep', ' ')
        kw.setdefault('end', '\n')
        kw.setdefault('file', sys.stdout)
        # Anything beyond sep/end/file is an unknown keyword: reject it.
        if len(kw) > 3:
            for k in ('sep', 'end', 'file'):
                del kw[k]
            if len(kw) > 1:
                raise TypeError(', '.join(map(repr, kw.keys())) +
                                ' are invalid keyword arguments for this function')
            else:
                raise TypeError('%r is an invalid keyword argument for this function'
                                % list(kw.keys())[0])
        kw['file'].write(kw['sep'].join(['%s' % a for a in args]) + kw['end'])
Re: [Python-Dev] Proposal for new 2to23 tool
On Nov 11, 2007 4:00 PM, Graham Horler <[EMAIL PROTECTED]> wrote:
> I have been developing in Python since 1.5, and now have to support 2.1
> as a minimum version. I do like to keep my code runnable on newer
> versions however, and am considering the feasibility of forward
> compatibility with Python 3.0.
>
> I also notice the Leo[1] project could use some assistance with forward
> compatibility.
>
> So I was wondering if anyone else had a need for a 2to23.py tool to help
> make code compatible with 3.0 but not break it for 2.x.

What exactly are you proposing? We already have 2to3 (http://svn.python.org/view/sandbox/trunk/2to3/) for source-to-source translation from 2.x to 3.0.

-Brett
Re: [Python-Dev] Proposal for new 2to23 tool
On Sunday 11-11-2007 at 17:19 [timezone -0800], Brett Cannon wrote:
> On Nov 11, 2007 4:00 PM, Graham Horler <[EMAIL PROTECTED]> wrote:
> > I have been developing in Python since 1.5, and now have to support 2.1
> > as a minimum version. I do like to keep my code runnable on newer
> > versions however, and am considering the feasibility of forward
> > compatibility with Python 3.0.
> >
> > I also notice the Leo[1] project could use some assistance with forward
> > compatibility.
> >
> > So I was wondering if anyone else had a need for a 2to23.py tool to help
> > make code compatible with 3.0 but not break it for 2.x.
>
> What exactly are you proposing? We already have 2to3
> (http://svn.python.org/view/sandbox/trunk/2to3/) for source-to-source
> translation from 2.x to 3.0.

Graham wants to convert his code such that it works on both Python 2.x (probably even early versions of it?) & Python 3.x. Not 2 instances of code, but one source that works on both 2.x and 3.x...

--
Jan Claeys
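To make the "one source that works on both 2.x and 3.x" idea concrete, here is a small sketch of what a helper module in that spirit might contain. It is an illustration only: the dispatch through `getattr()` is an assumption made here and is not taken from Graham's proposal, and the fallback branch is a simplified stand-in that does not validate keyword arguments the way his prints() does.

```python
import sys

try:
    import builtins                  # module name on Python 3
except ImportError:
    import __builtin__ as builtins   # module name on Python 2

# getattr() avoids writing the bare name `print`, which older 2.x
# releases parse as a statement rather than as an expression.
_builtin_print = getattr(builtins, "print", None)


def prints(*args, **kw):
    """Call the real print() where it exists, else emulate its basics."""
    if _builtin_print is not None:
        return _builtin_print(*args, **kw)
    # Simplified fallback: only sep, end and file are honoured.
    out = kw.get("file", sys.stdout)
    out.write(kw.get("sep", " ").join([str(a) for a in args])
              + kw.get("end", "\n"))
```

Code written against `prints()` then reads the same on either line of the language, at the cost of not using the `print` spelling itself.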