Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Paul Moore
2009/1/28 Raymond Hettinger : > [Adam Olsen] >> >> It'd also help if the file repr gave the encoding: > > +1 from me too. That will be a big help. Definitely. People *are* going to get confused by encoding errors - let's give them all the help we can. Paul
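A minimal sketch of what the suggestion amounts to, assuming a throwaway temporary file: a text-mode file object already carries its encoding in the `encoding` attribute, so it can be inspected even where the repr does not show it. The helper name `file_encoding_info` is hypothetical and used only for illustration.

```python
import os
import tempfile


def file_encoding_info(path, encoding=None):
    """Open path in text mode and return (encoding attribute, repr of the file object)."""
    with open(path, encoding=encoding) as f:
        return f.encoding, repr(f)


# Demonstration on a throwaway file (the file name is generated, not meaningful).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    enc, text_repr = file_encoding_info(path, encoding="cp1252")
    print(enc)        # the encoding the file was opened with
    print(text_repr)  # may or may not mention the encoding, depending on the Python version
finally:
    os.remove(path)
```

Passing `encoding=None` instead shows whatever default the platform picked, which is the value people are most likely to be surprised by.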

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Raymond Hettinger
[Adam Olsen] It'd also help if the file repr gave the encoding: +1 from me too. That will be a big help. Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.o

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Daniel Stutzbach
On Wed, Jan 28, 2009 at 1:42 PM, Adam Olsen wrote: > It'd also help if the file repr gave the encoding: > +1 -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Adam Olsen
On Wed, Jan 28, 2009 at 11:52 AM, Paul Moore wrote: > Ah, I see. That is entirely obvious. The key bit of information is > that the default io encoding is cp1252, not cp850. I know that in > theory, I see the consequences often enough (:-)), but it isn't > "instinctive" for me. And the simple "def

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Jean-Paul Calderone
On Wed, 28 Jan 2009 18:52:41 +, Paul Moore wrote: 2009/1/28 "Martin v. Löwis" : Well, first try to understand what the error *is*: py> unicodedata.name('\u0153') 'LATIN SMALL LIGATURE OE' py> unicodedata.name('£') 'POUND SIGN' py> ascii('£') "'\\xa3'" py> ascii('£'.encode('cp850').decode('

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Terry Reedy
Steven Bethard wrote: On Wed, Jan 28, 2009 at 10:29 AM, "Martin v. Löwis" wrote: Notice that the determination of the specific encoding used is fairly elaborate: - if IO is to a terminal, Python tries to determine the encoding of the terminal. This is mostly relevant for Windows (which uses,

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Paul Moore
2009/1/28 "Martin v. Löwis" : > Well, first try to understand what the error *is*: > > py> unicodedata.name('\u0153') > 'LATIN SMALL LIGATURE OE' > py> unicodedata.name('£') > 'POUND SIGN' > py> ascii('£') > "'\\xa3'" > py> ascii('£'.encode('cp850').decode('cp1252')) > "'\\u0153'" > > So when Pytho
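The interpreter session quoted above can be replayed as a small script. This is only a sketch of the same round trip (cp850 bytes misread as cp1252); the variable names are illustrative.

```python
import unicodedata

pound = "\u00a3"                # POUND SIGN
raw = pound.encode("cp850")     # the byte a cp850 tool would write for the pound sign
misread = raw.decode("cp1252")  # the same byte interpreted with the cp1252 default

print(ascii(raw))
print(ascii(misread))
print(unicodedata.name(misread))
```

The byte never changes; only the code page used to interpret it does, which is why the pound sign comes back as a different character.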

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Martin v. Löwis
> This is a very helpful explanation. Is it in the docs somewhere, or if it > isn't, could it be? I actually don't know. Regards, Martin

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Martin v. Löwis
Paul Moore wrote: > 2009/1/28 "Martin v. Löwis" : >> print(open("a1").read()) >>> Traceback (most recent call last): >>> File "", line 1, in >>> File "D:\Apps\Python30\lib\io.py", line 1491, in write >>> b = encoder.encode(s) >>> File "D:\Apps\Python30\lib\encodings\cp850.py", line 1

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Steven Bethard
On Wed, Jan 28, 2009 at 10:29 AM, "Martin v. Löwis" wrote: > Notice that the determination of the specific encoding used is fairly > elaborate: > - if IO is to a terminal, Python tries to determine the encoding of > the terminal. This is mostly relevant for Windows (which uses, > by default, the

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Martin v. Löwis
> Thanks for the explanation. It might be clearer to document this a > little more explicitly in the docs for open() (on the basis that > people using open() are the most likely to be naive about encodings). > I'll see if I can come up with an appropriate doc patch. Notice that the determination o
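To see which encodings are actually in play on a given machine, something like the following sketch can help. It only reports values; it does not reproduce the selection logic described above, and the function name `describe_encodings` is made up for this example.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings relevant to terminal output and to open() without an encoding."""
    stdout = sys.stdout
    return {
        # Encoding Python chose for standard output (may be None for replaced streams).
        "stdout_encoding": getattr(stdout, "encoding", None),
        # Whether standard output is attached to a terminal at all.
        "stdout_is_tty": bool(getattr(stdout, "isatty", lambda: False)()),
        # Locale-based encoding, which text-mode open() falls back to by default.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


for key, value in describe_encodings().items():
    print(key, "=", value)
```

On a Windows console the first and last values can differ, which is the cp850 versus cp1252 situation discussed in this thread.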

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Paul Moore
2009/1/28 "Martin v. Löwis" : > print(open("a1").read()) >> Traceback (most recent call last): >> File "", line 1, in >> File "D:\Apps\Python30\lib\io.py", line 1491, in write >> b = encoder.encode(s) >> File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode >> return
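One way to sidestep this kind of traceback is to name the file's encoding explicitly instead of relying on the default. The sketch below assumes a file written by a cp850 tool and uses a temporary file in place of "a1"; `read_with` and `can_encode` are hypothetical helpers.

```python
import os
import tempfile


def read_with(path, encoding):
    """Read the whole file as text using an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


def can_encode(text, encoding):
    """Return True if text can be written to a stream using the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Write the single byte a cp850 tool stores for the pound sign.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\x9c")
finally:
    os.close(fd)

right = read_with(path, "cp850")   # decoded with the encoding the file was written in
wrong = read_with(path, "cp1252")  # decoded with the wrong code page
os.remove(path)

print(ascii(right), can_encode(right, "cp850"))
print(ascii(wrong), can_encode(wrong, "cp850"))
```

The misread character has no cp850 representation, so printing it to a cp850 console is what triggers the encode error.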

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Paul Moore
2009/1/28 "Martin v. Löwis" : > Paul Moore wrote: >> Hmm, I just checked and on Windows, it >> appears that sys.getdefaultencoding() is UTF-8. That seems odd - I >> would have thought the majority of Windows systems were NOT set to use >> UTF-8 by default... > > In Python 3, sys.getdefaultencoding(

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Martin v. Löwis
print(open("a1").read()) > Traceback (most recent call last): > File "", line 1, in > File "D:\Apps\Python30\lib\io.py", line 1491, in write > b = encoder.encode(s) > File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode > return codecs.charmap_encode(input,self.err

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Martin v. Löwis
Paul Moore wrote: > Hmm, I just checked and on Windows, it > appears that sys.getdefaultencoding() is UTF-8. That seems odd - I > would have thought the majority of Windows systems were NOT set to use > UTF-8 by default... In Python 3, sys.getdefaultencoding() is "utf-8" on all platforms, just as
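A short sketch of the distinction, under the assumption that the confusion is between two different settings: `sys.getdefaultencoding()` governs str/bytes conversions such as `str.encode()` with no argument, while the locale's preferred encoding is what text-mode `open()` uses when no encoding is given.

```python
import locale
import sys

str_default = sys.getdefaultencoding()             # used for str <-> bytes conversions
file_default = locale.getpreferredencoding(False)  # used by open() when encoding is omitted

# str.encode() with no argument uses UTF-8, independent of the platform locale.
encoded = "\u00a3".encode()

print("str/bytes default:", str_default)
print("open() default:   ", file_default)
print("encoded pound:    ", encoded)
```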

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Martin v. Löwis
> PS Can anyone comment on why Python defaults to utf-8 on Windows? Don't panic. It doesn't, and you are misinterpreting what you are seeing. Regards, Martin

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Antoine Pitrou
On Wednesday 28 January 2009 at 16:54 +, Paul Moore wrote: > I do think it's worth taking care over the default encoding, though. > Quite apart from performance, getting "correct" behaviour is > important. I can't speak for Unix, but on Windows, the following > behaviour feels like a bug to me

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Paul Moore
2009/1/28 Antoine Pitrou : > If you look at how utf-8 decoding is implemented (in unicodeobject.c), it's > quite obvious why it is so :-) There is a (very) fast path for chunks of pure > ASCII data, and (fast but not blazingly fast) fallback for non ASCII data. Thanks for the explanation. > Pleas

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Antoine Pitrou
Paul Moore writes: > > > > As I pointed out, utf-8, utf-16 and latin1 decoders have already been optimized > > in py3k. For *pure ASCII* input, utf-8 decoding is blazingly fast (1GB/s here). > > The dataset for iobench isn't pure ASCII though, and that's why it's not as fast. > > Ah, t
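A rough way to observe the difference between pure-ASCII and mixed input is a micro-benchmark like the sketch below. It is not iobench; the data sets are synthetic (the mixed one has one non-ASCII character in every twenty, roughly the 95%/5% split mentioned elsewhere in this thread), and the helper `decode_throughput` is hypothetical. Absolute numbers will vary by machine and Python version.

```python
import timeit


def decode_throughput(data, encoding="utf-8", number=10, repeat=5):
    """Return the best observed decode throughput in MB/s for the given bytes."""
    best = min(timeit.repeat(lambda: data.decode(encoding), number=number, repeat=repeat))
    return len(data) * number / best / 1e6


# Pure ASCII: 1,000,000 bytes, all below 0x80.
ascii_data = b"a" * 1_000_000
# Mixed: 19 ASCII characters followed by one two-byte character, repeated.
mixed_data = ("a" * 19 + "\u00e9").encode("utf-8") * 50_000

print("pure ASCII: %.0f MB/s" % decode_throughput(ascii_data))
print("mixed:      %.0f MB/s" % decode_throughput(mixed_data))
```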

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Victor Stinner
On Wednesday 28 January 2009 12:41:07, Antoine Pitrou wrote: > > Why not test io.open() or codecs.open(), which create unicode strings? > > There is no doubt that io.open() and codecs.open() in 2.x are much slower > than the io-c branch. However, nobody is expecting very good performan

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Paul Moore
2009/1/28 Antoine Pitrou: > Paul Moore writes: >> >> It would be helpful to limit this cost as much as possible - maybe >> that's simply ensuring that the default encoding for open is (in the >> majority of cases) a highly-optimised one whose costs *don't* dominate >> in the way you de

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Antoine Pitrou
Victor Stinner writes: > > On Wednesday 28 January 2009 11:55:16, Antoine Pitrou wrote: > > 2.x has no encoding costs, which explains why it's so much faster. > > Why not test io.open() or codecs.open(), which create unicode strings? The goal is to test the idiomatic

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Antoine Pitrou
Paul Moore writes: > > It would be helpful to limit this cost as much as possible - maybe > that's simply ensuring that the default encoding for open is (in the > majority of cases) a highly-optimised one whose costs *don't* dominate > in the way you describe As I pointed out, utf-8,

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Victor Stinner
On Wednesday 28 January 2009 11:55:16, Antoine Pitrou wrote: > 2.x has no encoding costs, which explains why it's so much faster. Why not test io.open() or codecs.open(), which create unicode strings? -- Victor Stinner aka haypo http://www.haypocalc.com/blog/

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Paul Moore
2009/1/28 Antoine Pitrou : > When writing large chunks of text (4096, 1e6), bookkeeping costs become > marginal and encoding costs dominate. 2.x has no encoding costs, which > explains why it's so much faster. Interesting. However, it's still "slower" in terms of perception. In 2.x, I regularly do

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Antoine Pitrou
Hello,

Raymond Hettinger writes:
> >                                 MB/s      MB/s      MB/s
> >                                 in C      in py3k   in 2.7    C/3k   2.7/3k
> > ** Text append **
> > 10M write 1e6 units at a time   261.00    218.000   1540.000  1.20   7.06
> > 20K w

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Raymond Hettinger
[Scott David Daniels] Comparison of three cases (including performance ratios):

                                MB/s      MB/s      MB/s
                                in C      in py3k   in 2.7    C/3k   2.7/3k
** Text append **
10M write 1e6 units at a time   261.00    218.000   1540.000  1

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Scott David Daniels
Raymond Hettinger wrote: [Antoine Pitrou] Now here are some performance figures. Text I/O is done in utf-8 with universal newlines enabled: That's a substantial boost. How does it compare to Py2.x equivalents? Comparison of three cases (including performance ratios):

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Victor Stinner
Benjamin Peterson wrote: There are also several IO bugs that should be fixed before it becomes official like #5006. I looked at this one, but I discovered another bug with f.tell(): it's now issue #5008. That issue is now closed, so I will look again at #5006. See also #5016 (f.seekab

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Antoine Pitrou
Daniel Stutzbach writes: > For the "10MB whole contents at once" test, we then have: > (assuming the code does no pipelining of disk I/O with decoding) > > 10MB / 980MB/s to read from disk = 10 ms > 10MB / 250MB/s to decode to utf8 = 40 ms > 10MB / (10ms + 40ms) = 200 MB
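The back-of-the-envelope estimate quoted above can be written out as a small calculation. This sketch assumes, as the quoted text does, that reading and decoding happen strictly one after the other; the function name `effective_throughput` is illustrative.

```python
def effective_throughput(size_mb, read_mb_s, decode_mb_s):
    """Combined throughput in MB/s when reading and decoding are not pipelined."""
    read_s = size_mb / read_mb_s      # time spent reading from disk
    decode_s = size_mb / decode_mb_s  # time spent decoding to text
    return size_mb / (read_s + decode_s)


rate = effective_throughput(10, 980, 250)
print("%.1f MB/s" % rate)
```

Without rounding the intermediate times to 10 ms and 40 ms, the result lands just under the quoted 200 figure; the slower stage (decoding) dominates the total.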

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Daniel Stutzbach
On Tue, Jan 27, 2009 at 6:15 PM, Antoine Pitrou wrote: > It's some arbitrary text composed of 95% ASCII characters and 5% non-ASCII. > On > this specific example, utf8 decodes at around 250 MB/s, latin1 at almost 1 > GB/s > (on the same machine on which I ran the benchmarks). > For the "10MB who

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Antoine Pitrou
Daniel Stutzbach writes: > > What kind of input are you using for the Text tests? I'm kind of surprised that the conversion to Unicode results in such a dramatic slowdown, if you're feeding it plain text (characters 0x00 through 0x7f). It's some arbitrary text composed

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Daniel Stutzbach
On Tue, Jan 27, 2009 at 5:44 PM, Antoine Pitrou wrote: > Daniel Stutzbach writes: > > That's because in Python 3, the Text IO has to convert to Unicode, > correct? > > Yes, exactly. > What kind of input are you using for the Text tests? I'm kind of surprised that the

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Antoine Pitrou
Daniel Stutzbach writes: > > Thanks, Antoine! To make comparison easier, I put together the results into a Google Spreadsheet: http://spreadsheets.google.com/pub?key=pbqSxQEo4UXwPlifXmvPHGQ Thanks, that's much more readable indeed. > That's because in Python 3, the Te

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Daniel Stutzbach
On Tue, Jan 27, 2009 at 4:54 PM, Antoine Pitrou wrote: > Daniel Stutzbach writes: > > Would it be much trouble to also compare performance with Python 2.6? > > Here are the results on trunk. > Thanks, Antoine! To make comparison easier, I put together the results into

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Bill Janssen
> - fix the _ssl bug which prevents some tests from passing (issue #4967) I see you've already got a patch for this. I'll try it out. Bill

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Brett Cannon
On Tue, Jan 27, 2009 at 14:44, Antoine Pitrou wrote: > Raymond Hettinger writes: >> >> What is involved in finishing io-in-c? > > Off the top of my head: > - fix the _ssl bug which prevents some tests from passing (issue #4967) > - clean up io.py (and decide what to do with the remaining

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Raymond Hettinger
[Antoine Pitrou] Now here are some performance figures. Text I/O is done in utf-8 with universal newlines enabled: That's a substantial boost. How does it compare to Py2.x equivalents? Raymond

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Antoine Pitrou
Daniel Stutzbach writes: > > Would it be much trouble to also compare performance with Python 2.6? Here are the results on trunk. Keep in mind Text IO, while it's still `open(filename, "r")`, does not mean the same thing. === 2.7 I/O (trunk) === ** Binary input **

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Daniel Stutzbach
On Tue, Jan 27, 2009 at 4:44 PM, Antoine Pitrou wrote: > Now here are some performance figures. Text I/O is done in utf-8 with > universal > newlines enabled: > Would it be much trouble to also compare performance with Python 2.6? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Benjamin Peterson
On Tue, Jan 27, 2009 at 4:44 PM, Antoine Pitrou wrote: > Raymond Hettinger writes: >> >> What is involved in finishing io-in-c? > > Off the top of my head: > - fix the _ssl bug which prevents some tests from passing (issue #4967) > - clean up io.py (and decide what to do with the remaini

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Antoine Pitrou
Raymond Hettinger writes: > > What is involved in finishing io-in-c? Off the top of my head: - fix the _ssl bug which prevents some tests from passing (issue #4967) - clean up io.py (and decide what to do with the remaining Python code: basically, the parts of StringIO which are implem