Nadeem Vawda added the comment:
> I agree that making lzma.open() wrap its return value in a BufferedReader
> (or BufferedWriter, as appropriate) is the way to go.
On second thoughts, there's no need to change the behavior for mode='wb'.
We can just return a BufferedReader for mode='rb', and leave the current
behavior (returning a raw LZMAFile) in place for mode='wb'.
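
To make the proposal concrete, here is a minimal sketch of the read/write split described above. The helper name open_buffered is hypothetical and this is not the actual lzma.open() implementation; it only assumes that lzma.LZMAFile can be wrapped by io.BufferedReader, as the benchmarks in this message do.

```python
import io
import lzma


def open_buffered(filename, mode="rb"):
    """Hypothetical helper illustrating the proposal (not lzma.open itself)."""
    raw = lzma.LZMAFile(filename, mode)
    if mode in ("r", "rb"):
        # Reading: wrap in a BufferedReader so that line iteration is
        # served from a buffer rather than from the LZMAFile directly.
        return io.BufferedReader(raw)
    # Writing and other binary modes: keep returning the LZMAFile as-is.
    return raw
```

Closing the returned BufferedReader also closes the underlying LZMAFile, so callers can use either return value in a with statement.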
I also ran some additional benchmarks for the bz2 and gzip modules. It
looks like those two modules would also benefit from having their open()
functions use io.BufferedReader:
[lzma]
$ time xzcat src.xz | wc -l
1057980
real 0m0.543s
user 0m0.556s
sys 0m0.024s
$ ../cpython/python -m timeit -s 'import lzma, io' 'f = lzma.open("src.xz", "r")' 'for line in f: pass'
10 loops, best of 3: 2.01 sec per loop
$ ../cpython/python -m timeit -s 'import lzma, io' 'f = io.BufferedReader(lzma.open("src.xz", "r"))' 'for line in f: pass'
10 loops, best of 3: 795 msec per loop
[bz2]
$ time bzcat src.bz2 | wc -l
1057980
real 0m1.322s
user 0m1.324s
sys 0m0.044s
$ ../cpython/python -m timeit -s 'import bz2, io' 'f = bz2.open("src.bz2", "r")' 'for line in f: pass'
10 loops, best of 3: 3.71 sec per loop
$ ../cpython/python -m timeit -s 'import bz2, io' 'f = io.BufferedReader(bz2.open("src.bz2", "r"))' 'for line in f: pass'
10 loops, best of 3: 2.04 sec per loop
[gzip]
$ time zcat src.gz | wc -l
1057980
real 0m0.310s
user 0m0.296s
sys 0m0.028s
$ ../cpython/python -m timeit -s 'import gzip, io' 'f = gzip.open("src.gz", "r")' 'for line in f: pass'
10 loops, best of 3: 1.94 sec per loop
$ ../cpython/python -m timeit -s 'import gzip, io' 'f = io.BufferedReader(gzip.open("src.gz", "r"))' 'for line in f: pass'
10 loops, best of 3: 556 msec per loop
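
For anyone who wants to repeat the comparison without the src.* files used above, the following is a rough sketch of the same measurement as a script. The function names (iterate_lines, compare, make_sample) and the synthetic sample data are assumptions for illustration, not part of the original benchmark. No output is shown because the timings depend on the Python version, the machine, and the data.

```python
import bz2
import gzip
import io
import lzma
import os
import tempfile
import timeit


def iterate_lines(module, path, buffered):
    """Iterate over every line of a compressed file; return the line count."""
    f = module.open(path, "rb")
    if buffered:
        # Same wrapping as in the timeit commands above.
        f = io.BufferedReader(f)
    with f:
        return sum(1 for _ in f)


def compare(module, path, repeat=3):
    """Return (unwrapped_seconds, wrapped_seconds), best of `repeat` runs."""
    unwrapped = min(timeit.repeat(
        lambda: iterate_lines(module, path, False), number=1, repeat=repeat))
    wrapped = min(timeit.repeat(
        lambda: iterate_lines(module, path, True), number=1, repeat=repeat))
    return unwrapped, wrapped


def make_sample(module, directory, lines=20000):
    """Write a small synthetic file (a stand-in for src.xz and friends)."""
    path = os.path.join(directory, "sample." + module.__name__)
    with module.open(path, "wb") as f:
        f.write(b"".join(b"line %d\n" % i for i in range(lines)))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for module in (lzma, bz2, gzip):
            path = make_sample(module, d)
            unwrapped, wrapped = compare(module, path)
            print("[%s] unwrapped: %.3fs  BufferedReader: %.3fs"
                  % (module.__name__, unwrapped, wrapped))
```

Taking the minimum over several runs mirrors the "best of 3" figure that timeit reports on the command line.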
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18003>
_______________________________________