New submission from Michael Fox:
import lzma

count = 0
f = lzma.LZMAFile('bigfile.xz', 'r')
for line in f:
    count += 1
print(count)
Comparing Python 2 with pyliblzma to Python 3.3.1 with the new lzma module:
m@air:~/q/topaz/parse_datalog$ time python lzmaperf.py
102368
real 0m0.062s
user 0m0.056s
sys 0m0.004s
m@air:~/q/topaz/parse_datalog$ time python3 lzmaperf.py
102368
real 0m7.506s
user 0m7.484s
sys 0m0.012s
Profiling shows most of the time is spent here:
102371 6.881 0.000 6.972 0.000 lzma.py:247(_read_block)
I also notice that reading the entire file into memory with f.read() is
perfectly fast.
I think it has something to do with the lack of buffering.
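A possible workaround, not part of the original report and only a sketch assuming
the slowdown comes from each line triggering small reads against the raw
decompressor, is to wrap the LZMAFile in io.BufferedReader so that line
iteration goes through a buffered layer:

import io
import lzma

count = 0
# Assumed workaround: io.BufferedReader pulls large chunks from the
# underlying LZMAFile and serves readline()/iteration from its own buffer,
# instead of issuing many tiny reads per line.
with io.BufferedReader(lzma.LZMAFile('bigfile.xz', 'r')) as f:
    for line in f:
        count += 1
print(count)

With this wrapping, iteration should mostly hit the in-memory buffer and only
occasionally call into the decompressor, which ought to bring the Python 3
timing closer to the pyliblzma run.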
----------
components: Library (Lib)
messages: 189488
nosy: Michael.Fox, nadeem.vawda
priority: normal
severity: normal
status: open
title: New lzma crazy slow with line-oriented reading.
type: performance
versions: Python 3.3
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18003>
_______________________________________