[issue16394] Improving tee() memory footprint
New submission from Carsten Milkau: The memory footprint of itertools.tee can be reduced substantially by using a shared buffer for the child iterators (see sample code). If local queues are desired for efficient threading support, they can be combined with a global queue, allowing to constrain the size of local queues. -- components: Library (Lib) files: tee.py messages: 174632 nosy: cami priority: normal severity: normal status: open title: Improving tee() memory footprint type: performance versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4 Added file: http://bugs.python.org/file27856/tee.py ___ Python tracker <http://bugs.python.org/issue16394> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16394] Improving tee() memory footprint
Carsten Milkau added the comment: No. The sample code is a demonstration how to do it, it's by no means a full-fledged patch. The drawback of the current implementation is that if you tee n-fold, and then advance one of the iterators m times, it fills n queues with m references each, for a total of (n-1)*m references. The documentation explicitly mentions this is unfortunate. I only demonstrate that it is perfectly unnecessary to fill n separate queues, as you can use a single queue and index into it. Instead of storing duplicate references, you can store a counter with each cached item reference. Replacing duplicates by refcounts, it turns (n-1)*m references into 2*m references (half of which being the counters). Not in the demo code: you can improve this further by storing items in iterator-local queues when that iterator is the only one that still needs to return it, and in a shared queue with refcount when there are more of them. That way, you eleminate the overhead of storing (item, 1) instead of item for a fix-cost per-iterator. -- versions: +Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3 ___ Python tracker <http://bugs.python.org/issue16394> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16394] Improving tee() memory footprint
Carsten Milkau added the comment: Oh great! Then I can use it as-is. How about reassigning the issue to documentation (for clarifying the inefficiency warning)? -- ___ Python tracker <http://bugs.python.org/issue16394> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16394] Improving tee() memory footprint
Changes by Carsten Milkau : -- assignee: -> docs@python components: +Documentation -Library (Lib) nosy: +docs@python ___ Python tracker <http://bugs.python.org/issue16394> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com