[issue16394] Improving tee() memory footprint

2012-11-03 Thread Carsten Milkau

New submission from Carsten Milkau:

The memory footprint of itertools.tee can be reduced substantially by using a 
shared buffer for the child iterators (see sample code). If local queues are 
desired for efficient threading support, they can be combined with a global 
queue, allowing to constrain the size of local queues.

--
components: Library (Lib)
files: tee.py
messages: 174632
nosy: cami
priority: normal
severity: normal
status: open
title: Improving tee() memory footprint
type: performance
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file27856/tee.py

___
Python tracker 
<http://bugs.python.org/issue16394>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16394] Improving tee() memory footprint

2012-11-03 Thread Carsten Milkau

Carsten Milkau added the comment:

No. The sample code is a demonstration how to do it, it's by no means a 
full-fledged patch.

The drawback of the current implementation is that if you tee n-fold, and then 
advance one of the iterators m times, it fills n queues with m references each, 
for a total of (n-1)*m references. The documentation explicitly mentions this 
is unfortunate.

I only demonstrate that it is perfectly unnecessary to fill n separate queues, 
as you can use a single queue and index into it. Instead of storing duplicate 
references, you can store a counter with each cached item reference. Replacing 
duplicates by refcounts, it turns (n-1)*m references into 2*m references (half 
of which being the counters). 

Not in the demo code: you can improve this further by storing items in 
iterator-local queues when that iterator is the only one that still needs to 
return it, and in a shared queue with refcount when there are more of them. 
That way, you eleminate the overhead of storing (item, 1) instead of item for a 
fix-cost per-iterator.

--
versions: +Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3

___
Python tracker 
<http://bugs.python.org/issue16394>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16394] Improving tee() memory footprint

2012-11-03 Thread Carsten Milkau

Carsten Milkau added the comment:

Oh great! Then I can use it as-is. How about reassigning the issue to 
documentation (for clarifying the inefficiency warning)?

--

___
Python tracker 
<http://bugs.python.org/issue16394>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16394] Improving tee() memory footprint

2012-11-03 Thread Carsten Milkau

Changes by Carsten Milkau :


--
assignee:  -> docs@python
components: +Documentation -Library (Lib)
nosy: +docs@python

___
Python tracker 
<http://bugs.python.org/issue16394>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com