[issue1677872] Efficient reverse line iterator

2007-11-08 Thread Mark Russell

Mark Russell added the comment:

Sure - I'll do an updated patch at the weekend.

_
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1677872>
_
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1677872] Efficient reverse line iterator

2007-12-09 Thread Mark Russell

Mark Russell added the comment:

Here's an updated version of the patch.  Changes:

- Updated to work against current py3k branch (r59441)
- Added support for universal newlines
- Added unit tests
- Added docs

The patch includes documentation for reversed() and __reversed__() (in the 
library and reference manuals respectively), which is independent of the 
reverse line iterator; I can split it out into a separate patch if needed.

I also updated the expected output from test_profile and test_cProfile, 
although I think a better fix would be to filter out the stdlib-related stuff 
from the expected output, as currently these tests break whenever io.py is 
changed.

Added file: http://bugs.python.org/file8902/reverse-file-iterator-20071209.diff
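
For readers who want the general idea without reading the whole diff, here is a minimal sketch of block-wise reverse line iteration. It is not the code from the patch: the function name reverse_lines and its parameters are made up for illustration, it assumes a seekable binary file and an encoding such as UTF-8 in which the bytes for CR and LF never occur inside another character, and it does no newline translation.

```python
import io
import os


def reverse_lines(binary_file, block_size=8192, encoding="utf-8"):
    """Yield decoded lines from a seekable binary file, last line first.

    Hypothetical helper for illustration only; line endings are kept as-is.
    """
    binary_file.seek(0, os.SEEK_END)
    position = binary_file.tell()
    pending = b""  # start of the earliest line seen so far, possibly incomplete
    while position > 0:
        read_size = min(block_size, position)
        position -= read_size
        binary_file.seek(position)
        chunk = binary_file.read(read_size) + pending
        lines = chunk.splitlines(keepends=True)
        # The first piece may begin in an earlier block, so hold it back
        # until the preceding block has been read.
        pending = lines[0]
        for line in reversed(lines[1:]):
            yield line.decode(encoding)
    if pending:
        yield pending.decode(encoding)


sample = io.BytesIO(b"alpha\nbeta\ngamma")
print(list(reverse_lines(sample, block_size=4)))
# prints ['gamma', 'beta\n', 'alpha\n']
```

Only one block plus one partial line is held in memory at a time, which is the property the patch is after for large files.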

Index: Doc/reference/datamodel.rst
===
--- Doc/reference/datamodel.rst (revision 59439)
+++ Doc/reference/datamodel.rst (working copy)
@@ -1662,12 +1662,27 @@
    Iterator objects also need to implement this method; they are required to return
    themselves.  For more information on iterator objects, see :ref:`typeiter`.
 
+.. method:: object.__reversed__(self)
+
+   Called (if present) by the :func:`reversed()` builtin to implement
+   reverse iteration.  It should return a new iterator object that iterates
+   over all the objects in the container in reverse order.
+
+   If the :meth:`__reversed__()` method is not provided, the
+   :func:`reversed()` builtin will fall back to using the sequence protocol
+   (:meth:`__len__()` and :meth:`__getitem__()`). Objects should normally
+   only provide :meth:`__reversed__()` if they do not support the sequence
+   protocol and an efficient implementation of reverse iteration is possible.
+   
 The membership test operators (:keyword:`in` and :keyword:`not in`) are normally
 implemented as an iteration through a sequence.  However, container objects can
 supply the following special method with a more efficient implementation, which
 also does not require the object be a sequence.
 
 
+
+
+
 .. method:: object.__contains__(self, item)
 
    Called to implement membership test operators.  Should return true if *item* is
Index: Doc/library/stdtypes.rst
===
--- Doc/library/stdtypes.rst (revision 59439)
+++ Doc/library/stdtypes.rst (working copy)
@@ -1937,7 +1937,16 @@
right.  However, using :meth:`seek` to reposition the file to an absolute
position will flush the read-ahead buffer.
 
+.. method:: file.__reversed__()
 
+   Return a new iterator that returns lines in reverse order (but without
+   reading the entire file into memory first).  Normally called via the
+   :func:`reversed()` builtin, as in ``for line in reversed(f): print(line)``.
+   Useful for scanning backwards through large files without reading the
+   entire file first.  Note that this changes the current position of the
+   underlying file object, so you should not interleave use of reverse and
+   forward iteration over the same file object.
+
 .. method:: file.read([size])
 
Read at most *size* bytes from the file (less if the read hits EOF before
Index: Doc/library/functions.rst
===
--- Doc/library/functions.rst   (revision 59439)
+++ Doc/library/functions.rst   (working copy)
@@ -869,8 +869,9 @@
 
 .. function:: reversed(seq)
 
-   Return a reverse :term:`iterator`.  *seq* must be an object which supports
-   the sequence protocol (the :meth:`__len__` method and the :meth:`__getitem__`
+   Return a reverse :term:`iterator`.  *seq* must be an object which has
+   a :meth:`__reversed__` method [#]_ or supports the sequence protocol
+   (the :meth:`__len__` method and the :meth:`__getitem__`
method with integer arguments starting at ``0``).
 
 
@@ -1099,6 +1100,8 @@
any I/O has been performed, and there's no reliable way to determine whether
this is the case.
 
+.. [#] See :ref:`sequence-types`
+
 .. [#] In the current implementation, local variable bindings cannot normally be
    affected this way, but variables retrieved from other scopes (such as modules)
    can be.  This may change.
Index: Lib/io.py
===
--- Lib/io.py   (revision 59439)
+++ Lib/io.py   (working copy)
@@ -1136,6 +1136,125 @@
)[self.seennl]
 
 
+class TextIOReverseIterator:
+    """Line-based reverse iterator wrapper for IOBase objects.
+
+    This class is used to implement TextIOWrapper.__reversed__().
+    It searches backwards for encoded line terminator, which
+    works for UTF-8 but not for encodings where one character encoding
+    can be a substring of another longer one.
+   

[issue1582] Documentation patch for reversed() and __reversed__()

2007-12-10 Thread Mark Russell

New submission from Mark Russell:

This patch adds documentation for the reversed() builtin and 
__reversed__() special method.

--
components: Documentation
files: reverse-2.6-docs.diff
messages: 58369
nosy: mark_t_russell
severity: normal
status: open
title: Documentation patch for reversed() and __reversed__()
versions: Python 2.6
Added file: http://bugs.python.org/file8912/reverse-2.6-docs.diff
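
As a quick illustration of the two paths the documentation patch describes, the sketch below shows reversed() using __reversed__() when it is present and otherwise falling back to __len__() and __getitem__(). The class names Countdown and Letters are invented for this example and do not come from the patch.

```python
class Countdown:
    """Not a sequence, but supports reversed() through __reversed__()."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))

    def __reversed__(self):
        # Return a new iterator over the same items in the opposite order.
        return iter(range(1, self.start + 1))


class Letters:
    """Sequence protocol only: __len__() plus __getitem__() with integers."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        return self._text[index]


print(list(reversed(Countdown(3))))    # uses __reversed__()
print(list(reversed(Letters("abc"))))  # uses the sequence-protocol fallback
```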

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1582>
__
Index: Doc/reference/datamodel.rst
===
--- Doc/reference/datamodel.rst (revision 59453)
+++ Doc/reference/datamodel.rst (working copy)
@@ -1819,12 +1819,27 @@
    Iterator objects also need to implement this method; they are required to return
    themselves.  For more information on iterator objects, see :ref:`typeiter`.
 
+.. method:: object.__reversed__(self)
+
+   Called (if present) by the :func:`reversed()` builtin to implement
+   reverse iteration.  It should return a new iterator object that iterates
+   over all the objects in the container in reverse order.
+
+   If the :meth:`__reversed__()` method is not provided, the
+   :func:`reversed()` builtin will fall back to using the sequence protocol
+   (:meth:`__len__()` and :meth:`__getitem__()`). Objects should normally
+   only provide :meth:`__reversed__()` if they do not support the sequence
+   protocol and an efficient implementation of reverse iteration is possible.
+   
 The membership test operators (:keyword:`in` and :keyword:`not in`) are normally
 implemented as an iteration through a sequence.  However, container objects can
 supply the following special method with a more efficient implementation, which
 also does not require the object be a sequence.
 
 
+
+
+
 .. method:: object.__contains__(self, item)
 
    Called to implement membership test operators.  Should return true if *item* is
Index: Doc/library/functions.rst
===
--- Doc/library/functions.rst   (revision 59453)
+++ Doc/library/functions.rst   (working copy)
@@ -974,8 +974,9 @@
 
 .. function:: reversed(seq)
 
-   Return a reverse :term:`iterator`.  *seq* must be an object which supports
-   the sequence protocol (the :meth:`__len__` method and the :meth:`__getitem__`
+   Return a reverse :term:`iterator`.  *seq* must be an object which has
+   a :meth:`__reversed__` method [#]_ or supports the sequence protocol
+   (the :meth:`__len__` method and the :meth:`__getitem__`
method with integer arguments starting at ``0``).
 
.. versionadded:: 2.4
@@ -1342,6 +1343,8 @@
any I/O has been performed, and there's no reliable way to determine whether
this is the case.
 
+.. [#] See :ref:`sequence-types`
+
 .. [#] In the current implementation, local variable bindings cannot normally be
    affected this way, but variables retrieved from other scopes (such as modules)
    can be.  This may change.



[issue1677872] Efficient reverse line iterator

2007-12-10 Thread Mark Russell

Mark Russell added the comment:

As Guido requested, I've split off the generic reversed() and __reversed__()
doc additions into this patch against 2.6: http://bugs.python.org/issue1582

The I/O error from reversed(open("/etc/passwd")) was caused by the inner
TextIOWrapper calling close() (via the inherited IOBase.__del__() method).
I've fixed it by having TextIOReverseIterator keep a reference to the file
object, and added a test case for the bug.

I think it's at least questionable that TextIOWrapper.close() is calling 
buffer.close() on a buffer that it did not create.  I assumed that keeping
a reference to the buffer object would be enough to keep the buffer open,
and I suspect this is likely to trip up others in future.  I think
TextIOWrapper.close() should probably just set a flag (for the use of its 
own closed() method) and rely on reference counting to call close() 
on the buffer object.  If that sounds along the right lines, I'm happy to think
about it a bit more and submit a patch.

Added file: http://bugs.python.org/file8913/reverse-file-iterator-20071210.diff
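
The ownership behaviour described above can be seen in a few lines with the io module as it exists in current Python 3. This is only a sketch of the general behaviour, not the test case added by the patch; note that detach() was added to the io module after this message was written and is shown here only as one way to keep the buffer open.

```python
import io

# Closing the text wrapper also closes the buffer it was handed,
# even though the wrapper did not create that buffer.
raw = io.BytesIO(b"one\ntwo\n")
wrapper = io.TextIOWrapper(raw, encoding="utf-8")
wrapper.close()
closed_with_wrapper = raw.closed

# detach() separates the wrapper from its buffer, so the buffer stays open.
raw2 = io.BytesIO(b"one\ntwo\n")
wrapper2 = io.TextIOWrapper(raw2, encoding="utf-8")
detached = wrapper2.detach()
still_open = not raw2.closed

print(closed_with_wrapper, detached is raw2, still_open)
```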

Index: Doc/library/stdtypes.rst
===
--- Doc/library/stdtypes.rst (revision 59453)
+++ Doc/library/stdtypes.rst (working copy)
@@ -1937,7 +1937,16 @@
right.  However, using :meth:`seek` to reposition the file to an absolute
position will flush the read-ahead buffer.
 
+.. method:: file.__reversed__()
 
+   Return a new iterator that returns lines in reverse order (but without
+   reading the entire file into memory first).  Normally called via the
+   :func:`reversed()` builtin, as in ``for line in reversed(f): print(line)``.
+   Useful for scanning backwards through large files without reading the
+   entire file first.  Note that this changes the current position of the
+   underlying file object, so you should not interleave use of reverse and
+   forward iteration over the same file object.
+
 .. method:: file.read([size])
 
Read at most *size* bytes from the file (less if the read hits EOF before
Index: Lib/io.py
===
--- Lib/io.py   (revision 59453)
+++ Lib/io.py   (working copy)
@@ -1136,6 +1136,126 @@
)[self.seennl]
 
 
+class TextIOReverseIterator:
+    """Line-based reverse iterator wrapper for IOBase objects.
+
+    This class is used to implement TextIOWrapper.__reversed__().
+    It searches backwards for encoded line terminator, which
+    works for UTF-8 but not for encodings where one character encoding
+    can be a substring of another longer one.
+    """
+
+    # XXX Should we check for encodings that are known to work?  Currently
+    # we would return incorrect results for a codec where, say, the encoding
+    # of newline could appear as a substring of the encoding for some other
+    # character or where the codec can have a non-default state at the start
+    # of a line (do such encodings exist?).
+
+    def __init__(self, buffer, encoding, newline=None,
+                 buffer_size=DEFAULT_BUFFER_SIZE, wrapped_file=None):
+        if not isinstance(encoding, str):
+            raise ValueError("invalid encoding: %r" % encoding)
+        buffer.seek(0, 2)
+        self.buffer = buffer
+        self._wrapped_file = wrapped_file # Keep ref to avoid premature close
+        self._bufsize = buffer_size
+        self._encoding = encoding
+        self._translate_newlines = newline is None
+        if newline:
+            self._enc_cr = self._enc_lf = None
+        else:
+            self._enc_cr = '\r'.encode(encoding)
+            self._enc_lf = '\n'.encode(encoding)
+            if self._enc_cr + self._enc_lf != '\r\n'.encode(encoding):
+                raise ValueError('unsupported encoding: %r' % encoding)
+        self._newline = newline.encode(encoding) if newline else None
+        self._limpos = buffer.tell()
+        self._bufpos = self._limpos
+        self._pending = b''
+
+    def _extend_buffer_backwards(self):
+        (bufpos, limpos, bufsize) = (self._bufpos, self._limpos, self._bufsize)
+
+        newpos = (bufpos // bufsize) * bufsize
+        if newpos == bufpos:
+            newpos -= bufsize
+        assert newpos >= 0
+        nbytes = bufpos - newpos
+        assert nbytes != 0
+
+        self.buffer.seek(newpos, 0)
+        assert self.buffer.tell() == newpos, \
+               'seek() arrived at %r (expected %r)' % (self.buffer.tell(), newpos)
+        newbuf = self.buffer.read(nbytes)
+        assert len(newbuf) == nbytes, 'Unexpected EOF'
+
+        if limpos > bufpos:
+            newbuf += self._pending[:limpos - bufpos]
+ 
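
The position arithmetic at the top of _extend_buffer_backwards() is easier to follow in isolation. The sketch below lifts just that calculation into a standalone function; the name previous_block_start is hypothetical and not part of the patch. The idea appears to be that each backward read starts on a multiple of the buffer size, so only the first read (from the end of the file) is shorter than a full block.

```python
def previous_block_start(bufpos, bufsize):
    """Return the start of the block-aligned read that ends at bufpos.

    Mirrors the newpos calculation in the patch; expects bufpos > 0.
    """
    newpos = (bufpos // bufsize) * bufsize
    if newpos == bufpos:
        # Already on a block boundary, so step back one whole block.
        newpos -= bufsize
    return newpos


# A 10000-byte file with 8192-byte blocks is read as 1808 bytes, then 8192.
first = previous_block_start(10000, 8192)
second = previous_block_start(first, 8192)
print(first, second)  # prints 8192 0
```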

[issue1677872] Efficient reverse line iterator

2010-07-21 Thread Mark Russell

Mark Russell added the comment:

I'll do a C version of the patch (hopefully in the next week or so).
