[issue1677872] Efficient reverse line iterator
Mark Russell added the comment: Sure - I'll do an updated patch at the weekend. _ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1677872> _ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1677872] Efficient reverse line iterator
Mark Russell added the comment:

Here's an updated version of the patch. Changes:

- Updated to work against current py3k branch (r59441)
- Added support for universal newlines
- Added unit tests
- Added docs

The patch includes documentation for reversed() and __reversed__() (in the
library and reference manuals respectively), which are independent of the
reverse lines iterator - I can split those out to a separate patch if needed.

I also updated the expected output from test_profile and test_cProfile,
although I think a better fix would be to filter out the stdlib-related stuff
from the expected output, as currently these tests break whenever io.py is
changed.

Added file: http://bugs.python.org/file8902/reverse-file-iterator-20071209.diff

Index: Doc/reference/datamodel.rst
===================================================================
--- Doc/reference/datamodel.rst (revision 59439)
+++ Doc/reference/datamodel.rst (working copy)
@@ -1662,12 +1662,27 @@
    Iterator objects also need to implement this method; they are required to return
    themselves. For more information on iterator objects, see :ref:`typeiter`.
 
+.. method:: object.__reversed__(self)
+
+   Called (if present) by the :func:`reversed()` builtin to implement
+   reverse iteration. It should return a new iterator object that iterates
+   over all the objects in the container in reverse order.
+
+   If the :meth:`__reversed__()` method is not provided, the
+   :func:`reversed()` builtin will fall back to using the sequence protocol
+   (:meth:`__len__()` and :meth:`__getitem__()`). Objects should normally
+   only provide :meth:`__reversed__()` if they do not support the sequence
+   protocol and an efficient implementation of reverse iteration is possible.
+
 The membership test operators (:keyword:`in` and :keyword:`not in`) are normally
 implemented as an iteration through a sequence. However, container objects can
 supply the following special method with a more efficient implementation, which
 also does not require the object be a sequence.
+
+
+
 .. method:: object.__contains__(self, item)
 
    Called to implement membership test operators. Should return true if *item* is
Index: Doc/library/stdtypes.rst
===================================================================
--- Doc/library/stdtypes.rst (revision 59439)
+++ Doc/library/stdtypes.rst (working copy)
@@ -1937,7 +1937,16 @@
    right. However, using :meth:`seek` to reposition the file to an absolute
    position will flush the read-ahead buffer.
 
+.. method:: file.__reversed__()
 
+   Return a new iterator that returns lines in reverse order (but without
+   reading the entire file into memory first). Normally called via the
+   :func:`reversed()` builtin, as in ``for line in reversed(f): print(line)``.
+   Useful for scanning backwards through large files without reading the
+   entire file first. Note that this changes the current position of the
+   underlying file object, so you should not interleave use of reverse and
+   forward iteration over the same file object.
+
 .. method:: file.read([size])
 
    Read at most *size* bytes from the file (less if the read hits EOF before
Index: Doc/library/functions.rst
===================================================================
--- Doc/library/functions.rst (revision 59439)
+++ Doc/library/functions.rst (working copy)
@@ -869,8 +869,9 @@
 
 .. function:: reversed(seq)
 
-   Return a reverse :term:`iterator`. *seq* must be an object which supports
-   the sequence protocol (the :meth:`__len__` method and the :meth:`__getitem__`
+   Return a reverse :term:`iterator`. *seq* must be an object which has
+   a :meth:`__reversed__` method [#]_ or supports the sequence protocol
+   (the :meth:`__len__` method and the :meth:`__getitem__`
    method with integer arguments starting at ``0``).
 
@@ -1099,6 +1100,8 @@
    any I/O has been performed, and there's no reliable way to determine whether
    this is the case.
 
+.. [#] See :ref:`sequence-types`
+
 .. [#] In the current implementation, local variable bindings cannot normally be
    affected this way, but variables retrieved from other scopes (such as modules)
    can be. This may change.
Index: Lib/io.py
===================================================================
--- Lib/io.py (revision 59439)
+++ Lib/io.py (working copy)
@@ -1136,6 +1136,125 @@
                 )[self.seennl]
 
+class TextIOReverseIterator:
+    """Line-based reverse iterator wrapper for IOBase objects.
+
+    This class is used to implement TextIOWrapper.__reversed__().
+    It searches backwards for encoded line terminator, which
+    works for UTF-8 but not for encodings where one character encoding
+    can be a substring of another longer one.
+
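The approach the docstring describes, searching backwards for the encoded line terminator, can be illustrated with a small standalone sketch. This is not the code from the patch: `reverse_lines`, its parameters and the 4096-byte default block size are hypothetical, chosen only for illustration. It assumes a seekable file opened in binary mode and an encoding such as UTF-8, in which the bytes for CR and LF never occur inside a longer character.

```python
import io
import os


def reverse_lines(binary_file, encoding="utf-8", block_size=4096):
    """Yield decoded lines from last to first without loading the whole file.

    Hypothetical helper, not the patch's TextIOReverseIterator. Lines keep
    their terminators; "\n", "\r" and "\r\n" all count as line endings.
    """
    binary_file.seek(0, os.SEEK_END)
    position = binary_file.tell()
    pending = b""  # start of a line whose beginning has not been read yet
    while position > 0:
        read_size = min(block_size, position)
        position -= read_size
        binary_file.seek(position)
        chunk = binary_file.read(read_size) + pending
        lines = chunk.splitlines(keepends=True)
        if position > 0:
            # The first piece may begin mid-line (or mid-character), so hold
            # it back until the preceding block has been read.
            pending = lines.pop(0)
        else:
            pending = b""
        for line in reversed(lines):
            yield line.decode(encoding)


sample = io.BytesIO("first\nsecond\nthird\n".encode("utf-8"))
for line in reverse_lines(sample, block_size=8):
    print(line, end="")
# prints "third", "second", "first" on separate lines
```

Holding back the first piece of each block is what keeps a "\r\n" pair or a multi-byte character from being split at a block boundary. A long line is re-split on every block it spans, which is acceptable for a sketch but is one reason a real implementation would track buffer positions more carefully.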
[issue1582] Documentation patch for reversed() and __reversed__()
New submission from Mark Russell:

This patch adds documentation for the reversed() builtin and __reversed__()
special method.

----------
components: Documentation
files: reverse-2.6-docs.diff
messages: 58369
nosy: mark_t_russell
severity: normal
status: open
title: Documentation patch for reversed() and __reversed__()
versions: Python 2.6

Added file: http://bugs.python.org/file8912/reverse-2.6-docs.diff

Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1582>

Index: Doc/reference/datamodel.rst
===================================================================
--- Doc/reference/datamodel.rst (revision 59453)
+++ Doc/reference/datamodel.rst (working copy)
@@ -1819,12 +1819,27 @@
    Iterator objects also need to implement this method; they are required to return
    themselves. For more information on iterator objects, see :ref:`typeiter`.
 
+.. method:: object.__reversed__(self)
+
+   Called (if present) by the :func:`reversed()` builtin to implement
+   reverse iteration. It should return a new iterator object that iterates
+   over all the objects in the container in reverse order.
+
+   If the :meth:`__reversed__()` method is not provided, the
+   :func:`reversed()` builtin will fall back to using the sequence protocol
+   (:meth:`__len__()` and :meth:`__getitem__()`). Objects should normally
+   only provide :meth:`__reversed__()` if they do not support the sequence
+   protocol and an efficient implementation of reverse iteration is possible.
+
 The membership test operators (:keyword:`in` and :keyword:`not in`) are normally
 implemented as an iteration through a sequence. However, container objects can
 supply the following special method with a more efficient implementation, which
 also does not require the object be a sequence.
+
+
+
 .. method:: object.__contains__(self, item)
 
    Called to implement membership test operators. Should return true if *item* is
Index: Doc/library/functions.rst
===================================================================
--- Doc/library/functions.rst (revision 59453)
+++ Doc/library/functions.rst (working copy)
@@ -974,8 +974,9 @@
 
 .. function:: reversed(seq)
 
-   Return a reverse :term:`iterator`. *seq* must be an object which supports
-   the sequence protocol (the :meth:`__len__` method and the :meth:`__getitem__`
+   Return a reverse :term:`iterator`. *seq* must be an object which has
+   a :meth:`__reversed__` method [#]_ or supports the sequence protocol
+   (the :meth:`__len__` method and the :meth:`__getitem__`
    method with integer arguments starting at ``0``).
 
    .. versionadded:: 2.4
@@ -1342,6 +1343,8 @@
    any I/O has been performed, and there's no reliable way to determine whether
    this is the case.
 
+.. [#] See :ref:`sequence-types`
+
 .. [#] In the current implementation, local variable bindings cannot normally be
    affected this way, but variables retrieved from other scopes (such as modules)
    can be. This may change.
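The two paths that the documentation patch describes for reversed() can be seen side by side in a minimal sketch. `LinkedList` and `Squares` are hypothetical classes invented for this example. The first is not a sequence, so it supplies `__reversed__()`. The second supports only the sequence protocol, so reversed() falls back to `__len__()` and `__getitem__()`.

```python
class LinkedList:
    """Doubly linked list: no __getitem__, but cheap to walk backwards."""

    class _Node:
        __slots__ = ("value", "prev", "next")

        def __init__(self, value):
            self.value = value
            self.prev = None
            self.next = None

    def __init__(self, values=()):
        self._head = None
        self._tail = None
        for value in values:
            self.append(value)

    def append(self, value):
        node = self._Node(value)
        if self._tail is None:
            self._head = self._tail = node
        else:
            node.prev = self._tail
            self._tail.next = node
            self._tail = node

    def __iter__(self):
        node = self._head
        while node is not None:
            yield node.value
            node = node.next

    def __reversed__(self):
        # Called by reversed(); returns a new iterator from tail to head.
        node = self._tail
        while node is not None:
            yield node.value
            node = node.prev


class Squares:
    """Sequence protocol only: reversed() uses __len__ and __getitem__."""

    def __init__(self, count):
        self._count = count

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        return index * index


print(list(reversed(LinkedList("abc"))))  # ['c', 'b', 'a']
print(list(reversed(Squares(4))))         # [9, 4, 1, 0]
```

The linked list is the kind of container the patch text has in mind: indexing it would cost a walk from the head for every element, while stepping through the `prev` links is cheap.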
[issue1677872] Efficient reverse line iterator
Mark Russell added the comment:

As Guido requested, I've split off the generic reversed() and __reversed__()
doc additions to this patch against 2.6: http://bugs.python.org/issue1582

The I/O error from reversed(open("/etc/passwd")) was caused by the inner
TextIOWrapper calling close() (via the inherited IOBase.__del__() method).
I've fixed it by having TextIOReverseIterator keep a reference to the file
object, and added a test case for the bug.

I think it's at least questionable that TextIOWrapper.close() is calling
buffer.close() on a buffer that it did not create. I assumed that keeping a
reference to the buffer object would be enough to keep the buffer open, and I
suspect this is likely to trip up others in future. I think
TextIOWrapper.close() should probably just set a flag (for the use of its own
closed() method) and rely on reference counting to call close() on the buffer
object. If that sounds on the right lines I'm happy to think about it a bit
more and submit a patch.

Added file: http://bugs.python.org/file8913/reverse-file-iterator-20071210.diff

Index: Doc/library/stdtypes.rst
===================================================================
--- Doc/library/stdtypes.rst (revision 59453)
+++ Doc/library/stdtypes.rst (working copy)
@@ -1937,7 +1937,16 @@
    right. However, using :meth:`seek` to reposition the file to an absolute
    position will flush the read-ahead buffer.
 
+.. method:: file.__reversed__()
 
+   Return a new iterator that returns lines in reverse order (but without
+   reading the entire file into memory first). Normally called via the
+   :func:`reversed()` builtin, as in ``for line in reversed(f): print(line)``.
+   Useful for scanning backwards through large files without reading the
+   entire file first. Note that this changes the current position of the
+   underlying file object, so you should not interleave use of reverse and
+   forward iteration over the same file object.
+
 .. method:: file.read([size])
 
    Read at most *size* bytes from the file (less if the read hits EOF before
Index: Lib/io.py
===================================================================
--- Lib/io.py (revision 59453)
+++ Lib/io.py (working copy)
@@ -1136,6 +1136,126 @@
                 )[self.seennl]
 
+class TextIOReverseIterator:
+    """Line-based reverse iterator wrapper for IOBase objects.
+
+    This class is used to implement TextIOWrapper.__reversed__().
+    It searches backwards for encoded line terminator, which
+    works for UTF-8 but not for encodings where one character encoding
+    can be a substring of another longer one.
+    """
+
+    # XXX Should we check for encodings that are known to work? Currently
+    # we would return incorrect results for a codec where, say, the encoding
+    # of newline could appear as a substring of the encoding for some other
+    # character or where the codec can have a non-default state at the start
+    # of a line (do such encodings exist?).
+
+    def __init__(self, buffer, encoding, newline=None,
+                 buffer_size=DEFAULT_BUFFER_SIZE, wrapped_file=None):
+        if not isinstance(encoding, str):
+            raise ValueError("invalid encoding: %r" % encoding)
+        buffer.seek(0, 2)
+        self.buffer = buffer
+        self._wrapped_file = wrapped_file # Keep ref to avoid premature close
+        self._bufsize = buffer_size
+        self._encoding = encoding
+        self._translate_newlines = newline is None
+        if newline:
+            self._enc_cr = self._enc_lf = None
+        else:
+            self._enc_cr = '\r'.encode(encoding)
+            self._enc_lf = '\n'.encode(encoding)
+            if self._enc_cr + self._enc_lf != '\r\n'.encode(encoding):
+                raise ValueError('unsupported encoding: %r' % encoding)
+        self._newline = newline.encode(encoding) if newline else None
+        self._limpos = buffer.tell()
+        self._bufpos = self._limpos
+        self._pending = b''
+
+    def _extend_buffer_backwards(self):
+        (bufpos, limpos, bufsize) = (self._bufpos, self._limpos, self._bufsize)
+
+        newpos = (bufpos // bufsize) * bufsize
+        if newpos == bufpos:
+            newpos -= bufsize
+        assert newpos >= 0
+        nbytes = bufpos - newpos
+        assert nbytes != 0
+
+        self.buffer.seek(newpos, 0)
+        assert self.buffer.tell() == newpos, \
+            'seek() arrived at %r (expected %r)' % (self.buffer.tell(), newpos)
+        newbuf = self.buffer.read(nbytes)
+        assert len(newbuf) == nbytes, 'Unexpected EOF'
+
+        if limpos > bufpos:
+            newbuf += self._pending[:limpos - bufpos]
+
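The close() behaviour raised in the comment above can be reproduced in a few lines. This is a sketch against the io module in current Python 3 releases, not the 2007 py3k branch under discussion, and the helper names `closes_underlying_buffer` and `detach_keeps_buffer_open` are made up for the example. It calls close() explicitly rather than relying on garbage collection, so the result does not depend on when the wrapper is finalized.

```python
import io


def closes_underlying_buffer():
    """Closing the text wrapper also closes the buffer it wraps."""
    raw = io.BytesIO(b"one\ntwo\n")
    wrapper = io.TextIOWrapper(raw, encoding="utf-8")
    wrapper.close()
    return raw.closed


def detach_keeps_buffer_open():
    """detach() separates the wrapper from the buffer without closing it."""
    raw = io.BytesIO(b"one\ntwo\n")
    wrapper = io.TextIOWrapper(raw, encoding="utf-8")
    wrapper.detach()
    return not raw.closed


print(closes_underlying_buffer())   # True
print(detach_keeps_buffer_open())   # True
```

Holding a reference to the buffer alone does not keep it open once the wrapper is closed. That matches the surprise reported in the comment and explains why the patch keeps a reference to the wrapping file object instead.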
[issue1677872] Efficient reverse line iterator
Mark Russell added the comment:

I'll do a C version of the patch (hopefully in the next week or so).