On Fri, May 8, 2009 at 1:03 PM, <pyt...@bdurham.com> wrote: > Note: Following cross-posted to python-list where it got queued due to > suspicious subject line. > > I'm looking for suggestions on technique (not necessarily code) about the > most pythonic way to normalize vertical whitespace in blocks of text so that > there is never more than 1 blank line between paragraphs. Our source text > has newlines normalized to single newlines (\n vs. combinations of \r and > \n), but there may be leading and trailing whitespace around each newline. > > Approaches: > > 1. split text to list of lines that get stripped then: > > a. walk this list building a new list of lines that track and ignore extra > blank lines > > -OR- > > b. re-join lines and replace '\n\n\n' wth' \n\n' until no more '\n\n\n' > matches exist > > 2. use regular expressions to match and replace whitespace pattern of 3 or > more adjacent \n's with surrounding whitespace > > 3. a 3rd party text processing library designed for efficiently cleaning up > text
1 sounds simple enough. That is the first thing I thought of. 2 should also be easy if you have any regex-fu and might turn out to be faster if you have a lot of text. Kent _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor