[Mailman-Users] How to wrap text in archived messages
Mark Dale via Mailman-Users writes:

> I'm looking for a way to wrap lines in archived messages.

Executive summary: There's not really a good way to do this.  It's
extremely complicated, *especially* in email (as opposed to most
"normal" text), because of the quoting conventions used in email.

> With zero understanding of Python my attempts to implement this
> have failed so far and I may well be barking up the wrong tree
> completely. Any clues or pointers gratefully received.

It's not your lack of Python; it's that reliably reformatting email
written in different formats is a *very* hard problem in natural
language processing, and it requires some knowledge of message user
agent internals.  That's why Pipermail punts by just wrapping the
whole thing in a PRE element.  Works for Mutt users (= Unix email
elders).

Gory details follow (because I think it's an interesting problem!)

> Looking at the HTML page source -- in both cases (wrapped and
> unwrapped) I see the message content is enclosed by PRE tags.

Right.  PRE is not very pretty as HTML goes, but it works OK for all
RFC-conforming text/plain email.  I assume that the long lines in fact
come from text/plain parts created by the author's MUA, because the
agents that we use to transform a text/html part to text/plain will
format to a reasonable width such as 72 characters.

> And the lines in that block that seem responsible for the PRE tags are ...
>
>     lines.insert(0, '<PRE>')
>     lines.append('</PRE>')
>
> My question is: Can those PRE tags be removed and replaced with
> something equivalent to PHP's "nl2br" (which inserts a line break
> BR in place of new line entries)?

No, because there *are no* newlines to break those very long lines.
These MUAs use newline to mean "paragraph break", not "line break".
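Steve's point can be made concrete with a small sketch, assuming Python 3. The function names `pre_wrap_body` and `longest_line` are hypothetical, and escaping with `html.escape` is an assumption; this is not Pipermail's own code. It wraps a body in PRE the way the quoted `lines.insert`/`lines.append` calls suggest, and shows that a paragraph typed without hard line breaks stays one 275-character line: there is no newline inside it for an nl2br-style transform to act on.

```python
import html


def pre_wrap_body(body):
    """Wrap a text/plain body in a PRE element, one source line per line.

    Hypothetical sketch of the quoted approach, not Pipermail's own code.
    """
    # Escape &, < and > so message text cannot be read as markup.
    lines = html.escape(body, quote=False).split('\n')
    lines.insert(0, '<PRE>')
    lines.append('</PRE>')
    return '\n'.join(lines)


def longest_line(body):
    """Length of the longest newline-delimited line in body."""
    return max(len(line) for line in body.split('\n'))


# Many MUAs send each paragraph as one long line, with newlines only
# between paragraphs.
paragraph = "This paragraph was typed without any hard line breaks. " * 5
body = paragraph + "\n\n" + paragraph

print(longest_line(body))               # 275: no newline inside a paragraph
print(pre_wrap_body(body).count('\n'))  # 4: PRE, paragraph, blank, paragraph, /PRE
```

Inside PRE the browser keeps each of those 275-character lines intact, which is exactly the horizontal-scrolling problem that started the thread.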
You might get a better result in these messages by removing the "PRE"
tags, and wrapping each line with "<P>...</P>", but that's a real
hack, and almost certain to make RFC-conforming email look quite ugly,
because every line becomes a paragraph, and you'll lose all
indentation. E.g., in the code blocks you posted, all the lines will
end up flush left. If your members are posting code or poetry, or
using indented block quotations etc., they're likely to be extremely
unhappy with the result.

Python's standard library does have a textwrap module, but I'm not at
all sure it's suitable for this. If you know that the long lines of a
message are actually paragraphs, you can use something like

    from textwrap import wrap
    # work backward because wrapping changes the indices of later lines
    for i in range(len(lines) - 1, -1, -1):
        NDT = detect_prefix(lines[i])
        # wrap the text after the prefix (or the prefix is doubled);
        # "or [lines[i]]" keeps blank lines, which wrap() would drop
        lines[i:i+1] = wrap(lines[i][len(NDT):], initial_indent=NDT,
                            subsequent_indent=NDT) or [lines[i]]

If a line is indented or has a quoting prefix, you have to detect that
for yourself and set NDT to that prefix.  Something like

    import re
    prefix_re = re.compile(r'[ >]*')

    def detect_prefix(line):
        m = prefix_re.match(line)
        return m.group(0)

should capture most indentation and quoting prefixes, but there are
other conventions.

Whether you use P elements or the textwrap module, it's probably a
good idea to find out how long the long lines are and what percentage
of the message they make up, and to avoid trying to wrap a message
that looks like it "mostly" has lines of reasonable length.  If you
don't, and your target is the old "typewriter standard" width of 66,
and somebody using an RFC-conforming MUA just prefers 72, you'll
reformat their mail into alternating lines of about 60 characters and
10 characters.  Yuck ...

Which of the above would work better for you depends a lot on the
typical content of your list.  But issues with quoting and indentation
are likely to have you tearing your hair out.
Steve
--
Mailman-Users mailing list -- mailman-users@python.org
To unsubscribe send an email to mailman-users-le...@python.org
https://mail.python.org/mailman3/lists/mailman-users.python.org/
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: https://www.mail-archive.com/mailman-users@python.org/
    https://mail.python.org/archives/list/mailman-users@python.org/
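Steve's suggestions (detect a quoting or indentation prefix, wrap only the text after it, and leave alone messages whose lines are mostly of reasonable length) can be combined into one routine. This is a minimal sketch, assuming Python 3: the names `needs_wrapping` and `rewrap`, the 78-character limit and the 20% share are hypothetical choices, not Mailman settings, and the prefix regex only knows about spaces and `>`. It builds a new list instead of editing `lines` in place, so it does not need the work-backward trick.

```python
import re
from textwrap import wrap

# Leading spaces and '>' characters: the usual indentation and quoting
# prefixes.  Other conventions (e.g. "|" or "Name>") are not recognised.
PREFIX_RE = re.compile(r'[ >]*')


def detect_prefix(line):
    """Return the leading run of spaces and '>' characters in line."""
    return PREFIX_RE.match(line).group(0)


def needs_wrapping(lines, limit=78, share=0.2):
    """Guess whether a message was sent with unwrapped paragraphs.

    Returns True if more than `share` of the non-blank lines are longer
    than `limit` characters.  Both thresholds are hypothetical defaults.
    """
    text_lines = [ln for ln in lines if ln.strip()]
    if not text_lines:
        return False
    long_count = sum(1 for ln in text_lines if len(ln) > limit)
    return long_count / len(text_lines) > share


def rewrap(lines, width=72):
    """Wrap each line longer than `width`, repeating its prefix on every piece."""
    out = []
    for line in lines:
        if len(line) <= width:
            out.append(line)  # short lines, including blank ones, stay as-is
            continue
        prefix = detect_prefix(line)
        pieces = wrap(line[len(prefix):], width=width,
                      initial_indent=prefix, subsequent_indent=prefix)
        out.extend(pieces or [line])  # wrap() returns [] if only a prefix remains
    return out


# Paragraphs sent as single long lines: the guard says "wrap".
msg = [
    "> " + "Quoted text that somebody's MUA sent as one long line. " * 3,
    "",
    "    indented code stays as it is",
    "Reply text that was also sent without hard line breaks, " * 3,
]
if needs_wrapping(msg):
    msg = rewrap(msg)

# A 71-character line, as an RFC-conforming MUA set to 72 might send it.
typewriter = ' '.join(['abcdefgh'] * 8)
print([len(p) for p in rewrap([typewriter], width=66)])  # [62, 8]
print(needs_wrapping([typewriter] * 5))                  # False
```

The last two lines reproduce the "Yuck" case: forcing a 72-column message down to 66 splits each line into a long piece and a short tail, while the `needs_wrapping` guard would have left such a message alone.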
[Mailman-Users] Re: How to wrap text in archived messages
>> I'm looking for a way to wrap lines in archived messages
>> And the lines in that block that seem responsible for the PRE tags are ...
>>
>>     lines.insert(0, '<PRE>')
>>     lines.append('</PRE>')
>>
>> My question is: Can those PRE tags be removed and replaced with
>> something equivalent to PHP's "nl2br" (which inserts a line break
>> BR in place of new line entries)?
>
> No, because there *are no* newlines to break those very long lines.
> These MUAs use newline to mean "paragraph break", not "line break".

But there are newlines, and there isn't any need to insert line breaks
into those long lines: they just need to wrap.

Looking at, for example, a message that originally has 3 paragraphs
of text:

    %(body)s

    First paragraph that is a really long line of text.
    Second paragraph that is a really long line of text.
    Third paragraph that is a really long line of text.

If those PRE tags are removed, then all 3 lines get joined up and
displayed as one continuous line.  That solves the non-wrap problem,
but it loses the "paragraphs", so it's a no-go.

As said, if this were PHP we could use the "nl2br" function, which
inserts line breaks before all newlines.  That would give us ...

    First paragraph of body that is a really long line of text.
    Second paragraph of body that is a really long line of text.
    Third paragraph of body that is a really long line of text.

The BR tags would preserve the space between the lines and give the
appearance of paragraphs in the HTML Pipermail archive page.  Granted,
the HTML would not be strictly kosher, but then neither are the PRE
tags, as they are not being used as they should be.  The main thing is
that the lines would wrap according to the width of the window, which
would eliminate the need for horizontal scrolling.
> You might get a better result in these messages by removing the "PRE"
> tags, and wrapping each line with "<P>...</P>", but that's a real
> hack, and almost certain to make RFC-conforming email look quite ugly,
> because every line becomes a paragraph, and you'll lose all
> indentation. E.g., in the code blocks you posted, all the lines will
> end up flush left. If your members are posting code or poetry, or
> using indented block quotations etc., they're likely to be extremely
> unhappy with the result.

Agreed.  Too horrible to even think about.

> Python's standard library does have a textwrap module, but I'm not at
> all sure it's suitable for this. If you know that the long lines of a
> message are actually paragraphs, you can use something like
>
>     from textwrap import wrap
>     # work backward because wrapping changes the indices of later lines
>     for i in range(len(lines) - 1, -1, -1):
>         NDT = detect_prefix(lines[i])
>         # wrap the text after the prefix (or the prefix is doubled);
>         # "or [lines[i]]" keeps blank lines, which wrap() would drop
>         lines[i:i+1] = wrap(lines[i][len(NDT):], initial_indent=NDT,
>                             subsequent_indent=NDT) or [lines[i]]

This is sort of where I was looking to go, but as you've pointed out,
there's no telling whether the text will be in paragraphs, code
blocks, etc.

Does the Python code snippet that I mentioned ...

    def nl2br(s):
        return '<br />\n'.join(s.split('\n'))

... make any sense as a Python equivalent of PHP's "nl2br" function
(to accomplish the insertion of the BR line break tags)?

Cheers,
Mark
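On Mark's question: joining the lines with a BR tag plus a newline is the core of a Python nl2br. A slightly fuller sketch, assuming Python 3, handles the same line endings PHP's `nl2br()` recognises (`\r\n`, `\n\r`, `\n`, `\r`) and offers PHP's XHTML/HTML choice of tag via a hypothetical `xhtml` flag. Unlike PHP's function, it also HTML-escapes the text first, which is an added assumption: in PHP one would normally escape separately, since `nl2br()` itself does not.

```python
import html
import re

# The line endings PHP's nl2br() recognises: \r\n, \n\r, \n and \r.
# The two-character forms come first so each matches as a single unit.
NEWLINE_RE = re.compile(r'\r\n|\n\r|\n|\r')


def nl2br(text, xhtml=True):
    """Insert a BR tag before every line ending, keeping the ending itself.

    Unlike PHP's nl2br(), this also HTML-escapes the text first, so that
    characters such as < and & in a message display literally.
    """
    tag = '<br />' if xhtml else '<br>'
    escaped = html.escape(text, quote=False)
    return NEWLINE_RE.sub(lambda m: tag + m.group(0), escaped)


body = "First paragraph.\nSecond paragraph.\n\n    indented line"
print(nl2br(body))
```

The sample's last line shows the catch: outside PRE, a browser collapses the four leading spaces, so "indented line" still renders flush left, just as with the P-element hack.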
[Mailman-Users] Re: How to wrap text in archived messages
>> You might get a better result in these messages by removing the "PRE"
>> tags, and wrapping each line with "<P>...</P>", but that's a real
>> hack, and almost certain to make RFC-conforming email look quite ugly,
>> because every line becomes a paragraph, and you'll lose all
>> indentation. E.g., in the code blocks you posted, all the lines will
>> end up flush left. If your members are posting code or poetry, or
>> using indented block quotations etc., they're likely to be extremely
>> unhappy with the result.

D'oh!  I just saw the error in my whole way of thinking.  Of course,
any such "nl2br" equivalent will do exactly the same as wrapping with
P tags: everything ends up left-aligned.

But thank you, Steve, for taking the time and trouble to explain.  It
is indeed a whole can of worms.  Much to learn.

/Mark