On Tue, Jan 11, 2005 at 08:52:43AM -0800, Don Armstrong wrote: > tbm brought to my attention another bug that should be fixed, namely > the maintainer e-mail address that is sent out when someone closes the > bug: > > http://bugs.donarmstrong.com/cgi/bugreport.cgi?bug=18&msg=12 > > The patch now has moved the de_rfc1522 code into Debbugs::MIME since > this slice of code needs to be used in both the cgi scripts and the > process script. > > I've also taken the liberty of cleaning up the way that Exporter is > used, and replaced munging with @ISA with a straightforward use base > 'Exporter';
OK, here's a very belated review. Humble apologies for the delay. In general I've applied all the changes I've suggested in my local copy already, so there's no need to send another patch containing them; however, I have not yet written the metadata migration script I suggest below, nor have I dealt with the RFC1522 encoding on output. I've checked the Debbugs::MIME parts of your patch with my modifications into CVS HEAD so that you can see what I've done, together with a debian/control patch to add the appropriate dependency. In general, I think this is a very good start. Thanks, and thanks for pestering me about it until I got round to it! > Index: Debbugs/MIME.pm > =================================================================== > RCS file: /cvs/debbugs/source/Debbugs/MIME.pm,v > retrieving revision 1.1 > diff -u -r1.1 MIME.pm > --- Debbugs/MIME.pm 3 Aug 2003 09:46:30 -0000 1.1 > +++ Debbugs/MIME.pm 11 Jan 2005 16:43:49 -0000 > @@ -2,19 +2,23 @@ > > use strict; > > -use File::Path; > -use MIME::Parser; > -use Exporter (); > -use vars qw($VERSION @ISA @EXPORT_OK); > +use base qw(Exporter); > +use vars qw($VERSION @EXPORT_OK @EXPORT); > > BEGIN { > $VERSION = 1.00; > > - @ISA = qw(Exporter); > - @EXPORT_OK = qw(parse); > + @EXPORT_OK = qw(parse de_rfc1522 getmailbody); > + @EXPORT = qw(getmailbody); > } > > -sub getmailbody ($); Hm, not sure I like using @EXPORT rather than the deliberate forward declaration; plus, getmailbody is not really meant to be exported, because it's an internal function (despite the fact that scripts/process.in and scripts/service.in currently use another version of it, since Debbugs::MIME represents an incomplete modularisation effort). The rest of the Exporter cleanup looks OK, though. > @@ -95,5 +99,33 @@ > > return { header => [EMAIL PROTECTED], body => [EMAIL PROTECTED]; > } > + > +=head2 de_rfc1522 > + > + de_rfc1522('=?iso-8859-1?Q?D=F6n_Armstr=F3ng?= <[EMAIL PROTECTED]>') > + > +Turn RFC-1522 names into the UTF-8 equivalent. [#61342 et al] Bug reference should probably live in a comment rather than being exported to the documentation etc. > + > +=cut > + > +# Set up the default rfc1522 decoder, which turns all charsets that > +# are supported into the appropriate UTF-8 charset. > +MIME::WordDecoder->default(new MIME::WordDecoder(['*' => sub {my > ($data,$charset) = @_; > + $charset =~ > s/^(UTF)\-(\d+)/$1$2/i; > + return $data > unless utf8_supported_charset($charset); > + return > to_utf8({-string => $data, > + > -charset => $charset, > + }); > + }, > + ],)); This should be in a BEGIN block, I think. I've also reformatted to 80 columns. > @@ -181,7 +181,7 @@ > $indexentry .= htmlpackagelinks($status{package}, 0); > > $indexentry .= "Reported by: <a href=\"" . submitterurl($status{originator}) > - . "\">" . htmlsanit($status{originator}) . "</a>;\n"; > + . "\">" . htmlsanit(de_rfc1522($status{originator})) . > "</a>;\n"; > > $indexentry .= "Owned by: " . htmlsanit($status{owner}) . ";\n" > if length $status{owner}; I realise it's a database format change, but I'd really prefer to have the metadata files be pure UTF-8, so that we don't have to process them for display every time, and to make things like searching easier. We can always write a migration script. > @@ -369,7 +369,9 @@ > $normstate= 'go'; > push @mail, $_; > } elsif ($normstate eq 'html') { > - $this .= $_; > + # XXX This is broken. The BTS shouldn't be writing > + # HTML into the log... but it does. > + $this .= de_rfc1522($_); > } elsif ($normstate eq 'go') { > s/^\030//; > if (!$suppressnext && !$found_msgid && Not sure that qualifies for an XXX comment here. The right place to note the design flaw is really in Debbugs/Log.pm, which I've done. Debbugs::Log also provides a better basis for designing the migration to a new, less broken-by-design record type. Again, I think the html record should really be in UTF-8, and a migration script written. > @@ -409,7 +411,7 @@ > $thisheader = "<h2>Message sent:</h2>\n"; > } else { > s/\04/, /g; s/\n$//; > - $thisheader = "<h2>Message sent to > ".htmlsanit($_).":</h2>\n"; > + $thisheader = "<h2>Message sent to > ".htmlsanit(de_rfc1522($_)).":</h2>\n"; > } > $this = ""; > $normstate= 'kill-body'; And again. Basically, I think mail text should be preserved as-is in the log (since it means you can send out notifications by mail and preserve the MIME structure as it was received), but everything else counts as metadata and should be in a canonical form. > Index: scripts/process.in > =================================================================== > RCS file: /cvs/debbugs/source/scripts/process.in,v > retrieving revision 1.86 > diff -u -r1.86 process.in > --- scripts/process.in 5 Aug 2004 15:09:30 -0000 1.86 > +++ scripts/process.in 11 Jan 2005 16:43:49 -0000 > @@ -17,6 +17,9 @@ > require "$lib_path/errorlib"; > $ENV{'PATH'} = $lib_path.':'.$ENV{'PATH'}; > > +# for de_rfc1522 > +use Debbugs::MIME qw(de_rfc1522); > + > chdir( "$gSpoolDir" ) || die "chdir spool: $!\n"; > > #open(DEBUG,"> /tmp/debbugs.debug"); No need for the comment. > @@ -252,6 +255,7 @@ > } > $markedby= $header{'from'} eq $replyto ? $replyto : > "$header{'from'} (reply to $replyto)"; > + $markedby = de_rfc1522($markedby); > if ($codeletter eq 'F') { > (&appendlog,&finish) if length($data->{forwarded}); > $receivedat= "[EMAIL PROTECTED]"; I'd suggest instead simply: - $header{$v} = $_; + $header{$v} = de_rfc1522($_); ... and likewise in scripts/service.in. That way, all metadata trivially gets decoded with very little effort, and is stored in canonical form as suggested above. > @@ -430,6 +434,7 @@ > X-$gProject-PR-Package: $data->{package} > X-$gProject-PR-Keywords: $data->{keywords} > Reply-To: [EMAIL PROTECTED] > +Content-Type: text/plain; charset="utf-8" > > This is an automatic notification regarding your $gBug report > #$ref: $data->{subject}, This gets more complicated, and is one of the reasons I'd been avoiding this work. The right way to do this, and fix a few other bugs along the way, is to attach the text of the original report using MIME rather than simply appending it; that way, it can have its original content-type, including character set. So, I'll avoid this part for the time being. We won't be making anything significantly worse by doing this in two stages anyway, since it's already broken. Do we need to re-RFC1522-encode headers in outgoing messages? I think we probably do. MIME::Words knows how to do that, and despite the warning in its documentation I think the encode_mimewords() function is safe for encoding from UTF-8. Cheers, -- Colin Watson [EMAIL PROTECTED] -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]