Duncan <[EMAIL PROTECTED]> wrote:
>
>Internet message headers as used in both mail and news were originally
>specified as 7-bit US-ASCII/ANSI, and for backward compatibility and
>interoperability, that remains the case at the "raw" level. The problem is
>that while the standards were later adapted to allow other charsets in the
>message body, with the MIME standards (see RFCs 2041-2045)

I'm afraid your numbers are a bit off - e.g. 2041 is "Mobile Network
Tracing" etc. The MIME RFCs are 2045 ("Multipurpose Internet Mail
Extensions (MIME) Part One") to 2049 ("Multipurpose Internet Mail
Extensions (MIME) Part Five"), though 2045-2047 are the actual format
standards. The purpose of this apparent nitpicking will be obvious in a
moment. :-)

>There is however a newer (I believe) and less formal (I'm not sure how far
>it has gotten in the RFC standardization process) workaround. I've never
>found the details useful enough to take the necessary time to grok, but in
>somewhat vague and likely not entirely correct terms, a header can include
>an inline ISO-standard charset reference and then be encoded into it.

I'm not sure why you think this is of any lesser "weight" than the
others, it is in fact one in the sequence that you *should* have given
above, namely 2047 - and it was one of the two original MIME RFCs (1342
in the set of 1341+1342, later 1522 in the set of 1521+1522), and it's
certainly as formal as any standards-track RFC. FWIW, all of 2045-2047
and 2049 are "DRAFT STANDARD".

But anyway, the header is not "encoded into the charset", it *is* in the
charset given by the reference, and encoded into Quoted-"Printable" or
Base64, just like the body may be (there are some minor differences in
the encoding rules vs those that are used for the body). In effect it's
the equivalent of the Content-Type + Content-Transfer-Encoding headers +
the body, but for a single header line, and all contained in that single
header line. The main difference is that the header *must* (according to
the RFCs :-) be encoded to fulfill the 7-bit-only requirement, while the
body may in some cases be unencoded (but labeled via C-T) 8-bit with a
'C-T-E: 8bit' header.

>The problem with implementing this is that because it's newer and less
>formal, various implementations have minor or not so minor differences, and
>aren't necessarily entirely interoperable and compatible with one another.

Hm, well, it's a very simple standard, though I think I recall seeing
some mention of some client getting the white-space/word-break handling
wrong - that's just a plain bug though.

>In fact, according to what I've read (being I've never had reason to
>personally find out), it's not uncommon at all for various "foreign"
>newsgroups and mailing lists to omit the ISO-xxxx or whatever bit entirely,
>and simply assume that all headers are encoded in the "native" charset.

True, and not only that, we may even have "native" charset in the body
without proper Content-Type and Content-Transfer-Encoding headers - the
horror! :-) But I think both this and the common non-2047-compliance of
headers in News are due to a) (as you mention) it's largely superfluous
since the language used in the newsgroup is an implicit indication of
what charset to expect, b) no significant part of Usenet has (or had)
any problems with 8-bit headers or content, and last but not least c)
there has never been a formal standardization on the use of MIME for
Usenet messages (it could probably be argued that there has never been a
formal standardization of the format of Usenet messages at all...). Item
c) might change (see (currently) draft-ietf-usefor-usefor-12.txt), but
I'm not holding my breath - the working group can soon celebrate its
10th anniversary without having managed to produce an RFC. :-)

>With that background, it's not unexpected that pan would have problems in
>the area. Even the best clients will have occasional problems in this area,
>due as I said to the immature/incomplete standardization and resulting
>partially incompatible implementations.

Per above, I would say that it's not a matter of immature/incomplete
standardization of the "MIME header format" (RFC 2047), but of
immature/incomplete standardization of the Usenet message format in
general. I.e.
you can't really claim that a News reader is buggy for not being
MIME-compliant - but on the other hand a good reader (like Pan) is
expected to be able to deal with all kinds of non-standard encodings for
"binary" message bodies, so it might as well do the comparatively
straightforward MIME thing too (even though it's quite unusual for
"binaries" in News).

Anyway, FWIW, Pan (0.122) seems to have no problems decoding headers
such as

Subject: =?iso-8859-1?q?Re:_19"_sk=E4rm_1440x900_p=E5_Linux_Fedora_Core5_-_HJ=C4LP!!!?=

(from <[EMAIL PROTECTED]>), without me telling it anything in
particular. This is just "simple" 8-bit iso-8859/x though, things may be
different with multibyte charsets (I can't determine that, since I have
no way of knowing what the result of the decoding is supposed to look
like). Well, utf-8 which is two bytes per character for the non-ascii
part of iso-8859/x seems to work fine too, but of course that's still
one byte per actual character. And there we get nice stuff like

Subject: Re: =?utf-8?Q?Skapa_filsystem_f?= =?utf-8?Q?=C3=B6r_DVD_med_filer_p?= =?utf-8?Q?=C3=A5_4_GB?=
From: =?utf-8?Q?Daniel_Sp=C3=A5ngberg?= <[EMAIL PROTECTED]>

(from <[EMAIL PROTECTED]>) - quoted-*printable* is really a joke at this
point, though admittedly that Subject line demonstrates a particularly
moronic way of doing the encoding.

--Per Hedeland

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users
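
As a footnote to the RFC 2047 description above, here is a minimal
Python sketch (nothing taken from Pan) of how an encoded-word carries
its own charset label and transfer encoding inside one header token.
The helper names decode_encoded_word and decode_header_value are made up
for illustration, and the sketch assumes the charset label is a codec
name that Python recognises. Only email.header.decode_header and
make_header are actual standard-library routines.

```python
import base64
import re
from email.header import decode_header, make_header

# Layout of one RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")
HEX_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")


def decode_encoded_word(word: str) -> str:
    """Decode a single encoded-word by hand (hypothetical helper)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        # "B" is plain Base64.
        raw = base64.b64decode(payload)
    else:
        # "Q" is close to quoted-printable: "_" stands for a space and
        # "=XX" for the byte with hex value XX.
        ascii_bytes = payload.replace("_", " ").encode("ascii")
        raw = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), ascii_bytes)
    # Assumption: the charset label is a codec name Python recognises.
    return raw.decode(charset)


def decode_header_value(value: str) -> str:
    """Decode a whole header value using the standard library."""
    return str(make_header(decode_header(value)))
```

Fed the From: token quoted above, decode_encoded_word should give back
"Daniel Spångberg". decode_header_value applied to the three-fragment
Subject should join the pieces into one phrase, since white space
between adjacent encoded-words is dropped on decoding.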