"Roger Shum" <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Tue, 20 Feb 2007 18:55:53 -0800:
> I am using PAN 0.123 under Fedora Core 6.
> I would like to know if there is any way to change
> the character encoding in the header pane.

As I only know English, I've not had to worry about such details here, and therefore have little personal experience with the frustrations of "alternate charset Internet message headers". As such, I don't know the specifics of pan's behavior. However, I understand the issue (to a degree) from a technical perspective. Perhaps an explanation will help clarify why it's a problem, anyway.

(It's possible you know more than I do on this. If so, just consider that this may be helpful to others reading, and of course correct me if I screwed something up. =8^0 After all, I'm here to learn as well as to help where I can. =8^)

The problem is that, as with much of computer technology, the original specifications were designed only with US-ASCII in mind -- little if any thought was given to truly international standards until the original specifications were well laid out and accepted by all implementations, making change difficult. Internet message headers as used in both mail and news were originally specified as 7-bit US-ASCII/ANSI, and for backward compatibility and interoperability, that remains the case at the "raw" level.

The standards were later adapted to allow other charsets in the message body: the MIME standards (see RFCs 2045-2049) specify the charset as part of the content-type header (as for example content-type: text/plain; charset="iso-8859-1"). However, that header isn't typically downloaded as part of the "overview" headers, so it's only available /after/ the message itself is downloaded. As such, it /cannot/ effectively specify the charset to be used for the subject, author, and other headers shown in the header pane. Headers /must/ remain 7-bit US-ASCII/ANSI, or they /will/ break interoperability and backward compatibility.
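To make the content-type point concrete, here is a minimal sketch in Python (not pan's own code, which I haven't looked at). The sample article text, the `OVERVIEW_ONLY` string and the `body_charset` helper are all made up for illustration; only the standard library's `email` module is real. It shows that the charset label can be read once the full article is in hand, and that nothing is there to read when only overview-style headers are available.

```python
from email import message_from_string

# Hypothetical full article, as it would look after downloading the message.
RAW_ARTICLE = (
    "From: someone@example.invalid\n"
    "Subject: plain ASCII subject\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "\n"
    "Body text goes here.\n"
)

# Hypothetical stand-in for overview data: a few headers, no Content-Type.
OVERVIEW_ONLY = (
    "From: someone@example.invalid\n"
    "Subject: plain ASCII subject\n"
    "\n"
)


def body_charset(raw_text, default="us-ascii"):
    """Return the charset named in Content-Type, or `default` if none is given."""
    msg = message_from_string(raw_text)
    # get_content_charset() returns None when there is no charset parameter
    # (or no Content-Type header at all).
    return msg.get_content_charset() or default


if __name__ == "__main__":
    print(body_charset(RAW_ARTICLE))    # charset taken from the Content-Type header
    print(body_charset(OVERVIEW_ONLY))  # nothing to read, so the default is used
```

With only the overview data, the client has no label to go on, which is exactly the header-pane situation described above.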
There is, however, a later workaround: the MIME "encoded-word" format for headers, specified in RFC 2047. I've never found the details useful enough to take the necessary time to grok, but in somewhat vague and likely not entirely correct terms, a header can include an inline charset label (ISO-8859-1, for example) followed by the text itself, encoded so that it still travels as plain ASCII. Clients understanding and implementing the format can then decode the characters and display them as appropriate. Clients not understanding/implementing the format will instead display the raw 7-bit US-ASCII version, including the ISO-xxxx reference and what will look like a bunch of random characters that form the raw encoded header or portion thereof.

The problem in practice is that, because it was added after the fact, various implementations have minor or not so minor differences, and aren't necessarily entirely interoperable and compatible with one another. Where they aren't compatible, they will normally simply fall back to displaying the "raw" US-ASCII encoding.

In fact, according to what I've read (I've never had reason to find out personally), it's not uncommon at all for various "foreign" newsgroups and mailing lists to omit the ISO-xxxx or whatever bit entirely, and simply assume that all headers are encoded in the "native" charset. This DEFINITELY breaks any pretense or hope of standardized interoperability, as there's simply no way for a client to independently determine which charset a specific message may be using.
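As a rough illustration of both cases (again a sketch in Python, not how pan actually does it), the snippet below decodes a header that carries the inline charset label described in RFC 2047, and falls back to a guessed charset when the header is raw 8-bit text with no label at all. The `decode_header_value` function and its `fallback_charset` parameter are hypothetical names of my own; `decode_header` is the real standard-library call.

```python
from email.errors import HeaderParseError
from email.header import decode_header


def decode_header_value(raw: bytes, fallback_charset: str = "big5") -> str:
    """Decode one header value as received from the server.

    `fallback_charset` is an assumption standing in for a per-group
    default; it is only a guess and may well be wrong for a given message.
    """
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        # Unlabelled 8-bit header: nothing in the message says which charset
        # it uses, so all we can do is apply the guess.
        return raw.decode(fallback_charset, errors="replace")

    try:
        parts = decode_header(text)
    except HeaderParseError:
        # Malformed encoded-word: show the raw ASCII form instead.
        return text

    pieces = []
    for part, charset in parts:
        if isinstance(part, bytes):
            try:
                pieces.append(part.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # Charset label not known to this Python: fall back to the guess.
                pieces.append(part.decode(fallback_charset, errors="replace"))
        else:
            # Headers without any encoded-word come back as plain str.
            pieces.append(part)
    return "".join(pieces)


if __name__ == "__main__":
    print(decode_header_value(b"=?iso-8859-1?q?caf=E9?="))  # labelled header
    print(decode_header_value("\u4e2d\u6587".encode("big5")))  # unlabelled, guessed
```

Note that the second case only comes out right because the guess happens to match; feed the same bytes to a client guessing a different charset and you get garbage, which is the interoperability problem in a nutshell.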
Still, it's not entirely unreasonable to expect (in this case) native Chinese charsets (BIG5) in dedicated Chinese groups, or native Japanese in Japanese groups, or Korean in Korean groups, or..., just as it wasn't unreasonable (at the time) for programmers, who after all faced much stricter memory and bandwidth restrictions, to assume US-ASCII in the original implementation, particularly given that the Internet originated as a US-DoD project and was only available to the US military/scientific/educational establishment. It's only the folks trying to view different groups (or mail messages) in different languages, or to a lesser extent those using software not natively developed in their language and not fully adapted to it, that have issues.

With that background, it's not unexpected that pan would have problems in the area. Even the best clients will have occasional problems here, due as I said to the incomplete standardization and the resulting partially incompatible implementations.

All that said, I suspect the group prefs default charset setting /should/ be resolving the worst of the issue. Perhaps it's broken at the moment? If you are a good coder, Charles has always been fairly receptive to patches, and I'm relatively sure he'd love to have someone who actually knows the territory and works with it daily, working with him on better i18n/l10n (internationalization and localization) support for pan. If you aren't such a coder yourself, perhaps you can be instrumental in helping to arrange for someone else with the requisite knowledge to take an interest, to everyone's benefit. =8^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users