[Pan-users] Re: How to set character encoding for the Header pane?

Duncan Wed, 21 Feb 2007 07:13:10 -0800

"Roger Shum" <[EMAIL PROTECTED]> posted
[EMAIL PROTECTED], excerpted below, on  Tue, 20 Feb 2007
18:55:53 -0800:


> I am using PAN 0.123 under Fedora Core 6.
> I would like to know if there is any way to change
> the character encoding in the header pane.

As I only know English I've not had to worry about such details here and
therefore have little personal experience in the frustrations of "alternate
charset Internet message headers".  As such, I don't know the specifics of
pan's behavior.  However, I understand the issue (to a degree) from a
technical perspective.  Perhaps an explanation will help clarify why it's a
problem, anyway.  (It's possible you know more than me on this.  If so, just
consider that this may be helpful to others reading, and of course correct
me if I screwed something up. =8^0  After all, I'm here to learn as well as
to help where I can, too.  =8^)

The problem is that as with much of computer technology, the original
specifications were designed only with US-ASCII in mind -- little if any
thought was given to truly international standards until the original
specifications were well laid out and accepted by all implementations,
making change difficult.

Internet message headers as used in both mail and news were originally
specified as 7-bit US-ASCII/ANSI, and for backward compatibility and
interoperability, that remains the case at the "raw" level.  The standards
were later adapted to allow other charsets in the message body, with the
MIME standards (see RFCs 2045-2049) specifying the charset as part of the
content-type header (as for example content-type: text/plain;
charset="iso-8859-1").  The problem is that this header isn't typically
downloaded as part of the "overview" headers, so it's only available /after/
the message itself is downloaded.  As such, it /cannot/ effectively specify
the charset to be used for the subject, author, and other headers.  Headers
/must/ remain 7-bit US-ASCII/ANSI, or they /will/ break interoperability and
backward compatibility.
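
To make the point above concrete, here is a minimal Python sketch (Python is
just my choice for illustration; pan itself is not written in it).  The
article bytes are made up, and the list of overview fields is an assumption
about what a typical news server hands out, not something taken from pan.
It shows that the charset only turns up once the whole article, including
its content-type header, has been fetched:

```python
from email import message_from_bytes

# Assumption: fields a typical NNTP server returns in its overview data.
# Note that Content-Type is not among them.
TYPICAL_OVERVIEW_FIELDS = ["subject", "from", "date", "message-id",
                           "references", "bytes", "lines"]

# A made-up article, as it would look once fully downloaded.
RAW_ARTICLE = (
    b"From: someone@example.invalid\r\n"
    b"Subject: plain ASCII subject\r\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

msg = message_from_bytes(RAW_ARTICLE)

# The charset parameter is only known after parsing the full article.
body_charset = msg.get_content_charset()

# It tells us how to turn the raw body bytes into text.
body_text = msg.get_payload(decode=True).decode(body_charset)

# A client that has only the overview data has no charset to go on.
charset_in_overview = "content-type" in TYPICAL_OVERVIEW_FIELDS

print(body_charset, repr(body_text), charset_in_overview)
```

The charset recovered this way describes the body only, so even with the
full article in hand it says nothing reliable about the subject or author
lines.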

There is however a newer (I believe) and less formal (I'm not sure how far
it has gotten in the RFC standardization process) workaround.  I've never
found the details useful enough to take the necessary time to grok, but in
somewhat vague and likely not entirely correct terms, a header can include
an inline ISO-standard charset reference and then be encoded into it.
Clients understanding and implementing the format can then decode the
characters and display them as appropriate.  Clients not understanding/
implementing the format decoding will display instead the raw 7-bit US-ASCII
version, including the ISO-xxxx reference and what will look like a bunch of
random characters that form the raw encoded header or portion thereof.

The problem with implementing this is that because it's newer and less
formal, various implementations have minor or not so minor differences, and
aren't necessarily entirely interoperable and compatible with one another.
Where they aren't compatible, they will normally simply fall back to
displaying the "raw" US-ASCII encoding.

In fact, according to what I've read (as I've never had reason to
personally find out), it's not uncommon at all for various "foreign"
newsgroups and mailing lists to omit the ISO-xxxx or whatever bit entirely,
and simply assume that all headers are encoded in the "native" charset.
This DEFINITELY breaks any pretense or hope of standardized
interoperability, as there's simply no way for a client to independently
determine what specific messages may be using.  Still, it's not entirely
unreasonable to expect (in this case) native Chinese charsets (BIG5) in
dedicated Chinese groups, or native Japanese in Japanese groups, or Korean
in Korean groups, or..., just as it wasn't unreasonable (at the time) for
programmers, who after all faced much stricter memory and bandwidth
restrictions, to assume US-ASCII in the original implementation,
particularly given that the Internet originated as a US-DoD project and was
only available to the US military/scientific/educational establishment.
It's only the folks trying to view different groups (or mail messages) in
different languages, or to a lesser extent those using software not natively
developed in their language and not fully adapted to it, that have issues.
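
The ambiguity of unlabeled 8-bit headers is easy to demonstrate.  In the
Python sketch below, the helper name try_decodings and the list of guesses
are mine, purely for illustration.  The same raw bytes yield correct text,
garbage, or an outright error depending on which charset the client happens
to assume:

```python
def try_decodings(raw, guesses):
    """Decode the same raw header bytes under several assumed charsets.

    Returns a dict mapping each charset name to the decoded text, or to
    None where the bytes are not valid in that charset.
    """
    results = {}
    for charset in guesses:
        try:
            results[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            results[charset] = None
    return results


# An unlabeled subject as it might arrive from a Chinese-language group:
# raw Big5 bytes with nothing saying that they are Big5.
raw_subject = "\u4e2d\u6587".encode("big5")

results = try_decodings(raw_subject, ["big5", "iso-8859-1", "utf-8"])
for charset, text in results.items():
    print(charset, "->", text if text is not None else "(cannot decode)")
```

Nothing in the bytes themselves says which result is the intended one, which
is why the client has to be told, or has to guess.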

With that background, it's not unexpected that pan would have problems in
the area.  Even the best clients will have occasional problems in this area,
due as I said to the immature/incomplete standardization and resulting
partially incompatible implementations.

All that said, I suspect the group prefs default charset setting /should/ be
resolving the worst of the issue.  Perhaps it's broken at the moment?  If
you are a good coder, Charles has always been fairly receptive to patches,
and I'm relatively sure he'd love to have someone who actually knows the
territory and works with it daily, working with him on better i18n/l10n
(internationalization and localization) support for pan.  If you aren't such
a coder yourself, perhaps you can be instrumental in helping to arrange for
someone else with the requisite knowledge to take an interest, to everyone's
benefit. =8^)
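
For anyone tempted to look into it, here is roughly the kind of fallback
logic I'd expect a per-group default charset to drive, again as a Python
sketch.  To be clear, this is NOT pan's actual code; display_header and its
group_charset parameter are hypothetical names made up for the example:

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def display_header(raw, group_charset="iso-8859-1"):
    """Turn raw header bytes into text for display (hypothetical helper).

    1. Pure ASCII: decode any RFC 2047 encoded-words it contains.
    2. Otherwise: assume the group's default charset, replacing any
       bytes that are invalid in that charset rather than failing.
    """
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        # Unlabeled 8-bit data: the group default is the only hint we have.
        return raw.decode(group_charset, errors="replace")
    try:
        return str(make_header(decode_header(text)))
    except (HeaderParseError, LookupError, UnicodeError):
        # Malformed or unknown encoding: fall back to the raw ASCII form.
        return text


plain = display_header(b"plain subject")
labeled = display_header(b"=?iso-8859-1?q?caf=E9?=")
unlabeled = display_header("\u4e2d\u6587".encode("big5"), group_charset="big5")

print(plain, labeled, unlabeled)
```

The labeled form works regardless of the group setting, while the unlabeled
form is only as good as the default charset chosen for that group.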

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users
