Re: How to filter based on "garbage" subjects ... ?

Cyrus Daboo Tue, 30 Sep 2003 09:37:50 -0700

Hi Marc,

--On Tuesday, September 30, 2003 11:32 -0300 "Marc G. Fournier" <[EMAIL PROTECTED]> wrote:

|
| I've yet to be able to come up with a sieve rule that will allow me to
| filter all "garbage" subjects to a separate folder ... you know the ones
| that look like:
|
| Subject: =?euc-kr?q?(=B1=A4=B0=ED)=B5=F0=C1=F6=
|
| I've even tried to use Pine filtering to filter based on 8bit subjects,
| but it doesn't pick them up either ...
|
| For instance, under Pine, if I try to select all subjects with =B1= in
| them, which the above contains, it selects nothing, so I'm figuring there
| has to be some control characters in there somewhere ... ?
|
| Thoughts?
|

From the SIEVE RFC:


|       Implementations decode header charsets to UTF-8.  Two strings are
|       considered equal if their UTF-8 representations are identical.
|       Implementations should decode charsets represented in the forms
|       specified by [MIME] for both message headers and bodies.
|       Implementations must be capable of decoding US-ASCII, ISO-8859-1,
|       the ASCII subset of ISO-8859-* character sets, and UTF-8.

In other words, SIEVE should be decoding the =?euc-kr?.... header into its UTF-8 form BEFORE doing the comparison with the text you provide. The quoted-printable =B1 byte will already have been decoded, as part of a euc-kr character, into that character's UTF-8 representation, and so it won't match the literal text "=B1=" in your rule. On top of that, euc-kr is a multibyte character set, so a single Unicode character here is made up of two bytes: =B1 and =A4 together. By my reckoning that is the Unicode character 0xad11 - I'll leave you to work out the UTF-8 encoding of that!
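To make the decoding step concrete, here is a minimal sketch in Python (not Sieve, and not how any particular Sieve implementation does it) of what happens to such a header before a comparison. The helper name decode_subject is made up for illustration, and the sample header is a shortened, properly terminated version of the subject quoted above, containing only the parenthesised part.

```python
from email.header import decode_header

def decode_subject(raw):
    """Decode RFC 2047 encoded-words in a header value into a Unicode string."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus the charset they declared.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Plain, unencoded text is already a str.
            parts.append(chunk)
    return "".join(parts)

# Shortened sample: only the "(=B1=A4=B0=ED)" part of the quoted subject.
raw_subject = "=?euc-kr?q?(=B1=A4=B0=ED)?="
decoded = decode_subject(raw_subject)

# The two bytes B1 A4 form one euc-kr character, U+AD11.
first_char = decoded[1]
utf8_bytes = first_char.encode("utf-8")

print(hex(ord(first_char)))      # 0xad11
print(utf8_bytes.hex(" "))       # ea b4 91
print("=B1=" in decoded)         # False: the raw escape text is gone after decoding
```

The point of the last line is that a rule looking for the literal text "=B1=" has nothing to match once the header has been decoded; a rule would have to contain the decoded characters themselves, in UTF-8.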

Basically you are going to have a hard time trying to filter on arbitrary Unicode characters from some random character set, given that SIEVE expects UTF-8 in its scripts.

--
Cyrus Daboo
