Re: UTF-8, EAI, and pgsql

Viktor Dukhovni Wed, 25 Jan 2017 13:51:43 -0800

> On Jan 25, 2017, at 4:26 PM, Wietse Venema wrote:
> 
>> Even fancier would be dynamically adjusting the database encoding to
>> UTF-8 when the client includes the "SMTPUTF8" ESMTP parameter in its
>> "MAIL" command.  Since, presumably, in that case all non-ASCII data
>> in the SMTP dialogue are then UTF-8 encoded (and can be validated
>> as such before query construction).
> 
> That should work, at least for information in SMTP commands.  Not
> sure what happens with (canonical) header rewriting, header_checks,
> etc.


My reading of RFCs 6531/6532 is that when a client signals SMTPUTF8
any non-ASCII content in message headers can be assumed to be UTF-8
(such content is otherwise illegal).  So one might either reject
messages whose non-ASCII content is not valid UTF-8 on input, or
just leave input that is not valid UTF-8 unchanged (skip table
lookups).
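
A rough sketch of the second option, in Python rather than Postfix's
C, purely for illustration.  The names is_valid_utf8() and
lookup_or_skip() and the dict-backed table are hypothetical, not
Postfix interfaces:

```python
from typing import Mapping, Optional


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def lookup_or_skip(table: Mapping[str, str], raw_key: bytes,
                   smtputf8: bool) -> Optional[str]:
    """Look up raw_key, skipping the lookup for non-ASCII input that
    is not known to be valid UTF-8 (hypothetical helper)."""
    # Pure ASCII keys are safe whether or not SMTPUTF8 was signalled.
    if raw_key.isascii():
        return table.get(raw_key.decode("ascii"))
    # Non-ASCII keys are only trusted when the client promised
    # SMTPUTF8 and the bytes actually validate as UTF-8.
    if smtputf8 and is_valid_utf8(raw_key):
        return table.get(raw_key.decode("utf-8"))
    # Otherwise skip the lookup; the caller leaves the input unchanged.
    return None
```

Here a None result stands for both "no match" and "lookup skipped";
a real driver would probably want to tell those apart.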

Mind you, IIRC we don't yet have an interface to pass encoding
information to table drivers, such that UTF-8 could be enabled
when the client promises SMTPUTF8, and otherwise "C-locale" (or
an equivalent single-byte identity encoding such as "LATIN1").
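
What such an interface might decide can be sketched as follows
(again Python, again hypothetical: choose_client_encoding() and
decode_query_key() do not exist in Postfix; "UTF8" and "LATIN1" are
PostgreSQL encoding names, and how the choice would reach the pgsql
driver is left open):

```python
def choose_client_encoding(smtputf8: bool) -> str:
    """Pick a PostgreSQL client encoding name for this session
    (hypothetical helper)."""
    # UTF8 only when the client promised SMTPUTF8; otherwise fall back
    # to a single-byte encoding in which every byte value is valid.
    return "UTF8" if smtputf8 else "LATIN1"


def decode_query_key(raw: bytes, smtputf8: bool) -> str:
    """Decode a lookup key with the codec matching the chosen encoding.

    Raises UnicodeDecodeError if SMTPUTF8 was promised but the bytes
    are not valid UTF-8; Latin-1 decoding never fails.
    """
    codec = "utf-8" if smtputf8 else "latin-1"
    return raw.decode(codec)
```

With the Latin-1 fallback every byte maps to exactly one character,
so arbitrary 8-bit input can still be passed to the database without
an encoding error.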

Thus, for example, in PCRE tables I don't recall a way to enable
UTF-8 matching only for known UTF-8 input, or to mark the table
as valid for only UTF-8 input (if it enables UTF-8 in its match
patterns).

-- 
        Viktor.
