> On Jan 25, 2017, at 4:26 PM, Wietse Venema wrote:
>
>> Even fancier would be dynamically adjusting the database encoding to
>> UTF-8 when the client includes the "SMTPUTF8" ESMTP parameter in its
>> "MAIL" command, since, presumably, all non-ASCII data in the SMTP
>> dialogue are then UTF-8 encoded (and can be validated as such
>> before query construction).
>
> That should work, at least for information in SMTP commands. Not
> sure what happens with (canonical) header rewriting, header_checks,
> etc.

My reading of RFCs 6531/6532 is that when a client signals SMTPUTF8,
any non-ASCII content in message headers can be assumed to be UTF-8
(such content is otherwise illegal). So one might either reject
messages whose non-ASCII header content is not valid UTF-8, or just
leave such input unchanged (skip table lookups).
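To make the two options concrete, here is a minimal Python sketch of the per-header decision. It assumes a hypothetical helper `classify_header_value` (not a Postfix API) that is told whether the client sent SMTPUTF8. With SMTPUTF8 it validates the raw bytes as UTF-8 and either looks them up or, if they are invalid, skips or rejects; without SMTPUTF8 it falls back to a single-byte identity decoding (LATIN1), so every byte still maps to exactly one character.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    LOOKUP = "lookup"   # perform the table lookup on the decoded key
    SKIP = "skip"       # leave the input unchanged, no table lookup
    REJECT = "reject"   # refuse the message


def classify_header_value(
    raw: bytes, smtputf8: bool, reject_invalid: bool = False
) -> Tuple[Action, Optional[str]]:
    """Hypothetical helper: decide how to treat a raw header value.

    Returns the action and the decoded lookup key (None when no lookup
    should happen).
    """
    if not smtputf8:
        # No SMTPUTF8 promise: use a single-byte identity encoding
        # (LATIN1), so each byte maps to exactly one character.
        return Action.LOOKUP, raw.decode("latin-1")
    try:
        # With SMTPUTF8, non-ASCII content must be UTF-8: validate it.
        return Action.LOOKUP, raw.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid UTF-8 despite the SMTPUTF8 promise: skip or reject.
        return (Action.REJECT if reject_invalid else Action.SKIP), None
```

For example, `classify_header_value(b"Gr\xfc\xdfe", smtputf8=True)` yields `(Action.SKIP, None)`, because the byte 0xFC never occurs in valid UTF-8, while the same bytes without SMTPUTF8 are looked up as the LATIN1 string "Grüße". Whether to skip or reject is left to the `reject_invalid` switch, mirroring the two options above.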

Mind you, IIRC we don't yet have an interface to pass encoding
information to table drivers, such that UTF-8 could be enabled
when the client promises SMTPUTF8, and the "C" locale (or an
equivalent single-byte identity encoding such as "LATIN1") used
otherwise.
Thus, for example, in PCRE tables I don't recall a way to enable
UTF-8 matching only for known UTF-8 input, or to mark the table
as valid only for UTF-8 input (if it enables UTF-8 in its match
patterns).
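A minimal Python sketch of what such an interface might look like, under stated assumptions: a hypothetical `RegexpTable` class (not Postfix's pcre_table, with Python's `re` module standing in for PCRE) that can be marked UTF-8-only, and a `key_is_utf8` flag passed in by the caller, i.e. the encoding information that table drivers currently lack. In PCRE2 the corresponding switch is the `PCRE2_UTF` compile option; here it is modelled by compiling `str` patterns (character matching) versus `bytes` patterns (byte matching).

```python
import re
from typing import List, Optional, Tuple


class RegexpTable:
    """Hypothetical regexp lookup table that can be marked UTF-8-only."""

    def __init__(self, rules: List[Tuple[str, str]], utf8_only: bool) -> None:
        self.utf8_only = utf8_only
        if utf8_only:
            # Text patterns: '.' matches one Unicode character.
            self.rules = [(re.compile(p), r) for p, r in rules]
        else:
            # Byte patterns ("C"-locale style): '.' matches one byte.
            # latin-1 maps each pattern char U+0000..U+00FF to one byte
            # (other characters would raise UnicodeEncodeError).
            self.rules = [(re.compile(p.encode("latin-1")), r) for p, r in rules]

    def lookup(self, key: bytes, key_is_utf8: bool) -> Optional[str]:
        """Return the result of the first matching rule, else None."""
        if not self.utf8_only:
            for pattern, result in self.rules:
                if pattern.search(key):
                    return result
            return None
        if not key_is_utf8:
            return None  # table is valid only for UTF-8 input: skip
        try:
            subject = key.decode("utf-8")
        except UnicodeDecodeError:
            return None  # caller's UTF-8 promise was wrong: skip
        for pattern, result in self.rules:
            if pattern.search(subject):
                return result
        return None
```

With a rule `^.{5}$`, the UTF-8-only table matches the key "Grüße" (five characters), while the byte table does not, because the same key is seven bytes in UTF-8. The UTF-8-only table simply declines the lookup when the caller cannot vouch for UTF-8 input, which is the "skip table lookups" option from the previous paragraph.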
--
Viktor.