[DNSOP] Re: Character encoding in DNS

Petr Menšík Fri, 21 Nov 2025 17:32:30 -0800

Yes, it is binary data. Any binary content is permitted. It only depends on how you choose to display it.


On 19/11/2025 21:58, Andrew Sullivan wrote:

On Wed, Nov 19, 2025 at 07:09:13PM -0500, Marco Davids (IETF) wrote:
That said, I prefer not to pre-emptively include such guidance in my draft.
The current text seems sufficient and in line with the style and intent of other I-Ds and RFCs:
It includes a paragraph ensuring interoperability (e.g., input from a sender such as 'この美しいドメイン名を購入してください。' is correctly interpreted by the receiver) and cautions in Security Considerations on careful parsing.
The advice is inadequate, if you're going to require people to interpret a series of octets as octets in a UTF-8-encoded string. At the very least, you need to specify whether automatic processing of any kind of that content is permitted. If it _is_ permitted (and it would appear to me that it is, given what you say later about careful parsing &c.), then it seems to me you're going to have to specify limits on what code points may or may not be included, normalization forms, &c. If you don't specify all of that, then attempting to interpret the octets in the RDATA as being UTF-8-encoded strings will be at least fragile.

How did we get from binary-only data to automatic processing and normalization forms of some kind? If the content contains only printable characters in a verified encoding, it is reasonably safe not to escape each byte.

It is not clear from the rest of the document whether the "use UTF-8" principle is in effect for all the subtypes possible in the record, or only in the ftxt subtype. For instance, is the host part of an furi entry required to be an ASCII string (i.e. if it's an IDN, must it be the A-label form?) or may it include UTF-8 strings beyond the ASCII-equivalent range? It seems to me it would be valuable to specify which is meant.
Best regards,

A

Domain labels are out of scope, and IDN is unrelated. Labels have to be compared case-insensitively, which requires decoding each code point and locating the matching lowercase or uppercase letter. This is only about how to display the content of records, where no sorting or case-insensitive comparison needs to be done. We have some safety checks to avoid outputting complete garbage, and escaping is always possible when in doubt. But no form of normalisation needs to be done. The contents are produced somewhere else.
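
To make the display rule above concrete, here is a minimal sketch in Python. It is not taken from any resolver or from the draft; the function name render_txt is hypothetical. It assumes the data is shown as-is only when it decodes as strict UTF-8 and every resulting character is printable, and otherwise falls back to the \DDD decimal escapes of the RFC 1035 presentation format. No normalisation or case folding is applied.

```python
def render_txt(data: bytes) -> str:
    """Render one TXT character-string for display (hypothetical helper)."""
    try:
        # "Verified encoding": strict decoding rejects invalid UTF-8.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        text = None

    if text is not None and all(ch.isprintable() for ch in text):
        # Safe to show unescaped; only quote and backslash need care.
        body = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + body + '"'

    # When in doubt, escape: keep printable ASCII, write every other
    # octet as a three-digit decimal escape (\DDD).
    parts = []
    for b in data:
        if b in (0x22, 0x5C):          # '"' and '\'
            parts.append("\\" + chr(b))
        elif 0x20 <= b <= 0x7E:        # printable ASCII
            parts.append(chr(b))
        else:
            parts.append("\\%03d" % b)
    return '"' + "".join(parts) + '"'
```

In this sketch a single control character or invalid byte sends the whole string to the escaped form, which is the conservative choice; a real tool might escape only the offending octets.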

The content of records remains application-specific. If I look at the google.com TXT response, I do not see any escaped data, even though it also contains binary content in some base64 encoding.
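
A short illustration of why base64-wrapped binary shows up without escapes: the base64 alphabet is entirely printable ASCII. The helper name is_plain_printable and the sample payload are made up for this sketch; no real TXT record content is reproduced here.

```python
import base64


def is_plain_printable(data: bytes) -> bool:
    """True if every octet is printable ASCII (0x20..0x7E)."""
    return all(0x20 <= b <= 0x7E for b in data)


# Arbitrary binary payload covering every possible octet value.
raw = bytes(range(256))
wrapped = base64.b64encode(raw)

# The raw octets would need escaping for display; the base64 form would not.
needs_escaping_raw = not is_plain_printable(raw)
needs_escaping_wrapped = not is_plain_printable(wrapped)
```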


--
Petr Menšík
Senior Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB

_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]
