Yes, it is binary data. Any binary content is permitted. It only depends on how you choose to display it.

On 19/11/2025 21:58, Andrew Sullivan wrote:
On Wed, Nov 19, 2025 at 07:09:13PM -0500, Marco Davids (IETF) wrote:

That said, I prefer not to pre-emptively include such guidance in my draft.

The current text seems sufficient and in line with the style and intent of other I-Ds and RFCs:

It includes a paragraph ensuring interoperability (e.g., input from a sender such as 'この美しいドメイン名を購入してください。' is correctly interpreted by the receiver) and cautions in Security Considerations on careful parsing.

The advice is inadequate, if you're going to require people to interpret a series of octets as octets in a UTF-8-encoded string. At the very least, you need to specify whether automatic processing of any kind of that content is permitted. If it _is_ permitted (and it would appear to me that it is, given what you say later about careful parsing &c.), then it seems to me you're going to have to specify limits on what code points may or may not be included, normalization forms, &c. If you don't specify all of that, then attempting to interpret the octets in the RDATA as being UTF-8-encoded strings will be at least fragile.
How did we get from binary-only data to automatic processing and normalization forms of some kind? If the data contains only printable characters in a verified encoding, it is reasonably safe not to escape each byte.


It is not clear from the rest of the document whether the "use UTF-8" principle is in effect for all the subtypes possible in the record, or only in the ftxt subtype.  For instance, is the host part of an furi entry required to be an ASCII string (i.e. if it's an IDN, must it be the A-label form?) or may it include UTF-8 strings beyond the ASCII-equivalent range?  It seems to me it would be valuable to specify which is meant.

Best regards,

A

Domain labels are out of scope, and IDN is unrelated. Labels have to be compared case-insensitively, which requires decoding each code point and finding the matching lowercase or uppercase letter. This is only about how to display the content of records, where no sorting or case-insensitive comparison needs to be done. We have some safety checks so that we do not output complete garbage, and escaping is always possible when in doubt. But no form of normalisation needs to be done, because the contents are produced somewhere else.
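
To illustrate the kind of display-only check I mean, here is a rough sketch in Python. It is a hypothetical example, not the code of any actual DNS tool: the function name present_rdata and the exact policy (strict UTF-8 decoding, then a printable-characters test) are my assumptions. The fallback uses the \DDD decimal escape from the RFC 1035 zone file format.

```python
def present_rdata(data: bytes) -> str:
    """Return a display form of raw RDATA octets (hypothetical helper).

    No normalisation or case folding is applied; the octets are only
    checked and, when in doubt, escaped.
    """
    try:
        # Strict decoding rejects malformed UTF-8 sequences.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None

    if text is not None and all(ch.isprintable() for ch in text):
        # Valid UTF-8 made only of printable characters: show it as text,
        # escaping just the backslash and the double quote.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    # Fallback: treat the data as opaque octets and escape everything
    # outside printable ASCII as \DDD (three decimal digits).
    out = []
    for b in data:
        if b in (0x22, 0x5C):          # '"' and '\'
            out.append("\\" + chr(b))
        elif 0x20 <= b <= 0x7E:        # printable ASCII
            out.append(chr(b))
        else:
            out.append(f"\\{b:03d}")
    return "".join(out)


if __name__ == "__main__":
    print(present_rdata("この美しいドメイン名を購入してください。".encode("utf-8")))
    print(present_rdata(b"\xff\x00abc"))
```

With this policy the Japanese sentence from the draft is shown unchanged, while invalid or non-printable input is still displayed unambiguously in escaped form.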

The content of records remains application-specific. If I look at the TXT response for google.com, I do not see any escaped data, even though it also contains binary content in some base64 encoding.

--
Petr Menšík
Senior Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
