[DNSOP] Re: Character encoding in DNS

Andrew Sullivan Fri, 21 Nov 2025 18:59:56 -0800

On Sat, Nov 22, 2025 at 02:31:41AM -0500, Petr Menšík wrote:

Yes, it is binary data. Any binary content is permitted. It only depends in what ways you choose to display it.


Then the requirement about UTF-8 is vacuous and should be removed from the 
document.  The problem with that approach, of course, is that the 
interoperability argument for standardizing this at all is rather harmed.  But 
since there's a semantics to the tags in a TXT RR implementing the 
specification, it's sort of hard to believe it's really just a matter of 
whatever the display wants to do.

How we got from binary only data to automatic processing, normalization form of some kind? If it contains only printable characters encoded in verified encoding, it is reasonably safe to not escape each byte.


To the extent I understand this, I'm pretty sure I disagree.  I don't think this is the right list 
for elementary education about text encoding; but you seem already to have "only printable 
characters" in your assumptions there, so perhaps I'll ask you this: Are ZWJ and ZWNJ 
printable or not?  If you don't understand the question, don't know what the answer is, or don't 
know why that is a trick question then I suggest you can't wave away this set of problems.  If 
instead you want to say that literally _any_ binary data is allowed, then say that.  If you want to 
suggest something other than some PRECIS profile (I think it was in this thread that Paul Hoffman 
made a different suggestion), then do that.  But the IETF has made a hash of internationalization 
over and over again precisely because of specification writers waving away the problems inherent in 
writing systems and their encodings online.  Perhaps not ironically, one of the earliest examples 
of this is the DNS itself, which is "8 bit clean" except, of course, for that little part 
where some bits match other bits in some cases (this is case folding).  The intermediate systems 
are supposed to cache the original form, but that doesn't always happen.
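
To make the two points above concrete, here is a minimal Python sketch. The helper names (describe, dns_ascii_fold, dns_equal) are mine and purely illustrative; they are not taken from any draft or resolver implementation. It asks the Unicode database what ZWJ and ZWNJ are, and it models the DNS comparison rule as folding only the ASCII letters A-Z, which is my reading of the "8 bit clean except for case" behaviour.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def describe(ch: str) -> tuple:
    """Return (name, general category, Python's isprintable() verdict)."""
    return (unicodedata.name(ch), unicodedata.category(ch), ch.isprintable())


def dns_ascii_fold(data: bytes) -> bytes:
    """Fold only the octets for ASCII 'A'..'Z'; leave every other octet alone."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


def dns_equal(a: bytes, b: bytes) -> bool:
    """Compare two octet strings the way an ASCII-only case fold would."""
    return dns_ascii_fold(a) == dns_ascii_fold(b)


zwj_info = describe(ZWJ)
zwnj_info = describe(ZWNJ)

# ASCII letters match regardless of case.
ascii_match = dns_equal(b"Example", b"eXAMPLE")

# The UTF-8 octets of "Ä" and "ä" contain no ASCII letters, so nothing folds.
non_ascii_match = dns_equal("\u00c4".encode("utf-8"), "\u00e4".encode("utf-8"))
```

Both joiners are general category Cf (format), so a naive "printable" test rejects them, even though they change how text is rendered in some scripts. That is the trick in the question. The last comparison shows that an ASCII-only fold does nothing useful for anything outside ASCII.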

record, or only in the ftxt subtype. For instance, is the host part of an furi entry required to be an ASCII string (i.e. if it's an IDN, must it be the A-label form?) or may it include UTF-8 strings beyond the ASCII-equivalent range? It seems to me it would be valuable to specify which is meant.

Domain labels are out of scope, IDN is unrelated.


I find that a little hard to swallow given that the content of a furi entry is 
a URI.
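
As a rough illustration of why the host part matters, the following Python sketch takes a URI whose host contains a non-ASCII label and produces an all-ASCII form. The URI and the helper names (to_a_label, host_to_ascii) are hypothetical. Note the assumption: this uses the raw Punycode codec plus an "xn--" prefix only, and performs none of the validity checks that IDNA2008 requires, so it is not a conforming A-label conversion.

```python
from urllib.parse import urlsplit

# Hypothetical URI, used only for illustration.
EXAMPLE_URI = "https://b\u00fccher.example/for-sale"


def to_a_label(label: str) -> str:
    """Punycode-encode a non-ASCII label and add the ACE prefix.

    Assumption: no IDNA2008 validation or mapping is applied here.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def host_to_ascii(uri: str) -> str:
    """Extract the host from a URI and convert it label by label."""
    host = urlsplit(uri).hostname or ""
    return ".".join(to_a_label(label) for label in host.split("."))


unicode_host = urlsplit(EXAMPLE_URI).hostname
ascii_host = host_to_ascii(EXAMPLE_URI)
```

The same furi content can therefore carry either spelling of the host, and two consumers that disagree on which one is expected will not compare them as equal.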

They have to be compared case-insensitive, which require to decode each code point and locate proper lowercase/uppercase letter matching the source.


Great.  What is the uppercase match of â? Is it different to the uppercase 
match of ä?  All the time in every language?  (a hint: IDNA2008 solves this 
problem for you, so there's never a case where the answer to this is ambiguous 
for an a-label/u-label pair.)
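
For anyone who wants to poke at this, a small Python sketch follows. It uses Python's built-in, locale-independent case mapping, which is only one of several possible answers; a language-tailored mapping (Turkish is the usual example) gives different results for some letters. The variable names are mine.

```python
import unicodedata

composed = "\u00e2"      # "â" as a single code point
decomposed = "a\u0302"   # "a" followed by COMBINING CIRCUMFLEX ACCENT

# Same text to a reader, different octets on the wire.
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")

# They only compare equal after normalization.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Default (untailored) uppercase mappings.
upper_a_circumflex = "\u00e2".upper()   # â
upper_a_diaeresis = "\u00e4".upper()    # ä

# Case mapping is not always one code point to one code point.
upper_sharp_s = "\u00df".upper()        # ß

# Two different lowercase letters can share one uppercase form.
upper_dotless_i = "\u0131".upper()      # ı
upper_dotted_i = "i".upper()
```

So "decode each code point and locate the matching letter" already needs a decision about normalization, about mappings that change the length of the string, and about which language's rules apply.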

Content of records remain application specific. If I look on google.com TXT response, I do not see any escaped data. Even if it contains also binary contents in some base64 encoding.


Sure, but we weren't talking about any TXT record.  I was talking about 
draft-davids-forsalereg, not google.com TXT records, so I don't understand how 
they are relevant.

Best regards,

A

--
Andrew Sullivan
[email protected]
