On Sat, Nov 22, 2025 at 02:31:41AM -0500, Petr Menšík wrote:
> Yes, it is binary data. Any binary content is permitted. It depends only on how you choose to display it.
Then the requirement about UTF-8 is vacuous and should be removed from the document. The problem with that approach, of course, is that the interoperability argument for standardizing this at all is rather harmed. But since there's a semantics to the tags in a TXT RR implementing the specification, it's sort of hard to believe it's really just a matter of whatever the display wants to do.
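To make the tension concrete: "any binary content is permitted" and "must be UTF-8" cannot both hold, because many byte strings are not valid UTF-8 at all. A minimal sketch, assuming the check is applied to the raw bytes of a TXT character-string; the function name and the sample byte strings are illustrative and do not come from the draft.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative payloads only; none of these are taken from the draft.
samples = [
    "price: 100 \u20ac".encode("utf-8"),  # well-formed UTF-8 text
    b"\xff\xfe\x00\x01",                  # arbitrary binary, not UTF-8
    b"\xc0\xaf",                          # overlong encoding, rejected
]

for raw in samples:
    print(raw, is_valid_utf8(raw))
```

A check like this only says whether the bytes are well-formed UTF-8. It says nothing about normalization, invisible characters, or case, which is where the rest of this message goes.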
> How did we get from binary-only data to automatic processing and a normalization form of some kind? If it contains only printable characters in a verified encoding, it is reasonably safe not to escape each byte.
To the extent I understand this, I'm pretty sure I disagree. I don't think this is the right list for elementary education about text encoding; but you seem already to have "only printable characters" in your assumptions there, so perhaps I'll ask you this: Are ZWJ and ZWNJ printable or not? If you don't understand the question, don't know what the answer is, or don't know why that is a trick question, then I suggest you can't wave away this set of problems.

If instead you want to say that literally _any_ binary data is allowed, then say that. If you want to suggest something other than some PRECIS profile (I think it was in this thread that Paul Hoffman made a different suggestion), then do that. But the IETF has made a hash of internationalization over and over again precisely because of specification writers waving away the problems inherent in writing systems and their encodings online.

Perhaps not ironically, one of the earliest examples of this is the DNS itself, which is "8 bit clean" except, of course, for that little part where some bits match other bits in some cases (this is case folding). The intermediate systems are supposed to cache the original form, but that doesn't always happen.
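For anyone who wants to poke at the ZWJ/ZWNJ question, here is a small sketch using Python's standard unicodedata module. The helper name and the emoji sequence are just illustrations. Python's own isprintable() is one definition of "printable" among several, which is rather the point.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def describe(ch: str) -> dict:
    """Collect a few facts about a single code point."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # "Cf" means Format
        "isprintable": ch.isprintable(),
        "utf8": ch.encode("utf-8").hex(),
    }


for ch in (ZWNJ, ZWJ):
    print(describe(ch))

# A "family" emoji is three emoji joined by ZWJ. Dropping the joiners
# leaves valid UTF-8 but a different string with a different rendering.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
stripped = family.replace(ZWJ, "")
print(len(family), len(stripped), family == stripped)
```

Both code points are valid UTF-8, both are in category Cf, and Python reports both as not printable, yet removing them changes what a user sees. A filter on "printable" would therefore either reject legitimate text or need a more careful definition than the word suggests.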
>> record, or only in the ftxt subtype. For instance, is the host part of an furi entry required to be an ASCII string (i.e. if it's an IDN, must it be the A-label form?) or may it include UTF-8 strings beyond the ASCII-equivalent range? It seems to me it would be valuable to specify which is meant.
> Domain labels are out of scope; IDN is unrelated.
I find that a little hard to swallow given that the content of a furi entry is a URI.
> They have to be compared case-insensitively, which requires decoding each code point and locating the proper lowercase/uppercase letter matching the source.
Great. What is the uppercase match of â? Is it different to the uppercase match of ä? All the time in every language? (a hint: IDNA2008 solves this problem for you, so there's never a case where the answer to this is ambiguous for an a-label/u-label pair.)
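A few cases where "decode each code point and find the matching letter" goes wrong can be sketched with Python's built-in string methods. The two comparison helpers are hypothetical names, and the normalize-then-casefold recipe is only one possible choice, not something the draft specifies.

```python
import unicodedata


def naive_equal(a: str, b: str) -> bool:
    """Per-string lowercasing, with no normalization."""
    return a.lower() == b.lower()


def folded_equal(a: str, b: str) -> bool:
    """Normalize to NFC, apply Unicode case folding, normalize again."""
    def fold(s: str) -> str:
        folded = unicodedata.normalize("NFC", s).casefold()
        return unicodedata.normalize("NFC", folded)
    return fold(a) == fold(b)


precomposed = "\u00e2"   # a with circumflex as one code point
decomposed = "a\u0302"   # "a" followed by COMBINING CIRCUMFLEX ACCENT

# Same text to a reader, different code points and different bytes.
print(naive_equal(precomposed, decomposed))
print(folded_equal(precomposed, decomposed))

# Case mapping is not always one code point to one code point.
print("\u00df".upper())        # sharp s uppercases to two letters
print(len("\u0130".lower()))   # dotted capital I lowercases to two code points

# Lowercasing and case folding disagree for sharp s.
print(naive_equal("STRASSE", "stra\u00dfe"))
print(folded_equal("STRASSE", "stra\u00dfe"))
```

These mappings are also locale-independent in Python: "I".lower() is always "i", which is not what a Turkish reader expects. Any specification that says "compare case-insensitively" has to pick which of these behaviours it means.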
> The content of records remains application-specific. If I look at the google.com TXT response, I do not see any escaped data, even though it also contains binary content in some base64 encoding.
Sure, but we weren't talking about any TXT record. I was talking about draft-davids-forsalereg, not google.com TXT records, so I don't understand how they are relevant.

Best regards,

A

-- 
Andrew Sullivan
