On 22/11/2025 03:59, Andrew Sullivan wrote:
On Sat, Nov 22, 2025 at 02:31:41AM -0500, Petr Menšík wrote:
Yes, it is binary data. Any binary content is permitted. It only depends on how you choose to display it.

Then the requirement about UTF-8 is vacuous and should be removed from the document.  The problem with that approach, of course, is that the interoperability argument for standardizing this at all is rather harmed.  But since there's a semantics to the tags in a TXT RR implementing the specification, it's sort of hard to believe it's really just a matter of whatever the display wants to do.
I would not call it a requirement. I am not sure which document you are referring to. This is an idea without a formal draft. The fval tag of draft-davids-forsalereg is a good example: "v=FORSALE1;fval=€999" is good for people and should be printed in a human-friendly way, IMO.

How did we get from binary-only data to automatic processing and normalization forms of some kind? If the data contains only printable characters in a verified encoding, it is reasonably safe not to escape each byte.

To the extent I understand this, I'm pretty sure I disagree.  I don't think this is the right list for elementary education about text encoding; but you seem already to have "only printable characters" in your assumptions there, so perhaps I'll ask you this: Are ZWJ and ZWNJ printable or not?  If you don't understand the question, don't know what the answer is, or don't know why that is a trick question then I suggest you can't wave away this set of problems.  If instead you want to say that literally _any_ binary data is allowed, then say that.  If you want to suggest something other than some PRECIS profile (I think it was in this thread that Paul Hoffman made a different suggestion), then do that.  But the IETF has made a hash of internationalization over and over again precisely because of specification writers waving away the problems inherent in writing systems and their encodings online.  Perhaps not ironically, one of the earliest examples of this is the DNS itself, which is "8 bit clean" except, of course, for that little part where some bits match other bits in some cases (this is case folding).  The intermediate systems are supposed to cache the original form, but that doesn't always happen.

Python 3 says neither ZWJ (U+200D) nor ZWNJ (U+200C) is printable. These definitions are corner cases. It is up to higher layers to render text. That is the job of GUI toolkits, terminal emulators and similar.

"\u200C".isprintable() == False

I think native code should use iswprint() to guess whether to escape or not. When the first code point is non-printable, the rest can be escaped as well, since it does not seem to be text for humans. Of course, isprint() must not be used on raw undecoded UTF-8 bytes if the results are to be sensible.
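To make the "decode first, then decide whether to escape" idea concrete, here is a minimal Python sketch. The function name present_txt is hypothetical, and two things are assumptions of this sketch rather than anything agreed in the thread: it checks the whole string instead of only the first code point, and it falls back to zone-file style \DDD decimal escapes.

```python
def present_txt(raw: bytes) -> str:
    """Return a display form for one TXT character-string (hypothetical helper)."""
    try:
        # Strict decoding: invalid UTF-8 means "not text for humans".
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = None

    # isprintable() is applied to decoded code points, never to raw bytes.
    if text is not None and text.isprintable():
        return text

    # Fallback: keep printable ASCII, escape every other byte as \DDD.
    # The double quote (0x22) and backslash (0x5C) are escaped as well.
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E and b not in (0x22, 0x5C) else f"\\{b:03d}"
        for b in raw
    )
```

With this sketch, "v=FORSALE1;fval=€999" is returned unchanged, while a string containing U+200C or invalid UTF-8 comes back fully escaped.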

PRECIS or RFC 9839 is an implementation detail. As long as both result in "háčkyčárky".isprintable() == True, either of them is fine. I think RFC 9839 is sufficient for printing text and is the simpler form, so it is what I would recommend DNS software use. Such software does not need to know what an uppercase letter or a digit is in Arabic. It needs to escape only the specified Problematic Code Points and print the rest of the UTF-8 encoded text as normal text. If a domain name can contain a socks icon, why not a TXT record? Why are no emoticons rendered?
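As a rough illustration of "escape only the problematic code points", the following Python sketch encodes my reading of the RFC 9839 categories: surrogates, legacy controls other than tab, LF and CR, and noncharacters. The exact ranges should be checked against the RFC itself. The function names and the \u{XXXX} escape syntax are assumptions made for this example.

```python
def is_problematic(cp: int) -> bool:
    """Rough check for problematic code points (assumed ranges, verify against RFC 9839)."""
    if 0xD800 <= cp <= 0xDFFF:                      # surrogates
        return True
    if cp < 0x20 and cp not in (0x09, 0x0A, 0x0D):  # C0 controls except tab, LF, CR
        return True
    if 0x7F <= cp <= 0x9F:                          # DEL and C1 controls
        return True
    if 0xFDD0 <= cp <= 0xFDEF:                      # noncharacter block
        return True
    if (cp & 0xFFFE) == 0xFFFE:                     # U+xFFFE and U+xFFFF on every plane
        return True
    return False


def escape_problematic(text: str) -> str:
    """Escape only problematic code points; pass everything else through."""
    return "".join(
        f"\\u{{{ord(c):04X}}}" if is_problematic(ord(c)) else c
        for c in text
    )
```

Note that under these ranges ZWJ and ZWNJ pass through unescaped, so this check is weaker than isprintable(). Which of the two behaviours is wanted is exactly the open question in this thread.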


record, or only in the ftxt subtype. For instance, is the host part of an furi entry required to be an ASCII string (i.e. if it's an IDN, must it be the A-label form?) or may it include UTF-8 strings beyond the ASCII-equivalent range?  It seems to me it would be valuable to specify which is meant.

Domain labels are out of scope, IDN is unrelated.

I find that a little hard to swallow given that the content of a furi entry is a URI.

A URI can have a normalized form that uses only A-labels, with percent-encoded path characters. It can be presented to the user in U-label form, without percent escaping in paths, in a nice way.

https://háčkyčárky.cz/ can be sent on-wire as https://xn--hkyrky-ptac70bc.cz/ without losing any information. The remote party can display it in a form that is nice to humans. Most DNS tools do not present TXT records to users in a similar form.
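The round trip between the wire form and the display form can be sketched in Python as follows. The helper names to_wire and to_display are hypothetical, and the scheme is hardcoded to https for brevity. Python's built-in "idna" codec implements the older IDNA 2003 rules rather than IDNA2008, so this is only an approximation of what a real implementation should do.

```python
from urllib.parse import quote, unquote


def to_wire(host: str, path: str) -> str:
    """Build an ASCII-only URI: A-label host, percent-encoded path."""
    a_host = host.encode("idna").decode("ascii")  # stdlib codec: IDNA 2003
    return f"https://{a_host}{quote(path)}"


def to_display(a_host: str, encoded_path: str) -> str:
    """Build a human-friendly form: U-label host, unescaped path."""
    u_host = a_host.encode("ascii").decode("idna")
    return f"https://{u_host}{unquote(encoded_path)}"


wire = to_wire("háčkyčárky.cz", "/čárky")
print(wire)  # ASCII only, safe to store in a record
```

Nothing is lost in either direction: to_display() applied to the A-label host and the percent-encoded path gives back the original U-label form.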


They have to be compared case-insensitively, which requires decoding each code point and locating the proper lowercase/uppercase letter matching the source.

Great.  What is the uppercase match of â? Is it different to the uppercase match of ä?  All the time in every language?  (a hint: IDNA2008 solves this problem for you, so there's never a case where the answer to this is ambiguous for an a-label/u-label pair.)
Can you specify where exactly I need that information? I only want TXT records printed when they contain text that is not in English. The exception is Multicast DNS labels, where IDNA 2008 is not used but UTF-8 is used directly. The only question is whether I need to do escaping or not when presenting a response.

The content of records remains application-specific. If I look at the google.com TXT response, I do not see any escaped data, even though it also contains binary content in some base64 encoding.

Sure, but we weren't talking about any TXT record.  I was talking about draft-davids-forsalereg, not google.com TXT records, so I don't understand how they are relevant.

Best regards,

A

Best Regards,
Petr

--
Petr Menšík
Senior Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB

