On 16/11/2025 04:39, Viktor Dukhovni wrote:
On Fri, Nov 14, 2025 at 06:53:41PM +0100, Petr Menšík wrote:

When that is the case, a more robust approach is to publish the desired
(HTML) text via a suitable HTTP(S) server, and place an ASCII URL in the
RDATA.

As appropriate, the HTTP headers and/or markup can describe the language
and character encoding of the content.

If some application desperately wants UTF-8 in DNS RDATA, TXT records
are not in my view the best vehicle for that.

Which record type would that be, then? The DNS-SD protocol puts a lot of text fields into TXT records, and many of them are meant to be presented to the user. We do that in the avahi GUI tools; perhaps not the best engineering example, but at least a tool with a GUI. Is it necessary to define a new record type just to specify how the content should be presented? An ASCII-only string is, without exception, also a valid UTF-8 string.

https://www.ietf.org/rfc/rfc6763.html#section-6.5
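As a rough sketch (my own illustration, not code from avahi or any resolver), this is how a DNS-SD client might split the key=value strings of a TXT record per RFC 6763 section 6, decoding values as UTF-8 for display when they happen to be valid UTF-8; the function name and fallback behavior are assumptions for this example:

```python
def parse_dnssd_txt(strings):
    """Map DNS-SD 'key=value' TXT strings to a dict.

    `strings` is a list of raw byte strings, one per character-string
    in the TXT RDATA. Values are decoded as UTF-8 when possible,
    otherwise kept as opaque bytes.
    """
    pairs = {}
    for raw in strings:
        key, sep, value = raw.partition(b"=")
        k = key.decode("ascii", errors="replace").lower()
        if not sep:
            pairs[k] = None                   # boolean attribute, no '='
            continue
        try:
            pairs[k] = value.decode("utf-8")  # valid UTF-8: present as text
        except UnicodeDecodeError:
            pairs[k] = value                  # binary payload: keep bytes
    return pairs

print(parse_dnssd_txt([b"txtvers=1", b"note=Petr Men\xc5\xa1\xc3\xadk"]))
# → {'txtvers': '1', 'note': 'Petr Menšík'}
```

Note that an all-ASCII value takes the UTF-8 branch unchanged, which is the point made above: ASCII is a strict subset of UTF-8.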

We do not need per-language variants of the record, and I am not proposing anything like that. URLs carry percent-encoded UTF-8 data in their paths; I do not know a better example than the TXT record itself.

People also use character sets with letters not present in US-ASCII. TXT records are unstructured, and I think they should be easy for people to process. Some languages use Latin letters with additions, like my native Czech; other languages use completely different alphabets. Current command-line tools escape UTF-8 bytes into \DDD form, which is definitely not easy for a human to read. I think the content should be presented as UTF-8 text whenever it is valid UTF-8 encoding, and escaped only when it is not.
TXT records are a bit of a misnomer, in that, as already noted in this
thread, the payload cannot be assumed to be "text".  They are not
necessarily intended for presentation to a human reader.

The payload depends on the application using that content. If a specific application uses it to store binary data, fine. I do not propose to change what data can be stored in TXT records; I propose to change how the data can be presented, by removing unnecessary escaping whenever the content can be verified as valid UTF-8. The current escaping makes the records unusable for text.

The content of those records is application-specific. If an application consumes them in wire format, this change will not matter or break anything. Do you know of an application that consumes binary data from a TXT record via its presentation format?

I created a bind9 feature request:
https://gitlab.isc.org/isc-projects/bind9/-/issues/5643
If I were making the decision at ISC, I'd decline to adopt the proposed
change.


But I think it should be clarified how this should be presented. DNS-SD can store quite a lot of information in those records, and I think it makes sense to allow native speakers to insert text descriptions in whatever language is easiest for them to read. Current utilities do not make that simple.
All sorts of fun with BIDI, control characters, ...

You said it contains binary data; that should then be rendered as binary data. A tool might choose to escape non-printable codepoints when control characters appear. BIDI is fine: I am not sure why Arabic speakers should be forced to have their description records oriented differently from what is native to them. If the terminal can handle it, just pass the text to the terminal as it is.
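The middle ground described here can be sketched as follows (my own illustration, with an assumed function name): decode valid UTF-8, pass printable text through untouched, and escape only control characters. Note this simplification escapes only general category Cc; BIDI formatting codepoints (category Cf) pass through, consistent with the argument above that the terminal should handle them.

```python
import unicodedata

def sanitize_for_terminal(text: str) -> str:
    """Escape C0/C1 control characters; keep all other codepoints as-is."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cc":   # control character
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

print(sanitize_for_terminal("مرحبا"))    # Arabic passes through unchanged
print(sanitize_for_terminal("a\x07b"))   # BEL becomes \u0007
```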

The current approach is selective: it does not use base64 or a similar encoding for ordinary ASCII letters, yet it prevents using Unicode text in a useful form. I think extending the presentation format to UTF-8 should not break anything; it would simply allow non-English speakers to use those records in a friendlier way too.
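To make the contrast concrete, here is a sketch (my illustration, not code from dig or BIND) of the two presentation styles: today's RFC 1035-style \DDD byte escaping versus the proposed UTF-8-aware fallback; both function names are assumptions for this example:

```python
def present_classic(data: bytes) -> str:
    """Escape like current command-line tools: RFC 1035 \\DDD form."""
    out = []
    for b in data:
        if b in (0x22, 0x5C):          # '"' and '\' get a backslash prefix
            out.append("\\" + chr(b))
        elif 0x20 <= b < 0x7F:         # printable ASCII passes through
            out.append(chr(b))
        else:
            out.append("\\%03d" % b)   # everything else: decimal escape
    return "".join(out)

def present_utf8(data: bytes) -> str:
    """Proposed: show as text whenever the bytes are valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return present_classic(data)   # not UTF-8: fall back to escaping

czech = "Menšík".encode("utf-8")
print(present_classic(czech))   # Men\197\161\195\173k
print(present_utf8(czech))      # Menšík
```

The selectivity is visible in `present_classic`: plain ASCII letters pass through untouched while every non-ASCII byte is escaped, which is exactly what makes Czech or Arabic text unreadable in today's output.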

--
Petr Menšík
Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]