On 16/11/2025 04:39, Viktor Dukhovni wrote:
On Fri, Nov 14, 2025 at 06:53:41PM +0100, Petr Menšík wrote:

When that is the case, a more robust approach is to publish the desired
(HTML) text via a suitable HTTP(S) server, and place an ASCII URL in the
RDATA.

As appropriate, the HTTP headers and/or markup can describe the language
and character encoding of the content.

If some application desperately wants UTF-8 in DNS RDATA, TXT records
are not in my view the best vehicle for that.

Which record type would that be, then? The DNS-SD protocol puts a lot of text fields into TXT records, and many of them are meant to be presented to the user. We do that in the avahi GUI tools; perhaps not the best engineering example, but at least a tool with a GUI. Is it necessary to define a new record type just to specify how the content should be presented? An ASCII-only string is, without exception, also a valid UTF-8 string.

https://www.ietf.org/rfc/rfc6763.html#section-6.5
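As a rough sketch (my own illustration, not code from avahi or any resolver), this is how a DNS-SD client might split the key=value strings of a TXT record per RFC 6763 section 6, decoding values as UTF-8 for display when they happen to be valid UTF-8; the function name and fallback behavior are assumptions for this example:

```python
def parse_dnssd_txt(strings):
    """Map DNS-SD 'key=value' TXT strings to a dict.

    `strings` is a list of raw byte strings, one per character-string
    in the TXT RDATA. Values are decoded as UTF-8 when possible,
    otherwise kept as opaque bytes.
    """
    pairs = {}
    for raw in strings:
        key, sep, value = raw.partition(b"=")
        k = key.decode("ascii", errors="replace").lower()
        if not sep:
            pairs[k] = None                   # boolean attribute, no '='
            continue
        try:
            pairs[k] = value.decode("utf-8")  # valid UTF-8: present as text
        except UnicodeDecodeError:
            pairs[k] = value                  # binary payload: keep bytes
    return pairs

print(parse_dnssd_txt([b"txtvers=1", b"note=Petr Men\xc5\xa1\xc3\xadk"]))
# → {'txtvers': '1', 'note': 'Petr Menšík'}
```

Note that an all-ASCII value takes the UTF-8 branch unchanged, which is the point made above: ASCII is a strict subset of UTF-8.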

We do not need per-language variants of the record, and I am not proposing anything like that. URLs carry percent-encoded UTF-8 data in their paths; I do not know a better example than the TXT record itself.

People also use character sets with letters not present in US-ASCII. TXT records are unstructured, and I think they should be easy for people to process. Some languages use Latin letters with additions, like my native Czech; other languages use completely different alphabets. Current command-line tools escape UTF-8 bytes into \DDD form, which is definitely not easy for a human to read. I think the content should be presented as UTF-8 text whenever it is valid UTF-8 encoding, and escaped only when it is not.
TXT records are a bit of a misnomer, in that, as already noted in this
thread, the payload cannot be assumed to be "text".  They are not
necessarily intended for presentation to a human reader.

The payload depends on the application using that content. If a specific application uses it to store binary data, fine. I do not propose to change what data can be stored in TXT records; I propose to change how the data can be presented, by removing unnecessary escaping whenever the content can be verified as valid UTF-8. The current escaping makes the records unusable for text.

The content of those records is application-specific. If an application consumes them in wire format, this change will not matter or break anything. Do you know of an application that consumes binary data from a TXT record via its presentation format?

I created a bind9 feature request:
https://gitlab.isc.org/isc-projects/bind9/-/issues/5643
If I were making the decision at ISC, I'd decline to adopt the proposed
change.


But I think it should be clarified how this should be presented. DNS-SD can store quite a lot of information in those records, and I think it makes sense to allow native speakers to insert text descriptions in whatever language is easiest for them to read. Current utilities do not make that simple.
All sorts of fun with BIDI, control characters, ...

You said it contains binary data; that should then be rendered as binary data. A tool might choose to escape non-printable codepoints when control characters appear. BIDI is fine: I am not sure why Arabic speakers should be forced to have their description records oriented differently from what is native to them. If the terminal can handle it, just pass the text to the terminal as it is.
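The middle ground described here can be sketched as follows (my own illustration, with an assumed function name): decode valid UTF-8, pass printable text through untouched, and escape only control characters. Note this simplification escapes only general category Cc; BIDI formatting codepoints (category Cf) pass through, consistent with the argument above that the terminal should handle them.

```python
import unicodedata

def sanitize_for_terminal(text: str) -> str:
    """Escape C0/C1 control characters; keep all other codepoints as-is."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cc":   # control character
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

print(sanitize_for_terminal("مرحبا"))    # Arabic passes through unchanged
print(sanitize_for_terminal("a\x07b"))   # BEL becomes \u0007
```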

The current approach is selective: it does not use base64 or a similar encoding for ordinary ASCII letters, yet it prevents using Unicode text in a useful form. I think extending the presentation format to UTF-8 should not break anything; it would simply allow non-English speakers to use those records in a friendlier way too.
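To make the contrast concrete, here is a sketch (my illustration, not code from dig or BIND) of the two presentation styles: today's RFC 1035-style \DDD byte escaping versus the proposed UTF-8-aware fallback; both function names are assumptions for this example:

```python
def present_classic(data: bytes) -> str:
    """Escape like current command-line tools: RFC 1035 \\DDD form."""
    out = []
    for b in data:
        if b in (0x22, 0x5C):          # '"' and '\' get a backslash prefix
            out.append("\\" + chr(b))
        elif 0x20 <= b < 0x7F:         # printable ASCII passes through
            out.append(chr(b))
        else:
            out.append("\\%03d" % b)   # everything else: decimal escape
    return "".join(out)

def present_utf8(data: bytes) -> str:
    """Proposed: show as text whenever the bytes are valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return present_classic(data)   # not UTF-8: fall back to escaping

czech = "Menšík".encode("utf-8")
print(present_classic(czech))   # Men\197\161\195\173k
print(present_utf8(czech))      # Menšík
```

The selectivity is visible in `present_classic`: plain ASCII letters pass through untouched while every non-ASCII byte is escaped, which is exactly what makes Czech or Arabic text unreadable in today's output.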

--
Petr Menšík
Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]