> On 20 Nov 2025, at 02:49, Petr Menšík <[email protected]>
> wrote:
>
> No, I disagree.
> On 19/11/2025 03:47, John Levine wrote:
>> It appears that Petr Menšík <[email protected]> said:
>>
>>>> If some application desperately wants UTF-8 in DNS RDATA, TXT records
>>>> are not in my view the best vehicle for that.
>>>>
>> The spec has always been clear that TXT records are strings of
>> arbitrary 8-bit data. If you want to put a particular interpretation
>> on some TXT records, pick an underscore _prefix and write a spec that
>> says what the format of the records is. See this registry for a dozen
>> examples:
>>
>> https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#underscored-globally-scoped-dns-node-names
> RFC 1035 says about TXT record:
> TXT RRs are used to hold descriptive text. The semantics of the text depends
> on the domain where it is found.
> Now it does not specify anything about escaping the text when any byte with
> value >=127 is used. Yes, it is 8bit data. But records designed to hold
> generic binary data use presentation format in base64. TXT does not use that
> by design. Your assumption UTF-8 code points are not letters, just because
> they have higher byte value, are wrong in my opinion.
> ASCII only text is always valid UTF-8. There does not need to by any change
> to process utf-8 encoded zone file. When you escape it, then it is escaped.
> When you do that, it becomes unreadable by humans.
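To make the two forms under discussion concrete: RFC 1035 master files let any octet be written as \DDD (three decimal digits) and any non-digit character be quoted as \X. The Python sketch below converts one TXT character-string between raw octets and the escaped form. The function names escape_txt and unescape_txt are made up for illustration and do not come from BIND, ldns, or any other DNS library. The choice to escape every octet outside printable ASCII is an assumption that mirrors the tool behaviour described in this thread; it is not something RFC 1035 mandates. Unescaped double quotes inside the string are not handled.

```python
def escape_txt(data: bytes) -> str:
    """Render one TXT character-string in escaped presentation form."""
    out = []
    for b in data:
        if b in (0x22, 0x5C):        # double quote and backslash: quote as \X
            out.append("\\" + chr(b))
        elif 0x20 <= b <= 0x7E:      # printable ASCII passes through unchanged
            out.append(chr(b))
        else:                        # assumption: all other octets become \DDD
            out.append(f"\\{b:03d}")
    return '"' + "".join(out) + '"'


def unescape_txt(text: str) -> bytes:
    """Parse a double-quoted string, with or without escapes, into octets."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("expected a double-quoted string")
    body = text[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Raw character: keep it as its UTF-8 octets.
            out += ch.encode("utf-8")
            i += 1
            continue
        rest = body[i + 1:i + 4]
        if rest[:1] and rest[0] in "0123456789":
            # \DDD: exactly three decimal digits naming one octet.
            if len(rest) != 3 or any(c not in "0123456789" for c in rest):
                raise ValueError("\\DDD needs three decimal digits")
            value = int(rest)
            if value > 255:
                raise ValueError("\\DDD value out of range")
            out.append(value)
            i += 4
        elif rest:
            # \X: the following character is taken literally.
            out += rest[0].encode("utf-8")
            i += 2
        else:
            raise ValueError("dangling backslash")
    return bytes(out)


raw = "Zkouška".encode("utf-8")
print(escape_txt(raw))                      # "Zkou\197\161ka"
print(unescape_txt('"Zkouška"') == raw)     # True
```

Both spellings yield the same seven-letter, eight-octet string on the wire; the disagreement in this thread is only about which spelling tools should emit.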
Petr, when RFC 1035 was written UTF-8 did not exist, nor did base64. You need
to read RFC 1035 taking into account the state of computing in 1987. Yes,
there is still code from that time running that needs to be able to read
these records.

Network ASCII existed, i.e. RFC 20. Octets with the high bit set had no
specified character set, so they were unsafe to display. End-of-line
conventions still differ between systems, and emitting control characters is
dangerous, so they need to be escaped.

RFC 1035

Because these files are text files several special encodings are
necessary to allow arbitrary data to be loaded.  In particular:

                of the root.

@               A free standing @ is used to denote the current origin.

\X              where X is any character other than a digit (0-9), is
                used to quote that character so that its special meaning
                does not apply.  For example, "\." can be used to place
                a dot character in a label.

\DDD            where each D is a digit is the octet corresponding to
                the decimal number described by DDD.  The resulting
                octet is assumed to be text and is not checked for
                special meaning.

( )             Parentheses are used to group data that crosses a line
                boundary.  In effect, line terminations are not
                recognized within parentheses.

;               Semicolon is used to start a comment; the remainder of
                the line is ignored.

>>> consumes them in wire format, it will not matter or change anything. Do
>>> you know application, which consumes binary data from TXT record from
>>> their presentation format?
>>>
>> Every authoritative DNS server does that when it reads a master file.
>> The binary stuff is represented with decimal escapes, but so what,
>> it's mechanically generated and mechanically consumed.
> No, zone files can be often maintained by people in form of text files. I do
> not reason why TXT records should be present in escaped form. These letters
> are not _binary_, they are letters encoded in higher value bytes only. They
> are still letters.
> Escaping can always be used on systems not able to cope
> with UTF-8 normal form. Unless you do some iso-8859-1 to utf-8 or reverse
> conversion, it won't break. If you do that, please stop that at once.
> How often do you store records encoded in unknown record format? It can
> specify anything. But it is not simple to work with. Can you guess what is
> written in unknown record? That why we use normal presentation form whenever
> possible. I ask to do that also for TXT records.
>>> Current way is selective. It does not use base64 or similar encoding for
>>> normal ASCII letters. But it prevents using unicode text in useful form.
>>>
>> That's how master files have been for 40 years. They're not going to change
>> now.
>>
>> R's,
>> John
> I am not trying to change master file format. I want it consumable in its raw
> 8bit utf-8 form. Common, this is not SMTP protocol where 8 bits usage causes
> a problem. Both ldns-read-zone and named-compilezone understands raw 8bit
> form. It does not need any change. It converts readable text with utf-8 nice
> text to unreadable escaped text.
> It can process it. I want it stop escaping unless it has very specific reason
> to do so. It knows what is space and what is not. It is okay to escape quotes
> or similar data not permitted inside records. As long as it can identify it
> is still in inside record data, it can use binary input directly.
> Can you please find me, where is escaping data in TXT records specified as
> mandatory? I did not find it anywhere. It seems to be just a custom. It seems
> not neccesary one to me.
> TXT "Zkouška"
> TXT "testíček"
> This is correctly read by tools. I do not demand it has to be used this way.
> Escaping is always possible. But escaping is not needed. named-checkzone
> won't report any issue. It reads it correctly as binary input. I agree, it is
> binary safe.
> This is year 2025. I think we can stop pretending everything non-ASCII is not
> printable as it is.
> Stop pretending only english letters belong into DNS for
> some reason.
> I would like everyone commenting on those to state what was their first
> language. Did you grow in world where ASCII can represent every name or word?
> Then you might not understand why this is important to people from different
> backgrounds.
> Regards,
> Petr
> --
> Petr Menšík
> Software Engineer, RHEL
> Red Hat, https://www.redhat.com/
> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
> _______________________________________________
> DNSOP mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: [email protected]

_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]
