On 16-Dec-22 14:19, Dan Fandrich via curl-library wrote:
> On Fri, Dec 16, 2022 at 01:18:12PM -0500, Timothe Litt via curl-library wrote:
>> And/or the callback registration could specify "all domain names", "Just IDN" -
>
> The browsers (at least Firefox) do something subtle but pretty useful for avoiding spoofing. Based on the name registration policies of the TLD being used, they either show the IDN as expected in the URL bar, or just show the ugly punycode version of the name. TLDs with policies that forbid names that could lead to confusion (homographic attacks) get the desired behaviour (of seeing the IDN name) but those without policies, or with policies that could lead to confusion get the punycode version, making it obvious that some spoofing may have gone on to get you to that web page. Mozilla's original policy can be seen here: https://www-archive.mozilla.org/projects/security/tld-idn-policy-list
>
> They've amended that policy since to allow displaying IDN in some cases even on those TLDs with bad or nonexistent policies. This only happens if all the characters in the TLD come from the same script. If a TLD mixes, for example, Cyrillic and Latin characters, it's displayed as punycode, but all Cyrillic is shown in all its UNICODE glory. The idea is that people (who can read that script) will recognize the different characters within that script and be able to tell them apart, and there won't be any mixing of similar-looking characters within a single domain name. That policy can be seen at https://wiki.mozilla.org/IDN_Display_Algorithm
>
> Lots of thought has been given to this problem already (Mozilla seems to have implemented the first policy 17 years ago), and curl could take advantage of that. But, since it's not a browser it can't use the same means of notifying the user (displaying punycode in the URL bar), but some viable alternatives to that have already been brought up here.
>
> Dan
As you say, curl isn't a browser. And hardcoding the TLDs' policies seems like whack-a-mole.
A simple callback function in the library to pass on any domain name would be fairly cheap, and would allow any policy. It's up to the UI to decide how to handle issues. curl could provide a sample policy such as the one I outlined. Perhaps the policy could be a loadable DLL, e.g. host-name-filter=idn-alias loads idn-alias.{so,dll,...}
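To make the shape of such a hook concrete, here is a minimal sketch. It is written in Python purely for brevity and is not libcurl's API; set_host_filter, host_allowed and only_ascii are made-up names for illustration. With no filter registered the check is skipped and every name is accepted.

```python
from typing import Callable, Optional

# Hypothetical hook type: takes a host name, returns True to allow it.
HostFilter = Callable[[str], bool]

_host_filter: Optional[HostFilter] = None


def set_host_filter(fn: Optional[HostFilter]) -> None:
    """Register (or clear, with None) the application's name policy."""
    global _host_filter
    _host_filter = fn


def host_allowed(host: str) -> bool:
    """What the library would call before using a host name."""
    if _host_filter is None:
        return True  # no policy registered: behave as today
    return _host_filter(host)


def only_ascii(host: str) -> bool:
    """Sample policy: reject raw non-ASCII names and punycode labels."""
    labels = host.lower().split(".")
    return host.isascii() and not any(l.startswith("xn--") for l in labels)
```

An application that wants the strict behaviour calls set_host_filter(only_ascii); one that wants something else registers its own function, and the library never needs to know what the policy is.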
I think curl is best at being a tool, not a policy engine. So making policies pluggable seems in line with the philosophy; hard-coding anything more involved than a default (and generally agreed-upon) list of homographic characters that triggers a warning unless whitelisted doesn't.
Unlike browsers, curl usually doesn't wander the web at random or follow search engine results or e-mailed links. If you set up or use a curl command, the risk seems a lot lower - you have to think. A warning is a good safety net - but if you're in Japan, clearly you don't want to be warned about every local website. Then you get into the same automatic 'I'm annoyed, just say yes' syndrome seen with self-signed certificates. Thus, a whitelist...
So I think that the curl command could reasonably provide a simple 'warn on IDNs with risky characters' along with a whitelist. I don't think that trying to replicate the UI of a browser with complex policies is worthwhile (or feasible). A filter function hook in the library would allow experimentation and arbitrarily complex policies. And making the function loadable (in the command line tool) decouples the policy from curl proper.
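A rough sketch of what 'warn on risky characters, unless whitelisted' could look like follows, again in Python for brevity. The character list here is a tiny illustrative sample (a few Cyrillic letters that resemble Latin ones), not an agreed-upon list, and the names RISKY_CHARS, decode_label and risky_idn_warning are hypothetical.

```python
from typing import AbstractSet, Optional

# Illustrative sample only: Cyrillic a, e, o, r, s, h look-alikes.
RISKY_CHARS = frozenset("\u0430\u0435\u043e\u0440\u0441\u0445")


def decode_label(label: str) -> str:
    """Return the Unicode form of an xn-- label, or the label unchanged."""
    if label.lower().startswith("xn--"):
        try:
            return label[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            return label  # malformed punycode: leave it as-is
    return label


def risky_idn_warning(host: str,
                      whitelist: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return a warning string, or None if the name is fine or whitelisted.

    Whitelist entries are assumed to be lowercase host names.
    """
    host = host.lower().rstrip(".")
    if host in whitelist:
        return None
    for label in host.split("."):
        if set(decode_label(label)) & RISKY_CHARS:
            return f"warning: {host} has look-alike characters in label {label}"
    return None
```

The whitelist is what keeps the warning from becoming noise: a user who routinely visits names in a given script adds them once and is not asked again.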
Along those lines, you could imagine a filter policy that reads a file of regexes, one that uses Mozilla's code to decide, simple white/black lists, or ... anything you can imagine. And if it's in a DLL selected by an option in .curlrc, it's pretty painless for the user. Plus, any other application that uses the library can link with curl's sample policy if it suits the application.
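For instance, the regex-file policy could be as small as the sketch below (Python for brevity). The file format is an assumption of mine: one pattern per line, blank lines and '#' comments ignored, and a host is accepted only if some pattern matches the whole name. load_patterns and make_regex_filter are hypothetical names.

```python
import re
from typing import Callable, Iterable, List, Pattern


def load_patterns(lines: Iterable[str]) -> List[Pattern[str]]:
    """Compile one regex per non-blank, non-comment line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def make_regex_filter(patterns: List[Pattern[str]]) -> Callable[[str], bool]:
    """Build a policy that accepts hosts fully matching any pattern."""
    def host_ok(host: str) -> bool:
        return any(p.fullmatch(host) for p in patterns)
    return host_ok
```

Since load_patterns takes any iterable of lines, an open file object can be passed directly. Flipping the sense (reject on match) turns the same code into a blacklist.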
Anyhow, that's my 3 cents. I don't think a ban on IDNs is useful. I do think a flexible policy is required, and that the policy should be customizable and isolated from the mechanism. The suggested 'is this name ok' hook in the library does this at minimal cost. (The default function can be 'return true'... or the call skipped if no function registered.)
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.
