On 16-Dec-22 14:19, Dan Fandrich via curl-library wrote:
On Fri, Dec 16, 2022 at 01:18:12PM -0500, Timothe Litt via curl-library wrote:
And/or the callback registration could specify "all domain names", "Just IDN" -
The browsers (at least Firefox) do something subtle but pretty useful for
avoiding spoofing.  Based on the name registration policies of the TLD being
used, they either show the IDN as expected in the URL bar, or just show the
ugly punycode version of the name. TLDs with policies that forbid names that
could lead to confusion (homographic attacks) get the desired behaviour (of
seeing the IDN name) but those without policies, or with policies that could
lead to confusion get the punycode version, making it obvious that some
spoofing may have gone on to get you to that web page. Mozilla's original
policy can be seen here:
https://www-archive.mozilla.org/projects/security/tld-idn-policy-list

They've amended that policy since to allow displaying IDN in some cases even on
those TLDs with bad or nonexistent policies. This only happens if all the
characters in a label of the domain name come from the same script. If a label
mixes, for example, Cyrillic and Latin characters, it's displayed as punycode,
but an all-Cyrillic label is shown in all its Unicode glory. The idea is that
people (who can read that script) will recognize the different characters
within that script and be able to tell them apart, and there won't be any
mixing of similar-looking characters within a single domain name. That policy
can be seen at
https://wiki.mozilla.org/IDN_Display_Algorithm

Lots of thought has been given to this problem already (Mozilla seems to have
implemented the first policy 17 years ago), and curl could take advantage of
that. But since it's not a browser, it can't use the same means of notifying
the user (displaying punycode in the URL bar); some viable alternatives to
that have already been brought up here.

Dan

As you say, curl isn't a browser.  And hardcoding the TLDs' policies seems like whack-a-mole.

A simple callback function in the library, called to pass judgment on every domain name, would be fairly cheap and would allow any policy.  It's up to the UI to decide how to handle issues.  curl could provide a sample policy such as the one I outlined.  Perhaps the policy could be a loadable DLL, e.g. host-name-filter=idn-alias loads idn-alias.{so,dll,...}.
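
A minimal C sketch of such a hook. None of these names (hostname_filter_fn, set_hostname_filter, hostname_allowed, denylist_policy) exist in libcurl; they are hypothetical, chosen only to show the shape of the mechanism. Following the suggestion at the end of this message, the check is simply skipped when no function has been registered, so the default costs almost nothing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type (not a libcurl API): return true to allow
 * the host name, false to refuse it. */
typedef bool (*hostname_filter_fn)(const char *hostname, void *userdata);

struct filter_hook {
    hostname_filter_fn fn;  /* NULL when the application registered nothing */
    void *userdata;         /* passed through untouched to the callback */
};

/* Hypothetical registration call. */
void set_hostname_filter(struct filter_hook *hook,
                         hostname_filter_fn fn, void *userdata)
{
    hook->fn = fn;
    hook->userdata = userdata;
}

/* What the library would call before resolving a name. With no callback
 * registered, the check is skipped and every name is allowed. */
bool hostname_allowed(const struct filter_hook *hook, const char *hostname)
{
    if (!hook->fn)
        return true;
    return hook->fn(hostname, hook->userdata);
}

/* Stand-in sample policy: userdata is a NULL-terminated array of host
 * names to refuse; anything else is allowed. */
bool denylist_policy(const char *hostname, void *userdata)
{
    const char *const *deny = userdata;
    for (size_t i = 0; deny[i]; i++)
        if (strcmp(hostname, deny[i]) == 0)
            return false;
    return true;
}
```

Loading the policy from a separate library (the idn-alias.so idea) could be layered on top with dlopen()/dlsym() on POSIX systems or LoadLibrary()/GetProcAddress() on Windows, looking up a function with this signature and passing it to the registration call.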

I think curl is best at being a tool, not a policy engine.  So making policies pluggable seems in line with that philosophy; hard-coding anything more involved than a default (and generally agreed-upon) list of homographic characters, which triggers a warning unless the name is whitelisted, does not.

Unlike browsers, curl usually doesn't wander the web at random or follow search-engine results or e-mailed links.  If you set up or use a curl command, the risk seems a lot lower; you have to think.  A warning is a good safety net, but if you're in Japan, you clearly don't want to be warned about every local website.  Then you get into the same automatic 'I'm annoyed, just say yes' syndrome seen with self-signed certificates.  Thus, a whitelist...

So I think that the curl command could reasonably provide a simple 'warn on IDNs with risky characters' option along with a whitelist.  I don't think that trying to replicate the UI of a browser with complex policies is worthwhile (or feasible).  A filter function hook in the library would allow experimentation and arbitrarily complex policies.  And making the function loadable (in the command-line tool) decouples the policy from curl proper.

Along those lines, you could imagine a filter policy that reads a file of regexes, one that uses Mozilla's code to decide, simple white/black lists, or ... anything you can imagine.  And if it's in a DLL selected by an option in .curlrc, it's pretty painless for the user.  Plus, any other application that uses the library can link with curl's sample policy if it suits the application.
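
The regex-file policy could look roughly like this sketch, using the POSIX regcomp()/regexec() functions. The file format (one extended regular expression per line, '#' comments, a match means refuse) and the function name regex_policy_allows are assumptions; a real plugin would compile the patterns once instead of on every lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy: each non-empty, non-'#' line of the rules file is a
 * POSIX extended regex; a host name matching any of them is refused.
 * Returns true if the name is allowed. */
bool regex_policy_allows(FILE *rules, const char *hostname)
{
    char line[512];                          /* longer lines get truncated */
    rewind(rules);
    while (fgets(line, sizeof line, rules)) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip the line ending */
        if (line[0] == '\0' || line[0] == '#')
            continue;
        regex_t re;
        if (regcomp(&re, line, REG_EXTENDED | REG_NOSUB) != 0)
            return false;    /* fail closed on a rule that doesn't compile */
        int rc = regexec(&re, hostname, 0, NULL, 0);
        regfree(&re);
        if (rc == 0)
            return false;                    /* matched a deny rule */
    }
    return true;
}
```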

Anyhow, that's my 3 cents.  I don't think a ban on IDNs is useful.  I do think a flexible policy is required, and that the policy should be customizable and isolated from the mechanism.  The suggested 'is this name ok' hook in the library does this at minimal cost.  (The default function can be 'return true'... or the call can be skipped if no function is registered.)


Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.
