#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-----------------------------+-----------------------------------------
     Reporter:  Morgan Wahl  |                     Type:  Bug
       Status:  new          |                Component:  Uncategorized
      Version:  5.2          |                 Severity:  Normal
     Keywords:               |             Triage Stage:  Unreviewed
    Has patch:  0            |      Needs documentation:  0
  Needs tests:  0            |  Patch needs improvement:  0
Easy pickings:  0            |                    UI/UX:  0
-----------------------------+-----------------------------------------
 Hello,

 I was recently surprised to find that a simple detail view URL with a
 model ID in it was also accessible at a URL using "full width" digit
 characters. For example, the page at "/pizza/123" could also be returned
 from "/pizza/１２３". That's the Unicode characters U+FF11 U+FF12 U+FF13.
 It turns out this is ultimately because the model `IntegerField` is using
 `int` to get an integer from the string that was originally in the URL.
 And I was surprised to find Python's `int` constructor uses
 `unicodedata.decimal` (or some equivalent) to translate from characters in
 a string to decimal digits.
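
 A minimal sketch to reproduce this in a Python shell, using only the
 standard library (the variable names are just for illustration):

```python
import unicodedata

# FULLWIDTH DIGIT ONE, TWO, THREE (U+FF11 U+FF12 U+FF13)
fullwidth = "\uff11\uff12\uff13"

# int() accepts any Unicode decimal digit, not only ASCII 0-9.
value = int(fullwidth)

# unicodedata.decimal() reports the decimal value assigned to each character.
digits = [unicodedata.decimal(ch) for ch in fullwidth]

print(value, digits, fullwidth.isascii())
```

 The string is not ASCII, yet `int` converts it without complaint.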

 That was a cool accidental feature to discover, but now I'm concerned
 about URL canonicalization. Python 3.13.3 accepts _68_ different
 characters for each digit. This means the same content is hypothetically
 accessible from many, many URLs. I've heard that can make a site look
 spammy to search engines. And maybe this could be an element of a security
 hole if something is assuming there is only one URL for a given page.
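
 The per-digit count can be checked with a short script. This is only a
 sketch: the result depends on the Unicode database shipped with the
 running Python version, and `variants` is a hypothetical helper name.

```python
import math
import sys
import unicodedata
from collections import Counter

# Count how many code points carry each decimal digit value 0-9.
counts = Counter()
for code_point in range(sys.maxunicode + 1):
    digit = unicodedata.decimal(chr(code_point), None)
    if digit is not None:
        counts[digit] += 1


def variants(ascii_id):
    """Number of same-length digit strings that int() maps to this ID.

    This is a lower bound on the accepted spellings: int() also tolerates
    surrounding whitespace, a leading "+", leading zeros, and underscores
    between digits.
    """
    return math.prod(counts[int(ch)] for ch in ascii_id)


print(counts[1], variants("123"))
```

 Even a three-digit ID therefore has a very large number of alternative
 spellings.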

 The SEO problem could be addressed by setting a `<link rel=canonical>` in
 the page to point to `Pizza.objects.get(pk=id).get_absolute_url()` or some
 similar logic, or you could address the problem as a whole by setting up
 redirects or 404 responses, but all those approaches require a separate
 implementation for every view, since the view code ultimately doesn't know
 which parts of the URL are going to be treated as values of an
 `IntegerField`.

 Possible solutions I can think of are either:

 1. Add a mechanism to canonicalize URLs easily, by allowing users to mark
 this situation explicitly in the URL conf, so that Django can set a
 property on the request object with the "canonicalized" URL. Then
 redirects, 404s, or `<link>` tags could be implemented just once for all
 such URLs. (Redirects and 404s in a middleware, `<link>` tags in a base
 template.)
 2. Don't just pass strings to `int` in the model `IntegerField`. Instead,
 only accept strings made up of ASCII digits.
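
 Both ideas can be sketched in plain Python. Neither function below is
 Django API: `ascii_int` and `ascii_digits` are hypothetical helper names,
 and accepting an optional leading minus sign is an assumption.

```python
import re
import unicodedata

# Explicit [0-9] rather than \d: in str patterns \d also matches
# non-ASCII decimal digits unless re.ASCII is set.
_ASCII_INT = re.compile(r"-?[0-9]+")


def ascii_int(value):
    """Option 2 (hypothetical): parse an int from ASCII digits only."""
    if not _ASCII_INT.fullmatch(value):
        raise ValueError(f"not an ASCII integer: {value!r}")
    return int(value)


def ascii_digits(text):
    """Option 1 (hypothetical): map every Unicode decimal digit to ASCII."""
    out = []
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        out.append(ch if digit is None else str(digit))
    return "".join(out)
```

 A middleware could compare `request.path` with `ascii_digits(request.path)`
 and redirect or return a 404 when they differ. Note that this rewrites
 digits in every path segment, not only the ones that end up in an
 `IntegerField`, which is why an explicit marker in the URL conf may still
 be needed.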
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36483>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
