#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-----------------------------+-----------------------------------------
Reporter: Morgan Wahl | Type: Bug
Status: new | Component: Uncategorized
Version: 5.2 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------+-----------------------------------------
Hello,
I was recently surprised to find that a simple detail view URL with a
model ID in it was also accessible at a URL using "full width" digit
characters. For example, the page at "/pizza/123" could also be returned
from "/pizza/１２３". Those are the Unicode characters U+FF11 U+FF12 U+FF13.
It turns out this is ultimately because the model `IntegerField` is using
`int` to get an integer from the string that was originally in the URL.
And I was surprised to find Python's `int` constructor uses
`unicodedata.decimal` (or some equivalent) to translate from characters in
a string to decimal digits.
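The behaviour described above can be reproduced in plain Python, without Django. This is only a small illustration; the variable names are mine and are not part of any Django code.

```python
import unicodedata

# FULLWIDTH DIGIT ONE, TWO, THREE (U+FF11 U+FF12 U+FF13)
fullwidth = "\uff11\uff12\uff13"

# int() accepts strings made of Unicode decimal digits, not just ASCII ones.
parsed = int(fullwidth)
print(parsed)  # 123

# unicodedata.decimal() gives the decimal value of a single character.
digit_values = [unicodedata.decimal(ch) for ch in fullwidth]
print(digit_values)  # [1, 2, 3]

# str.isascii() tells the two spellings apart.
print(fullwidth.isascii(), "123".isascii())  # False True
```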
That was a cool accidental feature to discover, however now I'm concerned
about URL canonicalization. Python 3.13.3 accepts _68_ different
characters for each digit. This means the same content is hypothetically
accessible from many, many URLs. I've heard that can make a site look
spammy to search engines. And maybe this could be an element of a security
hole if something is assuming there is only one URL for a given page.
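A rough way to check the per-digit count on a given interpreter is to scan every code point and tally the ones that `unicodedata.decimal` maps to a digit. This is a sketch that assumes `int` and `unicodedata.decimal` agree on which characters are decimal digits. The result depends on the Unicode database bundled with the Python build, so no number is hard-coded here. The function name `count_digit_spellings` is made up for this example.

```python
import sys
import unicodedata
from collections import Counter


def count_digit_spellings():
    """Count how many code points map to each decimal digit value 0-9."""
    counts = Counter()
    for code_point in range(sys.maxunicode + 1):
        # The second argument is returned for characters with no decimal value.
        value = unicodedata.decimal(chr(code_point), None)
        if value is not None:
            counts[value] += 1
    return counts


counts = count_digit_spellings()
for digit in range(10):
    print(digit, counts[digit])
```

With N spellings per digit, a three-digit ID like 123 already has N**3 possible spellings in a URL.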
The SEO problem could be addressed by setting a `<link rel=canonical>` in
the page to point to `Pizza.objects.get(pk=id).get_absolute_url()` or some
similar logic, or you could address the problem as a whole by setting up
redirects or 404 responses, but all those approaches require a separate
implementation for every view, since the view code ultimately doesn't know
which parts of the URL are going to be treated as values of an
`IntegerField`.
Possible solutions I can think of are either:
1. Make some mechanism to very easily canonicalize URLs, by allowing users
to somehow mark this situation explicitly in the URL conf, and then Django
can set a property on the request object with the "canonicalized" URL.
Then redirects or 404s or <link> tags could be implemented just once for
all such URLs. (Redirects and 404s in a middleware, <link> tags in a base
template.)
2. Don't just pass strings to `int` in the model `IntegerField`. Instead
only allow strings with ASCII digits to be used.
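To make the two options more concrete, here is a minimal sketch in plain Python. It is not Django's implementation and not a proposed patch; `parse_ascii_int` and `canonical_id_segment` are hypothetical helper names. The first function illustrates option 2 (reject anything that is not ASCII digits), the second illustrates the comparison a middleware could do for option 1 (compute the canonical spelling and compare it with what was requested).

```python
import re

# In a str pattern, [0-9] matches only ASCII digits (unlike \d, which
# also matches other Unicode decimal digits).
_ASCII_INT = re.compile(r"[+-]?[0-9]+")


def parse_ascii_int(text):
    """Return int(text), but only for an optional sign plus ASCII digits."""
    if not isinstance(text, str) or _ASCII_INT.fullmatch(text) is None:
        raise ValueError(f"not an ASCII integer: {text!r}")
    return int(text)


def canonical_id_segment(text):
    """Return the canonical ASCII spelling of an integer URL segment.

    Returns None when the segment is not an integer at all.
    """
    try:
        return str(int(text))
    except ValueError:
        return None


fullwidth = "\uff11\uff12\uff13"
print(parse_ascii_int("123"))           # 123
print(canonical_id_segment(fullwidth))  # 123
print(canonical_id_segment(fullwidth) == fullwidth)  # False
```

Note that `canonical_id_segment` also treats spellings such as "007" as non-canonical, since `str(int("007"))` is "7". Whether that is desirable would be a design decision for the real feature.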
--
Ticket URL: <https://code.djangoproject.com/ticket/36483>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.