#28628: Audit for and abolish all use of '\d' in regexes
-------------------------------------+-------------------------------------
     Reporter:  James Bennett        |                    Owner:  Ad
         Type:                       |  Timmering
  Cleanup/optimization               |                   Status:  assigned
    Component:  Core (Other)         |                  Version:  dev
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  1                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Ad Timmering):

 I went through each use of `\d` in the Django code and found little
 reason or benefit to change most of them.

 Important background: for every Unicode character that `\d` matches,
 `int(x)` correctly converts the value to a normal `int`. In most cases
 where a decimal number is expected and extracted, it is then cast with
 `int(x)`, so the problem goes away, and accepting such input may even
 benefit users (e.g. I live in Japan, where people frequently type
 full-width digits such as ０１２３４５ instead of 012345). For example:
 {{{
 >>> int('\uABF9')  # MEETEI MAYEK DIGIT NINE
 9
 }}}
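
 A minimal sketch of the behaviour described above, for a `str` pattern in
 Python 3. The variable names are illustrative only and do not come from
 the Django code base. It contrasts the default `\d` with the two usual ways
 of restricting a match to ASCII digits (`re.ASCII` and an explicit `[0-9]`
 class):

```python
import re

# Full-width digits U+FF10..U+FF15, i.e. "012345" as often typed in Japan.
FULLWIDTH = "\uff10\uff11\uff12\uff13\uff14\uff15"

# In a str pattern, \d matches any Unicode decimal digit by default.
unicode_match = re.fullmatch(r"\d+", FULLWIDTH)

# re.ASCII, or an explicit [0-9] class, restricts the match to ASCII digits.
ascii_flag_match = re.fullmatch(r"\d+", FULLWIDTH, flags=re.ASCII)
ascii_class_match = re.fullmatch(r"[0-9]+", FULLWIDTH)

# int() accepts the same Unicode decimal digits and returns a plain int.
value = int(unicode_match.group())

print(unicode_match is not None)  # True
print(ascii_flag_match)           # None
print(ascii_class_match)          # None
print(value)                      # 12345
```

 So wherever the matched text is passed straight to `int()`, the looser
 `\d` changes which inputs are accepted but not the type of the result.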


 Most use cases seem to me to fall into one of the categories below:

 a) We're processing user input that ''might'' actually be entered
 (inadvertently) in non-ASCII digits, so the current result is likely
 desired, and at the very least changing it could be a breaking change for
 users. ==> DON'T CHANGE

 b) Changing to a more restrictive regex seems harmless enough, but also
 doesn't add much value. Eg. when parsing a version number like "1.2.3"
 with something like `(\d)\.(\d)\.(\d)`.

 c) I found only one case in the Django code where a change might be
 beneficial: the processing of dates/times in HTTP headers in
 `django.utils.http`, where the spec clearly requires ASCII digits.
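
 To illustrate case c), here is a rough sketch of how a `\d`-based pattern
 and an ASCII-only pattern differ on an HTTP date. Both patterns are
 hypothetical, written for this example against the IMF-fixdate layout
 ("Sun, 06 Nov 1994 08:49:37 GMT"); they are not the patterns used in
 `django.utils.http`:

```python
import re

# Hypothetical patterns for illustration; they differ only in the digit class.
LOOSE = re.compile(
    r"^[A-Za-z]{3}, (\d{2}) ([A-Za-z]{3}) (\d{4}) "
    r"(\d{2}):(\d{2}):(\d{2}) GMT$"
)
STRICT = re.compile(
    r"^[A-Za-z]{3}, ([0-9]{2}) ([A-Za-z]{3}) ([0-9]{4}) "
    r"([0-9]{2}):([0-9]{2}):([0-9]{2}) GMT$"
)

valid = "Sun, 06 Nov 1994 08:49:37 GMT"
# Same date, but the day is written with full-width digits (U+FF10, U+FF16).
odd = "Sun, \uff10\uff16 Nov 1994 08:49:37 GMT"

print(LOOSE.match(valid) is not None)   # True
print(STRICT.match(valid) is not None)  # True
print(LOOSE.match(odd) is not None)     # True: \d accepts the non-ASCII day
print(STRICT.match(odd) is not None)    # False: [0-9] rejects it
```

 Compiling the loose pattern with `re.ASCII` would have the same effect as
 switching to `[0-9]`; which spelling is preferable is a style question.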

 An inventory of the use cases, with my thoughts, is
 [https://docs.google.com/document/d/1nc1uwTIghm-eIhiIlssNAH72KNFHoHAsL0gGlr9dRlg/edit# in this Google doc].
 I'm curious to hear what others think.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/28628#comment:15>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
