#34654: Post-normalization performed on the Username field leading to the
bypass of
the whitespace stripping
------------------------------+------------------------------------
Reporter: Sim4n6 | Owner: nobody
Type: Bug | Status: new
Component: contrib.auth | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Comment (by Sim4n6):
Indeed. Next is my understanding.
In the strip() function doc
https://docs.python.org/3/library/stdtypes.html?highlight=removing%20whitespace#str.strip
If the intended character is omitted, the function removes the whitespace.
The problem is that the stripping of the leading and the trailing
whitespace is performed before the Unicode normalization.
This case fits the attack "using Unicode encoding to bypass Validation
logic" : https://capec.mitre.org/data/definitions/71.html
I have done some initial fuzzing (nothing fancy) the Unicode character
BRAILLE PATTERN BLANK fits partially the criteria.
Visually looks like a space. But this one remains the same when
normalized.
The idea is there may be a Unicode character that when stripped using the
strip() function, it does not get removed. But after normalization with
the NFKC form, it may come back as whitespace.
If a user is able to set a username with whitespace, it kind of looks a
bit "fishy". Two usernames "sim4n6" and "sim4n6 " (with a trailing space).
And also some fuzzing leads to this code points too:
Code Point: 0xa8 Fuzzed: [ ̈sim4n6 ̈].
Code Point: 0xaf Fuzzed: [ ̄sim4n6 ̄].
Code Point: 0xb4 Fuzzed: [ ́sim4n6 ́].
Code Point: 0xb8 Fuzzed: [ ̧sim4n6 ̧].
Code Point: 0x2d8 Fuzzed: [ ̆sim4n6 ̆].
Code Point: 0x2d9 Fuzzed: [ ̇sim4n6 ̇].
Code Point: 0x2da Fuzzed: [ ̊sim4n6 ̊].
Code Point: 0x2db Fuzzed: [ ̨sim4n6 ̨].
Code Point: 0x2dc Fuzzed: [ ̃sim4n6 ̃].
Code Point: 0x2dd Fuzzed: [ ̋sim4n6 ̋].
Code Point: 0x37a Fuzzed: [ ͅsim4n6 ͅ].
Code Point: 0x384 Fuzzed: [ ́sim4n6 ́].
Code Point: 0x385 Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1fbd Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fbf Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fc0 Fuzzed: [ ͂sim4n6 ͂].
Code Point: 0x1fc1 Fuzzed: [ ̈͂sim4n6 ̈͂].
Code Point: 0x1fcd Fuzzed: [ ̓̀sim4n6 ̓̀].
Code Point: 0x1fce Fuzzed: [ ̓́sim4n6 ̓́].
Code Point: 0x1fcf Fuzzed: [ ̓͂sim4n6 ̓͂].
Code Point: 0x1fdd Fuzzed: [ ̔̀sim4n6 ̔̀].
Code Point: 0x1fde Fuzzed: [ ̔́sim4n6 ̔́].
Code Point: 0x1fdf Fuzzed: [ ̔͂sim4n6 ̔͂].
Code Point: 0x1fed Fuzzed: [ ̈̀sim4n6 ̈̀].
Code Point: 0x1fee Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1ffd Fuzzed: [ ́sim4n6 ́].
Code Point: 0x1ffe Fuzzed: [ ̔sim4n6 ̔].
Code Point: 0x2017 Fuzzed: [ ̳sim4n6 ̳].
Code Point: 0x203e Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0x309b Fuzzed: [ ゙sim4n6 ゙].
Code Point: 0x309c Fuzzed: [ ゚sim4n6 ゚].
Code Point: 0xfc5e Fuzzed: [ ٌّsim4n6 ٌّ].
Code Point: 0xfc5f Fuzzed: [ ٍّsim4n6 ٍّ].
Code Point: 0xfc60 Fuzzed: [ َّsim4n6 َّ].
Code Point: 0xfc61 Fuzzed: [ ُّsim4n6 ُّ].
Code Point: 0xfc62 Fuzzed: [ ِّsim4n6 ِّ].
Code Point: 0xfc63 Fuzzed: [ ّٰsim4n6 ّٰ].
Code Point: 0xfe49 Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4a Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4b Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4c Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe70 Fuzzed: [ ًsim4n6 ً].
Code Point: 0xfe72 Fuzzed: [ ٌsim4n6 ٌ].
Code Point: 0xfe74 Fuzzed: [ ٍsim4n6 ٍ].
Code Point: 0xfe76 Fuzzed: [ َsim4n6 َ].
Code Point: 0xfe78 Fuzzed: [ ُsim4n6 ُ].
Code Point: 0xfe7a Fuzzed: [ ِsim4n6 ِ].
Code Point: 0xfe7c Fuzzed: [ ّsim4n6 ّ].
Code Point: 0xfe7e Fuzzed: [ ْsim4n6 ْ].
Code Point: 0xffe3 Fuzzed: [ ̄sim4n6 ̄].
Those code points can be used as trailing and leading in usernames and
result in whitespace, after Unicode normalization the way Django does it.
char = chr(code_point)
# Username input with potential trialing and leading
whitespace
username = char + "sim4n6" + char
# this is the way it is done by Django
s = unicodedata.normalize("NFKC", username.strip())
if contains_whitespace(s):
# Print the Unicode character and its code point for
demonstration purposes
print(f"Code Point: {hex(code_point)}\tResult: [{s}].")
With those code points, it is not only visually a space but a regular one.
Voila
PS: I wrote about this once: https://sim4n6.beehiiv.com/p/unicode-
characters-bypass-security-checks
--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:6>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/01070188c902f633-d5220cc6-3805-46fa-9265-72b737afb074-000000%40eu-central-1.amazonses.com.