#33218: slugify() can't handle Turkish İ while allow_unicode = True
--------------------------------------+-------------------------
Reporter: sowinski | Owner: nobody
Type: Bug | Status: new
Component: CSRF | Version: dev
Severity: Normal | Keywords: slugify
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
--------------------------------------+-------------------------
Please see the following example.
The first character of **test_str = "i̇zmit"** is not a plain i; it is the
dotted **İ** from the Turkish alphabet.
With allow_unicode=True, slugify() should keep the Turkish **İ** instead of
replacing it with a plain i.
{{{
import unicodedata
import re


def slugify(value, allow_unicode=False):
    """
    Convert to ASCII if 'allow_unicode' is False. Convert spaces or repeated
    dashes to single dashes. Remove characters that aren't alphanumerics,
    underscores, or hyphens. Convert to lowercase. Also strip leading and
    trailing whitespace, dashes, and underscores.
    """
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value.lower())
    return re.sub(r'[-\s]+', '-', value).strip('-_')


test_str = "i̇zmit"
output = slugify(test_str, allow_unicode=True)
print(test_str)
print(output)
print(test_str == output)
}}}
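A minimal sketch of what appears to be happening, assuming the problem comes from how Python lowercases **İ** (U+0130): str.lower() turns it into a plain i followed by U+0307 COMBINING DOT ABOVE, and the `[^\w\s-]` pattern then drops the combining mark because `\w` does not match it. The names capital, lowered, describe and stripped are illustrative only and are not part of Django.

```python
import re
import unicodedata

capital = "\u0130"         # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = capital.lower()  # two code points: "i" followed by U+0307


def describe(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


lowered_names = describe(lowered)

# NFKC has no precomposed lowercase "i with dot above", so normalizing
# leaves the two code points as they are.
normalized = unicodedata.normalize("NFKC", lowered)

# U+0307 is a nonspacing mark (category Mn). \w does not match it, so the
# same character class that slugify() uses removes it.
stripped = re.sub(r"[^\w\s-]", "", normalized)

print(lowered_names)
print(unicodedata.category("\u0307"))
print(stripped == "i")
```

If this reading is right, the dot is lost in the regex step rather than in the normalization step, which would explain why allow_unicode=True makes no difference for this character.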
--
Ticket URL: <https://code.djangoproject.com/ticket/33218>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.