#33318: Truncator class recognizes different length of ellipsis(...) depending
on the LANGUAGE_CODE.
-------------------------------------+-------------------------------------
     Reporter:  YoungJoo Kim         |                    Owner:  nobody
         Type:  Bug                  |                   Status:  closed
    Component:                       |                  Version:  3.2
  Internationalization               |
     Severity:  Normal               |               Resolution:  invalid
     Keywords:  truncatechars        |             Triage Stage:
  Truncator ellipsis LANGUAGE_CODE   |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Description changed by YoungJoo Kim:

Old description:

> First, I am so sorry about my bad english skill..;;
>
> There is something strange about the `truncatechars` method of Django
> Template Language (DTL).
> In the `add_truncation_text` method of the Truncator class, the length of
> the `truncate` variable depends on the value of LANGUAGE_CODE(ex.
> 'en-us', 'ko-kr' ...) in settings.py
>
> [[br]]
>
> The `add_truncation_text` method of the Truncator class is as follows.
> {{{
> # django/utils/text.py
>
> def add_truncation_text(self, text, truncate=None):
>     if truncate is None:
>         truncate = pgettext(
>             'String to return when truncating text',
>             '%(truncated_text)s…')
>     if '%(truncated_text)s' in truncate:
>         return truncate % {'truncated_text': text}
>     # The truncation text didn't contain the %(truncated_text)s string
>     # replacement argument so just append it to the text.
>     if text.endswith(truncate):
>         # But don't append the truncation text if the current text already
>         # ends in this.
>         return text
>     return '%s%s' % (text, truncate)
> }}}
>
> The `truncate` variable is assigned the string `'%(truncated_text)s…'` by
> the `pgettext` method.
>
> In LANGUAGE_CODE `'en-us'(default)`, ellipsis is recognized as a string
> of length `1`.
> But in LANGUAGE_CODE `'ko-kr'`, ellipsis is recognized as three dots(...)
> and has length `3`.
>
> I think that the pgettext method is the cause.
>
> [[br]]
>
> So even though it is the same code, the output is different depending on
> the language.
>
> In the `chars` method of the same Truncator class, the number of strings
> to print ellipsis from is calculated through the `for` statement.
>
> The number of times this `for` loop is determined by the return value (==
> `truncate` variable) of the `add_truncation_text` method.
>
> {{{
> # django/utils/text.py
>
> def chars(self, num, truncate=None, html=False):
>     """
>     Return the text truncated to be no longer than the specified number
>     of characters.
>
>     `truncate` specifies what should be used to notify that the string has
>     been truncated, defaulting to a translatable string of an ellipsis.
>     """
>     self._setup()
>     length = int(num)
>     text = unicodedata.normalize('NFC', self._wrapped)
>
>     # Calculate the length to truncate to (max length - end_text length)
>     truncate_len = length
>     for char in self.add_truncation_text('', truncate):
>         if not unicodedata.combining(char):
>             truncate_len -= 1
>             if truncate_len == 0:
>                 break
>     if html:
>         return self._truncate_html(length, truncate, text, truncate_len, False)
>     return self._text_chars(length, truncate, text, truncate_len)
> }}}
>
> [[br]]
>
> In conclusion, output of the `truncatechars` method in HTML is as
> follows.
> {{{ <p>{{ fruit|truncatechars:6 }}</p> }}}
> 1. LANGUAGE_CODE = 'en-us' (default)
> {{{
> straw...
> pinea...
> }}}
> 2. LANGUAGE_CODE = 'ko-kr'
> {{{
> str...
> pin...
> }}}
>
> [[br]]
>
> Even if the language is different, the ellipsis should be recognized same
> as a string of length `1`.
>
> Users from other countries may misunderstand the functionality of the
> `truncatechars` method.
>
> Thank you!

New description:

 There is something strange about the `truncatechars` filter of the Django
 Template Language (DTL).
 In the `add_truncation_text` method of the Truncator class, the length of
 the `truncate` variable depends on the value of LANGUAGE_CODE (e.g.
 'en-us', 'ko-kr', ...) in settings.py.

 [[br]]

 The `add_truncation_text` method of the Truncator class is as follows.
 {{{
 # django/utils/text.py

 def add_truncation_text(self, text, truncate=None):
     if truncate is None:
         truncate = pgettext(
             'String to return when truncating text',
             '%(truncated_text)s…')
     if '%(truncated_text)s' in truncate:
         return truncate % {'truncated_text': text}
     # The truncation text didn't contain the %(truncated_text)s string
     # replacement argument so just append it to the text.
     if text.endswith(truncate):
         # But don't append the truncation text if the current text already
         # ends in this.
         return text
     return '%s%s' % (text, truncate)
 }}}

 When `truncate` is None, it is assigned the translation of the string
 `'%(truncated_text)s…'` returned by the `pgettext` function.

 With LANGUAGE_CODE `'en-us'` (the default), the ellipsis is a single
 character (`…`) of length `1`.
 With LANGUAGE_CODE `'ko-kr'`, the ellipsis is rendered as three dots
 (`...`) and has length `3`.

 I think the translation returned by `pgettext` is the cause.
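
 As a minimal sketch of the length difference described above, the snippet
 below counts the non-combining characters in two truncation templates. It
 does not call Django. `ASSUMED_KO_TRUNCATE` is a hypothetical stand-in for
 the 'ko-kr' translation reported in this ticket, and `ellipsis_length` is
 an illustrative helper, not a Django function.

```python
import unicodedata

# The source string passed to pgettext in django/utils/text.py.
DEFAULT_TRUNCATE = "%(truncated_text)s…"
# Assumption: a translation that spells the ellipsis as three ASCII dots,
# as reported for 'ko-kr' in this ticket.
ASSUMED_KO_TRUNCATE = "%(truncated_text)s..."


def ellipsis_length(template):
    """Count non-combining characters in the template rendered with no text."""
    rendered = template % {"truncated_text": ""}
    return sum(1 for ch in rendered if not unicodedata.combining(ch))


print(ellipsis_length(DEFAULT_TRUNCATE))     # 1
print(ellipsis_length(ASSUMED_KO_TRUNCATE))  # 3
```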

 [[br]]

 So the same code produces different output depending on the language.

 In the `chars` method of the same Truncator class, the number of
 characters to keep before the ellipsis is calculated in a `for` loop.

 The number of iterations of this `for` loop is determined by the length of
 the value returned by `add_truncation_text('', truncate)`, i.e. the length
 of the truncation text.

 {{{
 # django/utils/text.py

 def chars(self, num, truncate=None, html=False):
     """
     Return the text truncated to be no longer than the specified number
     of characters.

     `truncate` specifies what should be used to notify that the string has
     been truncated, defaulting to a translatable string of an ellipsis.
     """
     self._setup()
     length = int(num)
     text = unicodedata.normalize('NFC', self._wrapped)

     # Calculate the length to truncate to (max length - end_text length)
     truncate_len = length
     for char in self.add_truncation_text('', truncate):
         if not unicodedata.combining(char):
             truncate_len -= 1
             if truncate_len == 0:
                 break
     if html:
         return self._truncate_html(length, truncate, text, truncate_len, False)
     return self._text_chars(length, truncate, text, truncate_len)
 }}}
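
 The effect of the loop above can be reproduced without Django. The
 following is a simplified, plain-text sketch of the same calculation (no
 HTML handling). `truncate_chars` is a hypothetical standalone function
 written for this illustration, and the three-dot template is an assumed
 stand-in for the 'ko-kr' translation.

```python
import unicodedata


def truncate_chars(text, num, truncate_template):
    """Truncate text to at most num characters, truncation text included."""
    text = unicodedata.normalize("NFC", text)
    length = int(num)

    # Reserve room for the truncation text (max length - end text length).
    truncate_len = length
    for ch in truncate_template % {"truncated_text": ""}:
        if not unicodedata.combining(ch):
            truncate_len -= 1
            if truncate_len == 0:
                break

    s_len = 0
    end_index = None
    for i, ch in enumerate(text):
        if unicodedata.combining(ch):
            continue  # combining characters do not add to the length
        s_len += 1
        if end_index is None and s_len > truncate_len:
            end_index = i
        if s_len > length:
            kept = text[:end_index or 0]
            return truncate_template % {"truncated_text": kept}
    return text  # short enough, no truncation needed


for word in ("strawberry", "pineapple"):
    print(truncate_chars(word, 6, "%(truncated_text)s…"))    # one-character ellipsis
    print(truncate_chars(word, 6, "%(truncated_text)s..."))  # three-dot ellipsis
```

 With a limit of 6, the one-character ellipsis leaves five characters of the
 original text, while the three-dot form leaves only three.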

 [[br]]

 As a result, the output of the `truncatechars` filter in a template is as
 follows.
 {{{ <p>{{ fruit|truncatechars:6 }}</p> }}}
 1. LANGUAGE_CODE = 'en-us' (default)
 {{{
 straw...
 pinea...
 }}}
 2. LANGUAGE_CODE = 'ko-kr'
 {{{
 str...
 pin...
 }}}

 [[br]]

 Even if the language is different, the ellipsis should be counted the same
 way, as a string of length `1`.

 Users from other countries may misunderstand the behavior of the
 `truncatechars` filter.

 Thank you!

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33318#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
