Right, sorry -- I'm going to have to go with Eric on that; the standard
library already does just that (from unicodedata import normalize).
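
For illustration, here is a minimal sketch of what that could look like in
Python 3. The helper name truncate_chars is hypothetical (it is not an
existing Django filter), and choosing NFC is an assumption; the only
standard-library call used is unicodedata.normalize.

```python
from unicodedata import normalize

composed = "\u00fc"      # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # LATIN SMALL LETTER U + COMBINING DIAERESIS


def truncate_chars(text, length):
    """Hypothetical helper: normalize to NFC, then slice by code point."""
    return normalize("NFC", text)[:length]


# "fur" with a decomposed u-diaeresis: four code points in total.
word = "f" + decomposed + "r"

# Plain slicing cuts between the base letter and its combining mark.
naive = word[:2]

# Normalizing first keeps the u-diaeresis together as one code point.
safe = truncate_chars(word, 2)
```

Here naive ends in a bare "u", while safe ends in the composed U+00FC. Note
that NFC only helps where a precomposed form exists; sequences without one
can still be split by a plain slice.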


J. Leclanche / Adys


On Thu, Dec 31, 2009 at 1:30 AM, James Bennett <ubernost...@gmail.com> wrote:

> On Wed, Dec 30, 2009 at 5:05 PM, Jerome Leclanche <adys...@gmail.com>
> wrote:
> > When truncating characters, we are obviously talking about truncating just
> > that: characters. Truncating bytes is a behaviour implemented by |slice.
>
> You misunderstand: I'm not talking about bytes, I'm talking about
> composed and decomposed characters.
>
> For example, 'ü' can be represented as either:
>
> 1. 00fc (LATIN SMALL LETTER U WITH DIAERESIS), or
>
> 2. 0075 (LATIN SMALL LETTER U) *followed by* 0308 (COMBINING DIAERESIS)
>
> Option 1 is composed, option 2 is decomposed and is actually *two
> Unicode characters*, not "two bytes", and so character-based slicing
> will chop off the combining diaeresis. The only way to avoid this is to
> have the filter do Unicode normalization to composed characters (e.g.,
> normalization form NFC or NFKC).
>
>
> --
> "Bureaucrat Conrad, you are technically correct -- the best kind of
> correct."
>
> --
>
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To post to this group, send email to django-develop...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-developers+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-developers?hl=en.
>
>
>


