John Machin wrote:
> Another point: there are many non-latin1 characters that could be
> mapped to ASCII. For example:
> u"\u0141ukasziewicz".translate(unaccented_map())
> doesn't work unless an entry is added to the no-decomposition table:
> 0x0141: u"L", # LATIN CAPITAL LETTER L WITH STROKE
>
> It looks like generating extra entries like that could be done, with
> the aid of unicodedata.name():
>
> LATIN CAPITAL LETTER X WITH blahblah -> "X"
> LATIN SMALL LETTER X WITH blahblah -> "X".lower()
>
> This would require a fair bit of care -- obviously there are special
> cases like LATIN CAPITAL LETTER O WITH STROKE. Eyeballing by regional
> experts is probably required.
See the comments over at
http://effbot.org/zone/unicode-convert.htm
for an extended table, eyeballed by a regional expert (and since he
makes the same point about OE vs Oe as you do, I'll probably have to
change the code ;-)
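
For what it's worth, here's a rough Python 3 sketch of the name-based idea, assuming a plain dict suitable for str.translate() rather than the unaccented_map class from the page above. The helper names (name_based_entry, build_table), the scanned range, and the overrides argument are all made up for illustration. Characters with a compatibility decomposition are stripped of combining marks first, and only characters *without* one (like U+0141) fall back to the unicodedata.name() rule:

```python
import re
import unicodedata

# Hypothetical pattern for "LATIN CAPITAL/SMALL LETTER X WITH blahblah".
# Multi-letter names such as "LATIN CAPITAL LETTER AE" deliberately don't match.
_NAME_RE = re.compile(r"LATIN (CAPITAL|SMALL) LETTER ([A-Z]) WITH ")


def name_based_entry(ch):
    """Guess an ASCII replacement from the character's Unicode name, or None."""
    try:
        name = unicodedata.name(ch)
    except ValueError:  # unassigned code point: no name available
        return None
    m = _NAME_RE.match(name)
    if m is None:
        return None
    base = m.group(2)
    return base if m.group(1) == "CAPITAL" else base.lower()


def build_table(start=0x00C0, stop=0x0250, overrides=None):
    """Build a {codepoint: ascii_str} table for str.translate().

    The default range (Latin-1 Supplement through Latin Extended-B) is an
    assumption; overrides holds the hand-checked special cases.
    """
    table = {}
    for cp in range(start, stop):
        ch = chr(cp)
        # First try decomposition: drop combining marks, keep the base letters.
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if base and base != ch and base.isascii():
            table[cp] = base
            continue
        # No usable decomposition (e.g. L WITH STROKE): fall back to the name.
        guess = name_based_entry(ch)
        if guess is not None:
            table[cp] = guess
    if overrides:
        table.update(overrides)  # regional-expert corrections win
    return table
```

Usage: "\u0141ukasziewicz".translate(build_table()) gives "Lukasziewicz". Note that the name rule happily maps O WITH STROKE to "O", which is exactly the sort of entry that still needs eyeballing; pass something like overrides={0x00D8: "OE"} if that's what the experts want.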
</F>