On 07/04/2010 17:21, luis jure wrote:
> hello list.
> i have a bunch of files with accented characters in their names, both
> upper- and lower case. i want to rename them using the non-accented
> equivalent. i thought that would be easy to do using something like tr.
> big mistake. confronted with accented characters, tr outputs garbage.
> searching the web, i found this: "Although the tr command respects C
> locale environment variables, don't expect it to do anything sensible
> with UTF-8 documents, such as being able to replace lower-case accented
> characters with appropriate upper-case characters. The tr command works
> best with ASCII and the other standard C locales."
> i'm using es_UY.UTF8 and i can't make tr do anything useful.
It can be done with Perl, by decomposing each accented character into
its base letter plus combining marks (Unicode normalization form NFKD)
and then deleting the marks. For example:
$ echo "El castellano es la lengua española oficial del Estado. Las
demás lenguas españolas serán también oficiales en las respectivas
Comunidades Autónomas" |
perl -CS -MUnicode::Normalize -pe '$_ = NFKD($_); s/\pM//g'
The following output should be seen:
El castellano es la lengua espanola oficial del Estado. Las
demas lenguas espanolas seran tambien oficiales en las respectivas
Comunidades Autonomas
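Since the goal is to rename files rather than to filter text, the same
idea could be wrapped in a small shell loop. The following is only a
sketch under a few assumptions: the function names strip_accents and
rename_unaccented are made up for illustration, the file names are
assumed to be valid UTF-8, and perl's -C switch is used to decode the
argument and encode the output.

```shell
#!/bin/sh
# Sketch: rename the files in a directory to accent-free names.
# strip_accents and rename_unaccented are hypothetical helper names.

# Print the first argument with its combining marks removed.
strip_accents() {
    # -CA decodes @ARGV as UTF-8, -CO encodes STDOUT as UTF-8.
    perl -CAO -MUnicode::Normalize \
        -e 'my $s = NFKD($ARGV[0]); $s =~ s/\pM//g; print $s' -- "$1"
}

# Rename every entry of the given directory (default: current directory).
rename_unaccented() {
    dir=${1:-.}
    for path in "$dir"/*; do
        [ -e "$path" ] || continue          # empty directory: glob did not match
        name=${path##*/}
        new=$(strip_accents "$name")
        [ "$name" = "$new" ] && continue    # nothing to strip
        if [ -e "$dir/$new" ]; then
            # Never overwrite an existing file.
            printf 'skipping %s: %s already exists\n' "$name" "$new" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
    done
}
```

After sourcing the functions, something like rename_unaccented /path/to/files
would do the renaming. It is probably wise to try it on a copy of the
directory first, since two different names can collapse to the same
unaccented name (the sketch skips those rather than overwriting).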
Cheers,
--Kerin