On 07/04/2010 17:21, luis jure wrote:

> hello list.
>
> i have a bunch of files with accented characters in their names, both
> upper- and lower case. i want to rename them using the non-accented
> equivalent. i thought that would be easy to do using something like tr.
> big mistake. confronted with accented characters, tr outputs garbage.
>
> searching the web, i found this: "Although the tr command respects C
> locale environment variables, don't expect it to do anything sensible
> with UTF-8 documents, such as being able to replace lower-case accented
> characters with appropriate upper-case characters. The tr command works
> best with ASCII and the other standard C locales."
>
> i'm using es_UY.UTF8 and i can't make tr do anything useful.

It can be done with Perl. For example:

$ echo "El castellano es la lengua española oficial del Estado. Las demás lenguas españolas serán también oficiales en las respectivas Comunidades Autónomas" | perl -CSD -MUnicode::Normalize -pe '$_=NFKD($_);s/\pM//g'

This should print:

El castellano es la lengua espanola oficial del Estado. Las demas lenguas espanolas seran tambien oficiales en las respectivas Comunidades Autonomas

Cheers,

--Kerin

