2008/5/12 Yannick Warnier <[EMAIL PROTECTED]>:
>> Why are you removing the accents? Why not store/process the data as
>> UTF-8, which supports all the accents in all the languages, and even
>> non-latin languages. You mention Arabic, which does not use accented
>> Latin characters (maybe you are thinking of Turkish, Uzbek or Tajik).
>> UTF-8 supports Arabic, Russian, Greek, Latin including modified
>> accented letters, and everything else in Unicode, CJK included.
>>
>> What is your end goal? Why are you removing the accents?
>
> Hi Dotan,
>
> I'm trying to give a universally manageable directory name to an item
> using a free-text title. I want to avoid every type of accented
> character and everything outside of pure ASCII to make it as portable
> as possible.
> Generating a random hash is not acceptable, as we want to be as
> user-friendly as possible.

I suppose that is a good reason. I actually tried to come up with a
use case that justifies the removal of Latin accents, and couldn't.
I'll remember that. Tell me, what are you doing with Hebrew, Russian,
Arabic, and other non-latin scripts? If you want, I have some code
that roughly transliterates Hebrew <-> Latin on the
http://gibberish.co.il website.

> I mention Arabic not because I want to remove accented characters, but
> in case there is a transliteration function that lets me turn an
> Arabic character into something similar in pronunciation but in
> ASCII.

If it needs to be transliterated back to Arabic you will have fun with
the letter forms! I can give you code that does it for Hebrew, but
Hebrew only has 5 final letters, and no explicit initial or medial
forms.
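
To illustrate why Hebrew is the easy case, here is a minimal sketch in
Python (not the code from the gibberish.co.il site) that folds the five
final letters into their regular forms and restores them at word ends.
The function names are hypothetical, and the "last letter of a word
takes the final form" rule is an approximation that some loanwords
break.

```python
# Final form -> regular form for the five Hebrew final letters
# (kaf, mem, nun, pe, tsadi), written as escapes to avoid bidi confusion.
FINAL_TO_BASE = {
    "\u05da": "\u05db",  # final kaf   -> kaf
    "\u05dd": "\u05de",  # final mem   -> mem
    "\u05df": "\u05e0",  # final nun   -> nun
    "\u05e3": "\u05e4",  # final pe    -> pe
    "\u05e5": "\u05e6",  # final tsadi -> tsadi
}
BASE_TO_FINAL = {base: final for final, base in FINAL_TO_BASE.items()}


def strip_final_forms(text: str) -> str:
    """Replace every final letter with its regular form."""
    return "".join(FINAL_TO_BASE.get(ch, ch) for ch in text)


def restore_final_forms(text: str) -> str:
    """Put the final form back on the last letter of each space-separated word.

    Assumption: a word-final kaf/mem/nun/pe/tsadi is always written in its
    final form, which is only roughly true.
    """
    words = []
    for word in text.split(" "):
        if word and word[-1] in BASE_TO_FINAL:
            word = word[:-1] + BASE_TO_FINAL[word[-1]]
        words.append(word)
    return " ".join(words)
```

Arabic would need a table like this for most of the alphabet, with
initial, medial, final and isolated variants, which is where the fun
starts.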

> So the goal is to create a directory name that is both intuitive *and*
> portable for the user and the admin. The title is kept for the user, but
> there is a generic shortened code that is generated following the given
> title.
> We used to ask for a title in a webform, but realised our users liked it
> much better when we let them choose the code themselves, while still
> generating one for them by default.
> I just realised that the developer who did it seems to have built it
> from the HTML entity names directly, so we end up with codes like
> "EACUTETEACUTE" for an item called "été", while "ETE" would be far better.
>
> Yannick
>
>
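
For the Latin part, one common way to get "ETE" out of "été" is to
decompose the accented letters and drop the combining marks. A minimal
sketch in Python, assuming a hypothetical helper name and length limit
(the same idea carries over to other languages with Unicode
normalization support):

```python
import unicodedata


def title_to_code(title: str, max_len: int = 20) -> str:
    """Derive an uppercase ASCII code from a free-text title.

    title_to_code and max_len are illustrative names, not part of any
    existing application.
    """
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, i.e. the accents themselves.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Uppercase, then keep only ASCII letters and digits.
    code = "".join(ch for ch in stripped.upper() if ch.isascii() and ch.isalnum())
    return code[:max_len]
```

With this, title_to_code("été") gives "ETE". Note that a title written
entirely in Arabic, Hebrew or Russian comes back as an empty string, so
non-Latin scripts still need a transliteration table or some fallback.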

Dotan Cohen

http://what-is-what.com
http://gibberish.co.il
א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-נ-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
