On Thu, Dec 03, 2009 at 08:50:08PM +0100, Renat Golubchyk wrote:

> I'd suggest you use a Unicode library. BTW, what about Cyrillic
> letters or other alphabets? Those may have nothing to do with ASCII. Or
> is your project restricted to Latin letters?

The data is already in normalized Unicode.  My problem is eliminating
errors from near misses :-(  Cyrillic doesn't look like the same
problem: no accents that I can see.  Chinese, Japanese, etc., same as
far as I know.  Arabic has lots of tricks on combining letters and
leaving out vowels, so it is probably an entirely different problem.

One thing I did not make clear is that this is for place names only,
such as Busingen: cities and whatever the local equivalent of a US
state or Canadian province is.

So do people type in Busingen different ways depending on how they
feel?  Do some people always leave off the umlaut, and do some always
use it?  My biggest annoyance is that a lot of the Google results come
from Americans full of theory about languages they only know from the
W3C recommendations.  Maybe email or real documents follow proper
usage much more closely than addresses on a web form, but I don't care
about them.  Maybe web forms in Germany, where they want a district,
do as many web sites do in English and have a menu of possible
districts, in which case no one types in umlauts anyway :-)
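
A minimal sketch of how such near misses could be matched, assuming
Python and its standard unicodedata module.  The names fold_accents,
match_keys and same_place are hypothetical, and the ae/oe/ue expansion
is an assumed typing habit, not something established above.  It
compares a typed name against a canonical name that carries the umlaut:

```python
import unicodedata

# Assumption: some typists spell umlauts as two ASCII letters.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def fold_accents(name: str) -> str:
    """Case-fold and drop combining marks, e.g. 'Zürich' -> 'zurich'."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_keys(name: str) -> set:
    """Keys under which a name might plausibly have been typed."""
    # NFC first, so a decomposed 'u' + U+0308 becomes a single 'ü'.
    lowered = unicodedata.normalize("NFC", name).casefold()
    expanded = "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in lowered)
    return {fold_accents(lowered), fold_accents(expanded)}


def same_place(a: str, b: str) -> bool:
    """True if the two spellings share at least one match key."""
    return bool(match_keys(a) & match_keys(b))
```

With a canonical "Büsingen", both "Busingen" and "Buesingen" match.
Two ASCII variants do not match each other, since neither one says
where the umlaut was, so one side has to be the accented form.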

-- 
            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
     Felix Finch: scarecrow repairman & rocket surgeon / fe...@crowfix.com
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
