On 02.07.2014 at 19:25, Steve Litt <sl...@troubleshooters.com> wrote:
> On Wed, 2 Jul 2014 18:40:18 +0200
> Bzzzz <lazyvi...@gmx.com> wrote:
>
>> On Wed, 2 Jul 2014 12:22:02 -0400
>> Steve Litt <sl...@troubleshooters.com> wrote:
>
>>> If worst comes to worst and I can't find a way to get grep to do
>>> this, I'll just put together a substitution table,
>>> convert /usr/share/dict/words to words.ascii, line for line, search
>>> words.ascii, get the line number, and pull that line out of words.
>>> Crude, but effective.
>>
>> AFAIK, this is the only way to be able to perform what you want.
>>
>
> So then, the question becomes, where does there exist a list of common
> letters that are, for want of a better word, "ornamented ascii"?
> Umlauts, Carats, Circles, Grave accents, etc.

This is a known problem without a perfect solution.

Some years ago I wrote a Perl module for this:

https://metacpan.org/pod/Text::Undiacritic

    DESCRIPTION

    Changes characters with diacritics into their base characters.

    Also changes into base character in cases where UNICODE does not
    provide a decomposition.

    E.g. all characters '... WITH STROKE' like 'LATIN SMALL LETTER L
    WITH STROKE' do not have a decomposition.

    In the latter case the result will be 'LATIN SMALL LETTER L'.

    Removing diacritics is useful for matching text independent of
    spelling variants.

But a more general approach would be to use some sort of approximate
matching: calculate a similarity coefficient and display the
best-matching strings. See, e.g.:

https://metacpan.org/release/Set-Similarity
https://metacpan.org/pod/String::Similarity
http://www.chokkan.org/software/simstring/

HTH

Helmut Wollmersdorfer


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/8b5f736b-8417-4717-8b98-fa81369c3...@amodelo.de
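The diacritic-stripping idea behind Text::Undiacritic, combined with Steve's "search a folded copy, return the original line" plan, can be sketched in Python. This is not the module's code: `undiacritic` and `search_folded` are hypothetical names, and the fallback for letters without a decomposition (looking up the Unicode name with the " WITH ..." part cut off) is an assumed heuristic that mirrors the module description rather than its implementation.

```python
import re
import unicodedata


def undiacritic(text: str) -> str:
    """Replace characters with diacritics by their base characters.

    Heuristic sketch, not Text::Undiacritic itself:
    1. NFD-decompose and drop combining marks ('ä' -> 'a' + U+0308 -> 'a').
    2. For characters with no decomposition whose Unicode name contains
       ' WITH ' (e.g. 'LATIN SMALL LETTER L WITH STROKE'), look up the
       character named by the part before ' WITH ' ('LATIN SMALL LETTER L').
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            continue  # a combining accent left over from decomposition
        name = unicodedata.name(ch, "")
        if " WITH " in name:
            base_name = name.split(" WITH ", 1)[0]
            try:
                ch = unicodedata.lookup(base_name)
            except KeyError:
                pass  # no character by that name: keep the original
        out.append(ch)
    return "".join(out)


def search_folded(pattern: str, words: list[str]) -> list[str]:
    """Match the regex against the folded form of each word,
    but return the original (accented) word."""
    rx = re.compile(pattern)
    return [w for w in words if rx.search(undiacritic(w))]
```

For example, `undiacritic("Łódź")` yields `"Lodz"`, so `search_folded(r"^Lodz$", words)` returns the original `Łódź` entry. The word list could be loaded with something like `open("/usr/share/dict/words", encoding="utf-8").read().splitlines()`; folding on the fly avoids keeping words and words.ascii in sync by line number, at the cost of re-folding on every search.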
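The approximate-matching alternative mentioned above can be illustrated with one common similarity coefficient, the Dice coefficient over character bigrams. This is a swapped-in illustration, not the API of Set::Similarity, String::Similarity or SimString; `bigrams`, `dice` and `best_matches` are hypothetical names chosen for the sketch.

```python
from collections import Counter


def bigrams(s: str) -> Counter:
    """Multiset of character bigrams; padding with spaces lets the first
    and last letters count as well."""
    padded = f" {s.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def dice(a: str, b: str) -> float:
    """Dice coefficient 2*|A & B| / (|A| + |B|) on bigram multisets (0.0 to 1.0)."""
    ba, bb = bigrams(a), bigrams(b)
    overlap = sum((ba & bb).values())  # & keeps the minimum count per bigram
    total = sum(ba.values()) + sum(bb.values())  # always >= 2 thanks to padding
    return 2 * overlap / total


def best_matches(query: str, words: list[str], n: int = 3) -> list[tuple[float, str]]:
    """Score every word against the query and return the n best, highest first."""
    scored = [(dice(query, w), w) for w in words]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

With this scoring, `best_matches("cafe", ["tea", "café", "cafe"], n=2)` ranks `cafe` (1.0) ahead of `café` (0.6), so the accented spelling still surfaces without any substitution table. Python's standard library also offers `difflib.get_close_matches`, which uses a different (sequence-matching) ratio.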