You might have a look at: http://www.basistech.com/lucene/
On 12.04.2012 11:52, Michael Ludwig wrote:
> Given an input of "Windjacke" (probably "wind jacket" in English), I'd
> like the code that prepares the data for the index (tokenizer etc.) to
> understand that this is a "Jacke" ("jacket"), so that a query for "Jacke"
> would include the "Windjacke" document in its result set.
>
> It appears to me that such an analysis requires a dictionary-backed
> approach, which doesn't have to be perfect at all; a list of the 2000
> most common words would probably do the job and fulfil a criterion of
> reasonable usefulness.
>
> Do you know of any implementation techniques or working implementations
> for this kind of lexical analysis of German language data? (Or other
> languages, for that matter?) What are they, and where can I find them?
>
> I'm sure there is something out there (commercial or free), because I've
> seen lots of engines grokking German and the way it builds words.
>
> Failing that, what are the proper terms to refer to these techniques, so
> that one can search more successfully?
>
> Michael
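For what it's worth, the dictionary-backed approach described in the question (usually called "compound splitting" or "decompounding") can be sketched as a greedy longest-match splitter. Below is a minimal illustration in Python; the tiny word list and the splitting strategy are assumptions for demonstration only, not a production decompounder (Lucene ships one as DictionaryCompoundWordTokenFilter, which additionally handles things like minimum subword lengths):

```python
# Minimal sketch of dictionary-backed compound splitting for German.
# The word list below is a placeholder; a real decompounder uses a large
# dictionary and also handles linking elements ("Fugen-s" as in
# "Arbeitsjacke"), which this sketch ignores.

DICTIONARY = {"wind", "jacke", "regen", "schirm", "haus", "arbeit"}

def split_compound(word, min_part=3):
    """Greedily split a lowercased compound into dictionary words.

    Tries the longest dictionary prefix first and recurses on the rest.
    Returns the list of parts, or [word] unchanged if no full split
    into dictionary words is found.
    """
    word = word.lower()
    if word in DICTIONARY:
        return [word]
    for i in range(len(word) - min_part, min_part - 1, -1):
        head, tail = word[:i], word[i:]
        if head in DICTIONARY:
            rest = split_compound(tail, min_part)
            # Only accept the split if the remainder decomposed fully.
            if all(part in DICTIONARY for part in rest):
                return [head] + rest
    return [word]  # fall back to the unsplit token

print(split_compound("Windjacke"))    # -> ['wind', 'jacke']
print(split_compound("Regenschirm"))  # -> ['regen', 'schirm']
```

At index time you would emit both the original token and its parts, so that a query for "Jacke" matches the "Windjacke" document.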