You might have a look at: http://www.basistech.com/lucene/
On 12.04.2012 11:52, Michael Ludwig wrote:
> Given an input of "Windjacke" (probably "wind jacket" in English), I'd
> like the code that prepares the data for the index (tokenizer etc.) to
> understand that this is a "Jacke" ("jacket"), so that a query for "Jacke"
> would include the "Windjacke" document in its result set.
>
> It appears to me that such an analysis requires a dictionary-backed
> approach, which doesn't have to be perfect at all; a list of the 2000
> most common words would probably do the job and fulfil a criterion of
> reasonable usefulness.
>
> Do you know of any implementation techniques or working implementations
> for this kind of lexical analysis of German language data? (Or other
> languages, for that matter?) What are they, and where can I find them?
>
> I'm sure there is something out there (commercial or free), because I've
> seen lots of engines grokking German and the way it builds words.
>
> Failing that, what are the proper terms to refer to these techniques, so
> that one can search more successfully?
>
> Michael
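For what it's worth, the dictionary-backed approach described in the question (usually called "compound splitting" or "decompounding") can be sketched as a greedy longest-match splitter. Below is a minimal illustration in Python; the tiny word list and the splitting strategy are assumptions for demonstration only, not a production decompounder (Lucene ships one as DictionaryCompoundWordTokenFilter, which additionally handles things like minimum subword lengths):

```python
# Minimal sketch of dictionary-backed compound splitting for German.
# The word list below is a placeholder; a real decompounder uses a large
# dictionary and also handles linking elements ("Fugen-s" as in
# "Arbeitsjacke"), which this sketch ignores.

DICTIONARY = {"wind", "jacke", "regen", "schirm", "haus", "arbeit"}

def split_compound(word, min_part=3):
    """Greedily split a lowercased compound into dictionary words.

    Tries the longest dictionary prefix first and recurses on the rest.
    Returns the list of parts, or [word] unchanged if no full split
    into dictionary words is found.
    """
    word = word.lower()
    if word in DICTIONARY:
        return [word]
    for i in range(len(word) - min_part, min_part - 1, -1):
        head, tail = word[:i], word[i:]
        if head in DICTIONARY:
            rest = split_compound(tail, min_part)
            # Only accept the split if the remainder decomposed fully.
            if all(part in DICTIONARY for part in rest):
                return [head] + rest
    return [word]  # fall back to the unsplit token

print(split_compound("Windjacke"))    # -> ['wind', 'jacke']
print(split_compound("Regenschirm"))  # -> ['regen', 'schirm']
```

At index time you would emit both the original token and its parts, so that a query for "Jacke" matches the "Windjacke" document.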