On 12 Apr 2012, at 17:46, Michael Ludwig wrote:

>> Some compounds probably should not be decompounded, like "Fahrrad"
>> (fahren/Rad). With a dictionary-based stemmer, you might decide to
>> avoid decompounding for words in the dictionary.
>
> Good point.

More or less. Fahrrad is generally abbreviated as Rad (even though Rad can mean both wheel and bike).

>> Note that highlighting gets pretty weird when you are matching only
>> part of a word.
>
> Guess it'll be weird when you get it wrong, like "Noten" in
> "Notentriegelung".

This decomposition should not happen, because Noten-triegelung does not have a valid second term.

>> The Basis Technology linguistic analyzers aren't cheap or small, but
>> they work well.
>
> We will consider our needs and options. Thanks for your thoughts.

My question remains as to which domain it aims to cover. We had such a need for mathematics texts... I would be pleasantly surprised if, for example, Differenzen-quotient were decompounded.

paul
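To make the two rules discussed above concrete (keep a word whole if it is itself in the dictionary, and accept a split only if every part is a dictionary word), here is a minimal sketch. It is not how any particular analyzer works; the function name `decompound`, the `min_part` parameter, and the tiny lexicon are made up for illustration, and linking elements (Fugenelemente) are not handled, so inflected forms such as "differenzen" are simply listed in the lexicon.

```python
def decompound(word, lexicon, min_part=3):
    """Split `word` into lexicon entries; return [word] if no full split exists.

    Hypothetical helper for illustration only.
    """
    w = word.lower()

    def split(rest):
        # Rule 1: a word that is itself in the dictionary is never split.
        if rest in lexicon:
            return [rest]
        # Rule 2: try each head (longest first); keep it only if the
        # remaining tail can also be covered entirely by dictionary words.
        for i in range(len(rest) - min_part, min_part - 1, -1):
            head, tail = rest[:i], rest[i:]
            if head in lexicon:
                parts = split(tail)
                if parts is not None:
                    return [head] + parts
        return None  # no valid decomposition of `rest`

    return split(w) or [w]


# Toy lexicon (an assumption, not taken from any real dictionary file).
LEXICON = {"fahrrad", "rad", "not", "noten", "entriegelung",
           "differenzen", "quotient"}

print(decompound("Fahrrad", LEXICON))              # ['fahrrad']
print(decompound("Notentriegelung", LEXICON))      # ['not', 'entriegelung']
print(decompound("Differenzenquotient", LEXICON))  # ['differenzen', 'quotient']
```

"Noten" is in the toy lexicon, but the split Noten + triegelung is rejected because "triegelung" is not a word, so only Not + Entriegelung survives. Whether Differenzenquotient gets split depends entirely on whether the dictionary covers the domain, which is the open question for mathematics texts.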