Re: AW: Lexical analysis tools for German language data

2012-04-12 Walter Underwood
On Apr 12, 2012, at 9:00 AM, Paul Libbrecht wrote:
> More or less, Fahrrad is generally abbreviated as Rad.
> (even though Rad can mean wheel and bike)

A synonym could handle this, since "fahren" would not be a good match. It is a judgement call, but this seems more like an equivalence "Fahrrad = …
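Treating Fahrrad and Rad as an equivalence, as suggested above, can be sketched as a symmetric synonym map: every term in a group expands to the whole group. This is a minimal Python illustration under stated assumptions, not a search-engine configuration; the group contents, the lower-casing, and the names `EQUIVALENCES`, `build_synonym_map` and `expand` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical equivalence groups: every term in a set is treated as interchangeable.
EQUIVALENCES: List[Set[str]] = [
    {"fahrrad", "rad"},
]


def build_synonym_map(groups: List[Set[str]]) -> Dict[str, Set[str]]:
    """Map every term to the full set of terms in its equivalence group."""
    mapping: Dict[str, Set[str]] = {}
    for group in groups:
        for term in group:
            mapping[term] = set(group)
    return mapping


def expand(tokens: List[str], mapping: Dict[str, Set[str]]) -> List[Set[str]]:
    """Replace each (lower-cased) token with its equivalents, or with itself if it has none."""
    return [mapping.get(tok.lower(), {tok.lower()}) for tok in tokens]


SYNONYMS = build_synonym_map(EQUIVALENCES)

if __name__ == "__main__":
    print(expand(["Fahrrad", "kaufen"], SYNONYMS))
```

Because the mapping is symmetric, a query for Rad in its "wheel" sense will also match Fahrrad; that ambiguity is the judgement call Walter mentions.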

Re: AW: Lexical analysis tools for German language data

2012-04-12 Markus Jelsma
On Thursday 12 April 2012 18:00:14 Paul Libbrecht wrote:
> On 12 Apr. 2012 at 17:46, Michael Ludwig wrote:
> >> Some compounds probably should not be decompounded, like "Fahrrad"
> >> (fahren/Rad). With a dictionary-based stemmer, you might decide to
> >> avoid decompounding for words in the dictionary. …
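The rule of not decompounding words that are already in the dictionary can be sketched as a small dictionary-based splitter with a switch for that policy. This is a minimal Python sketch; the toy lexicon, the `MIN_PART` minimum fragment length, and the names `split_compound` and `decompound` are hypothetical. A production decompounder would use a full German word list and would likely need to rank competing splits rather than accept the first one found.

```python
from typing import FrozenSet, List, Optional

# Hypothetical toy lexicon; "fahr" stands in for the stem of "fahren".
LEXICON: FrozenSet[str] = frozenset(
    {"fahrrad", "fahr", "rad", "haus", "tür", "schlüssel"}
)
MIN_PART = 3  # assumption: ignore fragments shorter than three letters


def split_compound(w: str, lexicon: FrozenSet[str]) -> Optional[List[str]]:
    """Return a split of w into two or more lexicon words (longest head first), or None."""
    for i in range(len(w) - MIN_PART, MIN_PART - 1, -1):
        head, tail = w[:i], w[i:]
        if head not in lexicon:
            continue
        if tail in lexicon:
            return [head, tail]
        rest = split_compound(tail, lexicon)
        if rest is not None:
            return [head] + rest
    return None


def decompound(
    word: str,
    lexicon: FrozenSet[str] = LEXICON,
    keep_dictionary_words: bool = True,
) -> List[str]:
    """Split a compound, but keep words found in the lexicon whole if requested."""
    w = word.lower()
    if keep_dictionary_words and w in lexicon:
        return [w]  # e.g. "fahrrad" stays one token
    parts = split_compound(w, lexicon)
    return parts if parts is not None else [w]


if __name__ == "__main__":
    print(decompound("Fahrrad"))
    print(decompound("Fahrrad", keep_dictionary_words=False))
    print(decompound("Haustürschlüssel"))
```

With the policy switched off, Fahrrad is split into "fahr" and "rad", which is exactly the kind of split the thread suggests avoiding.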

Re: AW: Lexical analysis tools for German language data

2012-04-12 Walter Underwood
On Apr 12, 2012, at 8:46 AM, Michael Ludwig wrote:
> I remember from my linguistics studies that the terminus technicus for
> these is "Fugenmorphem" (interstitial or joint morpheme).

That is some excellent linguistic jargon. I'll file that with "hapax legomenon".

If you don't highlight, you ca…
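A Fugenmorphem matters for decompounding because the linking element (for example the -s- in Arbeitszimmer or the -n- in Sonnenschein) belongs to neither constituent, so a plain dictionary lookup on the remainder fails. The Python sketch below lets a simple splitter skip an optional linking element after each non-final part. The toy lexicon, the list of linking morphemes (a common but incomplete subset), `MIN_PART`, and the name `split_with_linking` are assumptions for illustration.

```python
from typing import FrozenSet, List, Optional

# Hypothetical toy lexicon; a real system would load a full German word list.
LEXICON: FrozenSet[str] = frozenset(
    {"arbeit", "zimmer", "sonne", "schein", "haus", "tür"}
)

# A common subset of German linking morphemes (Fugenelemente); not exhaustive.
# The empty string allows compounds with no linking element at all.
LINKING_MORPHEMES = ("", "s", "es", "n", "en", "e", "er")
MIN_PART = 3  # assumption: ignore fragments shorter than three letters


def split_with_linking(w: str) -> Optional[List[str]]:
    """Split w into lexicon words, dropping a linking morpheme after any non-final part."""
    if w in LEXICON:
        return [w]
    for i in range(len(w) - MIN_PART, MIN_PART - 1, -1):
        head = w[:i]
        if head not in LEXICON:
            continue
        for link in LINKING_MORPHEMES:
            if not w.startswith(link, i):
                continue
            tail = w[i + len(link):]
            if len(tail) < MIN_PART:
                continue
            rest = split_with_linking(tail)
            if rest is not None:
                return [head] + rest  # the linking element itself is discarded
    return None


if __name__ == "__main__":
    for word in ("arbeitszimmer", "sonnenschein", "haustür"):
        print(word, split_with_linking(word))
```

Allowing linking elements also makes spurious splits easier to produce, which is one more reason to keep dictionary words whole where possible.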

Re: AW: Lexical analysis tools for German language data

2012-04-12 Paul Libbrecht
On 12 Apr. 2012 at 17:46, Michael Ludwig wrote:
>> Some compounds probably should not be decompounded, like "Fahrrad"
>> (fahren/Rad). With a dictionary-based stemmer, you might decide to
>> avoid decompounding for words in the dictionary.
>
> Good point.

More or less, Fahrrad is generally abbreviated as Rad. …