Andy,

This will help with smooth ingestion of your multilingual documents into Solr (multilingual either in the sense of one document containing fields in multiple languages, or one index containing documents in different languages):
http://sematext.com/products/multilingual-indexer/index.html

Re your other question about open-source morphological dictionaries: I don't know of any. The last time I looked for dictionaries, I learned that they cost money. That said, the market for datasets is starting to grow, so you may be able to find more, and cheaper, dictionaries now.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message ----
> From: Andy <angelf...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Mon, April 19, 2010 8:45:40 AM
> Subject: Re: LucidWorks Solr
>
> Thanks for the explanation, Mitch. You're right, there can't be universal
> stemmers. What about multi-language stemmers? I'm mostly interested in
> English, Spanish, German, French, and Italian. Are there any stemmers that
> would handle those languages? If not, what's the recommended way to deal
> with documents in multiple languages?
>
> --- On Mon, 4/19/10, MitchK <mitc...@web.de> wrote:
>
> > From: MitchK <mitc...@web.de>
> > Subject: Re: LucidWorks Solr
> > To: solr-user@lucene.apache.org
> > Date: Monday, April 19, 2010, 4:36 AM
> >
> > Andy, I think it is important to know what a stemmer really is.
> >
> > It reduces words to their base form (their "infinitive"). That base form
> > is not always the real infinitive, but for the system it is the base
> > form, since all of a word's derived forms can be reduced to it. That's a
> > stemmer.
> >
> > Accordingly, there can't be a single stemmer that works for every
> > language, because every language has its own rules for reducing a word
> > to its base form.
> >
> > If you apply an English stemmer to a German document, the results might
> > be unexpected. However, sometimes it still works well enough.
> >
> > Keep in mind that this is an algorithm.
> > It is not important whether the
> > created base form is the real infinitive. It is only important that most
> > of the derived forms can be reduced to the same base form.
> >
> > Please ask if something is not clear.
> >
> > KStem: The wiki[1] says that KStem is less aggressive than the standard
> > stemmer. I guess this means that there are more rules for how to reduce
> > a word to its base form, and that the results might therefore be better.
> >
> > [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
> >
> > Kind regards
> > - Mitch
> > --
> > View this message in context:
> > http://n3.nabble.com/LucidWorks-Solr-tp727341p729110.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
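Mitch's point that a stemmer is just an algorithm, whose output only has to be consistent across related word forms rather than a real word, can be illustrated with a deliberately tiny suffix stripper. This is a hypothetical sketch: the `ToyStemmer` class and its suffix list are invented for illustration and are not Porter, Snowball, KStem, or any Solr/Lucene API. It also shows why English rules applied to German text can give unexpected results.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical toy suffix-stripping stemmer. It is NOT Porter, Snowball or
 * KStem; it only illustrates that a stem need not be a real word, it only
 * has to be the same for related word forms.
 */
public class ToyStemmer {

    // Hypothetical English-style suffix rules, ordered so that a longer suffix
    // is tried before a shorter one it ends with (e.g. "ions" before "s").
    private static final List<String> SUFFIXES =
            List.of("ions", "ion", "ings", "ing", "ed", "s");

    // Never strip a word down to fewer than this many characters.
    private static final int MIN_STEM_LENGTH = 3;

    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM_LENGTH) {
                // Strip only the first matching suffix in list order.
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w; // no rule applied: the word is its own stem
    }

    public static void main(String[] args) {
        // English forms conflate to one stem; "nat" is not a real word,
        // but that does not matter as long as it is consistent.
        for (String w : List.of("connect", "connected", "connecting",
                "connection", "connections", "nation", "nations")) {
            System.out.println(w + " -> " + stem(w));
        }
        // The same English rules on the German words for house/houses:
        // they end up with different stems, so they are not conflated.
        for (String w : List.of("Haus", "H\u00e4user")) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Running `main` maps all five "connect" forms to `connect` and both "nation" forms to `nat`, while "Haus" becomes `hau` and "Häuser" stays `häuser`. With these toy rules, a search for one German form would not match the other, which is the kind of surprise Mitch describes when an English stemmer is applied to a German document.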