Andy,

This will help with smooth ingestion of your multilingual documents into Solr 
(multilingual in the sense of either one document containing fields in multiple 
languages or one index containing documents in different languages):

  http://sematext.com/products/multilingual-indexer/index.html
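
To make the two senses of "multilingual" concrete, here is a minimal sketch of one common way to lay out such documents: one field per language, so that each field can be given its own language-specific analysis in the schema. This is not a description of how the product above works. The field names (`text_en`, `text_general`) and the helper `to_index_doc` are hypothetical, the matching fields would still have to be defined in your Solr schema, and the language of each piece of text is assumed to be known already.

```python
# Hypothetical convention: one field per language, e.g. "text_en", "text_de".
# The language list comes from the question below (en, es, de, fr, it).
SUPPORTED_LANGS = {"en", "es", "de", "fr", "it"}


def to_index_doc(doc_id, texts_by_lang, fallback_field="text_general"):
    """Build a flat dict that routes each text to a per-language field.

    texts_by_lang maps a language code to the text in that language.
    Languages outside SUPPORTED_LANGS go to a generic fallback field.
    """
    doc = {"id": doc_id}
    for lang, text in texts_by_lang.items():
        field = f"text_{lang}" if lang in SUPPORTED_LANGS else fallback_field
        # Values are lists so that a field can hold more than one text.
        doc.setdefault(field, []).append(text)
    return doc


# One document with fields in several languages:
mixed = to_index_doc("1", {"en": "running shoes", "de": "Laufschuhe"})

# One index with documents in different languages:
docs = [
    to_index_doc("2", {"fr": "chaussures de course"}),
    to_index_doc("3", {"it": "scarpe da corsa"}),
]
```

Both cases end up with the same shape, which is why a single index can hold them side by side.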

Re your other question about open-source morphological dictionaries - I don't 
know of any.  Last time I looked for dictionaries I learned that they cost 
money.  That said, the market for datasets is starting to grow, so you may be 
able to find more and cheaper dictionaries now.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Andy <angelf...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Mon, April 19, 2010 8:45:40 AM
> Subject: Re: LucidWorks Solr
> 
> Thanks for the explanation Mitch.
> 
> You're right. There can't be universal stemmers.
> 
> What about multi-language stemmers? I'm mostly interested in English,
> Spanish, German, French, Italian. Are there any stemmers that would
> handle those languages?
> 
> If not, what's the recommended way to deal with documents in multiple
> languages?

> --- On Mon, 4/19/10, MitchK <mitc...@web.de> wrote:
> 
> From: MitchK <mitc...@web.de>
> Subject: Re: LucidWorks Solr
> To: solr-user@lucene.apache.org
> Date: Monday, April 19, 2010, 4:36 AM
> 
> Andy, I think it is important to know what a stemmer really is.
> 
> It reduces words to their infinitives, that is, to a common base form.
> Those infinitives are not always the real infinitive, but for the
> system each one is an infinitive, since all of its derived forms can be
> reduced to the same form.
> That's a stemmer.
> 
> It follows that there can't be a single stemmer for every language,
> because every language has its own rules for reducing a word to its
> infinitive.
> 
> If you apply an English stemmer to a German document, the results might
> be unexpected. However, sometimes it still works well enough.
> 
> Keep in mind that this is an algorithm. It is not important whether the
> created infinitive is the real infinitive. It is only important that
> most of the derived forms can be reduced to the same basic form.
> Please ask if something is not clear.
> 
> 
> KStem:
> The wiki [1] says that KStem is less aggressive than the standard
> stemmer. I guess this means that it has more rules for how to reduce a
> word to its infinitive, and that the results might therefore be better.
> 
> 
> [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
> 
> 
> Kind regards
> - Mitch
> -- 
> View this message in context: http://n3.nabble.com/LucidWorks-Solr-tp727341p729110.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
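
Mitch's description of a stemmer in the quoted message can be illustrated with a toy suffix stripper. This is only a sketch: it is not the Porter, Snowball, or KStem algorithm, and the suffix list and the name `toy_stem` are made up for the example. It shows the two points from the quoted text: the derived forms of a word collapse to one form that need not be a real word, and English rules applied to German words give results that no longer match each other.

```python
# Toy suffix-stripping stemmer, for illustration only.
# Hypothetical English suffix rules; the first match in this order wins.
ENGLISH_SUFFIXES = ("ations", "ation", "ing", "ed", "es", "s")


def toy_stem(word, suffixes=ENGLISH_SUFFIXES, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len chars."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


# Derived forms are reduced to the same form, even though "comput"
# is not a real English word.
for w in ("computing", "computed", "computes"):
    print(w, "->", toy_stem(w))  # each prints "... -> comput"

# English rules applied to German: "Haus" (house) and its plural
# "Häuser" no longer share a form, so a search for one misses the other.
print(toy_stem("Haus"))    # hau
print(toy_stem("Häuser"))  # häuser
```

A real stemmer has many more rules and exceptions, but the principle is the same: the output only has to be consistent across the forms of a word, not linguistically correct.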
