Re: Multi language support

2010-01-13 Thread Lance Norskog
Robert Muir: Thank you for the pointer to that paper!
On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht wrote:
> Isn't the conclusion here that some "stopword and stemming free matching"
> should be the best match if ever and to then gently degrade to weaker forms
> of matching?
>
> paul
>
> Le

Re: Multi language support

2010-01-13 Thread Paul Libbrecht
Isn't the conclusion here that some "stopword and stemming free matching" should be the best match, if there is one, and that matching should then gently degrade to weaker forms? paul
On 13 Jan 2010, at 07:08, Walter Underwood wrote: There is a band named "The The". And a producer named "Don Was". For
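A minimal sketch of the "best match first, then degrade" idea raised above, in plain Python rather than Solr. The stopword list, the `naive_stem` helper, and the scores 2/1/0 are all assumptions made up for illustration; a real setup would use proper analyzers and relevance scoring.

```python
# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "and", "to", "be", "of"}


def exact_tokens(text):
    """Lowercase and split on whitespace; no stopword removal, no stemming."""
    return text.lower().split()


def naive_stem(token):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def weak_tokens(text):
    """Weaker form: drop stopwords, then stem what is left."""
    return [naive_stem(t) for t in exact_tokens(text) if t not in STOPWORDS]


def score(query, title):
    """Return 2 for an exact token match, 1 for a weak match, 0 otherwise."""
    if exact_tokens(query) == exact_tokens(title):
        return 2
    weak_query = set(weak_tokens(query))
    # An all-stopword query has no weak tokens, so it can only match exactly.
    if weak_query and weak_query <= set(weak_tokens(title)):
        return 1
    return 0
```

With this sketch, a query for "The The" still finds the title "The The" through the exact tier, while "running shoes" reaches "Shoes for running" only through the weaker tier.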

Re: Multi language support

2010-01-13 Thread Robert Muir
Right, but we should not encourage users to significantly degrade overall relevance for all movies due to a few movies and a band (very special cases, as I said). In English, not using stopwords doesn't really degrade relevance that much, so it's a reasonable decision to make. This is not tr

Re: Multi language support

2010-01-12 Thread Walter Underwood
There is a band named "The The". And a producer named "Don Was". For a list of all-stopword movie titles at Netflix, see this post: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html My favorite is "To Be and To Have (Être et Avoir)", which is all stopwords in two language

Re: Multi language support

2010-01-12 Thread Robert Muir
Sorry, I forgot to include this 2009 paper comparing what stopwords do across 3 languages: http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf In my opinion, if stopwords annoy your users for very special cases like 'th

Re: Multi language support

2010-01-12 Thread Robert Muir
I don't think this is something to consider across the board for all languages. The same grammatical units that are part of a word in one language (and removed by stemmers) are independent morphemes in others (and should be stopwords), so please take this advice on a case-by-case basis for each lan

Re: Multi language support

2010-01-12 Thread Lance Norskog
There are a lot of projects that don't use stopwords any more. You might consider dropping them altogether.
On Mon, Jan 11, 2010 at 2:25 PM, Don Werve wrote:
> This is the way I've implemented multilingual search as well.
>
> 2010/1/11 Markus Jelsma
>
>> Hello,
>>
>> We have implemented langu

Re: Multi language support

2010-01-11 Thread Don Werve
This is the way I've implemented multilingual search as well.
2010/1/11 Markus Jelsma
> Hello,
>
> We have implemented language specific search in Solr using language
> specific fields and field types. For instance, an en_text field type can
> use an English stemmer, and list of stopwords and

Re: Multi language support

2010-01-11 Thread Markus Jelsma
Hello, We have implemented language-specific search in Solr using language-specific fields and field types. For instance, an en_text field type can use an English stemmer and a list of stopwords and synonyms. We did not, however, use language-specific stopwords; instead we used one list shared by both langu
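As a rough illustration of the language-specific field approach described above, here is a small Python sketch of the client side: choosing which field a document's text goes into at index time and which field to search at query time. The field names (`text_en`, `text_nl`, `text_general`), the language codes, and both function names are hypothetical; the message only mentions an `en_text` field type, and the actual analysis chain would be defined in the Solr schema.

```python
# Hypothetical set of languages that have their own field in the schema.
SUPPORTED_LANGUAGES = ("en", "nl")

# Hypothetical catch-all field for languages without a dedicated analyzer.
FALLBACK_FIELD = "text_general"


def field_for(language):
    """Pick the language-specific field name, or the fallback field."""
    if language in SUPPORTED_LANGUAGES:
        return f"text_{language}"
    return FALLBACK_FIELD


def build_document(doc_id, language, body):
    """Build the dict to send to the index, with the body in one field."""
    return {"id": doc_id, "language": language, field_for(language): body}


def build_query(language, terms):
    """Build a simple fielded query string against the matching field."""
    return f"{field_for(language)}:({terms})"
```

Keeping the routing in one function means documents and queries for the same language always hit the same field, and therefore the same analysis.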

Re: Multi-language support

2009-04-14 Thread Grant Ingersoll
On Apr 9, 2009, at 7:09 AM, revas wrote: Hi, To reframe my earlier question: some languages have only an analyzer but no stemmer from Snowball Porter; does the analyzer then take care of stemming as well? Some languages have only the stemmer from Snowball but no analyzer. Some have both. C