sorry, i forgot to include this 2009 paper comparing the effect of stopword lists across 3 languages:
http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf

in my opinion, if stopwords annoy your users in very special cases like 'the the', then instead of dropping them consider using commongrams + defaultsimilarity.discountOverlaps = true, so that you still get the benefits of stopwords.

as you can see from the above paper, stopwords can be extremely important depending on the language; they just don't matter so much for English.

On Tue, Jan 12, 2010 at 9:20 PM, Lance Norskog <goks...@gmail.com> wrote:
> There are a lot of projects that don't use stopwords any more. You
> might consider dropping them altogether.
>
> On Mon, Jan 11, 2010 at 2:25 PM, Don Werve <d...@madwombat.com> wrote:
>> This is the way I've implemented multilingual search as well.
>>
>> 2010/1/11 Markus Jelsma <mar...@buyways.nl>
>>
>>> Hello,
>>>
>>> We have implemented language specific search in Solr using language
>>> specific fields and field types. For instance, an en_text field type can
>>> use an English stemmer and a list of stopwords and synonyms. We, however,
>>> did not use language-specific stopwords; instead we used one list shared
>>> by both languages.
>>>
>>> So you would have a field type like:
>>> <fieldType name="en_text" class="solr.TextField" ...
>>>   <analyzer type="">
>>>     <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/>
>>>
>>> etc etc.
>>>
>>> Cheers,
>>>
>>> -
>>> Markus Jelsma          Buyways B.V.
>>> Technisch Architect    Friesestraatweg 215c
>>> http://www.buyways.nl  9743 AD Groningen
>>>
>>> Alg. 050-853 6600      KvK 01074105
>>> Tel. 050-853 6620      Fax. 050-3118124
>>> Mob. 06-5025 8350      In: http://www.linkedin.com/in/markus17
>>>
>>> On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:
>>>
>>> > Hi Solr users.
>>> >
>>> > I'm trying to set up a site with Solr search integrated. And I use the
>>> > SolJava API to feed the index with search documents. At the moment I
>>> > have only activated search on the English portion of the site. I'm
>>> > interested in using as many features of Solr as possible. Synonyms,
>>> > stopwords and stems all sound quite interesting and useful, but how do
>>> > I set this up in a good way for a multilingual site?
>>> >
>>> > The site doesn't have a huge text mass so performance issues don't
>>> > really bother me, but still I'd like to hear your suggestions before I
>>> > try to implement a solution.
>>> >
>>> > Best regards
>>> >
>>> > Daniel
>>>
>>
>
> --
> Lance Norskog
> goks...@gmail.com

--
Robert Muir
rcm...@gmail.com
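
For readers of the archive, here is a rough sketch of what the commongrams suggestion at the top of this message might look like in schema.xml. It is one possible reading, not a tested configuration from the thread: the field type name text_commongrams, the whitespace tokenizer, the lowercase filter, and the reuse of stopwords.en.txt as the common-words list are all assumptions made for illustration.

```xml
<!-- Sketch only. The field type name, tokenizer choice, and words file are
     assumptions for illustration; they do not come from the thread. -->
<fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index side: the single terms are still indexed, and a bigram is
         overlaid wherever a term from the words file is adjacent to another term. -->
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.en.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query side: prefers the bigrams and only returns single words that
         are not part of a bigram, which helps phrase queries like "the the". -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.en.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
<!-- discountOverlaps is a setting on the similarity, not on the field type.
     In Lucene it is DefaultSimilarity.setDiscountOverlaps(true); how it is
     wired into schema.xml depends on the Solr version, so it is not shown here. -->
```

The overlaid bigrams sit at the same position as the single terms, which is presumably why discountOverlaps is mentioned alongside commongrams: with it enabled, those extra tokens are not counted toward field length when norms are computed.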