Robert Muir: Thank you for the pointer to that paper!

On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht <p...@activemath.org> wrote:
> Isn't the conclusion here that some "stopword and stemming free matching"
> should be the best match, whenever there is one, and that we should then
> gently degrade to weaker forms of matching?
>
> paul
>
>
> On 13 Jan 2010, at 07:08, Walter Underwood wrote:
>
>> There is a band named "The The". And a producer named "Don Was". For a
>> list of all-stopword movie titles at Netflix, see this post:
>>
>> http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html
>>
>> My favorite is "To Be and To Have (Être et Avoir)", which is all stopwords
>> in two languages. And a very good movie.
>>
>> wunder
>>
>> On Jan 12, 2010, at 6:55 PM, Robert Muir wrote:
>>
>>> Sorry, I forgot to include this 2009 paper comparing what stopwords do
>>> across 3 languages:
>>>
>>> http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf
>>>
>>> In my opinion, if stopwords annoy your users in very special cases
>>> like 'the the', then consider using CommonGrams +
>>> DefaultSimilarity.discountOverlaps = true instead, so that you still
>>> get the benefits.
>>>
>>> As you can see from the above paper, stopwords can be extremely
>>> important depending on the language; they just don't matter so much
>>> for English.
>>>
>>> On Tue, Jan 12, 2010 at 9:20 PM, Lance Norskog <goks...@gmail.com> wrote:
>>>>
>>>> There are a lot of projects that don't use stopwords any more. You
>>>> might consider dropping them altogether.
>>>>
>>>> On Mon, Jan 11, 2010 at 2:25 PM, Don Werve <d...@madwombat.com> wrote:
>>>>>
>>>>> This is the way I've implemented multilingual search as well.
>>>>>
>>>>> 2010/1/11 Markus Jelsma <mar...@buyways.nl>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>>
>>>>>> We have implemented language-specific search in Solr using
>>>>>> language-specific fields and field types. For instance, an en_text
>>>>>> field type can use an English stemmer and a list of stopwords and
>>>>>> synonyms. We, however, did not use language-specific stopwords;
>>>>>> instead we used one list shared by both languages.
>>>>>>
>>>>>> So you would have a field type like:
>>>>>> <fieldType name="en_text" class="solr.TextField" ...
>>>>>>   <analyzer type="">
>>>>>>     <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
>>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/>
>>>>>>
>>>>>> etc etc.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -
>>>>>> Markus Jelsma          Buyways B.V.
>>>>>> Technisch Architect    Friesestraatweg 215c
>>>>>> http://www.buyways.nl  9743 AD Groningen
>>>>>>
>>>>>>
>>>>>> Alg. 050-853 6600      KvK  01074105
>>>>>> Tel. 050-853 6620      Fax. 050-3118124
>>>>>> Mob. 06-5025 8350      In: http://www.linkedin.com/in/markus17
>>>>>>
>>>>>>
>>>>>> On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:
>>>>>>
>>>>>>> Hi Solr users.
>>>>>>>
>>>>>>> I'm trying to set up a site with Solr search integrated, and I use
>>>>>>> the SolrJ API to feed the index with search documents. At the
>>>>>>> moment I have only activated search on the English portion of the
>>>>>>> site. I'm interested in using as many features of Solr as possible.
>>>>>>> Synonyms, stopwords and stems all sound quite interesting and
>>>>>>> useful, but how do I set this up in a good way for a multilingual
>>>>>>> site?
>>>>>>>
>>>>>>> The site doesn't have a huge text mass, so performance issues don't
>>>>>>> really bother me, but I'd still like to hear your suggestions
>>>>>>> before I try to implement a solution.
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> Daniel
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> goks...@gmail.com
>>>>
>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcm...@gmail.com
>>>
>>
>
>
--
Lance Norskog
goks...@gmail.com
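
For reference, the per-language field type that Markus describes, combined with Robert's suggestion to use common grams rather than dropping stopwords outright, might look roughly like the schema.xml fragment below. The field type name en_text and the file stopwords.en.txt come from Markus's message. The tokenizer, the lowercase filter, the split into index and query analyzers, and the ignoreCase and positionIncrementGap settings are assumptions added for illustration. Stemming and synonyms are left out for brevity, and the fragment has not been checked against any particular Solr version.

```xml
<!-- Sketch only: one field type per language. A second language would get
     its own field type with its own word list. -->
<fieldType name="en_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keeps every single-word token and adds bigrams wherever a common
         word is involved, so an all-stopword title such as "the the"
         remains searchable. -->
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords.en.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query-side counterpart: prefers the bigrams over the single common
         words when both are available. -->
    <filter class="solr.CommonGramsQueryFilterFactory"
            words="stopwords.en.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

The discountOverlaps setting Robert mentions belongs to the Similarity rather than to the field type, so it is not shown here. Its purpose is to keep the extra bigram tokens from inflating the field length used in length normalization.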