Robert Muir: Thank you for the pointer to that paper!

On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht <p...@activemath.org> wrote:
> Isn't the conclusion here that a match made without stopword removal or
> stemming should rank best whenever one exists, and that matching should
> then degrade gently to weaker forms?
>
> paul
>
>
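One way this "best match first, then degrade" idea could be approximated in Solr is to index the same text into two fields, one with no stopword removal or stemming and one with both, and then boost the exact field at query time. The sketch below is only an illustration: the names text_exact, body_exact, body_en, and /select-ml, and the boost value, are all hypothetical and not taken from anyone's setup in this thread.

```xml
<!-- schema.xml fragment (sketch; all names are illustrative assumptions) -->

<!-- Exact field type: tokenizes and lowercases only, no stopwords, no stemming -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- body_en is assumed to use a stemmed, stopword-filtered English type -->
<field name="body_en"    type="en_text"    indexed="true" stored="true"/>
<field name="body_exact" type="text_exact" indexed="true" stored="false"/>

<!-- Index the same source text into the exact field as well -->
<copyField source="body_en" dest="body_exact"/>

<!-- solrconfig.xml fragment (sketch): exact matches score higher,
     stemmed matches still count as the weaker fallback -->
<requestHandler name="/select-ml" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">body_exact^2.0 body_en</str>
  </lst>
</requestHandler>
```

With something like this, a query for "the the" can still match in body_exact even when body_en drops both tokens as stopwords. The cost is a larger index, since the text is indexed twice.
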
> On 13 Jan 2010, at 07:08, Walter Underwood wrote:
>
>> There is a band named "The The". And a producer named "Don Was". For a
>> list of all-stopword movie titles at Netflix, see this post:
>>
>> http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html
>>
>> My favorite is "To Be and To Have (Être et Avoir)", which is all stopwords
>> in two languages. And a very good movie.
>>
>> wunder
>>
>> On Jan 12, 2010, at 6:55 PM, Robert Muir wrote:
>>
>>> sorry, i forgot to include this 2009 paper comparing what stopwords do
>>> across 3 languages:
>>>
>>>
>>> http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf
>>>
>>> in my opinion, if stopwords annoy your users for very special cases
>>> like 'the the', then consider using commongrams +
>>> defaultsimilarity.discountOverlaps = true instead, so that you still
>>> get the benefits of stopwords.
>>>
>>> as you can see from the above paper, stopwords can be extremely
>>> important depending on the language; they just don't matter so much
>>> for English.
>>>
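For reference, a commongrams setup along these lines might look roughly like the sketch below. The field type name text_commongrams and the stopword file name are assumptions for illustration. The discountOverlaps setting is deliberately not shown, because how it is switched on depends on the Solr/Lucene version in use.

```xml
<!-- schema.xml fragment (sketch; type name and word list are assumptions) -->
<fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: keep every token and add bigrams that involve a common word -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords.en.txt" ignoreCase="true"/>
  </analyzer>
  <!-- Query time: prefer the bigrams so that phrases with common words stay selective -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory"
            words="stopwords.en.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

The idea is that a phrase such as "the the" is indexed as a single bigram token alongside the plain words, so it remains searchable without a full stopword-free phrase search over very frequent terms.
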
>>> On Tue, Jan 12, 2010 at 9:20 PM, Lance Norskog <goks...@gmail.com> wrote:
>>>>
>>>> There are a lot of projects that don't use stopwords any more. You
>>>> might consider dropping them altogether.
>>>>
>>>> On Mon, Jan 11, 2010 at 2:25 PM, Don Werve <d...@madwombat.com> wrote:
>>>>>
>>>>> This is the way I've implemented multilingual search as well.
>>>>>
>>>>> 2010/1/11 Markus Jelsma <mar...@buyways.nl>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>>
>>>>>> We have implemented language-specific search in Solr using
>>>>>> language-specific fields and field types. For instance, an en_text
>>>>>> field type can use an English stemmer and a list of stopwords and
>>>>>> synonyms. We did not, however, use language-specific stopwords;
>>>>>> instead we used one list shared by both languages.
>>>>>>
>>>>>> So you would have a field type like:
>>>>>> <fieldType name="en_text" class="solr.TextField" ...>
>>>>>>   <analyzer>
>>>>>>     <!-- tokenizer and any further filters omitted here -->
>>>>>>     <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
>>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/>
>>>>>>   </analyzer>
>>>>>> </fieldType>
>>>>>>
>>>>>> etc etc.
>>>>>>
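To make the per-language layout described above more concrete, here is a minimal sketch of what two language-specific field types and their fields could look like. Only the en_text name comes from the message above; the nl_text type, the choice of Dutch as the second language, the field names, and the file names are assumptions for illustration. Unlike the shared list described above, this sketch uses one stopword file per language.

```xml
<!-- schema.xml fragment (sketch; everything except "en_text" is assumed) -->
<fieldType name="en_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.en.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<fieldType name="nl_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.nl.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>

<!-- One field per language; the indexing client picks the field
     that matches the language of each document -->
<field name="body_en" type="en_text" indexed="true" stored="true"/>
<field name="body_nl" type="nl_text" indexed="true" stored="true"/>
```

Queries then go against the field for the language the user is searching in, so each language gets its own stemming and stopword handling.
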
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -
>>>>>> Markus Jelsma          Buyways B.V.
>>>>>> Technisch Architect    Friesestraatweg 215c
>>>>>> http://www.buyways.nl  9743 AD Groningen
>>>>>>
>>>>>>
>>>>>> Alg. 050-853 6600      KvK  01074105
>>>>>> Tel. 050-853 6620      Fax. 050-3118124
>>>>>> Mob. 06-5025 8350      In: http://www.linkedin.com/in/markus17
>>>>>>
>>>>>>
>>>>>> On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:
>>>>>>
>>>>>>> Hi Solr users.
>>>>>>>
>>>>>>> I'm trying to set up a site with Solr search integrated, and I use
>>>>>>> the SolrJ Java API to feed the index with search documents. At the
>>>>>>> moment I have only activated search on the English portion of the
>>>>>>> site. I'm interested in using as many features of Solr as possible.
>>>>>>> Synonyms, stopwords and stems all sound quite interesting and useful,
>>>>>>> but how do I set this up in a good way for a multilingual site?
>>>>>>>
>>>>>>> The site doesn't have a huge text mass, so performance issues don't
>>>>>>> really bother me, but I'd still like to hear your suggestions before
>>>>>>> I try to implement a solution.
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> Daniel
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> goks...@gmail.com
>>>>
>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcm...@gmail.com
>>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com
