Hi Gabriele,

If you have a copy of Lucene in Action 2, that may be the easiest place to read 
up on stopwords.  In short, when a term is a stopword, only that term is 
removed during analysis, so it is never indexed.  A search for that term 
therefore will not find a document that originally contained it.
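
To make that concrete, here is a minimal sketch in plain Python. It is not 
Lucene or Solr code: the analyze/build_index/search functions, the stopword 
list and the sample documents are all made up for illustration, and the 
"analyzer" is just lowercasing plus whitespace splitting.

```python
# Toy illustration only: NOT Lucene/Solr code. All names here are hypothetical.
from collections import defaultdict

STOPWORDS = {"msn.com"}  # assumed stopword list, taken from the thread's example


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop any token that is a stopword."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def build_index(docs):
    """Map each surviving token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents matching any analyzed query token."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits


docs = {
    "doc1": "Sign in to hotmail at msn.com",
    "doc2": "Lucene in Action",
}
index = build_index(docs)

print(search(index, "msn.com"))  # the stopword was never indexed, so no hits
print(search(index, "hotmail"))  # doc1 is still found through its other terms
```

The filter only touches text that actually passes through analyze().  As far 
as I know the same holds in Solr, where a stop filter is part of the analyzer 
chain of a particular field type, so a field that is not analyzed that way 
(typically a plain string id field) should not be affected by it.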

Otis

P.S.
Yes, reply works better. :)
----

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>________________________________
>From: Gabriele Kahlout <gabri...@mysimpatico.com>
>To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
>Sent: Tuesday, September 27, 2011 6:43 PM
>Subject: Re: How to reserve ids?
>
>
>Otis,
>
>I'm following up on this as solving my problem through the stopwords mechanism 
>would be great. Do stopwords also apply to the url/id field?
>
>Continuing with the msn.com example: with "msn.com" as a stopword, the msn.com 
>webpage may still be indexed if neither the title nor the body contains 
>"msn.com", right?
>
>P.S.
>I just click on 'reply to all' (or reply on the phone). If it bothers you I'll 
>make the less lazy effort of selecting 'reply'
>
>
>On Tue, Sep 27, 2011 at 6:40 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> 
>wrote:
>
>>Gabriele,
>>
>>Using "msn.com" as a stopword would simply mean that msn.com would not be 
>>indexed and therefore a search for "msn.com" would not yield results.  You 
>>could still search for "hotmail" and it may match documents that have 
>>"msn.com" token stored in them, even though "msn.com" is a stopword.
>>
>>Otis
>>
>>P.S.
>>No need to CC me, I'm on the list.
>>
>>----
>>Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>>________________________________
>>>From: Gabriele Kahlout <gabri...@mysimpatico.com>
>>>To: solr-user@lucene.apache.org; Otis Gospodnetic 
>>><otis_gospodne...@yahoo.com>
>>>Sent: Tuesday, September 27, 2011 1:58 AM
>>>Subject: Re: How to reserve ids?
>>
>>>
>>>I'm interested in the stopwords solution as it sounds like less work, but I'm 
>>>not sure I understand how it works. Having msn.com as a stopword doesn't 
>>>mean I won't get msn.com as a result for, say, 'hotmail'. My understanding 
>>>is that msn.com will never make it to the similarity function and thus 
>>>never affect the score calculation. But the url seldom does anyway (in my 
>>>searches on content)!
>>>
>>>
>
>
>-- 
>Regards, 
>K. Gabriele
>
>--- unchanged since 20/9/10 ---
>P.S. If the subject contains "[LON]" or the addressee acknowledges the receipt 
>within 48 hours then I don't resend the email.
>subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < 
>Now + 48h) ⇒ ¬resend(I, this).
>
>If an email is sent by a sender that is not a trusted contact or the email 
>does not contain a valid code then the email is not received. A valid code 
>starts with a hyphen and ends with "X".
>∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ 
>L(-[a-z]+[0-9]X)).
>
>
>
>
