Hi Gabriele,

If you have a copy of Lucene in Action, 2nd edition, that may be the easiest place to read up on stopwords. In short, when a term is a stopword, the analyzer drops just that term at index time, so it is never indexed, and a search for it will not find a document that originally contained it. The document's other terms are indexed as usual.
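
To make that concrete, here is a toy sketch in plain Python. It does not use Lucene or Solr APIs: `analyze`, `build_index`, `search`, the stopword set, and the two sample documents are all made up for illustration, and real analyzers tokenize text differently (this one just lowercases and splits on whitespace).

```python
from collections import defaultdict

# Hypothetical stopword list, using the example from this thread.
STOPWORDS = {"msn.com"}


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop any stopword tokens."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def build_index(docs):
    """Build a toy inverted index: surviving token -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in analyze(text):
            index[tok].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every surviving query token."""
    tokens = analyze(query)
    if not tokens:
        # The whole query consisted of stopwords, so nothing can match.
        return set()
    result = set(index.get(tokens[0], set()))
    for tok in tokens[1:]:
        result &= index.get(tok, set())
    return result


# Made-up sample documents.
docs = {
    "doc1": "hotmail is served from msn.com",
    "doc2": "unrelated page about search",
}
index = build_index(docs)

print(search(index, "msn.com"))  # set()
print(search(index, "hotmail"))  # {'doc1'}
```

In this sketch a search for "msn.com" returns nothing because that token was dropped when indexing, while "hotmail" still finds doc1, which originally contained "msn.com". The document itself is still indexed; only the stopword token is missing.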

Otis

P.S.
Yes, reply works better. :)

----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>________________________________
>From: Gabriele Kahlout <gabri...@mysimpatico.com>
>To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
>Sent: Tuesday, September 27, 2011 6:43 PM
>Subject: Re: How to reserve ids?
>
>
>Otis,
>
>I'm following up on this as solving my problem though the stopwords mechanism
>would be great. Do stopwords apply also to the url/id field?
>
>Continuing on the msn.com example, with "msn.com" as a stopword msn.com
>webpage may still actually be indexed if neither the title nor the body
>contains "msn.com". Isn't it?
>
>P.S.
>I just click on 'reply to all' (or reply on the phone). If it bothers you I'll
>make the less lazy effort of selecting 'reply'
>
>
>On Tue, Sep 27, 2011 at 6:40 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com>
>wrote:
>
>Gabriele,
>>
>>Using "msn.com" as a stopword would simply mean that msn.com would not be
>>indexed and therefore a search for "msn.com" would not yield results. You
>>could still search for "hotmail" and it may match documents that have
>>"msn.com" token stored in them, even though "msn.com" is a stopword.
>>
>>Otis
>>
>>P.S.
>>No need to CC me, I'm on the list.
>>
>>----
>>Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>>________________________________
>>>From: Gabriele Kahlout <gabri...@mysimpatico.com>
>>>To: solr-user@lucene.apache.org; Otis Gospodnetic
>>><otis_gospodne...@yahoo.com>
>>>Sent: Tuesday, September 27, 2011 1:58 AM
>>>Subject: Re: How to reserve ids?
>>
>>>
>>>I'm interested in the stopwords solution as it sounds like less work but i'm
>>>not sure i understand how it works. By having msn.com as a stopword it
>>>doesnt mean i wont get msn.com as a result for say 'hotmail'. My
>>>understanding is that msn.com will never make it to the similarity function
>>>and thus affect the score calculation. But seldom does the url anyway (in my
>>>searches on content)!
>>>
>>>
>
>
>--
>Regards,
>K. Gabriele
>
>--- unchanged since 20/9/10 ---
>P.S. If the subject contains "[LON]" or the addressee acknowledges the receipt
>within 48 hours then I don't resend the email.
>subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) <
>Now + 48h) ⇒ ¬resend(I, this).
>
>If an email is sent by a sender that is not a trusted contact or the email
>does not contain a valid code then the email is not received. A valid code
>starts with a hyphen and ends with "X".
>∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
>L(-[a-z]+[0-9]X)).
>
>
>
>