Re: searching exact phrase with stop word returns bad results

Ahmet Arslan Wed, 13 Mar 2013 05:01:03 -0700

Hi,

You need an analyzer that produces these five tokens for your example:


john....@gmail.com => john doe @ gmail com

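If you set autoGeneratePhraseQueries="true", all three of your requirements
will be satisfied: a single query term that analyzes into several tokens is
then turned into a phrase query automatically. Don't use quotes in your query.
Just q=@gmail.com, not q="@gmail.com".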

I would go with a custom tokenizer in your case, but it can be simulated using
MappingCharFilter with WhitespaceTokenizer and these two rules in mapping.txt:

"." => " "
"@" => " @ "

    <!-- charFilter + WhitespaceTokenizer  -->
    
    <fieldType name="text_char_norm" class="solr.TextField"
               positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
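
To sanity-check the idea outside Solr, here is a rough Python sketch of what the chain above should do. It is only a simulation, not Solr code: the names analyze and phrase_match are made up for illustration, the sample address is assumed from the token list at the top of this mail, and the phrase check simply looks for the query tokens at consecutive positions.

```python
# Hypothetical simulation of charFilter + WhitespaceTokenizer; not Solr code.
MAPPINGS = {".": " ", "@": " @ "}  # same two rules as in mapping.txt


def analyze(text):
    """Apply the character mappings, then split on whitespace."""
    mapped = "".join(MAPPINGS.get(ch, ch) for ch in text)
    return mapped.split()


def phrase_match(doc_tokens, query_tokens):
    """Return True if query_tokens occur at consecutive positions in doc_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


doc = analyze("john.doe@gmail.com")  # assumed sample address
print(doc)
for q in ["john", "doe", "gmail.com", "@gmail.com", "gmail.com@"]:
    print(q, phrase_match(doc, analyze(q)))
```

Under these assumptions the first four queries match and gmail.com@ does not, because "@" is kept as its own token and its position relative to "gmail" and "com" is checked.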
    

--- On Wed, 3/13/13, adfel70 <adfe...@gmail.com> wrote:

> From: adfel70 <adfe...@gmail.com>
> Subject: Re: searching exact phrase with stop word returns bad results
> To: solr-user@lucene.apache.org
> Date: Wednesday, March 13, 2013, 11:54 AM
> Am I the first to need this behaviour?
> Have you seen any set of tokenizers and filters for a similar requirement?
> 
> 
> 
> Upayavira wrote
> > Exact phrase search isn't exact phrase search as you are thinking of it.
> > A phrase search for "foo bar" searches for the terms foo and bar, and
> > then checks whether they are one position apart. If punctuation has been
> > removed during analysis, it *cannot* play a part in a search of any
> > kind.
> > 
> > You may be able to achieve what you want with a PatternTokenizer rather
> > than whitespace and removing the WordDelimiterFilterFactory.
> > 
> > Upayavira
> > 
> > On Wed, Mar 13, 2013, at 08:41 AM, adfel70 wrote:
> >> I want the following behaviour.
> >> If "john.doe@" is indexed to the field:
> >> 1. searching 'john' or 'doe' or 'gmail.com' will retrieve the doc.
> >> 2. searching '"@gmail.com"' will retrieve the doc.
> >> 3. searching '"gmail.com@"' will not retrieve the doc.
> >> 
> >> I can accomplish all of these except 3.
> >> Because the word delimiter removes '@', searching "@gmail.com" or
> >> "gmail.com@" is like searching "gmail.com", which returns unwanted
> >> results.
> >> This is an exact phrase search, so I would expect only docs with the
> >> exact phrase I search for (including punctuation) to be retrieved.
> >> 
> >> How can I achieve this?
> >> 
> >> Thanks.
> >> 
> >> 
> >> 
> >> Jack Krupansky-2 wrote
> >> > The Word Delimiter Filter will remove all punctuation characters. That
> >> > is its function.
> >> > 
> >> > Maybe you should first describe in simple English what your token/term
> >> > rules are, and then it would be more clear what tokenizer and filters
> >> > would be most appropriate.
> >> > 
> >> > -- Jack Krupansky
> >> > 
> >> > -----Original Message----- 
> >> > From: adfel70
> >> > Sent: Tuesday, March 12, 2013 3:14 AM
> >> > To: solr-user@.apache
> >> > Subject: Re: searching exact phrase with stop word returns bad results
> >> > 
> >> > I see that there is no token with @.
> >> > The question is why.
> >> > This is my field type:
> >> > 
> >> > <fieldtype name="email_type" class="solr.TextField"
> >> >            positionIncrementGap="100" autoGeneratePhraseQueries="false"
> >> >            omitNorms="true">
> >> >   <analyzer>
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >     <filter class="solr.WordDelimiterFilterFactory"
> >> >             preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
> >> >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> >> >             splitOnCaseChange="0"/>
> >> >   </analyzer>
> >> > </fieldtype>
> >> > 
> >> > Any idea?
> >> > 
> >> > 
> >> > 
> >> > Erick Erickson wrote
> >> >> Take a look at admin/analysis for the field in question, feed it values
> >> >> and see how they are tokenized. My guess is that the token in the index
> >> >> is abc@ (single token), which of course won't match the fragment
> >> >> "@ gmail.com" (assuming gmail.com@ is a typo)...
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >>
> >> >> On Wed, Mar 6, 2013 at 5:43 AM, adfel70 <adfel70@> wrote:
> >> >>
> >> >>> Hi
> >> >>>
> >> >>> I have emails indexed with the default text_general fieldType.
> >> >>>
> >> >>> I find that if the email "abc@" is indexed, and I search for
> >> >>> "gmail.com@" (exact phrase search), I get a result, while I should not
> >> >>> get one.
> >> >>>
> >> >>> Any idea how to solve this?
> >> >>>
> >> >>> thanks.
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> View this message in context:
> >> >>> http://lucene.472066.n3.nabble.com/searching-exact-phrase-with-stop-word-returns-bad-results-tp4045180.html
> >> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>>
