Re: Solr Quoted search confusions

Chris Male Fri, 21 Aug 2009 01:00:29 -0700

Hi,

I think the cause of the problem is the WordDelimiterFilterFactory.  With
your current index-time configuration, indexing i-like produces 3 terms:
i, like and ilike.  When you then query for ilike, you match the third
term.  The term ilike is created by the WordDelimiterFilter because of the
catenateWords="1" setting.  When I change this to 0, only i and like
are created, so ilike no longer matches i-like.
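
For reference, here is a sketch of the index-time filter with only that one
attribute changed. Everything else is copied from the configuration quoted
below, and I have not tested it beyond the i-like case described above.
Documents indexed under the old setting still carry the catenated ilike term,
so they would need to be re-indexed before the change shows up in results.

```xml
<!-- index analyzer only: catenateWords changed from "1" to "0" -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
```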


Hope that fixes your problem.

Thanks,
Chris

On Fri, Aug 21, 2009 at 7:16 AM, Vannia Rajan <kvanniara...@gmail.com> wrote:

> Hi,
>
> On Thu, Aug 20, 2009 at 9:13 PM, Chris Male <gento...@gmail.com> wrote:
>
> > Hi,
> >
> > What analyzers/filters have you configured for the field that you are
> > searching? One could be causing the various versions of "ilike" to be
> > indexed the same way.
> >
>
>   I'm using the "text" field type with the following analyzers / filters for
> the field "description" (which has various forms of the word "ilike"):
>
>        <fieldType name="text" class="solr.TextField"
>                   positionIncrementGap="100">
>            <analyzer type="index">
>                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                <filter class="solr.StopFilterFactory"
>                        ignoreCase="true"
>                        words="stopwords.txt"
>                        enablePositionIncrements="true"
>                        />
>                <filter class="solr.WordDelimiterFilterFactory"
>                        generateWordParts="1" generateNumberParts="1"
>                        catenateWords="1" catenateNumbers="1"
>                        catenateAll="0" splitOnCaseChange="1"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <filter class="solr.EnglishPorterFilterFactory"
>                        protected="protwords.txt"/>
>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>            </analyzer>
>            <analyzer type="query">
>                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                <filter class="solr.SynonymFilterFactory"
>                        synonyms="synonyms.txt" ignoreCase="true"
>                        expand="true"/>
>                <filter class="solr.StopFilterFactory" ignoreCase="true"
>                        words="stopwords.txt"/>
>                <filter class="solr.WordDelimiterFilterFactory"
>                        generateWordParts="1" generateNumberParts="1"
>                        catenateWords="0" catenateNumbers="0"
>                        catenateAll="0" splitOnCaseChange="1"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <filter class="solr.EnglishPorterFilterFactory"
>                        protected="protwords.txt"/>
>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>            </analyzer>
>        </fieldType>
>
>
> Is there anything that I could tune here to get the intended results?
>
>
> >
> > Thanks
> > Chris
> >
> > On Thu, Aug 20, 2009 at 5:29 PM, Vannia Rajan <kvanniara...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > >   I need some help to clarify how Solr indexes documents. I have 6
> > > documents with various forms of the word "ilike" (the complete word,
> > > not "i like"): one has "ilike" as such, and the others have a special
> > > character between "i" and "like".
> > >
> > >   What I expected from Solr is that a quoted search for "ilike" would
> > > return only the document that contains "ilike" exactly. Instead, the
> > > results also include the other forms of the word "ilike". Is there an
> > > option or configuration setting that would make Solr return only the
> > > result with the exact word "ilike"?
> > >
> > >  The result I obtained from Solr is shown below:
> > >
> > > http://localhost:8080/solr/select/?q=%22ilike%22&fl=description,score
> > > <response>
> > >   <lst name="responseHeader">
> > >     <int name="status">0</int>
> > >     <int name="QTime">20</int>
> > >     <lst name="params">
> > >       <str name="fl">description,score</str>
> > >       <str name="q">"ilike"</str>
> > >     </lst>
> > >   </lst>
> > >   <result name="response" numFound="5" start="0" maxScore="0.5">
> > >     <doc>
> > >       <float name="score">0.5</float>
> > >       <str name="description">Ilike company is doing great!</str>
> > >     </doc>
> > >     <doc>
> > >       <float name="score">0.375</float>
> > >       <str name="description">I:like company is doing great!</str>
> > >     </doc>
> > >     <doc>
> > >       <float name="score">0.3125</float>
> > >       <str name="description">I-like it very much. Really, this can come up!.</str>
> > >     </doc>
> > >     <doc>
> > >       <float name="score">0.3125</float>
> > >       <str name="description">I;like it very much. Really, i say.</str>
> > >     </doc>
> > >     <doc>
> > >       <float name="score">0.25</float>
> > >       <str name="description">i.like it very much. full stop can come? i don't know.</str>
> > >     </doc>
> > >   </result>
> > > </response>
> > >
> > > --
> > > Thanks,
> > > Vanniarajan
> > >
> >
>
>
>
> --
> Thanks,
> Vanniarajan
>
