RE: Searching words with spaces for word without spaces in solr

Dyer, James Wed, 30 Jul 2014 08:49:13 -0700

In addition to the analyzer configuration you're using, you might want to also 
use WordBreakSolrSpellChecker to catch possible matches that can't easily be 
solved through analysis.  For more information, see the section for it at 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
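
As a rough illustration of that suggestion, a solrconfig.xml fragment along these lines might be a starting point. It is only a sketch under stated assumptions: the dictionary name "wordbreak" is a made-up label, the field "title" is taken from the query later in this thread, and attaching the component to the /select handler is an assumption about the setup.

```xml
<!-- Sketch only: names and handler wiring are assumptions, not a tested setup. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <!-- "wordbreak" is an arbitrary dictionary name chosen for this example -->
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <!-- field whose indexed terms are used to combine or break query words -->
    <str name="field">title</str>
    <!-- try joining adjacent query words, e.g. "ice cream" to "icecream" -->
    <str name="combineWords">true</str>
    <!-- try splitting single query words, e.g. "icecream" to "ice cream" -->
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With something like this in place, the response should carry a spellcheck section with combined or split suggestions, which the client can use to re-query.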


James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: sunshine glass [mailto:sunshineglassof2...@gmail.com] 
Sent: Wednesday, July 30, 2014 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching words with spaces for word without spaces in solr

This is the new configuration:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
                expand="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_text_prime_search.txt" enablePositionIncrements="true"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

These are current docs in my index:

<result name="response" numFound="3" start="0">
<doc>
<str name="id">2</str>
<str name="title">Icecream</str>
<long name="_version_">1475063961342705664</long>
</doc>
<doc>
<str name="id">3</str>
<str name="title">Ice-cream</str>
<long name="_version_">1475063961344802816</long>
</doc>
<doc>
<str name="id">1</str>
<str name="title">Ice Cream</str>
<long name="_version_">1475063961203245056</long>
</doc>
</result>
</response>

Query:
http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true

Response:

<result name="response" numFound="2" start="0">
<doc>
<str name="id">1</str>
<str name="title">Ice Cream</str>
<long name="_version_">1475063961203245056</long>
</doc>
<doc>
<str name="id">3</str>
<str name="title">Ice-cream</str>
<long name="_version_">1475063961344802816</long>
</doc>
</result>
<lst name="debug">
<str name="rawquerystring">title:ice cream</str>
<str name="querystring">title:ice cream</str>
<str name="parsedquery">
(+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord
</str>
<str name="parsedquery_toString">+(title:ice (title:cream))</str>
<lst name="explain">
<str name="1">
0.875 = (MATCH) sum of:
  0.4375 = (MATCH) weight(title:ice in 0) [DefaultSimilarity], result of:
    0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of:
      0.70710677 = queryWeight, product of:
        1.0 = idf(docFreq=2, maxDocs=3)
        0.70710677 = queryNorm
      0.61871845 = fieldWeight in 0, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        1.0 = idf(docFreq=2, maxDocs=3)
        0.4375 = fieldNorm(doc=0)
  0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
    0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of:
      0.70710677 = queryWeight, product of:
        1.0 = idf(docFreq=2, maxDocs=3)
        0.70710677 = queryNorm
      0.61871845 = fieldWeight in 0, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        1.0 = idf(docFreq=2, maxDocs=3)
        0.4375 = fieldNorm(doc=0)
</str>
<str name="3">
0.70710677 = (MATCH) sum of:
  0.35355338 = (MATCH) weight(title:ice in 2) [DefaultSimilarity], result of:
    0.35355338 = score(doc=2,freq=1.0 = termFreq=1.0 ), product of:
      0.70710677 = queryWeight, product of:
        1.0 = idf(docFreq=2, maxDocs=3)
        0.70710677 = queryNorm
      0.5 = fieldWeight in 2, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=2, maxDocs=3)
        0.5 = fieldNorm(doc=2)
  0.35355338 = (MATCH) weight(title:cream in 2) [DefaultSimilarity], result of:
    0.35355338 = score(doc=2,freq=1.0 = termFreq=1.0 ), product of:
      0.70710677 = queryWeight, product of:
        1.0 = idf(docFreq=2, maxDocs=3)
        0.70710677 = queryNorm
      0.5 = fieldWeight in 2, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=2, maxDocs=3)
        0.5 = fieldNorm(doc=2)
</str>
</lst>

Still not working ????


On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> I'd spend some time with the admin/analysis page to understand the exact
> tokenization going on here. For instance, sequencing the
> ShingleFilterFactory before WordDelimiterFilterFactory may produce
> "interesting" results. And then throwing the Snowball factory at it and
> putting synonyms in front... I suspect you're not indexing or searching
> what you think you are.
>
> Second, what happens when you query with &debug=query? That'll show you
> what the search string looks like.
>
> If that doesn't help, please post the results of looking at those things
> here, that'll provide some information for us to work with.
>
> Best,
> Erick
>
>
> On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
> sunshineglassof2...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > Any updates ??
> >
> >
> > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
> > sunshineglassof2...@gmail.com> wrote:
> >
> > > Dear Team,
> > >
> > > How can I handle compound-word searches in Solr?
> > > How can I search for "hand bag" if I have "handbag" in my index? When
> > > using a shingle filter in the query analyzer, the query "ice cube"
> > > creates three tokens: "ice", "cube", and "icecube". Only "ice" and
> > > "cube" are searched, but not "icecube"; i.e., the combined pair is not
> > > matched even though I am using the shingle filter.
> > >
> > > Here's the schema config.
> > >
> > >
> > >    <fieldType name="text" class="solr.TextField"
> > >               positionIncrementGap="100">
> > >      <analyzer type="index">
> > >        <filter class="solr.SynonymFilterFactory"
> > >                synonyms="synonyms_text_prime_index.txt" ignoreCase="true"
> > >                expand="true"/>
> > >        <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > >        <tokenizer class="solr.StandardTokenizerFactory"/>
> > >        <filter class="solr.ShingleFilterFactory"
> > >                maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > >        <filter class="solr.WordDelimiterFilterFactory"
> > >                catenateWords="1" catenateNumbers="1" catenateAll="1"
> > >                preserveOriginal="1"
> > >                generateWordParts="1" generateNumberParts="1"/>
> > >        <filter class="solr.LowerCaseFilterFactory"/>
> > >        <filter class="solr.SnowballPorterFilterFactory"
> > >                language="English" protected="protwords.txt"/>
> > >      </analyzer>
> > >      <analyzer type="query">
> > >        <tokenizer class="solr.StandardTokenizerFactory"/>
> > >        <filter class="solr.SynonymFilterFactory"
> > >                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > >        <filter class="solr.ShingleFilterFactory"
> > >                maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > >        <filter class="solr.WordDelimiterFilterFactory"
> > >                preserveOriginal="1"/>
> > >        <filter class="solr.LowerCaseFilterFactory"/>
> > >        <filter class="solr.SnowballPorterFilterFactory"
> > >                language="English" protected="protwords.txt"/>
> > >      </analyzer>
> > >    </fieldType>
> > >
> > >    Any help is appreciated.
> > >
> > >
> >
>
