Re: Searching words with spaces for word without spaces in solr

sunshine glass Wed, 30 Jul 2014 07:55:37 -0700

This is the analysis page:



Please help me now.
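
For anyone reproducing this, here is a minimal sketch of building the same select request with `debug=query`, as suggested earlier in the thread. The host, core name and field come from the query quoted below; the helper name `build_debug_url` is made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Host, core name and field are taken from the query quoted in this thread.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def build_debug_url(query, debug="query"):
    """Return a select URL with URL-encoded q and debug parameters."""
    return BASE_URL + "?" + urlencode({"q": query, "debug": debug})


url = build_debug_url("title:ice cream")
print(url)

# Round-trip check: decoding the URL gives back the original parameters.
print(parse_qs(urlsplit(url).query))
```

Fetching that URL should return only the query-parsing part of the debug section, which is the part that shows how the search string was split into terms.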



On Wed, Jul 30, 2014 at 8:08 PM, sunshine glass <
sunshineglassof2...@gmail.com> wrote:

> This is the new configuration:
>
>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <charFilter class="solr.HTMLStripCharFilterFactory"/>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>                 outputUnigrams="true" tokenSeparator=""/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory"
>                 language="English" protected="protwords.txt"/>
>         <filter class="solr.SynonymFilterFactory"
>                 synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
>                 expand="true"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>                 words="stopwords_text_prime_search.txt" enablePositionIncrements="true"/>
>         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>                 outputUnigrams="true" tokenSeparator=""/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>         <filter class="solr.SnowballPorterFilterFactory"
>                 language="English" protected="protwords.txt"/>
>       </analyzer>
>     </fieldType>
>
> These are current docs in my index:
>
> <result name="response" numFound="3" start="0">
> <doc>
> <str name="id">2</str>
> <str name="title">Icecream</str>
> <long name="_version_">1475063961342705664</long>
> </doc>
> <doc>
> <str name="id">3</str>
> <str name="title">Ice-cream</str>
> <long name="_version_">1475063961344802816</long>
> </doc>
> <doc>
> <str name="id">1</str>
> <str name="title">Ice Cream</str>
> <long name="_version_">1475063961203245056</long>
> </doc>
> </result>
> </response>
>
> Query:
> http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true
>
> Response:
>
> <result name="response" numFound="2" start="0">
> <doc>
> <str name="id">1</str>
> <str name="title">Ice Cream</str>
> <long name="_version_">1475063961203245056</long>
> </doc>
> <doc>
> <str name="id">3</str>
> <str name="title">Ice-cream</str>
> <long name="_version_">1475063961344802816</long>
> </doc>
> </result>
> <lst name="debug">
> <str name="rawquerystring">title:ice cream</str>
> <str name="querystring">title:ice cream</str>
> <str name="parsedquery">
> (+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord
> </str>
> <str name="parsedquery_toString">+(title:ice (title:cream))</str>
> <lst name="explain">
> <str name="1">
> 0.875 = (MATCH) sum of:
>   0.4375 = (MATCH) weight(title:ice in 0) [DefaultSimilarity], result of:
>     0.4375 = score(doc=0,freq=2.0 = termFreq=2.0), product of:
>       0.70710677 = queryWeight, product of:
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.70710677 = queryNorm
>       0.61871845 = fieldWeight in 0, product of:
>         1.4142135 = tf(freq=2.0), with freq of:
>           2.0 = termFreq=2.0
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.4375 = fieldNorm(doc=0)
>   0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
>     0.4375 = score(doc=0,freq=2.0 = termFreq=2.0), product of:
>       0.70710677 = queryWeight, product of:
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.70710677 = queryNorm
>       0.61871845 = fieldWeight in 0, product of:
>         1.4142135 = tf(freq=2.0), with freq of:
>           2.0 = termFreq=2.0
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.4375 = fieldNorm(doc=0)
> </str>
> <str name="3">
> 0.70710677 = (MATCH) sum of:
>   0.35355338 = (MATCH) weight(title:ice in 2) [DefaultSimilarity], result of:
>     0.35355338 = score(doc=2,freq=1.0 = termFreq=1.0), product of:
>       0.70710677 = queryWeight, product of:
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.70710677 = queryNorm
>       0.5 = fieldWeight in 2, product of:
>         1.0 = tf(freq=1.0), with freq of:
>           1.0 = termFreq=1.0
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.5 = fieldNorm(doc=2)
>   0.35355338 = (MATCH) weight(title:cream in 2) [DefaultSimilarity], result of:
>     0.35355338 = score(doc=2,freq=1.0 = termFreq=1.0), product of:
>       0.70710677 = queryWeight, product of:
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.70710677 = queryNorm
>       0.5 = fieldWeight in 2, product of:
>         1.0 = tf(freq=1.0), with freq of:
>           1.0 = termFreq=1.0
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.5 = fieldNorm(doc=2)
> </str>
> </lst>
>
> It is still not working.
>
>
> On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> I'd spend some time with the admin/analysis page to understand the exact
>> tokenization going on here. For instance, sequencing
>> ShingleFilterFactory before WordDelimiterFilterFactory may produce
>> "interesting" results. And then throwing the Snowball factory at it and
>> putting synonyms in front... I suspect you're not indexing or searching
>> what you think you are.
>>
>> Second, what happens when you query with &debug=query? That'll show you
>> what the search string looks like.
>>
>> If that doesn't help, please post the results of looking at those things
>> here, that'll provide some information for us to work with.
>>
>> Best,
>> Erick
>>
>>
>> On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
>> sunshineglassof2...@gmail.com> wrote:
>>
>> > Hi Folks,
>> >
>> > Any updates?
>> >
>> >
>> > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
>> > sunshineglassof2...@gmail.com> wrote:
>> >
>> > > Dear Team,
>> > >
>> > > How can I handle compound word searches in Solr?
>> > > How can I search for "hand bag" if I have "handbag" in my index? When
>> > > using the shingle filter in the query analyzer, the query "ice cube"
>> > > creates three tokens: "ice", "cube" and "icecube". Only "ice" and "cube"
>> > > are searched, not "icecube", i.e. the pair is not searched even though
>> > > I am using the shingle filter.
>> > >
>> > > Here's the schema config.
>> > >
>> > >
>> > >    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> > >      <analyzer type="index">
>> > >        <filter class="solr.SynonymFilterFactory"
>> > >                synonyms="synonyms_text_prime_index.txt" ignoreCase="true" expand="true"/>
>> > >        <charFilter class="solr.HTMLStripCharFilterFactory"/>
>> > >        <tokenizer class="solr.StandardTokenizerFactory"/>
>> > >        <filter class="solr.ShingleFilterFactory"
>> > >                maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
>> > >        <filter class="solr.WordDelimiterFilterFactory"
>> > >                catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"
>> > >                generateWordParts="1" generateNumberParts="1"/>
>> > >        <filter class="solr.LowerCaseFilterFactory"/>
>> > >        <filter class="solr.SnowballPorterFilterFactory"
>> > >                language="English" protected="protwords.txt"/>
>> > >      </analyzer>
>> > >      <analyzer type="query">
>> > >        <tokenizer class="solr.StandardTokenizerFactory"/>
>> > >        <filter class="solr.SynonymFilterFactory"
>> > >                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>> > >        <filter class="solr.ShingleFilterFactory"
>> > >                maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
>> > >        <filter class="solr.WordDelimiterFilterFactory"
>> > >                preserveOriginal="1"/>
>> > >        <filter class="solr.LowerCaseFilterFactory"/>
>> > >        <filter class="solr.SnowballPorterFilterFactory"
>> > >                language="English" protected="protwords.txt"/>
>> > >      </analyzer>
>> > >    </fieldType>
>> > >
>> > >    Any help is appreciated.
>> > >
>> > >
>> >
>>
>
>
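
The shingle behaviour described in the original question can be sketched roughly in Python. This is not Solr code: `tokenize`, `shingles` and `analyze` are hypothetical helpers that only loosely imitate StandardTokenizerFactory, ShingleFilterFactory (maxShingleSize="2", outputUnigrams="true", tokenSeparator="") and LowerCaseFilterFactory, and they ignore word delimiting, stemming, synonyms and token positions. The per-word case rests on an assumption: that the query parser hands each whitespace-separated word to the analyzer on its own, which is at least consistent with the parsedquery above listing title:ice and title:cream as separate clauses.

```python
import re


def tokenize(text):
    """Crude stand-in for a tokenizer: keep runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def shingles(tokens, max_size=2, separator="", output_unigrams=True):
    """Emit each token plus joined runs of adjacent tokens up to max_size."""
    out = []
    for i, tok in enumerate(tokens):
        if output_unigrams:
            out.append(tok)
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


def analyze(text):
    """Tokenize, shingle, then lowercase (a small subset of the real chain)."""
    return [t.lower() for t in shingles(tokenize(text))]


# Whole string analyzed at once: the shingle "icecube" is produced.
whole = analyze("ice cube")
print(whole)

# Assumed query-parser behaviour: each word analyzed separately,
# so the shingle filter never sees two adjacent tokens.
per_word = [analyze(word) for word in "ice cube".split()]
print(per_word)
```

If that assumption holds, the joined term can only appear when both words reach the analyzer in one piece, which would explain why the analysis page shows "icecube" while the parsed query does not.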
