If a user is searching on "ice cream" but your index has "icecream", you can treat this like a spelling error. WordBreakSolrSpellChecker would identify that while "ice cream" is not in your index, "icecream" is, and then you can re-query for the corrected version without the space.
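As a rough sketch of what that could look like in solrconfig.xml, assuming the spellchecker runs against the `title` field from the sample documents (the dictionary name `wordbreak` and the `maxChanges` value are only illustrative choices, not settings taken from this thread):

```xml
<!-- Sketch only: field name, dictionary name and maxChanges are assumptions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <!-- Field whose indexed terms are used to validate suggestions -->
    <str name="field">title</str>
    <!-- Suggest "icecream" when the query is "ice cream" -->
    <str name="combineWords">true</str>
    <!-- Suggest "ice cream" when the query is "icecream" -->
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>
```

The component still has to be attached to the request handler you query (for example through its `last-components` list). A request would then add `spellcheck=true&spellcheck.dictionary=wordbreak&spellcheck.collate=true`, and the client can re-query with the returned collation.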
The problem with solving this with analyzers is that you can analyze "ice-cream" as either "ice cream" or "icecream" (split or catenate on hyphen). You can even analyze "IceCream" as "Ice Cream" (split on case change). But how is your analyzer going to know that "icecream" should index as two tokens, "ice" and "cream"? You're asking analysis to do too much in this case. This is where spellcheck can bridge the gap.

Of course, if you have a discrete list of words you want split like this, then you can do it with analysis using index-time synonyms. In that case, you need to provide the filter with the list yourself. See https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory for more information.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: sunshine glass [mailto:sunshineglassof2...@gmail.com]
Sent: Thursday, July 31, 2014 10:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching words with spaces for word without spaces in solr

I am not clear with this. This link is related to spell check. Can you elaborate it more ?

On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James <james.d...@ingramcontent.com> wrote: > In addition to the analyzer configuration you're using, you might want to > also use WordBreakSolrSpellChecker to catch possible matches that can't > easily be solved through analysis. 
For more information, see the section > for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking > > James Dyer > Ingram Content Group > (615) 213-4311 > > -----Original Message----- > From: sunshine glass [mailto:sunshineglassof2...@gmail.com] > Sent: Wednesday, July 30, 2014 9:38 AM > To: solr-user@lucene.apache.org > Subject: Re: Searching words with spaces for word without spaces in solr > > This is the new configuration: > > <fieldType name="text" class="solr.TextField" > > positionIncrementGap="100"> > > <analyzer type="index"> > > <charFilter class="solr.HTMLStripCharFilterFactory"/> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <filter class="solr.ShingleFilterFactory" maxShingleSize="2" > > outputUnigrams="true" tokenSeparator=""/> > > <filter class="solr.WordDelimiterFilterFactory" > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > <filter class="solr.LowerCaseFilterFactory"/> > > <filter class="solr.SnowballPorterFilterFactory" > > language="English" protected="protwords.txt"/> > > <filter class="solr.SynonymFilterFactory" > > synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true" > > expand="true"/> > > </analyzer> > > <analyzer type="query"> > > <tokenizer class="solr.StandardTokenizerFactory"/> > > <filter class="solr.LowerCaseFilterFactory"/> > > <filter class="solr.StopFilterFactory" ignoreCase="true" > > words="stopwords_text_prime_search.txt" enablePositionIncrements="true" > /> > > <filter class="solr.ShingleFilterFactory" maxShingleSize="2" > > outputUnigrams="true" tokenSeparator=""/> > > <filter class="solr.WordDelimiterFilterFactory" > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > > catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/> > > <filter class="solr.SnowballPorterFilterFactory" > > language="English" protected="protwords.txt"/> > > </fieldType> > > > > > These are current docs in 
my index: > > <result name="response" numFound="3" start="0"> > <doc> > <str name="id">2</str> > <str name="title">Icecream</str> > <long name="_version_">1475063961342705664</long> > </doc> > <doc> > <str name="id">3</str> > <str name="title">Ice-cream</str> > <long name="_version_">1475063961344802816</long> > </doc> > <doc> > <str name="id">1</str> > <str name="title">Ice Cream</str> > <long name="_version_">1475063961203245056</long> > </doc> > </result> > </response> > > Query: > http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true > > Response: > > <result name="response" numFound="2" start="0"> > <doc> > <str name="id">1</str> > <str name="title">Ice Cream</str> > <long name="_version_">1475063961203245056</long> > </doc> > <doc> > <str name="id">3</str> > <str name="title">Ice-cream</str> > <long name="_version_">1475063961344802816</long> > </doc> > </result> > <lst name="debug"> > <str name="rawquerystring">title:ice cream</str> > <str name="querystring">title:ice cream</str> > <str name="parsedquery"> > (+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord > </str> > <str name="parsedquery_toString">+(title:ice (title:cream))</str> > <lst name="explain"> > <str name="1"> > 0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0) > [DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 = > termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 = > idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight > in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = > termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0) > 0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of: > 0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 = > queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 = > queryNorm 0.61871845 = fieldWeight in 0, product of: 1.4142135 = > tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 1.0 = 
idf(docFreq=2, > maxDocs=3) 0.4375 = fieldNorm(doc=0) > </str> > <str name="3"> > 0.70710677 = (MATCH) sum of: 0.35355338 = (MATCH) weight(title:ice in 2) > [DefaultSimilarity], result of: 0.35355338 = score(doc=2,freq=1.0 = > termFreq=1.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 = > idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.5 = fieldWeight in 2, > product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 = > idf(docFreq=2, maxDocs=3) 0.5 = fieldNorm(doc=2) 0.35355338 = (MATCH) > weight(title:cream in 2) [DefaultSimilarity], result of: 0.35355338 = > score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.70710677 = > queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 = > queryNorm 0.5 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq > of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=2, maxDocs=3) 0.5 = > fieldNorm(doc=2) > </str> > </lst> > > Still not working ???? > > > On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <erickerick...@gmail.com> > wrote: > > > I'd spend some time with the admin/analysis page to understand the exact > > tokenization going on here. For instance, sequencing the > > shinglefilterfactory before worddelimiterfilterfactory may produce > > "interesting" resutls. And then throwing the Snowball factory at it and > > putting synonyms in front.... I suspect you're not indexing or searching > > what you think you are. > > > > Second, what happens when you query with &debug=query? That'll show you > > what the search string looks like. > > > > If that doesn't help, please post the results of looking at those things > > here, that'll provide some information for us to work with. > > > > Best, > > Erick > > > > > > On Fri, May 30, 2014 at 3:32 AM, sunshine glass < > > sunshineglassof2...@gmail.com> wrote: > > > > > Hi Folks, > > > > > > Any updates ?? 
> > > > > > > > > On Wed, May 28, 2014 at 12:13 PM, sunshine glass < > > > sunshineglassof2...@gmail.com> wrote: > > > > > > > Dear Team, > > > > > > > > How can I handle compound word searches in solr ?. > > > > How can i search "hand bag" if I have "handbag" in my index. While > > using > > > > shingle in query analyzer, the query "ice cube" creates three tokens > as > > > > "ice","cube", "icecube". Only ice and cubes are searched but not > > > > "icecubes".i.e not working for pair though I am using shingle filter. > > > > > > > > Here's the schema config. > > > > > > > > > > > > 1. <fieldType name="text" class="solr.TextField" > > > > positionIncrementGap="100"> > > > > 2. <analyzer type="index"> > > > > 3. <filter class="solr.SynonymFilterFactory" > > > > synonyms="synonyms_text_prime_index.txt" ignoreCase="true" > > > expand="true"/> > > > > 4. <charFilter class="solr.HTMLStripCharFilterFactory"/> > > > > 5. <tokenizer class="solr.StandardTokenizerFactory"/> > > > > 6. <filter class="solr.ShingleFilterFactory" > > > > maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/> > > > > 7. <filter class="solr.WordDelimiterFilterFactory" > > > > catenateWords="1" catenateNumbers="1" catenateAll="1" > > > preserveOriginal="1" > > > > generateWordParts="1" generateNumberParts="1"/> > > > > 8. <filter class="solr.LowerCaseFilterFactory"/> > > > > 9. <filter class="solr.SnowballPorterFilterFactory" > > > > language="English" protected="protwords.txt"/> > > > > 10. </analyzer> > > > > 11. <analyzer type="query"> > > > > 12. <tokenizer class="solr.StandardTokenizerFactory"/> > > > > 13. <filter class="solr.SynonymFilterFactory" > > > > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > > > > 14. <filter class="solr.ShingleFilterFactory" > > > > maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/> > > > > 15. <filter class="solr.WordDelimiterFilterFactory" > > > > preserveOriginal="1"/> > > > > 16. 
<filter class="solr.LowerCaseFilterFactory"/> > > > > 17. <filter class="solr.SnowballPorterFilterFactory" > > > > language="English" protected="protwords.txt"/> > > > > 18. </analyzer> > > > > 19. </fieldType> > > > > > > > > Any help is appreciated. > > > > > > > > > > > > > >