Let me take that back, this actually works. q=bestbuy matches "Best Buy" and documents are returned.
<fieldType name="rl_keywords" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

I was using <tokenizer class="solr.StandardTokenizerFactory"/>; replacing it with <tokenizer class="solr.KeywordTokenizerFactory"/> did the trick. Not sure how it worked.

The field value I am searching is "Best Buy", but when I search for "bestbuy", it returns a result.

Thanks,
-Utkarsh

On Tue, Aug 20, 2013 at 4:48 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> Thanks Tamanjit and Erick.
> I tried out the filters; most of the use cases work except "q=bestbuy". As
> mentioned by Erick, that is a hard one to crack.
>
> I am looking into DictionaryCompoundWordTokenFilterFactory, but a list of
> compound words like these:
> http://www.manythings.org/vocabulary/lists/a/words.php?f=compound_words
> and generic English words won't cover my need for custom compound words
> of store names like BestBuy, WalMart or CircuitCity.
>
> Thanks,
> -Utkarsh
>
> On Tue, Aug 20, 2013 at 4:43 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> You could either have a synonym filter to replace "bestbuy" with "best
>> buy" or use DictionaryCompoundWordTokenFilterFactory to do the same.
>>
>> See:
>> http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
>>
>> There are some examples in my book, but they are for German compound
>> words since that was the original primary intent for this filter. But it
>> should work for any words since it is a simple dictionary.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Erick Erickson
>> Sent: Tuesday, August 20, 2013 7:21 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: What filter to use to search with spaces omitted/included
>> between words?
>>
>> Also consider WordDelimiterFilterFactory, which will break up the
>> tokens on upper/lower case transitions.
>>
>> To get relevance, consider edismax-style query parsers and adding
>> automatic phrase generation (with boosts usually).
>>
>> This one will be a problem:
>> q=bestbuy
>>
>> There's no good generic way to get this to split up. One
>> possibility is to use synonyms if the list is known, but
>> otherwise there's no information here to distinguish it
>> from "legitimate" words.
>>
>> edgeNgrams work on _tokens_, not words, so I doubt
>> they would help in this case either since there is only
>> one token.
>>
>> Best
>> Erick
>>
>> On Tue, Aug 20, 2013 at 3:16 AM, tamanjit.bin...@yahoo.co.in <tamanjit.bin...@yahoo.co.in> wrote:
>>
>>> Additionally, if you don't want results like q=best and result=bestbuy,
>>> you can use <charFilter class="solr.PatternReplaceCharFilterFactory"
>>> pattern="\W+" replacement=""/> to actually replace whitespaces with
>>> nothing.
>>>
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
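
On the "Not sure how it worked" remark at the top of this message, here is a rough illustration of the likely reason. It is a heavily simplified Python model, not Solr code: the function names (word_delimiter, keyword_tokenizer, standard_tokenizer, analyze) are made up for this sketch, the word-delimiter step only imitates splitting on non-alphanumerics and lower-to-upper case changes with catenateWords and preserveOriginal switched on, and the "standard" tokenizer is just a regex stand-in. The idea is that KeywordTokenizerFactory hands the whole value "Best Buy" to the word delimiter filter as one token, so the filter can catenate its parts into "bestbuy"; with StandardTokenizerFactory the value is already split into "Best" and "Buy" before the filter sees it, so there is nothing to catenate.

```python
import re


def word_delimiter(token):
    """Simplified stand-in for a word delimiter filter with
    catenateWords=1 and preserveOriginal=1 (assumption: letters only)."""
    # Split on runs of non-alphanumeric characters, e.g. the space in "Best Buy".
    chunks = [c for c in re.split(r"[^A-Za-z0-9]+", token) if c]
    parts = []
    for chunk in chunks:
        # Split each chunk at lower-to-upper transitions, e.g. "BestBuy".
        parts.extend(re.sub(r"([a-z])([A-Z])", r"\1 \2", chunk).split())
    terms = [token] + parts          # original token plus its word parts
    if len(parts) > 1:
        terms.append("".join(parts))  # catenated form, e.g. "BestBuy"
    return terms


def keyword_tokenizer(text):
    # The whole field value becomes a single token.
    return [text]


def standard_tokenizer(text):
    # Rough approximation: split into word-like tokens.
    return re.findall(r"\w+", text)


def analyze(text, tokenizer):
    """Tokenize, apply the word delimiter model, lowercase, de-duplicate."""
    terms = []
    for token in tokenizer(text):
        for term in word_delimiter(token):
            term = term.lower()
            if term not in terms:
                terms.append(term)
    return terms


if __name__ == "__main__":
    print(analyze("Best Buy", keyword_tokenizer))
    print(analyze("Best Buy", standard_tokenizer))
    print(analyze("bestbuy", keyword_tokenizer))
```

In this model the keyword-tokenized index terms for "Best Buy" include "bestbuy", which is exactly what the query q=bestbuy analyzes to, while the standard-tokenized terms are only "best" and "buy". The real analysis chain can be checked on the Analysis page of the Solr admin UI.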