Thanks Tamanjit and Erick. I tried out the filters, and most of the use cases work except "q=bestbuy". As Erick mentioned, that is a hard one to crack.
I am looking into DictionaryCompoundWordTokenFilterFactory, but a dictionary of standard compound words like these:
http://www.manythings.org/vocabulary/lists/a/words.php?f=compound_words
plus generic English words won't cover my need for custom compound words in store names like BestBuy, WalMart or CircuitCity.

Thanks,
-Utkarsh


On Tue, Aug 20, 2013 at 4:43 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> You could either have a synonym filter to replace "bestbuy" with "best
> buy" or use DictionaryCompoundWordTokenFilterFactory to do the same.
>
> See:
> http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
>
> There are some examples in my book, but they are for German compound words
> since that was the original primary intent for this filter. But it should
> work for any words since it is a simple dictionary.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Erick Erickson
> Sent: Tuesday, August 20, 2013 7:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: What filter to use to search with spaces omitted/included
> between words?
>
> Also consider WordDelimiterFilterFactory, which will break up the
> tokens on upper/lower case transitions.
>
> To get relevance, consider edismax-style query parsers and adding
> automatic phrase generation (with boosts usually).
>
> This one will be a problem:
> q=bestbuy
>
> There's no good generic way to get this to split up. One
> possibility is to use synonyms if the list is known, but
> otherwise there's no information here to distinguish it
> from "legitimate" words.
>
> edgeNgrams work on _tokens_, not words so I doubt
> they would help in this case either since there is only
> one token.
>
> Best
> Erick
>
> On Tue, Aug 20, 2013 at 3:16 AM, tamanjit.bin...@yahoo.co.in <tamanjit.bin...@yahoo.co.in> wrote:
>
>> Additionally, if you don't want results like q=best and result=bestbuy, you
>> can use <charFilter class="solr.PatternReplaceCharFilterFactory"
>> pattern="\W+" replacement=""/> to actually replace whitespaces with
>> nothing.
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/What-filter-to-use-to-search-with-spaces-omitted-included-between-words-tp4085576p4085601.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
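
Following up on Jack's note in the thread that the filter works from a simple dictionary: below is a minimal sketch of how a hand-maintained word list might be used to split store names such as "bestbuy". The field type name text_store and the file name store_words.txt are hypothetical placeholders, not anything from this thread, and the size limits are illustrative rather than tuned. The word file is assumed to hold one lowercase word per line (best, buy, wal, mart, circuit, city).

```xml
<!-- Hypothetical field type; "text_store" and "store_words.txt" are placeholder names. -->
<fieldType name="text_store" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first so that "BestBuy" and "bestbuy" are compared against
         the lowercase dictionary entries in the same way. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Looks for dictionary words inside each token. Tokens shorter than
         minWordSize are left alone; candidate subwords must be between
         minSubwordSize and maxSubwordSize characters long. -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="store_words.txt"
            minWordSize="5"
            minSubwordSize="3"
            maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
```

With a list like this, the token "bestbuy" should keep its original form and gain "best" and "buy" as additional tokens. The trade-off is the one Erick points out: the list has to be known and maintained, and a short entry such as "best" will also match inside unrelated longer words, so it is worth checking the output on the Analysis page before relying on it.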