You could either use a synonym filter to replace "bestbuy" with "best buy", or use DictionaryCompoundWordTokenFilterFactory to split it into its parts.

See:
http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html

There are some examples in my book, but they are for German compound words, since that was the original intent of this filter. It should work for any words, though, since it relies on a simple dictionary.
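
As a rough sketch of the two options above, the schema.xml fragments below show one field type per approach. The field type names and the files synonyms.txt and compounds.txt are placeholders, not anything from the original question, and the size limits are only illustrative values.

```xml
<!-- Option 1: synonym mapping. Assumes synonyms.txt contains the line:
       bestbuy => best buy
     Field type name and file name are hypothetical. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

<!-- Option 2: dictionary-based decompounding. Assumes compounds.txt lists
     one subword per line, for example "best" and "buy". The original token
     is kept and the matching subwords are added alongside it. -->
<fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compounds.txt" minWordSize="5" minSubwordSize="3"
            maxSubwordSize="15" onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
```

Whether to apply either filter at index time, query time, or both depends on the data; checking the token output on the Analysis page of the admin UI is the quickest way to confirm the behavior.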

-- Jack Krupansky

-----Original Message-----
From: Erick Erickson
Sent: Tuesday, August 20, 2013 7:21 AM
To: solr-user@lucene.apache.org
Subject: Re: What filter to use to search with spaces omitted/included between words?

Also consider WordDelimiterFilterFactory, which will break up the
tokens on upper/lower case transitions.
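
A minimal sketch of such an analyzer, assuming a hypothetical field type name and one plausible set of options:

```xml
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Runs before lowercasing; once the text is lowercased there are
         no case transitions left to split on. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"
            catenateWords="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With these settings an input such as "BestBuy" should yield the parts "best" and "buy" as well as the joined form. An all-lowercase "bestbuy" has no case transition, so it passes through as a single token.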

To improve relevance, consider edismax-style query parsers and adding
automatic phrase generation (usually with boosts).
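
For illustration, a solrconfig.xml sketch along these lines; the field name "name" and the boost values are assumptions, not recommendations:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Fields searched for the individual terms. -->
    <str name="qf">name</str>
    <!-- Boost documents where all query terms appear as a phrase. -->
    <str name="pf">name^10</str>
    <!-- Boost documents where adjacent pairs of query terms appear as a phrase. -->
    <str name="pf2">name^5</str>
  </lst>
</requestHandler>
```

With this in place a query for best buy still matches on the individual terms, and documents containing the phrase "best buy" should score higher.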

This one will be a problem:
q=bestbuy

There's no good generic way to get this to split up. One
possibility is to use synonyms if the list is known, but
otherwise there's no information here to distinguish it
from "legitimate" words.

edgeNgrams work on _tokens_, not words, so I doubt
they would help in this case either, since there is only
one token.

Best
Erick


On Tue, Aug 20, 2013 at 3:16 AM, tamanjit.bin...@yahoo.co.in <
tamanjit.bin...@yahoo.co.in> wrote:

Additionally, if you don't want results like q=best and result=bestbuy, you
can use <charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="\W+" replacement=""/> to strip whitespace (and any other
non-word characters) before tokenization.
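
A sketch of how that charFilter might sit inside a field type. The field type name and the choice of KeywordTokenizerFactory are assumptions for illustration:

```xml
<fieldType name="text_nospace" class="solr.TextField">
  <analyzer>
    <!-- Char filters run before the tokenizer, so "best buy" reaches
         the tokenizer as "bestbuy". -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\W+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This collapses the whole field value into a single token, so it suits short fields such as names rather than long text. Note also that the query parser usually splits the query on whitespace before analysis, so the query side may need the input quoted for "best buy" to reach the analyzer as one string.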


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories




