Let me take that back: this actually works. q=bestbuy matches "Best Buy"
and documents are returned.

        <fieldType name="rl_keywords" class="solr.TextField"
                   positionIncrementGap="100">
            <analyzer type="index">
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1"
                        catenateWords="1" catenateNumbers="1"
                        catenateAll="0" preserveOriginal="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <tokenizer class="solr.KeywordTokenizerFactory"/>
            </analyzer>
            <analyzer type="query">
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1"
                        catenateWords="1" catenateNumbers="1"
                        catenateAll="0" preserveOriginal="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <tokenizer class="solr.KeywordTokenizerFactory"/>
            </analyzer>
        </fieldType>

I was using <tokenizer class="solr.StandardTokenizerFactory"/>; replacing
it with <tokenizer class="solr.KeywordTokenizerFactory"/> did the trick.
I am not sure why it works: the field value I am searching is "Best Buy",
but a search for "bestbuy" now returns a result.
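
A likely explanation, not verified against this index, so treat the token
lists in the comments as assumptions: KeywordTokenizerFactory hands the
whole value "Best Buy" to WordDelimiterFilterFactory as one token, so the
filter treats the space as a delimiter and catenateWords="1" can glue the
parts back together. StandardTokenizerFactory splits on the space first, so
the filter only ever sees "Best" and "Buy" separately and has nothing to
catenate. An annotated sketch of the index-time chain, trimmed to the
attributes that matter here and with the tokenizer listed first because it
runs before the filters:

        <analyzer type="index">
            <!-- whole field value stays one token: [Best Buy] -->
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <!-- the space acts as a delimiter; expected tokens:
                 [Best Buy]    (preserveOriginal)
                 [Best], [Buy] (generateWordParts)
                 [BestBuy]     (catenateWords) -->
            <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" catenateWords="1"
                    preserveOriginal="1"/>
            <!-- expected: [best buy], [best], [buy], [bestbuy] -->
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>

At query time "bestbuy" contains no delimiter or case change, so it should
pass through as the single token "bestbuy" and match the catenated index
token. The Analysis screen in the Solr admin UI should confirm or refute
these token lists for the rl_keywords field type.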

Thanks,
-Utkarsh



On Tue, Aug 20, 2013 at 4:48 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> Thanks Tamanjit and Erick.
> I tried out the filters; most of the use cases work except "q=bestbuy". As
> mentioned by Erick, that is a hard one to crack.
>
> I am looking into DictionaryCompoundWordTokenFilterFactory, but a
> dictionary of compound words like these:
> http://www.manythings.org/vocabulary/lists/a/words.php?f=compound_words
> and generic English words won't cover my need for custom compound words
> such as store names like BestBuy, WalMart or CircuitCity.
>
> Thanks,
> -Utkarsh
>
>
> On Tue, Aug 20, 2013 at 4:43 AM, Jack Krupansky
> <j...@basetechnology.com> wrote:
>
>> You could either have a synonym filter to replace "bestbuy" with "best
>> buy" or use DictionaryCompoundWordTokenFilterFactory to do the same.
>>
>> See:
>> http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
>>
>> There are some examples in my book, but they are for German compound
>> words since that was the original primary intent for this filter. But it
>> should work for any words since it is a simple dictionary.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Erick Erickson
>> Sent: Tuesday, August 20, 2013 7:21 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: What filter to use to search with spaces omitted/included
>> between words?
>>
>>
>> Also consider WordDelimiterFilterFactory, which will break up the
>> tokens on upper/lower case transitions.
>>
>> To get relevance, consider edismax-style query parsers and adding
>> automatic phrase generation (usually with boosts).
>>
>> This one will be a problem:
>> q=bestbuy
>>
>> There's no good generic way to get this to split up. One
>> possibility is to use synonyms if the list is known, but
>> otherwise there's no information here to distinguish it
>> from "legitimate" words.
>>
>> edgeNgrams work on _tokens_, not words so I doubt
>> they would help in this case either since there is only
>> one token.
>>
>> Best
>> Erick
>>
>>
>> On Tue, Aug 20, 2013 at 3:16 AM, tamanjit.bin...@yahoo.co.in <
>> tamanjit.bin...@yahoo.co.in> wrote:
>>
>>> Additionally, if you don't want results like q=best and result=bestbuy,
>>> you can use <charFilter class="solr.PatternReplaceCharFilterFactory"
>>> pattern="\W+" replacement=""/> to actually replace whitespaces with
>>> nothing.
>>>
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/What-filter-to-use-to-search-with-spaces-omitted-included-between-words-tp4085576p4085601.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh
