The keyword tokenizer will probably cause you problems, since you'll never
match "best", and searching name:best AND name:buy would fail as well.

And I'm surprised this is working at all. I'd really scrutinize why bestbuy
matches an index with Best Buy; that makes no sense on the surface.

If you have a relatively small vocabulary, synonyms might work for you.
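
For illustration, here is a minimal sketch of the synonym approach. The field
type name (text_store) and the file name (synonyms.txt) are assumptions made up
for this example, not something from your schema. The synonyms file would hold
one line per store, e.g. "bestbuy, best buy". The synonym filter is applied at
index time only, so the multi-word entries don't have to survive query parsing:

```xml
<!-- Hypothetical field type; the name and the synonyms file are placeholders. -->
<fieldType name="text_store" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms.txt (assumed) contains lines such as: bestbuy, best buy -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With something like that in place, a document containing "Best Buy" should also
be indexed under bestbuy, while q=best and q=buy keep matching.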

Best,
Erick


On Tue, Aug 20, 2013 at 8:04 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> Let me take that back, this actually works. q=bestbuy matches "Best Buy"
> and documents are returned.
>
>         <fieldType name="rl_keywords" class="solr.TextField"
>                    positionIncrementGap="100">
>             <analyzer type="index">
>                 <filter class="solr.WordDelimiterFilterFactory"
>                         generateWordParts="1" generateNumberParts="1"
>                         catenateWords="1" catenateNumbers="1"
>                         catenateAll="0" preserveOriginal="1"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <tokenizer class="solr.KeywordTokenizerFactory"/>
>             </analyzer>
>             <analyzer type="query">
>                 <filter class="solr.WordDelimiterFilterFactory"
>                         generateWordParts="1" generateNumberParts="1"
>                         catenateWords="1" catenateNumbers="1"
>                         catenateAll="0" preserveOriginal="1"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <tokenizer class="solr.KeywordTokenizerFactory"/>
>             </analyzer>
>         </fieldType>
>
> I was using <tokenizer class="solr.StandardTokenizerFactory"/>; replacing
> it with <tokenizer class="solr.KeywordTokenizerFactory"/> did the trick.
> I'm not sure how it worked. The field value I am searching is "Best Buy", but
> when I search for "bestbuy", it returns a result.
>
> Thanks,
> -Utkarsh
>
>
>
> On Tue, Aug 20, 2013 at 4:48 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>
> > Thanks Tamanjit and Erick.
> > I tried out the filters; most of the use cases work except "q=bestbuy". As
> > mentioned by Erick, that is a hard one to crack.
> >
> > I am looking into DictionaryCompoundWordTokenFilterFactory, but with
> > compound words like these:
> > http://www.manythings.org/vocabulary/lists/a/words.php?f=compound_words
> > and generic English words, it won't cover my need for custom compound words
> > such as store names like BestBuy, WalMart or CircuitCity.
> >
> > Thanks,
> > -Utkarsh
> >
> >
> > On Tue, Aug 20, 2013 at 4:43 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> >
> >> You could either have a synonym filter to replace "bestbuy" with "best
> >> buy" or use DictionaryCompoundWordTokenFilterFactory to do the same.
> >>
> >> See:
> >> http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
> >>
> >> There are some examples in my book, but they are for German compound
> >> words since that was the original primary intent for this filter. But it
> >> should work for any words since it is a simple dictionary.
> >>
> >> -- Jack Krupansky
> >>
> >> -----Original Message----- From: Erick Erickson
> >> Sent: Tuesday, August 20, 2013 7:21 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: What filter to use to search with spaces omitted/included
> >> between words?
> >>
> >>
> >> Also consider WordDelimiterFilterFactory, which will break up the
> >> tokens on upper/lower case transitions.
> >>
> >> To get relevance, consider edismax-style query parsers and adding
> >> automatic phrase generation (usually with boosts).
> >>
> >> This one will be a problem:
> >> q=bestbuy
> >>
> >> There's no good generic way to get this to split up. One
> >> possibility is to use synonyms if the list is known, but
> >> otherwise there's no information here to distinguish it
> >> from "legitimate" words.
> >>
> >> edgeNgrams work on _tokens_, not words, so I doubt
> >> they would help in this case either since there is only
> >> one token.
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Tue, Aug 20, 2013 at 3:16 AM, tamanjit.bin...@yahoo.co.in <
> >> tamanjit.bin...@yahoo.co.in> wrote:
> >>
> >>> Additionally, if you don't want results like q=best and result=bestbuy,
> >>> you can use <charFilter class="solr.PatternReplaceCharFilterFactory"
> >>> pattern="\W+" replacement=""/> to actually replace whitespace with
> >>> nothing.
> >>>
> >>>
> >>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
> >>>
> >>>
> >>>
> >>> --
> >>> View this message in context:
> >>> http://lucene.472066.n3.nabble.com/What-filter-to-use-to-search-with-spaces-omitted-included-between-words-tp4085576p4085601.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>>
> >>>
> >>
> >
> >
> > --
> > Thanks,
> > -Utkarsh
> >
>
>
>
> --
> Thanks,
> -Utkarsh
>
