Re: about analyzer and tokenizer

Dmitry Kan Mon, 26 May 2014 02:59:23 -0700

Hi Chun,

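You can apply an n-gram filter to your tokens at index time. It produces
every letter sequence within a certain (configurable) length range, e.g.
for "macbook": ma, ac, cb, bo, oo, ok, mac, acb, cbo, boo, ook, book etc.
Note that the edge n-gram filter [1] only produces sequences anchored at
the start of the token (ma, mac, macb, ...), so it would match "mac" but
not "book".
Then, when querying "mac book", both mac and book are among the indexed
sequences and you should get the Macbook document back. This comes at the
price of a larger index, though.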
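For illustration, here is a minimal sketch of how the field type from your
question could be split into separate index and query analyzers, assuming
a Solr 4.x schema.xml. The field type name text_th_ngram and the
minGramSize/maxGramSize values are hypothetical choices, not tested
settings. solr.NGramFilterFactory runs only at index time, so whole query
terms such as "mac" and "book" are looked up among the stored grams.

```xml
<!-- Hypothetical field type name; gram sizes are illustrative only -->
<fieldType name="text_th_ngram" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: same chain as text_th, then break each token into n-grams -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt"/>
    <!-- "macbook" yields ma, ac, ..., mac, ..., book, ..., macbook -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="10"/>
  </analyzer>
  <!-- Query time: no n-grams, so "mac" and "book" stay whole terms -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt"/>
  </analyzer>
</fieldType>
```

With a setup like this, tokens longer than maxGramSize are not indexed as
whole words, so choose a maxGramSize at least as long as the longest term
you expect users to type. You will also need to reindex your documents
after changing the field type.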


[1]
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-EdgeN-GramFilter




On Mon, May 26, 2014 at 12:26 PM, rachun <rachun.c...@gmail.com> wrote:

> Dear all,
>
>
> How can I do the following?
> I index a document containing "Macbook", and when I query "mac book"
> I should get it as a result.
>
> This is my schema setting...
>
> <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.ThaiWordFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt"/>
>       </analyzer>
> </fieldType>
>
> Any suggestions would be much appreciated.
> Chun.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/about-analyzer-and-tokenizer-tp4138129.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
