Re: Flexible search field analyser/tokenizer configuration

Alexandre Rafalovitch Mon, 29 Sep 2014 14:48:19 -0700

The difference between TruncateTokenFilterFactory and
EdgeNGramFilterFactory is that Truncate generates only one token per word,
cut down to the configured prefix length (3 here). EdgeNGram instead generates
multiple tokens for the same word: its 3-letter, 4-letter, etc. prefixes, up to
a maximum you can specify. Usually, you apply EdgeNGram only at index time,
not at query time (separate index and query analyzer definitions).
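A minimal sketch of such a split definition follows. It assumes Solr's EdgeNGramFilterFactory with its minGramSize/maxGramSize attributes. The field type name "searchtext_edge" and the maxGramSize of 15 are illustrative assumptions, not values from this thread:

```xml
<!-- Hypothetical field type name; gram sizes are illustrative assumptions -->
<fieldType name="searchtext_edge" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: emit the 3- to 15-character prefixes of each word -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <!-- Query time: no n-gramming, so each query word must equal one of the indexed prefixes -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this setup, a full query word longer than maxGramSize would not match, so the maximum should cover the longest words users are likely to type.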


So, if you index "woman", you will get the single token "wom" with Truncate,
and "wom", "woma", "woman" with EdgeNGram. Then, if you search for "wom",
both approaches will match. But if you search for "woman", only the
second approach will.

This assumes you do not also put the Truncate filter in the query analyzer.
If you do, then any word starting with "wom" will match: "woman", "women",
or "wompus".
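For comparison, the index-only Truncate setup that the "woman" example assumes might be sketched like this. The field type name "searchtext_trunc" is a hypothetical placeholder:

```xml
<!-- Hypothetical field type name; TruncateTokenFilterFactory needs Solr 4.8 or later -->
<fieldType name="searchtext_trunc" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: each word is kept only as its first 3 characters -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="3"/>
  </analyzer>
  <!-- Query time: no truncation, so a query for "wom" matches but "woman" does not -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Adding the same Truncate filter to the query analyzer turns this into the "any word starting with wom" behavior.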

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 29 September 2014 15:22, Markus Jelsma <markus.jel...@openindex.io> wrote:
> Yes, it was added in 4.8, but you could use PatternReplaceFilterFactory to 
> simulate the same behavior.
>
> Markus
>
>
>
> -----Original message-----
>> From:PeterKerk <petervdk...@hotmail.com>
>> Sent: Monday 29th September 2014 21:08
>> To: solr-user@lucene.apache.org
>> Subject: Re: Flexible search field analyser/tokenizer configuration
>>
>> Hi Ahmet,
>>
>> Am I correct that this is only available from Solr 4.8 onward?
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.TruncateTokenFilterFactory
>>
>>
>> Also, do I need to add your lines to both the "index" and "query" analyzers,
>> making my definition like this:
>>
>> <fieldType name="searchtext" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.TruncateTokenFilterFactory" prefixLength="3"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.TruncateTokenFilterFactory" prefixLength="3"/>
>>   </analyzer>
>> </fieldType>
>>
>> Your solution seems much easier to set up than what Alexandre proposed.
>> For my understanding, what is the difference?
>>
>> Thanks!
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4161778.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
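For Solr versions before 4.8, Markus's PatternReplaceFilterFactory suggestion might look roughly like the following sketch. The regular expression and the field type name "searchtext_prefix" are assumptions, not taken from the thread. The pattern replaces each token with its first three characters. Tokens shorter than three characters do not match and pass through unchanged, as they would with TruncateTokenFilterFactory.

```xml
<!-- Hypothetical field type name; the regex is an assumed emulation of prefixLength="3" -->
<fieldType name="searchtext_prefix" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same analyzer is used at both index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keep only the first 3 characters of each token; shorter tokens are left as-is -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{3}).*$" replacement="$1" replace="first"/>
  </analyzer>
</fieldType>
```

Using one untyped analyzer mirrors the quoted definition that applies Truncate in both the index and query analyzers, so it also matches any word starting with the same three letters.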
