using NGramTokenizerFactory for partial matching

Pete Smith Tue, 07 Apr 2009 07:38:26 -0700

Hi,

I want to use the NGramTokenizerFactory tokeniser to enable partial
matching on a field in my index. For instance, for the field value:


"Lorem ipsum"

I want it to match "lor", "lorem" and "lorem i". However, I am finding
that it matches the first two but not the third; the white space is
causing problems. Here are the relevant parts of my config:

        <fieldType name="text_substring" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>

        <field name="title_partial" type="text_substring" indexed="true" stored="true" required="true"/>

I believe it is due to the minGramSize setting, which seems to be
applied to each word separately. Can anyone tell me how I can support
what I want to do?
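
To make the suspicion above concrete, here is a minimal Python sketch. It
is not Solr's or Lucene's code: char_ngrams is a hypothetical helper that
emits plain character n-grams, and the idea that the search terms get
split on white space before analysis is only an assumption about what
might be happening.

```python
def char_ngrams(text, min_size=3, max_size=15):
    """Return every substring of the lowercased text whose length is
    between min_size and max_size (hypothetical helper, not Lucene's code)."""
    text = text.lower()
    grams = []
    for size in range(min_size, max_size + 1):
        # When size exceeds len(text) the range is empty, so nothing is added.
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


# N-grams over the whole field value, white space included.
whole = set(char_ngrams("Lorem ipsum"))

# Assumption: the search "lorem i" is split on white space first,
# and each word is then n-grammed on its own.
per_word = [g for word in "lorem i".split() for g in char_ngrams(word)]

print("lorem i" in whole)     # gram spanning the space exists for the whole value
print(char_ngrams("i"))       # a word shorter than min_size yields no grams
print("lorem i" in per_word)  # no per-word gram can contain the space
```

If that assumption holds, the whole-value n-grams do contain "lorem i",
but a search analysed word by word never produces a gram containing the
space, and the lone "i" falls below minGramSize and produces nothing.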

Cheers,
Pete

-- 
Pete Smith
Developer

No.9 | 6 Portal Way | London | W3 6RU |
T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111

LOVEFiLM.com
