Edgengram

Brian Lamb Wed, 25 May 2011 13:53:51 -0700

Hi all,

I'm confused about how EdgeNGram matching works. I have the field type set
up as:


<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front" />
  </analyzer>
</fieldType>
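
Because this field type declares a single <analyzer> with no type attribute, Solr applies the same chain at both index time and query time, so the query string is n-grammed too. The Python sketch below approximates that chain to show which tokens it produces. It is not Lucene's code: the regex only approximates LowerCaseTokenizer's "split on non-letters" rule, and the function names are hypothetical.

```python
import re

def lowercase_tokenize(text):
    """Approximates solr.LowerCaseTokenizerFactory: split on non-letters, then lowercase."""
    # [^\W\d_] = word characters minus digits and underscore, i.e. (roughly) letters
    return [tok.lower() for tok in re.findall(r"[^\W\d_]+", text)]

def front_edge_ngrams(token, min_gram=1, max_gram=100):
    """Approximates EdgeNGramFilterFactory with side="front"."""
    if len(token) < min_gram:
        return []  # tokens shorter than minGramSize produce no grams
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

def analyze(text, min_gram=1, max_gram=100):
    """Runs the tokenizer, then the edge n-gram filter over each token."""
    grams = []
    for tok in lowercase_tokenize(text):
        grams.extend(front_edge_ngrams(tok, min_gram, max_gram))
    return grams

print(analyze("abcdefg"))
# ['a', 'ab', 'abc', 'abcd', 'abcde', 'abcdef', 'abcdefg']
print(len(analyze("abcdefghijklmnop")))  # 16
```

Under these assumptions, the seven-character query expands into seven grams, and every one of them is also among the sixteen grams indexed for "abcdefghijklmnop".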

I've also written my own similarity class that returns 1 as the idf score.
What I've found is that if I match the string "abcdefg" against a field
containing "abcdefghijklmnop", the idf comes out as 7:

7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)
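
The 7.0 is consistent with the multi-term idf in the explanation being the sum of the per-gram idf values: seven query grams, each scored 1 by the custom similarity. A minimal Python sketch of that arithmetic follows. It assumes the numbers after each gram are document frequencies, uses a hypothetical constant_idf in place of the custom Similarity class, and passes an arbitrary num_docs (hypothetical, and ignored by the constant function).

```python
# Document frequencies copied from the explain line above:
# idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)
doc_freqs = {"a": 51, "ab": 23, "abc": 2, "abcd": 2,
             "abcde": 2, "abcdef": 2, "abcdefg": 2}

def constant_idf(doc_freq, num_docs):
    """Hypothetical stand-in for the custom similarity: idf is always 1."""
    return 1.0

def summed_idf(doc_freqs, num_docs, idf=constant_idf):
    """Sums the per-term idf over every query gram."""
    return sum(idf(df, num_docs) for df in doc_freqs.values())

print(summed_idf(doc_freqs, num_docs=100))  # 7.0
```

With idf pinned to 1, the total is simply the number of query grams, so a longer query string yields a proportionally larger idf.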

I get why that's happening, but is there a way to avoid it? Do I need to
define a new field type to get the desired effect?

Thanks,

Brian Lamb