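Do you really need a custom similarity?
Have you tried setting the attribute "omitTermFreqAndPositions" on your field?
It drops term frequencies and positions from the index, so a term is simply
present or absent in a document. Keep in mind that phrase queries on that field
will no longer work without positions, and that you need to reindex after
changing it.

For example: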

<field name="description" type="text" indexed="true" stored="true"
       multiValued="false" omitNorms="true" omitTermFreqAndPositions="true" />

http://wiki.apache.org/solr/SchemaXml
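
As a side note, the second explain output you posted already shows
"1.0 = tf(freq=14.0)", so the tf factor does look flattened there; the 14.0 is
the raw term frequency that the explanation prints next to it. Below is a
minimal sketch in plain Java (no Lucene dependency) that redoes the arithmetic
from both explain outputs. The class and method names are made up for
illustration, and the square-root tf is assumed from the "3.7416575 =
tf(freq=14.0)" line rather than taken from the Lucene source.

```java
public class TfExplainSketch {

    // Assumed default tf: square root of the raw term frequency
    // (matches 3.7416575 = tf(freq=14.0) in the first explain output).
    public static double sqrtTf(double freq) {
        return Math.sqrt(freq);
    }

    // Binary tf: same rule as the NoTfSimilarity in the quoted message.
    public static double binaryTf(double freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    // score = queryWeight * fieldWeight, following the explain layout:
    //   queryWeight = boost * idf * queryNorm
    //   fieldWeight = tf * idf * fieldNorm
    public static double score(double boost, double idf, double queryNorm,
                               double tf, double fieldNorm) {
        double queryWeight = boost * idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double idf = 8.5082035; // idf(docFreq=30, maxDocs=56511) from the explain output

        // First setup: default tf, queryNorm = 0.0016648936
        System.out.println(score(10.0, idf, 0.0016648936, sqrtTf(14.0), 1.0));

        // Second setup: binary tf, queryNorm = 1.0
        System.out.println(score(10.0, idf, 1.0, binaryTf(14.0), 1.0));
    }
}
```

The two printed values should come out close to 4.509 and 723.895, the scores
in the two explain outputs. If that holds, the jump in score comes from
queryNorm going from 0.0016648936 to 1.0, not from tf still being applied.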


On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav <jmluc...@gmail.com> wrote:

> I have the following setup:
>
>         <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>             <analyzer>
>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>             </analyzer>
>         </fieldType>
>         <field name="description"    type="text"   indexed="true"
> stored="true"   multiValued="false" omitNorms="true" />
>
> I index my corpus, and I can see tf is computed as usual; in this doc the
> term occurs 14 times in this field:
> 4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
> [DefaultSimilarity], result of:
>       4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
>         0.14165252 = queryWeight, product of:
>           10.0 = boost
>           8.5082035 = idf(docFreq=30, maxDocs=56511)
>           0.0016648936 = queryNorm
>         31.834784 = fieldWeight in 440, product of:
>           3.7416575 = tf(freq=14.0), with freq of:
>             14.0 = termFreq=14.0
>           8.5082035 = idf(docFreq=30, maxDocs=56511)
>           1.0 = fieldNorm(doc=440)
>
>
> Then I modify my schema:
>
>     <similarity class="solr.SchemaSimilarityFactory"/>
>         <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>             <analyzer>
>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>             </analyzer>
>             <similarity class="com.customsolr.NoTfSimilarityFactory"/>
>         </fieldType>
>
> I just want to disable term freq > 1, so a term is either present or not.
>
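> import org.apache.lucene.search.similarities.DefaultSimilarity;
>
> public class NoTfSimilarity extends DefaultSimilarity {
>         // Binary tf: 1 if the term occurs at all, 0 otherwise.
>         @Override
>         public float tf(float freq) {
>                 return freq > 0 ? 1.0f : 0.0f;
>         }
> }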
>
> But I still see tf=14 in my query explain output?
> 723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
>         723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
>           85.08203 = queryWeight, product of:
>             10.0 = boost
>             8.5082035 = idf(docFreq=30, maxDocs=56511)
>             1.0 = queryNorm
>           8.5082035 = fieldWeight in 440, product of:
>             1.0 = tf(freq=14.0), with freq of:
>               14.0 = termFreq=14.0
>             8.5082035 = idf(docFreq=30, maxDocs=56511)
>             1.0 = fieldNorm(doc=440)
>
> Does anyone see what I am missing?
> I am on Solr 4.0.
>
> thanks
> xavier
>



-- 
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre
