Hi xavier,

Have you set the global similarity to solr.SchemaSimilarityFactory?

See <http://wiki.apache.org/solr/SchemaXml#Similarity>.

Steve

On Mar 21, 2013, at 9:44 AM, xavier jmlucjav <jmluc...@gmail.com> wrote:

> Hi Felipe,
> 
> I need to keep positions, which is why I cannot just use
> omitTermFreqAndPositions.
> 
> 
> On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti <fla...@thoughtworks.com> wrote:
> 
>> Do you really need a custom similarity?
>> Did you try setting the "omitTermFreqAndPositions" attribute on your field?
>> 
>> It could be:
>> 
>> <field name="description" omitTermFreqAndPositions="true" type="text"
>>        indexed="true" stored="true" multiValued="false" omitNorms="true" />
>> 
>> http://wiki.apache.org/solr/SchemaXml
>> 
>> 
>> On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav <jmluc...@gmail.com>
>> wrote:
>> 
>>> I have the following setup:
>>> 
>>>        <fieldType name="text" class="solr.TextField"
>>>                   positionIncrementGap="100">
>>>            <analyzer>
>>>                <tokenizer class="solr.StandardTokenizerFactory"/>
>>>                <filter class="solr.LowerCaseFilterFactory"/>
>>>            </analyzer>
>>>        </fieldType>
>>>        <field name="description" type="text" indexed="true"
>>>               stored="true" multiValued="false" omitNorms="true" />
>>> 
>>> I index my corpus, and I can see tf is computed as usual; in this doc the
>>> term occurs 14 times in this field:
>>> 4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440) [DefaultSimilarity], result of:
>>>      4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
>>>        0.14165252 = queryWeight, product of:
>>>          10.0 = boost
>>>          8.5082035 = idf(docFreq=30, maxDocs=56511)
>>>          0.0016648936 = queryNorm
>>>        31.834784 = fieldWeight in 440, product of:
>>>          3.7416575 = tf(freq=14.0), with freq of:
>>>            14.0 = termFreq=14.0
>>>          8.5082035 = idf(docFreq=30, maxDocs=56511)
>>>          1.0 = fieldNorm(doc=440)
>>> 
>>> 
>>> Then I modify my schema:
>>> 
>>>    <similarity class="solr.SchemaSimilarityFactory"/>
>>>        <fieldType name="text" class="solr.TextField"
>>>                   positionIncrementGap="100">
>>>            <analyzer>
>>>                <tokenizer class="solr.StandardTokenizerFactory"/>
>>>                <filter class="solr.LowerCaseFilterFactory"/>
>>>            </analyzer>
>>>            <similarity class="com.customsolr.NoTfSimilarityFactory"/>
>>>        </fieldType>
>>> 
>>> I just want to disable term freq > 1, so a term is either present or not.
>>> 
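>>> import org.apache.lucene.search.similarities.DefaultSimilarity;
>>>
>>> // Binary term frequency: any match counts once, regardless of freq.
>>> public class NoTfSimilarity extends DefaultSimilarity {
>>>        @Override
>>>        public float tf(float freq) {
>>>                return freq > 0 ? 1.0f : 0.0f;
>>>        }
>>> }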
>>> 
>>> But I still see tf=14 in the explain output for my query:
>>> 723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
>>>        723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
>>>          85.08203 = queryWeight, product of:
>>>            10.0 = boost
>>>            8.5082035 = idf(docFreq=30, maxDocs=56511)
>>>            1.0 = queryNorm
>>>          8.5082035 = fieldWeight in 440, product of:
>>>            1.0 = tf(freq=14.0), with freq of:
>>>              14.0 = termFreq=14.0
>>>            8.5082035 = idf(docFreq=30, maxDocs=56511)
>>>            1.0 = fieldNorm(doc=440)
>>> 
>>> Does anyone see what I am missing?
>>> I am on Solr 4.0.
>>> 
>>> thanks
>>> xavier
>>> 
>> 
>> 
>> 
>> --
>> Felipe Lahti
>> Consultant Developer - ThoughtWorks Porto Alegre
>> 
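For what it's worth, the two explain outputs quoted above can be checked by hand. Below is a minimal plain-Java sketch of that arithmetic, with no Lucene on the classpath. The class and method names (ExplainArithmetic, sqrtTf, binaryTf, and so on) are made up for illustration and are not part of Lucene or Solr. It assumes DefaultSimilarity computes tf as sqrt(freq), which is consistent with the 3.7416575 reported for freq=14.0, and that the score is queryWeight times fieldWeight as the explain tree lists them.

```java
// Plain-Java sanity check of the explain arithmetic; no Lucene dependency.
// All names here are illustrative only, not Lucene or Solr APIs.
class ExplainArithmetic {

    // Assumed DefaultSimilarity behaviour: tf = sqrt(freq).
    static double sqrtTf(double freq) {
        return Math.sqrt(freq);
    }

    // Same rule as the NoTfSimilarity quoted in the thread: present or not.
    static double binaryTf(double freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    // queryWeight = boost * idf * queryNorm, as listed in the explain output.
    static double queryWeight(double boost, double idf, double queryNorm) {
        return boost * idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm, as listed in the explain output.
    static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // score = queryWeight * fieldWeight
    static double score(double tf, double boost, double idf,
                        double queryNorm, double fieldNorm) {
        return queryWeight(boost, idf, queryNorm) * fieldWeight(tf, idf, fieldNorm);
    }

    public static void main(String[] args) {
        // Values copied from the two explain outputs in the thread.
        double idf = 8.5082035, boost = 10.0, fieldNorm = 1.0, freq = 14.0;

        double before = score(sqrtTf(freq), boost, idf, 0.0016648936, fieldNorm);
        double after = score(binaryTf(freq), boost, idf, 1.0, fieldNorm);

        System.out.printf("sqrt tf:   tf=%.7f score=%.5f%n", sqrtTf(freq), before);
        System.out.printf("binary tf: tf=%.1f score=%.3f%n", binaryTf(freq), after);
    }
}
```

If these assumptions hold, the first line comes out at roughly 4.5095 and the second at roughly 723.895, matching both outputs. On that reading, the second output already reports 1.0 = tf(freq=14.0), so the binary tf appears to be applied; freq=14.0 would just be the raw term frequency passed into tf(), and the larger score would come from queryNorm being 1.0 instead of 0.0016648936.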
