Re: Scoring algorithm?

Yonik Seeley Sat, 31 Oct 2009 07:35:22 -0700

On Sat, Oct 31, 2009 at 10:22 AM, Paul Tomblin <[email protected]> wrote:
> If I change the schema this way, do I need to re-submit all the
> documents to Solr?


Yep.  And you should delete the index first before doing so (some
field properties are contagious... merging a segment w/o norms and a
segment with norms will result in a single segment with norms).

>  And if I have them all sitting on disk as XML
> files that look like
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <doc>
> <field name="...">...</field>
> <field name="...">...</field>
> </doc>
> is there a quick way to submit them all to Solr?

The easiest way is to just use something like post.sh *.xml
That's slow performance-wise, but not a big deal if you don't have too
many docs.
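
If one POST per file turns out to be too slow, the files can be batched.
The following is only a rough sketch, not how post.sh itself works: it
assumes each file has a single <doc> root as shown above, that Solr
listens on the stock example URL (an assumption, adjust to your setup),
and that a delete-by-query of *:* is an acceptable stand-in for deleting
the index (removing the index directory while Solr is stopped is the
more thorough option). The function names and the batch size are made up
for illustration.

```python
import glob
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: default update URL of the Solr example server.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(paths):
    """Wrap the <doc> root of every file in a single <add> message."""
    add = ET.Element("add")
    for path in paths:
        doc = ET.parse(path).getroot()
        if doc.tag != "doc":
            raise ValueError("%s: expected <doc> root, found <%s>" % (path, doc.tag))
        add.append(doc)
    return ET.tostring(add, encoding="utf-8")


def post_xml(body, url=SOLR_UPDATE_URL):
    """POST one XML update message and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def reindex(pattern="*.xml", batch_size=500):
    """Clear the index, then re-add all documents in batches."""
    post_xml(b"<delete><query>*:*</query></delete>")
    paths = sorted(glob.glob(pattern))
    for start in range(0, len(paths), batch_size):
        post_xml(build_add_message(paths[start:start + batch_size]))
    # A single commit at the end makes the new documents searchable.
    post_xml(b"<commit/>")
```

Calling reindex() from the directory holding the files would then send
one request per 500 documents instead of one per document.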

-Yonik
http://www.lucidimagination.com


> On Sat, Oct 31, 2009 at 10:04 AM, Yonik Seeley
> <[email protected]> wrote:
>> On Sat, Oct 31, 2009 at 8:48 AM, Paul Tomblin <[email protected]> wrote:
>>> Am I right in thinking that a document whose sortable field is only
>>> two sentences long and contains the search term once will score higher
>>> than one whose field is 50 sentences long and contains the search
>>> term 4 times?
>>
>> Yep.  Assuming 15 tokens per sentence, doc1 will have
>> lengthNorm = 1/(2*15)**.5 or 0.18 with  tf=1**.5 or 1
>> doc2 will have
>> lengthNorm  = 1/(50*15)**.5 or 0.04 with tf=4**.5 or 2
>>
>> Or if you don't want length normalization at all, simply use
>> omitNorms=true in the schema for this field.
>>
>>>  Is there a way to change it to score higher based only on
>>> number of hits?
>>
>> Yes, simply use omitNorms=true in the schema.xml for this field.
>>
>> If you still wanted a lengthNorm, you could change the balance by
>> creating a custom similarity and overriding either lengthNorm() or
>> tf()
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
>
>
> --
> http://www.linkedin.com/in/paultomblin
> http://careers.stackoverflow.com/ptomblin
>
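
The lengthNorm and tf figures quoted above can be checked with a few
lines of Python. This is only a sketch of the two factors discussed in
the thread, not of the full score: it ignores idf and every other term
of the formula, and the helper names and the omit_norms flag are made up
for illustration. The 15 tokens per sentence is the assumption from the
quoted message.

```python
import math

TOKENS_PER_SENTENCE = 15  # assumption taken from the quoted message


def length_norm(num_tokens):
    """Length normalization as quoted: 1 / sqrt(number of tokens)."""
    return 1.0 / math.sqrt(num_tokens)


def tf(freq):
    """Term-frequency factor as quoted: sqrt(number of occurrences)."""
    return math.sqrt(freq)


def field_factor(sentences, hits, omit_norms=False):
    """tf * lengthNorm; with omit_norms the norm is treated as 1."""
    norm = 1.0 if omit_norms else length_norm(sentences * TOKENS_PER_SENTENCE)
    return tf(hits) * norm


doc1 = field_factor(sentences=2, hits=1)    # short field, one hit
doc2 = field_factor(sentences=50, hits=4)   # long field, four hits
print("with norms:    doc1=%.3f doc2=%.3f" % (doc1, doc2))
print("omitNorms=true: doc1=%.3f doc2=%.3f" % (
    field_factor(2, 1, omit_norms=True),
    field_factor(50, 4, omit_norms=True),
))
```

With norms the short document comes out ahead; with the norm treated as
1, only tf remains and the document with four hits wins, which is the
behaviour asked for.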
