If I change the schema this way, do I need to re-submit all the documents to Solr? And if I have them all sitting on disk as XML files that look like

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <doc>
    <field name="...">...</field>
    <field name="...">...</field>
  </doc>

is there a quick way to submit them all to Solr?
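
For the bulk-submit part of the question, here is a rough sketch of one possible approach, not a tested recipe. Solr's XML update handler expects <doc> elements wrapped in an <add> element, so files whose root element is a bare <doc> need wrapping before they are posted. The update URL (the default single-core example address), the batch size, and the helper names wrap_docs, post_xml and submit_all are all assumptions made up for illustration.

```python
import glob
import re
import urllib.request

# Assumption: default single-core example URL; adjust host, port and core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

# Matches a leading <?xml ...?> declaration, which must not appear mid-payload.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def wrap_docs(doc_strings):
    """Strip each file's XML declaration and wrap the <doc>s in one <add>."""
    bodies = [XML_DECL.sub("", s, count=1).strip() for s in doc_strings]
    return "<add>\n" + "\n".join(bodies) + "\n</add>"


def post_xml(payload, url=SOLR_UPDATE_URL):
    """POST an XML update message to Solr and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def submit_all(pattern, batch_size=100):
    """Send every file matching `pattern` in batches, then commit once."""
    paths = sorted(glob.glob(pattern))
    for start in range(0, len(paths), batch_size):
        docs = []
        for path in paths[start:start + batch_size]:
            with open(path, encoding="utf-8") as handle:
                docs.append(handle.read())
        post_xml(wrap_docs(docs))
    # A single commit at the end makes the newly added documents searchable.
    post_xml("<commit/>")
```

Calling submit_all("/path/to/docs/*.xml") would then post everything in that (hypothetical) directory. Batching keeps each request a manageable size, and committing once at the end avoids paying the commit cost per file.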
On Sat, Oct 31, 2009 at 10:04 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Sat, Oct 31, 2009 at 8:48 AM, Paul Tomblin <ptomb...@xcski.com> wrote:
>> Am I right in thinking that a document that the sortable field is only
>> two sentences long and contains the search term once will score higher
>> than one that is 50 sentences long that contains the search term 4
>> times?
>
> Yep. Assuming 15 tokens per sentence, doc1 will have
> lengthNorm = 1/(2*15)**.5 or 0.18 with tf=1**.5 or 1
> doc2 will have
> lengthNorm = 1/(50*15)**.5 or 0.04 with tf=4**.5 or 2
>
> Or if you don't want length normalization at all, simply use
> omitNorms=true in the schema for this field.
>
>> Is there a way to change it to score higher based only on
>> number of hits?
>
> Yes, simply use omitNorms=true in the schema.xml for this field.
>
> If you still wanted a lengthNorm, you could change the balance by
> creating a custom similarity and overriding either lengthNorm() or
> tf()
>
> -Yonik
> http://www.lucidimagination.com
>

--
http://www.linkedin.com/in/paultomblin
http://careers.stackoverflow.com/ptomblin
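
As a quick check of the numbers quoted above, the sketch below multiplies the two factors from the reply, tf = freq**.5 and lengthNorm = 1/numTokens**.5, for both documents, with and without norms. It is a deliberate simplification: it leaves out idf, query normalization and boosts, and it ignores the fact that Lucene stores norms in a single lossy byte, so real scores will differ. The function names are made up for illustration and are not Lucene API calls.

```python
import math

TOKENS_PER_SENTENCE = 15  # same assumption as in the quoted reply


def length_norm(num_tokens):
    """Default-style length normalization: 1 / sqrt(number of tokens)."""
    return 1.0 / math.sqrt(num_tokens)


def tf(freq):
    """Default-style term-frequency factor: sqrt(frequency)."""
    return math.sqrt(freq)


def field_factor(sentences, hits, omit_norms=False):
    """tf * lengthNorm for one field; the norm is taken as 1.0 when omitted."""
    norm = 1.0 if omit_norms else length_norm(sentences * TOKENS_PER_SENTENCE)
    return tf(hits) * norm


doc1 = field_factor(sentences=2, hits=1)    # about 0.18 * 1
doc2 = field_factor(sentences=50, hits=4)   # about 0.04 * 2
print("with norms:    doc1=%.3f doc2=%.3f" % (doc1, doc2))

doc1_no_norms = field_factor(2, 1, omit_norms=True)
doc2_no_norms = field_factor(50, 4, omit_norms=True)
print("omitNorms=true: doc1=%.3f doc2=%.3f" % (doc1_no_norms, doc2_no_norms))
```

With norms, the short document comes out ahead (roughly 0.18 versus 0.07); with norms omitted, only the hit count matters and the longer document wins (1 versus 2), which is the behavior asked about in the quoted exchange.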