On Sat, Oct 31, 2009 at 10:22 AM, Paul Tomblin <ptomb...@xcski.com> wrote:
> If I change the schema this way, do I need to re-submit all the
> documents to Solr?
Yep. And you should delete the index before doing so (some field
properties are contagious... merging a segment w/o norms and a segment
with norms will result in a single segment with norms).

> And if I have them all sitting on disk as XML
> files that look like
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <doc>
> <field name=...">...</field>
> <field name=...">...</field>
> </doc>
> is there a quick way to submit them all to Solr?

The easiest way is to just use something like
post.sh *.xml

That's slow performance-wise, but not a big deal if you don't have too
many docs.

-Yonik
http://www.lucidimagination.com

> On Sat, Oct 31, 2009 at 10:04 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Sat, Oct 31, 2009 at 8:48 AM, Paul Tomblin <ptomb...@xcski.com> wrote:
>>> Am I right in thinking that a document that the sortable field is only
>>> two sentences long and contains the search term once will score higher
>>> than one that is 50 sentences long that contains the search term 4
>>> times?
>>
>> Yep. Assuming 15 tokens per sentence, doc1 will have
>> lengthNorm = 1/(2*15)**.5 or 0.18 with tf=1**.5 or 1
>> doc2 will have
>> lengthNorm = 1/(50*15)**.5 or 0.04 with tf=4**.5 or 2
>>
>> Or if you don't want length normalization at all, simply use
>> omitNorms=true in the schema for this field.
>>
>>> Is there a way to change it to score higher based only on
>>> number of hits?
>>
>> Yes, simply use omitNorms=true in the schema.xml for this field.
>>
>> If you still wanted a lengthNorm, you could change the balance by
>> creating a custom similarity and overriding either lengthNorm() or
>> tf()
>>
>> -Yonik
>> http://www.lucidimagination.com
>
> --
> http://www.linkedin.com/in/paultomblin
> http://careers.stackoverflow.com/ptomblin
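
For anyone who wants to check the arithmetic in the quoted reply, here is
a minimal Python sketch. It only reproduces the tf * lengthNorm part of
the score using the formulas quoted above (tf = freq**.5, lengthNorm =
1/numTokens**.5) and the 15-tokens-per-sentence assumption. The function
names and the omit_norms flag are hypothetical, chosen for illustration;
this is not Solr or Lucene code, and it ignores idf, boosts and the
precision lost when norms are stored in the index.

```python
import math

TOKENS_PER_SENTENCE = 15  # assumption taken from the quoted reply


def length_norm(num_tokens: int) -> float:
    """Length norm as quoted in the thread: 1 / sqrt(number of tokens)."""
    return 1.0 / math.sqrt(num_tokens)


def tf(freq: int) -> float:
    """Term-frequency factor as quoted in the thread: sqrt(frequency)."""
    return math.sqrt(freq)


def field_weight(sentences: int, freq: int, omit_norms: bool = False) -> float:
    """Hypothetical helper: the tf * lengthNorm part of the score.

    With omit_norms=True the length norm is treated as 1, which mimics
    the effect of omitNorms=true on ranking by number of hits.
    """
    norm = 1.0 if omit_norms else length_norm(sentences * TOKENS_PER_SENTENCE)
    return tf(freq) * norm


if __name__ == "__main__":
    docs = {"doc1": (2, 1), "doc2": (50, 4)}  # (sentences, term hits)
    for omit in (False, True):
        for name, (sentences, freq) in docs.items():
            weight = field_weight(sentences, freq, omit_norms=omit)
            print(f"omit_norms={omit} {name}: {weight:.4f}")
```

With norms, doc1 comes out at roughly 0.18 and doc2 at roughly 0.07, so
the short document wins. With norms omitted, the weights are 1 and 2, so
the document with more hits wins.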