Mike,
Thank you for your response.
Cause:
If hl.requireFieldMatch is set to true,
DefaultSolrHighlight.getQueryScorer()
uses the QueryScorer(Query,IndexReader,String) constructor in the Lucene
highlighter.
The constructor then calls getIdfWeightedTerms() to get an array of
WeightedTerm objects.
getIdfWeightedTerms() calculates an idf for each term to weight it,
and that idf can be negative on an unoptimized index.
Okay, _this_ is the true bug. I don't see how lucene can return a
negative idf, optimized index or no.
I think that docFreq includes deleted docs in its count, and that this
is by design in Lucene.
This behavior can produce a negative idf as long as the following
formula is used:
// o.a.l.s.highlight.QueryTermExtractor.java
float idf=(float)(Math.log((float)totalNumDocs/(double)(docFreq+1)) + 1.0);
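
To illustrate: the expression above goes negative once
totalNumDocs/(docFreq+1) falls below 1/e (about 0.37), because the log
term is then less than -1. Here is a minimal sketch of that arithmetic.
It assumes totalNumDocs counts only live documents while docFreq still
counts deleted ones; the class name IdfSketch and all the numbers are
made up for the example, and only the idf expression comes from the
line above.

```java
// Hypothetical demo class; only the idf expression is taken from the
// QueryTermExtractor line quoted above.
final class IdfSketch {

    // Same expression as the quoted line, wrapped in a method.
    static float idf(int totalNumDocs, int docFreq) {
        return (float) (Math.log((float) totalNumDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Made-up case 1: 10 live docs, the term occurs in 4 of them.
        System.out.println("docFreq=4:  idf=" + idf(10, 4));

        // Made-up case 2 (assumption): the index still holds many deleted
        // docs containing the term, so docFreq (99) exceeds the live
        // doc count (10).
        System.out.println("docFreq=99: idf=" + idf(10, 99));
    }
}
```

With these made-up numbers the first idf is positive and the second is
negative, which matches the symptom described above.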
Does DefaultSolrHighlight.getQueryScorer() use
QueryScorer(Query,IndexReader,String)
by design? If not, I'm happy to open a ticket.
Indeed it is by design: this is how requireFieldMatch is implemented,
as the lucene highlighter will require the field to match as well as
the term. A consequence of this is that the idfs are also folded into
the score, which is triggering the bug you are seeing.
Can we use QueryScorer(Query,String) instead of
QueryScorer(Query,IndexReader,String) to implement
hl.requireFieldMatch=true? I've opened SOLR-517 to follow up on this
problem.
Thank you,
Koji