jpountz commented on a change in pull request #2293:
URL: https://github.com/apache/lucene-solr/pull/2293#discussion_r570044511



##########
File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java
##########
@@ -64,39 +62,35 @@
  * A {@link Query} that treats multiple fields as a single stream and scores terms as if you had
  * indexed them as a single term in a single field.
  *
- * <p>For scoring purposes this query implements the BM25F's simple formula described in:
- * http://www.staff.city.ac.uk/~sb317/papers/foundations_bm25_review.pdf
+ * <p>The query works as follows:
  *
- * <p>The per-field similarity is ignored but to be compatible each field must use a {@link
- * Similarity} at index time that encodes norms the same way as {@link SimilarityBase#computeNorm}.
+ * <ol>
+ *   <li>Given a list of fields and weights, it pretends there is a synthetic combined field where
+ *       all terms have been indexed. It computes new term and collection statistics for this
+ *       combined field.
+ *   <li>It uses a disjunction iterator and {@link IndexSearcher#getSimilarity} to score documents.
+ * </ol>
+ *
+ * <p>In order for a similarity to be compatible, {@link Similarity#computeNorm} must be additive:
+ * the norm of the combined field is the sum of norms for each individual field. This is usually
+ * true, since norms often represent the field length. Per-field similarities are not supported.

Review comment:
   The requirement is actually stronger: we need a similarity that uses an additive normalization factor AND that encodes it in the index using `SmallFloat#intToByte4`, since the decoding of norms is hardcoded as `SmallFloat#byte4ToInt` in `MultiFieldNormValues#advanceExact`.

   Also, maybe mention explicitly that e.g. `BM25Similarity` and `DFRSimilarity` meet this requirement?
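
To illustrate why the encoding matters and not just additivity, here is a minimal sketch. It does not use Lucene: `encodeLength` and `decodeLength` are hypothetical, simplified stand-ins for a lossy one-byte length codec in the spirit of `SmallFloat#intToByte4` / `SmallFloat#byte4ToInt`, and `combinedLength` is an assumed helper showing the decode-then-sum step. The point is only that the stored bytes themselves are not additive, so the combined norm is meaningful only if every field was encoded with the codec that the decoder expects.

```java
final class CombinedNormSketch {

  // Hypothetical lossy codec: keeps the 4 most significant bits of the length.
  // This is NOT Lucene's SmallFloat implementation, only a stand-in.
  static byte encodeLength(int length) {
    if (length < 0) {
      throw new IllegalArgumentException("length must be >= 0, got " + length);
    }
    if (length < 8) {
      return (byte) length; // small values are stored exactly
    }
    int numBits = 32 - Integer.numberOfLeadingZeros(length);
    int shift = numBits - 4;
    int top = length >>> shift; // 4 significant bits, leading bit is always 1
    return (byte) (((shift + 1) << 3) | (top & 0x07));
  }

  // Inverse of encodeLength, up to the precision that encoding discarded.
  static int decodeLength(byte b) {
    int i = Byte.toUnsignedInt(b);
    int shiftField = i >>> 3;
    int bits = i & 0x07;
    if (shiftField == 0) {
      return bits;
    }
    return (bits | 0x08) << (shiftField - 1);
  }

  // Combined-field length: decode each field's norm byte, then take the weighted sum.
  static float combinedLength(byte[] norms, float[] weights) {
    float sum = 0f;
    for (int i = 0; i < norms.length; i++) {
      sum += weights[i] * decodeLength(norms[i]);
    }
    return sum;
  }

  public static void main(String[] args) {
    byte title = encodeLength(37);
    byte body = encodeLength(100);

    // Correct: decode first, then add the lengths.
    System.out.println(combinedLength(new byte[] {title, body}, new float[] {1f, 1f}));

    // Incorrect: adding the raw bytes and decoding the result gives an unrelated value.
    System.out.println(decodeLength((byte) (title + body)));
  }
}
```

If a field's norms were written with a different encoding, the decode step in `combinedLength` would silently produce wrong lengths, which is the failure mode the comment above is pointing at.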



