fragosoluana opened a new pull request, #15434:
URL: https://github.com/apache/lucene/pull/15434

   ### Description
   
   When a query contains overlapping phrases, the expansion step can 
generate duplicate phrases: one carrying the original (possibly high) 
user-defined boost, and another carrying the [boost of 
1](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L255-L257)
 assigned during expansion. As a result, the final boost stored in the 
QueryPhraseMap may be incorrect, because it is determined by whichever 
duplicate is processed last during the [creation of the 
QueryPhraseMap](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L411-L416)
 in the [markTerminal 
method](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L432).
   
   We can prevent conflicting expanded phrases from overriding each other's 
boost by taking the maximum boost in markTerminal. The expectation is that when 
duplicate phrases exist, one comes from the original query and the other comes 
from the expand method with a boost of 1. There should therefore be one phrase 
with a boost > 1 from the original query and another with a boost equal to 1 
from the expansion. For example, with the expanded phrases 
[“a b c”: 100, “a b”: 20, “a b c”: 1, “b c”: 50], the final query phrase 
mapping for the duplicated phrase would be “a b c”: 100.
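
The difference between the two behaviors can be illustrated with a small standalone sketch. This is not the Lucene code: `MaxBoostSketch`, `BoostedPhrase`, `lastBoostWins`, and `maxBoostWins` are hypothetical names, and a plain map stands in for the QueryPhraseMap. It only assumes that phrases are processed in order and that a duplicate phrase maps to the same entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration only; none of these names exist in Lucene.
final class MaxBoostSketch {

    /** Hypothetical stand-in for one expanded phrase and its boost. */
    static final class BoostedPhrase {
        final String phrase;
        final float boost;

        BoostedPhrase(String phrase, float boost) {
            this.phrase = phrase;
            this.boost = boost;
        }
    }

    /** Last write wins: a later duplicate overwrites the earlier boost. */
    static Map<String, Float> lastBoostWins(List<BoostedPhrase> phrases) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (BoostedPhrase p : phrases) {
            boosts.put(p.phrase, p.boost);
        }
        return boosts;
    }

    /** Proposed behavior: keep the maximum boost seen for each phrase. */
    static Map<String, Float> maxBoostWins(List<BoostedPhrase> phrases) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (BoostedPhrase p : phrases) {
            boosts.merge(p.phrase, p.boost, Float::max);
        }
        return boosts;
    }

    public static void main(String[] args) {
        // The expanded phrases from the example above, in processing order.
        List<BoostedPhrase> expanded = List.of(
            new BoostedPhrase("a b c", 100f),
            new BoostedPhrase("a b", 20f),
            new BoostedPhrase("a b c", 1f),
            new BoostedPhrase("b c", 50f));

        System.out.println(lastBoostWins(expanded)); // {a b c=1.0, a b=20.0, b c=50.0}
        System.out.println(maxBoostWins(expanded));  // {a b c=100.0, a b=20.0, b c=50.0}
    }
}
```

With last-write-wins, the result depends on processing order; taking the maximum makes it order-independent, which is the property the proposed change relies on.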
   
   Fixes https://github.com/apache/lucene/issues/15433


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

