fragosoluana opened a new pull request, #15434: URL: https://github.com/apache/lucene/pull/15434
### Description

When a query includes overlapping phrases, the expansion process may generate duplicate phrases: one with the original (possibly high) user-defined boost, and another with the [boost of 1](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L255-L257). As a result, the final boost assigned to the QueryPhraseMap may be incorrect, because it is determined by whichever duplicate is processed last during the [creation of the QueryPhraseMap](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L411-L416) in the [markTerminal method](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L432).

We could avoid boost overrides between conflicting expanded phrases by taking the max boost in markTerminal. The expectation is that when a phrase is duplicated, one copy comes from the original query and the other comes from the expand method with a boost of 1. The copy from the original query should therefore have a boost > 1, and the expanded copy a boost equal to 1.

For example, with the expanded phrases [“a b c”: 100, “a b”: 20, “a b c”: 1, “b c”: 50], the final query phrase mapping for the duplicated phrase would be “a b c”: 100.

Fixes https://github.com/apache/lucene/issues/15433
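The difference between the two behaviors can be illustrated with a minimal, self-contained sketch. This is not the Lucene code: `PhraseBoostSketch`, `BoostedPhrase`, `lastWins`, and `maxWins` are hypothetical names, and a flat map keyed by the phrase text stands in for the QueryPhraseMap purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
class PhraseBoostSketch {

    /** Stand-in for one expanded phrase and the boost attached to it. */
    static final class BoostedPhrase {
        final String phrase;
        final float boost;

        BoostedPhrase(String phrase, float boost) {
            this.phrase = phrase;
            this.boost = boost;
        }
    }

    /** Behavior described as the problem: the duplicate processed last overwrites the boost. */
    static Map<String, Float> lastWins(List<BoostedPhrase> phrases) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (BoostedPhrase p : phrases) {
            boosts.put(p.phrase, p.boost);
        }
        return boosts;
    }

    /** Proposed behavior: when a phrase is seen again, keep the larger boost. */
    static Map<String, Float> maxWins(List<BoostedPhrase> phrases) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (BoostedPhrase p : phrases) {
            boosts.merge(p.phrase, p.boost, Float::max);
        }
        return boosts;
    }
}
```

With the example list from the description, `lastWins` would leave “a b c” at 1 because the expanded duplicate is processed after the original, while `maxWins` keeps 100 regardless of processing order.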
