fragosoluana opened a new issue, #15433:
URL: https://github.com/apache/lucene/issues/15433

   ### Description
   
   During the expansion of phrase queries for highlighting, the same phrase can 
appear twice. When a query includes overlapping phrases, the expansion process 
may generate duplicate phrases: one with the original (possibly high) 
user-defined boost, and another with a [boost of 
1](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L255-L257).
 As a result, the final boost assigned to the phrase may be incorrect, 
because it is determined by whichever duplicate is processed last during the 
[creation of the 
QueryPhraseMap](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L411-L416)
 in the [markTerminal 
method](https://github.com/apache/lucene/blob/main/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java#L432).
   
   For example, the original query ["a b c": 100, "a b": 20, "b c": 50] 
assigns a boost of 100 to "a b c", but during expansion a duplicate "a b c" ("a 
b" + "b c") is generated with boost 1, which can ultimately overwrite the 
intended boost in the QueryPhraseMap, changing it from 100 to 1. The following 
unit test illustrates this example; it currently fails but should pass:
   
   ```java
   public void testQueryPhraseMapDuplicate() throws IOException {
       BooleanQuery.Builder query = new BooleanQuery.Builder();
   
       Query bq = toPhraseQuery(analyze("a b c", F, analyzerB), F);
       bq = new BoostQuery(bq, 100);
       query.add(bq, Occur.SHOULD);
   
       bq = toPhraseQuery(analyze("a b", F, analyzerB), F);
       bq = new BoostQuery(bq, 20);
       query.add(bq, Occur.SHOULD);
   
       bq = toPhraseQuery(analyze("b c", F, analyzerB), F);
       bq = new BoostQuery(bq, 50);
       query.add(bq, Occur.SHOULD);
   
       bq = query.build();
       FieldQuery fq = new FieldQuery(bq, true, true);
       Set<Query> flatQueries = new LinkedHashSet<>();
       fq.flatten(bq, searcher, flatQueries, 1f);
       assertCollectionQueries(
           fq.expand(flatQueries),
           pqF(100, "a", "b", "c"),
           pqF(20, "a", "b"),
           // "a b c": 1 -> expanded "a b" + "b c"
           new BoostQuery(pqF(1f, "a", "b", "c"), 1f),
           pqF(50, "b", "c"));
   
       Map<String, QueryPhraseMap> map = fq.rootMaps;
       QueryPhraseMap qpm = map.get("f").subMap.get("a");
       assertEquals(0, qpm.boost, 0.0);
       QueryPhraseMap qpm1 = qpm.subMap.get("b");
       assertEquals(20, qpm1.boost, 0.0);
       QueryPhraseMap qpm2 = qpm1.subMap.get("c");
       // fails here because qpm2.boost is 1
       assertEquals(100, qpm2.boost, 0.0);
   
       QueryPhraseMap qpm3 = map.get("f").subMap.get("b");
       assertEquals(0, qpm3.boost, 0.0);
       QueryPhraseMap qpm4 = qpm3.subMap.get("c");
       assertEquals(50, qpm4.boost, 0.0);
   }
   ```
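
The override can also be reproduced outside Lucene with a plain map, which may make the ordering dependence easier to see. The sketch below is only an illustration: `BoostOverrideSketch`, `lastWins`, and `keepMax` are hypothetical names, not Lucene APIs, and the map is a stand-in for the QueryPhraseMap rather than its actual implementation. `keepMax` shows one conceivable policy (keeping the highest boost among duplicates); whether that is the right fix for FieldQuery is an assumption, not something established here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone illustration; none of these names exist in Lucene.
final class BoostOverrideSketch {

    // Mimics the reported behavior: a later duplicate overwrites the earlier boost.
    static Map<String, Float> lastWins(List<Map.Entry<String, Float>> phrases) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (Map.Entry<String, Float> p : phrases) {
            boosts.put(p.getKey(), p.getValue());
        }
        return boosts;
    }

    // One possible policy (an assumption): keep the highest boost seen per phrase.
    static Map<String, Float> keepMax(List<Map.Entry<String, Float>> phrases) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (Map.Entry<String, Float> p : phrases) {
            boosts.merge(p.getKey(), p.getValue(), Float::max);
        }
        return boosts;
    }

    public static void main(String[] args) {
        // Same order as the expanded queries in the unit test above.
        List<Map.Entry<String, Float>> expanded = List.of(
            Map.entry("a b c", 100f),
            Map.entry("a b", 20f),
            Map.entry("a b c", 1f), // duplicate produced from "a b" + "b c"
            Map.entry("b c", 50f));

        System.out.println(lastWins(expanded).get("a b c")); // prints 1.0
        System.out.println(keepMax(expanded).get("a b c"));  // prints 100.0
    }
}
```

With the last-write-wins map, the result for "a b c" depends entirely on iteration order, which matches the symptom in the failing assertion; the merging variant is order-independent.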
   
   ### Version and environment details
   
   - Lucene version: 10.3.2
   - Component: lucene-highlighter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

