waziqi89 commented on code in PR #12222:
URL: https://github.com/apache/lucene/pull/12222#discussion_r1167188703


##########
lucene/highlighter/src/test/org/apache/lucene/search/vectorhighlight/TestSimpleFragmentsBuilder.java:
##########
@@ -226,7 +226,7 @@ public void testDiscreteMultiValueHighlighting() throws 
Exception {
     result = sfb.createFragments(reader, 0, F, ffl, 3);
     assertEquals(2, result.length);
     assertEquals("text to <b>highlight</b>", result[0]);
-    assertEquals("<b>highlight</b> other text", result[1]);
+    assertEquals("<b>highlight</b> other", result[1]);

Review Comment:
   Thanks @romseygeek for reviewing.
   To be honest, I believe the current baseFragmentBuilder process a multivalue 
field wrongly in terms of maxCharSize.
   
   For this test case, we have two value fields:
   1. some text to highlight (22 chars)
   2. highlight other text (20 chars)
   
   Status quo (in main/master):
   fragment 1: text to <b>highlight</b>
   fragment 2: <b>highlight</b> other text
   
   Please note that the fragment 1 excludes the first word "some", while there 
are 22 chars only in the entire first field. Even though I didn't spend too 
much time on how the multivalue fields are processed, I believe current logic 
is to treat all values as a single value field, and construct the FragInfos; 
Then split fragments into multiple chunks if they are cross multiple fields. 
This behavior will cause the newly splitted fragments smaller than the desired 
maxFragmentSize if we don't expand them after splitting.
   
   In this test case, my fix to the bug I aimed to fix (wrong fragEnd) only 
make the fragment splitting logic looks worse. But those are two different bugs.
   
   If we want to change the splitting logics, a mini-rework is supposed to be 
done in this class, I am afraid. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to