waziqi89 commented on code in PR #12222:
URL: https://github.com/apache/lucene/pull/12222#discussion_r1167188703


##########
lucene/highlighter/src/test/org/apache/lucene/search/vectorhighlight/TestSimpleFragmentsBuilder.java:
##########
@@ -226,7 +226,7 @@ public void testDiscreteMultiValueHighlighting() throws 
Exception {
     result = sfb.createFragments(reader, 0, F, ffl, 3);
     assertEquals(2, result.length);
     assertEquals("text to <b>highlight</b>", result[0]);
-    assertEquals("<b>highlight</b> other text", result[1]);
+    assertEquals("<b>highlight</b> other", result[1]);

Review Comment:
   Thanks @romseygeek for reviewing.
   To be honest, I believe the current baseFragmentBuilder process a multivalue 
field wrongly in terms of maxCharSize.
   
   For this test case, we have two value fields:
   1. some text to highlight (22 chars)
   2. highlight other text (20 chars)
   
   Status quo (in main/master):
   fragment 1: text to <b>highlight</b>
   fragment 2: <b>highlight</b> other text
   
   Please note that the fragment 1 excludes the first word "some", while there 
are 22 chars only in the entire first field. Even though I didn't spend too 
much time on how the multivalue fields are processed, I believe current logic 
is to treat the all values as a single value field, and construct the 
FragInfos; Then split it into multiple chunks if the Fragments are cross 
multiple fields. This behavior will cause the newly splitted fragments smaller 
than the original maxFragmentSize.
   
   In this use case, my fix to the bug I aimed to fix (wrong fragEnd) only make 
the fragment splitting logic looks worse. But those are two different bugs.
   
   If we want to change the splitting logics, a mini-rework is supposed to be 
done in this class, I am afraid. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to