waziqi89 commented on code in PR #12222: URL: https://github.com/apache/lucene/pull/12222#discussion_r1167188703
########## lucene/highlighter/src/test/org/apache/lucene/search/vectorhighlight/TestSimpleFragmentsBuilder.java: ########## @@ -226,7 +226,7 @@ public void testDiscreteMultiValueHighlighting() throws Exception { result = sfb.createFragments(reader, 0, F, ffl, 3); assertEquals(2, result.length); assertEquals("text to <b>highlight</b>", result[0]); - assertEquals("<b>highlight</b> other text", result[1]); + assertEquals("<b>highlight</b> other", result[1]); Review Comment: Thanks @romseygeek for reviewing. To be honest, I believe the current baseFragmentBuilder process a multivalue field wrongly in terms of maxCharSize. For this test case, we have two value fields: 1. some text to highlight (22 chars) 2. highlight other text (20 chars) Status quo (in main/master): fragment 1: text to <b>highlight</b> fragment 2: <b>highlight</b> other text Please note that the fragment 1 excludes the first word "some", while there are 22 chars only in the entire first field. Even though I didn't spend too much time on how the multivalue fields are processed, I believe current logic is to treat all values as a single value field, and construct the FragInfos; Then split fragments into multiple chunks if they are cross multiple fields. This behavior will cause the newly splitted fragments smaller than the desired maxFragmentSize if we don't expand them after splitting. In this use case, my fix to the bug I aimed to fix (wrong fragEnd) only make the fragment splitting logic looks worse. But those are two different bugs. If we want to change the splitting logics, a mini-rework is supposed to be done in this class, I am afraid. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org