dsmiley commented on a change in pull request #1123: LUCENE-9093: Unified
highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361827668
##########
File path:
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldHighlighter.java
##########
@@ -159,8 +160,9 @@ public Object highlightFieldForDoc(LeafReader reader, int docId, String content)
break;
}
// advance breakIterator
- passage.setStartOffset(Math.max(this.breakIterator.preceding(start + 1), 0));
- passage.setEndOffset(Math.min(this.breakIterator.following(start), contentLength));
+ passage.setStartOffset(Math.max(this.breakIterator.preceding(start + 1), lastPassageEnd));
Review comment:
Oh wait; something occurred to me. The `breakIterator.preceding` impl doesn't
intrinsically know that FieldHighlighter is going to call `Math.max(...,
lastPassageEnd)` on its result. And I recall you are adding this change here in
FieldHighlighter because the updated LengthGoalBreakIterator might want to look
further back to the left, into a zone that might have been part of a previous
Passage. Maybe `LengthGoalBreakIterator.preceding` should examine `current()`
at the start and ensure it doesn't yield a break before that. Then
FieldHighlighter wouldn't need to change. Without this small proposal, this
passage will be undersized, because LengthGoalBreakIterator doesn't know
FieldHighlighter is going to chop off some of the beginning thanks to that
`max()`.
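
To make the proposal concrete, here is a rough sketch of the clamping idea. It
is not the actual LengthGoalBreakIterator code: `ClampSketch` and
`clampedPreceding` are made-up names, and a plain `java.text.BreakIterator`
word instance stands in for the real iterator. It assumes `current()` on entry
is where the previous passage ended.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical sketch; not the actual LengthGoalBreakIterator code.
class ClampSketch {

  /**
   * Like bi.preceding(offset), but never returns a break to the left of the
   * position the iterator was at on entry (assumed to be the end of the
   * previous passage).
   */
  static int clampedPreceding(BreakIterator bi, int offset) {
    int floor = bi.current();             // where the previous call left the iterator
    int candidate = bi.preceding(offset); // may be DONE (-1) or left of floor
    // Note: bi is now positioned at the unclamped candidate. A real
    // implementation would also have to decide what current() should report.
    return (candidate == BreakIterator.DONE || candidate < floor) ? floor : candidate;
  }

  public static void main(String[] args) {
    BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
    bi.setText("aa bb cc dd ee");
    bi.following(5); // pretend the previous passage ended at this break
    // Looking "further back" than the previous passage end gets clamped.
    System.out.println(clampedPreceding(bi, 4));
  }
}
```

With the clamp inside the iterator, the iterator would at least be in a
position to notice that it lost some length on the left and to compensate when
choosing the following break, which FieldHighlighter's `max()` can't do for it.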