Traktormaster commented on a change in pull request #1123: LUCENE-9093: Unified highlighter with word separator never gives context to the left URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361750286
########## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java ########## @@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx is a known break // called at start of new Passage given first word start offset @Override public int preceding(int offset) { - return baseIter.preceding(offset); // no change needed + final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // convert DONE to 0 + fragmentEndFromPreceding = baseIter.following(fragmentStart); + if (fragmentEndFromPreceding == DONE) { + fragmentEndFromPreceding = baseIter.last(); + } + final int centerLength = fragmentEndFromPreceding - fragmentStart; + final int extraPrecedingLengthGoal = (int)((lengthGoal - centerLength) * fragmentAlignment); Review comment: > hl.closestTo ... I wish it simply always worked this way... I'm guessing whoever wrote the original UH decided against it for performance reasons. There is a [comment](https://github.com/apache/lucene-solr/blob/e06ad4cfb587e1bb69a80c0b531df0e52c2da89b/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java#L149) to suggest that. Also note that depending on which index is closer the BI may had to invoke `following()` one extra time inside `moveToBreak()`. I think `moveToBreak()` is only necessary when a default-summary is being made and I can remove it from the code when a match is being highlighted. The performance hit would be depending on the fragment's length that contains the goal index. In WORD mode I'd hope this is negligible. It will necessarily be worse for SENTENCE. You've brought up a new point to keep in mind: simplifying the options. I'll be back with updated code that we can discuss further. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org