dsmiley commented on a change in pull request #1123: LUCENE-9093: Unified highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361828673
########## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java ##########
@@ -173,8 +205,30 @@ private int moveToBreak(int idx) { // precondition: idx is a known break
   // called at start of new Passage given first word start offset
   @Override
-  public int preceding(int offset) {
-    return baseIter.preceding(offset); // no change needed
+  public int preceding(int matchStartIndex) {
+    final int targetIdx = (matchStartIndex - 1) - (int)(lengthGoal * fragmentAlignment);
+    if (targetIdx <= 0) {
+      return 0;
+    }
+    final int beforeIdx = baseIter.preceding(targetIdx + 1);
+    if (beforeIdx == DONE) {
+      return 0;
+    }
+    if (beforeIdx == targetIdx) { // right on the money
+      return beforeIdx;
+    }
+    if (isMinimumLength) { // thus never undershoot
+      return beforeIdx;
+    }
+
+    // note: it is a shame that we invoke following() *one more time*; BI's are sometimes expensive.
+
+    // Find closest break to target
+    final int afterIdx = baseIter.following(targetIdx - 1);
+    if (afterIdx - targetIdx < targetIdx - beforeIdx && afterIdx < matchStartIndex) {
+      return afterIdx;
+    }
+    return beforeIdx;

Review comment:
There is no moveToBreak call here, so the underlying BI's position is not kept consistent with the break that is returned.
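To illustrate the reviewer's point: in the hunk above, `baseIter.following(targetIdx - 1)` moves the underlying iterator to `afterIdx`, yet the method may then return `beforeIdx`, so the iterator's `current()` no longer matches the returned break. Below is a minimal sketch, assuming a plain `java.text.BreakIterator` stands in for `baseIter`, of one way to keep the two in sync. The two-argument `moveToBreak(BreakIterator, int)` is a hypothetical standalone helper, not Lucene's private method; `contextChars` is a hypothetical parameter standing in for `(int)(lengthGoal * fragmentAlignment)`; and the `isMinimumLength` branch is omitted.

```java
import java.text.BreakIterator;
import java.util.Locale;

class PrecedingSketch {

  // Hypothetical helper (not Lucene's private moveToBreak): repositions bi so
  // that bi.current() == idx. Precondition: idx is a known boundary of bi's text.
  static int moveToBreak(BreakIterator bi, int idx) {
    if (idx == 0) {
      bi.first(); // offset 0 is the first boundary for text set via setText(String)
    } else {
      bi.following(idx - 1); // first boundary > idx - 1, which is idx itself
    }
    return idx;
  }

  // Sketch of preceding(): choose the boundary closest to targetIdx that is still
  // before matchStartIndex, and always leave bi positioned on the returned value.
  // contextChars is a hypothetical stand-in for (int)(lengthGoal * fragmentAlignment).
  static int preceding(BreakIterator bi, int matchStartIndex, int contextChars) {
    final int targetIdx = (matchStartIndex - 1) - contextChars;
    if (targetIdx <= 0) {
      return moveToBreak(bi, 0);
    }
    final int beforeIdx = bi.preceding(targetIdx + 1); // bi now at beforeIdx
    if (beforeIdx == BreakIterator.DONE) {
      return moveToBreak(bi, 0);
    }
    if (beforeIdx == targetIdx) {
      return beforeIdx; // bi is already positioned here
    }
    final int afterIdx = bi.following(targetIdx - 1); // side effect: bi now at afterIdx
    if (afterIdx != BreakIterator.DONE
        && afterIdx - targetIdx < targetIdx - beforeIdx
        && afterIdx < matchStartIndex) {
      return afterIdx; // bi is already at afterIdx
    }
    // The step the review flags as missing: move bi back from afterIdx to beforeIdx.
    return moveToBreak(bi, beforeIdx);
  }

  public static void main(String[] args) {
    String text = "The quick brown fox jumps over the lazy dog";
    BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
    bi.setText(text);
    int matchStart = text.indexOf("lazy");
    int start = preceding(bi, matchStart, 12);
    System.out.println("start=" + start + ", current()=" + bi.current());
  }
}
```

Repositioning via `following(idx - 1)` costs one more boundary lookup, which is the same kind of expense the PR's own comment notes for `following()`; whether that matters depends on the underlying iterator.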