dsmiley commented on a change in pull request #1123: LUCENE-9093: Unified
highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361486777
##########
File path:
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java
##########
@@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx is a known break
// called at start of new Passage given first word start offset
@Override
public int preceding(int offset) {
- return baseIter.preceding(offset); // no change needed
+ final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // convert DONE to 0
+ fragmentEndFromPreceding = baseIter.following(fragmentStart);
+ if (fragmentEndFromPreceding == DONE) {
+ fragmentEndFromPreceding = baseIter.last();
+ }
+ final int centerLength = fragmentEndFromPreceding - fragmentStart;
+ final int extraPrecedingLengthGoal = (int)((lengthGoal - centerLength) * fragmentAlignment);
Review comment:
I'm noticing that the fragment alignment logic here doesn't take into account
where "offset" falls within its segment. But isn't that super relevant? As I
consider a BI on SENTENCE-based segments, I think it is... but I can see how
you overlooked this when focusing on WORD scenarios. For example, assume a
fragmentAlignment of 0.5, that "offset" (the match) happens to occur at the
right end of the segment, and that the length goal is only 10 chars larger
than centerLength. Then shouldn't we expand to the right and not the left?
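To put numbers on that example, here is a minimal sketch. Only the formula
comes from the diff above; the class name, the method wrapper, and the value
100 for centerLength are made up for illustration.

```java
public final class CurrentAlignmentExample {
  // Same expression as in the diff: the extra length requested on the left
  // depends only on lengthGoal, centerLength and fragmentAlignment.
  // Note that "offset" does not appear anywhere in it.
  static int extraPrecedingLengthGoal(int lengthGoal, int centerLength, float fragmentAlignment) {
    return (int) ((lengthGoal - centerLength) * fragmentAlignment);
  }

  public static void main(String[] args) {
    int centerLength = 100;             // hypothetical sentence length
    int lengthGoal = centerLength + 10; // "only 10 chars larger"
    System.out.println(extraPrecedingLengthGoal(lengthGoal, centerLength, 0.5f));
  }
}
```

The result is the same whether the match sits at the start or the end of the
segment, which is the concern raised above.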
You're going to hate me for this but let me make a proposal :-) What if we
multiply fragmentAlignment by lengthGoal and interpret the result as the
minimum number of characters wanted to the left of the start of the match
(`offset`)? The difference between that and lengthGoal is the minimum number
of chars wanted to the right. We use the delegate BI to find the passage
start, and then we can consult fragmentAlignment together with where `offset`
is relative to that start to decide how much text to the right of `offset` we
want.
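A rough sketch of the arithmetic in this proposal, under stated assumptions:
the class and method names are hypothetical, and `charsWantedRight` is only
one possible reading of "decide how much text to the right of `offset` we
want". It is not code from the PR.

```java
public final class AlignmentProposalSketch {
  // Minimum chars wanted to the left of the match start (offset).
  static int minCharsLeft(int lengthGoal, float fragmentAlignment) {
    return (int) (lengthGoal * fragmentAlignment);
  }

  // Minimum chars wanted to the right: whatever is left of the length goal.
  static int minCharsRight(int lengthGoal, float fragmentAlignment) {
    return lengthGoal - minCharsLeft(lengthGoal, fragmentAlignment);
  }

  // Assumed interpretation: once the delegate BI has fixed passageStart,
  // ask for enough text on the right to reach lengthGoal overall, but never
  // less than the right-hand minimum.
  static int charsWantedRight(int passageStart, int offset, int lengthGoal, float fragmentAlignment) {
    int actualLeft = offset - passageStart; // text already present left of the match
    return Math.max(lengthGoal - actualLeft, minCharsRight(lengthGoal, fragmentAlignment));
  }

  public static void main(String[] args) {
    // Match near the start of the passage: most of the goal goes to the right.
    System.out.println(charsWantedRight(0, 10, 100, 0.5f));
    // Match near the end of the passage: only the right-hand minimum applies.
    System.out.println(charsWantedRight(0, 80, 100, 0.5f));
  }
}
```

Under this reading the amount requested on the right shrinks as `offset`
moves away from the passage start, down to the minimum implied by
fragmentAlignment.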