dsmiley commented on a change in pull request #1123: LUCENE-9093: Unified 
highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361486777
 
 

 ##########
 File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java
 ##########
 @@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx is a known break
   // called at start of new Passage given first word start offset
   @Override
   public int preceding(int offset) {
-    return baseIter.preceding(offset); // no change needed
+    final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // convert DONE to 0
+    fragmentEndFromPreceding = baseIter.following(fragmentStart);
+    if (fragmentEndFromPreceding == DONE) {
+      fragmentEndFromPreceding = baseIter.last();
+    }
+    final int centerLength = fragmentEndFromPreceding - fragmentStart;
+    final int extraPrecedingLengthGoal = (int)((lengthGoal - centerLength) * fragmentAlignment);
 
 Review comment:
   I'm noticing that the logic here for fragment alignment doesn't seem to care 
whatsoever about where `offset` is within its segment.  But isn't that super 
relevant?  As I consider a BI on SENTENCE based segments, I think it is... but 
I can see how you overlooked this when focusing on WORD scenarios.  For example, 
assume a fragmentAlignment of 0.5, the `offset` (the match) happens to occur at 
the right end of the segment, and the length goal is only 10 chars larger than 
centerLength.  Shouldn't we then expand to the right and not the left?
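
   To make the example concrete, here is a minimal sketch that only mirrors the 
expression from the diff above.  The class name, the method wrapper, the `float` 
type for fragmentAlignment, and the numbers 80 and 90 are assumptions for 
illustration, not code from the patch.

```java
class AlignmentExample {
  // Same expression as in the patch; note that `offset` is not an input at all.
  static int extraPrecedingLengthGoal(int lengthGoal, int centerLength, float fragmentAlignment) {
    return (int) ((lengthGoal - centerLength) * fragmentAlignment);
  }

  public static void main(String[] args) {
    int centerLength = 80;              // hypothetical sentence length
    int lengthGoal = centerLength + 10; // goal is only 10 chars larger
    // Prints 5, whether the match sits at the start or the end of the sentence.
    System.out.println(extraPrecedingLengthGoal(lengthGoal, centerLength, 0.5f));
  }
}
```

   Half of the 10 spare chars always goes to the left, even when the match is 
already at the right end of the segment and has plenty of text before it.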
   
   You're going to hate me for this, but let me make a proposal :-)  What if we 
multiply fragmentAlignment by lengthGoal and interpret the result as the minimum 
number of characters wanted to the left of the start of the match (`offset`)?  
The difference between that and lengthGoal then indicates the minimum chars 
wanted to the right.  We use the delegate BI to find the passage start, and then 
we can consult fragmentAlignment together with where `offset` is relative to 
that start to decide how much text to the right of `offset` we want.
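
   A rough sketch of one possible reading of this proposal, using the JDK's 
`java.text.BreakIterator` as the delegate.  The class `AlignmentProposalSketch`, 
the method `plan`, its return shape, and the rule used for `wantedRight` are all 
hypothetical; the proposal leaves that last step open, so treat this as an 
illustration rather than the intended implementation.

```java
import java.text.BreakIterator;

class AlignmentProposalSketch {
  /** Returns {passageStart, rightTarget} for a match that starts at offset. */
  static int[] plan(BreakIterator baseIter, int offset, int lengthGoal, float fragmentAlignment) {
    // Minimum chars wanted to the left of the match, and the remainder for the right.
    int minLeft = (int) (lengthGoal * fragmentAlignment);
    int minRight = lengthGoal - minLeft;

    // Ask the delegate for the last break at or before (offset - minLeft).
    int target = Math.max(offset - minLeft, 0);
    int start = baseIter.isBoundary(target)
        ? target
        : Math.max(baseIter.preceding(target), 0); // convert DONE to 0

    // Assumed rule: if the left side already used up its share, only the
    // minimum is wanted on the right; otherwise the right side gets the rest.
    int leftContext = offset - start;
    int wantedRight = Math.max(minRight, lengthGoal - leftContext);

    // A caller would then look for the first break at or after rightTarget.
    return new int[] {start, offset + wantedRight};
  }
}
```

   Unlike the expression in the diff, `offset` drives both results here: a match 
near the right end of a long sentence keeps that sentence's start and asks for 
more text to the right only.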

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
