Traktormaster commented on a change in pull request #1123: LUCENE-9093: Unified 
highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361750286
 
 

 ##########
 File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java
 ##########
 @@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx 
is a known break
   // called at start of new Passage given first word start offset
   @Override
   public int preceding(int offset) {
-    return baseIter.preceding(offset); // no change needed
+    final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // 
convert DONE to 0
+    fragmentEndFromPreceding = baseIter.following(fragmentStart);
+    if (fragmentEndFromPreceding == DONE) {
+      fragmentEndFromPreceding = baseIter.last();
+    }
+    final int centerLength = fragmentEndFromPreceding - fragmentStart;
+    final int extraPrecedingLengthGoal = (int)((lengthGoal - centerLength) * 
fragmentAlignment);
 
 Review comment:
   > hl.closestTo ... I wish it simply always worked this way...
   
   I'm guessing whoever wrote the original UH decided against it for 
performance reasons. There is a  
[comment](https://github.com/apache/lucene-solr/blob/e06ad4cfb587e1bb69a80c0b531df0e52c2da89b/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java#L149)
 to suggest that. Also note that depending on which index is closer the BI may 
had to invoke `following()` one extra time inside `moveToBreak()`. I think 
`moveToBreak()` is only necessary when a default-summary is being made and I 
can remove it from the code when a match is being highlighted.
   
   The performance hit would be depending on the fragment's length that 
contains the goal index. In WORD mode I'd hope this is negligible. It will 
necessarily be worse for SENTENCE.
   
   You've brought up a new point to keep in mind: simplifying the options. I'll 
be back with updated code that we can discuss further.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to