Traktormaster commented on a change in pull request #1123: LUCENE-9093: Unified 
highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361426082
 
 

 ##########
 File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java
 ##########
 @@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx is a known break
   // called at start of new Passage given first word start offset
   @Override
   public int preceding(int offset) {
-    return baseIter.preceding(offset); // no change needed
+    final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // convert DONE to 0
+    fragmentEndFromPreceding = baseIter.following(fragmentStart);
 
 Review comment:
   Unfortunately no. The fragmentStart argument is the start of the match, which 
could be anywhere depending on the tokenizer in the index analyzer chain. Even 
if we assume it is the start of a word or a phrase, the underlying BI can break 
at different places. With SENTENCE, the preceding() call here finds the 
beginning of the sentence. With SEPARATOR, which is customizable per query, the 
breaks can be anywhere else.
   We could only assume fragmentStart is a break point if the underlying BI 
were the same as the tokenizer in the index analyzer chain. (I'm not sure, but 
I think the query analyzer chain could be different as well.)
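
   To illustrate the point, here is a minimal sketch using only the JDK's 
java.text.BreakIterator in place of the highlighter's baseIter. The class name, 
the probe() helper and the sample text are hypothetical and not part of the 
patch. It only shows that a match offset in the middle of a sentence is not a 
boundary of a sentence-level BI, so preceding() moves to an earlier position.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo class, not part of LengthGoalBreakIterator.
final class MatchOffsetVsBreak {

    // Returns {isBoundary (1 or 0), fragmentStart, fragmentEnd} for the given
    // match offset, using a sentence BreakIterator as the underlying BI.
    static int[] probe(String text, int matchOffset) {
        BreakIterator baseIter = BreakIterator.getSentenceInstance(Locale.ROOT);
        baseIter.setText(text);

        // Is the match offset itself a break of the underlying BI?
        int isBoundary = baseIter.isBoundary(matchOffset) ? 1 : 0;

        // Same pattern as in the diff: DONE (-1) is converted to 0.
        int fragmentStart = Math.max(baseIter.preceding(matchOffset), 0);
        int fragmentEnd = baseIter.following(fragmentStart);

        return new int[] {isBoundary, fragmentStart, fragmentEnd};
    }

    public static void main(String[] args) {
        String text = "First sentence here. Second sentence with a match inside.";
        int matchOffset = text.indexOf("match");

        int[] r = probe(text, matchOffset);
        System.out.println("match offset        = " + matchOffset);
        System.out.println("offset is a break   = " + (r[0] == 1));
        System.out.println("fragment start..end = " + r[1] + ".." + r[2]);
    }
}
```

   With a separator-based BI the breaks would fall wherever the separator 
character occurs, so the same mismatch between match offset and break point 
should be expected there too.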

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
