dsmiley commented on a change in pull request #1123: LUCENE-9093: Unified 
highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361745552
 
 

 ##########
 File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java
 ##########
 @@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx 
is a known break
   // called at start of new Passage given first word start offset
   @Override
   public int preceding(int offset) {
-    return baseIter.preceding(offset); // no change needed
+    final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // 
convert DONE to 0
+    fragmentEndFromPreceding = baseIter.following(fragmentStart);
+    if (fragmentEndFromPreceding == DONE) {
+      fragmentEndFromPreceding = baseIter.last();
+    }
+    final int centerLength = fragmentEndFromPreceding - fragmentStart;
+    final int extraPrecedingLengthGoal = (int)((lengthGoal - centerLength) * 
fragmentAlignment);
 
 Review comment:
   I just want to confirm: your suggestion will be the code you already wrote 
above but have not yet pushed and that breaks that one test.  Yes. we can just 
change that test slightly.
   
   One thing that vexes me a bit about highlighting is that there are so many 
choices and options to name -- configuration-itis.  Good defaults help.  Also 
simply making a reasonable choice for 98% of users and letting the 2% do 
something themselves is also better than adding yet another tuning nob.  I'm 
not sure where the things we are discussing lie.  (Oh and deprecate/removing 
the other highlighters would help too ;-)  Any way...
   
   > hl.fragalign
   
   Since you propose the default be 0.5 then in what scenario can you imagine 
someone would specify it?  I suggest `hl.fragAlignRatio` as a better name if we 
keep this.
   
   > hl.closestTo
   
   I wish it simply always worked this way (`true`), to be honest.  If I had to 
give this option a name, it'd definitely have "fragsize" in its name to link 
it's meaning to the other parameter.  Perhaps `hl.fragsizeIsMinimum` inverted 
meaning.  Sadly `hl.fragsize` is not camel cased; ah well.  
   
   Maybe these should be _secret_ options with code comments about filing a 
JIRA issue if you want to change them :-).  After all... I'm not sure _all_ 
options need to be documented.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to