dsmiley commented on a change in pull request #1123: LUCENE-9093: Unified highlighter with word separator never gives context to the left URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361745552
########## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java ########## @@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx is a known break // called at start of new Passage given first word start offset @Override public int preceding(int offset) { - return baseIter.preceding(offset); // no change needed + final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // convert DONE to 0 + fragmentEndFromPreceding = baseIter.following(fragmentStart); + if (fragmentEndFromPreceding == DONE) { + fragmentEndFromPreceding = baseIter.last(); + } + final int centerLength = fragmentEndFromPreceding - fragmentStart; + final int extraPrecedingLengthGoal = (int)((lengthGoal - centerLength) * fragmentAlignment); Review comment: I just want to confirm: your suggestion will be the code you already wrote above but have not yet pushed and that breaks that one test. Yes. we can just change that test slightly. One thing that vexes me a bit about highlighting is that there are so many choices and options to name -- configuration-itis. Good defaults help. Also simply making a reasonable choice for 98% of users and letting the 2% do something themselves is also better than adding yet another tuning nob. I'm not sure where the things we are discussing lie. (Oh and deprecate/removing the other highlighters would help too ;-) Any way... > hl.fragalign Since you propose the default be 0.5 then in what scenario can you imagine someone would specify it? I suggest `hl.fragAlignRatio` as a better name if we keep this. > hl.closestTo I wish it simply always worked this way (`true`), to be honest. If I had to give this option a name, it'd definitely have "fragsize" in its name to link it's meaning to the other parameter. Perhaps `hl.fragsizeIsMinimum` inverted meaning. Sadly `hl.fragsize` is not camel cased; ah well. Maybe these should be _secret_ options with code comments about filing a JIRA issue if you want to change them :-). After all... I'm not sure _all_ options need to be documented. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org