Traktormaster commented on a change in pull request #1123: LUCENE-9093: Unified
highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361650672
##########
File path:
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java
##########
@@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx
is a known break
// called at start of new Passage given first word start offset
@Override
public int preceding(int offset) {
- return baseIter.preceding(offset); // no change needed
+ final int fragmentStart = Math.max(baseIter.preceding(offset), 0); //
convert DONE to 0
+ fragmentEndFromPreceding = baseIter.following(fragmentStart);
+ if (fragmentEndFromPreceding == DONE) {
+ fragmentEndFromPreceding = baseIter.last();
+ }
+ final int centerLength = fragmentEndFromPreceding - fragmentStart;
+ final int extraPrecedingLengthGoal = (int)((lengthGoal - centerLength) *
fragmentAlignment);
Review comment:
I have a working version with this new behavior. There is just a single
assertion that fails
[here](https://github.com/apache/lucene-solr/blob/e06ad4cfb587e1bb69a80c0b531df0e52c2da89b/solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java#L267).
It's because the `fragsize=60` there means to try and get 60 characters on the
right side of the match as the default `fragalign` is zero. To pass the test
the query parameters need to be changed to have
`hl.fragsize=50&hl.fragalign=0.5`.
This is not a problem by itself, but there is no way to make the new
behavior completely backwards-compatible like the previous one.
It could also be a bit jarring to have `fragsize` with a distinctively
different meaning compared to the other highlighters:
- For original&fastVector the `fragsize` is the target size of the snippet.
- For unified the `fragsize` is the size of the contextual text around the
match. Ie.: the snippet's target size is `fragsize+matchsize`.
I'm still in favor of this change, there's literally half as many calls to
`preceding()` and `following()` of the underlying BI and it's much clearer as
well.
Should I change the failing test and push this for review? What about the
backward-compatibility? I'm not sure how to change the docs or the default
parameters.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]