[ 
https://issues.apache.org/jira/browse/LUCENE-9439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172186#comment-17172186
 ] 

Dawid Weiss commented on LUCENE-9439:
-------------------------------------

Hi Alan. Thank you for your feedback. It works like a charm. The "no-positions" 
strategy approach allows for some interesting variations: one could add match 
regions for entire values or just for the tokens returned from analysis (so you 
can "see" individual tokens over the value text).
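
To make the two options concrete, here is a minimal sketch in plain Java. It is 
an illustration only: MatchRegionSketch, Region, wholeValue and perToken are 
hypothetical names, not part of Lucene or the patch, and whitespace splitting 
merely stands in for real analysis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these types exist in Lucene or in the patch.
final class MatchRegionSketch {
  /** A half-open character range [start, end) within a field value. */
  record Region(int start, int end) {}

  /** Option 1: a single region spanning the entire field value. */
  static List<Region> wholeValue(String value) {
    return List.of(new Region(0, value.length()));
  }

  /** Option 2: one region per token; whitespace splitting stands in for analysis. */
  static List<Region> perToken(String value) {
    List<Region> regions = new ArrayList<>();
    int i = 0;
    while (i < value.length()) {
      // Skip any whitespace preceding the next token.
      while (i < value.length() && Character.isWhitespace(value.charAt(i))) {
        i++;
      }
      int start = i;
      // Consume the token itself.
      while (i < value.length() && !Character.isWhitespace(value.charAt(i))) {
        i++;
      }
      if (i > start) {
        regions.add(new Region(start, i));
      }
    }
    return regions;
  }
}
```

With the first option a highlighter would mark the whole value; with the second 
it would mark each token separately.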

I piggybacked a small, unrelated fix to the disjunction matches iterator 
because its behavior looked like a bug to me. [1]

Otherwise it's really well separated from the existing code and works great for 
me. For example, I tried interval queries and they just work out of the box. A 
more complex expression highlights more than it should, but that comes from the 
match range the query returns, so it is nicely decoupled from the "highlighting 
engine" itself. [2]

I think it's worth adding to Lucene. We would have to get rid of the assertj 
dependency first, though. Or maybe we should add it and allow its use? The nice 
thing about assertj is that it formats assertion failures much more readably, 
especially for stream or collection assertions.

[1] 
https://github.com/apache/lucene-solr/pull/1721/files#diff-f5538289e23aabdd53bc3bcbc59da342
[2] 
https://github.com/apache/lucene-solr/blob/c0562c1f2d789679432f9d72375aa3747e4b6526/lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/MatchRegionRetrieverTest.java#L335-L355


> Matches API should enumerate hit fields that have no positions (no iterator)
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-9439
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9439
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: LUCENE-9439.patch, matchhighlighter.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> I have been fiddling with Matches API and it's great. There is one corner 
> case that doesn't work for me though -- queries that affect fields without 
> positions return {{MatchesUtils.MATCH_WITH_NO_TERMS}} but this constant is 
> problematic as it doesn't carry the field name that caused it (returns null).
> The associated fromSubMatches combines all these constants into one (or 
> swallows them) which is another problem.
> I think it would be more consistent if MATCH_WITH_NO_TERMS was replaced with 
> a true match (carrying field name) returning an empty iterator (or a constant 
> "empty" iterator NO_TERMS).
> I have a very compelling use case: I wrote an "auto-highlighter" that runs on 
> top of Matches API and automatically picks up query-relevant fields and 
> snippets. Everything works beautifully except for cases where fields are 
> searchable but don't have any positions (token-like fields).
> I can work on a patch but wanted to reach out first - [~romseygeek]?
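
The replacement proposed in the description above (a match that carries its 
field name but returns an empty iterator) could be modelled roughly as follows. 
This is a sketch under stated assumptions, not Lucene's API: FieldMatch, 
withoutPositions and the int[] position pairs are hypothetical names chosen for 
illustration.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical model of the proposal; this is not Lucene's Matches interface.
final class NoPositionsMatchSketch {
  /** A per-field match: the field that matched plus its (possibly empty) positions. */
  interface FieldMatch {
    String field();

    /** Each element is a {start, end} position pair; empty if the field has no positions. */
    Iterator<int[]> positions();
  }

  /** A match for a field indexed without positions: the name is kept, the iterator is empty. */
  static FieldMatch withoutPositions(String field) {
    return new FieldMatch() {
      @Override
      public String field() {
        return field;
      }

      @Override
      public Iterator<int[]> positions() {
        return Collections.emptyIterator();
      }
    };
  }
}
```

A consumer such as a highlighter could then still learn which field matched, 
even though there is nothing to iterate over.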


