Alan Woodward created LUCENE-9099:
-------------------------------------

             Summary: Correctly handle repeats in ordered and unordered intervals
                 Key: LUCENE-9099
                 URL: https://issues.apache.org/jira/browse/LUCENE-9099
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Alan Woodward
            Assignee: Alan Woodward


If you have repeating intervals in an ordered or unordered interval source, you 
currently get somewhat confusing behaviour:

* ORDERED(a, a, b) will return an extra interval over just `a b` if it first 
matches `a a b`, meaning that you can get incorrect results if used in a 
CONTAINING filter - CONTAINING(ORDERED(x, y), ORDERED(a, a, b)) will match on 
the document `a x a b y`
* UNORDERED(a, a) will match on documents that contain just a single `a`.
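
To make the two cases above concrete, here is a minimal toy model in plain Java. It does not use or reproduce Lucene's interval iterators; the class `RepeatSketch`, its methods, and the `reuse` flag are hypothetical names for illustration only. The assumption is that the confusing behaviour can be approximated by letting a repeated clause be satisfied by the same position as the previous clause (`reuse == true`), whereas the expected behaviour needs a distinct position for every clause.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepeatSketch {

    // Toy ORDERED matcher: returns one {start, end} pair per matching start position.
    // With reuse == true, a clause that repeats the previous term may be satisfied
    // by the position the previous clause used (models the behaviour described above).
    static List<int[]> ordered(List<String> doc, List<String> terms, boolean reuse) {
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(terms.get(0))) continue;
            int pos = start;
            boolean ok = true;
            for (int t = 1; t < terms.size() && ok; t++) {
                boolean repeat = terms.get(t).equals(terms.get(t - 1));
                int from = (reuse && repeat) ? pos : pos + 1;
                int next = -1;
                for (int i = from; i < doc.size(); i++) {
                    if (doc.get(i).equals(terms.get(t))) { next = i; break; }
                }
                ok = next >= 0;
                pos = next;
            }
            if (ok) out.add(new int[] {start, pos});
        }
        return out;
    }

    // Toy CONTAINING filter: true if any "big" interval encloses any "small" one.
    static boolean containing(List<int[]> big, List<int[]> small) {
        for (int[] b : big) {
            for (int[] s : small) {
                if (b[0] <= s[0] && s[1] <= b[1]) return true;
            }
        }
        return false;
    }

    // Toy UNORDERED matcher (match / no match only).
    // With reuse == true each clause is checked independently; otherwise every
    // clause consumes one occurrence of its term.
    static boolean unordered(List<String> doc, List<String> terms, boolean reuse) {
        Map<String, Integer> available = new HashMap<>();
        for (String t : doc) available.merge(t, 1, Integer::sum);
        for (String term : terms) {
            int left = available.getOrDefault(term, 0);
            if (left == 0) return false;
            if (!reuse) available.put(term, left - 1);
        }
        return true;
    }
}
```

On the document `a x a b y`, the `reuse == true` variant of `ordered(a, a, b)` yields both `[0,3]` and the extra `[2,3]`, and `[2,3]` sits inside the `[1,4]` produced for `ordered(x, y)`, so the toy CONTAINING check matches. With `reuse == false` only `[0,3]` remains and the check fails, which is the result one would expect.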

It is possible to deal with the unordered case when building sources by 
rewriting duplicates to nested ORDERED clauses, so that UNORDERED(a, b, c, a, 
b) becomes UNORDERED(ORDERED(a, a), ORDERED(b, b), c), but this then breaks 
MAXGAPS filtering.
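
The rewrite described above can be sketched as a small grouping step. This works on plain strings and is only an illustration of the transformation; `DuplicateRewriteSketch` and `rewriteUnordered` are hypothetical names, not part of the Lucene API, and the sketch assumes the sub-sources are simple terms that can be compared by name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateRewriteSketch {

    // Groups repeated terms of an UNORDERED clause into nested ORDERED clauses,
    // keeping the order in which each distinct term first appears.
    static String rewriteUnordered(List<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == 1) {
                // A term that occurs once is left as it is.
                clauses.add(e.getKey());
            } else {
                // A term that occurs n times becomes ORDERED(term, ..., term).
                String repeated = String.join(", ", Collections.nCopies(e.getValue(), e.getKey()));
                clauses.add("ORDERED(" + repeated + ")");
            }
        }
        return "UNORDERED(" + String.join(", ", clauses) + ")";
    }
}
```

For the input `a, b, c, a, b` this produces `UNORDERED(ORDERED(a, a), ORDERED(b, b), c)`, the same shape as in the example above.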

We should try to fix this within the intervals themselves, rather than by rewriting sources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

