romseygeek opened a new pull request #1097: LUCENE-9099: Correctly handle 
repeats in ORDERED and UNORDERED intervals
URL: https://github.com/apache/lucene-solr/pull/1097
 
 
   If you have repeating intervals in an ordered or unordered interval source, 
you currently get somewhat confusing behaviour:
   
   * `ORDERED(a, a, b)` will return an extra interval over just `a b` if it 
first matches `a a b`, meaning that you can get incorrect results if used in a 
`CONTAINING` filter - `CONTAINING(ORDERED(x, y), ORDERED(a, a, b))` will match 
on the document `a x a b y`
   * `UNORDERED(a, a)` will match on documents that contain only a single `a`.
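
   The two cases above can be illustrated with a small stand-alone model. This is only a sketch and does not use Lucene's classes: `RepeatSketch` and its methods are hypothetical names, intervals are written as inclusive `{start, end}` token positions, and the positions for the document `a x a b y` are filled in by hand rather than computed by an interval iterator.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the two problems; not Lucene code. */
final class RepeatSketch {

  /** True if the inclusive interval inner lies entirely within outer. */
  static boolean contains(int[] outer, int[] inner) {
    return outer[0] <= inner[0] && inner[1] <= outer[1];
  }

  /** Naive check: every requested term occurs, ignoring how often it was requested. */
  static boolean naiveUnordered(List<String> doc, List<String> terms) {
    return doc.containsAll(terms);
  }

  /** Repeat-aware check: a term requested n times must occur at least n times. */
  static boolean repeatAwareUnordered(List<String> doc, List<String> terms) {
    for (String term : terms) {
      if (Collections.frequency(doc, term) < Collections.frequency(terms, term)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Document "a x a b y": a=0, x=1, a=2, b=3, y=4
    int[] xy = {1, 4};   // ORDERED(x, y)
    int[] aab = {0, 3};  // ORDERED(a, a, b), the intended match
    int[] ab = {2, 3};   // the extra "a b" interval described above
    System.out.println(contains(xy, aab)); // false: the real match is not contained
    System.out.println(contains(xy, ab));  // true: the extra interval causes a match

    List<String> doc = Arrays.asList("a");
    List<String> terms = Arrays.asList("a", "a");
    System.out.println(naiveUnordered(doc, terms));       // true
    System.out.println(repeatAwareUnordered(doc, terms)); // false
  }
}
```

   The sketch only shows why the extra `a b` interval and the single-`a` match are wrong; it says nothing about how `RepeatingIntervalsSource` is implemented.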
   
   This commit adds a `RepeatingIntervalsSource` that correctly handles repeats 
within ordered and unordered sources.  It also changes the way that gaps are 
calculated within ordered and unordered sources, by using a new `width()` 
method on `IntervalIterator`.  The default implementation just returns `end() - 
start() + 1`, but `RepeatingIntervalsSource` instead returns the sum of the 
widths of its child iterators.  This preserves `maxgaps` filtering on ordered 
and unordered sources that contain repeats.
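
   To see why summing child widths matters for `maxgaps`, here is a minimal sketch under stated assumptions. `WidthSketch`, `Span`, `Repeat` and `gaps` are hypothetical names, not the Lucene types, and the gap formula (total span minus the summed widths of the parts) is an assumption made for illustration. The example uses the document `a x a b`, where the repeated `a` sits at positions 0 and 2 and `b` at position 3.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of width() and gap counting; not Lucene code. */
final class WidthSketch {

  /** An interval over token positions, inclusive at both ends. */
  interface Interval {
    int start();
    int end();

    /** Default width as described above: the number of positions spanned. */
    default int width() {
      return end() - start() + 1;
    }
  }

  /** A plain interval that uses the default width. */
  static final class Span implements Interval {
    private final int start;
    private final int end;

    Span(int start, int end) {
      this.start = start;
      this.end = end;
    }

    public int start() { return start; }
    public int end() { return end; }
  }

  /** A repeated match whose width is the sum of its children's widths. */
  static final class Repeat implements Interval {
    private final List<Interval> children;

    Repeat(Interval... children) {
      this.children = Arrays.asList(children);
    }

    public int start() { return children.get(0).start(); }
    public int end() { return children.get(children.size() - 1).end(); }

    @Override
    public int width() {
      int sum = 0;
      for (Interval child : children) {
        sum += child.width();
      }
      return sum;
    }
  }

  /** Assumed gap formula: positions spanned by all parts minus positions they cover. */
  static int gaps(Interval... parts) {
    int span = parts[parts.length - 1].end() - parts[0].start() + 1;
    int covered = 0;
    for (Interval part : parts) {
      covered += part.width();
    }
    return span - covered;
  }

  public static void main(String[] args) {
    Interval b = new Span(3, 3);
    // Treating "a ... a" as one plain interval [0,2] hides the gap at position 1.
    System.out.println(gaps(new Span(0, 2), b)); // prints 0
    // Summing the child widths keeps the gap visible to a maxgaps filter.
    System.out.println(gaps(new Repeat(new Span(0, 0), new Span(2, 2)), b)); // prints 1
  }
}
```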
   
   In order to correctly handle matches in this scenario, 
`IntervalsSource#matches` now always returns an explicit 
`IntervalsMatchesIterator` rather than a plain `MatchesIterator`.  The new 
iterator type adds `gaps()` and `width()` methods so that submatches can be 
combined in the same way that subiterators are.  Extra checks have been added 
to `checkIntervals()` to ensure that the same intervals are returned by both 
iterator and matches.  A fix to `DisjunctionIntervalIterator#matches()` is also 
included: `DisjunctionIntervalIterator` minimizes its intervals, while 
`MatchesUtils.disjunction` does not, so there was a discrepancy between the two 
methods.
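
   As a rough illustration of what "minimizes its intervals" means, the sketch below drops every interval that contains another, different interval. It is a simplified quadratic model over a fixed list, with the hypothetical name `MinimizeSketch`; it is not how `DisjunctionIntervalIterator` is implemented, and the input intervals are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of interval minimization; not Lucene code. */
final class MinimizeSketch {

  /** Keeps only the intervals that do not contain another, different interval. */
  static List<int[]> minimize(List<int[]> intervals) {
    List<int[]> kept = new ArrayList<>();
    for (int[] candidate : intervals) {
      boolean containsOther = false;
      for (int[] other : intervals) {
        boolean different = candidate[0] != other[0] || candidate[1] != other[1];
        if (different && candidate[0] <= other[0] && other[1] <= candidate[1]) {
          containsOther = true;
          break;
        }
      }
      if (!containsOther) {
        kept.add(candidate);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Inclusive {start, end} positions; [0,1] contains [0,0] and is dropped.
    List<int[]> raw = Arrays.asList(new int[] {0, 0}, new int[] {0, 1}, new int[] {3, 4});
    for (int[] interval : minimize(raw)) {
      System.out.println(Arrays.toString(interval)); // prints [0, 0] then [3, 4]
    }
  }
}
```

   A matches implementation that reports the unminimized list would return `[0, 1]` as well, which is the kind of discrepancy described above.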

