romseygeek opened a new pull request #1097: LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals
URL: https://github.com/apache/lucene-solr/pull/1097

If you have repeating intervals in an ordered or unordered interval source, you currently get somewhat confusing behaviour:

* `ORDERED(a, a, b)` will return an extra interval over just `a b` if it first matches `a a b`. This can give incorrect results when it is used in a `CONTAINING` filter: `CONTAINING(ORDERED(x, y), ORDERED(a, a, b))` will match on the document `a x a b y`, because the spurious `a b` interval falls inside `x ... y`.
* `UNORDERED(a, a)` will match on documents that contain only a single `a`.

This commit adds a `RepeatingIntervalsSource` that correctly handles repeats within ordered and unordered sources.

It also changes the way that gaps are calculated within ordered and unordered sources, using a new `width()` method on `IntervalIterator`. The default implementation returns `end() - start() + 1`, but `RepeatingIntervalsSource` instead returns the sum of the widths of its child iterators. This preserves `maxgaps` filtering on ordered and unordered sources that contain repeats.

To handle matches correctly in this scenario, `IntervalsSource#matches` now always returns an explicit `IntervalsMatchesIterator` rather than a plain `MatchesIterator`. `IntervalsMatchesIterator` adds `gaps()` and `width()` methods, so that submatches can be combined in the same way as subiterators.

Extra checks have been added to `checkIntervals()` to ensure that the iterator and the matches return the same intervals. A fix to `DisjunctionIntervalIterator#matches()` is also included: `DisjunctionIntervalIterator` minimizes its intervals, while `MatchesUtils.disjunction` does not, so the two methods could disagree.
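To make the `width()` change concrete, here is a minimal, self-contained Java sketch of the gap arithmetic. It does not use Lucene at all: `Interval`, `TermInterval`, `RepeatInterval`, `gaps` and `naiveGaps` are hypothetical names in a simplified model, which assumes that the gaps of an ordered match are its span length minus the summed widths of its sub-intervals.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of width-based gap counting.
 * None of these types are Lucene classes.
 */
public class GapsSketch {

  interface Interval {
    int start();

    int end();

    // Default width: every position between start and end counts as covered.
    default int width() {
      return end() - start() + 1;
    }
  }

  // A single term occurrence at one position.
  record TermInterval(int pos) implements Interval {
    public int start() { return pos; }

    public int end() { return pos; }
  }

  // A group of repeated sub-intervals, assumed to be in position order.
  // Only the positions its children actually cover count toward its width.
  record RepeatInterval(List<Interval> children) implements Interval {
    public int start() { return children.get(0).start(); }

    public int end() { return children.get(children.size() - 1).end(); }

    @Override
    public int width() {
      return children.stream().mapToInt(Interval::width).sum();
    }
  }

  // Gaps of an ordered combination: span length minus the widths of its parts.
  static int gaps(List<Interval> subs) {
    int start = subs.get(0).start();
    int end = subs.get(subs.size() - 1).end();
    int covered = subs.stream().mapToInt(Interval::width).sum();
    return (end - start + 1) - covered;
  }

  // Same calculation, but measuring every part as end - start + 1.
  static int naiveGaps(List<Interval> subs) {
    int start = subs.get(0).start();
    int end = subs.get(subs.size() - 1).end();
    int covered = subs.stream().mapToInt(s -> s.end() - s.start() + 1).sum();
    return (end - start + 1) - covered;
  }

  public static void main(String[] args) {
    // Document "a x a b" (positions 0..3): ORDERED(a, a, b) with the two a's grouped as a repeat.
    Interval repeat = new RepeatInterval(List.of(new TermInterval(0), new TermInterval(2)));
    List<Interval> match = List.of(repeat, new TermInterval(3));
    System.out.println("gaps with width(): " + gaps(match));           // 1
    System.out.println("gaps with end-start+1: " + naiveGaps(match));  // 0
  }
}
```

For the document `a x a b`, the grouped repeat covers positions 0 and 2, so its width is 2 and the match over positions 0 to 3 reports one gap (the `x`). Measuring the grouped repeat as `end() - start() + 1` would count the `x` as covered and report zero gaps, which is the kind of error that would let a `maxgaps` filter wrongly accept the match.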