[ https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mayya Sharipova updated LUCENE-9541:
------------------------------------
Description:
Not completely sure if this is a bug.

BitSetConjunctionDISI advances based on its lead, a DocIdSetIterator, and doesn't consider that its other component, a BitSetIterator, may have already advanced past a certain doc. This may result in duplicate documents.

For example, suppose BitSetConjunctionDISI _disi_ is composed of DocIdSetIterator _a_ over docs [0,1] and BitSetIterator _b_ over docs [0,1]. Calling `b.nextDoc()` collects doc0; calling `disi.nextDoc()` afterwards collects the same doc0 again.

Other conjunction iterators don't seem to have this behaviour: if any of their components is advanced past a certain document, the whole conjunction iterator is also advanced past that document.

This behaviour was exposed in this [PR|https://github.com/apache/lucene-solr/pull/1903].

was:
Not completely sure if this is a bug.

BitSetConjunctionDISI advances based on its lead, a DocIdSetIterator, and doesn't consider that its other component, a BitSetIterator, may have already advanced past a certain doc. This may result in duplicate documents.

This behaviour was exposed in this [PR|https://github.com/apache/lucene-solr/pull/1903].

> BitSetConjunctionDISI can advance to docs before its components
> ---------------------------------------------------------------
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Mayya Sharipova
> Priority: Minor
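The duplicate in the example above can be reproduced with a small, self-contained model. This sketch does not use Lucene: `DocIterator`, `ArrayIterator`, `BitsIterator`, `LeadDrivenConjunction` and `AligningConjunction` are hypothetical stand-ins, and `NO_MORE_DOCS` only mimics the sentinel of the same name. `LeadDrivenConjunction` follows the description (advance the lead, check membership in the bit set, ignore where _b_ already is). `AligningConjunction` is one possible alternative that treats every component's current position as a lower bound, roughly the behaviour attributed to the other conjunction iterators; it is not a proposed fix and not necessarily how Lucene implements them.

```java
import java.util.BitSet;

// Simplified model of the LUCENE-9541 scenario; every type here is a hypothetical
// stand-in, not a Lucene class.
public class BitSetConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel: iterator exhausted

    interface DocIterator {
        int docID();
        int nextDoc();
        int advance(int target); // first doc >= target; never moves backwards
    }

    // Plays the role of the lead DocIdSetIterator _a_: a sorted array of doc ids.
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int index = -1, doc = -1;
        ArrayIterator(int... docs) { this.docs = docs; }
        public int docID() { return doc; }
        public int nextDoc() {
            if (doc == NO_MORE_DOCS) return doc;
            index++;
            doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
            return doc;
        }
        public int advance(int target) {
            while (doc < target) nextDoc();
            return doc;
        }
    }

    // Plays the role of the BitSetIterator _b_, backed by java.util.BitSet.
    static final class BitsIterator implements DocIterator {
        final BitSet bits;
        private int doc = -1;
        BitsIterator(BitSet bits) { this.bits = bits; }
        public int docID() { return doc; }
        public int nextDoc() { return doc == NO_MORE_DOCS ? doc : advance(doc + 1); }
        public int advance(int target) {
            if (target <= doc) return doc; // already at or past target
            int next = bits.nextSetBit(target);
            doc = next < 0 ? NO_MORE_DOCS : next;
            return doc;
        }
        void setDocId(int docId) { doc = docId; }
    }

    // Follows the description: only the lead is advanced, b is consulted through
    // BitSet.get, and b's own current position is ignored.
    static final class LeadDrivenConjunction {
        private final DocIterator lead;
        private final BitsIterator b;
        LeadDrivenConjunction(DocIterator lead, BitsIterator b) { this.lead = lead; this.b = b; }
        int nextDoc() {
            int doc = lead.nextDoc();
            while (doc != NO_MORE_DOCS && !b.bits.get(doc)) doc = lead.nextDoc();
            b.setDocId(doc); // b is repositioned to the lead's doc wherever it was
            return doc;
        }
    }

    // Alternative: every component's current position is a lower bound, so a doc
    // some component has already reached is not returned again.
    static final class AligningConjunction {
        private final DocIterator lead;
        private final BitsIterator b;
        private int doc = -1;
        AligningConjunction(DocIterator lead, BitsIterator b) { this.lead = lead; this.b = b; }
        int nextDoc() {
            int max = Math.max(doc, Math.max(lead.docID(), b.docID()));
            if (max == NO_MORE_DOCS) return doc = NO_MORE_DOCS;
            int target = max + 1;
            while (true) {
                int d1 = lead.advance(target);
                if (d1 == NO_MORE_DOCS) return doc = NO_MORE_DOCS;
                int d2 = b.advance(d1);
                if (d2 == d1) return doc = d1;                  // both components agree
                if (d2 == NO_MORE_DOCS) return doc = NO_MORE_DOCS;
                target = d2;                                     // leapfrog: lead catches up to b
            }
        }
    }

    public static void main(String[] args) {
        BitSet docs = new BitSet();
        docs.set(0);
        docs.set(1);

        BitsIterator b1 = new BitsIterator(docs);
        LeadDrivenConjunction leadDriven = new LeadDrivenConjunction(new ArrayIterator(0, 1), b1);
        System.out.println("b.nextDoc() = " + b1.nextDoc());            // 0
        System.out.println("disi.nextDoc() = " + leadDriven.nextDoc()); // 0 again (duplicate)

        BitsIterator b2 = new BitsIterator(docs);
        AligningConjunction aligning = new AligningConjunction(new ArrayIterator(0, 1), b2);
        System.out.println("b.nextDoc() = " + b2.nextDoc());            // 0
        System.out.println("disi.nextDoc() = " + aligning.nextDoc());   // 1
    }
}
```

With _a_ and _b_ both over [0,1], the lead-driven model returns doc0 from `disi.nextDoc()` even though `b.nextDoc()` already returned it, while the aligning model starts its search after every component's current doc and moves on to doc1.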