jasperjiaguo opened a new issue, #10396: URL: https://github.com/apache/pinot/issues/10396
Recently @vvivekiyer and @SabrinaZhaozyf found a case where q2: `SELECT ... FROM ... WHERE type = 'type' AND date > 'date' AND (isSubnetOf('subnet1', ip) OR isSubnetOf('subnet2', ip))` is significantly slower than q1: `SELECT ... FROM ... WHERE type = 'type' AND date > 'date' AND (isSubnetOf('subnet1', ip))`, where `type` has an inverted index and `date` has a range index.

The execution stats indicate that `numEntriesScannedInFilter` (nESI) for q2 is significantly larger than for q1: nESI_q2 >> nESI_q1. However, if we continue to increase the number of `isSubnetOf('...', ip)` predicates in the OR to 3 (q3), then (nESI_q2 - nESI_q1) ~ (nESI_q3 - nESI_q2). This means there is a performance degradation once we have the AND-OR structure.

After some digging, we found that this is due to the current implementation of `AndDocIdSet` and `AndDocIdIterator`. When we execute q2, `(isSubnetOf('subnet1', ip) OR isSubnetOf('subnet2', ip))` (-> `OrDocIdIterator`) is placed in `remainingDocIdIterators` in `AndDocIdSet`, which then produces a composite `AndDocIdIterator` whose `next()` function uses a greedy algorithm for filter evaluation. Meanwhile, the intersection result of `type = 'type' AND date > 'date'` is not pushed down to the scanning in the OR predicate. Therefore, the OR predicate can end up scanning the entire dataset in [`advance()`](https://github.com/apache/pinot/blob/98faf2bfac7ec69ced93eec25cf57237a911c560/pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/AndDocIdIterator.java#L51) if the matching doc ids of the OR predicate are sparse.

One approach to resolving this is to first evaluate `type = 'type' AND date > 'date'` to a bitmap, and then wire that bitmap down to the scanning operators so that they can skip the row ids not in it.
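To make the mechanism concrete, here is a simplified, hypothetical Java model. It is not Pinot's actual `AndDocIdIterator`/`OrDocIdIterator` code. It counts how many entries a two-branch OR of scan-based predicates evaluates under (a) a leapfrogging AND loop in the spirit of the greedy `next()` described above, and (b) the proposed pushdown, where the OR is evaluated only on docs in the precomputed `type AND date` bitmap. `ScanIterator`, `greedyAnd`, `pushdownAnd`, `simulate`, and the synthetic data (1M docs, 1% bitmap selectivity, two sparse OR branches stood in by modulo predicates) are all assumptions for illustration, and `java.util.BitSet` stands in for the real bitmap.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/**
 * Hypothetical model of the AND-over-OR case (NOT Pinot's real operators).
 * It only counts entries evaluated by scan-based OR predicates under two strategies.
 */
public class AndOrScanModel {
  static final int EOF = Integer.MAX_VALUE;

  /** Result of one strategy: matching docs and entries scanned by the OR predicates. */
  static final class Result {
    final int matches;
    final long entriesScanned;
    Result(int matches, long entriesScanned) { this.matches = matches; this.entriesScanned = entriesScanned; }
  }

  /** Scan-based iterator: evaluates its predicate doc by doc, counting every evaluation. */
  static final class ScanIterator {
    private final IntPredicate predicate;
    private final int numDocs;
    private int current = -1;
    long entriesScanned = 0;

    ScanIterator(IntPredicate predicate, int numDocs) {
      this.predicate = predicate;
      this.numDocs = numDocs;
    }

    /** Returns the first matching doc id >= target (or EOF), scanning every doc in between. */
    int advance(int target) {
      if (current >= target) return current; // already at or past target (or at EOF)
      for (int docId = target; docId < numDocs; docId++) {
        entriesScanned++;
        if (predicate.test(docId)) return current = docId;
      }
      return current = EOF;
    }
  }

  /** Leapfrog AND of a bitmap with an OR of scan iterators; the OR never sees the bitmap. */
  static Result greedyAnd(BitSet bitmap, IntPredicate[] orPredicates, int numDocs) {
    ScanIterator[] children = new ScanIterator[orPredicates.length];
    for (int i = 0; i < children.length; i++) children[i] = new ScanIterator(orPredicates[i], numDocs);
    int matches = 0;
    int candidate = bitmap.nextSetBit(0);
    while (candidate >= 0) {
      int orDoc = EOF; // OR advance: smallest next match of any child at or after candidate
      for (ScanIterator child : children) orDoc = Math.min(orDoc, child.advance(candidate));
      if (orDoc == EOF) break;
      if (orDoc == candidate) {
        matches++;
        candidate = bitmap.nextSetBit(candidate + 1);
      } else {
        // The scan up to orDoc may have evaluated many docs that are not in the bitmap.
        candidate = bitmap.nextSetBit(orDoc);
      }
    }
    long scanned = 0;
    for (ScanIterator child : children) scanned += child.entriesScanned;
    return new Result(matches, scanned);
  }

  /** Pushdown: evaluate the OR only on doc ids already in the bitmap. */
  static Result pushdownAnd(BitSet bitmap, IntPredicate[] orPredicates) {
    int matches = 0;
    long scanned = 0;
    for (int docId = bitmap.nextSetBit(0); docId >= 0; docId = bitmap.nextSetBit(docId + 1)) {
      for (IntPredicate p : orPredicates) {
        scanned++;
        if (p.test(docId)) { matches++; break; } // OR short-circuits on the first match
      }
    }
    return new Result(matches, scanned);
  }

  /** Hypothetical data: 1M docs, 1% pass `type AND date`, each OR branch is sparse. */
  static Result[] simulate() {
    int numDocs = 1_000_000;
    BitSet typeAndDate = new BitSet(numDocs);
    for (int docId = 0; docId < numDocs; docId += 100) typeAndDate.set(docId);
    IntPredicate[] subnets = {
        docId -> docId % 100_000 == 0,      // stands in for isSubnetOf('subnet1', ip)
        docId -> docId % 100_000 == 50_000  // stands in for isSubnetOf('subnet2', ip)
    };
    return new Result[] {greedyAnd(typeAndDate, subnets, numDocs), pushdownAnd(typeAndDate, subnets)};
  }

  public static void main(String[] args) {
    Result[] r = simulate();
    System.out.println("greedy:   matches=" + r[0].matches + ", entriesScanned=" + r[0].entriesScanned);
    System.out.println("pushdown: matches=" + r[1].matches + ", entriesScanned=" + r[1].entriesScanned);
  }
}
```

With this synthetic data both strategies find the same 20 matching docs. In the greedy loop, each OR branch scans from the current candidate to its own next match, so it walks through nearly every doc id. The pushdown version evaluates the OR only on the 10,000 bitmap docs, which takes 19,990 predicate evaluations because the OR short-circuits on its first matching branch.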