tarun11Mavani commented on PR #15756: URL: https://github.com/apache/pinot/pull/15756#issuecomment-2869182301
> Wow, how did I miss this.. By reading the code, this would skip docs matched with bitmap and produce wrong result if there are multiple filters on sorted column and some filters on column with inverted index. Can you try to add a query to verify this? Great suggestion @Jackie-Jiang . You are right. This is not just a regression in query performance but also a correctness issue. I added a sorted column and inverted index in fineFoodReviews table and wrote a simple query to verify this before and after the bugfix. I paused the ingestion and reloaded the segments in both cases to make sure indices are created properly. Query used: ``` select ProductId, Score, Text from fineFoodReviews Where ( REGEXP_LIKE(Text, 'name') AND ( ProductId = 'B000F9Z29K' OR Score > 4 OR Score < 2 ) ) order by ProductId limit 1000 ``` Without the change in this PR (15756), the above query returns 7 docs as it ignores the BitmapBasedDocIdIterator inside the OrDocIdSet [#here](https://github.com/apache/pinot/blob/aecd31eafa830da5038ad0d5d8c62fda7b499e02/pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/OrDocIdSet.java#L108) and only selects the doc based on the SortedDocIdIterator. Due to this bug, all records with `Score > 4 OR Score < 2` is selected but record with `ProductId = 'B000F9Z29K'` is not. <img width="1307" alt="image" src="https://github.com/user-attachments/assets/1c0ebfc1-e7fd-411d-aac0-b266cc73d2b7" /> After adding the BitmapBasedDocIdIterator into bitmapBasedDocIdIterators as part of this PR, the code path remains the same but now OrDocIdSet also adds the matching doc ids into the selected docIds that are returned from OrDocIdSet. <img width="1265" alt="image" src="https://github.com/user-attachments/assets/a1f3c9f1-e7a8-4fb6-9ee7-ce56739b8591" /> <img width="1290" alt="image" src="https://github.com/user-attachments/assets/3fb53030-e4a1-4096-bf9d-f83ad8c96882" /> This validates your point that, > Essentially query will get wrong result when numSortedDocIdIterators + numBitmapBasedDocIdIterators > 1, where with existing code numBitmapBasedDocIdIterators is always 0. So we need multiple sortedDocIdIterators to trigger the bug. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org