tarun11Mavani commented on PR #15756:
URL: https://github.com/apache/pinot/pull/15756#issuecomment-2869182301

   > Wow, how did I miss this.. By reading the code, this would skip docs 
matched with bitmap and produce wrong result if there are multiple filters on 
sorted column and some filters on column with inverted index. Can you try to 
add a query to verify this?
   
   Great suggestion @Jackie-Jiang . 
   You are right. This is not just a regression in query performance but also a 
correctness issue. I added a sorted column and inverted index in 
fineFoodReviews table and wrote a simple query to verify this before and after 
the bugfix. I paused the ingestion and reloaded the segments in both cases to 
make sure indices are created properly.
   
   
   Query used:
   ```
   select ProductId, Score, Text
   from fineFoodReviews
   Where (
        REGEXP_LIKE(Text, 'name')
     AND
       (
         ProductId = 'B000F9Z29K'
         OR Score > 4
         OR Score < 2
       )
     )
   order by ProductId
   limit 1000
   ```
   
   Without the change in this PR (15756), the above query returns 7 docs as it 
ignores the BitmapBasedDocIdIterator inside the OrDocIdSet 
[#here](https://github.com/apache/pinot/blob/aecd31eafa830da5038ad0d5d8c62fda7b499e02/pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/OrDocIdSet.java#L108)
  and only selects the doc based on the 
   SortedDocIdIterator. 
   Due to this bug, all records with `Score > 4 OR Score < 2` is selected but 
record with `ProductId = 'B000F9Z29K'` is not. 
   <img width="1307" alt="image" 
src="https://github.com/user-attachments/assets/1c0ebfc1-e7fd-411d-aac0-b266cc73d2b7";
 />
   
   
   After adding the BitmapBasedDocIdIterator into bitmapBasedDocIdIterators as 
part of this PR, the code path remains the same but now OrDocIdSet also adds 
the matching doc ids into the selected docIds that are returned from OrDocIdSet.
   <img width="1265" alt="image" 
src="https://github.com/user-attachments/assets/a1f3c9f1-e7a8-4fb6-9ee7-ce56739b8591";
 />
   
   <img width="1290" alt="image" 
src="https://github.com/user-attachments/assets/3fb53030-e4a1-4096-bf9d-f83ad8c96882";
 />
   
   This validates your point that,
   
   > Essentially query will get wrong result when numSortedDocIdIterators + 
numBitmapBasedDocIdIterators > 1, where with existing code 
numBitmapBasedDocIdIterators is always 0. So we need multiple 
sortedDocIdIterators to trigger the bug.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to