Jackie-Jiang commented on PR #10254: URL: https://github.com/apache/pinot/pull/10254#issuecomment-1424673481
We already pay the cost of sorting for a big IN clause, so maybe we should always use merge sort instead of applying a bloom filter. A very important factor not mentioned is linear scan vs random access; I feel merge sort could always be faster because of this. A bloom filter can be applied for a small IN clause as well (it is currently applied during segment pruning, and we can probably move that to the predicate evaluator). Can you please share some numbers for these 3 approaches?
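To make the linear scan vs random access point concrete, here is a minimal sketch. It assumes a sorted int dictionary and a sorted, distinct list of IN values. The class and method names (`SortedInClauseMatch`, `mergeMatch`, `binarySearchMatch`) are hypothetical and are not Pinot APIs. The merge-style pass walks both arrays front to back, while the per-value lookup jumps around the dictionary with a binary search for each value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code from the Pinot codebase.
final class SortedInClauseMatch {

    // Merge-style match: a single linear pass over both sorted arrays.
    // Returns the dictionary indexes whose values appear in sortedInValues.
    static int[] mergeMatch(int[] sortedDictionary, int[] sortedInValues) {
        List<Integer> matchedDictIds = new ArrayList<>();
        int d = 0;
        int v = 0;
        while (d < sortedDictionary.length && v < sortedInValues.length) {
            int cmp = Integer.compare(sortedDictionary[d], sortedInValues[v]);
            if (cmp == 0) {
                matchedDictIds.add(d);
                d++;
                v++;
            } else if (cmp < 0) {
                d++; // dictionary value is smaller, advance the dictionary cursor
            } else {
                v++; // IN value is not in the dictionary, advance the IN cursor
            }
        }
        return matchedDictIds.stream().mapToInt(Integer::intValue).toArray();
    }

    // Per-value lookup: one binary search (random access) per IN value.
    static int[] binarySearchMatch(int[] sortedDictionary, int[] sortedInValues) {
        List<Integer> matchedDictIds = new ArrayList<>();
        for (int value : sortedInValues) {
            int dictId = Arrays.binarySearch(sortedDictionary, value);
            if (dictId >= 0) {
                matchedDictIds.add(dictId);
            }
        }
        return matchedDictIds.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] dictionary = {1, 3, 5, 7, 9, 11};
        int[] inValues = {3, 4, 9, 12};
        System.out.println(Arrays.toString(mergeMatch(dictionary, inValues)));        // [1, 4]
        System.out.println(Arrays.toString(binarySearchMatch(dictionary, inValues))); // [1, 4]
    }
}
```

Both methods return the same matches. The merge pass costs O(D + V) comparisons with sequential access, and the lookup costs O(V log D) with scattered access, so which one wins likely depends on the ratio of IN-clause size to dictionary size. That is why benchmark numbers are needed.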