richardstartin commented on pull request #8411: URL: https://github.com/apache/pinot/pull/8411#issuecomment-1081007479
> Suggest just adding `int getNumMatchingDocs()` (return can be int as it is inner segment) and `RoaringBitmap getBitmap()` without the boolean check methods. We can add default implementation of them by just scanning the `BlockDocIdIterator`, and override the filter operators that can accelerate this process.
>
> This way, all the single `COUNT` aggregation query can be solved with the short-circuited operator. Even if there is no special optimization for the filter (scan-based), we can still save the overhead of creating 10000 docs blocks.

Makes sense. I will follow up with this.

> Not sure how much extra optimization we can get from `BitmapCollection`, but that does add complexity to the logic, so I'd suggest first adding the basic `RoaringBitmap` one, then have a separate enhancement to compare their performance

I'd prefer to keep it this way:

* A lot of collective effort has gone into the counting methods over the years, and `BitmapCollection` just encapsulates them.
* Implementing negation of a compressed bitmap by flipping its bits means you get a very dense bitmap, unless you already have a dense bitmap and have incurred a sunk cost.
* I don't anticipate it needing to change.
* It has high test coverage if you are concerned about correctness.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
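For illustration only, here is a minimal sketch of the shape discussed in this comment: default `getNumMatchingDocs()` and `getBitmap()` implementations that scan an iterator, a bitmap-backed override that answers the count directly, and a negation that is counted by subtraction instead of by flipping bits. This is not the code in the PR. `DocIdIterator`, `DocIdSet`, `BitmapDocIdSet`, `NotDocIdSet` and the `EOF` sentinel are hypothetical names, and `java.util.BitSet` stands in for `RoaringBitmap` so that the sketch has no dependencies.

```java
import java.util.BitSet;

// Hypothetical stand-in for BlockDocIdIterator: ascending doc ids, then EOF.
interface DocIdIterator {
    int EOF = -1; // sentinel assumed for this sketch only

    int next();
}

// Hypothetical doc id set carrying the two proposed methods.
interface DocIdSet {
    DocIdIterator iterator();

    // Default: count matches by scanning the iterator, no doc blocks created.
    default int getNumMatchingDocs() {
        DocIdIterator it = iterator();
        int count = 0;
        while (it.next() != DocIdIterator.EOF) {
            count++;
        }
        return count;
    }

    // Default: materialize the matches by scanning the iterator.
    default BitSet getBitmap() {
        BitSet bitmap = new BitSet();
        DocIdIterator it = iterator();
        for (int docId = it.next(); docId != DocIdIterator.EOF; docId = it.next()) {
            bitmap.set(docId);
        }
        return bitmap;
    }
}

// Bitmap-backed filter: overrides both defaults with direct answers.
final class BitmapDocIdSet implements DocIdSet {
    private final BitSet bitmap;

    BitmapDocIdSet(BitSet bitmap) {
        this.bitmap = bitmap;
    }

    @Override
    public DocIdIterator iterator() {
        return new DocIdIterator() {
            private int pending = bitmap.nextSetBit(0);

            @Override
            public int next() {
                if (pending < 0) {
                    return EOF;
                }
                int current = pending;
                pending = bitmap.nextSetBit(current + 1);
                return current;
            }
        };
    }

    @Override
    public int getNumMatchingDocs() {
        return bitmap.cardinality();
    }

    @Override
    public BitSet getBitmap() {
        return (BitSet) bitmap.clone();
    }
}

// Negation over doc ids [0, numDocs): the count is derived by subtraction,
// so no flipped (dense) bitmap is built just to answer a COUNT.
final class NotDocIdSet implements DocIdSet {
    private final DocIdSet child;
    private final int numDocs;

    NotDocIdSet(DocIdSet child, int numDocs) {
        this.child = child;
        this.numDocs = numDocs;
    }

    @Override
    public DocIdIterator iterator() {
        DocIdIterator excluded = child.iterator();
        return new DocIdIterator() {
            private int candidate = 0;
            private int nextExcluded = excluded.next();

            @Override
            public int next() {
                while (candidate < numDocs) {
                    int docId = candidate++;
                    if (docId == nextExcluded) {
                        nextExcluded = excluded.next();
                        continue;
                    }
                    return docId;
                }
                return EOF;
            }
        };
    }

    @Override
    public int getNumMatchingDocs() {
        return numDocs - child.getNumMatchingDocs();
    }
}
```

In this sketch a scan-based filter only has to supply `iterator()` and inherits both defaults, while `NotDocIdSet` falls back to the scanning `getBitmap()` only when a caller actually needs the materialized result.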