hristo-stripe opened a new issue #7463: URL: https://github.com/apache/pinot/issues/7463
We tried running `select distinctcount(field) from table`, which times out with the default timeout. However, `select distinctcount(field) from table_REALTIME` and `select distinctcount(field) from table_OFFLINE` each complete in less than 15ms.

After a discussion with @Jackie-Jiang, it appears the difference is that querying the realtime and offline tables separately allows the broker to answer the query from metadata only, without scanning any segments. When the hybrid table is queried, however, the high-level query is split into `select distinctcount(field) from table_REALTIME where time_field > $IMPLICIT_TIME` and `select distinctcount(field) from table_OFFLINE where time_field <= $IMPLICIT_TIME`. This filter causes every server to perform a full scan of every segment.

This can be optimized by evaluating the filter against each individual segment and checking whether the query can be answered from metadata alone, without scanning the segment. In most cases the filter will be true for all rows of most segments, so it is not needed for those segments.
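To make the proposed optimization concrete, here is a minimal sketch in Python. It is not Pinot code: `Segment`, `distinct_values_after`, and `distinct_count_after` are hypothetical names, and it assumes each segment keeps the min/max of `time_field` and a dictionary of the distinct values of `field` as metadata. For a filter `time_field > boundary`, a segment whose minimum time is above the boundary matches every row (so the filter can be dropped and the dictionary used directly), a segment whose maximum time is at or below the boundary matches nothing, and only a segment that straddles the boundary needs a scan.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for a segment: rows plus precomputed metadata."""
    rows: List[Tuple[int, str]]  # (time_field, field) pairs, assumed non-empty
    min_time: int = field(init=False)
    max_time: int = field(init=False)
    dictionary: Set[str] = field(init=False)

    def __post_init__(self) -> None:
        # In a real system this metadata would be written at segment build time.
        self.min_time = min(t for t, _ in self.rows)
        self.max_time = max(t for t, _ in self.rows)
        self.dictionary = {v for _, v in self.rows}


def distinct_values_after(segment: Segment, boundary: int) -> Tuple[Set[str], bool]:
    """Distinct `field` values where time_field > boundary, and whether we scanned."""
    if segment.min_time > boundary:
        # Filter is true for every row: answer from the dictionary alone.
        return set(segment.dictionary), False
    if segment.max_time <= boundary:
        # Filter is false for every row: the segment contributes nothing.
        return set(), False
    # Segment straddles the boundary: fall back to a scan.
    return {v for t, v in segment.rows if t > boundary}, True


def distinct_count_after(segments: List[Segment], boundary: int) -> Tuple[int, int]:
    """Return (distinct count, number of segments that had to be scanned)."""
    values: Set[str] = set()
    scanned = 0
    for segment in segments:
        segment_values, did_scan = distinct_values_after(segment, boundary)
        values |= segment_values
        scanned += int(did_scan)
    return len(values), scanned


segments = [
    Segment([(1, "a"), (2, "b")]),  # entirely at or before the boundary
    Segment([(5, "b"), (6, "c")]),  # entirely after the boundary
    Segment([(3, "a"), (8, "d")]),  # straddles the boundary
]
count, scanned = distinct_count_after(segments, boundary=4)
```

With a time boundary that falls inside a single segment, as in this example, only that one segment is scanned; the others are answered from metadata, which matches the expectation above that the filter is redundant for most segments.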