hristo-stripe opened a new issue #7463:
URL: https://github.com/apache/pinot/issues/7463


   We've tried running `select distinctcount(field) from table`, which times out with the default timeout.
   
   However, running
   `select distinctcount(field) from table_REALTIME`
   and
   `select distinctcount(field) from table_OFFLINE`
   both complete in less than 15ms.
   
   After a discussion with @Jackie-Jiang, it appears this can be traced to the fact that
   querying the realtime and offline tables separately allows the broker to respond to this
   query using metadata only, without having to scan any segments.
   
   However, when the hybrid table is queried, the high-level query gets split into
   `select distinctcount(field) from table_REALTIME where time_field > $IMPLICIT_TIME`
   and
   `select distinctcount(field) from table_OFFLINE where time_field <= $IMPLICIT_TIME`
   
   This implicit time filter causes every server to perform a full scan of every segment.
   This could be optimized by evaluating the filter against each individual segment and
   checking whether the query can be answered from metadata alone, without scanning
   the segment.
   
   In most cases, the filter will be true for all rows of most segments, so it is not needed for those segments.
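
   The per-segment check proposed above could be sketched roughly as follows. This is a minimal illustration in Python, not Pinot's actual implementation: `SegmentMetadata`, its `min_time`/`max_time` fields, and the `classify_*` and `plan_for` helpers are hypothetical names, and the sketch assumes each segment records the minimum and maximum value of the time column in its metadata.

```python
from dataclasses import dataclass
from enum import Enum


class FilterMatch(Enum):
    ALL = "all"    # every row in the segment satisfies the filter
    NONE = "none"  # no row in the segment satisfies the filter
    SOME = "some"  # cannot be decided from metadata alone


@dataclass(frozen=True)
class SegmentMetadata:
    # Hypothetical metadata: min/max of the time column for one segment.
    name: str
    min_time: int
    max_time: int


def classify_greater_than(segment: SegmentMetadata, boundary: int) -> FilterMatch:
    """Classify `time_field > boundary` using only the segment's time range."""
    if segment.min_time > boundary:
        return FilterMatch.ALL
    if segment.max_time <= boundary:
        return FilterMatch.NONE
    return FilterMatch.SOME


def classify_less_or_equal(segment: SegmentMetadata, boundary: int) -> FilterMatch:
    """Classify `time_field <= boundary` using only the segment's time range."""
    if segment.max_time <= boundary:
        return FilterMatch.ALL
    if segment.min_time > boundary:
        return FilterMatch.NONE
    return FilterMatch.SOME


def plan_for(match: FilterMatch) -> str:
    """Pick an execution strategy for one segment given the filter outcome."""
    if match is FilterMatch.ALL:
        return "metadata-only"  # filter is redundant, drop it for this segment
    if match is FilterMatch.NONE:
        return "skip"           # segment contributes nothing to the result
    return "scan"               # segment straddles the boundary


# Example: an offline-side filter `time_field <= 100` over three segments.
segments = [
    SegmentMetadata("seg_old", 0, 99),
    SegmentMetadata("seg_straddle", 90, 110),
    SegmentMetadata("seg_new", 101, 200),
]
plans = {s.name: plan_for(classify_less_or_equal(s, 100)) for s in segments}
```

   Under these assumptions, only the segment whose time range straddles the boundary would need a scan; the others are either answered as if the filter were absent or skipped entirely.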
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


