alexch2000 opened a new issue, #12060:
URL: https://github.com/apache/pinot/issues/12060

   We are experiencing a significant performance issue in Pinot, specifically 
related to how DateTime field filtering is applied in our queries.
   
   **Dataset Description**: Our dataset consists of a large table containing 
event records. Each record includes fields like `name`, `stage`, `status`, and 
a `timestamp`.
   
   **Query Performance Issue**: We've observed that the location of the 
DateTime filter in our query significantly impacts the query's performance. 
There is a noticeable performance degradation – up to 90% slower – depending on 
how and where the DateTime filter is applied.
   
   **Query Example**:  Initially, when the timestamp filter is not the first 
condition in the query, we notice that almost 80% of the documents are scanned, 
and the query completion time is around 8.5 seconds. Below is an example of the 
query generated by our custom connector to Pinot
   
   ```
   SELECT
     name,
     sum(CAST(CASE WHEN ((status = 'success') AND (stage = 'start')) THEN 1 
ELSE 0 END AS LONG))
   FROM
     events_table
   WHERE
     ((status = 'success') AND (stage IN ('complete', 'start'))
       AND (((timestamp BETWEEN 1699920000000 AND 1701129600000) AND (stage = 
'start')) OR (stage = 'complete')
       )
     )
   GROUP BY
     name
   LIMIT
     10000
   ```
   As you can see that almost 80% of docs were scanned + time to finish the 
query is  around 8.5s. 
   <img width="685" alt="Screenshot 2023-11-28 at 14 56 39" 
src="https://github.com/apache/pinot/assets/994452/7f9764fb-c8e5-4bf5-baac-797a381f4358";>
   
   **Improved Performance with Adjusted Query**: However, when we move the 
timestamp check to the first condition in the query, there is a significant 
improvement. The query completion time drops to approximately 5 seconds, and 
far fewer documents are scanned.
   
   ```
   SELECT
     name,
     sum(CAST(CASE WHEN ((status = 'success') AND (stage = 'start')) THEN 1 
ELSE 0 END AS LONG))
   FROM
     events_table
   WHERE
     timestamp BETWEEN 1699920000000 AND 1701129600000
     AND (
       (
         (status = 'success')
         AND (stage IN ('complete', 'start'))
       )
       AND (
         (stage = 'start')
         OR (stage = 'complete')
       )
     )
   GROUP BY
     name
   LIMIT
     10000
   ```
   
   <img width="677" alt="Screenshot 2023-11-28 at 14 36 49" 
src="https://github.com/apache/pinot/assets/994452/b9126c9c-e885-4129-b04d-282b53a8a9b4";>
   
   We expect the query performance to be consistent regardless of the 
positioning of the DateTime filter. However, the current behavior shows a 
critical variance in execution times.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to