alexch2000 opened a new issue, #12060: URL: https://github.com/apache/pinot/issues/12060
We are experiencing a significant performance issue in Pinot, specifically related to how DateTime field filtering is applied in our queries. **Dataset Description**: Our dataset consists of a large table containing event records. Each record includes fields like `name`, `stage`, `status`, and a `timestamp`. **Query Performance Issue**: We've observed that the location of the DateTime filter in our query significantly impacts the query's performance. There is a noticeable performance degradation – up to 90% slower – depending on how and where the DateTime filter is applied. **Query Example**: Initially, when the timestamp filter is not the first condition in the query, we notice that almost 80% of the documents are scanned, and the query completion time is around 8.5 seconds. Below is an example of the query generated by our custom connector to Pinot ``` SELECT name, sum(CAST(CASE WHEN ((status = 'success') AND (stage = 'start')) THEN 1 ELSE 0 END AS LONG)) FROM events_table WHERE ((status = 'success') AND (stage IN ('complete', 'start')) AND (((timestamp BETWEEN 1699920000000 AND 1701129600000) AND (stage = 'start')) OR (stage = 'complete') ) ) GROUP BY name LIMIT 10000 ``` As you can see that almost 80% of docs were scanned + time to finish the query is around 8.5s. <img width="685" alt="Screenshot 2023-11-28 at 14 56 39" src="https://github.com/apache/pinot/assets/994452/7f9764fb-c8e5-4bf5-baac-797a381f4358"> **Improved Performance with Adjusted Query**: However, when we move the timestamp check to the first condition in the query, there is a significant improvement. The query completion time drops to approximately 5 seconds, and far fewer documents are scanned. ``` SELECT name, sum(CAST(CASE WHEN ((status = 'success') AND (stage = 'start')) THEN 1 ELSE 0 END AS LONG)) FROM events_table WHERE timestamp BETWEEN 1699920000000 AND 1701129600000 AND ( ( (status = 'success') AND (stage IN ('complete', 'start')) ) AND ( (stage = 'start') OR (stage = 'complete') ) ) GROUP BY name LIMIT 10000 ``` <img width="677" alt="Screenshot 2023-11-28 at 14 36 49" src="https://github.com/apache/pinot/assets/994452/b9126c9c-e885-4129-b04d-282b53a8a9b4"> We expect the query performance to be consistent regardless of the positioning of the DateTime filter. However, the current behavior shows a critical variance in execution times. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org