itschrispeck opened a new issue, #16082: URL: https://github.com/apache/pinot/issues/16082
Earlier this year I worked on a POC for storing time series data in Pinot; the columnar format has some inefficiencies for this workload that we were trying to overcome. I wanted to share some of the ideas here and gauge interest in first-class support for metrics data. Given that the time series query engine was contributed last year, a storage format optimized for these query patterns seems like a natural evolution.

Our POC showed close to double the ingestion speed per core and improved query performance, despite lacking many time-series-specific optimizations (e.g. encoding, chunking, filtering). These improvements show the value of providing such a format for handling metrics data at larger scales.

The POC packaged time series data into an index, but some alternative approaches (e.g. storing chunks of data in rows and buffering datapoints in a transformer) may be simpler and cleaner to integrate with Pinot's existing query path and structures.

The POC code is linked at the beginning of the doc, which also covers the POC implementation: https://docs.google.com/document/d/103T7gSJ7bF1MjZNQQjZDxwHTLH3xZk_5v-Jloe79sIc/edit?tab=t.0