itschrispeck opened a new issue, #16082:
URL: https://github.com/apache/pinot/issues/16082

   I worked on a POC earlier this year for storing time series data in 
Pinot; the columnar format has some inefficiencies we were trying to overcome. 
I wanted to share some of the ideas here and gauge interest in first-class 
support for metrics data. Given that the time series query engine was 
contributed last year, a storage format optimized for these query patterns 
seems like a natural evolution. 
   
   Our POC showed close to double the ingestion speed per core, along with 
improved query performance, despite lacking many time-series-specific 
optimizations (e.g. encoding, chunking, filtering). These improvements suggest 
there is value in providing such a format to handle metrics data at larger 
scales. 
   
   The POC approach packaged time series data into an index, but some 
alternative approaches (e.g. storing chunks of data in rows and buffering 
datapoints in a transformer) may be simpler and cleaner to integrate with 
Pinot's existing query path and structures. 
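
   To make the "chunks in rows" alternative a bit more concrete, here is a 
minimal sketch of buffering datapoints per series and emitting one row per 
chunk. It is not the POC code and does not use any Pinot API; the class name 
`ChunkBuffer`, the `chunk_size` parameter, and the row field names are all 
hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ChunkBuffer:
    """Hypothetical buffer: groups datapoints per series, emits one row per chunk.

    Illustration only; this is not Pinot's transformer interface.
    """

    def __init__(self, chunk_size: int) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size
        # series key -> list of (timestamp, value) not yet emitted
        self._pending: Dict[str, List[Tuple[int, float]]] = defaultdict(list)

    def add(self, series: str, ts: int, value: float) -> Optional[dict]:
        """Buffer one datapoint; return a chunk row once the chunk is full."""
        points = self._pending[series]
        points.append((ts, value))
        if len(points) < self.chunk_size:
            return None
        del self._pending[series]
        return self._to_row(series, points)

    def flush(self) -> List[dict]:
        """Emit all partially filled chunks, e.g. before sealing a segment."""
        rows = [self._to_row(s, p) for s, p in self._pending.items()]
        self._pending.clear()
        return rows

    @staticmethod
    def _to_row(series: str, points: List[Tuple[int, float]]) -> dict:
        # Sort by timestamp so each chunk row is time-ordered.
        ordered = sorted(points, key=lambda p: p[0])
        return {
            "series": series,
            "start_ts": ordered[0][0],
            "end_ts": ordered[-1][0],
            "timestamps": [t for t, _ in ordered],
            "values": [v for _, v in ordered],
        }
```

   With a layout like this, each row carries a time range plus arrays of 
timestamps and values, so a query could in principle skip whole chunks by 
comparing `start_ts`/`end_ts` before decoding any datapoints. Whether that 
fits Pinot's existing query path is exactly the open question above.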
   
   The POC code is linked at the beginning of the doc, which also covers the 
POC implementation: 
https://docs.google.com/document/d/103T7gSJ7bF1MjZNQQjZDxwHTLH3xZk_5v-Jloe79sIc/edit?tab=t.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


