richardstartin opened a new issue #7616: URL: https://github.com/apache/pinot/issues/7616
We would like to introduce a new format for raw forward indexes that does not require a constant number of documents per chunk and instead partitions columns based on uncompressed size. This design is expected to lead to:

* lower memory consumption when a raw column contains large values
* fewer chunks than when the number of documents per chunk is derived
* more balanced chunk sizes than when the number of documents per chunk is derived
* support for realtime segments, by breaking the dependency on column statistics for sizing

The format would be opt-in for the foreseeable future.

[Design document](https://docs.google.com/document/d/1Y7MyQGmDD2fI7brOOFQtToxd8ML837qRuc3IlNYFvCw/edit?usp=sharing)
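To illustrate the difference between the two partitioning strategies, here is a minimal Python sketch. It is not Pinot code: the function names, the 2048-byte target, and the rule that the fixed document count is derived from the longest value are all assumptions made for illustration, and compression and the on-disk layout are ignored.

```python
from typing import List


def chunk_by_doc_count(values: List[bytes], docs_per_chunk: int) -> List[List[bytes]]:
    """Fixed-count partitioning: every chunk holds the same number of documents."""
    return [values[i:i + docs_per_chunk] for i in range(0, len(values), docs_per_chunk)]


def chunk_by_size(values: List[bytes], target_chunk_bytes: int) -> List[List[bytes]]:
    """Size-based partitioning: close a chunk once adding the next value
    would push its uncompressed size past the target. A single value larger
    than the target still gets a chunk of its own."""
    chunks: List[List[bytes]] = []
    current: List[bytes] = []
    current_bytes = 0
    for value in values:
        if current and current_bytes + len(value) > target_chunk_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(value)
        current_bytes += len(value)
    if current:
        chunks.append(current)
    return chunks


def chunk_sizes(chunks: List[List[bytes]]) -> List[int]:
    """Uncompressed size in bytes of each chunk."""
    return [sum(len(v) for v in chunk) for chunk in chunks]


# Hypothetical column: two large values followed by many small ones.
values = [b"x" * 1000] * 2 + [b"y" * 10] * 200
target = 2048

# Assumption: the fixed document count is derived from the longest value,
# which requires knowing the column's maximum length up front.
max_len = max(len(v) for v in values)
docs_per_chunk = max(target // max_len, 1)

fixed = chunk_by_doc_count(values, docs_per_chunk)
sized = chunk_by_size(values, target)

print(len(fixed), chunk_sizes(fixed)[:3])
print(len(sized), chunk_sizes(sized))
```

In this toy input, the size-based writer needs no advance knowledge of the longest value, which is the property that matters for realtime segments where column statistics are not available when writing starts.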