snleee commented on issue #5089: URL: https://github.com/apache/pinot/issues/5089#issuecomment-1195734895
To add to this: I think we should introduce a column-based interface for data indexing (possibly the same idea as `Design an interface (close to the idea of the stats collector) to store all the column data`). If the input data is in a columnar format, we can generate dictionaries/indices column by column. This should consume much less heap, because we don't need to hold all column data in memory at the same time. We could also add a parallelism config so the engine processes multiple columns concurrently for a speedup.
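A minimal sketch of the column-by-column idea is below. It assumes a hypothetical `ColumnarInput` interface that hands back one column's values at a time, a hypothetical `indexColumn` step that builds a sorted dictionary plus per-row dictionary IDs, and a hypothetical `numThreads` parameter standing in for the proposed parallelism config. None of these names are Pinot's actual segment-creation API; the point is only that each task reads a single column, so at most `numThreads` raw columns need to be in heap at once.

```java
import java.util.*;
import java.util.concurrent.*;

public class ColumnarIndexSketch {

  /** Hypothetical column-oriented input: returns the values of one column at a time. */
  interface ColumnarInput {
    List<String> columnNames();

    String[] readColumn(String columnName);
  }

  /** Hypothetical per-column result: sorted dictionary plus one dictionary ID per row. */
  static final class ColumnIndex {
    final String[] dictionary;
    final int[] dictIds;

    ColumnIndex(String[] dictionary, int[] dictIds) {
      this.dictionary = dictionary;
      this.dictIds = dictIds;
    }
  }

  /** Builds a dictionary and dictionary-encoded values for a single column (no nulls assumed). */
  static ColumnIndex indexColumn(String[] values) {
    // Distinct values in natural (sorted) order form the dictionary.
    String[] dictionary = new TreeSet<>(Arrays.asList(values)).toArray(new String[0]);
    int[] dictIds = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      // The dictionary is sorted, so binary search yields each value's ID.
      dictIds[i] = Arrays.binarySearch(dictionary, values[i]);
    }
    return new ColumnIndex(dictionary, dictIds);
  }

  /** Indexes every column, processing up to numThreads columns concurrently. */
  static Map<String, ColumnIndex> indexAll(ColumnarInput input, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      Map<String, Future<ColumnIndex>> futures = new LinkedHashMap<>();
      for (String column : input.columnNames()) {
        // The column is read inside the task, so only the columns currently
        // being processed are materialized in heap.
        futures.put(column, pool.submit(() -> indexColumn(input.readColumn(column))));
      }
      Map<String, ColumnIndex> result = new LinkedHashMap<>();
      for (Map.Entry<String, Future<ColumnIndex>> e : futures.entrySet()) {
        result.put(e.getKey(), e.getValue().get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String[]> data = new LinkedHashMap<>();
    data.put("country", new String[] {"us", "de", "us", "fr"});
    data.put("device", new String[] {"ios", "android", "android", "ios"});
    ColumnarInput input = new ColumnarInput() {
      public List<String> columnNames() {
        return new ArrayList<>(data.keySet());
      }

      public String[] readColumn(String name) {
        return data.get(name);
      }
    };
    for (Map.Entry<String, ColumnIndex> e : indexAll(input, 2).entrySet()) {
      System.out.println(e.getKey() + " dict=" + Arrays.toString(e.getValue().dictionary)
          + " ids=" + Arrays.toString(e.getValue().dictIds));
    }
  }
}
```

Running `main` prints `country dict=[de, fr, us] ids=[2, 0, 2, 1]` and `device dict=[android, ios] ids=[1, 0, 0, 1]`. In a real engine the per-column result would be flushed to disk instead of collected in a map, which is what would keep the heap bounded by the number of in-flight columns rather than the total number of columns.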