icefury71 opened a new issue #4225: Make Pinot schema evolution easier
URL: https://github.com/apache/incubator-pinot/issues/4225

This has been referenced in a few issues already:
- https://github.com/apache/incubator-pinot/issues/74
- https://github.com/apache/incubator-pinot/issues/4029

I'm creating a new issue to highlight the end-to-end problem.

Here are the current steps to perform schema evolution in Pinot:
1) Create a new schema with default values for the new columns being added.
2) Update the schema using the Controller API.
3) Perform a rolling restart of the Pinot servers to "reload" the segments so they reflect the default values.

In general, restarting a Pinot server is very expensive: it can take anywhere from 5 to 30 minutes, depending on the number of segments. If we need to evolve the schema frequently, this becomes a huge operational overhead. An alternative is to backfill the old segments, but that is an expensive process as well.

**A better approach:**

a) Segments which have been committed / ONLINE

We can try to look up the new schema "on the fly" during query processing (e.g., @mayankshriv suggested using virtual columns, populated with the default value, for old segments). That way we don't depend on any server restart.

b) Segments which are currently open / CONSUMING

We need to solve this issue: https://github.com/apache/incubator-pinot/issues/151 (how to reflect a new schema in an open segment).
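To make idea (a) concrete, here is a minimal Java sketch of a default-valued virtual column for ONLINE segments. Everything in it is hypothetical. `Schema`, `Segment`, and `readValue` are simplified stand-ins, not Pinot's actual classes or APIs. The point it illustrates: when a query touches a column that an old segment was built without, the reader serves that column's default value from the latest schema, so the segment never needs to be reloaded or backfilled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of idea (a): serve a default-valued "virtual" column for
// segments built before the column was added. Not Pinot's real API.
public class VirtualColumnSketch {

  // Stand-in for a schema: column name -> default value for that column.
  static final class Schema {
    final Map<String, Object> defaults = new HashMap<>();

    Schema addColumn(String name, Object defaultValue) {
      defaults.put(name, defaultValue);
      return this;
    }
  }

  // Stand-in for an immutable (ONLINE) segment: column name -> stored value per doc.
  static final class Segment {
    final int numDocs;
    final Map<String, Object[]> columns = new HashMap<>();

    Segment(int numDocs) {
      this.numDocs = numDocs;
    }
  }

  // Reads the value of `column` for `docId`. If the segment predates the column,
  // the latest schema's default is returned, so no restart or backfill is needed.
  static Object readValue(Segment segment, Schema schema, String column, int docId) {
    if (docId < 0 || docId >= segment.numDocs) {
      throw new IndexOutOfBoundsException("docId " + docId + " out of range");
    }
    Object[] stored = segment.columns.get(column);
    if (stored != null) {
      return stored[docId]; // physical column present in the segment
    }
    if (schema.defaults.containsKey(column)) {
      return schema.defaults.get(column); // virtual column: same default for every doc
    }
    throw new IllegalArgumentException("Unknown column: " + column);
  }

  public static void main(String[] args) {
    Segment oldSegment = new Segment(2);
    oldSegment.columns.put("country", new Object[] {"US", "IN"});

    // Evolved schema adds "device", which the old segment does not contain.
    Schema evolved = new Schema().addColumn("country", "UNKNOWN").addColumn("device", "unknown");

    System.out.println(readValue(oldSegment, evolved, "country", 1)); // prints IN
    System.out.println(readValue(oldSegment, evolved, "device", 0)); // prints unknown
  }
}
```

Because the fallback is resolved at read time against whatever schema is current, a schema update takes effect on the next query. A real implementation would more likely materialize the virtual column once per segment when the new schema is picked up, but this sketch does not attempt that.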