deemoliu opened a new issue, #14499: URL: https://github.com/apache/pinot/issues/14499
Currently, schema updates and schema reloads are not atomic operations. If we add new columns without a schema reload:

1. There is no force commit for the consuming segments.
2. By default, the new segment ingests data with the new schema.
3. Old segments do not reflect the new column until a reload is triggered.
4. When a query selects the newly added column or uses `select *`, merging results from segments with different schemas throws exceptions, and the query fails for the customer. The customer needs to perform a reload or change the query to bypass the error.

In some scenarios, we also perform schema evolution without a reload on purpose. For example, with many tables holding massive amounts of data and segments, a schema reload is risky because it might overwhelm the heap. The resulting inconsistency between the segments and the table schema is hard to maintain.

We propose the following approach to solve this: add a config (so that we don't change current behavior) and support lazy reload. If a query references new columns that do not exist in one of the segments, return the default value for those columns.

Other options include:
- Add a query option (so that we don't change current behavior) and do not return results for the newly added column until the table is completely reloaded.
- Support schema versioning in Pinot.
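The lazy-reload idea can be illustrated with a minimal Java sketch (Java 16+ for records). It is not Pinot's API: `LazyReloadSketch`, `TableSchema`, `Segment`, and `query` are hypothetical names, segments are modeled as in-memory lists of rows, and each schema column is assumed to carry a default value. A column that exists in the schema but not in a segment is filled with that default at read time, so every segment yields rows of the same shape and merging them cannot fail on the missing column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of lazy reload; none of these names are Pinot classes.
final class LazyReloadSketch {

    // Latest table schema: column name -> default value (insertion order = column order).
    record TableSchema(LinkedHashMap<String, Object> defaults) {
        List<String> allColumns() {
            return new ArrayList<>(defaults.keySet());
        }
    }

    // A segment built against an older schema: each row holds only the columns that existed then.
    record Segment(String name, List<Map<String, Object>> rows) {}

    // Reads the requested columns from every segment and merges the rows.
    static List<Map<String, Object>> query(TableSchema schema, List<Segment> segments, List<String> columns) {
        // Reject columns the latest schema does not know about at all.
        for (String column : columns) {
            if (!schema.defaults().containsKey(column)) {
                throw new IllegalArgumentException("Column not in schema: " + column);
            }
        }
        List<Map<String, Object>> merged = new ArrayList<>();
        for (Segment segment : segments) {
            for (Map<String, Object> stored : segment.rows()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (String column : columns) {
                    // Stored value if the segment has the column, otherwise the schema default
                    // (the column was added without reloading this segment).
                    row.put(column, stored.getOrDefault(column, schema.defaults().get(column)));
                }
                merged.add(row);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("id", 0L);
        defaults.put("city", "unknown"); // hypothetical newly added column and default value
        TableSchema schema = new TableSchema(defaults);

        Segment oldSegment = new Segment("seg_old", List.of(Map.of("id", 1L)));
        Segment newSegment = new Segment("seg_new", List.of(Map.of("id", 2L, "city", "Paris")));

        // Equivalent of `select *` over both segments.
        System.out.println(query(schema, List.of(oldSegment, newSegment), schema.allColumns()));
    }
}
```

Running `main` prints `[{id=1, city=unknown}, {id=2, city=Paris}]`: the old segment reports the default for `city` instead of breaking the `select *` merge. The trade-off is that old segments keep returning defaults for the new column until they are actually reloaded.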