deemoliu opened a new issue, #14499:
URL: https://github.com/apache/pinot/issues/14499

   Currently, schema updates and schema reloads are not atomic operations. If 
we add new columns without a schema reload:
   1. there is no force commit for the consuming segments.
   2. by default, new segments ingest with the new schema.
   3. old segments do not reflect the new column unless a reload is 
triggered.
   4. queries on the newly added column, or `select *`, hit merge exceptions 
across segments with different schemas and fail for the customer. The customer 
needs to perform a reload or change the query to bypass the error.
   
   In some scenarios, we also evolve the schema without a reload on purpose. 
For example, for tables with massive amounts of data and many segments, it is 
risky to perform a schema reload because it might overwhelm the heap. The 
resulting inconsistency between the schema of the Pinot segments and the table 
schema is hard to maintain.
   
   We propose the following approach to solve this.
   Add a config (so that we don't change the current behavior) to support lazy 
reload: if a query references new columns that don't exist in one of the 
segments, just return the default value for them.
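
   The lazy reload idea above can be sketched roughly as follows. This is a 
minimal illustration in Python, not Pinot code: `Segment`, `read_column`, 
`select_column`, the `lazy_reload` flag, and the `HYPOTHETICAL_DEFAULTS` table 
are all made-up names and values for this sketch, and the real default values 
would come from the table schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Placeholder defaults chosen for this sketch only; they are an assumption,
# not Pinot's actual default null values.
HYPOTHETICAL_DEFAULTS: Dict[str, Any] = {"INT": 0, "DOUBLE": 0.0, "STRING": "null"}


@dataclass
class Segment:
    """A toy segment: a name, a row count, and the columns it was built with."""
    name: str
    num_docs: int
    columns: Dict[str, List[Any]]


def read_column(segment: Segment, column: str, schema: Dict[str, str],
                lazy_reload: bool) -> List[Any]:
    """Return the values of `column` for one segment."""
    if column in segment.columns:
        return segment.columns[column]
    if column not in schema:
        # The column is not in the table schema at all: a genuine query error.
        raise KeyError(f"unknown column: {column}")
    if not lazy_reload:
        # Current behavior in this sketch: the query fails until a reload.
        raise RuntimeError(f"segment {segment.name} is missing column {column}")
    # Lazy reload: the segment predates the column, so fill in the default.
    return [HYPOTHETICAL_DEFAULTS[schema[column]]] * segment.num_docs


def select_column(segments: List[Segment], column: str, schema: Dict[str, str],
                  lazy_reload: bool = False) -> List[Any]:
    """Merge one column across all segments, old and new."""
    rows: List[Any] = []
    for segment in segments:
        rows.extend(read_column(segment, column, schema, lazy_reload))
    return rows
```

   With the flag off, a query touching the new column fails on any segment 
built before the schema update, which mirrors the failure described above. With 
the flag on, those segments contribute default values and the query succeeds 
without a reload.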
   
   Other options include:
   - add a query option (so that we don't change the current behavior) and do 
not return results for the newly added column until the reload has completed.
   - support schema versioning in Pinot.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

