praveenc7 commented on PR #15350:
URL: https://github.com/apache/pinot/pull/15350#issuecomment-2844116537

   
   
   > Is there a design doc for this? I’m thinking of solving this for the 
general case—whenever a newly-added column is accessed, we could inject a 
virtual column on the fly whose values are null (default null value, with the 
null vector set for all docs).
   
   Thanks for looking into this, @Jackie-Jiang. We did explore the “on-the-fly 
virtual column” idea, but ultimately chose to skip projecting columns until 
every segment has fully loaded them. The two main reasons are:
   
   1.  Schema visibility on offline servers (immutable segments)
   
       - Offline servers do not immediately receive the latest table schema.
       - Injecting a virtual column with the correct data type would therefore 
require a HelixRefreshMessage (or an equivalent controller event) to guarantee 
that all offline hosts refresh their local schemas before they start producing 
default values.
   
   2. Inconsistent defaults across mixed segment states
   
       - During reload, a server often hosts a mix of segments: some 
already include the new column, while others do not.
       - If we unconditionally materialise a “default-null” virtual column, 
brokers and servers must reconcile two different views:
         - Segments that carry real data for the new column.
         - Segments that carry a synthetic, all-null representation.
   
       - During the broker reduce phase and server-side merging, we would still 
need logic to guarantee consistency. The same problem reappears at the broker 
layer if different servers refresh at different times.
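
   To make the mixed-state concern concrete, here is a minimal sketch. It is 
a toy model, not Pinot code: the class name, `countNonNull`, and the sample 
values are all assumptions made for illustration. Each segment is modelled as 
the list of values it would return for the new column, with `null` standing in 
for the synthetic default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class MixedSegmentStateSketch {
    // Hypothetical aggregation: counts non-null values of the new column
    // across all segments, treating null as the synthetic default.
    static long countNonNull(List<List<Integer>> segments) {
        return segments.stream()
                .flatMap(List::stream)
                .filter(Objects::nonNull)
                .count();
    }

    public static void main(String[] args) {
        // Segment that carries real data for the new column (made-up values).
        List<Integer> real = Arrays.asList(10, 20);
        // Segment that only has the injected all-null virtual column.
        List<Integer> synthetic = Arrays.asList(null, null);

        // Mixed state: one real segment, one synthetic segment.
        System.out.println(countNonNull(List.of(real, synthetic)));

        // Same table if both segments carried real data (assumed values).
        System.out.println(countNonNull(List.of(real, Arrays.asList(30, 40))));
    }
}
```

   In this toy model the same query returns a different count depending on 
which segments already carry real data, which is the kind of partially-correct 
result the reconciliation logic would have to guard against.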
   
   Given these trade-offs, we concluded it is safer, and clearer for users, to 
withhold the column entirely until the load is 100% complete. That contract 
avoids incorrect or partially-correct results and eliminates the need for extra 
reconciliation logic in the query path.
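
   The contract can be sketched as a simple gate. This is only an illustration 
under assumed names (`ColumnVisibilitySketch` and `isColumnQueryable` are 
hypothetical and not part of Pinot or this PR): a column is exposed to queries 
only when every segment reports that it has the column.

```java
import java.util.List;
import java.util.Set;

public class ColumnVisibilitySketch {
    // Hypothetical check: each Set holds the column names one segment has
    // loaded. The column is queryable only if all segments contain it.
    // Note: allMatch returns true for an empty segment list.
    static boolean isColumnQueryable(String column, List<Set<String>> segmentColumns) {
        return segmentColumns.stream().allMatch(cols -> cols.contains(column));
    }

    public static void main(String[] args) {
        // One segment has been reloaded with "newCol", the other has not.
        List<Set<String>> midReload = List.of(
                Set.of("id", "ts", "newCol"),
                Set.of("id", "ts"));

        System.out.println(isColumnQueryable("newCol", midReload)); // prints false
        System.out.println(isColumnQueryable("id", midReload));     // prints true
    }
}
```

   With a gate like this, the query path never sees a mix of real and synthetic 
values for the same column, at the cost of the column being unavailable until 
the last segment finishes loading.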
   
   cc: @vvivekiyer
   
   We can certainly document this behaviour more explicitly in a short design 
note or page if that helps.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

