praveenc7 commented on PR #15350: URL: https://github.com/apache/pinot/pull/15350#issuecomment-2844116537
> Is there a design doc for this? I’m thinking of solving this for the general case: whenever a newly-added column is accessed, we could inject a virtual column on the fly whose values are null (default null value, with the null vector set for all docs).

Thanks for looking into this, @Jackie-Jiang. We did explore the “on-the-fly virtual column” idea, but ultimately chose to skip projecting columns until every segment has fully loaded them. The two main reasons are:

1. Schema visibility on offline servers (immutable segments)
   - Offline servers do not immediately receive the latest table schema.
   - Injecting a virtual column without the correct data type would require a HelixRefreshMessage (or equivalent controller event) to guarantee that all offline hosts refresh their local schemas before they start producing default values.
2. Inconsistent defaults across mixed segment states
   - During reload, a broker/server often sees a mix of segments: some already include the new column, while others do not.
   - If we unconditionally materialise a “default-null” virtual column, brokers/servers must reconcile two different views:
     - Segments that carry real data for the new column.
     - Segments that carry a synthetic, all-null representation.
   - During the broker reduce phase and server-side merging, we would still need logic to guarantee consistency. The same problem reappears at the broker layer if different servers refresh at different times.

Given these trade-offs, we concluded it is safer, and clearer for users, to withhold the column entirely until the load is 100% complete. That contract avoids incorrect or partially correct results and eliminates the need for extra reconciliation logic in the query path.

cc: @vvivekiyer

We can certainly document this behaviour more explicitly in a short design note or page if that helps.
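To make the contract concrete, here is a minimal sketch of the "withhold until fully loaded" rule. It is illustrative only: `ColumnVisibility` and `queryableColumns` are hypothetical names, not Pinot classes or APIs, and the actual change in the PR may be structured quite differently. The sketch assumes each segment can report the set of columns it has loaded, and treats a schema column as queryable only when every segment has it.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of the Pinot codebase.
public class ColumnVisibility {

    /**
     * Returns the schema columns that are present in every segment.
     * A column missing from even one segment is withheld entirely,
     * so no synthetic all-null values ever need to be reconciled.
     * With no segments at all, every schema column is returned.
     */
    static Set<String> queryableColumns(Set<String> schemaColumns,
                                        List<Set<String>> segmentColumns) {
        Set<String> visible = new LinkedHashSet<>(schemaColumns);
        for (Set<String> columns : segmentColumns) {
            visible.retainAll(columns); // drop anything this segment lacks
        }
        return visible;
    }

    public static void main(String[] args) {
        Set<String> schema = new LinkedHashSet<>(List.of("id", "ts", "newCol"));

        // Mid-reload: only the first segment has picked up newCol.
        List<Set<String>> midReload = List.of(
                Set.of("id", "ts", "newCol"),
                Set.of("id", "ts"));
        System.out.println(queryableColumns(schema, midReload)); // prints [id, ts]

        // Reload finished: every segment has newCol.
        List<Set<String>> reloaded = List.of(
                Set.of("id", "ts", "newCol"),
                Set.of("id", "ts", "newCol"));
        System.out.println(queryableColumns(schema, reloaded)); // prints [id, ts, newCol]
    }
}
```

The point of the intersection is that the query path only ever sees one view of the column, either absent everywhere or real everywhere, which is what removes the need for reconciliation at the server or broker.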