Re: [I] [Spec] Linking Schema ID to Data & Delete Files [iceberg]

via GitHub Sat, 30 Aug 2025 10:00:01 -0700


manirajv06 commented on issue #13855:
URL: https://github.com/apache/iceberg/issues/13855#issuecomment-3239402792


   Thanks @emkornfield 
   
   > What use-cases have high enough schema churn that we expect to get a large 
performance boost here?
   
   Even if schema churn is low, or schema changes are infrequent for most use 
cases, skipping a significant number of files for tables with a huge number of 
records would boost performance significantly, wouldn't it?
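
To illustrate the kind of file skipping discussed here, the following is a minimal sketch, not Iceberg's actual API. It assumes each file entry carries a hypothetical `schema_id`, and that a filter on a field added later cannot match files whose linked schema lacks that field (that is, the predicate does not match nulls and the field has no default value). `DataFile`, `SCHEMAS`, and `files_to_scan` are made-up names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    schema_id: int  # hypothetical: schema ID recorded per file, as the issue proposes


# Hypothetical table metadata: schema ID -> set of field IDs in that schema.
SCHEMAS = {
    0: {1, 2},
    1: {1, 2, 3},  # field 3 was added by a later schema change
}


def files_to_scan(files, required_field_id):
    """Keep only the files whose linked schema contains the required field."""
    return [f for f in files if required_field_id in SCHEMAS[f.schema_id]]


files = [
    DataFile("a.parquet", 0),
    DataFile("b.parquet", 0),
    DataFile("c.parquet", 1),
]

# A non-null predicate on field 3 only needs the file written with schema 1.
selected = files_to_scan(files, required_field_id=3)
print([f.path for f in selected])
```

Under these assumptions, two of the three files are pruned without being opened, and the saving scales with the number of files written before the schema change.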
   
   > I think linking schema ID probably solves 90 + % of the use case defined.
   
   As explained earlier, it might mislead us when optional fields are used. 
Quoting my earlier response:
   
   "However, max field id of linked schema id could help us in making lenient 
(with few % error rate) decisions which might not be correct for the places 
where decisions have to be made strictly."
   
   Maybe I am missing something here?
   
   > If we want to optimize the last 10% of the use case, I'd say it would like 
be better to link two columns:
   
   I didn't follow this two-way linking. Could you please explain it?
   
   > can check it out, there had been previous work on this area
   
   Thanks @singhpk234, will check it out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

