manirajv06 commented on issue #13855: URL: https://github.com/apache/iceberg/issues/13855#issuecomment-3239402792
Thanks @emkornfield

> What use-cases have high enough schema churn that we expect to get a large performance boost here?

Even if schema churn is low, or schema changes are infrequent for most use cases, skipping a significant number of files for tables with a huge number of records boosts performance significantly, doesn't it?

> I think linking schema ID probably solves 90 + % of the use case defined.

As explained earlier, it might mislead us when optional fields are involved. Quoting my earlier response: "However, max field id of linked schema id could help us in making lenient (with few % error rate) decisions which might not be correct for the places where decisions have to be made strictly." Or am I missing something here?

> If we want to optimize the last 10% of the use case, I'd say it would like be better to link two columns:

I didn't follow this two-way linking. Could you please explain it?

> can check it out, there had been previous work on this area

Thanks @singhpk234, will check it out.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
