pvary commented on PR #8555: URL: https://github.com/apache/iceberg/pull/8555#issuecomment-1719194158
> * This PR was updated so the committer will now only reload table at checkpoints. The committer was already updating the table metadata as part of the Iceberg commit and upon restart. Writers will always use the original table metadata and only use the reloaded table to get an updated FileIO, though the hope is that the writers will use the reloaded metadata in the future.

Thanks!

> * This PR was designed so that a different and more optimal table supplier solution could be plugged in later.

My concern here is that we tie ourselves to a "random" point in time to refresh the metadata. In my opinion, there will be very specific events where we need to refresh the metadata, namely when we receive a record with an unexpected schema. The proposed solution does not point in this direction.

> There were limitations with the options we explored for centralized table reload, such as using the new token delegation framework or using a broadcast (I'd be happy to discuss details if you're interested).

I think @gaborgsomogyi and I would be happy to discuss these problems. The token delegation framework was designed to solve exactly the same issues (Kerberos/AWS token refresh) in Flink, so I think it would be good to solve the token renewal with it, if that is possible.

> We went with this initial solution as a starting point. This feature is disabled by default and marked as experimental, so should only be used in cases where it is known that a job will not overburden the catalog with load requests.

My problem is that a solution which does not point in the right direction would make future work even harder.

> * This PR was not meant to solve schema evolution problems but rather make a change that will take a step towards that long term goal.

As mentioned above, I think it does not point in the right direction. We need the refresh capability, but we specifically need a way to trigger it manually.
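
To make the manual trigger concrete, here is a minimal sketch of what I mean. It is not code from this PR: `ManualRefreshSupplier` and `getOrRefreshIf` are hypothetical names, and the `loader` stands in for whatever actually reloads the table from the catalog. The cached value is reloaded only when a caller asks for it, for example when a record arrives with a schema the cached metadata does not know.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical holder that reloads its value only when explicitly asked to. */
final class ManualRefreshSupplier<T> implements Supplier<T> {
  // Assumption: the loader performs the expensive reload (e.g. a catalog call).
  private final Supplier<T> loader;
  private volatile T current;

  ManualRefreshSupplier(Supplier<T> loader) {
    this.loader = loader;
    this.current = loader.get(); // initial load at construction time
  }

  /** Returns the cached value without touching the loader. */
  @Override
  public T get() {
    return current;
  }

  /** Manual trigger: reloads the value from the loader. */
  synchronized void refresh() {
    current = loader.get();
  }

  /** Reloads only if the caller's check says the cached value is stale. */
  T getOrRefreshIf(Predicate<T> isStale) {
    T value = current;
    if (isStale.test(value)) {
      refresh();
      value = current;
    }
    return value;
  }
}
```

With something shaped like this, the number of reloads follows the number of triggering events rather than the checkpoint interval.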
Also refreshing the whole table to get the new credentials seems problematic as well.
