pvary commented on PR #8555:
URL: https://github.com/apache/iceberg/pull/8555#issuecomment-1719194158

   > * This PR was updated so the committer will now only reload table at 
checkpoints. The committer was already updating the table metadata as part of 
the Iceberg commit and upon restart. Writers will always use the original table 
metadata and only use the reloaded table to get an updated FileIO, though the 
hope is that the writers will use the reloaded metadata in the future.
   
   Thanks!
   
   > * This PR was designed so that a different and more optimal table supplier 
solution could be plugged in later.
   
   My concern here is that we tie ourselves to a "random" point in time to 
refresh the metadata. In my opinion, there will be very specific events where we 
need to refresh the metadata: namely, when we receive a record with an 
unexpected schema. The proposed solution does not point in this direction.
   
   > There were limitations with the options we explored for centralized table 
reload, such as using the new token delegation framework or using a broadcast 
(I'd be happy to discuss details if you're interested).
   
   I think @gaborgsomogyi and I would be happy to discuss these problems. 
The token delegation framework was designed to solve exactly these 
issues (Kerberos/AWS token refresh) in Flink, so I think it would be good to 
handle the token renewal with it, if possible.
   
   > We went with this initial solution as a starting point. This feature is 
disabled by default and marked as experimental, so should only be used in cases 
where it is known that a job will not overburden the catalog with load requests.
   
   My problem is that a solution which does not point in the right 
direction would make future work even harder.
   
   > * This PR was not meant to solve schema evolution problems but rather make 
a change that will take a step towards that long term goal.
   
   As mentioned above, I think it does not point in the right direction. We 
need the refresh capability, but we specifically need a way to trigger it 
manually. Also, refreshing the whole table just to get new credentials seems 
problematic.
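
To make the "manual trigger" idea concrete, here is a rough, self-contained sketch. All names (`ManualRefreshSketch`, `RefreshableSupplier`, `run`) are hypothetical and are not part of the Iceberg or Flink APIs, and the table metadata is reduced to a single integer schema id purely for illustration. It assumes the writer can detect that a record's schema differs from the cached one, and it reloads only at that moment instead of on a fixed schedule.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ManualRefreshSketch {

  /** Hypothetical holder: caches a value and reloads it only when refresh() is called. */
  static final class RefreshableSupplier<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private T cached;

    RefreshableSupplier(Supplier<T> loader) {
      this.loader = loader;
      this.cached = loader.get(); // single load at startup
    }

    @Override
    public synchronized T get() {
      return cached; // never contacts the "catalog" on its own
    }

    synchronized void refresh() {
      cached = loader.get(); // explicit, caller-triggered reload
    }
  }

  /** Processes four records and returns how many times the "catalog" was loaded. */
  static int run() {
    AtomicInteger catalogSchemaId = new AtomicInteger(1); // stand-in for the catalog state
    AtomicInteger loads = new AtomicInteger();
    RefreshableSupplier<Integer> tableSchemaId =
        new RefreshableSupplier<>(() -> {
          loads.incrementAndGet();
          return catalogSchemaId.get();
        });

    int[] recordSchemaIds = {1, 1, 2, 2};
    for (int i = 0; i < recordSchemaIds.length; i++) {
      if (i == 2) {
        catalogSchemaId.set(2); // simulate a schema change made by another process
      }
      if (recordSchemaIds[i] != tableSchemaId.get()) {
        tableSchemaId.refresh(); // reload only on an unexpected schema
      }
    }
    return loads.get();
  }

  public static void main(String[] args) {
    System.out.println("catalog loads: " + run());
  }
}
```

In this toy run the "catalog" is contacted twice: once at startup and once when the first record with the new schema id arrives. A time-based or per-checkpoint reload would instead contact it regardless of whether anything changed.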
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

