bryanck opened a new pull request, #8555:
URL: https://github.com/apache/iceberg/pull/8555

   Currently, when a Flink sink job is started, the write operators are serialized 
with an initial table instance, and that instance is used for the lifetime of 
the job. There are cases where the subtasks need to reload the table at runtime. 
For example, if a REST catalog returns expiring credentials in the table load 
response, those credentials cannot be refreshed later and will eventually 
expire. Also, if a task restarts, it reuses the original credentials created 
when the job was submitted, and those may have long since expired.
   
   Longer term, the ability to reload a table could be used to support schema 
evolution, though that is not addressed in this PR. The table schema and 
partition spec from the initial table instance are still used here for now.
   
   This PR updates the initialization of the Flink sink so that a table 
supplier is used in place of a table instance. The supplier can implement 
reload/refresh logic as needed. The initial supplier implementation added 
here simply reloads the table from the Iceberg catalog at a given interval.
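   
   For illustration, here is a minimal sketch of what such an interval-based 
reloading supplier could look like. The class name and its details are 
illustrative assumptions, not necessarily the exact implementation in this PR:
   
```java
import java.time.Duration;

import org.apache.iceberg.Table;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.util.SerializableSupplier;

// Illustrative sketch: reloads the table from the catalog once the refresh
// interval has elapsed; otherwise returns the last loaded instance.
class IntervalReloadingTableSupplier implements SerializableSupplier<Table> {
  private final TableLoader tableLoader;   // serializable, catalog-backed loader
  private final Duration refreshInterval;  // how often subtasks reload the table

  private transient Table table;           // last loaded table, rebuilt after deserialization
  private transient long lastLoadMillis;
  private transient boolean loaderOpened;

  IntervalReloadingTableSupplier(TableLoader tableLoader, Duration refreshInterval) {
    this.tableLoader = tableLoader;
    this.refreshInterval = refreshInterval;
  }

  @Override
  public Table get() {
    long now = System.currentTimeMillis();
    if (table == null || now - lastLoadMillis > refreshInterval.toMillis()) {
      if (!loaderOpened) {
        tableLoader.open();
        loaderOpened = true;
      }
      // Reloading picks up fresh table metadata, including refreshed credentials
      // returned by a REST catalog in the load-table response.
      table = tableLoader.loadTable();
      lastLoadMillis = now;
    }
    return table;
  }
}
```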
   
   This initial supplier implementation puts additional load on the Iceberg 
catalog, so the reload option is turned off by default. Different table 
suppliers could be implemented in the future that use a centralized mechanism 
for obtaining refreshed table instances, cutting down on the extra catalog load. 
Other options were explored, such as using a broadcast or the delegation 
token manager, but each had limitations or added complexity, so introducing 
the abstraction for table reload along with a simple initial implementation 
felt like a better first step.
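   
   To illustrate the default-off behavior, a hypothetical wiring sketch is shown 
below. The factory name and the null-interval convention are assumptions made 
for illustration, and it reuses the `IntervalReloadingTableSupplier` sketch above:
   
```java
import java.time.Duration;

import org.apache.iceberg.SerializableTable;
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.util.SerializableSupplier;

class TableSupplierFactory {
  // With no refresh interval configured (the default), the initially loaded table
  // is reused for the lifetime of the job, so no extra catalog load is added.
  static SerializableSupplier<Table> forSink(TableLoader loader, Duration refreshInterval) {
    if (refreshInterval == null) {
      loader.open();
      Table initial = SerializableTable.copyOf(loader.loadTable());
      return () -> initial;
    }
    return new IntervalReloadingTableSupplier(loader, refreshInterval);
  }
}
```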
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

