stevenzwu commented on PR #7171:
URL: https://github.com/apache/iceberg/pull/7171#issuecomment-1550643780

   > @stevenzwu, We also have a requirement to migrate the table without 
restarting the Flink job since users may have thousands of production streaming 
jobs online. Right now, I don't have a full solution in mind; my early 
thinking is to notify the task manager to update the writer after a checkpoint. 
Do you have such a requirement as well? Any ideas?
   
   @chenjunjiedada we can probably take this discussion to a separate issue. I 
remember some previous asks in this area about handling table schema evolution 
without manual intervention, but I couldn't find the PR or issue. There are 
two slightly different asks.
   
   1. The table schema has already been updated/synced via an external mechanism 
(like a control plane). The writer and committer just need to pick up the latest 
schema (or partition spec) without a job restart.
   2. The writer needs to detect that the table schema is out of sync with the 
record schema, automatically update the table schema, and write with the latest schema.
   
   Case 1 can be implemented by resolving the write schema (or partition 
spec) during task initialization rather than during job initialization. Writers 
periodically check (e.g. every checkpoint cycle) whether the table schema or 
partition spec has changed. If it has, the writers can fail the job; the restart 
and task initialization will then load the latest schema and spec. However, this 
raises a scalability concern, because every writer task (hundreds or more) needs 
to load the Iceberg table from the catalog to retrieve the schema and partition spec.
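   A minimal sketch of the case 1 check, assuming each writer task can reload the table's current schema ID and partition spec ID through some loader. `TableVersion`, `WriterMetadataGuard`, and `TableMetadataChangedException` are hypothetical names, not Iceberg or Flink APIs; in a real Iceberg writer the loader might call `Table.refresh()` and then read `table.schema().schemaId()` and `table.spec().specId()`. Throwing from the per-checkpoint hook stands in for "failing the job", so that the restart re-runs task initialization against the latest metadata.

```java
import java.util.Objects;
import java.util.function.Supplier;

public class SchemaChangeCheck {

    /** Hypothetical snapshot of the table metadata a writer depends on. */
    public record TableVersion(int schemaId, int specId) {}

    /** Thrown to fail the task so the job restarts and reloads table metadata. */
    public static class TableMetadataChangedException extends RuntimeException {
        public TableMetadataChangedException(String message) {
            super(message);
        }
    }

    /** Hypothetical helper owned by each writer task. */
    public static class WriterMetadataGuard {
        private final Supplier<TableVersion> loader; // loads latest metadata from the catalog
        private final TableVersion initial;

        public WriterMetadataGuard(Supplier<TableVersion> loader) {
            this.loader = Objects.requireNonNull(loader);
            // Resolved at task initialization, not when the job graph is built.
            this.initial = loader.get();
        }

        /** Schema/spec version this task writes with. */
        public TableVersion writeVersion() {
            return initial;
        }

        /** Intended to be called once per checkpoint cycle. */
        public void checkOnCheckpoint() {
            TableVersion latest = loader.get(); // one catalog load per task per check
            if (!latest.equals(initial)) {
                throw new TableMetadataChangedException(
                    "Table changed from " + initial + " to " + latest + "; failing task to reload");
            }
        }
    }

    public static void main(String[] args) {
        int[] currentSchemaId = {0}; // simulates the catalog's current schema ID
        WriterMetadataGuard guard =
            new WriterMetadataGuard(() -> new TableVersion(currentSchemaId[0], 0));
        guard.checkOnCheckpoint(); // unchanged: no exception
        currentSchemaId[0] = 1;    // schema evolved by an external control plane
        try {
            guard.checkOnCheckpoint();
        } catch (TableMetadataChangedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

   Note that every `checkOnCheckpoint()` call costs one catalog load per writer task, which is exactly the scalability concern above; with hundreds of writers, that load is multiplied per checkpoint.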
   
   Case 2 can be implemented similarly, but it is riskier: bad records (with a 
bad schema) can cause unintended changes to the Iceberg table schema. 
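   For case 2, the detection half might look like the sketch below, assuming record and table schemas are reduced to flat field-name-to-type maps (a simplification; real Iceberg schemas are nested and track fields by ID). The `SchemaDriftCheck` class and its additive-only policy are hypothetical: it proposes new columns but reports type conflicts instead of applying them, which is one way to limit how much a single bad record can change the table. Applying the additions (e.g. through `Table.updateSchema()`) is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaDriftCheck {

    /** Result of comparing a record schema against the table schema. */
    public record Drift(List<String> columnsToAdd, List<String> conflicts) {
        /** Only evolve automatically when nothing conflicts. */
        public boolean safeToEvolve() {
            return conflicts.isEmpty();
        }
    }

    /**
     * Proposes purely additive changes. Columns present in the table but missing
     * from the record are left alone (never dropped), and a type mismatch on an
     * existing column is reported as a conflict rather than applied.
     */
    public static Drift detect(Map<String, String> tableFields, Map<String, String> recordFields) {
        List<String> toAdd = new ArrayList<>();
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, String> field : recordFields.entrySet()) {
            String tableType = tableFields.get(field.getKey());
            if (tableType == null) {
                toAdd.add(field.getKey());
            } else if (!tableType.equals(field.getValue())) {
                conflicts.add(field.getKey() + ": " + tableType + " vs " + field.getValue());
            }
        }
        return new Drift(toAdd, conflicts);
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("id", "long");
        table.put("name", "string");
        Map<String, String> record = new LinkedHashMap<>(table);
        record.put("email", "string"); // new field arriving in the stream
        Drift drift = detect(table, record);
        System.out.println(drift.columnsToAdd() + " safe=" + drift.safeToEvolve());
    }
}
```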
   


-- 
This is an automated message from the Apache Git Service.