stevenzwu commented on PR #7171: URL: https://github.com/apache/iceberg/pull/7171#issuecomment-1550643780
> @stevenzwu, We also have a requirement to migrate the table without restarting the Flink job since users may have thousands of production streaming jobs online. Right now, I don't have a full solution in my mind, the early thinking is to notify the task manager to update the writer after checkpoint. Do you have a such kind requirement as well? Any idea?

@chenjunjiedada we can probably take this discussion to a separate issue. I remember some previous asks in this area about handling table schema evolution without manual intervention, but I couldn't find the PR or issue.

There are two slightly different asks.

1. The table schema is already updated/synced via an external mechanism (like a control plane). The writer and committer just need to pick up the latest schema (or partition spec) without a job restart.
2. The writer needs to detect that the table schema is out of sync with the record schema, automatically update the table schema, and write with the latest schema.

Case 1 can be implemented by resolving the write schema (or partition spec) during task initialization rather than during job initialization. Writers periodically check (e.g. every checkpoint cycle) whether the table schema or partition spec has changed. If it has, the writers can fail the job; the restart and task initialization will then load the latest schema and spec. However, this does bring a scalability concern, because every writer task (hundreds or more) needs to load an Iceberg table from the catalog to retrieve the schema and partition spec.

Case 2 can be implemented similarly, but it is riskier: bad records (with a bad schema) can cause unintended changes to the Iceberg table schema.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
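For illustration only, the case 1 idea from the comment above (resolve the schema and spec at task initialization, re-check every checkpoint cycle, fail on change) could be sketched roughly as follows. `TableVersion` and `SchemaChangeGuard` are hypothetical names invented for this sketch and are not part of Iceberg or Flink; the `Supplier` stands in for whatever catalog lookup a real writer would use.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical snapshot of the table metadata a writer depends on. */
final class TableVersion {
    final int schemaId;
    final int specId;

    TableVersion(int schemaId, int specId) {
        this.schemaId = schemaId;
        this.specId = specId;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof TableVersion)) {
            return false;
        }
        TableVersion that = (TableVersion) other;
        return schemaId == that.schemaId && specId == that.specId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(schemaId, specId);
    }
}

/** Hypothetical per-writer guard; an assumption for this sketch, not an existing API. */
final class SchemaChangeGuard {
    // Assumed to wrap a catalog table load; each call is one catalog round trip.
    private final Supplier<TableVersion> loader;
    private final TableVersion initial;

    SchemaChangeGuard(Supplier<TableVersion> loader) {
        this.loader = loader;
        // Resolved at task initialization rather than job initialization.
        this.initial = loader.get();
    }

    /** Intended to be called once per checkpoint cycle; throwing fails the task. */
    void checkOnCheckpoint() {
        TableVersion current = loader.get();
        if (!initial.equals(current)) {
            throw new IllegalStateException(
                "Table schema or partition spec changed; failing so a restart reloads them");
        }
    }
}
```

Every `checkOnCheckpoint()` call in every writer task goes through the loader, which is where the scalability concern about hundreds of tasks hitting the catalog would show up.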
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
