mxm opened a new pull request, #12424: URL: https://github.com/apache/iceberg/pull/12424
The Flink Iceberg sink writes data from a continuous Flink stream to an Iceberg table. It already powers countless production jobs and has stood the test of time. For many users, however, the static nature of the Flink Iceberg sink has become a serious limitation: the target table, its schema and partition spec, and even the write parallelism are statically configured. Any change to these requires rebuilding and restarting the underlying Flink job.

We present the next-generation Flink Iceberg sink, the Dynamic Flink Iceberg Sink. From the user's perspective, there are three main advantages to using the Dynamic Iceberg sink. It supports:

1. Writing to any number of tables (no more 1:1 sink/table relationship).
2. Dynamically creating and updating tables based on user-supplied routing.
3. Dynamically updating the schema and partition spec of tables.

All of this is done from within the sink, controlled by the user via the DynamicRecord class, without any Flink job restart.

Design document: https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s/edit

Major credits to @pvary, who came up with the design and wrote the initial implementation.

We have done some benchmarking to compare the old Iceberg sink with the Dynamic Iceberg sink. We found that the performance overhead of the Dynamic Sink is negligible (see the results below). Of course, the real power of the Dynamic Sink unfolds when writing to multiple tables, which is not supported in the regular Iceberg sink.
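To make the idea of per-record routing concrete, here is a minimal, self-contained sketch in plain Java. It deliberately uses neither the Flink nor the Iceberg API: `RoutedRecord` and `InMemorySink` are hypothetical stand-ins invented for this illustration, not the DynamicRecord class or the sink from this PR. The sketch only shows the general pattern in which each record names its target table and the columns it expects, and the sink creates tables and widens schemas as records arrive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: none of these types exist in Flink or Iceberg.
final class RoutingSketch {

  /** Hypothetical record that names its own target table and the columns it expects. */
  static final class RoutedRecord {
    final String table;
    final List<String> columns;
    final Map<String, Object> row;

    RoutedRecord(String table, List<String> columns, Map<String, Object> row) {
      this.table = table;
      this.columns = columns;
      this.row = row;
    }
  }

  /** Hypothetical in-memory sink: tables are created and widened as records arrive. */
  static final class InMemorySink {
    private final Map<String, List<String>> schemas = new LinkedHashMap<>();
    private final Map<String, List<Map<String, Object>>> rows = new LinkedHashMap<>();

    void write(RoutedRecord record) {
      // Create the table the first time a record is routed to it.
      List<String> schema = schemas.computeIfAbsent(record.table, t -> new ArrayList<>());
      // Evolve the schema by appending columns the table has not seen yet.
      for (String column : record.columns) {
        if (!schema.contains(column)) {
          schema.add(column);
        }
      }
      rows.computeIfAbsent(record.table, t -> new ArrayList<>()).add(record.row);
    }

    List<String> schemaOf(String table) {
      return schemas.get(table);
    }

    int rowCount(String table) {
      List<Map<String, Object>> tableRows = rows.get(table);
      return tableRows == null ? 0 : tableRows.size();
    }

    int tableCount() {
      return schemas.size();
    }
  }

  public static void main(String[] args) {
    InMemorySink sink = new InMemorySink();
    sink.write(new RoutedRecord("db.clicks", List.of("id"), Map.of("id", 1)));
    sink.write(new RoutedRecord("db.orders", List.of("id", "total"), Map.of("id", 7, "total", 9.5)));
    // Same table as the first record, but carrying an extra column.
    sink.write(new RoutedRecord("db.clicks", List.of("id", "url"), Map.of("id", 2, "url", "/home")));

    System.out.println(sink.tableCount() + " tables");
    System.out.println("db.clicks schema: " + sink.schemaOf("db.clicks"));
    System.out.println("db.clicks rows: " + sink.rowCount("db.clicks"));
  }
}
```

In this toy model a single sink instance ends up with two tables, and `db.clicks` gains the `url` column without any restart. How the actual sink handles table creation, schema comparison, and commits is described in the design document, not here.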
```
TestDynamicIcebergSinkPerf.testIcebergSink
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 1119211093289016971 written 5000000 records in 9405 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 3781801840532551473 written 5000000 records in 8948 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 3033420860172241073 written 5000000 records in 8555 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 2032850461702063007 written 5000000 records in 8781 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 7558061307456947485 written 5000000 records in 8872 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 601297722355936169 written 5000000 records in 8330 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 6249065641929321708 written 5000000 records in 8512 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 2068854608451193539 written 5000000 records in 8351 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 2246238791649680214 written 5000000 records in 8433 ms

TestDynamicIcebergSinkPerf.testDynamicSink
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 6885587612182426678 written 5000000 records in 9862 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 5143658647354768578 written 5000000 records in 9611 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 6787994330850859040 written 5000000 records in 8935 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 6788803513469485538 written 5000000 records in 8923 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 8592131276344399796 written 5000000 records in 9335 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 4567624953033955114 written 5000000 records in 8973 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 139484213775208428 written 5000000 records in 9275 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 5103930001749514761 written 5000000 records in 8982 ms
[main] INFO org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSinkPref - TEST RESULT: Snapshot 4174643942796111771 written 5000000 records in 9209 ms
```
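To put a number on the overhead, the durations from the TEST RESULT lines in this PR description can be averaged per test. The sketch below does only that arithmetic: the two arrays are copied from the nine runs logged for each test, while the class and method names (`BenchmarkSummary`, `mean`, `overheadPercent`) are made up for this example. These are wall-clock times from a test run rather than a controlled benchmark, so the result should be read as a rough indication.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustration only: summarizes the durations logged in this PR description.
final class BenchmarkSummary {

  // Durations in ms from TestDynamicIcebergSinkPerf.testIcebergSink (9 runs).
  static final long[] ICEBERG_SINK_MS = {9405, 8948, 8555, 8781, 8872, 8330, 8512, 8351, 8433};

  // Durations in ms from TestDynamicIcebergSinkPerf.testDynamicSink (9 runs).
  static final long[] DYNAMIC_SINK_MS = {9862, 9611, 8935, 8923, 9335, 8973, 9275, 8982, 9209};

  /** Arithmetic mean of the given durations; NaN for an empty array. */
  static double mean(long[] values) {
    return Arrays.stream(values).average().orElse(Double.NaN);
  }

  /** Relative slowdown of candidate versus baseline, in percent of the baseline mean. */
  static double overheadPercent(long[] baseline, long[] candidate) {
    return (mean(candidate) / mean(baseline) - 1.0) * 100.0;
  }

  public static void main(String[] args) {
    System.out.println(String.format(Locale.ROOT,
        "Iceberg sink mean: %.1f ms", mean(ICEBERG_SINK_MS)));
    System.out.println(String.format(Locale.ROOT,
        "Dynamic sink mean: %.1f ms", mean(DYNAMIC_SINK_MS)));
    System.out.println(String.format(Locale.ROOT,
        "Overhead: %.1f %%", overheadPercent(ICEBERG_SINK_MS, DYNAMIC_SINK_MS)));
  }
}
```

For the logged runs this gives a mean of about 8687 ms for the regular sink and about 9234 ms for the Dynamic Sink per 5,000,000 records, an overhead of roughly 6.3% when writing to a single table.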