pvary commented on issue #9773: URL: https://github.com/apache/iceberg/issues/9773#issuecomment-1968821622
Hi Victor, In the example you have provided, you created a table with a primary key: ``` tEnv.executeSql( "CREATE TABLE IF NOT EXISTS catalog.db.flink_sink_append (id int primary key, some_value string) " + "WITH ('format-version'='2')" ); ``` BTW, I had to change the code to run with 1.18 Flink to: ``` tEnv.executeSql( "CREATE TABLE IF NOT EXISTS catalog.db.flink_sink_append (id int primary key NOT ENFORCED, some_value string) " + "WITH ('format-version'='2')" ); ``` In this case, the table has a primary key, and it is a valid behaviour to prevent inserting 2 rows with the same key, that is why you do not see 2 rows in the result. The meaning of `upsert` is: The Sink will not receive the `-U`, `+U` records, just `+U` records, and it should be handle that. See: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/data_stream_api/#examples-for-fromchangelogstream So if the `upsert` is `false` (default), then we expect the caller to send not only the `+U` (after update) records, but also the `-U` (before update) records too. In this case based on the `-U` records we write the equality deletes, and based on the `+U` records we write the inserts. If the `upsert` is `true`, then we expect only the `+U` records, so for every new record we also write an equality delete to removed the previous version, to ensure that - if exists - the old record is removed. Receiving two `+U` records without the corresponding `-U` record for the same primary key is valid for the `upsert` mode, but it is invalid for the non-upsert mode. So in this case the stream is invalid (does not match the configuration), so the behaviour is undefined. I hope this makes sense. Peter So -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org