pvary commented on issue #9773:
URL: https://github.com/apache/iceberg/issues/9773#issuecomment-1968821622

   Hi Victor,
   
   In the example you have provided, you created a table with a primary key:
   ```
           tEnv.executeSql(
                   "CREATE TABLE IF NOT EXISTS catalog.db.flink_sink_append (id 
int primary key, some_value string) " +
                           "WITH ('format-version'='2')"
           );
   ```
   
   BTW, I had to change the code to run with 1.18 Flink to:
   ```
           tEnv.executeSql(
               "CREATE TABLE IF NOT EXISTS catalog.db.flink_sink_append (id int 
primary key NOT ENFORCED, some_value string) " +
                   "WITH ('format-version'='2')"
           );
   ```
   
   In this case, the table has a primary key, and it is a valid behaviour to 
prevent inserting 2 rows with the same key, that is why you do not see 2 rows 
in the result.
   
   The meaning of `upsert` is: The Sink will not receive the `-U`, `+U` 
records, just `+U` records, and it should be handle that. See: 
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/data_stream_api/#examples-for-fromchangelogstream
   
   So if the `upsert` is `false` (default), then we expect the caller to send 
not only the `+U` (after update) records, but also the `-U` (before update) 
records too. In this case based on the `-U` records we write the equality 
deletes, and based on the `+U` records we write the inserts.
   
   If the `upsert` is `true`, then we expect only the `+U` records, so for 
every new record we also write an equality delete to removed the previous 
version, to ensure that - if exists - the old record is removed.
   
   Receiving two `+U` records without the corresponding `-U` record for the 
same primary key is valid for the `upsert` mode, but it is invalid for the 
non-upsert mode. So in this case the stream is invalid (does not match the 
configuration), so the behaviour is undefined.
   
   I hope this makes sense.
   
   Peter
   
   So 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to