klion26 opened a new issue, #8973:
URL: https://github.com/apache/iceberg/issues/8973

   ### Feature Request / Improvement
   
   Hi community, I would like to discuss the feature described in the title. 
Please let me know what you think. Thanks.
   
   One of our customers approached us with this request. The data pipeline 
is `[datasource] --> [flinkcdc] --> [iceberg]`. The customer wants support for 
adding an additional `opType` column when creating an Iceberg table (when a 
config option is enabled). The `opType` column records the latest change action 
applied to the row (`INSERT`, `UPDATE`, `DELETE`, etc.). When a `DELETE` action 
is executed, the row is not actually **deleted** from the Iceberg table; only 
its `opType` is updated (all columns other than `opType` stay unchanged). The 
other actions (`INSERT`, `UPDATE`) behave the same as they do now. 
Here is an example, starting from an empty table:
   
   ```
   +----+-----------+----------+
   | id | data      | opType   |
   +----+-----------+----------+
   +----+-----------+----------+
   ```
   
   After inserting a row, the table changes to:
   ```
   +----+-----------+----------+
   | id | data      | opType   |
   +----+-----------+----------+
   | 1  | "abcdefg" | "INSERT" |
   +----+-----------+----------+
   ```
   
   After executing a `DELETE` (`DELETE FROM table WHERE id = 1`), the 
table changes to:
   
   > All columns except `opType` remain the same as before.
   ```
   +----+-----------+----------+
   | id | data      | opType   |
   +----+-----------+----------+
   | 1  | "abcdefg" | "DELETE" |
   +----+-----------+----------+
   ``` 
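
To make the intended semantics concrete, here is a minimal in-memory sketch of the behavior walked through above. It is only an illustration and not Iceberg or Flink code: the class `SoftDeleteTable`, its `apply` method, and the `soft_delete_enabled` flag are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SoftDeleteTable:
    """In-memory stand-in for a table keyed by `id` (hypothetical, not an Iceberg API)."""

    # Hypothetical option name; the proposal only says the feature is off by default.
    soft_delete_enabled: bool = False
    rows: dict = field(default_factory=dict)

    def apply(self, op, row_id, data=None):
        """Apply one change event (INSERT, UPDATE or DELETE) to the table."""
        if op in ("INSERT", "UPDATE"):
            row = {"id": row_id, "data": data}
            if self.soft_delete_enabled:
                # Record the latest action alongside the row.
                row["opType"] = op
            self.rows[row_id] = row
        elif op == "DELETE":
            if self.soft_delete_enabled:
                # Keep the row; only opType changes, other columns stay as they were.
                if row_id in self.rows:
                    self.rows[row_id]["opType"] = "DELETE"
            else:
                # Current behavior: the row is removed.
                self.rows.pop(row_id, None)
        else:
            raise ValueError(f"unsupported op: {op}")


table = SoftDeleteTable(soft_delete_enabled=True)
table.apply("INSERT", 1, "abcdefg")
table.apply("DELETE", 1)
print(table.rows[1])  # {'id': 1, 'data': 'abcdefg', 'opType': 'DELETE'}
```

With the flag left at its default (`False`), the same `DELETE` event removes the row, which matches how deletes behave today.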
   
   This feature is needed for the following reasons:
   1. It enables row-level traceback.
   2. It can be used for audit analysis (for example, to find out what data 
has been deleted).
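
As a rough illustration of the audit use case, the sketch below filters a set of rows for those whose latest recorded action is `DELETE`. The rows are plain Python dictionaries standing in for table records; the helper `deleted_rows` and the second sample row are hypothetical and only serve the example.

```python
def deleted_rows(rows):
    """Return the rows whose latest recorded action (opType) is DELETE."""
    return [row for row in rows if row.get("opType") == "DELETE"]


rows = [
    {"id": 1, "data": "abcdefg", "opType": "DELETE"},
    {"id": 2, "data": "hijklmn", "opType": "INSERT"},  # hypothetical extra row
]

print(deleted_rows(rows))  # [{'id': 1, 'data': 'abcdefg', 'opType': 'DELETE'}]
```

In a real deployment the same question would presumably be asked through the query engine, for example with a filter on the `opType` column.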
   
   I am not sure whether this feature makes sense for Iceberg.
   
   PS: Turning on this feature may consume more storage, but whether that 
cost is acceptable depends on the business need, so we can expose it as an 
option that is disabled by default.
   
   
   ### Query engine
   
   Flink


